utils
cmtcatalog
- dsmpy.utils.cmtcatalog.read_catalog()
Get the GCMT catalog.
- Returns
cat: ndarray of pydsm.Event objects
- Return type
ndarray
modelutils
Utilities to build various model meshes.
- dsmpy.utils.modelutils.single_layer_dpp()
Create objects for a single-layer D’’ model.
- Returns
- reference model
- ModelParameters: model parameters
- dict: range dict
- Return type
scardec
sklearnutils
- dsmpy.utils.sklearnutils.get_XY(model, dataset, windows, tlen, nspc, freq, freq2, filter_type='bandpass', sampling_hz=5, var=2.5, ratio=2.5, corr=0.0, phase_ref=None, buffer=10.0, mode=0) -> (numpy.ndarray, numpy.ndarray)
Compute the feature matrix X and target vector y to be used as input to scikit-learn linear models.
X and y are linked by the equation Xm = y, where m is the model parameter vector. The order for m is given by the order in SeismicModel.gradient_models(), and is: [[radial_nodes_for_type_1] + [radial_nodes_for_type_2] + …].
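The ordering of m can be pictured with a small sketch. The parameter type names and node labels below are hypothetical placeholders, not values returned by dsmpy; only the concatenation order (all radial nodes of the first type, then all radial nodes of the second type, and so on) follows the description above.

```python
# Hypothetical parameter types and radial node labels, for illustration only.
nodes_by_type = {
    "type_1": ["r0", "r1", "r2"],
    "type_2": ["r0", "r1", "r2"],
}

# Flatten in the described order: every node of type 1, then every node of type 2, ...
m_labels = [
    (ptype, node)
    for ptype, nodes in nodes_by_type.items()
    for node in nodes
]

# Under this ordering, column j of X would correspond to m_labels[j].
for j, label in enumerate(m_labels):
    print(j, label)
```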
This method should scale to large datasets: the computations are done in the frequency domain (typically approx. 256 to 512 np.complex64 values per synthetic), the transformation to the time domain is done event by event (the data is freed afterwards), and the gradient matrix X contains windowed time series of typically a few hundred to a few thousand floats. Furthermore, only the frequency-domain synthetics are replicated on all cores. All time-domain operations, as well as X and y, are defined on thread 0 only. For instance, 10,000 records sampled at 5 Hz with 50 s windows, for one seismic component and 100 model parameters, should not take more than approx. 1e4 * 5 * 50 * 101 * 6.4e-8 = 16.2 Gb.
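The memory estimate above can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions that the text does not state explicitly: the factor 101 is read as 100 columns of X plus one column for y, and 6.4e-8 is read as 64 bits per value expressed in gigabits. The function name is hypothetical.

```python
def estimate_gradient_memory_bits(n_records, sampling_hz, window_s, n_params,
                                  bits_per_value=64):
    """Rough size of X and y in bits (assumes 64-bit values; an assumption)."""
    # Total number of time samples over all windowed records.
    n_time_points = n_records * sampling_hz * window_s
    # One column per model parameter for X, plus one column for y (assumed).
    n_columns = n_params + 1
    return n_time_points * n_columns * bits_per_value


bits = estimate_gradient_memory_bits(10_000, 5, 50, 100)
gigabits = bits / 1e9
gigabytes = bits / 8 / 1e9
print(f"{gigabits:.2f} Gb, {gigabytes:.2f} GB")  # prints "16.16 Gb, 2.02 GB"
```

Under this reading, the 16.2 Gb figure is in gigabits, which is about 2 gigabytes.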
- Parameters
model (SeismicModel) – model at which the gradient is evaluated. Must be a mesh and have model._model_params not None.
dataset (Dataset) – dataset
windows (list of Window) – time windows
tlen (float) – length of time series for synthetics
nspc (int) – number of points in frequency domain for synthetics
sampling_hz (int) – sampling frequency of synthetics in the time domain. Should preferably be a divisor of 20.
var (float) – variance cutoff. Records with variance > var will be excluded (default is 2.5).
ratio (float) – amplitude ratio cutoff. Records with obs/syn < 1/ratio or obs/syn > ratio will be excluded (default is 2.5).
corr (float) – correlation coefficient cutoff. Records with correlation < corr will be excluded (default is 0).
phase_ref (str) – reference phase for static correction (default is None).
buffer (float) – time buffer in seconds for static correction (default is 10).
mode (int) – computation mode. 0: P-SV + SH, 1: P-SV, 2: SH
- Returns
- np.ndarray: X, the waveform gradient with respect to the model.
The shape is (n_time_points, n_model_parameters).
- np.ndarray: y, the waveform residual vector.
The shape is (n_time_points,).
- Return type
np.ndarray
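To illustrate how the returned arrays fit together, the sketch below solves Xm = y for arrays with the documented shapes. It does not call get_XY: X and y are random stand-ins, and numpy.linalg.lstsq is used here in place of a scikit-learn linear model to keep the example dependency-light. A scikit-learn estimator such as LinearRegression(fit_intercept=False) would take the same X and y through fit(X, y).

```python
import numpy as np

rng = np.random.default_rng(0)
n_time_points, n_model_parameters = 200, 5

# Random stand-ins for the output of get_XY (not real waveform data).
X = rng.normal(size=(n_time_points, n_model_parameters))
m_true = rng.normal(size=n_model_parameters)
y = X @ m_true  # noise-free residual vector consistent with Xm = y

# Least-squares solution of Xm = y.
m_est, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(X.shape, y.shape, m_est.shape)
```

With real data, y contains noise and X may be ill-conditioned, so a regularized model (for example ridge regression) is often a more practical choice than plain least squares.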
- dsmpy.utils.sklearnutils.misfits(data, syn)
Returns variance, corr, ratio.
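The three misfit values map naturally onto the var, corr and ratio cutoffs of get_XY. The sketch below shows one way such a record selection could look. It is not the dsmpy implementation: the function name keep_record is hypothetical, and the amplitude-ratio test is assumed to be symmetric, rejecting records whose obs/syn ratio falls outside [1/ratio, ratio].

```python
def keep_record(variance, corr_value, amp_ratio, var=2.5, ratio=2.5, corr=0.0):
    """Return True if a record passes all three cutoffs (hypothetical helper)."""
    # Variance cutoff: exclude records with variance > var.
    if variance > var:
        return False
    # Correlation cutoff: exclude records with correlation < corr.
    if corr_value < corr:
        return False
    # Amplitude-ratio cutoff, assumed symmetric around 1.
    if amp_ratio > ratio or amp_ratio < 1.0 / ratio:
        return False
    return True


print(keep_record(1.0, 0.8, 1.2))   # passes all cutoffs
print(keep_record(3.0, 0.8, 1.2))   # variance too large
print(keep_record(1.0, 0.8, 0.2))   # amplitude ratio below 1/ratio
```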