utils

cmtcatalog

dsmpy.utils.cmtcatalog.read_catalog()

Get the GCMT catalog.

Returns

cat, an ndarray of pydsm.Event objects

Return type

ndarray

modelutils

Utilities to build various model meshes.

dsmpy.utils.modelutils.single_layer_dpp()

Create objects for a single-layer D’’ model.

Returns

SeismicModel: reference model.

ModelParameters: model parameters.

dict: range dict.

Return type

SeismicModel

scardec

dsmpy.utils.scardec.get_stf(event)

Returns a source time function in the time domain.

Parameters

event (Event) – event

Returns

Source time function, normalized so that its integral is 1. The shape is (2, npts).

Return type

ndarray
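The normalization described above can be illustrated with a minimal NumPy sketch. It does not call dsmpy: `normalize_stf` is a hypothetical helper, the triangular pulse is made-up input, and the row layout (times in row 0, amplitudes in row 1) is an assumption about the (2, npts) array.

```python
import numpy as np


def normalize_stf(times, amplitudes):
    """Return a (2, npts) array whose amplitude row integrates to 1.

    Hypothetical helper; assumes row 0 holds times and row 1 amplitudes.
    """
    times = np.asarray(times, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)
    # Trapezoidal rule written out explicitly
    area = np.sum(0.5 * (amplitudes[1:] + amplitudes[:-1]) * np.diff(times))
    return np.vstack([times, amplitudes / area])


# Made-up triangular source time function with a 10 s duration
t = np.linspace(0.0, 10.0, 101)
stf = normalize_stf(t, np.minimum(t, 10.0 - t))
print(stf.shape)
```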

sklearnutils

dsmpy.utils.sklearnutils.get_XY(model, dataset, windows, tlen, nspc, freq, freq2, filter_type='bandpass', sampling_hz=5, var=2.5, ratio=2.5, corr=0.0, phase_ref=None, buffer=10.0, mode=0) -> (numpy.ndarray, numpy.ndarray)

Compute the feature matrix X and target vector y to be used as input to scikit-learn linear models.

X and y are linked by the equation Xm = y, where m is the model parameter vector. The order for m is given by the order in SeismicModel.gradient_models(), and is: [[radial_nodes_for_type_1] + [radial_nodes_for_type_2] + …].

This method should scale to large datasets: the computations are done in the frequency domain (typically approx. 256 to 512 np.complex64 values per synthetic), the transformation to the time domain is done event by event (the data is freed afterwards), and the gradient matrix X contains windowed time series of typically a few hundred to a few thousand floats. Furthermore, only the frequency-domain synthetics are replicated on all cores. All the time-domain operations, as well as X and y, are defined on thread 0 only. For instance, 10,000 records sampled at 5 Hz for 50 s windows for one seismic component with 100 model parameters should not take more than approx. 1e4 * 5 * 50 * 101 * 6.4e-8 = 16.2 Gb (gigabits, i.e. about 2 GB, if 6.4e-8 is read as the size of one 64-bit float in Gb).
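The memory estimate above can be reproduced with a few lines of arithmetic. This sketch assumes that the factor 101 counts the 100 columns of X plus the vector y, and that 6.4e-8 is the size of one 64-bit float in gigabits; both readings are interpretations of the formula, not statements from the library.

```python
# Values taken from the example in the text
n_records = 10_000
sampling_hz = 5
window_s = 50
n_params = 100

# Assumption: one 64-bit float expressed in gigabits (64 / 1e9 = 6.4e-8)
gbit_per_float64 = 64 / 1e9

n_time_points = n_records * sampling_hz * window_s  # rows of X and y
n_columns = n_params + 1                            # columns of X, plus y
size_gbit = n_time_points * n_columns * gbit_per_float64
size_gbyte = size_gbit / 8                          # 8 bits per byte

print(f"{n_time_points} rows x {n_columns} columns: "
      f"{size_gbit:.2f} Gb ({size_gbyte:.2f} GB)")
```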

Parameters
  • model (SeismicModel) – model at which the gradient is evaluated. Must be a mesh, and model._model_params must not be None.

  • dataset (Dataset) – dataset

  • windows (list of Window) – time windows

  • tlen (float) – length of time series for synthetics

  • nspc (int) – number of points in frequency domain for synthetics

  • sampling_hz (int) – sampling frequency of synthetics in the time domain. Preferably a divisor of 20 (default is 5).

  • var (float) – variance cutoff. Records with variance > var will be excluded (default is 2.5).

  • ratio (float) – amplitude ratio cutoff. Records with obs/syn < 1/ratio or obs/syn > ratio will be excluded (default is 2.5).

  • corr (float) – correlation coefficient cutoff. Records with correlation < corr will be excluded (default is 0).

  • phase_ref (str) – reference phase for static correction (default is None).

  • buffer (float) – time buffer in seconds for static correction (default is 10).

  • mode (int) – computation mode. 0: P-SV + SH, 1: P-SV, 2: SH (default is 0).

Returns

np.ndarray: X, the waveform gradient with respect to the model. The shape is (n_time_points, n_model_parameters).

np.ndarray: y, the waveform residual vector. The shape is (n_time_points,).

Return type

np.ndarray
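To make the relation Xm = y concrete, here is a minimal sketch that solves it by ordinary least squares. It does not call get_XY: X, y and `m_true` are random stand-ins with the documented shapes, and NumPy's `np.linalg.lstsq` is used in place of a scikit-learn estimator so that the example depends on NumPy only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time_points, n_model_parameters = 200, 4

# Stand-ins for the arrays returned by get_XY (random, not real waveforms)
X = rng.standard_normal((n_time_points, n_model_parameters))
m_true = np.array([0.5, -1.0, 0.25, 2.0])  # hypothetical model perturbation
y = X @ m_true                             # noise-free residual vector

# Least-squares solution of X m = y
m_est, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(m_est)
```

With real data, a regularized scikit-learn model such as Ridge fitted without an intercept would probably be the closer analogue, since the system contains no constant term; that choice is a suggestion and is not prescribed by the text above.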

dsmpy.utils.sklearnutils.misfits(data, syn)

Returns variance, corr, ratio.
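The three misfit measures are not defined in this section. The sketch below uses one common set of definitions as an assumption: residual energy normalized by data energy for the variance, the Pearson coefficient for the correlation, and the peak-to-peak amplitude ratio obs/syn for the ratio. `misfits_sketch` is a hypothetical name, and the actual dsmpy implementation may differ.

```python
import numpy as np


def misfits_sketch(data, syn):
    """Return (variance, corr, ratio) under assumed definitions."""
    data = np.asarray(data, dtype=float)
    syn = np.asarray(syn, dtype=float)
    # Assumed: residual energy normalized by the energy of the data
    variance = np.sum((data - syn) ** 2) / np.sum(data ** 2)
    # Assumed: Pearson correlation coefficient between data and synthetic
    corr = np.corrcoef(data, syn)[0, 1]
    # Assumed: peak-to-peak amplitude ratio, observed over synthetic
    ratio = (data.max() - data.min()) / (syn.max() - syn.min())
    return variance, corr, ratio


# Made-up record: the observed trace is the synthetic scaled by 2
syn = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
data = 2.0 * syn
variance, corr, ratio = misfits_sketch(data, syn)
print(variance, corr, ratio)
```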