dataset

class dsmpy.dataset.Dataset(lats, lons, phis, thetas, eqlats, eqlons, r0s, mts, nrs, stations, events, data=None, sampling_hz=20, is_cut=False)

Represents a dataset of events and stations.

The data array is not None only if the dataset was created using Dataset.dataset_from_sac(headonly=False). In this case, the data array has shape (1, 3, n_records, npts), where n_records is the number of seismic records (event-station pairs) and npts is the number of time points in the longest record. Dimension 1 corresponds to the 3 seismic components (Z, R, T). Dimension 0 has length > 1 only after dataset.apply_windows(), in which case it indexes the time windows (i.e., the different phases).
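The array layout described above can be illustrated with a plain NumPy array. This is a minimal sketch rather than dsmpy code: the sizes and the window start samples are made-up values chosen only for illustration, and the slicing is a stand-in for whatever apply_windows() does internally.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_records = 4      # event-station pairs
npts = 1000        # samples in the longest record
npts_window = 200  # samples in the longest window

# Layout before windowing: (1, 3, n_records, npts).
data = np.zeros((1, 3, n_records, npts))

# Dimension 1 indexes the components in the order Z, R, T.
z, r, t = data[0, 0], data[0, 1], data[0, 2]

# Layout after windowing: dimension 0 indexes the time windows.
starts = [100, 500]  # hypothetical window start samples, one per phase
cut = np.stack([data[0, :, :, s:s + npts_window] for s in starts])

print(data.shape)  # (1, 3, 4, 1000)
print(cut.shape)   # (2, 3, 4, 200)
```

Each component slice such as `t` has shape (n_records, npts), so a single record is one row of that slice.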

Parameters
  • lats (ndarray) – station latitudes for each record (nr,).

  • lons (ndarray) – station longitudes for each record (nr,).

  • phis (ndarray) – station phis for each record (nr,).

  • thetas (ndarray) – station thetas for each record (nr,).

  • eqlats (ndarray) – centroid latitudes (nev,).

  • eqlons (ndarray) – centroid longitudes (nev,).

  • r0s (ndarray) – centroid radii (nev,).

  • mts (ndarray of MomentTensor) – array of moment tensors (nev,).

  • nrs (ndarray of int) – number of stations for each event (nev,).

  • nr (int) – total number of event-station pairs.

  • stations (ndarray of Station) – seismic stations (nr,).

  • events (ndarray of Event) – seismic events (nev,).

  • data (ndarray) – 3-components waveform data.

  • nw (int) – number of windows used to cut the data; the data array then has shape (nw, 3, nr, npts). If self.apply_windows() hasn’t been called, then nw=1.

  • sampling_hz (int) – sampling frequency for data. Used for computation with pydsm.

append(dataset)

Append dataset to self.

apply_windows(windows, n_phase, npts_max, buffer=0.0, t_before_noise=100.0, inplace=True, shift=True)

Cut the data using provided windows.

Parameters
  • windows (list of Window) – time windows.

  • n_phase (int) – number of distinct seismic phase-component pairs: if ScS (SH) and ScS (SV), then n_phase=2.

  • npts_max (int) – number of time points in the longest window.

  • buffer (float) – default is 0.

  • t_before_noise (float) – default is 100.

  • inplace (bool) – if True, performs the operation in-place (i.e., modifies self.data)

  • shift (bool) – use the time shift coded into time windows (default is True).

Returns

the cut dataset if inplace is False, else None.

Return type

Dataset

copy()

Return a deep copy of self.

Returns

deep copy of self

Return type

Dataset

classmethod dataset_from_arrays(events, stations, sampling_hz=20)

Create a Dataset object from a list of events and stations. This dataset does not contain waveform data (self.data is None), and is used only to compute synthetics.

Parameters
  • events (iterable of Event) – earthquake events

  • stations (iterable of Station) – seismic stations

  • sampling_hz (float) – waveform sampling that will be inherited by the synthetics (default is 20)

Returns

Dataset

classmethod dataset_from_files(parameter_files, file_mode=1)

Create a Dataset object from a list of DSM input files. This dataset does not contain waveform data (self.data is None), and is used only to compute synthetics.

Parameters
  • parameter_files (list of str) – paths of DSM input files.

  • file_mode (int) – The kind of DSM input file. 1: P-SV, 2: SH.

Returns

Dataset

classmethod dataset_from_sac(sac_files, verbose=0, headonly=True, broadcast_data=False)

Creates a dataset from a list of sac files. With headonly=False, time series data from the sac_files will be stored in self.data.

For parallel applications using MPI, headonly=False (i.e., reading the data from sac files) only applies to rank 0, so as not to saturate the memory.

Parameters
  • sac_files (list of str) – list of paths to sac files.

  • verbose (int) – 0: quiet, 1: debug.

  • headonly (bool) – if True, read only the metadata. If False, includes data.

  • broadcast_data (bool) – default is False

Returns

dataset

Return type

Dataset

Examples

>>> sac_files = ['FCC.CN.201205280204A.T']
>>> dataset = Dataset.dataset_from_sac(
...        sac_files, headonly=False)

classmethod dataset_from_sac_process(sac_files, windows, freq, freq2, filter_type='bandpass', shift=True, verbose=0)

Creates a dataset from a list of sac files. Data are read from sac files, cut using the time windows, and stored in self.data. The sac file data are read and cut event by event, which makes it possible to read large datasets.

This method should be used instead of dataset_from_sac() when a large amount of data is to be read. It has the same effect as using dataset_from_sac() followed by apply_windows(), but is much more memory efficient. For instance, 10,000 3-component records with 20 Hz sampling and 1 hour of recording take approx. 138 Gb in memory. The same dataset cut in 100 s windows around a single phase (e.g., ScS) takes approx. 1.9 Gb in memory.
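The memory estimate above follows from counting samples. The sketch below reproduces that arithmetic; the helper names are hypothetical, and the sample sizes are assumptions. The quoted figures are matched when sizes are read as gigabits, with 8-byte samples for the full records and 4-byte samples for the cut windows, but the dtype actually used by dsmpy is not stated here.

```python
def n_samples(n_records, n_components, sampling_hz, duration_s):
    """Total number of samples stored for a set of records."""
    return n_records * n_components * sampling_hz * duration_s


def size_gigabits(samples, bytes_per_sample):
    """Storage size in gigabits (1 Gb = 1e9 bits)."""
    return samples * bytes_per_sample * 8 / 1e9


full = n_samples(10_000, 3, 20, 3600)  # 1 hour of recording per record
cut = n_samples(10_000, 3, 20, 100)    # one 100 s window per record

print(full // cut)             # 36 times fewer samples after cutting
print(size_gigabits(full, 8))  # assumed 8-byte samples
print(size_gigabits(cut, 4))   # assumed 4-byte samples
```

Whatever the sample size, cutting one-hour records down to 100 s windows reduces the number of stored samples by a factor of 36.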

Parameters
  • sac_files (list of str) – list of paths to sac files.

  • windows (list of Window) – time windows

  • freq (float) – minimum filter frequency

  • freq2 (float) – maximum filter frequency

  • filter_type (str) – ‘bandpass’ or ‘lowpass’ (default is ‘bandpass’)

  • shift (bool) – use the time shift coded into time windows (default is True)

  • verbose (int) – 0: quiet, 1: debug.

Returns

dataset

Return type

Dataset

filter(freq, freq2=0.0, type='bandpass', zerophase=False, inplace=True)

Filter waveforms using obspy.signal.filter.

Parameters
  • freq (float) – filter frequency.

  • freq2 (float) – filter maximum frequency. For bandpass filters only.

  • type (str) – type of filter. ‘lowpass’ or ‘bandpass’.

  • zerophase (bool) – use zero phase filter.

  • inplace (bool) – if True, performs the operation in-place (i.e., modifies self.data).

Returns

the filtered dataset if inplace is False, else None

Return type

Dataset

get_bounds_from_event_index(ievent: int) -> (int, int)

Return the start, end indices to slice self.stations[start:end].

Parameters

ievent (int) – index of the event as in self.events

Returns

index of the first station recording event ievent, and index of the last station

Return type

(int, int)
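The start and end indices relate to the per-event station counts (nrs) through a cumulative sum. The following is a minimal sketch of that relation, not the dsmpy implementation: `bounds_from_counts` is a hypothetical helper, and it assumes that records are stored contiguously, ordered by event, with an exclusive end index as in Python slicing.

```python
from itertools import accumulate


def bounds_from_counts(nrs, ievent):
    """Return (start, end) so that records[start:end] belong to event ievent.

    Assumes records are stored contiguously and ordered by event.
    """
    ends = list(accumulate(nrs))  # running total of records per event
    start = ends[ievent] - nrs[ievent]
    return start, ends[ievent]


nrs = [3, 2, 4]  # hypothetical number of stations for each event
stations = [f"STA{i}" for i in range(sum(nrs))]

start, end = bounds_from_counts(nrs, 1)
print(start, end)           # 3 5
print(stations[start:end])  # ['STA3', 'STA4']
```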

plot_event(ievent, windows=None, align_zero=False, component=Component.T, ax=None, dist_min=0, dist_max=360, shift=True, **kwargs)

Plot a record section for event ievent.

Parameters
  • ievent (int) – index of the event as in self.events

  • windows (list of Window) – time windows used to cut the waveforms if specified (default is None)

  • align_zero (bool) – if True, set the start of time windows as t=0 (default is False)

  • component (Component) – seismic component (default is Component.T)

  • ax (Axes) – matplotlib Axes object

  • dist_min (float) – minimum epicentral distance (default is 0)

  • dist_max (float) – maximum epicentral distances (default is 360)

  • shift (bool) – use shift coded into time windows (default is True)

  • **kwargs – key-value arguments for the pyplot.plot function

Returns

matplotlib Figure object, and matplotlib Axes object

Return type

(Figure, Axes)

set_source_time_functions(type, catalog_path=None)

Set the catalog for source time functions. By default, source time functions specified in the GCMT catalog are used.

Parameters
  • type (str) – ‘scardec’ or ‘user’

  • catalog_path – path to a custom catalog. Must be specified if type=’user’

split(n: int)

Split self into n datasets.

Parameters

n – number of datasets into which to split

Returns

n datasets

Return type

list of Dataset

dsmpy.dataset.filter_abnormal_data(sac_files, f, threshold=5)

Filter sac data using the boolean function f.

Parameters
  • sac_files (list of str) – paths to sac files

  • f (function) – (event_id: str, station: Station) -> bool

  • threshold (float) – number of standard deviations of the distribution of the log of max of data within which to keep the data (default is 5).

Returns

filtered list of paths to sac files

Return type

list of str
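The threshold criterion can be sketched in plain Python. This is an illustration under stated assumptions, not the dsmpy implementation: `keep_within_threshold` is a hypothetical helper, and the use of log10, of the mean as the center, and of the population standard deviation are all assumptions.

```python
import math
import statistics


def keep_within_threshold(max_amplitudes, threshold=5):
    """Flag records whose log10 max amplitude lies within threshold std of the mean."""
    logs = [math.log10(a) for a in max_amplitudes]
    mean = statistics.mean(logs)
    std = statistics.pstdev(logs)
    return [abs(x - mean) <= threshold * std for x in logs]


# Hypothetical maximum amplitudes; the last record is a glitch.
amps = [1.0e-6, 2.0e-6, 1.5e-6, 0.8e-6, 1.2e-6, 3.0e2]
print(keep_within_threshold(amps, threshold=2))
# [True, True, True, True, True, False]
```

With only six records a single outlier cannot sit more than about 2.2 standard deviations from the mean, which is why this toy example uses threshold=2 rather than the documented default of 5.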

dsmpy.dataset.filter_sac_files(sac_files, f) -> list

Filter sac files using the boolean function f.

Parameters
  • sac_files (list of str) – paths to sac files

  • f (function) – (event_id: str, station: Station) -> bool

Returns

filtered list of paths to sac files

Return type

list of str
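A predicate with the documented form (event_id: str, station: Station) -> bool might look as follows. This is a sketch only: `Station` here is a stand-in namedtuple whose field names are assumptions, `make_event_filter` is a hypothetical helper, and the second event ID is made up for illustration.

```python
from collections import namedtuple

# Stand-in for dsmpy's Station class; the field names are assumptions.
Station = namedtuple("Station", ["name", "network"])


def make_event_filter(wanted_event_ids):
    """Build a predicate f(event_id, station) -> bool that keeps the given events."""
    wanted = set(wanted_event_ids)

    def f(event_id, station):
        # The station argument is unused here, but is part of the documented form.
        return event_id in wanted

    return f


f = make_event_filter(["201205280204A"])
print(f("201205280204A", Station("FCC", "CN")))  # True
print(f("200707211534A", Station("FCC", "CN")))  # False
```

With dsmpy installed, such a predicate would be passed as the second argument, as in filter_sac_files(sac_files, f).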

dsmpy.dataset.get_event_id(trace)

Return event GCMT ID from obspy Trace.

Parameters

trace (Trace) – obspy Trace object

Returns

GCMT ID

Return type

str

dsmpy.dataset.get_station(trace)

Return Station object from obspy Trace.

Parameters

trace (Trace) – obspy Trace object

Returns

station

Return type

Station

dsmpy.dataset.read_sac_from_windows(sac_files: list, windows: list, headonly=False) -> list

Parameters
  • sac_files (list of str) – paths to potential SAC files. Only the files contained in windows will be read

  • windows (list of Window) – time windows indicating which SAC files should be read

Returns

traces read from the SAC files contained in windows, the paths of those SAC files (list of str), and the windows for which SAC files were found

Return type

list of obspy traces, list of str, list of Window

dsmpy.dataset.read_sac_meta(sac_files: list) -> list

Returns a list of dict with SAC and other metadata.

The available keys are: ‘stnm’, ‘netwk’, ‘stla’, ‘stlo’, ‘evnm’, ‘evla’, ‘evlo’, ‘evdp’, ‘stcount’, ‘evcount’. evcount and stcount give, for each record, the number of times that event appears in other records, and the number of times that station appears in other records, respectively.

Parameters

sac_files (list of str) – list of paths to sac files.

Returns

list of dict holding the metadata of each record

Return type

list of dict
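The stcount and evcount entries returned by read_sac_meta can be pictured as simple occurrence counts. The sketch below uses made-up records and counts every occurrence, including the record itself; whether dsmpy excludes the record itself is not stated, so treat that as an assumption.

```python
from collections import Counter

# Hypothetical (station name, event name) pairs, one per record.
records = [
    ("FCC", "201205280204A"),
    ("FCC", "200707211534A"),
    ("ULM", "201205280204A"),
]

st_counts = Counter(st for st, _ in records)
ev_counts = Counter(ev for _, ev in records)

# One dict per record, using a subset of the documented keys.
meta = [
    {"stnm": st, "evnm": ev, "stcount": st_counts[st], "evcount": ev_counts[ev]}
    for st, ev in records
]

for m in meta:
    print(m)
```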

dsmpy.dataset.read_traces(sac_files: list) -> list

Return a list of obspy traces read from the sac files without including waveform data.

Parameters

sac_files (list of str) – list of paths to SAC files

Returns

list of obspy traces without data

Return type

list of Trace