dataset
- class dsmpy.dataset.Dataset(lats, lons, phis, thetas, eqlats, eqlons, r0s, mts, nrs, stations, events, data=None, sampling_hz=20, is_cut=False)
Represents a dataset of events and stations.
The data array is not None only if the dataset was created using Dataset.dataset_from_sac(headonly=False). In this case, the data array has shape (1, 3, n_records, npts), where n_records is the number of seismic records (event-station pairs) and npts is the number of time points in the longest record. Dimension 1 corresponds to the 3 seismic components (Z, R, T). Dimension 0 has length greater than 1 only after dataset.apply_windows(), in which case it indexes the time windows (i.e., the different phases).
- Parameters
lats (ndarray) – station latitudes for each record (nr,).
lons (ndarray) – station longitudes for each record (nr,).
phis (ndarray) – station phis for each record (nr,).
thetas (ndarray) – station thetas for each record (nr,).
eqlats (ndarray) – centroid latitudes (nev,).
eqlons (ndarray) – centroid longitudes (nev,).
r0s (ndarray) – centroid radii (nev,).
mts (ndarray of MomentTensor) – array of moment tensors (nev,).
nrs (ndarray of int) – number of stations for each event (nev,).
nr (int) – total number of event-station pairs.
stations (ndarray of Station) – seismic stations (nr,).
events (ndarray of Event) – seismic events (nev,).
data (ndarray) – 3-components waveform data.
nw – number of windows used to cut the data, which then has shape (nw,3,nr,npts). If the data have not been cut with apply_windows(), then nw=1.
sampling_hz (int) – sampling frequency for data. Used for computation with dsmpy.
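The shape bookkeeping described above can be sketched as follows. This is a minimal illustration rather than dsmpy code: `expected_data_shape` is a hypothetical helper, and it assumes that nr is the sum of nrs (nr being the total number of event-station pairs) and that npts becomes the longest window length once the data are cut.

```python
def expected_data_shape(nrs, npts, nw=1):
    """Return the (nw, 3, nr, npts) shape described for Dataset.data.

    Hypothetical helper for illustration only; not part of dsmpy.
    """
    nr = sum(nrs)          # total number of event-station pairs
    n_components = 3       # Z, R, T
    return (nw, n_components, nr, npts)


# Three events recorded by 3, 2 and 4 stations; 1 hour at 20 Hz per record.
shape_uncut = expected_data_shape([3, 2, 4], npts=72000)

# Same records after cutting two windows of at most 2000 points each
# (assumption: npts then equals the longest window length).
shape_cut = expected_data_shape([3, 2, 4], npts=2000, nw=2)

print(shape_uncut, shape_cut)
```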
- append(dataset)
Append dataset to self.
- apply_windows(windows, n_phase, npts_max, buffer=0.0, t_before_noise=100.0, inplace=True, shift=True)
Cut the data using provided windows.
- Parameters
windows (list of Window) – time windows.
n_phase (int) – number of distinct seismic phase-component pairs: if ScS (SH) and ScS (SV), then n_phase=2.
npts_max (int) – number of time points in the longest window.
buffer (float) – default is 0.
t_before_noise (float) – default is 100.
inplace (bool) – if True, performs the operation in-place (i.e., modifies self.data)
shift (bool) – use the time shift coded into time windows (default is True).
- Returns
if inplace is True, else None.
- Return type
Dataset
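As a rough sketch of what cutting a record with a time window involves, the snippet below converts window times to sample indices using the sampling frequency and pads the result to npts_max. The helper `cut_window` is hypothetical and zero-padding is an assumption; dsmpy's own implementation also handles the buffer, the noise window, and time shifts, none of which are reproduced here.

```python
def cut_window(samples, t_start, t_end, sampling_hz, npts_max):
    """Cut [t_start, t_end) seconds out of a record and pad to npts_max.

    Hypothetical illustration; not dsmpy's implementation.
    """
    i_start = int(round(t_start * sampling_hz))
    i_end = int(round(t_end * sampling_hz))
    segment = list(samples[i_start:i_end])[:npts_max]
    # Assumption: shorter windows are zero-padded so all windows share one length.
    return segment + [0.0] * (npts_max - len(segment))


sampling_hz = 20
record = [float(i) for i in range(200)]  # 10 s of synthetic samples
cut = cut_window(record, t_start=2.0, t_end=4.0,
                 sampling_hz=sampling_hz, npts_max=50)
```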
- classmethod dataset_from_arrays(events, stations, sampling_hz=20)
Create a Dataset object from a list of events and stations. This dataset does not contain waveform data (self.data is None), and is used only to compute synthetics.
- Parameters
events (iterable of Event) – earthquake events
stations (iterable of Station) – seismic stations
sampling_hz (float) – waveform sampling that will be inherited by the synthetics (default is 20)
- Returns
Dataset
- classmethod dataset_from_files(parameter_files, file_mode=1)
Create a Dataset object from a list of DSM input files. This dataset does not contain waveform data (self.data is None), and is used only to compute synthetics.
- Parameters
parameter_files (list of str) – paths of DSM input files.
file_mode (int) – The kind of DSM input file. 1: P-SV, 2: SH.
- Returns
Dataset
- classmethod dataset_from_sac(sac_files, verbose=0, headonly=True, broadcast_data=False)
Creates a dataset from a list of sac files. With headonly=False, time series data from the sac_files will be stored in self.data.
For parallel applications using MPI, headonly=False (i.e., reading the data from sac files) only applies to rank 0, so as not to saturate the memory.
- Parameters
sac_files (list of str) – list of paths to sac files.
verbose (int) – 0: quiet, 1: debug.
headonly (bool) – if True, read only the metadata. If False, includes data.
broadcast_data (bool) – default is False
- Returns
dataset
- Return type
Dataset
Examples
>>> sac_files = ['FCC.CN.201205280204A.T']
>>> dataset = Dataset.dataset_from_sac(
...     sac_files, headonly=False)
- classmethod dataset_from_sac_process(sac_files, windows, freq, freq2, filter_type='bandpass', shift=True, verbose=0)
Creates a dataset from a list of sac files. Data are read from sac files, cut using the time windows, and stored in self.data. The sac file data are read and cut event by event, which makes it possible to read large datasets.
This method should be used instead of dataset_from_sac() when a large amount of data is to be read. It has the same effect as using dataset_from_sac() followed by apply_windows(), but is much more memory efficient. For instance, 10,000 3-component records with 20 Hz sampling and 1 hour of recording take approx. 138 Gb in memory. The same dataset cut in 100 s windows around a single phase (e.g., ScS) takes approx. 1.9 Gb in memory.
- Parameters
sac_files (list of str) – list of paths to sac files.
windows (list of Window) – time windows
freq (float) – minimum filter frequency
freq2 (float) – maximum filter frequency
filter_type (str) – ‘bandpass’ or ‘lowpass’ (default is ‘bandpass’)
shift (bool) – use the time shift coded into time windows (default is True)
verbose (int) – 0: quiet, 1: debug.
- Returns
dataset
- Return type
Dataset
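The arithmetic behind memory estimates of this kind can be sketched as below. The helpers `n_samples` and `size_gigabytes` are hypothetical, the array layout (nw, 3, n_records, npts) is taken from the Dataset description, and the element sizes are assumptions: the resulting figure depends on whether values are stored as float32 or float64 and on whether sizes are quoted in bits or bytes.

```python
def n_samples(n_records, duration_s, sampling_hz, n_components=3, n_windows=1):
    """Number of values in an array of shape (nw, 3, n_records, npts)."""
    npts = int(duration_s * sampling_hz)
    return n_windows * n_components * n_records * npts


def size_gigabytes(n_values, bytes_per_value):
    """Size in gigabytes (1e9 bytes) for a given element size."""
    return n_values * bytes_per_value / 1e9


full = n_samples(10_000, duration_s=3600, sampling_hz=20)  # 1 hour records
cut = n_samples(10_000, duration_s=100, sampling_hz=20)    # 100 s windows

for label, n in (("1 hour records", full), ("100 s windows", cut)):
    # Assumed element sizes: 4 bytes (float32) and 8 bytes (float64).
    for dtype, nbytes in (("float32", 4), ("float64", 8)):
        print(f"{label}, {dtype}: {size_gigabytes(n, nbytes):.2f} GB")
```

Under these assumptions the uncut array holds 2.16 billion values and the windowed array 60 million, that is, 36 times fewer for a single window.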
- filter(freq, freq2=0.0, type='bandpass', zerophase=False, inplace=True)
Filter waveforms using obspy.signal.filter.
- Parameters
freq (float) – filter frequency.
freq2 (float) – filter maximum frequency. For bandpass filters only.
type (str) – type of filter. ‘lowpass’ or ‘bandpass’.
zerophase (bool) – use zero phase filter.
inplace (bool) – if True, performs the operation in-place (i.e., modifies self.data).
- Returns
if inplace is True, else None
- Return type
Dataset
- get_bounds_from_event_index(ievent: int) -> (int, int)
Return the start, end indices to slice self.stations[start:end].
- Parameters
ievent (int) – index of the event as in self.events
- Returns
start: index of the first station recording event ievent; end: index of the last station
- Return type
(int, int)
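One way such bounds can be derived from the per-event station counts (nrs) is a cumulative sum. The function below is a hypothetical re-implementation for illustration, assuming that records are stored contiguously, event by event, and that the returned end index is exclusive, as in Python slicing.

```python
from itertools import accumulate


def bounds_from_event_index(nrs, ievent):
    """Return (start, end) such that stations[start:end] belong to event ievent.

    Hypothetical sketch; assumes records are grouped event by event.
    """
    ends = list(accumulate(nrs))        # running total of records per event
    start = ends[ievent] - nrs[ievent]  # first record of this event
    return start, ends[ievent]          # end is exclusive (slice convention)


nrs = [3, 2, 4]  # number of stations for each of three events
stations = ["A", "B", "C", "A", "D", "B", "C", "D", "E"]  # made-up names
start, end = bounds_from_event_index(nrs, 1)
event_stations = stations[start:end]
```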
- plot_event(ievent, windows=None, align_zero=False, component=Component.T, ax=None, dist_min=0, dist_max=360, shift=True, **kwargs)
Plot a record section for event ievent.
- Parameters
ievent (int) – index of the event as in self.events
windows (list of Window) – time windows used to cut the waveforms if specified (default is None)
align_zero (bool) – if True, set the start of time windows as t=0 (default is False)
component (Component) – seismic component (default is Component.T)
ax (Axes) – matplotlib Axes object
dist_min (float) – minimum epicentral distance (default is 0)
dist_max (float) – maximum epicentral distance (default is 360)
shift (bool) – use shift coded into time windows (default is True)
**kwargs – key-value arguments for the pyplot.plot function
- Returns
matplotlib Figure object, and matplotlib Axes object
- Return type
(Figure, Axes)
- set_source_time_functions(type, catalog_path=None)
Set the catalog for source time functions. By default, source time functions specified in the GCMT catalog are used.
- Parameters
type (str) – ‘scardec’ or ‘user’
catalog_path – path to a custom catalog. Must be specified if type=’user’
- split(n: int)
Split self into n datasets.
- Parameters
n – number of datasets into which to split
- Returns
n datasets
- Return type
list of Dataset
- dsmpy.dataset.filter_abnormal_data(sac_files, f, threshold=5)
Filter sac data using the boolean function f.
- Parameters
sac_files (list of str) – paths to sac files
f (function) – (event_id: str, station: Station) -> bool
threshold (float) – number of standard deviations of the distribution of the log of max of data within which to keep the data (default is 5).
- Returns
filtered list of paths to sac files
- Return type
list of str
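The threshold criterion can be illustrated with a small, self-contained sketch. It is not dsmpy's implementation: `keep_normal_amplitudes` is a hypothetical helper, and the use of the natural logarithm, of the population standard deviation, and of strictly positive peak amplitudes are all assumptions.

```python
import math
import statistics


def keep_normal_amplitudes(max_amplitudes, threshold=5):
    """Flag records whose log peak amplitude lies within threshold std of the mean.

    Hypothetical sketch; amplitudes are assumed to be strictly positive.
    """
    logs = [math.log(a) for a in max_amplitudes]
    mean = statistics.mean(logs)
    std = statistics.pstdev(logs)  # population standard deviation (assumption)
    if std == 0:
        return [True] * len(logs)  # identical amplitudes: nothing to reject
    return [abs(x - mean) <= threshold * std for x in logs]


peaks = [1.0, 1.2, 0.9, 1.1, 1.0, 1e6]  # made-up peak amplitudes, one outlier
flags_default = keep_normal_amplitudes(peaks)
flags_strict = keep_normal_amplitudes(peaks, threshold=2)
```

With only six records no value can lie more than about 2.24 population standard deviations from the mean, so the default threshold of 5 keeps everything in this toy case; the stricter threshold rejects the outlier.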
- dsmpy.dataset.filter_sac_files(sac_files, f) → list
Filter sac files using the boolean function f.
- Parameters
sac_files (list of str) – paths to sac files
f (function) – (event_id: str, station: Station) -> bool
- Returns
filtered list of paths to sac files
- Return type
list of str
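A predicate with the documented signature (event_id: str, station: Station) -> bool might look like the sketch below. To keep the example self-contained, `Station` here is a stand-in namedtuple whose field names are assumptions, `filter_records` is a hypothetical helper that works on in-memory records instead of reading SAC headers, and the second file name and all coordinates are made up.

```python
from collections import namedtuple

# Stand-in for dsmpy's Station class; the field names are assumptions.
Station = namedtuple("Station", ["name", "network", "latitude", "longitude"])


def in_network_cn(event_id, station):
    """Predicate with signature (event_id: str, station: Station) -> bool."""
    return station.network == "CN"


def filter_records(records, f):
    """Keep the paths of (path, event_id, station) records for which f is True."""
    return [path for path, event_id, station in records if f(event_id, station)]


records = [
    # Coordinates below are placeholders, not real station locations.
    ("FCC.CN.201205280204A.T", "201205280204A", Station("FCC", "CN", 10.0, 20.0)),
    ("STA1.XX.201205280204A.T", "201205280204A", Station("STA1", "XX", 30.0, 40.0)),
]
kept = filter_records(records, in_network_cn)
```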
- dsmpy.dataset.get_event_id(trace)
Return event GCMT ID from obspy Trace.
- Parameters
trace (Trace) – obspy Trace object
- Returns
GCMT ID
- Return type
str
- dsmpy.dataset.get_station(trace)
Return Station object from obspy Trace.
- Parameters
trace (Trace) – obspy Trace object
- Returns
station
- Return type
Station
- dsmpy.dataset.read_sac_from_windows(sac_files: list, windows: list, headonly=False) → list
- Parameters
sac_files (list of str) – paths to potential SAC files. Only the files contained in windows will be read
windows (list of Window) – time windows indicating which SAC files should be read
- Returns
list of obspy traces: traces from SAC files contained in windows
list of str: SAC files contained in windows
windows: the windows which had SAC files
- Return type
list of obspy traces, list of str, list of Window
- dsmpy.dataset.read_sac_meta(sac_files: list) → list
Returns a list of dict with SAC and other metadata.
The available keys are: ‘stnm’, ‘netwk’, ‘stla’, ‘stlo’, ‘evnm’, ‘evla’, ‘evlo’, ‘evdp’, ‘stcount’, ‘evcount’. evcount and stcount give, for each record, the number of times that event appears in other records, and the number of times that station appears in other records, respectively.
- Parameters
sac_files (list of str) – list of paths to sac files.
- Returns
list of dict: SAC metadata, one dict per record
list of traces: list of obspy traces
- Return type
list of dict, list of Trace
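The evcount and stcount entries can be illustrated with a short sketch based on collections.Counter. The helper `add_counts` is hypothetical, and it follows the wording above literally by counting only the other records that share the same event or station; whether dsmpy counts the record itself is not stated, so treat that choice as an assumption.

```python
from collections import Counter


def add_counts(meta):
    """Add 'evcount' and 'stcount' to each metadata dict.

    Hypothetical sketch; counts exclude the record itself (assumption).
    """
    ev_counts = Counter(m["evnm"] for m in meta)
    st_counts = Counter(m["stnm"] for m in meta)
    for m in meta:
        m["evcount"] = ev_counts[m["evnm"]] - 1  # other records with this event
        m["stcount"] = st_counts[m["stnm"]] - 1  # other records with this station
    return meta


# Made-up records using the documented 'evnm' and 'stnm' keys.
meta = add_counts([
    {"evnm": "e1", "stnm": "s1"},
    {"evnm": "e1", "stnm": "s2"},
    {"evnm": "e2", "stnm": "s1"},
])
evcounts = [m["evcount"] for m in meta]
stcounts = [m["stcount"] for m in meta]
```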
- dsmpy.dataset.read_traces(sac_files: list) → list
Return a list of obspy traces read from the sac files without including waveform data.
- Parameters
sac_files (list of str) – list of paths to SAC files
- Returns
list of obspy traces without data
- Return type
list of Trace