cdwave package¶
Submodules¶
cdwave.data module¶
- class cdwave.data.DataLoader(filepath: Optional[str] = None, log=None)¶
Bases: object
- parse()¶
This function should parse waveform data from files into self.waveforms
- transfer(validate=True, sort=False) → cdwave.data.Dataset¶
Parse the waveform from tables and generate a dataset
- Returns
The dataset containing all the waveforms and parameters
- Return type
cdwave.data.Dataset
- class cdwave.data.Dataset(waveforms: Optional[List[cdwave.data.WaveformFull]] = None)¶
Bases: object
A dataset contains a list of WaveformFull objects and a meta table with the information and parameters of the waveforms.
- waveforms¶
A list of WaveformFull objects.
- dataframe¶
The meta table of the waveforms.
- filtered_df¶
A view of the meta table.
- filterable_columns¶
Columns that can be used to filter the waveforms
- size¶
Number of waveforms in the dataset
- Parameters
waveforms – A list of WaveformFull objects
- static concat(datasets: list) → cdwave.data.Dataset¶
- copy()¶
- property df¶
- property dtypes¶
- export_raw(filename=None, compression='infer')¶
Export the raw data into a CSV file with the columns compound, concentration, well, plate, time, signal
- filter_by_filters(filters, replace=True)¶
- Parameters
filters – a dictionary of {column: value}
- Returns
filtered dataframe
- get_df() → pandas.core.frame.DataFrame¶
- get_parameter_df() → pandas.core.frame.DataFrame¶
Return a dataframe with all parameters for each wave
- static loaddata(filepath) → cdwave.data.Dataset¶
- merge(dataset)¶
- save(filename, compress=True)¶
- class cdwave.data.SeqLoader(filepath: str, plate: Optional[str] = None, state: Optional[str] = None, opts=None, log=None)¶
Bases: cdwave.data.DataLoader
- parse()¶
This function should parse waveform data from files into self.waveforms
- transfer_with_meta(df: pandas.core.frame.DataFrame, columns=None) → cdwave.data.Dataset¶
- class cdwave.data.StandardCSVLoader(filepath: Optional[str] = None, data: Optional[pandas.core.frame.DataFrame] = None, log=None)¶
Bases: cdwave.data.DataLoader
A loader that parses a “standard CSV file”.
The table file has the following format:
compound,concentration,well,plate,time,signal
CP1,0.1,A1,P1,0,1000
CP1,0.1,A1,P1,0.33,1001
CP2,0.1,A2,P1,0,1000
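A minimal sketch of how rows in this format could be grouped into one item per well, shaped like the dictionary that WaveformFull takes. The helper rows_to_items is hypothetical (not part of cdwave) and uses only the standard library:

```python
import csv
import io
from collections import defaultdict


# Hypothetical helper (not cdwave's API): group "standard CSV" rows by
# (compound, concentration, well, plate) into WaveformFull-style item dicts.
def rows_to_items(text: str) -> list:
    grouped = defaultdict(lambda: {"x": [], "y": []})
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["compound"], float(row["concentration"]), row["well"], row["plate"])
        grouped[key]["x"].append(float(row["time"]))    # time points
        grouped[key]["y"].append(float(row["signal"]))  # signal values
    items = []
    for (compound, concentration, well, plate), signal in grouped.items():
        items.append({
            "signal": signal,
            "compound": compound,
            "concentration": concentration,
            "well": well,
            "plate": plate,
        })
    return items


sample = """compound,concentration,well,plate,time,signal
CP1,0.1,A1,P1,0,1000
CP1,0.1,A1,P1,0.33,1001
CP2,0.1,A2,P1,0,1000
"""
items = rows_to_items(sample)
print(len(items))  # 2 waveforms: CP1/A1 and CP2/A2
```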
- parse() → List¶
This function should parse waveform data from files into self.waveforms
- class cdwave.data.WaveformFull(item, scale=True, window_length=5)¶
Bases: object
WaveformFull contains all the information about a sample.
The information includes the compound name, concentration or vendor, and all calculated parameters of a waveform.
- profile¶
A dictionary of the profile of the waveform, including:
plate: The plate name from which the waveform is generated.
compound: The compound name of the waveform.
concentration: The concentration of the compound of the waveform.
well: The well of the waveform.
cpid: Compound ID.
- signal¶
A dictionary with keys x and y, which are the time and the calcium transient of the waveform.
- Type
dict
- state¶
The state of the waveform, such as before treatment and after treatment.
- parameters¶
A dictionary containing all the parameters of the waveform.
- Type
dict
- Parameters
item (dict) – A dictionary containing all the information of the waveform. Required keys: signal, plate, well, concentration. item[‘signal’] is a dictionary with times (x) and signals (y), e.g. {‘x’: [0.1, 0.2], ‘y’: [100, 101]}
scale (bool) – Whether to scale the minimum to 0. Default is True
window_length (int) – The length of the filter window (i.e. the number of coefficients). window_length must be a positive odd integer. If set to 0, the waveform will not be smoothed.
Example
>>> item = {'signal': {'x': [0.1, 0.2], 'y': [100, 101]},
...         'plate': 'P1',
...         'compound': 'cmp1',
...         'concentration': 0.01,
...         'well': 'A1'}
>>> wave = WaveformFull(item)
- get_dict()¶
- get_parameters(fillna=0)¶
Return a dictionary with all parameters
- Parameters
fillna – If fillna is ‘raise’, an exception will be raised when a parameter has not been calculated. For other values, fillna is used to fill the empty parameter.
- Returns
A dictionary with all parameters listed in parameter_names
- Return type
dict
- get_signal_series()¶
- static standardise_signal(signal: dict, scale=True, window_length=5)¶
- cdwave.data.get_wells()¶
cdwave.derive module¶
- cdwave.derive.calc_parameter(waveform: cdwave.data.WaveformFull) → dict¶
Calculate parameters of a waveform
- cdwave.derive.calc_parameter_with_threshold(waveform: cdwave.data.WaveformFull, threshold, method='prominence') → dict¶
Calculate parameters of a waveform
- cdwave.derive.calc_parameters_for_waveforms(dataset: cdwave.data.Dataset, process_fnc: Optional[collections.abc.Callable] = None, batch: int = 200, processes: Optional[int] = None, custom_calculator: Optional[collections.abc.Callable] = None)¶
Calculate parameters for the waveforms in a dataset
- Parameters
dataset – The waveform dataset.
process_fnc – A processing function used to send out the progress, see default_process_fnc, which uses tqdm
batch – Batch size for multi-processing.
processes – Number of processors.
custom_calculator – A custom calculator which can set up custom thresholds. If None, calc_parameter will be used by default.
- cdwave.derive.calc_parameters_with_threshold(dataset: cdwave.data.Dataset, threshold: int = 100, processes: Optional[int] = None, custom_calculator: Optional[collections.abc.Callable] = None)¶
Calculate parameters for the waveforms in a dataset using a threshold
- Parameters
dataset – The waveform dataset
threshold – The threshold
processes – Number of processors
custom_calculator – A custom calculator which can set up custom thresholds. If None, calc_parameter_with_threshold() will be used by default.
- cdwave.derive.default_process_fnc(status=-1, total=0)¶
- cdwave.derive.derive_batch_bp_parameters(batch_bp: cdwave.fnc.BloodPressure)¶
- cdwave.derive.derive_bp_parameters(bp: cdwave.fnc.BloodPressure, start=0, end=None, processes=4)¶
cdwave.fnc module¶
- class cdwave.fnc.BloodPressure(data: pandas.core.series.Series, batch_window: int = 1800000, window_size: int = 10000, sample_interval: int = 2, point_interval: int = 1000)¶
Bases: object
Class for blood pressure data analysis
- Parameters
data – A pandas series with time as index and blood pressure as value.
batch_window – Size of a batch to analyze (unit ms).
window_size – Size of the window to derive parameters.
sample_interval – The interval of sample. If None, it will be inferred by the first two data points.
point_interval – The interval of points at which to get parameters.
- max_time¶
The maximum time in the data.
- calc_angle(start_time, tao)¶
- calc_attractor(df, tao: int)¶
- calc_hrv(window: pandas.core.frame.DataFrame)¶
Calculate heart rate variability
SD1 is the standard deviation of the perpendicular distances of the points \((RR_{n}, RR_{n+1})\) to the line \(y=x\). SD2 is the standard deviation of the distances of the points to the line \(y=-x + 2R_{m}\), where \(R_{m}\) is the mean of the RR intervals.
See Computing in Cardiology 2014; 41:437-440.
- Parameters
window – A data frame of blood pressure window
- Returns
A tuple with \(R_{m}\), SD1, SD2, and SD1/SD2
- Return type
tuple
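The Poincaré descriptors above can be sketched from a plain sequence of RR intervals. poincare_hrv is a hypothetical helper, not cdwave's calc_hrv (which takes a window data frame), and the use of the population standard deviation (ddof=0) is an assumption:

```python
import numpy as np


def poincare_hrv(rr):
    """Hypothetical helper: return (R_m, SD1, SD2, SD1/SD2) for RR intervals."""
    rr = np.asarray(rr, dtype=float)
    r_m = rr.mean()                          # mean RR interval
    x, y = rr[:-1], rr[1:]                   # Poincaré points (RR_n, RR_{n+1})
    d1 = (y - x) / np.sqrt(2)                # signed distance to the line y = x
    d2 = (x + y - 2 * r_m) / np.sqrt(2)      # signed distance to y = -x + 2*R_m
    sd1, sd2 = d1.std(), d2.std()            # population std (ddof=0), an assumption
    return r_m, sd1, sd2, sd1 / sd2


r_m, sd1, sd2, ratio = poincare_hrv([800, 810, 790, 805, 795])
```

Working with signed distances keeps the spread on both sides of each line; taking absolute values first would understate SD1 and SD2.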
- get_batch_series(series: Optional[pandas.core.series.Series] = None) → Iterator[cdwave.fnc.BloodPressure]¶
- get_start_times(batch_series: Optional[pandas.core.series.Series] = None) → pandas.core.indexes.base.Index¶
- get_windows_generator() → Tuple[int, Generator[pandas.core.frame.DataFrame, None, None]]¶
Return a generator of window data frame
The data frame has the following columns: SP, DP, PP, RR, time, time_diff
- run_filter()¶
- class cdwave.fnc.Waveform(series: pandas.core.series.Series, index_penalty: float = 1e-06)¶
Bases: object
Class for waveform analysis
- Parameters
series (pd.Series) – A series of which the name is the well name, values are amplitudes.
index_penalty (float) – Add penalty to signals to prioritise former time point during peak detection.
- df¶
The main dataframe of the waveform. Important columns include peak (1: main peak; 2 and above: double peaks), status (0: normal; 1: rising point; 2: peak; 3: start of the decline) and category (0-9: intensities are categorised into 10 levels in terms of the span).
- Type
pd.DataFrame
- num_peak¶
Number of peaks
- Type
int
- n¶
Number of points
- Type
int
- maximum¶
Maximum of intensity
- Type
int
- minimum¶
Minimum of intensity
- Type
int
- analyse()¶
Analyse the status of each point
The status indicates the trend of the point, such as rising or declining. For waveforms with a frequency higher than 10, it identifies double peaks according to their prominences and tail duration. This function also calculates the valley positions.
- analyse_normal_waves(diff_n=1, l=3)¶
Get status of points for a normal wave
- Parameters
series – A series of which the name is the well name, values are amplitudes
diff_n – Number of consecutive points that must be higher than this point for it to be regarded as a starting point
l – Number of points to explore after this point to find one higher than upper_half, which indicates that this point can be a starting point
- blood_pressure_profile() → pandas.core.frame.DataFrame¶
Get profile of blood pressure
The function returns a data frame with rows of cycles of blood pressure change and columns of systolic, diastolic, pulse pressure and RR distance.
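A minimal sketch of such a per-cycle profile, assuming the systolic pressure is the value at each peak found by scipy.signal.find_peaks, the diastolic pressure is the minimum before the next peak, and RR is the time between consecutive systolic peaks. bp_profile is a hypothetical helper that returns a NumPy array rather than cdwave's data frame:

```python
import numpy as np
from scipy.signal import find_peaks


def bp_profile(time_ms, pressure):
    """Hypothetical helper: one row per cycle (systolic, diastolic, pulse pressure, RR in ms)."""
    time_ms = np.asarray(time_ms, dtype=float)
    pressure = np.asarray(pressure, dtype=float)
    peaks, _ = find_peaks(pressure)             # systolic peaks (assumption)
    rows = []
    for start, end in zip(peaks[:-1], peaks[1:]):
        sp = pressure[start]
        dp = pressure[start:end].min()          # diastolic: minimum before the next peak
        rr = time_ms[end] - time_ms[start]      # RR distance between systolic peaks
        rows.append((sp, dp, sp - dp, rr))
    return np.array(rows)


t = np.arange(0, 5000, 2)                       # 2 ms sampling for 5 s
p = 100 + 20 * np.sin(2 * np.pi * t / 1000)     # synthetic 60 beats/min trace
profile = bp_profile(t, p)
```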
- calc_amplitudes()¶
- calc_fft_freq_ratio()¶
Calculate the energy ratio between double frequency (minor) and major
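\[\text{FFT Ratio} = \frac{\sum_{x=f_{mi}-0.05}^{f_{mi}+0.05}p(x)}{\sum_{x=f_{ma}-0.05}^{f_{ma}+0.05}p(x)}\]
where \(f_{mi}\) and \(f_{ma}\) are the minor (double) and major frequencies and \(p(x)\) is the power at frequency \(x\).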
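The ratio can be sketched directly from a frequency grid and its power spectrum. fft_freq_ratio is a hypothetical helper; it assumes f_major and f_minor have already been identified and sums the power within ±0.05 Hz of each, as in the formula:

```python
import numpy as np


def fft_freq_ratio(frq, psd, f_major, f_minor, half_band=0.05):
    """Hypothetical helper: power within ±half_band Hz of f_minor over that around f_major."""
    frq = np.asarray(frq, dtype=float)
    psd = np.asarray(psd, dtype=float)
    minor = psd[np.abs(frq - f_minor) <= half_band].sum()
    major = psd[np.abs(frq - f_major) <= half_band].sum()
    return minor / major


frq = np.linspace(0, 5, 501)   # 0.01 Hz resolution
psd = np.zeros_like(frq)
psd[100] = 10.0                # major component at 1.0 Hz
psd[200] = 2.0                 # double-frequency component at 2.0 Hz
print(fft_freq_ratio(frq, psd, f_major=1.0, f_minor=2.0))  # 0.2
```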
- calc_frequency_parameters()¶
- calc_peak_width()¶
- calc_shoulder_parameters()¶
Calculate shoulder related parameters
Shoulder parameters include the mean and standard deviation of the shoulder position (in amplitude), and the median and standard deviation of the shoulder_tail ratio. The median is used because the ratio is sometimes unstable. For example, when the tail is missed in one period, the ratio becomes 100, which would dominate the other, normal values.
- calc_status_parameter()¶
- check_status_points()¶
A debugging function that checks whether any period has more than one starting point or downing point, or lacks either of them.
- draw_series(figure, series=None, style='-')¶
Plot points of the waveform
- Parameters
figure – A Matplotlib figure object
series – A pandas Series of the waveform
style – A Matplotlib line style; '-' (the default) produces a line plot
- draw_status(figure)¶
Plot the points into a figure, coloured by their status: black indicates normal points; orange is the starting point, the first point of a rising wave; red is the peak; and green is the first point of the tail.
- Parameters
figure – A Matplotlib figure object
- export(filename)¶
Export the waveform to a csv file
- fix_double_peak_by_prominence()¶
Identify double peaks by their prominence
Definition of a double peak: a peak whose prominence is below the threshold (see below) and whose signal value is close to that of the last real peak. Close means the difference between the signal values is smaller than the variance (10% of the maximum value). For waves whose maximum is higher than 250, the prominence threshold is 0.7 * the maximum of all prominences; for the others, it is 0.5 * the maximum of all prominences.
- Returns
Always True
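The rule above can be sketched as follows, assuming the peak indices and their prominences have already been computed (for example with scipy.signal.find_peaks). flag_double_peaks is a hypothetical helper, not cdwave's implementation, and treating the first peak as always real is an assumption:

```python
import numpy as np


def flag_double_peaks(values, peak_idx, prominences):
    """Hypothetical helper: return one flag per peak, True for a double (sub)peak."""
    values = np.asarray(values, dtype=float)
    maximum = values.max()
    ratio = 0.7 if maximum > 250 else 0.5      # threshold ratio depends on the maximum
    threshold = ratio * np.max(prominences)
    variance = 0.1 * maximum                   # "close" means within 10% of the maximum
    flags = []
    last_real = None                           # signal value of the last real peak
    for idx, prom in zip(peak_idx, prominences):
        is_double = (
            last_real is not None              # assumption: the first peak is real
            and prom < threshold
            and abs(values[idx] - last_real) < variance
        )
        flags.append(bool(is_double))
        if not is_double:
            last_real = values[idx]
    return flags


values = [0, 300, 100, 290, 0, 300, 0]
print(flag_double_peaks(values, peak_idx=[1, 3, 5], prominences=[300, 30, 300]))
# [False, True, False]
```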
- fix_double_peak_by_tail(min_amplitude=100, std_threshold=0.5)¶
Identify double peaks by comparing the average signal values of tails
In some situations, “real” peaks have a long tail but the subpeaks don’t, so we can recognise the subpeaks by comparing their tail with maximum. But for some waveforms, the tails are too short to be used as a symbol to identify double peaks.
- Parameters
min_amplitude – Minimum of amplitude the wave should have as a prerequisite to find double peak.
std_threshold – Ratio of the standard deviation of the tails to the mean of the tails above which a double peak is indicated.
- Returns
If the starting point or downing point is not found, returns False and self.fail_analysis is set to True. Otherwise, always returns True.
- Return type
bool
- get_parameters()¶
Get all parameters from the waveform.
- get_peaks(height=None, prominence=None, min_prominence=20, span_ratio=0.1) → bool¶
Identify the peaks and groups of the whole waveform
- Parameters
height – Number or ndarray or sequence, optional Required height of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required height.
prominence – Number or ndarray or sequence, optional Required prominence of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required prominence. If None, the minimal prominence will be the max(min_prominence, span_ratio*self.span)
min_prominence – The absolute minimal prominence
span_ratio – The ratio of span as the prominence threshold
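The default prominence rule can be sketched with scipy.signal.find_peaks, taking the span as the difference between the maximum and minimum of the signal (an assumption about how self.span is defined). detect_peaks is a hypothetical helper, not cdwave's get_peaks:

```python
import numpy as np
from scipy.signal import find_peaks


def detect_peaks(values, height=None, prominence=None, min_prominence=20, span_ratio=0.1):
    """Hypothetical helper: peak indices and prominences using the documented default rule."""
    values = np.asarray(values, dtype=float)
    if prominence is None:
        span = values.max() - values.min()                   # assumed definition of span
        prominence = max(min_prominence, span_ratio * span)  # max(min_prominence, span_ratio*span)
    peaks, props = find_peaks(values, height=height, prominence=prominence)
    return peaks, props["prominences"]


t = np.linspace(0, 3, 301)                       # 3 s sampled at 100 Hz
wave = 100 + 200 * np.sin(2 * np.pi * 2 * t)     # 2 beats per second
peaks, proms = detect_peaks(wave)
print(len(peaks))  # 6
```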
- get_valleys()¶
Add valley information into the main dataframe
- get_width_distribution(gdf: pandas.core.frame.DataFrame, normalise=True)¶
Calculate the distribution of width between position of the peak point and the points.
- Parameters
gdf – subdataframe of a group
- Returns
A tuple of grid and y. grid is sampled over all widths; y is the probability of each width
- Return type
tuple
- property group¶
Return an iterator of groups (i, gdf), excluding the first and the last
- max_shoulder_tail_ratio = 2.5¶
- peak_uniform_test(interpolation=False)¶
Test if the peak points are in uniform distribution
- Parameters
interpolation – True if interpolation is applied to the waveform. Mostly used when there are only 3 peaks in the waveform, so the points between the three peaks need to be interpolated. Otherwise the KS test may underestimate the probability
- Returns
The probability of the waveform being in a uniform distribution
- Return type
float
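A minimal sketch of a Kolmogorov-Smirnov uniformity test on peak positions, assuming the positions are rescaled to [0, 1] and compared with the standard uniform distribution via scipy.stats.kstest. peak_uniform_pvalue is a hypothetical helper; how cdwave rescales or interpolates the peaks is not shown here:

```python
import numpy as np
from scipy.stats import kstest


def peak_uniform_pvalue(peak_times, start, end):
    """Hypothetical helper: KS-test p-value of peak positions against Uniform(0, 1)."""
    positions = (np.asarray(peak_times, dtype=float) - start) / (end - start)
    return kstest(positions, "uniform").pvalue


print(peak_uniform_pvalue([1, 3, 5, 7, 9], start=0, end=10))            # evenly spread: high
print(peak_uniform_pvalue([0.1, 0.2, 0.3, 0.4, 0.5], start=0, end=10))  # clustered: low
```

With only a handful of peaks the test has little power, which is consistent with the note above about interpolating when there are only three peaks.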
- regroup()¶
After the peaks are changed, the groups need to be recalculated
This function calculates the number of peaks and redefines the group number of each point.
- resample(sample_rate=100) → numpy.ndarray¶
Resample points in the waveform
- Parameters
sample_rate – How many points per second, default 100
- Returns
A NumPy array with sample_rate points per second
- Return type
np.ndarray
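Resampling to a fixed rate can be sketched with linear interpolation on a regular time grid. resample_signal is a hypothetical standalone helper; whether cdwave uses linear interpolation is an assumption:

```python
import numpy as np


def resample_signal(times_s, values, sample_rate=100):
    """Hypothetical helper: linearly interpolate onto a grid of sample_rate points per second."""
    times_s = np.asarray(times_s, dtype=float)
    values = np.asarray(values, dtype=float)
    n = int(np.floor((times_s[-1] - times_s[0]) * sample_rate)) + 1   # grid size incl. start
    grid = times_s[0] + np.arange(n) / sample_rate                    # regular time grid
    return np.interp(grid, times_s, values)


out = resample_signal([0, 1, 2], [0, 100, 200], sample_rate=10)
print(len(out))  # 21
```

Building the grid from integer indices avoids the off-by-one surprises that np.arange can produce with a floating-point step.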
- standardise_by_filter()¶
- cdwave.fnc.signal_filter(signals: numpy.ndarray, window_length=5) → numpy.ndarray¶
Apply a Savitzky-Golay filter to an array.
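A Savitzky-Golay filter is available as scipy.signal.savgol_filter. The sketch below wraps it with the window_length=0 convention described for WaveformFull. The polynomial order of 2 is an assumption, as cdwave's value is not documented here, and smooth is a hypothetical helper:

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth(signals, window_length=5, polyorder=2):
    """Hypothetical helper: Savitzky-Golay smoothing; window_length=0 disables it."""
    signals = np.asarray(signals, dtype=float)
    if window_length == 0:
        return signals
    # cdwave documents window_length as a positive odd integer; it must also exceed polyorder
    return savgol_filter(signals, window_length, polyorder)


spike = np.zeros(11)
spike[5] = 10.0
print(smooth(spike)[5])  # the isolated spike is damped below 10
```

Because the filter fits a local polynomial, trends of degree up to polyorder (such as a linear ramp) pass through unchanged while isolated spikes are damped.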
- cdwave.fnc.wave_transform(signals: numpy.ndarray, sample_rate=100, method='fft')¶
Transform the wave using methods such as Fourier transformation
- Parameters
signals (np.ndarray) – resampled signals from waveform
sample_rate (int) – How many points per second, default 100
method (str) – Transformation method, default fft (fast Fourier transform)
- Returns
- A tuple containing:
frq (np.ndarray): Frequency points (Hz)
psd (np.ndarray): Power Spectral Density from FFT
- Return type
tuple
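A minimal FFT sketch returning frequency points and a power spectral density. Removing the mean and normalising |X|² by the number of points are assumptions; cdwave's exact scaling is not documented here. fft_psd is a hypothetical helper:

```python
import numpy as np


def fft_psd(signals, sample_rate=100):
    """Hypothetical helper: (frq, psd) of a real, resampled signal via the FFT."""
    signals = np.asarray(signals, dtype=float)
    n = signals.size
    spectrum = np.fft.rfft(signals - signals.mean())   # remove the DC offset first
    psd = np.abs(spectrum) ** 2 / n                    # one normalisation choice among several
    frq = np.fft.rfftfreq(n, d=1.0 / sample_rate)      # frequency of each bin in Hz
    return frq, psd


t = np.arange(1000) / 100                              # 10 s at 100 points per second
frq, psd = fft_psd(np.sin(2 * np.pi * 1.5 * t))
print(frq[np.argmax(psd)])  # dominant frequency, about 1.5 Hz
```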
cdwave.hillcurve module¶
- class cdwave.hillcurve.HillCurve(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=-6, boundary='auto', logc=True)¶
Bases: object
- property EC50¶
- calc_perr()¶
- property curve_diff¶
- property hill¶
- predict(x)¶
- class cdwave.hillcurve.TCPL(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=-6, boundary='auto', logit=True)¶
Bases: object
Implementation of the ToxCast Pipeline
- Parameters
concentrations – A numpy array of concentrations
responses – A numpy array of parameters responding to the concentrations
concentration_unit – The unit of the concentration. -6 means uM.
boundary – Boundary of the model for fitting. Use 'auto' for the default boundary, defined in get_bound.
logit – Whether to take the logarithm of the concentrations. If the input concentration is not in logarithm (e.g. uM), use True.
- Attributes:
k (int): Number of estimated parameters
n (int): Number of data points
- property AIC¶
Akaike information criterion
The likelihood is simplified by calculating RSS(MAE) https://www.tandfonline.com/doi/pdf/10.1080/21642583.2018.1496042
Also see Comparison with least squares in https://en.wikipedia.org/wiki/Akaike_information_criterion
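For least-squares fits, the Wikipedia section cited above gives AIC = 2k + n ln(RSS/n), up to an additive constant. A minimal sketch of that formula and of using it to choose between two fitted models follows. Whether cdwave plugs in RSS or MAE is not confirmed here, and the RSS values are illustrative inputs, not results:

```python
import math


def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit, up to an additive constant: 2k + n*ln(RSS/n)."""
    return 2 * k + n * math.log(rss / n)


# Illustrative RSS values for two models fitted to the same 8 points;
# k matches the class attributes of TCPLPlain (k = 1) and TCPLHill (k = 4).
aic_constant = aic_least_squares(rss=900.0, n=8, k=1)
aic_hill = aic_least_squares(rss=40.0, n=8, k=4)
best = "TCPL-Hill" if aic_hill < aic_constant else "TCPL-Constant"
print(best)  # TCPL-Hill
```

The 2k term penalises the extra parameters of the Hill model, so it wins only when its RSS is sufficiently lower.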
- property E50: float¶
- EC50(unit='logM') → float¶
- property RMSD¶
- property RSS¶
- calc_perr()¶
- property curve_diff¶
- property curve_max¶
- property curve_min¶
- fit(fnc)¶
- get_bound(c_max, c_min)¶
- predict(x)¶
- class cdwave.hillcurve.TCPLGainLoss(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=-6, boundary='auto', logit=True)¶
Bases: cdwave.hillcurve.TCPL
- static fnc(x, gw, ga, tp, lw, la, s, b)¶
- get_bound(c_max, c_min)¶
- k = 7¶
- name = 'TCPL-GainLoss'¶
- class cdwave.hillcurve.TCPLHill(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=-6, boundary='auto', logit=True)¶
Bases: cdwave.hillcurve.TCPL
- static fnc(x, a, x0, k, b)¶
- get_bound(c_max, c_min)¶
- property hill¶
- k = 4¶
- name = 'TCPL-Hill'¶
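A minimal sketch of fitting a four-parameter Hill-type curve with scipy.optimize.curve_fit on log10 concentrations. The functional form of hill below is an assumption chosen to match the parameter names (a, x0, k, b) of TCPLHill.fnc; it is not taken from cdwave, and the data are noise-free synthetic values:

```python
import numpy as np
from scipy.optimize import curve_fit


def hill(x, a, x0, k, b):
    """Hypothetical Hill curve in log10 concentration.

    b: bottom, a: top - bottom, x0: log10(EC50), k: Hill slope.
    """
    return b + a / (1 + 10 ** ((x0 - x) * k))


log_c = np.linspace(-9, -4, 8)                             # log10 molar concentrations
responses = hill(log_c, a=100.0, x0=-6.0, k=1.2, b=5.0)    # noise-free synthetic data

# Data-driven starting point: span of responses, mid concentration, unit slope, minimum
p0 = [responses.max() - responses.min(), np.median(log_c), 1.0, responses.min()]
popt, pcov = curve_fit(hill, log_c, responses, p0=p0)
print(popt[1])  # fitted log10(EC50), close to -6.0
```

Real data would normally also need the bounds that get_bound supplies, passed through curve_fit's bounds argument.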
- class cdwave.hillcurve.TCPLPlain(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=-6, boundary='auto', logit=True)¶
Bases: cdwave.hillcurve.TCPL
- static fnc(x, b)¶
- get_bound(c_max, c_min)¶
- k = 1¶
- name = 'TCPL-Constant'¶
- cdwave.hillcurve.fit_parameter(df: pandas.core.frame.DataFrame, parameter)¶
Fit the S curve of concentration-response (deprecated)
- Parameters
df – Dataframe from the dataset
ax – Axes object from the matplotlib
- Returns
popt: parameters of the S curve; perr: RMSE of the fitted curve
- Raises
RuntimeError – When the curve cannot be fitted
- cdwave.hillcurve.fsigmoid(x, a, x0, k, b)¶
- cdwave.hillcurve.gain_loss(x, gw, ga, tp, lw, la, s, b)¶
- cdwave.hillcurve.plain(x, b)¶
cdwave.model module¶
- cdwave.model.four_point_parameter_generator(parameters, suffixes)¶
- cdwave.model.prepare_four_point_model(agg_df: pandas.core.frame.DataFrame, endpoint: pandas.core.series.Series, parameters, suffixes)¶
cdwave.param module¶
- cdwave.param.aggrate_parameters(df: pandas.core.frame.DataFrame, parameters: Optional[list] = None, method='median', compound_column='uniname', plates: Optional[dict] = None)¶
Aggregate the parameters of the same compound under the same concentration by methods such as median or mean
- Parameters
df – Dataframe of the whole parameters
parameters – The parameter list to process
method – The method used to aggregate the parameters
plates – A dictionary with the key of compounds and values of list of plates to use.
- Returns
Dataframe with aggregated parameters.
- Return type
DataFrame
- cdwave.param.calc_4_descriptors(df: pandas.core.frame.DataFrame, parameters: List[str], compounds: List[str]) → pandas.core.frame.DataFrame¶
Calculate four descriptors for each parameter, including minimum concentration, maximum concentration, median concentration and slope of the concentration-response
- cdwave.param.calc_grit(df: pandas.core.frame.DataFrame, parameter: str)¶
- cdwave.param.calc_rcv(x: numpy.ndarray) → float¶
Robust coefficient of variation
Using the second approach in this paper https://arxiv.org/pdf/1907.01110.pdf, where \(m\) is the median of \(x\):
\[ \begin{aligned}\mathrm{MAD} &= \operatorname{med}_i |x_i - m|\\ \mathrm{RCV}_M &= 1.4826 \cdot \frac{\mathrm{MAD}}{m}\end{aligned} \]
- Parameters
x – A 1-d array of parameters
- Returns
robust coefficient of variation
- Return type
float
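The two formulas translate directly into NumPy; robust_cv is a hypothetical name for this sketch, not cdwave's calc_rcv itself:

```python
import numpy as np


def robust_cv(x):
    """Hypothetical helper: RCV_M = 1.4826 * MAD / median."""
    x = np.asarray(x, dtype=float)
    m = np.median(x)                     # m: median of x
    mad = np.median(np.abs(x - m))       # median absolute deviation
    return 1.4826 * mad / m


print(robust_cv([1, 2, 3, 4, 5]))    # 1.4826 / 3
print(robust_cv([1, 2, 3, 4, 500]))  # unchanged by the outlier
```

Both calls give the same value because the median and MAD ignore the single extreme point, which is the reason for preferring a robust CV over the ordinary standard deviation over mean.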
- cdwave.param.linear_regression_with_logc(df: pandas.core.frame.DataFrame, parameter: str, remove_beatstop=True, error='raise')¶
Use linear regression to derive slope and intercept for a parameter
- Parameters
df – A dataframe that contains the samples needed.
parameter – The name of the parameter that is included in the dataframe
remove_beatstop – Whether to remove the samples of which ‘beat_stop’ is True
error – The value returned as k and b if all the samples are beat_stop or there are no samples. If 'raise' is used, the ValueError from LinearRegression is raised.
- Returns
slope (k) and intercept (b)
- Return type
Tuple
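The regression can be sketched with a degree-1 least-squares fit against log10 concentration. The source uses LinearRegression; this sketch swaps in numpy.polyfit, which gives the same slope and intercept for a single predictor. slope_intercept_logc is a hypothetical helper:

```python
import numpy as np


def slope_intercept_logc(concentrations, values, beat_stop=None):
    """Hypothetical helper: fit values = k * log10(concentration) + b."""
    c = np.asarray(concentrations, dtype=float)
    y = np.asarray(values, dtype=float)
    if beat_stop is not None:
        keep = ~np.asarray(beat_stop, dtype=bool)   # mirrors remove_beatstop=True
        c, y = c[keep], y[keep]
    if c.size < 2:
        raise ValueError("at least two samples are needed to fit a line")
    k, b = np.polyfit(np.log10(c), y, 1)            # degree-1 least squares
    return k, b


k, b = slope_intercept_logc([0.01, 0.1, 1, 10], [1, 3, 5, 100],
                            beat_stop=[False, False, False, True])
print(round(k, 6), round(b, 6))  # 2.0 5.0
```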
- cdwave.param.normalise_by_baseline(df: pandas.core.frame.DataFrame, subtract_params: list, divide_params: list, divide_only_params: Optional[list] = None, std_params: Optional[dict] = None) → pandas.core.frame.DataFrame¶
Normalise the parameters by baseline of the well
- Parameters
subtract_params – Parameter list to be subtracted only.
divide_params – Parameter list to be subtracted and divided.
divide_only_params – Parameter list to be divided only.
std_params – A dictionary mapping standard deviation parameters to their average parameters, such as {‘std_amplitude’: ‘avg_amplitude’}. Parameters in this dictionary are processed with the following equation: \(std(A) = \frac{std(A)}{\overline{A}}\)
- Returns
Normalised parameters
- Return type
DataFrame
- cdwave.param.normalise_by_negctrl(df: pandas.core.frame.DataFrame, standardiser: str = 'sdm', parameters: Optional[list] = None, standardisers: Optional[dict] = None, control_compound: str = 'DMSO') → pandas.core.frame.DataFrame¶
Normalise the parameters by the negative control of the plate. Because there is one negative control in a plate, the mean is used to aggregate the parameters.
- Parameters
df – DataFrame of parameters obtained from CardioWave
standardiser –
Method to standardise the data, including
sdm: Subtract and divide by median of negative control
sm: Subtract median of negative control
smdmad: Subtract median and divide by median absolute deviation
parameters – A list of parameters which will be normalised
standardisers – A dictionary of which the keys are standardisation methods and the values are the parameters to which each method is applied. This overrides standardiser and parameters.
control_compound – The name of control samples in the compound column
- Returns
Normalised parameters
- Return type
DataFrame
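The three standardisers can be sketched for one parameter on one plate, following the medians named in the list above. normalise is a hypothetical helper operating on arrays; cdwave applies the methods per plate on a DataFrame:

```python
import numpy as np


def normalise(values, control_values, method="sdm"):
    """Hypothetical helper: standardise values against negative-control values."""
    values = np.asarray(values, dtype=float)
    control = np.asarray(control_values, dtype=float)
    med = np.median(control)
    if method == "sdm":        # subtract and divide by the control median
        return (values - med) / med
    if method == "sm":         # subtract the control median only
        return values - med
    if method == "smdmad":     # subtract the median, divide by the median absolute deviation
        mad = np.median(np.abs(control - med))
        return (values - med) / mad
    raise ValueError(f"unknown standardiser: {method}")


print(normalise([10, 15, 20], [8, 10, 12], method="sdm"))  # [0.  0.5 1. ]
```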
- cdwave.param.npoint_descriptor(df: pandas.core.frame.DataFrame, parameter: str, n: int)¶
This function only works for 8 concentrations
- cdwave.param.parameter_correlation(df: pandas.core.frame.DataFrame, parameters: Optional[list] = None)¶
Calculate the correlation between the parameters
- cdwave.param.parameter_projection(df: pandas.core.frame.DataFrame, parameters: Optional[list] = None, method='tsne', n_components=2)¶
- cdwave.param.remove_low_quality(df: pandas.core.frame.DataFrame)¶
Remove waveforms with low quality
Wells of a plate will be removed if:
Double peak in negative control
High standard deviation of peak space in negative control
Low quality in baseline. See remove_well_by_baseline().
The whole plate will be removed if RCV is higher than 0. See calc_rcv().
- Parameters
df – A dataframe of all the samples with parameters
- Returns
- A tuple containing a filtered dataframe and a dictionary of removed wells
- Return type
tuple
- cdwave.param.remove_well_by_baseline(pdf: pandas.core.frame.DataFrame) → list¶
Remove wells by the quality of baseline (pre-measurement)
When the quality of the baseline is low under the following criteria, the well should be removed.
There is at least one multi-peak
standard deviation of peak space is higher than 1
maximum amplitude is lower than 100
Some key time point (such as decay point) cannot be recognised
- Parameters
pdf – DataFrame of one plate
- Returns
A list of wells to remove
- Return type
list
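The four criteria can be sketched as a simple filter over per-well baseline summaries. The dictionary keys (n_multi_peak, std_peak_space, max_amplitude, decay_found) and the helper wells_to_remove are hypothetical, not cdwave's actual column names:

```python
# Hypothetical per-well baseline summaries; key names are illustrative only.
def wells_to_remove(baseline_rows):
    """Return the wells whose baseline fails any of the four documented criteria."""
    removed = []
    for row in baseline_rows:
        if (
            row["n_multi_peak"] >= 1          # at least one multi-peak
            or row["std_peak_space"] > 1      # irregular peak spacing
            or row["max_amplitude"] < 100     # amplitude too low
            or not row["decay_found"]         # a key time point was not recognised
        ):
            removed.append(row["well"])
    return removed


plate = [
    {"well": "A1", "n_multi_peak": 0, "std_peak_space": 0.2, "max_amplitude": 450, "decay_found": True},
    {"well": "A2", "n_multi_peak": 2, "std_peak_space": 0.3, "max_amplitude": 400, "decay_found": True},
    {"well": "A3", "n_multi_peak": 0, "std_peak_space": 0.1, "max_amplitude": 80, "decay_found": True},
]
print(wells_to_remove(plate))  # ['A2', 'A3']
```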
- cdwave.param.select_concentration(df: pandas.core.frame.DataFrame, c: float, lim=(0.01, 10))¶
- cdwave.param.select_concentration_by_log(df: pandas.core.frame.DataFrame, c: float)¶
- cdwave.param.select_plates(df: pandas.core.frame.DataFrame, t=0.2)¶
Remove plates in which the amplitude and frequency at the lowest concentration are outside ±t (0.2 by default)
- Parameters
df – input dataframe of the parameters
t – threshold of the good quality
- Returns
A dictionary with key of compounds and value of a list of available plates
- Return type
dict