cdwave package

Submodules

cdwave.data module

class cdwave.data.DataLoader(filepath: Optional[str] = None, log=None)

Bases: object

parse()

This function should parse waveform data from files into self.waveforms

transfer(validate=True, sort=False) cdwave.data.Dataset

Parse the waveform from tables and generate a dataset

Returns

The dataset containing all the waveforms and parameters

Return type

Dataset

class cdwave.data.Dataset(waveforms: Optional[List[cdwave.data.WaveformFull]] = None)

Bases: object

A dataset contains a list of WaveformFull objects and a meta table with the information and parameters of the waveforms

waveforms

A list of WaveformFull objects.

dataframe

The meta table of the waveforms.

filtered_df

A view of the meta table.

filterable_columns

columns that can be used to filter the waveforms

size

Number of waveforms in the dataset

Parameters

waveforms – A list of WaveformFull objects

static concat(datasets: list) cdwave.data.Dataset
copy()
property df
property dtypes
export_raw(filename=None, compression='infer')

Export the raw data to a CSV file with the columns compound, concentration, well, plate, time, signal

filter_by_filters(filters, replace=True)
Parameters

filters – a dictionary of {column: value}

Returns

filtered dataframe

get_df() pandas.core.frame.DataFrame
get_parameter_df() pandas.core.frame.DataFrame

Return a dataframe with all parameters for each wave

static loaddata(filepath) cdwave.data.Dataset
merge(dataset)
save(filename, compress=True)
class cdwave.data.SeqLoader(filepath: str, plate: Optional[str] = None, state: Optional[str] = None, opts=None, log=None)

Bases: cdwave.data.DataLoader

parse()

This function should parse waveform data from files into self.waveforms

transfer_with_meta(df: pandas.core.frame.DataFrame, columns=None) cdwave.data.Dataset
class cdwave.data.StandardCSVLoader(filepath: Optional[str] = None, data: Optional[pandas.core.frame.DataFrame] = None, log=None)

Bases: cdwave.data.DataLoader

A loader parsing “standard csv file”.

The format of the table file is like:

compound,concentration,well,plate,time,signal
CP1,0.1,A1,P1,0,1000
CP1,0.1,A1,P1,0.33,1001
CP2,0.1,A2,P1,0,1000
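
A table in the layout above can be grouped into per-well signals with the standard library alone. The following is a minimal sketch, not the loader's actual implementation: group_standard_csv is a hypothetical helper that groups rows by (plate, well) and collects the time and signal values into the {'x': ..., 'y': ...} shape that WaveformFull expects.

```python
import csv
import io

SAMPLE = """compound,concentration,well,plate,time,signal
CP1,0.1,A1,P1,0,1000
CP1,0.1,A1,P1,0.33,1001
CP2,0.1,A2,P1,0,1000
"""


def group_standard_csv(text):
    """Group rows of a standard CSV by (plate, well). Hypothetical helper."""
    items = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["plate"], row["well"])
        if key not in items:
            # First row of a well: record its metadata once.
            items[key] = {
                "plate": row["plate"],
                "well": row["well"],
                "compound": row["compound"],
                "concentration": float(row["concentration"]),
                "signal": {"x": [], "y": []},
            }
        # Every row contributes one (time, signal) point.
        items[key]["signal"]["x"].append(float(row["time"]))
        items[key]["signal"]["y"].append(float(row["signal"]))
    return list(items.values())


items = group_standard_csv(SAMPLE)
```

Each element of items has the same keys as the item dictionary used in the WaveformFull example.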
parse() List

This function should parse waveform data from files into self.waveforms

class cdwave.data.WaveformFull(item, scale=True, window_length=5)

Bases: object

WaveformFull contains all the information about a sample.

The information includes compound name, concentration or vendor, and all calculated parameters of a waveform.

profile

A dictionary holding the profile of the waveform, including:

  • plate – The name of the plate from which the waveform was generated.

  • compound – The compound name of the waveform.

  • concentration – The concentration of the compound of the waveform.

  • well – The well of the waveform.

  • cpid – Compound id.

signal

A dictionary with two keys, x and y, which are the time and the calcium transient of the waveform.

Type

dict

state

The state of the waveform, such as before treatment and after treatment.

parameters

A dictionary containing all the parameters of the waveform.

Type

dict

Parameters
  • item (dict) – A dictionary containing all the information of the waveform. Required keys: signal, plate, well, concentration. item[‘signal’] is a dictionary with times (x) and signals (y), e.g. {‘x’: [0.1, 0.2], ‘y’: [100, 101]}

  • scale (bool) – Whether to scale the minimum to 0. Default is True

  • window_length (int) – The length of the filter window (i.e. the number of coefficients). window_length must be a positive odd integer. If set to 0, the waveform will not be smoothed.

Example

>>> item = {'signal': {'x': [0.1, 0.2], 'y': [100, 101]},
...         'plate': 'P1',
...         'compound': 'cmp1',
...         'concentration': 0.01,
...         'well': 'A1'}
>>> wave = WaveformFull(item)
get_dict()
get_parameters(fillna=0)

Return a dictionary with all parameters

Parameters
  • fillna – If fillna is ‘raise’, an exception will be raised when a parameter has not been calculated. For other values, it will be used to fill the empty parameter.

Returns

A dictionary with all parameters listed in parameter_names

Return type

dict
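
The fillna behaviour described above can be illustrated with a small stand-alone sketch. fill_parameters is a hypothetical function written for this example, and the choice of ValueError as the exception type is an assumption; the documentation only states that an exception is raised.

```python
def fill_parameters(parameters, parameter_names, fillna=0):
    """Hypothetical illustration of the fillna behaviour of get_parameters."""
    result = {}
    for name in parameter_names:
        value = parameters.get(name)
        if value is None:
            if fillna == "raise":
                # Exception type is an assumption made for this sketch.
                raise ValueError(f"Parameter {name!r} has not been calculated")
            value = fillna
        result[name] = value
    return result


filled = fill_parameters({"avg_amplitude": 120.5}, ["avg_amplitude", "freq"])
```

Here the missing freq entry is filled with the default value 0.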

get_signal_series()
static standardise_signal(signal: dict, scale=True, window_length=5)
cdwave.data.get_wells()

cdwave.derive module

cdwave.derive.calc_parameter(waveform: cdwave.data.WaveformFull) dict

Calculate parameters of a waveform

cdwave.derive.calc_parameter_with_threshold(waveform: cdwave.data.WaveformFull, threshold, method='prominence') dict

Calculate parameters of a waveform

cdwave.derive.calc_parameters_for_waveforms(dataset: cdwave.data.Dataset, process_fnc: Optional[collections.abc.Callable] = None, batch: int = 200, processes: Optional[int] = None, custom_calculator: Optional[collections.abc.Callable] = None)

Calculate parameters for waveforms

Parameters
  • dataset – The waveform dataset.

  • process_fnc – A function used to report progress; see default_process_fnc, which uses tqdm

  • batch – Batch size for multi-processing.

  • processes – Number of processors.

  • custom_calculator – A custom calculator which can set up custom thresholds. If None, calc_parameter will be used by default.

cdwave.derive.calc_parameters_with_threshold(dataset: cdwave.data.Dataset, threshold: int = 100, processes: Optional[int] = None, custom_calculator: Optional[collections.abc.Callable] = None)

Calculate parameters for waveforms using a threshold

Parameters
  • dataset – The waveform dataset

  • threshold – The threshold

  • processes – Number of processors

  • custom_calculator – A custom calculator which can set up custom thresholds. If None, calc_parameter_with_threshold() will be used by default.

cdwave.derive.default_process_fnc(status=- 1, total=0)
cdwave.derive.derive_batch_bp_parameters(batch_bp: cdwave.fnc.BloodPressure)
cdwave.derive.derive_bp_parameters(bp: cdwave.fnc.BloodPressure, start=0, end=None, processes=4)

cdwave.fnc module

class cdwave.fnc.BloodPressure(data: pandas.core.series.Series, batch_window: int = 1800000, window_size: int = 10000, sample_interval: int = 2, point_interval: int = 1000)

Bases: object

Class for blood pressure data analysis

Parameters
  • data – A pandas series with time as index and blood pressure as value.

  • batch_window – Size of a batch to analyze (unit ms).

  • window_size – Size of the window to derive parameters.

  • sample_interval – The interval of sample. If None, it will be inferred by the first two data points.

  • point_interval – The interval of points to get parameters.

max_time

The maximum time in the data.

calc_angle(start_time, tao)
calc_attractor(df, tao: int)
calc_hrv(window: pandas.core.frame.DataFrame)

Calculate heart rate variability

SD1 is the perpendicular distance of the points \((RR_{n}, RR_{n+1})\) to the line \(y=x\). SD2 is the perpendicular distance of the points to the line \(y=-x + 2R_{m}\), where \(R_{m}\) is the mean of the RR intervals.

See Computing in Cardiology 2014; 41:437-440.

Parameters

window – A data frame of blood pressure window

Returns

A tuple with \(R_{m}\), SD1, SD2, and SD1/SD2

Return type

tuple
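
The geometry above can be sketched in a few lines. This is not the library's implementation: poincare_hrv is a hypothetical function, and it assumes the common Poincaré-plot convention in which SD1 and SD2 are the standard deviations of the signed perpendicular distances to the two lines.

```python
import math
import statistics


def poincare_hrv(rr):
    """Return (R_m, SD1, SD2, SD1/SD2) for a sequence of RR intervals."""
    if len(rr) < 3:
        raise ValueError("At least three RR intervals are needed")
    r_m = statistics.mean(rr)
    pairs = list(zip(rr[:-1], rr[1:]))  # the points (RR_n, RR_{n+1})
    # Signed perpendicular distance of (x, y) to the line y = x.
    d1 = [(y - x) / math.sqrt(2) for x, y in pairs]
    # Signed perpendicular distance of (x, y) to the line y = -x + 2 * R_m.
    d2 = [(x + y - 2 * r_m) / math.sqrt(2) for x, y in pairs]
    sd1 = statistics.pstdev(d1)
    sd2 = statistics.pstdev(d2)
    # The ratio is undefined when all points lie on the second line.
    ratio = sd1 / sd2 if sd2 > 0 else math.nan
    return r_m, sd1, sd2, ratio


r_m, sd1, sd2, ratio = poincare_hrv([800, 810, 820, 830])
```

For steadily lengthening intervals such as these, every point is the same distance from y = x, so SD1 is zero while SD2 is not.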

get_batch_series(series: Optional[pandas.core.series.Series] = None) Iterator[cdwave.fnc.BloodPressure]
get_start_times(batch_series: Optional[pandas.core.series.Series] = None) pandas.core.indexes.base.Index
get_windows_generator() Tuple[int, Generator[pandas.core.frame.DataFrame, None, None]]

Return a generator of window data frame

The data frame has the following columns: SP, DP, PP, RR, time, time_diff

run_filter()
class cdwave.fnc.Waveform(series: pandas.core.series.Series, index_penalty: float = 1e-06)

Bases: object

Class for waveform analysis

Parameters
  • series (pd.Series) – A series of which the name is the well name, values are amplitudes.

  • index_penalty (float) – Add penalty to signals to prioritise former time point during peak detection.

df

The main dataframe of the waveform, including several important columns:

  • peak – 1: main peak, 2-x: double peaks

  • status – 0: normal, 1: rising point, 2: peak, 3: down starting

  • category – 0-9: intensities are categorised into 10 levels in terms of the span

Type

pd.DataFrame

num_peak

Number of peaks

Type

int

n

Number of points

Type

int

maximum

Maximum of intensity

Type

int

minimum

Minimum of intensity

Type

int

analyse()

Analyse the status of each point

The status indicates the trend of the point, such as rising and declining. For waveforms with a frequency higher than 10, it will identify double peaks according to their prominences and tail duration. This function also calculates the valley positions.

analyse_normal_waves(diff_n=1, l=3)

Get status of points for a normal wave

Parameters
  • series – A series of which the name is the well name, values are amplitudes

  • diff_n – Number of consecutive points that must be higher than this point for it to be regarded as a starting point

  • l – length of exploration to find a point after higher than upper_half, indicating it can be a starting point

blood_pressure_profile() pandas.core.frame.DataFrame

Get profile of blood pressure

The function returns a data frame with rows of cycles of blood pressure change and columns of systolic, diastolic, pulse pressure and RR distance.

calc_amplitudes()
calc_fft_freq_ratio()

Calculate the energy ratio between double frequency (minor) and major

\[FFT Ratio = \frac{\sum_{x=f_{mi}-0.05}^{f_{mi}+0.05}p(x)} {\sum_{x=f_{ma}-0.05}^{f_{ma}+0.05}p(x)}\]
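
As a rough illustration of the formula above, the ratio can be computed from a list of frequency bins and their power. This sketch uses hypothetical helpers (band_energy, fft_ratio) and assumes that the minor frequency \(f_{mi}\) is exactly twice the major frequency \(f_{ma}\).

```python
def band_energy(freqs, power, centre, half_width=0.05):
    """Sum the power of all bins within centre +/- half_width (Hz)."""
    return sum(p for f, p in zip(freqs, power) if abs(f - centre) <= half_width)


def fft_ratio(freqs, power, f_major):
    """Energy around the double frequency divided by energy around f_major."""
    f_minor = 2 * f_major  # assumption: "double frequency" means 2 * f_major
    major = band_energy(freqs, power, f_major)
    if major == 0:
        raise ValueError("No energy found around the major frequency")
    return band_energy(freqs, power, f_minor) / major


freqs = [0.9, 0.98, 1.0, 1.02, 1.5, 1.97, 2.0, 2.04, 2.2]
power = [5, 1, 8, 1, 3, 1, 3, 1, 7]
ratio = fft_ratio(freqs, power, f_major=1.0)
```

Only the bins within 0.05 Hz of 1.0 Hz and 2.0 Hz contribute to the ratio; the remaining bins are ignored.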
calc_frequency_parameters()
calc_peak_width()
calc_shoulder_parameters()

Calculate shoulder related parameters

Shoulder parameters include the mean and standard deviation of the shoulder position (in amplitude), and the median and standard deviation of the shoulder_tail ratio. The median is used because the ratio is sometimes unstable. For example, when the tail is missed in one period, the ratio becomes 100, which would mask the other normal values.

calc_status_parameter()
check_status_points()

A debugging function that checks whether any period has more than one, or no, starting point or downing point.

draw_series(figure, series=None, style='-')

Plot points of the waveform

Parameters
  • figure – A Matplotlib figure object

  • series – A pandas Series of the waveform

  • style – A Matplotlib line style, - by default, indicating a line plot

draw_status(figure)

Plot the points into a figure, colored by their status: black indicates normal points; orange is the starting point, the first point of the rising wave; red is the peak; and green is the first point of the tail.

Parameters

figure – A Matplotlib figure object

export(filename)

Export the waveform to a csv file

fix_double_peak_by_prominence()

Identify double peaks by their prominence

Definition of a double peak: a peak whose prominence is below the threshold (see below) and whose signal value is close to that of the last real peak. Close means the difference between the signal values is smaller than the variance (10% of the maximum value). For waves whose maximum is higher than 250, the prominence threshold is 0.7 * maximum of all prominences. For waves below 250, the threshold is 0.5 * maximum of all prominences.

Returns

Always True
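
The rule above can be written as a short predicate. This is a hedged sketch rather than the method's real code: prominence_threshold and is_double_peak are hypothetical names, and treating a maximum of exactly 250 as the lower case is an assumption, since the text does not cover that value.

```python
def prominence_threshold(maximum, prominences):
    """Prominence below which a peak is a double-peak candidate."""
    factor = 0.7 if maximum > 250 else 0.5
    return factor * max(prominences)


def is_double_peak(prominence, value, last_real_peak_value, maximum, prominences):
    """Apply both conditions of the definition to a single peak."""
    variance = 0.1 * maximum  # "variance" in the text: 10% of the maximum value
    low_prominence = prominence < prominence_threshold(maximum, prominences)
    close = abs(value - last_real_peak_value) < variance
    return low_prominence and close


prominences = [50, 200, 210]
candidate = is_double_peak(50, 290, 300, maximum=300, prominences=prominences)
far_peak = is_double_peak(50, 200, 300, maximum=300, prominences=prominences)
```

The first peak has low prominence and sits within 30 units of the last real peak, so it is flagged; the second has the same prominence but is too far from the last real peak.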

fix_double_peak_by_tail(min_amplitude=100, std_threshold=0.5)

Identify double peaks by comparing the average signal values of tails

In some situations, “real” peaks have a long tail but the subpeaks don’t, so we can recognise the subpeaks by comparing their tail with maximum. But for some waveforms, the tails are too short to be used as a symbol to identify double peaks.

Parameters
  • min_amplitude – Minimum amplitude the wave must have before double peaks are searched for.

  • std_threshold – Ratio of the standard deviation of the tails to the mean of the tails above which a double peak is indicated.

Returns

If a starting point or downing point was not found, return False and self.fail_analysis will be True. Otherwise always return True

Return type

bool

get_parameters()

Get all parameters from the waveform.

get_peaks(height=None, prominence=None, min_prominence=20, span_ratio=0.1) bool

Identify the peaks and group of the whole waveform

Parameters
  • height – Number or ndarray or sequence, optional Required height of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required height.

  • prominence – Number or ndarray or sequence, optional Required prominence of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required prominence. If None, the minimal prominence will be the max(min_prominence, span_ratio*self.span)

  • min_prominence – The absolute minimal prominence

  • span_ratio – The ratio of span as the prominence threshold
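
When prominence is None, the minimal prominence follows the max(min_prominence, span_ratio*self.span) rule stated above. A minimal sketch, assuming that span is the difference between the maximum and minimum of the signal; default_min_prominence is a hypothetical helper.

```python
def default_min_prominence(span, min_prominence=20, span_ratio=0.1):
    """Minimal prominence used when no explicit prominence is given."""
    return max(min_prominence, span_ratio * span)


large_wave = default_min_prominence(span=1000)  # the span-based term dominates
small_wave = default_min_prominence(span=100)   # the absolute floor dominates
```

The absolute floor keeps noise in low-amplitude waveforms from being detected as peaks, while the span-based term scales the threshold for large waveforms.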

get_valleys()

Add valley information into the main dataframe

get_width_distribution(gdf: pandas.core.frame.DataFrame, normalise=True)

Calculate the distribution of the widths between the position of the peak point and the other points.

Parameters

gdf – subdataframe of a group

Returns

A tuple of grid and y. grid is the grid sampled over all widths. y is the probability of each width

Return type

tuple

property group

Return an iterator of groups (i, gdf), excluding the first and the last

max_shoulder_tail_ratio = 2.5
peak_uniform_test(interpolation=False)

Test if the peak points are in uniform distribution

Parameters

interpolation – True if interpolation is applied to the waveform. Mostly used when there are only 3 peaks in the waveform, so the points between the three peaks need to be interpolated. Otherwise the KS test may underestimate the probability

Returns

The probability of the waveform being in uniform distribution

Return type

float

regroup()

After the peaks are changed, the groups need to be recalculated

This function recalculates the number of peaks and redefines the group number of each point.

resample(sample_rate=100) numpy.ndarray

Resample points in the waveform

Parameters

sample_rate – How many points per second, default 100

Returns

A numpy array with sample_rate points per second

Return type

np.ndarray

standardise_by_filter()
cdwave.fnc.signal_filter(signals: numpy.ndarray, window_length=5) numpy.ndarray

Apply a Savitzky-Golay filter to an array.

cdwave.fnc.wave_transform(signals: numpy.ndarray, sample_rate=100, method='fft')

Transform the wave using methods such as Fourier transformation

Parameters
  • signals (np.ndarray) – resampled signals from waveform

  • sample_rate (int) – How many points per second, default 100

  • method (str) – Transformation method, default fft, fast fourier transform

Returns

A tuple containing:
  • frq (np.ndarray): Frequency points (Hz)

  • psd (np.ndarray): Power Spectral Density from FFT

Return type

tuple
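
To show what the (frq, psd) pair looks like, here is a dependency-free sketch that uses a naive discrete Fourier transform instead of a fast one. dft_power_spectrum is a hypothetical function, and the normalisation of the power by the number of points is an assumption, not necessarily what wave_transform does.

```python
import cmath
import math


def dft_power_spectrum(signals, sample_rate=100):
    """Return (frq, psd) for the non-negative frequencies of a real signal."""
    n = len(signals)
    frq, psd = [], []
    for k in range(n // 2 + 1):
        # k-th DFT coefficient, computed directly from the definition.
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(signals)
        )
        frq.append(k * sample_rate / n)  # bin centre in Hz
        psd.append(abs(coeff) ** 2 / n)  # assumed normalisation
    return frq, psd


# One second of a 5 Hz sine sampled at 100 points per second.
signal = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
frq, psd = dft_power_spectrum(signal)
peak_frequency = frq[psd.index(max(psd))]
```

The largest value of psd falls in the 5 Hz bin, which is the kind of major frequency that the frequency parameters are built on.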

cdwave.hillcurve module

class cdwave.hillcurve.HillCurve(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=- 6, boundary='auto', logc=True)

Bases: object

property EC50
calc_perr()
property curve_diff
property hill
predict(x)
class cdwave.hillcurve.TCPL(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=- 6, boundary='auto', logit=True)

Bases: object

Implementation of ToxCast Pipeline

Parameters
  • concentrations – A numpy array of concentrations

  • responses – A numpy array of responses corresponding to the concentrations

  • concentration_unit – The unit of the concentration. -6 means uM.

  • boundary – Boundary of the model for fitting. Use ‘auto’ for the default boundary, defined in get_bound.

  • logit – Whether to take the logarithm of the concentrations. If the input concentrations are not already log-transformed (e.g. given in uM), use True.

Attributes
  • k (int) – Number of estimated parameters

  • n (int) – Number of data points

property AIC

Akaike information criterion

The likelihood is simplified by calculating RSS(MAE) https://www.tandfonline.com/doi/pdf/10.1080/21642583.2018.1496042

Also see Comparison with least squares in https://en.wikipedia.org/wiki/Akaike_information_criterion
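
For a least-squares fit, the criterion can be computed from the residual sum of squares as \(AIC = 2k + n \ln(RSS/n)\), with the additive constant dropped. The sketch below uses that form from the Wikipedia article cited above; whether the property uses exactly this expression is an assumption, and aic_from_rss is a hypothetical name.

```python
import math


def aic_from_rss(rss, n, k):
    """AIC of a least-squares fit: 2k + n * ln(RSS / n), constant omitted."""
    if n <= 0 or rss <= 0:
        raise ValueError("n and rss must be positive")
    return 2 * k + n * math.log(rss / n)


# Same data and fit quality, different numbers of estimated parameters.
hill_like = aic_from_rss(rss=10.0, n=10, k=4)
constant_like = aic_from_rss(rss=10.0, n=10, k=1)
```

With equal RSS, the model with fewer parameters has the lower AIC, which is why k differs between the TCPL subclasses.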

property E50: float
EC50(unit='logM') float
property RMSD
property RSS
calc_perr()
property curve_diff
property curve_max
property curve_min
fit(fnc)
get_bound(c_max, c_min)
predict(x)
class cdwave.hillcurve.TCPLGainLoss(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=- 6, boundary='auto', logit=True)

Bases: cdwave.hillcurve.TCPL

static fnc(x, gw, ga, tp, lw, la, s, b)
get_bound(c_max, c_min)
k = 7
name = 'TCPL-GainLoss'
class cdwave.hillcurve.TCPLHill(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=- 6, boundary='auto', logit=True)

Bases: cdwave.hillcurve.TCPL

static fnc(x, a, x0, k, b)
get_bound(c_max, c_min)
property hill
k = 4
name = 'TCPL-Hill'
class cdwave.hillcurve.TCPLPlain(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=- 6, boundary='auto', logit=True)

Bases: cdwave.hillcurve.TCPL

static fnc(x, b)
get_bound(c_max, c_min)
k = 1
name = 'TCPL-Constant'
cdwave.hillcurve.fit_parameter(df: pandas.core.frame.DataFrame, parameter)

Fit the S curve of concentration-response (deprecated)

Parameters
  • df – Dataframe from the dataset

  • ax – Axes object from the matplotlib

Returns

A tuple of popt, the parameters of the S curve, and perr, the RMSE of the fitted curve

Return type

tuple

Raises

RuntimeError – When the curve cannot be fitted

cdwave.hillcurve.fsigmoid(x, a, x0, k, b)
cdwave.hillcurve.gain_loss(x, gw, ga, tp, lw, la, s, b)
cdwave.hillcurve.plain(x, b)

cdwave.model module

cdwave.model.four_point_parameter_generator(parameters, suffixes)
cdwave.model.prepare_four_point_model(agg_df: pandas.core.frame.DataFrame, endpoint: pandas.core.series.Series, parameters, suffixes)

cdwave.param module

cdwave.param.aggrate_parameters(df: pandas.core.frame.DataFrame, parameters: Optional[list] = None, method='median', compound_column='uniname', plates: Optional[dict] = None)

Aggregate the parameters of the same compound under the same concentration by methods such as median or mean

Parameters
  • df – Dataframe of the whole parameters

  • parameters – The parameter list to process

  • method – The method to aggregate the parameters

  • plates – A dictionary with the key of compounds and values of list of plates to use.

Returns

Dataframe with aggregated parameters.

Return type

DataFrame

cdwave.param.calc_4_descriptors(df: pandas.core.frame.DataFrame, parameters: List[str], compounds: List[str]) pandas.core.frame.DataFrame

Calculate four descriptors for each parameter, including minimum concentration, maximum concentration, median concentration and slope of the concentration-response

cdwave.param.calc_grit(df: pandas.core.frame.DataFrame, parameter: str)
cdwave.param.calc_rcv(x: numpy.ndarray) float

Robust coefficient of variation

Using the second approach in this paper https://arxiv.org/pdf/1907.01110.pdf

\[\begin{aligned} MAD &= \operatorname{med} | x_i - m | \\ RCV_M &= 1.4826 \times \frac{MAD}{m} \end{aligned}\]
Parameters

x – A 1-d array of parameters

Returns

robust coefficient of variation

Return type

float
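
The two formulas above translate directly into code. This is a minimal sketch with the standard library, taking m to be the median of x; robust_cv is a hypothetical stand-in, not the body of calc_rcv.

```python
import statistics


def robust_cv(x):
    """Robust coefficient of variation: 1.4826 * MAD / median."""
    m = statistics.median(x)
    if m == 0:
        raise ValueError("The median is zero, so RCV is undefined")
    # Median absolute deviation from the median.
    mad = statistics.median([abs(v - m) for v in x])
    return 1.4826 * mad / m


rcv = robust_cv([1, 2, 3, 4, 5])
```

For this input the median is 3 and the MAD is 1, so the result is 1.4826 / 3.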

cdwave.param.linear_regression_with_logc(df: pandas.core.frame.DataFrame, parameter: str, remove_beatstop=True, error='raise')

Use linear regression to derive slope and intercept for a parameter

Parameters
  • df – A dataframe that contains the samples needed.

  • parameter – The name of the parameter that is included in the dataframe

  • remove_beatstop – Whether to remove the samples of which ‘beat_stop’ is True

  • error – The k and b to return if all the samples are beat_stop or there is no sample. If ‘raise’ is used, a ValueError from LinearRegression will be raised

Returns

slope (k) and intercept (b)

Return type

Tuple
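
The idea of regressing a parameter on the logarithm of the concentration can be sketched with ordinary least squares. The function below is hypothetical and makes two assumptions: the logarithm is base 10, and the inputs are plain lists rather than dataframe columns.

```python
import math


def fit_slope_intercept(concentrations, values):
    """Least-squares slope (k) and intercept (b) of values against log10(c)."""
    logc = [math.log10(c) for c in concentrations]
    n = len(logc)
    if n < 2:
        raise ValueError("At least two samples are needed")
    mean_x = sum(logc) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in logc)
    if sxx == 0:
        raise ValueError("All concentrations are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(logc, values))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b


k, b = fit_slope_intercept([0.01, 0.1, 1, 10], [1, 2, 3, 4])
```

In this example the parameter rises by one unit per tenfold increase in concentration, so the slope is 1 and the intercept, the value at a concentration of 1, is 3.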

cdwave.param.normalise_by_baseline(df: pandas.core.frame.DataFrame, subtract_params: list, divide_params: list, divide_only_params: Optional[list] = None, std_params: Optional[dict] = None) pandas.core.frame.DataFrame

Normalise the parameters by baseline of the well

Parameters
  • subtract_params – Parameter list to be subtracted only.

  • divide_params – Parameter list to be subtracted and divided.

  • divide_only_params – Parameter list to be divided only.

  • std_params – A dictionary mapping standard deviation parameters to its average parameters, such as {‘std_amplitude’: ‘avg_amplitude’}. Parameters in this dictionary will be processed via following equation. \(std(A)= \frac{std(A)}{\overline{A}}\)

Returns

Normalised parameters

Return type

DataFrame

cdwave.param.normalise_by_negctrl(df: pandas.core.frame.DataFrame, standardiser: str = 'sdm', parameters: Optional[list] = None, standardisers: Optional[dict] = None, control_compound: str = 'DMSO') pandas.core.frame.DataFrame

Normalise the parameters by the negative control of the plate. Because there will be one negative control in a plate, we use mean to aggregate the parameters.

Parameters
  • df – DataFrame of parameters got from CardioWave

  • standardiser

    Method to standardise the data, including

    • sdm: Subtract and divide by median of negative control

    • sm: Subtract by median of negative control

    • smdmad: Subtract median and divide by median absolute deviation

  • parameters – A list of parameters which will be normalised

  • standardisers – A dictionary of which the keys are standardise methods and the values are the parameters to which each method applies. This will override standardiser and parameters.

  • control_compound – The name of control samples in the compound column

Returns

Normalised parameters

Return type

DataFrame
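
The three standardisers listed above differ only in what is subtracted and what is divided. The following sketch applies them to plain lists; standardise is a hypothetical helper, and it follows the median-based wording of the list rather than the library's actual dataframe code.

```python
import statistics


def standardise(values, control_values, method="sdm"):
    """Normalise values against the negative-control values of one plate."""
    med = statistics.median(control_values)
    if method == "sm":
        # Subtract the median of the negative control.
        return [v - med for v in values]
    if method == "sdm":
        # Subtract and divide by the median of the negative control.
        return [(v - med) / med for v in values]
    if method == "smdmad":
        # Subtract the median and divide by the median absolute deviation.
        mad = statistics.median([abs(c - med) for c in control_values])
        return [(v - med) / mad for v in values]
    raise ValueError(f"Unknown standardiser: {method!r}")


control = [90, 100, 110]  # e.g. values from DMSO wells
treated = [100, 120]
sm = standardise(treated, control, "sm")
sdm = standardise(treated, control, "sdm")
smdmad = standardise(treated, control, "smdmad")
```

With a control median of 100 and a MAD of 10, the value 120 becomes 20 under sm, 0.2 under sdm and 2.0 under smdmad.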

cdwave.param.npoint_descriptor(df: pandas.core.frame.DataFrame, parameter: str, n: int)

This function only works for 8 concentrations

cdwave.param.parameter_correlation(df: pandas.core.frame.DataFrame, parameters: Optional[list] = None)

Calculate the correlation between the parameters

cdwave.param.parameter_projection(df: pandas.core.frame.DataFrame, parameters: Optional[list] = None, method='tsne', n_components=2)
cdwave.param.remove_low_quality(df: pandas.core.frame.DataFrame)

Remove waveforms with low quality

Wells of a plate will be removed if:

  1. Double peak in negative control

  2. High standard deviation of peak space in negative control

  3. Low quality in baseline. See remove_well_by_baseline()

The whole plate will be removed if RCV is higher than 0. See calc_rcv()

Parameters

df – A dataframe of all the samples with parameters

Returns

A tuple containing a filtered dataframe and a dictionary of removed wells

Return type

tuple

cdwave.param.remove_well_by_baseline(pdf: pandas.core.frame.DataFrame) list

Remove wells by the quality of baseline (pre-measurement)

When the quality of the baseline is low according to the following criteria, the well should be removed.

  1. There is at least one multi-peak

  2. standard deviation of peak space is higher than 1

  3. maximum amplitude is lower than 100

  4. Some key time point (such as decay point) cannot be recognised

Parameters

pdf – DataFrame of one plate

Returns

A list of wells to remove

Return type

list

cdwave.param.select_concentration(df: pandas.core.frame.DataFrame, c: float, lim=(0.01, 10))
cdwave.param.select_concentration_by_log(df: pandas.core.frame.DataFrame, c: float)
cdwave.param.select_plates(df: pandas.core.frame.DataFrame, t=0.2)

Remove plates when the amplitude and frequency of the lowest concentration are outside the range of ±0.2

Parameters
  • df – input dataframe of the parameters

  • t – threshold of the good quality

Returns

A dictionary with key of compounds and value of a list of available plates

Return type

dict

Module contents