cdwave package¶

Submodules¶

cdwave.data module¶

class cdwave.data.DataLoader(filepath: Optional[str] = None, log=None)¶

Bases: object

parse()¶: This function should parse waveform data from files into self.waveforms

transfer(validate=True, sort=False) → cdwave.data.Dataset¶

Parse the waveform from tables and generate a dataset

Returns: The dataset containing all the waveforms and parameters
Return type: Dataset

class cdwave.data.Dataset(waveforms: Optional[List[cdwave.data.WaveformFull]] = None)¶

Bases: object

A dataset contains a list of WaveformFull objects and a meta table with the information and parameters of the waveforms

waveforms¶: A list of WaveformFull objects.

dataframe¶: The meta table of the waveforms.

filtered_df¶: A view of the meta table.

filterable_columns¶: columns that can be used to filter the waveforms

size¶: Number of waveforms in the dataset

Parameters: waveforms – A list of WaveformFull objects

static concat(datasets: list) → cdwave.data.Dataset¶

copy()¶

property df¶

property dtypes¶

export_raw(filename=None, compression='infer')¶: Export the raw data into a csv file with the columns compound, concentration, well, plate, time, signal

filter_by_filters(filters, replace=True)¶

Parameters: filters – a dictionary of {column: value}
Returns: filtered dataframe

get_df() → pandas.core.frame.DataFrame¶

get_parameter_df() → pandas.core.frame.DataFrame¶: Return a dataframe with all parameters for each wave

static loaddata(filepath) → cdwave.data.Dataset¶

merge(dataset)¶

save(filename, compress=True)¶

class cdwave.data.SeqLoader(filepath: str, plate: Optional[str] = None, state: Optional[str] = None, opts=None, log=None)¶

Bases: cdwave.data.DataLoader

parse()¶: This function should parse waveform data from files into self.waveforms

transfer_with_meta(df: pandas.core.frame.DataFrame, columns=None) → cdwave.data.Dataset¶

class cdwave.data.StandardCSVLoader(filepath: Optional[str] = None, data: Optional[pandas.core.frame.DataFrame] = None, log=None)¶

Bases: cdwave.data.DataLoader

A loader parsing “standard csv file”.

The format of the table file is like:

compound,concentration,well,plate,time,signal
CP1,0.1,A1,P1,0,1000
CP1,0.1,A1,P1,0.33,1001
CP2,0.1,A2,P1,0,1000
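
As a rough illustration of this layout, the sketch below groups the rows of such a file by plate and well using only the standard library, producing one dictionary per well in the shape used by the WaveformFull example. It does not reproduce StandardCSVLoader.parse(); the helper name group_rows is hypothetical and not part of cdwave.

```python
import csv
import io
from collections import OrderedDict

# Sample content in the "standard csv" layout described above.
CSV_TEXT = """compound,concentration,well,plate,time,signal
CP1,0.1,A1,P1,0,1000
CP1,0.1,A1,P1,0.33,1001
CP2,0.1,A2,P1,0,1000
"""


def group_rows(text):
    """Group rows by (plate, well) into one item dictionary per well.

    Hypothetical helper for illustration only; not a cdwave function.
    """
    items = OrderedDict()
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["plate"], row["well"])
        if key not in items:
            items[key] = {
                "plate": row["plate"],
                "well": row["well"],
                "compound": row["compound"],
                "concentration": float(row["concentration"]),
                "signal": {"x": [], "y": []},
            }
        # Append the time point and the signal value to the well's trace.
        items[key]["signal"]["x"].append(float(row["time"]))
        items[key]["signal"]["y"].append(float(row["signal"]))
    return list(items.values())


items = group_rows(CSV_TEXT)
```

Each row carries the full sample description, so a well's time series is recovered simply by collecting the rows that share the same plate and well.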

parse() → List¶: This function should parse waveform data from files into self.waveforms

class cdwave.data.WaveformFull(item, scale=True, window_length=5)¶

Bases: object

WaveformFull contains all the information about a sample.

The information includes compound name, concentration or vendor, and all calculated parameters of a waveform.

profile¶: A dictionary holding the profile of the waveform, including: plate: The name of the plate from which the waveform is generated. compound: The compound name of the waveform. concentration: The concentration of the compound of the waveform. well: The well of the waveform. cpid: Compound id.

signal¶

A dictionary with two keys, x and y, which are the time and the calcium transient of the waveform.

Type: dict

state¶: The state of the waveform, such as before treatment and after treatment.

parameters¶

A dictionary containing all the parameters of the waveform.

Type: dict

Parameters

item (dict) – A dictionary containing all the information of the waveform. Required keys: signal, plate, well, concentration. item['signal'] is a dictionary with times (x) and signals (y), e.g. {'x': [0.1, 0.2], 'y': [100, 101]}
scale (bool) – Whether to scale the minimum to 0. Default is True
window_length (int) – The length of the filter window (i.e. the number of coefficients). window_length must be a positive odd integer. If set to 0, the waveform will not be smoothed.

Example

>>> item = {'signal': {'x': [0.1, 0.2], 'y': [100, 101]},
...         'plate': 'P1',
...         'compound': 'cmp1',
...         'concentration': 0.01,
...         'well': 'A1'}
>>> wave = WaveformFull(item)

get_dict()¶

get_parameters(fillna=0)¶

Return a dictionary with all parameters

Parameters

fillna – If fillna is ‘raise’, an exception will be raised when a parameter has not been calculated. For other values, it will be used to fill the empty parameter.

Returns

A dictionary with all parameters listed in parameter_names

Return type

dict
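
The fillna behaviour described above can be sketched in plain Python. This is only an illustration of the documented semantics: the function fill_parameters, the exception type and the parameter names used in the call are assumptions, not taken from the cdwave source.

```python
import math


def fill_parameters(parameters, parameter_names, fillna=0):
    """Return a value for every name in parameter_names.

    Hypothetical stand-in mirroring the documented fillna behaviour.
    """
    result = {}
    for name in parameter_names:
        value = parameters.get(name)
        missing = value is None or (
            isinstance(value, float) and math.isnan(value)
        )
        if missing:
            if fillna == "raise":
                # Assumption: the real exception type is not documented.
                raise ValueError(f"Parameter {name!r} has not been calculated")
            value = fillna
        result[name] = value
    return result


filled = fill_parameters({"avg_amplitude": 120.0}, ["avg_amplitude", "freq"])
```

With the default fillna=0 the missing entry is filled with 0; passing fillna='raise' turns the same situation into an error, which is useful when an incomplete calculation should not go unnoticed.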

get_signal_series()¶

static standardise_signal(signal: dict, scale=True, window_length=5)¶

cdwave.data.get_wells()¶

cdwave.derive module¶

cdwave.derive.calc_parameter(waveform: cdwave.data.WaveformFull) → dict¶: Calculate parameters of a waveform

cdwave.derive.calc_parameter_with_threshold(waveform: cdwave.data.WaveformFull, threshold, method='prominence') → dict¶: Calculate parameters of a waveform

cdwave.derive.calc_parameters_for_waveforms(dataset: cdwave.data.Dataset, process_fnc: Optional[collections.abc.Callable] = None, batch: int = 200, processes: Optional[int] = None, custom_calculator: Optional[collections.abc.Callable] = None)¶

Calculate parameters for waveforms

Parameters

dataset – The waveform dataset.
process_fnc – A processing function used to send out the progress, see default_process_fnc, which uses tqdm
batch – Batch size for multi-processing.
processes – Number of processors.
custom_calculator – A custom calculator which can setup the custom thresholds. If None, calc_parameter will be used by default.

cdwave.derive.calc_parameters_with_threshold(dataset: cdwave.data.Dataset, threshold: int = 100, processes: Optional[int] = None, custom_calculator: Optional[collections.abc.Callable] = None)¶

Calculate parameters for waveforms

Parameters

dataset – The waveform dataset
threshold – The threshold
processes – Number of processors
custom_calculator – A custom calculator which can setup the custom thresholds. If None, calc_parameter_with_threshold() will be used by default.

cdwave.derive.default_process_fnc(status=-1, total=0)¶

cdwave.derive.derive_batch_bp_parameters(batch_bp: cdwave.fnc.BloodPressure)¶

cdwave.derive.derive_bp_parameters(bp: cdwave.fnc.BloodPressure, start=0, end=None, processes=4)¶

cdwave.fnc module¶

class cdwave.fnc.BloodPressure(data: pandas.core.series.Series, batch_window: int = 1800000, window_size: int = 10000, sample_interval: int = 2, point_interval: int = 1000)¶

Bases: object

Class for blood pressure data analysis

Parameters

data – A pandas series with time as index and blood pressure as value.
batch_window – Size of a batch to analyze (unit ms).
window_size – Size of the window to derive parameters.
sample_interval – The interval of sample. If None, it will be inferred by the first two data points.
point_interval – The interval of points to get parameters.

max_time¶: The maximum time in the data.

calc_angle(start_time, tao)¶

calc_attractor(df, tao: int)¶

calc_hrv(window: pandas.core.frame.DataFrame)¶

Calculate heart rate variability

SD1 measures the spread of the perpendicular distances from the points \((RR_{n}, RR_{n+1})\) to the line \(y=x\). SD2 measures the spread of the distances from the same points to the line \(y=-x + 2R_{m}\), where \(R_{m}\) is the mean of the RR intervals.

See Computing in Cardiology 2014; 41:437-440.

Parameters: window – A data frame of blood pressure window
Returns: A tuple with \(R_{m}\), SD1, SD2, and SD1/SD2
Return type: tuple
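
The geometry behind SD1 and SD2 can be sketched with the standard library. The sketch assumes that both values are sample standard deviations of the signed perpendicular distances; the function name poincare_sd and the RR values are illustrative, and the actual calc_hrv takes a window data frame rather than a plain list.

```python
import math
import statistics


def poincare_sd(rr):
    """Return (R_m, SD1, SD2, SD1/SD2) for a sequence of RR intervals."""
    r_m = statistics.mean(rr)
    # Poincare points (RR_n, RR_{n+1}).
    pairs = list(zip(rr[:-1], rr[1:]))
    # Signed perpendicular distance of (x, y) to the line y = x.
    d1 = [(y - x) / math.sqrt(2) for x, y in pairs]
    # Signed perpendicular distance of (x, y) to the line y = -x + 2 * R_m.
    d2 = [(x + y - 2 * r_m) / math.sqrt(2) for x, y in pairs]
    # Assumption: sample standard deviation of the distances.
    sd1 = statistics.stdev(d1)
    sd2 = statistics.stdev(d2)
    return r_m, sd1, sd2, sd1 / sd2


r_m, sd1, sd2, ratio = poincare_sd([800, 810, 790, 805, 795])
```

SD1 captures beat-to-beat variation (scatter across the identity line), while SD2 captures variation along it, so the ratio SD1/SD2 compares short-term with longer-term variability.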

get_batch_series(series: Optional[pandas.core.series.Series] = None) → Iterator[cdwave.fnc.BloodPressure]¶

get_start_times(batch_series: Optional[pandas.core.series.Series] = None) → pandas.core.indexes.base.Index¶

get_windows_generator() → Tuple[int, Generator[pandas.core.frame.DataFrame, None, None]]¶

Return a generator of window data frame

The data frame has the following columns: SP, DP, PP, RR, time, time_diff

run_filter()¶

class cdwave.fnc.Waveform(series: pandas.core.series.Series, index_penalty: float = 1e-06)¶

Bases: object

Class for waveform analysis

Parameters

series (pd.Series) – A series of which the name is the well name, values are amplitudes.
index_penalty (float) – Add penalty to signals to prioritise former time point during peak detection.

df¶

The main dataframe of the waveform, including several important columns. peak: 1 marks the main peak and 2-x mark double peaks. status: 0 is normal, 1 is the rising point, 2 is the peak, 3 is the start of the decline. category: 0-9, the intensities are categorised into 10 levels in terms of the span.

Type: pd.DataFrame

num_peak¶

Number of peaks

Type: int

n¶

Number of points

Type: int

maximum¶

Maximum of intensity

Type: int

minimum¶

Minimum of intensity

Type: int

analyse()¶

Analyse the status of each point

The status indicates the trend of the point, such as rising and declining. For waveforms with a frequency higher than 10, it will identify double peaks according to their prominences and tail duration. This function also calculates the valley positions.

analyse_normal_waves(diff_n=1, l=3)¶

Get status of points for a normal wave

Parameters

series – A series of which the name is the well name, values are amplitudes
diff_n – Number of consecutive following points that must be higher than a point for it to be regarded as a starting point
l – Number of following points to explore for one higher than upper_half, which indicates that the point can be a starting point

blood_pressure_profile() → pandas.core.frame.DataFrame¶

Get profile of blood pressure

The function returns a data frame with rows of cycles of blood pressure change and columns of systolic, diastolic, pulse pressure and RR distance.

calc_amplitudes()¶

calc_fft_freq_ratio()¶: Calculate the energy ratio between double frequency (minor) and major

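\[\text{FFT Ratio} = \frac{\sum_{x=f_{mi}-0.05}^{f_{mi}+0.05}p(x)} {\sum_{x=f_{ma}-0.05}^{f_{ma}+0.05}p(x)}\]

Here \(f_{ma}\) is the major frequency, \(f_{mi}\) is the minor (double) frequency and \(p(x)\) is the spectral power at frequency \(x\).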
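
The ratio can be sketched on a plain list of frequency and power values. The sketch assumes that the major frequency is the non-zero frequency with the highest power and that the minor frequency is exactly twice the major one; the function name fft_freq_ratio and the sample spectrum are illustrative only.

```python
def fft_freq_ratio(frq, psd, half_width=0.05):
    """Power around the doubled frequency divided by power around the major one."""
    # Assumption: the major frequency is the non-zero frequency with most power.
    candidates = [(p, f) for f, p in zip(frq, psd) if f > 0]
    _, f_major = max(candidates)
    f_minor = 2 * f_major

    def band_power(centre):
        # Sum the power of all bins within +/- half_width of the centre.
        return sum(p for f, p in zip(frq, psd) if abs(f - centre) <= half_width)

    return band_power(f_minor) / band_power(f_major)


# Illustrative spectrum: a dominant component at 1 Hz and a weaker one at 2 Hz.
frq = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
psd = [50.0, 0.0, 10.0, 0.0, 4.0, 0.0]
ratio = fft_freq_ratio(frq, psd)
```

A ratio close to zero means almost no energy at the doubled frequency, whereas a larger ratio points to a secondary component such as a double peak.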

calc_frequency_parameters()¶

calc_peak_width()¶

calc_shoulder_parameters()¶

Calculate shoulder related parameters

Shoulder parameters include the mean and standard deviation of the shoulder position (in amplitude), and the median and standard deviation of the shoulder_tail ratio. The median is used because the ratio is sometimes unstable. For example, when the tail is missed in one period, the ratio becomes 100, which would mask the other, normal values.

calc_status_parameter()¶

check_status_points()¶: A debugging function that checks whether any period has more than one starting point or downing point, or is missing one.

draw_series(figure, series=None, style='-')¶

Plot points of the waveform

Parameters

figure – A Matplotlib figure object
series – A pandas Series of the waveform
style – A Matplotlib line style, - by default, indicating a line plot

draw_status(figure)¶

Plot the points into a figure, colored by their status

Black indicates normal points; orange is the starting point, the first point of the rising wave; red is the peak; and green is the first point of the tail.

Parameters: figure – A Matplotlib figure object

export(filename)¶: Export the waveform to a csv file

fix_double_peak_by_prominence()¶

Identify double peaks by their prominence

Definition of a double peak: A peak whose prominence is below the threshold (see below) and whose signal value is close to that of the last real peak. Close means the difference between the signal values is smaller than the variance (10% of the maximum value). For waves whose maximum is higher than 250, the prominence threshold is 0.7 * maximum of all prominences. For waves whose maximum is lower than 250, the threshold is 0.5 * maximum of all prominences.

Returns: Always True
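
The two rules in this definition can be written as small helper functions. They are a sketch of the documented thresholds only; the function names are hypothetical, and the handling of a maximum of exactly 250 is an assumption because the text does not cover that case.

```python
def double_peak_prominence_threshold(maximum, prominences):
    """Prominence below which a peak becomes a double-peak candidate."""
    # Assumption: a maximum of exactly 250 falls into the lower branch.
    factor = 0.7 if maximum > 250 else 0.5
    return factor * max(prominences)


def is_close_to_last_peak(value, last_peak_value, maximum):
    """True when two peak values differ by less than 10% of the maximum."""
    return abs(value - last_peak_value) < 0.1 * maximum


high = double_peak_prominence_threshold(300, [10, 50, 100])
low = double_peak_prominence_threshold(200, [10, 50, 100])
```

A peak is then a double-peak candidate when its prominence is below the threshold and is_close_to_last_peak holds for its signal value.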

fix_double_peak_by_tail(min_amplitude=100, std_threshold=0.5)¶

Identify double peaks by comparing the average signal values of tails

In some situations, “real” peaks have a long tail but the subpeaks don’t, so we can recognise the subpeaks by comparing their tail with maximum. But for some waveforms, the tails are too short to be used as a symbol to identify double peaks.

Parameters

min_amplitude – Minimum of amplitude the wave should have as a prerequisite to find double peak.
std_threshold – Number of times that the standard deviation of the tails is higher than mean of the tails which will indicate there is double peak.

Returns

False if the starting point or downing point was not found, in which case self.fail_analysis is set to True. Otherwise True.

Return type

bool

get_parameters()¶: Get all parameters from the waveform.

get_peaks(height=None, prominence=None, min_prominence=20, span_ratio=0.1) → bool¶

Identify the peaks and group of the whole waveform

Parameters

height – Number or ndarray or sequence, optional Required height of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required height.
prominence – Number or ndarray or sequence, optional Required prominence of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required prominence. If None, the minimal prominence will be the max(min_prominence, span_ratio*self.span)
min_prominence – The absolute minimal prominence
span_ratio – The ratio of span as the prominence threshold
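
The fallback used when prominence is None can be written out directly. The sketch assumes that self.span is the difference between the maximum and the minimum of the signal, which is not stated here; default_prominence is a hypothetical helper.

```python
def default_prominence(signal, min_prominence=20, span_ratio=0.1):
    """Minimal prominence applied when no explicit prominence is given."""
    # Assumption: the span is the range of the signal values.
    span = max(signal) - min(signal)
    return max(min_prominence, span_ratio * span)


strong = default_prominence([0, 400, 1000, 300])
weak = default_prominence([0, 20, 50, 10])
```

For a strong signal the threshold scales with the span, while for a weak signal the absolute floor min_prominence prevents noise from being reported as peaks.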

get_valleys()¶: Add valley information into the main dataframe

get_width_distribution(gdf: pandas.core.frame.DataFrame, normalise=True)¶

Calculate the distribution of the width between the position of the peak point and the other points.

Parameters: gdf – subdataframe of a group
Returns: A tuple of grid and y. grid is a grid sampled over all widths. y is the probability of each width
Return type: tuple

property group¶: Return an iterator of groups (i, gdf), excluding the first and the last

max_shoulder_tail_ratio = 2.5¶

peak_uniform_test(interpolation=False)¶

Test if the peak points are in uniform distribution

Parameters: interpolation – True if interpolation is applied to the waveform. Mostly used when there are only 3 peaks in the waveform, so that points need to be interpolated between the three peaks. Otherwise the KS test may underestimate the probability
Returns: The probability of the waveform being in uniform distribution
Return type: float

regroup()¶

After the peaks are changed, the groups need to be recalculated

This function calculates the number of peaks and redefines the group number of each point.

resample(sample_rate=100) → numpy.ndarray¶

Resample points in the waveform

Parameters: sample_rate – How many points per second, default 100
Returns: A numpy array with sample_rate points per second
Return type: np.ndarray

standardise_by_filter()¶

cdwave.fnc.signal_filter(signals: numpy.ndarray, window_length=5) → numpy.ndarray¶: Apply a Savitzky-Golay filter to an array.

cdwave.fnc.wave_transform(signals: numpy.ndarray, sample_rate=100, method='fft')¶

Transform the wave using methods such as Fourier transformation

Parameters

signals (np.ndarray) – resampled signals from waveform
sample_rate (int) – How many points per second, default 100
method (str) – Transformation method, default fft, fast fourier transform

Returns

A tuple containing:

frq (np.ndarray): Frequency points (Hz)
psd (np.ndarray): Power Spectral Density from FFT

Return type

tuple

cdwave.hillcurve module¶

class cdwave.hillcurve.HillCurve(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=-6, boundary='auto', logc=True)¶

Bases: object

property EC50¶

calc_perr()¶

property curve_diff¶

property hill¶

predict(x)¶

class cdwave.hillcurve.TCPL(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=-6, boundary='auto', logit=True)¶

Bases: object

Implementation of ToxCast Pipeline

Parameters

concentrations – A numpy array of concentrations
responses – A numpy array of parameter values responding to the concentrations
concentration_unit – The unit of the concentration. -6 means uM.
boundary – Boundary of the model for fitting. Use auto to take the default boundary, defined in get_bound.
logit – Whether to take the logarithm of the concentrations. If the input concentration is not in logarithm (e.g. uM), use True.

Attributes

k (int) – Number of estimated parameters
n (int) – Number of data points

property AIC¶

Akaike information criterion

The likelihood is simplified by calculating RSS(MAE) https://www.tandfonline.com/doi/pdf/10.1080/21642583.2018.1496042

Also see Comparison with least squares in https://en.wikipedia.org/wiki/Akaike_information_criterion
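
For reference, the least-squares form of the criterion described in the linked Wikipedia section is \(n \ln(RSS/n) + 2k\) up to an additive constant. The sketch below applies it to compare models; whether the AIC property uses exactly this expression is not confirmed here, and the RSS values are made up for illustration. The k values are the class attributes documented for TCPLPlain, TCPLHill and TCPLGainLoss.

```python
import math


def aic_least_squares(rss, n, k):
    """AIC of a least-squares fit, up to a constant: n * ln(RSS / n) + 2 * k."""
    return n * math.log(rss / n) + 2 * k


# Illustrative (made-up) residual sums of squares for n = 16 data points.
n = 16
candidates = {
    "TCPL-Constant": (12.0, 1),
    "TCPL-Hill": (2.0, 4),
    "TCPL-GainLoss": (1.9, 7),
}
scores = {
    name: aic_least_squares(rss, n, k) for name, (rss, k) in candidates.items()
}
# The model with the lowest AIC is preferred.
best = min(scores, key=scores.get)
```

The penalty term 2k means that the seven-parameter gain-loss model must reduce the RSS substantially before it is preferred over the four-parameter Hill model.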

property E50: float¶

EC50(unit='logM') → float¶

property RMSD¶

property RSS¶

calc_perr()¶

property curve_diff¶

property curve_max¶

property curve_min¶

fit(fnc)¶

get_bound(c_max, c_min)¶

predict(x)¶

class cdwave.hillcurve.TCPLGainLoss(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=-6, boundary='auto', logit=True)¶

Bases: cdwave.hillcurve.TCPL

static fnc(x, gw, ga, tp, lw, la, s, b)¶

get_bound(c_max, c_min)¶

k = 7¶

name = 'TCPL-GainLoss'¶

class cdwave.hillcurve.TCPLHill(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=-6, boundary='auto', logit=True)¶

Bases: cdwave.hillcurve.TCPL

static fnc(x, a, x0, k, b)¶

get_bound(c_max, c_min)¶

property hill¶

k = 4¶

name = 'TCPL-Hill'¶

class cdwave.hillcurve.TCPLPlain(concentrations: numpy.ndarray, responses: numpy.ndarray, concentration_unit=-6, boundary='auto', logit=True)¶

Bases: cdwave.hillcurve.TCPL

static fnc(x, b)¶

get_bound(c_max, c_min)¶

k = 1¶

name = 'TCPL-Constant'¶

cdwave.hillcurve.fit_parameter(df: pandas.core.frame.DataFrame, parameter)¶

Fit the S curve of concentration-response (deprecated)

Parameters

df – Dataframe from the dataset
ax – Axes object from the matplotlib

Returns

popt: Parameters of the S curve
perr: RMSE of the fitted curve

Raises

RuntimeError – When the curve cannot be fitted

cdwave.hillcurve.fsigmoid(x, a, x0, k, b)¶

cdwave.hillcurve.gain_loss(x, gw, ga, tp, lw, la, s, b)¶

cdwave.hillcurve.plain(x, b)¶

cdwave.model module¶

cdwave.model.four_point_parameter_generator(parameters, suffixes)¶

cdwave.model.prepare_four_point_model(agg_df: pandas.core.frame.DataFrame, endpoint: pandas.core.series.Series, parameters, suffixes)¶

cdwave.param module¶

cdwave.param.aggrate_parameters(df: pandas.core.frame.DataFrame, parameters: Optional[list] = None, method='median', compound_column='uniname', plates: Optional[dict] = None)¶

Aggregate the parameters of the same compound under the same concentration by methods such as median or mean

Parameters

df – Dataframe of the whole parameters
parameters – The parameter list to process
method – The method to aggregate the parameters
plates – A dictionary with the key of compounds and values of list of plates to use.

Returns

Dataframe with aggregated parameters.

Return type

DataFrame
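
The grouping idea can be sketched without pandas. The sketch groups plain dictionaries by compound and concentration and reduces each parameter with the chosen method; the helper name aggregate, the concentration key and the sample rows are assumptions for illustration, and the plates filter is left out.

```python
import statistics
from collections import defaultdict


def aggregate(rows, parameters, method="median", compound_column="uniname"):
    """Aggregate parameters per (compound, concentration) group."""
    reducers = {"median": statistics.median, "mean": statistics.mean}
    reduce_fn = reducers[method]
    groups = defaultdict(list)
    for row in rows:
        # Assumption: the concentration is stored under the key "concentration".
        groups[(row[compound_column], row["concentration"])].append(row)
    return {
        key: {p: reduce_fn([r[p] for r in members]) for p in parameters}
        for key, members in groups.items()
    }


rows = [
    {"uniname": "CP1", "concentration": 0.1, "avg_amplitude": 100.0},
    {"uniname": "CP1", "concentration": 0.1, "avg_amplitude": 120.0},
    {"uniname": "CP1", "concentration": 0.1, "avg_amplitude": 200.0},
    {"uniname": "CP2", "concentration": 0.1, "avg_amplitude": 90.0},
]
agg = aggregate(rows, ["avg_amplitude"])
```

The median is less sensitive than the mean to a single outlying replicate, such as the value 200.0 in the sample rows.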

cdwave.param.calc_4_descriptors(df: pandas.core.frame.DataFrame, parameters: List[str], compounds: List[str]) → pandas.core.frame.DataFrame¶: Calculate four descriptors for each parameter, including minimum concentration, maximum concentration, median concentration and slope of the concentration-response

cdwave.param.calc_grit(df: pandas.core.frame.DataFrame, parameter: str)¶

cdwave.param.calc_rcv(x: numpy.ndarray) → float¶

Robust coefficient of variation

Using the second approach in this paper https://arxiv.org/pdf/1907.01110.pdf

\[ \begin{aligned} MAD &= \operatorname{med} | x_i - m | \\ RCV_M &= 1.4826 \cdot \frac{MAD}{m} \end{aligned} \]

where \(m\) is the median of \(x\).

Parameters: x – A 1-d array of parameters
Returns: robust coefficient of variation
Return type: float
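
The two formulas translate directly into a few lines of standard-library Python. This is a sketch of the definition, not the calc_rcv code itself; robust_cv is a hypothetical name and the input is a plain list rather than a numpy array.

```python
import statistics


def robust_cv(x):
    """Robust coefficient of variation: 1.4826 * MAD / median."""
    m = statistics.median(x)
    # Median absolute deviation around the median.
    mad = statistics.median(abs(v - m) for v in x)
    return 1.4826 * mad / m


rcv = robust_cv([1, 2, 3, 4, 100])
```

Because both the centre and the spread are medians, the single outlier 100 in the sample hardly affects the result, whereas it would dominate the classical standard deviation divided by the mean.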

cdwave.param.linear_regression_with_logc(df: pandas.core.frame.DataFrame, parameter: str, remove_beatstop=True, error='raise')¶

Use linear regression to derive slope and intercept for a parameter

Parameters

df – A dataframe that contains the samples needed.
parameter – The name of the parameter, which must be included in the dataframe
remove_beatstop – Whether to remove the samples of which ‘beat_stop’ is True
error – The values of k and b to return if all the samples are beat_stop or there is no sample. If raise is used, a ValueError from LinearRegression will be raised

Returns

slope (k) and intercept (b)

Return type

Tuple
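
The regression itself can be sketched as closed-form ordinary least squares against the logarithm of the concentration. The sketch assumes a base-10 logarithm and does not use the LinearRegression class mentioned above; slope_intercept_logc and the sample values are illustrative.

```python
import math


def slope_intercept_logc(concentrations, values):
    """Least-squares slope (k) and intercept (b) of values against log10(c)."""
    if not concentrations:
        raise ValueError("No samples to fit")
    # Assumption: the concentrations are transformed with log10.
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b


k, b = slope_intercept_logc([0.01, 0.1, 1, 10], [1.0, 2.0, 3.0, 4.0])
```

On a log scale the four concentrations are evenly spaced, so a response that grows by one unit per decade gives a slope of about 1.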

cdwave.param.normalise_by_baseline(df: pandas.core.frame.DataFrame, subtract_params: list, divide_params: list, divide_only_params: Optional[list] = None, std_params: Optional[dict] = None) → pandas.core.frame.DataFrame¶

Normalise the parameters by baseline of the well

Parameters

subtract_params – Parameter list to be subtracted only.
divide_params – Parameter list to be subtracted and divided.
divide_only_params – Parameter list to be divided only.
std_params – A dictionary mapping standard deviation parameters to their average parameters, such as {‘std_amplitude’: ‘avg_amplitude’}. Parameters in this dictionary will be processed via the following equation: \(std(A)= \frac{std(A)}{\overline{A}}\)

Returns

Normalised parameters

Return type

DataFrame
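
The three baseline operations named in the parameter list can be sketched for a single value. The mode names and the function normalise_value are hypothetical labels for the three parameter lists; the std_params rule is omitted because the text does not say which average is used.

```python
def normalise_value(value, baseline, mode):
    """Normalise one parameter value by the baseline value of the same well."""
    if mode == "subtract":  # parameters in subtract_params
        return value - baseline
    if mode == "divide":  # parameters in divide_params: subtract, then divide
        return (value - baseline) / baseline
    if mode == "divide_only":  # parameters in divide_only_params
        return value / baseline
    raise ValueError(f"Unknown mode: {mode}")


delta = normalise_value(150.0, 100.0, "subtract")
relative_change = normalise_value(150.0, 100.0, "divide")
fold = normalise_value(150.0, 100.0, "divide_only")
```

Subtracting keeps the original unit, subtracting and dividing gives a relative change (0.5 here means 50% above baseline), and dividing only gives a fold change.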

cdwave.param.normalise_by_negctrl(df: pandas.core.frame.DataFrame, standardiser: str = 'sdm', parameters: Optional[list] = None, standardisers: Optional[dict] = None, control_compound: str = 'DMSO') → pandas.core.frame.DataFrame¶

Normalise the parameters by the negative control of the plate. Because there is one negative control in a plate, the mean is used to aggregate the parameters.

Parameters

df – DataFrame of parameters got from CardioWave
standardiser –
Method to standardise the data, including
- sdm: Subtract and divide by median of negative control
- sm: Subtract by median of negative control
- smdmad: Subtract median and divide by median absolute deviation
parameters – A list of parameters which will be normalised
standardisers – A dictionary of which the keys are standardise methods and the values are the parameters implementing the standardisers. This will override standardiser and parameters.
control_compound – The name of control samples in the compound column

Returns

Normalised parameters

Return type

DataFrame
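
The three standardisers can be sketched on plain lists. The sketch assumes that the median and the median absolute deviation are both taken over the negative control values of the plate and that the MAD is unscaled; standardise and the sample values are illustrative.

```python
import statistics


def standardise(values, control_values, standardiser="sdm"):
    """Standardise values against the negative control of the same plate."""
    med = statistics.median(control_values)
    if standardiser == "sdm":  # subtract and divide by the control median
        return [(v - med) / med for v in values]
    if standardiser == "sm":  # subtract the control median
        return [v - med for v in values]
    if standardiser == "smdmad":  # subtract median, divide by MAD
        # Assumption: unscaled MAD of the control values.
        mad = statistics.median(abs(c - med) for c in control_values)
        return [(v - med) / mad for v in values]
    raise ValueError(f"Unknown standardiser: {standardiser}")


control = [90.0, 100.0, 110.0]
sdm = standardise([150.0], control, "sdm")
sm = standardise([150.0], control, "sm")
smdmad = standardise([150.0], control, "smdmad")
```

sdm expresses the change relative to the control level, sm keeps the original unit, and smdmad expresses the change in units of the control's own variability.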

cdwave.param.npoint_descriptor(df: pandas.core.frame.DataFrame, parameter: str, n: int)¶: This function only works for 8 concentrations

cdwave.param.parameter_correlation(df: pandas.core.frame.DataFrame, parameters: Optional[list] = None)¶: Calculate the correlation between the parameters

cdwave.param.parameter_projection(df: pandas.core.frame.DataFrame, parameters: Optional[list] = None, method='tsne', n_components=2)¶

cdwave.param.remove_low_quality(df: pandas.core.frame.DataFrame)¶

Remove waveforms with low quality

Wells of a plate will be removed if:

Double peak in negative control
High standard deviation of peak space in negative control
Low quality in baseline. See remove_well_by_baseline()

The whole plate will be removed if RCV is higher than 0. See calc_rcv()

Parameters

df – A dataframe of all the samples with parameters

Returns

A tuple containing a filtered dataframe and a dictionary of removed wells

Return type

tuple

cdwave.param.remove_well_by_baseline(pdf: pandas.core.frame.DataFrame) → list¶

Remove wells by the quality of baseline (pre-measurement)

When the quality of the baseline is low according to the following criteria, the well should be removed.

There is at least one multi-peak
Standard deviation of peak space is higher than 1
Maximum amplitude is lower than 100
Some key time point (such as the decay point) cannot be recognised

Parameters: pdf – DataFrame of one plate
Returns: A list of wells to remove
Return type: list

cdwave.param.select_concentration(df: pandas.core.frame.DataFrame, c: float, lim=(0.01, 10))¶

cdwave.param.select_concentration_by_log(df: pandas.core.frame.DataFrame, c: float)¶

cdwave.param.select_plates(df: pandas.core.frame.DataFrame, t=0.2)¶

Remove plates when the amplitude and frequency at the lowest concentration fall outside ±t (0.2 by default)

Parameters

df – input dataframe of the parameters
t – threshold of the good quality

Returns

A dictionary with key of compounds and value of a list of available plates

Return type

dict
