Routine to perform resampling (cmomy.resample)

Functions:

freq_to_indices(freq[, shuffle, rng])

Convert a frequency array to indices array.

indices_to_freq(indices[, ndat])

Convert indices to frequency array.

random_indices(nrep, ndat[, nsamp, rng, replace])

Create indices for random resampling (bootstrapping).

random_freq(nrep, ndat[, nsamp, rng, replace])

Create frequencies for random resampling (bootstrapping).

randsamp_freq(ndat[, nrep, nsamp, indices, ...])

Produce a random sample for bootstrapping.

resample_data(data, freq, mom[, axis, ...])

Resample data according to frequency table.

resample_vals(x, freq, mom[, axis, w, ...])

Resample data according to frequency table.

bootstrap_confidence_interval(distribution)

Calculate the error bounds.

xbootstrap_confidence_interval(x[, ...])

Bootstrap xarray object.

cmomy.resample.freq_to_indices(freq, shuffle=True, rng=None)

Convert a frequency array to indices array.

This creates an “indices” array that is compatible with a given “freq” array. By default, the indices for a single replicate (along output[k, :]) are randomly shuffled. If you pass shuffle=False, each row is left in ascending order, so the output will be something like [[0,0,…, 1,1,…, 2,2, …]].

Parameters:
  • freq (array of int) – Array of shape (nrep, size) where nrep is the number of replicates and size = self.shape[axis]. freq is the weight that each sample contributes to the resampled values. See randsamp_freq().

  • shuffle (bool, default: True) – If True (default), shuffle values for each row.

  • rng (Generator) – Random number generator object. Defaults to output of default_rng().

Returns:

ndarray – Indices array of shape (nrep, nsamp) where nsamp = freq[k, :].sum() where k is any row.
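The conversion described above can be sketched with plain NumPy. This is a minimal illustration of the idea, not the library's implementation; the helper name freq_to_indices_sketch is hypothetical, and it assumes every row of freq sums to the same nsamp.

```python
import numpy as np


def freq_to_indices_sketch(freq, shuffle=True, rng=None):
    """Expand a (nrep, ndat) frequency table into (nrep, nsamp) indices."""
    freq = np.asarray(freq, dtype=int)
    nsamp = freq.sum(axis=1)
    if not np.all(nsamp == nsamp[0]):
        raise ValueError("every row of freq must sum to the same nsamp")
    positions = np.arange(freq.shape[1])
    # np.repeat emits position k exactly row[k] times, in ascending order.
    indices = np.array([np.repeat(positions, row) for row in freq])
    if shuffle:
        rng = np.random.default_rng() if rng is None else rng
        for row in indices:
            rng.shuffle(row)  # in-place shuffle of one replicate
    return indices


freq = np.array([[2, 0, 1], [1, 1, 1]])
ordered = freq_to_indices_sketch(freq, shuffle=False)
# ordered is [[0, 0, 2], [0, 1, 2]]
```

Shuffling only changes the order within a row; the multiset of indices in each replicate is fixed by freq.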

cmomy.resample.indices_to_freq(indices, ndat=None)

Convert indices to frequency array.

It is assumed that indices.shape == (nrep, nsamp) with nsamp == ndat. For cases that nsamp != ndat, pass in ndat.
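As a rough sketch of this conversion (an illustration only, not the library's code), each row of indices can be counted with numpy.bincount. The helper name indices_to_freq_sketch is hypothetical; it assumes all indices lie in [0, ndat).

```python
import numpy as np


def indices_to_freq_sketch(indices, ndat=None):
    """Count how often each observation appears in each replicate."""
    indices = np.asarray(indices, dtype=int)
    nrep, nsamp = indices.shape
    # Mirror the documented default: assume nsamp == ndat unless told otherwise.
    ndat = nsamp if ndat is None else ndat
    freq = np.zeros((nrep, ndat), dtype=int)
    for rep in range(nrep):
        # minlength pads with zeros for observations that were never drawn.
        freq[rep] = np.bincount(indices[rep], minlength=ndat)
    return freq


same_size = indices_to_freq_sketch([[0, 0, 2], [1, 1, 1]])
# same_size is [[2, 0, 1], [0, 3, 0]]
subsample = indices_to_freq_sketch([[0, 3]], ndat=4)
# subsample is [[1, 0, 0, 1]]
```

The second call shows why ndat must be passed when nsamp != ndat: the number of columns cannot be inferred from the indices alone.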

cmomy.resample.random_indices(nrep, ndat, nsamp=None, rng=None, replace=True)

Create indices for random resampling (bootstrapping).

Parameters:
  • nrep (int) – Number of resample replicates.

  • ndat (int) – Size of data along resampled axis.

  • nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.

  • rng (Generator) – Random number generator object. Defaults to output of default_rng().

  • replace (bool, default: True) – Whether to allow replacement.

Returns:

indices (ndarray) – Index array of integers of shape (nrep, nsamp).
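The parameters above map naturally onto NumPy's Generator API. The following is a minimal sketch under that assumption, not the library's implementation; random_indices_sketch is a hypothetical name.

```python
import numpy as np


def random_indices_sketch(nrep, ndat, nsamp=None, rng=None, replace=True):
    """Draw a (nrep, nsamp) array of indices into an axis of length ndat."""
    rng = np.random.default_rng() if rng is None else rng
    nsamp = ndat if nsamp is None else nsamp
    if replace:
        # Ordinary bootstrap: every draw is independent and uniform.
        return rng.integers(0, ndat, size=(nrep, nsamp))
    if nsamp > ndat:
        raise ValueError("cannot draw more than ndat samples without replacement")
    # Without replacement, each replicate is a random subset of the data.
    return np.array(
        [rng.choice(ndat, size=nsamp, replace=False) for _ in range(nrep)]
    )


rng = np.random.default_rng(0)
with_repl = random_indices_sketch(nrep=4, ndat=5, rng=rng)
without_repl = random_indices_sketch(nrep=4, ndat=5, nsamp=3, rng=rng, replace=False)
```

Passing an explicit Generator, as in the usage lines, makes the draws reproducible.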

cmomy.resample.random_freq(nrep, ndat, nsamp=None, rng=None, replace=True)

Create frequencies for random resampling (bootstrapping).

Parameters:
  • nrep (int) – Number of resample replicates.

  • ndat (int) – Size of data along resampled axis.

  • nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.

  • rng (Generator) – Random number generator object. Defaults to output of default_rng().

  • replace (bool, default: True) – Whether to allow replacement.

Returns:

freq (ndarray) – Frequency array. freq[rep, k] is the number of times to sample from the k-th observation for replicate rep.

See also

random_indices
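For sampling with replacement, a frequency row has the same distribution as a multinomial draw of nsamp trials over ndat equally likely observations. The snippet below uses that fact as an illustration; it is an assumption about one way to build such a table, not a statement of how random_freq is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
nrep, ndat = 4, 6
nsamp = ndat  # documented default: one replicate is as large as the data

# Each row counts how many times each of the ndat observations is drawn.
freq = rng.multinomial(nsamp, np.full(ndat, 1.0 / ndat), size=nrep)

row_totals = freq.sum(axis=1)  # every replicate contains nsamp samples
```

Each row sums to nsamp, which is the property that lets a frequency table stand in for an explicit index array.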

cmomy.resample.randsamp_freq(ndat, nrep=None, nsamp=None, indices=None, freq=None, check=False, rng=None)

Produce a random sample for bootstrapping.

The return value is chosen in the following order of precedence: freq if it is passed, frequencies computed from indices if those are passed, or otherwise a new sample from random_freq().

Parameters:
  • nrep (int) – Number of resample replicates.

  • ndat (int) – Size of data along resampled axis.

  • nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.

  • freq (array of int) – Array of shape (nrep, size) where nrep is the number of replicates and size = self.shape[axis]. freq is the weight that each sample contributes to the resampled values. See randsamp_freq().

  • indices (array of int) – Array of shape (nrep, size). If passed, create freq from indices. See randsamp_freq().

  • check (bool, default False) – If True, check freq and indices against ndat and nrep.

Returns:

freq (ndarray) – Frequency array.
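The order of precedence can be sketched as a simple chain of checks. This is an illustrative outline only: randsamp_freq_sketch is a hypothetical name, the check option is omitted, and the random branch assumes sampling with replacement.

```python
import numpy as np


def randsamp_freq_sketch(ndat, nrep=None, nsamp=None, indices=None, freq=None, rng=None):
    """Return freq, or build it from indices, or draw a new random table."""
    if freq is not None:
        # 1. An explicit frequency table wins.
        return np.asarray(freq, dtype=int)
    if indices is not None:
        # 2. Otherwise count occurrences in the supplied indices.
        rows = np.asarray(indices, dtype=int)
    else:
        # 3. Otherwise draw fresh indices (with replacement) and count them.
        if nrep is None:
            raise ValueError("nrep is required when freq and indices are both None")
        rng = np.random.default_rng() if rng is None else rng
        nsamp = ndat if nsamp is None else nsamp
        rows = rng.integers(0, ndat, size=(nrep, nsamp))
    return np.array([np.bincount(row, minlength=ndat) for row in rows])


given = randsamp_freq_sketch(3, freq=[[1, 1, 1]])
from_indices = randsamp_freq_sketch(3, indices=[[0, 0, 1]])
# from_indices is [[2, 1, 0]]
fresh = randsamp_freq_sketch(3, nrep=5, rng=np.random.default_rng(0))
```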

cmomy.resample.resample_data(data, freq, mom, axis=0, dtype=None, order=None, parallel=True, out=None)

Resample data according to frequency table.

Parameters:
  • data (array-like) – Central moments array to be resampled.

  • freq (array of int) – Array of shape (nrep, size) where nrep is the number of replicates and size = self.shape[axis]. freq is the weight that each sample contributes to the resampled values. See randsamp_freq().

  • mom (int or tuple of int) – Order of moments. If an integer or a length-one tuple, then moments are for a single variable. If a length-two tuple, then comoments of two variables.

  • parallel (bool, default True) – Value of the parallel flag passed to numba.njit.

  • out (ndarray, optional) – Optional output array.

Returns:

output (array) – Output shape is (nrep,) + shape + mom, where shape is the shape of data with axis removed, and mom is the shape of the resulting moments dimension(s).
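To make the role of freq concrete, here is a first-order sketch in plain NumPy. It assumes, for illustration only, that each row of data stores [weight, mean]; that layout, the restriction to mom=1, and the name resample_weight_mean_sketch are assumptions and not taken from the text above.

```python
import numpy as np


def resample_weight_mean_sketch(data, freq):
    """Combine per-observation [weight, mean] pairs according to freq."""
    data = np.asarray(data, dtype=float)  # shape (ndat, 2)
    freq = np.asarray(freq, dtype=float)  # shape (nrep, ndat)
    weight, mean = data[:, 0], data[:, 1]
    # freq[rep, k] multiplies the weight of observation k in replicate rep.
    total_weight = freq @ weight
    total_mean = (freq @ (weight * mean)) / total_weight
    return np.stack([total_weight, total_mean], axis=-1)  # shape (nrep, 2)


data = [[1.0, 2.0], [3.0, 4.0], [2.0, 1.0]]
freq = [[1, 1, 1], [2, 0, 1]]
out = resample_weight_mean_sketch(data, freq)
```

The result has one row per replicate, matching the leading (nrep,) in the documented output shape. A replicate whose total weight is zero would divide by zero in this sketch.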

cmomy.resample.resample_vals(x, freq, mom, axis=0, w=None, mom_ndim=None, broadcast=False, dtype=None, order=None, parallel=True, out=None)

Resample data according to frequency table.

Parameters:
  • x (ndarray or tuple of ndarray) – Input values.

  • freq (array of int) – Array of shape (nrep, size) where nrep is the number of replicates and size = self.shape[axis]. freq is the weight that each sample contributes to the resampled values. See randsamp_freq().

  • mom (int or tuple of int) – Order of moments. If an integer or a length-one tuple, then moments are for a single variable. If a length-two tuple, then comoments of two variables.

  • axis (int) – Axis to reduce along.

  • w (ndarray[Any, dtype[Any]] | None, default: None) – Weights array.

  • mom_ndim ({1, 2}) – Indicates whether to calculate moments (mom_ndim=1) or comoments (mom_ndim=2).

  • broadcast (bool) – If True, and x=(x0, x1), then perform ‘smart’ broadcasting. In this case, if x1.ndim = 1 and len(x1) == x0.shape[axis], then broadcast x1 to x0.shape.

  • dtype (dtype) – Optional dtype for output data.

  • order (Literal['C', 'F', 'A', 'K', None], default: None) – Parameter order to numpy.asarray().

  • parallel (bool, default True) – Value of the parallel flag passed to numba.njit.

  • out (ndarray) – Optional output array.

Returns:

ndarray – Resampled central moments array.
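The effect of freq on raw values can be illustrated for a one-dimensional x and moments up to second order. This is a sketch under stated assumptions, not the library's algorithm: the output layout [total weight, mean, variance] and the name resample_vals_sketch are chosen here for illustration.

```python
import numpy as np


def resample_vals_sketch(x, freq, w=None):
    """Weighted mean and (population) variance of x for each replicate."""
    x = np.asarray(x, dtype=float)        # shape (ndat,)
    freq = np.asarray(freq, dtype=float)  # shape (nrep, ndat)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    fw = freq * w                         # effective weight per replicate
    wsum = fw.sum(axis=1)
    mean = (fw @ x) / wsum
    var = (fw * (x - mean[:, None]) ** 2).sum(axis=1) / wsum
    return np.stack([wsum, mean, var], axis=-1)  # shape (nrep, 3)


x = [1.0, 2.0, 3.0, 4.0]
freq = [[1, 1, 1, 1], [2, 0, 0, 2]]
out = resample_vals_sketch(x, freq)

# The second replicate matches statistics of the explicit resample x[[0, 0, 3, 3]].
explicit = np.asarray(x)[[0, 0, 3, 3]]
```

Working with freq avoids materializing the resampled values: the explicit resample and the frequency-weighted sums give the same mean and variance.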

cmomy.resample.bootstrap_confidence_interval(distribution, stats_val='mean', axis=0, alpha=0.05, style=None, **kwargs)

Calculate the error bounds.

Parameters:
  • distribution (array-like) – Distribution of values to consider.

  • stats_val (array-like, {None, 'mean','median'}, optional) –

    • array: compute pivotal error bounds (correct) with this array as the value.

    • percentile: use percentiles, with the median as the value.

    • mean: compute pivotal error bounds with the mean as the value.

    • median: compute pivotal error bounds with the median as the value.

  • axis (int, default 0) – Axis to analyze along.

  • alpha (float) – Alpha value for the confidence interval. Percent confidence = 100 * (1 - alpha).

  • style ({None, 'delta', 'pm'}) – Controls the style of the output.

  • **kwargs (Any) – Extra arguments to numpy.percentile.

Returns:

out (array) – The first dimension holds the statistics. Other dimensions have the shape of the input with axis removed. Depending on style, the first dimension will be (note that val is either stats_val or the median):

  • None: [val, low, high]

  • delta: [val, val-low, high - val]

  • pm : [val, (high - low) / 2]
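The options above can be sketched with numpy.percentile. The pivotal bounds here use the common "basic bootstrap" reflection, low = 2 * val - q_high and high = 2 * val - q_low; whether the library uses exactly this formula is an assumption, and bootstrap_ci_sketch is a hypothetical name. The stats_val=None case is not modeled.

```python
import numpy as np


def bootstrap_ci_sketch(distribution, stats_val="mean", axis=0, alpha=0.05, style=None):
    """Percentile or pivotal bounds, arranged according to style."""
    if stats_val is None:
        raise ValueError("stats_val=None is not modeled in this sketch")
    dist = np.asarray(distribution, dtype=float)
    q_low, q_high = np.percentile(
        dist, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=axis
    )
    if isinstance(stats_val, str) and stats_val == "percentile":
        val = np.median(dist, axis=axis)
        low, high = q_low, q_high
    else:
        if isinstance(stats_val, str) and stats_val == "mean":
            val = dist.mean(axis=axis)
        elif isinstance(stats_val, str) and stats_val == "median":
            val = np.median(dist, axis=axis)
        else:
            val = np.asarray(stats_val, dtype=float)
        # Pivotal bounds: reflect the percentiles about the chosen value.
        low, high = 2 * val - q_high, 2 * val - q_low
    if style is None:
        out = [val, low, high]
    elif style == "delta":
        out = [val, val - low, high - val]
    elif style == "pm":
        out = [val, (high - low) / 2]
    else:
        raise ValueError(f"unknown style: {style!r}")
    return np.array(out)


dist = np.arange(101.0) ** 2  # a skewed stand-in for a bootstrap distribution
percentile_ci = bootstrap_ci_sketch(dist, "percentile", alpha=0.1)
pivotal_ci = bootstrap_ci_sketch(dist, "mean", alpha=0.1)
pivotal_pm = bootstrap_ci_sketch(dist, "mean", alpha=0.1, style="pm")
```

For a skewed distribution such as this one, the percentile and pivotal intervals differ noticeably, which is why the choice of stats_val matters.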

cmomy.resample.xbootstrap_confidence_interval(x, stats_val='mean', axis=0, dim=None, alpha=0.05, style=None, bootstrap_dim='bootstrap', bootstrap_coords=None, **kwargs)

Bootstrap xarray object.

Parameters:
  • dim (str) – If passed, reduce along this dimension.

  • bootstrap_dim (str, default 'bootstrap') – Name of the new dimension. If bootstrap_dim conflicts, then new_name = dim + new_name.

  • bootstrap_coords (array-like or str) – Coords of the new dimension. If None, use default names. If a string, use this for the ‘values’ name.