Routine to perform resampling (cmomy.resample)

Functions:

freq_to_indices(freq[, shuffle, rng])

Convert a frequency array to indices array.

indices_to_freq(indices[, ndat])

Convert indices to frequency array.

random_indices(nrep, ndat[, nsamp, rng, replace])

Create indices for random resampling (bootstrapping).

random_freq(nrep, ndat[, nsamp, rng, replace])

Create frequencies for random resampling (bootstrapping).

select_ndat(data, *[, axis, dim, mom_ndim])

Determine ndat from array.

randsamp_freq(*[, ndat, nrep, nsamp, ...])

Convenience function to create frequency table for resampling.

resample_data(data, *, mom_ndim, freq[, ...])

Resample data according to frequency table.

resample_vals(x, *y, mom, freq[, weight, ...])

Resample data according to frequency table.

bootstrap_confidence_interval(distribution)

Calculate the error bounds.

xbootstrap_confidence_interval(x[, ...])

Bootstrap xarray object.

cmomy.resample.freq_to_indices(freq, shuffle=True, rng=None)

Convert a frequency array to indices array.

This creates an “indices” array that is compatible with the “freq” array. Note that by default, the indices for a single replicate (along output[k, :]) are randomly shuffled. If you pass shuffle=False, the output will look like [[0, 0, …, 1, 1, …, 2, 2, …]].

Parameters:
  • freq (array of int) – Array of shape (nrep, size) where nrep is the number of replicates and size is the size of the data along the resampled axis (ndat). freq is the weight that each sample contributes to the resampled values. See randsamp_freq().

  • shuffle (bool, default: True) – If True (default), shuffle values for each row.

  • rng (Generator) – Random number generator object. Defaults to output of default_rng().

Returns:

ndarray – Indices array of shape (nrep, nsamp) where nsamp = freq[k, :].sum() where k is any row.
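As a rough illustration of the conversion described above, the following plain-Python sketch expands frequency rows into index rows. It uses lists and the standard-library random module in place of NumPy arrays and a Generator, and the helper name freq_row_to_indices is hypothetical, not part of cmomy.

```python
import random


def freq_row_to_indices(freq_row, shuffle=True, rng=None):
    """Expand one row of a frequency table into a list of sample indices."""
    # Index k is repeated freq_row[k] times, e.g. [2, 0, 1] -> [0, 0, 2].
    indices = [k for k, count in enumerate(freq_row) for _ in range(count)]
    if shuffle:
        # Assumption: random.Random stands in for the library's Generator.
        (rng or random.Random()).shuffle(indices)
    return indices


freq = [[2, 0, 1], [1, 1, 1]]
indices = [freq_row_to_indices(row, shuffle=False) for row in freq]
print(indices)  # [[0, 0, 2], [0, 1, 2]]
```

Each output row has length sum(freq_row), which is the nsamp referred to in the return description.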

cmomy.resample.indices_to_freq(indices, ndat=None)

Convert indices to frequency array.

It is assumed that indices.shape == (nrep, nsamp) with nsamp == ndat. For cases where nsamp != ndat, pass in ndat explicitly.
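The role of ndat can be seen in a small plain-Python sketch that counts index occurrences for one replicate. The helper name indices_row_to_freq is hypothetical, and the fallback to len(indices_row) mirrors the nsamp == ndat assumption stated above.

```python
def indices_row_to_freq(indices_row, ndat=None):
    """Count how often each observation index occurs in one replicate."""
    # Assumption: when ndat is omitted it equals the number of samples drawn.
    if ndat is None:
        ndat = len(indices_row)
    freq_row = [0] * ndat
    for k in indices_row:
        freq_row[k] += 1
    return freq_row


print(indices_row_to_freq([0, 0, 2]))       # [2, 0, 1]
print(indices_row_to_freq([3, 1], ndat=4))  # [0, 1, 0, 1]
```

In the second call, two samples are drawn from four observations, so the frequency row can only be sized correctly when ndat is supplied.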

cmomy.resample.random_indices(nrep, ndat, nsamp=None, rng=None, replace=True)

Create indices for random resampling (bootstrapping).

Parameters:
  • nrep (int) – Number of resample replicates.

  • ndat (int) – Size of data along resampled axis.

  • nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.

  • rng (Generator) – Random number generator object. Defaults to output of default_rng().

  • replace (bool, default: True) – Whether to allow replacement.

Returns:

indices (ndarray) – Index array of integers of shape (nrep, nsamp).
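A minimal sketch of what drawing bootstrap indices involves, written with the standard-library random module so that it runs without cmomy or NumPy. The function name sketch_random_indices and its seed argument are hypothetical stand-ins for the documented function and its rng parameter.

```python
import random


def sketch_random_indices(nrep, ndat, nsamp=None, seed=None, replace=True):
    """Return nrep lists of nsamp indices drawn from range(ndat)."""
    rng = random.Random(seed)
    # nsamp defaults to the size of the data along the sampled axis.
    nsamp = ndat if nsamp is None else nsamp
    population = range(ndat)
    if replace:
        # With replacement, an index may appear several times in a replicate.
        return [rng.choices(population, k=nsamp) for _ in range(nrep)]
    # Without replacement, each index appears at most once (needs nsamp <= ndat).
    return [rng.sample(population, k=nsamp) for _ in range(nrep)]


indices = sketch_random_indices(nrep=2, ndat=4, seed=0)
print(len(indices), len(indices[0]))  # 2 4
```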

cmomy.resample.random_freq(nrep, ndat, nsamp=None, rng=None, replace=True)

Create frequencies for random resampling (bootstrapping).

Parameters:
  • nrep (int) – Number of resample replicates.

  • ndat (int) – Size of data along resampled axis.

  • nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.

  • rng (Generator) – Random number generator object. Defaults to output of default_rng().

  • replace (bool, default: True) – Whether to allow replacement.

Returns:

freq (ndarray) – Frequency array. freq[rep, k] is the number of times to sample from the k-th observation for replicate rep.

See also

random_indices
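To show how a frequency table relates to random indices, the following sketch draws indices with replacement and counts them per observation. It is an illustration only: sketch_random_freq and its seed argument are hypothetical, and the actual implementation may sample frequencies differently.

```python
import random
from collections import Counter


def sketch_random_freq(nrep, ndat, nsamp=None, seed=None):
    """Return an nrep x ndat table of counts from sampling with replacement."""
    rng = random.Random(seed)
    nsamp = ndat if nsamp is None else nsamp
    freq = []
    for _ in range(nrep):
        counts = Counter(rng.choices(range(ndat), k=nsamp))
        # Observations that were never drawn get a count of zero.
        freq.append([counts[k] for k in range(ndat)])
    return freq


freq = sketch_random_freq(nrep=4, ndat=3, seed=0)
# Every row has ndat entries and sums to nsamp (here nsamp == ndat == 3).
print([sum(row) for row in freq])  # [3, 3, 3, 3]
```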

cmomy.resample.select_ndat(data, *, axis=MISSING, dim=MISSING, mom_ndim=None)

Determine ndat from array.

Parameters:
  • data (ndarray or DataArray)

  • axis (int) – Axis to reduce along.

  • dim (hashable) – Dimension to reduce along.

  • mom_ndim (int, optional) – If specified, then treat data as a moments array, and wrap negative values for axis relative to value dimensions only.

Returns:

int – Size of data along the specified axis or dim.

Examples

>>> import numpy as np
>>> import xarray as xr
>>> from cmomy.resample import select_ndat
>>> data = np.zeros((2, 3, 4))
>>> select_ndat(data, axis=1)
3
>>> select_ndat(data, axis=-1, mom_ndim=2)
2
>>> xdata = xr.DataArray(data, dims=["x", "y", "mom"])
>>> select_ndat(xdata, dim="y")
3
>>> select_ndat(xdata, dim="mom", mom_ndim=1)
Traceback (most recent call last):
...
ValueError: Cannot select moment dimension.  axis=2, dim='mom'.
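
The axis handling in the examples above can be sketched on a bare shape tuple. This is an illustration of the wrapping rule as described for mom_ndim, not the library code; the function name sketch_select_ndat and the error message are hypothetical.

```python
def sketch_select_ndat(shape, axis, mom_ndim=None):
    """Return the size along axis, excluding trailing moment dimensions."""
    ndim = len(shape)
    # The trailing mom_ndim dimensions hold moments; the rest hold values.
    val_ndim = ndim if mom_ndim is None else ndim - mom_ndim
    if axis < 0:
        # Negative axes wrap relative to the value dimensions only.
        axis += val_ndim
    if not 0 <= axis < val_ndim:
        raise ValueError(f"axis={axis} is not a value dimension")
    return shape[axis]


shape = (2, 3, 4)
print(sketch_select_ndat(shape, axis=1))               # 3
print(sketch_select_ndat(shape, axis=-1, mom_ndim=2))  # 2
```

With mom_ndim=1, axis=2 points at the moment dimension, so the sketch raises ValueError, matching the last example above.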

cmomy.resample.randsamp_freq(*, ndat=None, nrep=None, nsamp=None, indices=None, freq=None, data=None, axis=MISSING, dim=MISSING, mom_ndim=None, check=False, rng=None)

Convenience function to create frequency table for resampling.

The return value is chosen in the following order: freq if it is passed, frequencies computed from indices if they are passed, or otherwise a new sample from random_freq().

Parameters:
  • ndat (int) – Size of data along resampled axis.

  • nrep (int) – Number of resample replicates.

  • nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.

  • freq (array of int) – Array of shape (nrep, size) where nrep is the number of replicates and size is the size of the data along the resampled axis (ndat). freq is the weight that each sample contributes to the resampled values. If passed, it is returned as the frequency table.

  • indices (array of int) – Array of shape (nrep, size). If passed, create freq from indices.

  • check (bool, default False) – If True, check freq and indices against ndat and nrep.

  • rng (Generator) – Random number generator object. Defaults to output of default_rng().

  • data (ndarray or DataArray)

  • axis (int) – Axis to reduce along.

  • dim (hashable) – Dimension to reduce along.

  • mom_ndim (int, optional) – If specified, then treat data as a moments array, and wrap negative values for axis relative to value dimensions only.

Notes

If ndat is None, attempt to set ndat using ndat = select_ndat(data, axis=axis, dim=dim, mom_ndim=mom_ndim). See select_ndat().

Returns:

freq (ndarray) – Frequency array.

Examples

>>> import numpy as np
>>> import cmomy
>>> from cmomy.resample import freq_to_indices, randsamp_freq
>>> rng = cmomy.random.default_rng(0)
>>> randsamp_freq(ndat=3, nrep=5, rng=rng)
array([[0, 2, 1],
       [3, 0, 0],
       [3, 0, 0],
       [0, 1, 2],
       [0, 2, 1]])

Create from data and axis

>>> data = np.zeros((2, 3, 5))
>>> freq = randsamp_freq(data=data, axis=-1, mom_ndim=1, nrep=5, rng=rng)
>>> freq
array([[0, 2, 1],
       [1, 1, 1],
       [1, 0, 2],
       [0, 2, 1],
       [1, 0, 2]])

This can also be used to convert from indices to freq array

>>> indices = freq_to_indices(freq)
>>> randsamp_freq(data=data, axis=-1, mom_ndim=1, indices=indices)
array([[0, 2, 1],
       [1, 1, 1],
       [1, 0, 2],
       [0, 2, 1],
       [1, 0, 2]])
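
The order of precedence described for this function (freq, then indices, then a fresh random sample) can be sketched in plain Python. Everything here is illustrative: sketch_randsamp_freq and its seed argument are hypothetical, and the sketch requires ndat directly instead of inferring it from data.

```python
import random


def sketch_randsamp_freq(ndat, nrep=None, nsamp=None, indices=None,
                         freq=None, seed=None):
    """Return a frequency table from freq, indices, or a new random sample."""
    # 1. An explicit frequency table takes precedence.
    if freq is not None:
        return freq
    # 2. Otherwise, count occurrences in a supplied indices table.
    if indices is not None:
        return [[row.count(k) for k in range(ndat)] for row in indices]
    # 3. Otherwise, draw a new table; this branch of the sketch needs nrep.
    if nrep is None:
        raise ValueError("nrep is required when sampling a new table")
    rng = random.Random(seed)
    nsamp = ndat if nsamp is None else nsamp
    table = []
    for _ in range(nrep):
        draws = rng.choices(range(ndat), k=nsamp)
        table.append([draws.count(k) for k in range(ndat)])
    return table


print(sketch_randsamp_freq(ndat=3, indices=[[0, 0, 2], [1, 1, 1]]))
# [[2, 0, 1], [0, 3, 0]]
```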

cmomy.resample.resample_data(data, *, mom_ndim, freq, axis=MISSING, dim=MISSING, rep_dim='rep', order=None, parallel=True, dtype=None, out=None, keep_attrs=None)

Resample data according to frequency table.

Parameters:
  • data (array-like) – Central moments array to be resampled.

  • mom_ndim ({1, 2}) – Indicates whether data holds moments (mom_ndim=1) or comoments (mom_ndim=2).

  • freq (array of int) – Array of shape (nrep, size) where nrep is the number of replicates and size = data.shape[axis]. freq is the weight that each sample contributes to the resampled values. See randsamp_freq().

  • axis (int) – Axis to reduce along.

  • dim (hashable) – Dimension to reduce along.

  • rep_dim (hashable) – Name of the new ‘replicated’ dimension.

  • parallel (bool, default True) – Value of the parallel flag passed to numba.njit.

  • order ({"C", "F", "A", "K"}, optional) – Order argument to numpy.asarray().

  • dtype (dtype) – Optional dtype for output data.

  • out (ndarray) – Optional output array. If specified, output will be a reference to this array.

  • keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –

    • ‘drop’ or False: empty attrs on returned xarray object.

    • ‘identical’: all attrs must be the same on every object.

    • ‘no_conflicts’: attrs from all objects are combined, any that have the same name must also have the same value.

    • ‘drop_conflicts’: attrs from all objects are combined, any that have the same name but different values are dropped.

    • ‘override’ or True: skip comparing and copy attrs from the first object to the result.

Returns:

out (ndarray) – Resampled central moments. out.shape = (..., shape[axis-1], shape[axis+1], ..., nrep, mom0, ...), where shape = data.shape and nrep = freq.shape[0].
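To give a feel for what resampling a moments array means, the sketch below combines per-observation (weight, mean, variance) triples using each row of freq as multiplicities. The triple layout is an assumption made for this illustration and may not match the array layout that cmomy uses; the function name sketch_resample_moments is hypothetical.

```python
def sketch_resample_moments(data, freq):
    """Combine (weight, mean, var) triples once per row of freq."""
    out = []
    for row in freq:
        # Each observation contributes with its weight times its frequency.
        w_tot = sum(f * w for f, (w, _, _) in zip(row, data))
        mean = sum(f * w * m for f, (w, m, _) in zip(row, data)) / w_tot
        # Combined variance: within-block variance plus spread of block means.
        var = sum(
            f * w * (v + (m - mean) ** 2) for f, (w, m, v) in zip(row, data)
        ) / w_tot
        out.append((w_tot, mean, var))
    return out


data = [(2.0, 1.0, 0.0), (2.0, 3.0, 0.0)]
freq = [[1, 1], [2, 0]]
print(sketch_resample_moments(data, freq))
# [(4.0, 2.0, 1.0), (4.0, 1.0, 0.0)]
```

The output has one entry per replicate, which corresponds to the nrep dimension in the returned shape.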

cmomy.resample.resample_vals(x, *y, mom, freq, weight=None, axis=MISSING, order=None, parallel=None, dtype=None, out=None, dim=MISSING, rep_dim='rep', mom_dims=None, keep_attrs=None)

Resample data according to frequency table.

Parameters:
  • x (ndarray) – Value to analyze

  • *y (array-like, optional) – Second value needed if len(mom)==2.

  • freq (array of int) – Array of shape (nrep, size) where nrep is the number of replicates and size = x.shape[axis]. freq is the weight that each sample contributes to the resampled values. See randsamp_freq().

  • mom (int or tuple of int) – Order of moments. If an integer or a length-one tuple, then moments are for a single variable. If a length-two tuple, then comoments of two variables.

  • weight (array-like, optional) – Optional weights. Can be a scalar, a 1d array of length x.shape[axis], or an array of the same form as x.

  • axis (int) – Axis to reduce along.

  • order ({"C", "F", "A", "K"}, optional) – Order argument to numpy.asarray().

  • parallel (bool, optional) – Value of the parallel flag passed to numba.njit.

  • dtype (dtype) – Optional dtype for output data.

  • out (ndarray) – Optional output array. If specified, output will be a reference to this array.

  • dim (hashable) – Dimension to reduce along.

  • rep_dim (hashable) – Name of the new ‘replicated’ dimension.

  • mom_dims (hashable or tuple of hashable) – Name of moment dimensions. Defaults to ("mom_0",) for mom_ndim==1 and ("mom_0", "mom_1") for mom_ndim==2.

  • keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –

    • ‘drop’ or False: empty attrs on returned xarray object.

    • ‘identical’: all attrs must be the same on every object.

    • ‘no_conflicts’: attrs from all objects are combined, any that have the same name must also have the same value.

    • ‘drop_conflicts’: attrs from all objects are combined, any that have the same name but different values are dropped.

    • ‘override’ or True: skip comparing and copy attrs from the first object to the result.

Returns:

out (ndarray) – Resampled central moments array. out.shape = (..., shape[axis-1], shape[axis+1], ..., nrep, mom0, ...), where shape = x.shape and nrep = freq.shape[0].
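For raw values, using freq as per-sample weights is equivalent to computing statistics on each bootstrap replicate. The sketch below computes the mean and the population (divide-by-n) variance per replicate for a 1d list of values. It ignores the optional weight argument, and sketch_resample_mean_var is a hypothetical name, not a cmomy function.

```python
def sketch_resample_mean_var(x, freq):
    """Return (mean, variance) of x for each row of the frequency table."""
    out = []
    for row in freq:
        total = sum(row)
        # freq[k] counts how many times x[k] appears in this replicate.
        mean = sum(f * xi for f, xi in zip(row, x)) / total
        var = sum(f * (xi - mean) ** 2 for f, xi in zip(row, x)) / total
        out.append((mean, var))
    return out


x = [1.0, 2.0, 4.0]
freq = [[1, 1, 1], [2, 0, 1]]
stats = sketch_resample_mean_var(x, freq)
print(stats[1])  # (2.0, 2.0)
```

The second replicate corresponds to the resampled values [1.0, 1.0, 4.0], whose mean and population variance are both 2.0.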

cmomy.resample.bootstrap_confidence_interval(distribution, stats_val='mean', axis=0, alpha=0.05, style=None, **kwargs)

Calculate the error bounds.

Parameters:
  • distribution (array-like) – distribution of values to consider

  • stats_val (array-like, {None, 'mean','median'}, optional) –

    • array: perform pivotal error bounds (correct) with this as value.

    • percentile: percentiles, with value as median

    • mean: pivotal error bounds with mean as value

    • median: pivotal error bounds with median as value

  • axis (int, default 0) – axis to analyze along

  • alpha (float) – alpha value for confidence interval. Percent confidence = 100 * (1 - alpha)

  • style ({None, 'delta', 'pm'}) – controls style of output

  • **kwargs (Any) – extra arguments to numpy.percentile
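
The difference between percentile and pivotal bounds can be sketched as follows. This uses the textbook basic (pivotal) bootstrap, which reflects the percentiles about the central value; whether cmomy applies exactly this reflection is an assumption, and both function names are hypothetical.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile of sorted values, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def sketch_interval(distribution, alpha=0.05, pivotal=True):
    """Return (val, low, high) for a 1d bootstrap distribution."""
    vals = sorted(distribution)
    q_low = percentile(vals, 100 * alpha / 2)
    q_high = percentile(vals, 100 * (1 - alpha / 2))
    if not pivotal:
        # Percentile bounds, reported with the median as the value.
        return percentile(vals, 50), q_low, q_high
    val = sum(vals) / len(vals)
    # Pivotal bounds: reflect the percentiles about the central value.
    return val, 2 * val - q_high, 2 * val - q_low


dist = [0.0, 1.0, 2.0, 3.0, 10.0]
print(sketch_interval(dist, alpha=0.5, pivotal=False))  # (2.0, 1.0, 3.0)
```

For a skewed distribution such as this one, the pivotal bounds differ from the percentile bounds because the reflection is taken about the mean (3.2 here).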

Returns:

out (array) – The first dimension holds the statistics. The other dimensions have the shape of the input, less the axis reduced over. Depending on style, the first dimension will be (note that val is either stats_val or the median):

  • None: [val, low, high]

  • delta: [val, val-low, high - val]

  • pm : [val, (high - low) / 2]
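
The three output styles listed above amount to simple arithmetic on val, low and high. A small sketch for scalar inputs, with the hypothetical helper name format_interval:

```python
def format_interval(val, low, high, style=None):
    """Arrange a value and its bounds according to the documented styles."""
    if style is None:
        return [val, low, high]
    if style == "delta":
        # Distances from the value down to low and up to high.
        return [val, val - low, high - val]
    if style == "pm":
        # A single symmetric half-width.
        return [val, (high - low) / 2]
    raise ValueError(f"unknown style: {style!r}")


print(format_interval(10.0, 8.0, 13.0))           # [10.0, 8.0, 13.0]
print(format_interval(10.0, 8.0, 13.0, "delta"))  # [10.0, 2.0, 3.0]
print(format_interval(10.0, 8.0, 13.0, "pm"))     # [10.0, 2.5]
```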

cmomy.resample.xbootstrap_confidence_interval(x, stats_val='mean', axis=0, dim=MISSING, alpha=0.05, style=None, bootstrap_dim='bootstrap', bootstrap_coords=None, **kwargs)

Bootstrap xarray object.

Parameters:
  • dim (str) – If passed, reduce along this dimension.

  • bootstrap_dim (str, default 'bootstrap') – Name of the new dimension. If bootstrap_dim conflicts with an existing dimension, then new_name = dim + new_name.

  • bootstrap_coords (array-like or str) – Coordinates of the new dimension. If None, use default names. If a string, use this for the ‘values’ name.