Routine to perform resampling (cmomy.resample)

Classes:

IndexSampler(*[, indices, freq, ndat, ...])

Wrapper around indices and freq resample arrays

Functions:

factory_sampler([sampler, freq, indices, ...])

Factory method to create sampler.

freq_to_indices(freq, *[, shuffle, rng, ...])

Convert a frequency array to indices array.

indices_to_freq(indices, *[, ndat, parallel])

Convert indices to frequency array.

jackknife_data(data[, data_reduced, axis, ...])

Perform jackknife resampling of moments data.

jackknife_freq(ndat)

Frequency array for jackknife resampling.

jackknife_vals(x, *y[, data_reduced, axis, ...])

Jackknife by value.

random_freq(nrep, ndat[, nsamp, rng, ...])

Create frequencies for random resampling (bootstrapping).

random_indices(nrep, ndat[, nsamp, rng, replace])

Create indices for random resampling (bootstrapping).

resample_data(data, *, sampler[, axis, dim, ...])

Resample and reduce data.

resample_vals(x, *y, sampler, mom[, axis, ...])

Resample and reduce values.

select_ndat(data, *[, axis, dim, mom_ndim, ...])

Determine ndat from array.

class cmomy.resample.IndexSampler(*, indices=None, freq=None, ndat=None, parallel=None, shuffle=False, rng=None, fastpath=False)

Bases: Generic[SamplerArrayT]

Wrapper around indices and freq resample arrays

This is a convenience wrapper class to make working with resampling indices straightforward. cmomy primarily performs resampling using frequency tables instead of the more standard resampling index arrays. This class keeps track of both.
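As a rough illustration of the two representations (a plain-Python sketch, not cmomy code; the data values and variable names are made up for the example), the same replicate can be written either as a list of indices or as a frequency table, and both give the same resampled mean:

```python
from collections import Counter

data = [2.0, 4.0, 6.0, 8.0]

# One replicate expressed as resampling indices (positions drawn from data).
indices = [3, 0, 3, 1]

# The same replicate expressed as a frequency table:
# freq[k] counts how many times observation k was drawn.
counts = Counter(indices)
freq = [counts.get(k, 0) for k in range(len(data))]

# Averaging the drawn values and taking a frequency-weighted average agree.
mean_from_indices = sum(data[i] for i in indices) / len(indices)
mean_from_freq = sum(f * x for f, x in zip(freq, data)) / sum(freq)
```

The frequency form has a fixed width of ndat per replicate regardless of the order in which samples were drawn, which is presumably why it suits weighted reductions.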

Parameters:
  • indices (ndarray, DataArray, or Dataset) – Indices resampling array.

  • freq (ndarray, DataArray, or Dataset) – Frequency resampling table.

  • ndat (int) – Size of data along resampled axis.

  • parallel (bool, optional) – If True, use parallel numba code (numba.njit or numba.guvectorize) if possible. If None, use a heuristic to decide whether to attempt the parallel method.

  • shuffle (bool) – If True, shuffle indices created from freq for each row.

  • rng (Union[int, Sequence[int], SeedSequence, BitGenerator, Generator, None], default: None) – Random number generator object. Defaults to output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.

  • fastpath (bool) – Internal variable.

Methods:

from_params(nrep, ndat[, nsamp, rng, ...])

Create sampler from parameters

from_data(data, *, nrep[, nsamp, axis, dim, ...])

Create sampler for data.

classmethod from_params(nrep, ndat, nsamp=None, rng=None, replace=True, parallel=None)

Create sampler from parameters

Parameters:
  • nrep (int) – Number of resample replicates.

  • ndat (int) – Size of data along resampled axis.

  • nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.

  • rng (Union[int, Sequence[int], SeedSequence, BitGenerator, Generator, None], default: None) – Random number generator object. Defaults to output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.

  • replace (bool) – If True, do resampling with replacement.

  • parallel (bool, optional) – If True, use parallel numba code (numba.njit or numba.guvectorize) if possible. If None, use a heuristic to decide whether to attempt the parallel method.

Returns:

resample (IndexSampler) – Wrapped object will be an ndarray of integers.

classmethod from_data(data, *, nrep, nsamp=None, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_dims=None, mom_params=None, rep_dim='rep', paired=True, rng=None, replace=True, parallel=None)

Create sampler for data.

Parameters:
  • data (ndarray, DataArray, or Dataset)

  • nrep (int) – Number of resample replicates.

  • nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.

  • axis (int) – Axis to reduce/sample along.

  • dim (hashable) – Dimension to reduce/sample along.

  • mom_ndim ({1, 2}, optional) – If mom_ndim is not None, then wrap axis relative to mom_ndim. For example, with mom_ndim=2, axis = -1 will be transformed to axis = -3. If mom_dims is passed and data is an xarray object, infer mom_ndim from mom_dims.

  • mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Defaults to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axis if mom_axes is specified.

  • mom_dims (hashable or tuple of hashable) – Name of moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that it is consistent with mom_dims. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.

  • mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set the moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).

  • rep_dim (hashable) – Name of new ‘replicated’ dimension.

  • paired (bool) – If False and generating freq from nrep with data of type Dataset, generate a unique freq for each variable in data. If True, treat all variables in data as paired, and use the same freq for each.

  • rng (RngTypes | None, default: None) – Random number generator object. Defaults to output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.

  • replace (bool) – If True, do resampling with replacement.

  • parallel (bool, optional) – If True, use parallel numba code (numba.njit or numba.guvectorize) if possible. If None, use a heuristic to decide whether to attempt the parallel method.

Returns:

sampler (IndexSampler) – Type of wrapped array depends on the passed parameters. In all cases, if data is an array, sampler will wrap an array; if data is a DataArray, sampler will wrap a DataArray. If data is a Dataset, return a wrapped DataArray if paired=True or if the resulting Dataset has only one variable, and a Dataset otherwise.
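The mom_ndim rule for negative axis values described in the parameters above can be sketched as follows. This is a hypothetical helper written for illustration (wrap_axis is not a cmomy function), assuming the moment dimensions are the trailing ones:

```python
from typing import Optional


def wrap_axis(axis: int, ndim: int, mom_ndim: Optional[int] = None) -> int:
    """Convert axis to a non-negative position (hypothetical helper).

    Negative values count back from the end of the array, or from the
    first moment dimension when mom_ndim is given.
    """
    if axis >= 0:
        return axis
    end = ndim if mom_ndim is None else ndim - mom_ndim
    return end + axis


# A 5-d moments array with two trailing moment dimensions:
# axis=-1 lands on position 2, which plain indexing would call -3.
position = wrap_axis(-1, ndim=5, mom_ndim=2)
```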

cmomy.resample.factory_sampler(sampler=None, *, freq=None, indices=None, nrep=None, ndat=None, nsamp=None, paired=True, rng=None, replace=True, shuffle=False, data=None, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_dims=None, mom_params=None, rep_dim='rep', parallel=None)

Factory method to create sampler.

The main intent of the function is to be called by other functions or methods that need a sampler. For example, it is used in resample_data(). You can pass in a frequency array, an IndexSampler, or a mapping to create an IndexSampler. The order of evaluation is as follows:

  1. sampler is an IndexSampler: return sampler.

  2. sampler is None:
    • if ndat is specified: return IndexSampler.from_params(...)

    • if data is specified: return IndexSampler.from_data(...)

  3. sampler is array-like: return IndexSampler(freq=sampler, ...)

  4. sampler is an int: return IndexSampler.from_data(..., nrep=sampler)

  5. sampler is a mapping: return factory_sampler(**sampler, data=data, axis=axis, dim=dim, mom_ndim=mom_ndim, mom_dims=mom_dims, rep_dim=rep_dim).
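The evaluation order above can be sketched as a small dispatch function. This is only an illustration of the documented rules, not the library's implementation: StandInSampler and dispatch_branch are hypothetical names, lists and tuples stand in for "array-like", and the errors raised for unlisted cases are assumptions.

```python
from collections.abc import Mapping


class StandInSampler:
    """Hypothetical stand-in for IndexSampler, used only by this sketch."""


def dispatch_branch(sampler=None, *, ndat=None, data=None):
    """Return a label naming the branch the documented rules would take."""
    if isinstance(sampler, StandInSampler):
        return "return sampler"
    if sampler is None:
        if ndat is not None:
            return "IndexSampler.from_params(...)"
        if data is not None:
            return "IndexSampler.from_data(...)"
        # Assumption: the documentation does not cover this case.
        raise ValueError("sampler is None: pass ndat or data")
    if isinstance(sampler, (list, tuple)):  # stand-in for "array-like"
        return "IndexSampler(freq=sampler, ...)"
    if isinstance(sampler, int):
        return "IndexSampler.from_data(..., nrep=sampler)"
    if isinstance(sampler, Mapping):
        return "factory_sampler(**sampler, ...)"
    raise TypeError(f"unsupported sampler type: {type(sampler).__name__}")
```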

Parameters:
  • sampler (int or array-like or IndexSampler or mapping) – Passed through resample.factory_sampler() to create an IndexSampler. Value can either be nrep (the number of replicates), freq (frequency array), an IndexSampler object, or a mapping of parameters. The mapping can have form of FactoryIndexSamplerKwargs. Allowable keys are freq, indices, ndat, nrep, nsamp, paired, rng, replace, shuffle.

  • freq (array-like, DataArray, or Dataset of int) – Array of shape (nrep, size) where nrep is the number of replicates and size is the size of the data along the resampled axis. freq is the weight that each sample contributes to a replicate. If freq is an xarray object, it should have dimensions rep_dim and dim.

  • indices (array of int) – Array of shape (nrep, size). If passed, create freq from indices.

  • nrep (int) – Number of resample replicates.

  • ndat (int) – Size of data along resampled axis.

  • nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.

  • paired (bool) – If False and generating freq from nrep with data of type Dataset, generate a unique freq for each variable in data. If True, treat all variables in data as paired, and use the same freq for each.

  • rng (RngTypes | None, default: None) – Random number generator object. Defaults to output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.

  • replace (bool) – If True, do resampling with replacement.

  • shuffle (bool) – If True, shuffle indices created from freq for each row.

  • data (array-like) – If needed, extract ndat from data. Also used if paired = True.

  • axis (int) – Axis to reduce/sample along.

  • dim (hashable) – Dimension to reduce/sample along.

  • mom_ndim ({1, 2}, optional) – If mom_ndim is not None, then wrap axis relative to mom_ndim. For example, with mom_ndim=2, axis = -1 will be transformed to axis = -3. If mom_dims is passed and data is an xarray object, infer mom_ndim from mom_dims.

  • mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Defaults to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axis if mom_axes is specified.

  • mom_dims (hashable or tuple of hashable) – Name of moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that it is consistent with mom_dims. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.

  • mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set the moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).

  • rep_dim (hashable) – Name of new ‘replicated’ dimension.

  • parallel (bool, optional) – If True, use parallel numba code (numba.njit or numba.guvectorize) if possible. If None, use a heuristic to decide whether to attempt the parallel method.

Returns:

IndexSampler

Examples

>>> import numpy as np
>>> from cmomy.resample import factory_sampler
>>> a = factory_sampler(nrep=3, ndat=2, rng=0)
>>> b = factory_sampler(dict(nrep=3, ndat=2, rng=0))
>>> c = factory_sampler(dict(freq=a.freq))
>>> d = factory_sampler(a)
>>> for other in [b, c, d]:
...     np.testing.assert_equal(a.freq, other.freq)
>>> assert d is a

To instead just pass indices, use:

>>> e = factory_sampler(dict(indices=a.indices))
>>> assert a.indices is e.indices

cmomy.resample.freq_to_indices(freq, *, shuffle=False, rng=None, parallel=None)

Convert a frequency array to indices array.

This creates an “indices” array that is compatible with “freq” array. Note that by default, the indices for a single sample (along output[k, :]) are in sorted order (something like [[0, 0, …, 1, 1, …], …]). Pass shuffle = True to randomly shuffle indices along axis=1.
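A minimal plain-Python sketch of this conversion for a single row is shown below. The helper name freq_row_to_indices is hypothetical and this is not how cmomy implements it; it only illustrates the sorted default and the effect of shuffling.

```python
import random


def freq_row_to_indices(freq_row, shuffle=False, rng=None):
    """Expand one frequency row into a list of indices (illustrative only)."""
    # Repeat each position k as many times as its count; the result is sorted.
    indices = [k for k, count in enumerate(freq_row) for _ in range(count)]
    if shuffle:
        (rng or random.Random()).shuffle(indices)
    return indices


freq = [[2, 0, 1], [1, 1, 1]]
indices = [freq_row_to_indices(row) for row in freq]
# indices is [[0, 0, 2], [0, 1, 2]]
```

Shuffling changes only the order within a row, so converting the shuffled indices back to frequencies gives the same table.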

Parameters:
  • freq (array-like, DataArray, or Dataset of int) – Array of shape (nrep, size) where nrep is the number of replicates and size is the size of the data along the resampled axis. freq is the weight that each sample contributes to a replicate. If freq is an xarray object, it should have dimensions rep_dim and dim.

  • shuffle (bool) – If True, shuffle indices created from freq for each row.

  • rng (Union[int, Sequence[int], SeedSequence, BitGenerator, Generator, None], default: None) – Random number generator object. Defaults to output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.

  • parallel (bool, optional) – If True, use parallel numba code (numba.njit or numba.guvectorize) if possible. If None, use a heuristic to decide whether to attempt the parallel method.

Returns:

ndarray – Indices array of shape (nrep, nsamp) where nsamp = freq[k, :].sum() where k is any row.

cmomy.resample.indices_to_freq(indices, *, ndat=None, parallel=None)[source]#

Convert indices to frequency array.

It is assumed that indices.shape == (nrep, nsamp) with nsamp == ndat. For cases that nsamp != ndat, pass in ndat explicitly.
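The role of ndat can be seen in a small plain-Python sketch for one row of indices. The helper indices_row_to_freq is hypothetical and is not the library routine; it assumes nsamp == ndat when ndat is omitted, as described above.

```python
def indices_row_to_freq(indices_row, ndat=None):
    """Count how often each observation appears in one row of indices."""
    if ndat is None:
        # Same assumption as the documentation: nsamp == ndat.
        ndat = len(indices_row)
    freq_row = [0] * ndat
    for i in indices_row:
        freq_row[i] += 1
    return freq_row


same_size = indices_row_to_freq([0, 0, 2])       # nsamp == ndat == 3
smaller = indices_row_to_freq([4, 1], ndat=5)    # nsamp=2 drawn from ndat=5
```

Without ndat=5 in the second call, the table would be sized from the two indices alone and index 4 would fall outside it.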

cmomy.resample.jackknife_data(data, data_reduced=None, *, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_axes_reduced=None, mom_dims=None, mom_params=None, rep_dim='rep', out=None, dtype=None, casting='same_kind', order=None, parallel=None, axes_to_end=False, keep_attrs=None, apply_ufunc_kwargs=None)

Perform jackknife resampling of moments data.

This uses moments addition/subtraction to speed up jackknife resampling.
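The idea behind the speed-up can be illustrated for the simplest case, the mean. The sketch below is plain Python with made-up data and is not cmomy's implementation (which works on full central-moment arrays); it only shows that removing one observation from a precomputed total gives the same leave-one-out result as recomputing from scratch.

```python
data = [1.0, 2.0, 4.0, 9.0]
n = len(data)

# Reduce once over all observations ...
total = sum(data)

# ... then obtain each leave-one-out mean by subtracting one observation.
loo_by_subtraction = [(total - x) / (n - 1) for x in data]

# Reference: recompute each leave-one-out mean from the remaining values.
loo_recomputed = [sum(data[:i] + data[i + 1:]) / (n - 1) for i in range(n)]
```

The subtraction route costs O(n) overall, while recomputing every replicate costs O(n²).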

Parameters:
  • data (ndarray or DataArray or Dataset) – Moments array(s). It is assumed moment dimensions are last.

  • data_reduced (array-like or DataArray, optional) – data reduced along axis or dim. This will be calculated using reduce_data() if not passed.

  • axis (int) – Axis to reduce/sample along.

  • dim (hashable) – Dimension to reduce/sample along.

  • mom_ndim ({1, 2}, optional) – Value indicates if moments (mom_ndim = 1) or comoments (mom_ndim=2). If not specified and data is an xarray object attempt to infer mom_ndim from mom_dims. Otherwise, default to mom_ndim = 1.

  • mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Defaults to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axis if mom_axes is specified.

  • mom_axes_reduced (int or sequence of int) – Location(s) of moment dimensions in data_reduced. This option is only needed if data_reduced is passed in and is an array. Defaults to mom_axes, or last dimensions of data_reduced.

  • mom_dims (hashable or tuple of hashable) – Name of moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that it is consistent with mom_dims. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.

  • mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set the moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).

  • rep_dim (hashable) – Name of new ‘replicated’ dimension.

  • out (ndarray) – Optional output array. If specified, output will be a reference to this array. Note that if the method returns a Dataset, this option is ignored.

  • dtype (dtype) – Optional dtype for output data.

  • casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) –

    Controls what kind of data casting may occur.

    • 'no' means the data types should not be cast at all.

    • 'equiv' means only byte-order changes are allowed.

    • 'safe' means only casts which can preserve values are allowed.

    • 'same_kind' (default) means only safe casts or casts within a kind, like float64 to float32, are allowed.

    • 'unsafe' means any data conversions may be done.

  • order ({"C", "F", "A", "K"}, optional) – Order argument. See numpy.asarray().

  • parallel (bool, optional) – If True, use parallel numba code (numba.njit or numba.guvectorize) if possible. If None, use a heuristic to decide whether to attempt the parallel method.

  • axes_to_end (bool) – If True, place sampled dimension (if exists in output) and moment dimensions at end of output. Otherwise, place sampled dimension (if exists in output) at same position as input axis and moment dimensions at same position as input (if input does not contain moment dimensions, place them at end of array).

  • keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –

    • "drop" or False: empty attrs on returned xarray object.

    • "identical": all attrs must be the same on every object.

    • "no_conflicts": attrs from all objects are combined, any that have the same name must also have the same value.

    • "drop_conflicts": attrs from all objects are combined, any that have the same name but different values are dropped.

    • "override" or True: skip comparing and copy attrs from the first object to the result.

  • apply_ufunc_kwargs (dict-like) – Extra parameters to xarray.apply_ufunc(). One useful option is on_missing_core_dim, which can take the value "copy" (the default), "raise", or "drop" and controls what to do with variables of a Dataset missing core dimensions. Other options are join, dataset_join, dataset_fill_value, and dask_gufunc_kwargs. Unlisted options are handled internally.

Returns:

out (ndarray or DataArray) – Jackknife resampled along axis. That is, out[..., axis=i, ...] is reduce_data(data[..., axis=[..., i-1, i+1, ...], ...]).

Examples

>>> import cmomy
>>> from cmomy.resample import jackknife_data, resample_data
>>> data = cmomy.default_rng(0).random((4, 3))
>>> out_jackknife = jackknife_data(data, mom_ndim=1, axis=0)
>>> out_jackknife
array([[1.5582, 0.7822, 0.2247],
       [2.1787, 0.6322, 0.22  ],
       [1.5886, 0.5969, 0.0991],
       [1.2601, 0.4982, 0.3478]])

Note that this is equivalent to (but typically faster than) resampling with a frequency table from cmomy.resample.jackknife_freq():

>>> freq = cmomy.resample.jackknife_freq(4)
>>> resample_data(data, sampler=dict(freq=freq), mom_ndim=1, axis=0)
array([[1.5582, 0.7822, 0.2247],
       [2.1787, 0.6322, 0.22  ],
       [1.5886, 0.5969, 0.0991],
       [1.2601, 0.4982, 0.3478]])

To speed up the calculation even further, pass in data_reduced:

>>> data_reduced = cmomy.reduce_data(data, mom_ndim=1, axis=0)
>>> jackknife_data(data, mom_ndim=1, axis=0, data_reduced=data_reduced)
array([[1.5582, 0.7822, 0.2247],
       [2.1787, 0.6322, 0.22  ],
       [1.5886, 0.5969, 0.0991],
       [1.2601, 0.4982, 0.3478]])

This also works with DataArray objects:

>>> import xarray as xr
>>> xdata = xr.DataArray(data, dims=["samp", "mom"])
>>> jackknife_data(xdata, mom_ndim=1, dim="samp", rep_dim="jackknife")
<xarray.DataArray (jackknife: 4, mom: 3)> Size: 96B
array([[1.5582, 0.7822, 0.2247],
       [2.1787, 0.6322, 0.22  ],
       [1.5886, 0.5969, 0.0991],
       [1.2601, 0.4982, 0.3478]])
Dimensions without coordinates: jackknife, mom

cmomy.resample.jackknife_freq(ndat)

Frequency array for jackknife resampling.

Use this frequency array to perform jackknife [1] resampling

Parameters:

ndat (int) – Size of data along resampled axis.

Returns:

freq (ndarray) – Frequency array for jackknife resampling.

References

Examples

>>> jackknife_freq(4)
array([[0, 1, 1, 1],
       [1, 0, 1, 1],
       [1, 1, 0, 1],
       [1, 1, 1, 0]])

cmomy.resample.jackknife_vals(x, *y, data_reduced=None, mom, axis=MISSING, dim=MISSING, weight=None, mom_dims=None, mom_params=None, rep_dim='rep', out=None, dtype=None, casting='same_kind', order=None, parallel=None, axes_to_end=True, keep_attrs=None, apply_ufunc_kwargs=None)

Jackknife by value.

Parameters:
  • x (array-like or DataArray or Dataset) – Values to reduce.

  • *y (array-like or DataArray or Dataset) – Additional values (needed if len(mom)==2). y has same type restrictions and broadcasting rules as weight.

  • data_reduced (array-like or DataArray, optional) – data reduced along axis or dim. This will be calculated using reduce_vals() if not passed. Same type restrictions as weight.

  • mom (int or tuple of int) – Order of moments. If an integer or length-one tuple, then moments are for a single variable. If a length-two tuple, then comoments of two variables.

  • axis (int) – Axis to reduce/sample along.

  • dim (hashable) – Dimension to reduce/sample along.

  • weight (array-like or DataArray or Dataset) –

    Optional weight. The type of weight must be “less than” the type of x.

    In the case that weight is array-like, it must broadcast to x using usual broadcasting rules (see numpy.broadcast_to()), with the following exceptions: If weight is a 1d array of length x.shape[axis], it will be formatted to broadcast along the other dimensions of x. For example, if x has shape (10, 2, 3) and weight has shape (10,), then weight will be converted to the broadcastable shape (10, 1, 1). If weight is a scalar, it will be broadcast to x.shape.

  • mom_dims (hashable or tuple of hashable) – Name of moment dimensions. Defaults to ("mom_0",) for mom_ndim==1 and ("mom_0", "mom_1") for mom_ndim==2.

  • mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set the moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).

  • rep_dim (hashable) – Name of new ‘replicated’ dimension.

  • out (ndarray) – Optional output array. If specified, output will be a reference to this array. Note that if the method returns a Dataset, this option is ignored.

  • dtype (dtype) – Optional dtype for output data.

  • casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) –

    Controls what kind of data casting may occur.

    • 'no' means the data types should not be cast at all.

    • 'equiv' means only byte-order changes are allowed.

    • 'safe' means only casts which can preserve values are allowed.

    • 'same_kind' (default) means only safe casts or casts within a kind, like float64 to float32, are allowed.

    • 'unsafe' means any data conversions may be done.

  • order ({"C", "F", "A", "K"}, optional) – Order argument. See numpy.asarray().

  • parallel (bool, optional) – If True, use parallel numba code (numba.njit or numba.guvectorize) if possible. If None, use a heuristic to decide whether to attempt the parallel method.

  • axes_to_end (bool) – If True, place sampled dimension (if exists in output) and moment dimensions at end of output. Otherwise, place sampled dimension (if exists in output) at same position as input axis and moment dimensions at same position as input (if input does not contain moment dimensions, place them at end of array).

  • keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –

    • "drop" or False: empty attrs on returned xarray object.

    • "identical": all attrs must be the same on every object.

    • "no_conflicts": attrs from all objects are combined, any that have the same name must also have the same value.

    • "drop_conflicts": attrs from all objects are combined, any that have the same name but different values are dropped.

    • "override" or True: skip comparing and copy attrs from the first object to the result.

  • apply_ufunc_kwargs (dict-like) – Extra parameters to xarray.apply_ufunc(). One useful option is on_missing_core_dim, which can take the value "copy" (the default), "raise", or "drop" and controls what to do with variables of a Dataset missing core dimensions. Other options are join, dataset_join, dataset_fill_value, and dask_gufunc_kwargs. Unlisted options are handled internally.
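The special case for a 1d weight described under the weight parameter above can be sketched as a shape calculation. The helper weight_broadcast_shape is hypothetical (it is not a cmomy function) and only reproduces the documented example, in which a weight of shape (10,) is reshaped to (10, 1, 1) for x of shape (10, 2, 3):

```python
def weight_broadcast_shape(x_shape, weight_shape, axis):
    """Shape a weight would take so that it broadcasts against x.

    Illustrative only: a 1d weight whose length matches x_shape[axis]
    is aligned with that axis; any other shape is returned unchanged and
    left to the usual broadcasting rules.
    """
    if len(weight_shape) == 1 and weight_shape[0] == x_shape[axis]:
        shape = [1] * len(x_shape)
        shape[axis] = weight_shape[0]
        return tuple(shape)
    return tuple(weight_shape)


aligned = weight_broadcast_shape((10, 2, 3), (10,), axis=0)
```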

Returns:

out (ndarray or DataArray) – Resampled Central moments array. out.shape = (...,shape[axis-1], shape[axis+1], ..., shape[axis], mom0, ...) where shape = x.shape. That is, the resampled dimension is moved to the end, just before the moment dimensions.

Notes

Note that the resampled axis (resamp_axis) is at position -(len(mom) + 1), just before the moment axes. This differs from the behavior of resampling moments arrays (e.g., resample_data()), where the resampled axis is the same as the argument axis. This is because the shape of the output array when resampling values depends on the result of broadcasting x, y, and weight.
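The resulting axis layout can be sketched as a shape calculation. The helper below is hypothetical, assumes a non-negative axis, the default axes_to_end=True, and no extra broadcasting from y or weight, and takes the sizes of the moment dimensions as an input rather than deriving them from mom:

```python
def jackknife_vals_shape(x_shape, axis, mom_shape):
    """Output shape with the resampled axis moved just before the moments."""
    kept = tuple(x_shape[:axis]) + tuple(x_shape[axis + 1:])
    # Jackknife produces one replicate per observation along axis.
    return kept + (x_shape[axis],) + tuple(mom_shape)


shape = jackknife_vals_shape((10, 2, 3), axis=0, mom_shape=(4,))
# The resampled axis sits at position -(len(mom_shape) + 1) of the result.
resampled_size = shape[-(1 + 1)]
```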

cmomy.resample.random_freq(nrep, ndat, nsamp=None, rng=None, replace=True, parallel=None)

Create frequencies for random resampling (bootstrapping).

Parameters:
  • nrep (int) – Number of resample replicates.

  • ndat (int) – Size of data along resampled axis.

  • nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.

  • rng (Union[int, Sequence[int], SeedSequence, BitGenerator, Generator, None], default: None) – Random number generator object. Defaults to output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.

  • replace (bool, default: True) – Whether to allow replacement.

  • parallel (bool | None, default: None) – If True, use parallel numba code (numba.njit or numba.guvectorize) if possible. If None, use a heuristic to decide whether to attempt the parallel method.

Returns:

freq (ndarray) – Frequency array. freq[rep, k] is the number of times to sample from the k-th observation for replicate rep.

See also

random_indices
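To make the meaning of the returned table concrete, here is a standard-library sketch of bootstrap frequencies drawn with replacement. random_freq_sketch is a hypothetical stand-in and not the cmomy function; it uses random.Random rather than a NumPy Generator, so its draws will not match the library's for a given seed.

```python
import random


def random_freq_sketch(nrep, ndat, nsamp=None, seed=None):
    """Build an nrep x ndat frequency table by sampling with replacement."""
    rng = random.Random(seed)
    nsamp = ndat if nsamp is None else nsamp
    freq = []
    for _ in range(nrep):
        row = [0] * ndat
        for _ in range(nsamp):
            # Each draw picks one observation and bumps its count.
            row[rng.randrange(ndat)] += 1
        freq.append(row)
    return freq


freq = random_freq_sketch(nrep=3, ndat=5, seed=0)
```

Every row sums to nsamp, since each of the nsamp draws increments exactly one entry.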

cmomy.resample.random_indices(nrep, ndat, nsamp=None, rng=None, replace=True)

Create indices for random resampling (bootstrapping).

Parameters:
  • nrep (int) – Number of resample replicates.

  • ndat (int) – Size of data along resampled axis.

  • nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.

  • rng (Union[int, Sequence[int], SeedSequence, BitGenerator, Generator, None], default: None) – Random number generator object. Defaults to output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.

  • replace (bool, default: True) – Whether to allow replacement.

Returns:

indices (ndarray) – Index array of integers of shape (nrep, nsamp).
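The difference between replace=True and replace=False can be sketched with the standard library. random_indices_sketch is a hypothetical stand-in, not the cmomy function, and returns nested lists of shape (nrep, nsamp) instead of an ndarray:

```python
import random


def random_indices_sketch(nrep, ndat, nsamp=None, seed=None, replace=True):
    """Draw nrep rows of nsamp indices from range(ndat)."""
    rng = random.Random(seed)
    nsamp = ndat if nsamp is None else nsamp
    if replace:
        # With replacement: an index may appear several times in a row.
        return [rng.choices(range(ndat), k=nsamp) for _ in range(nrep)]
    # Without replacement: each index appears at most once (needs nsamp <= ndat).
    return [rng.sample(range(ndat), k=nsamp) for _ in range(nrep)]


with_repl = random_indices_sketch(nrep=4, ndat=6, seed=0)
without_repl = random_indices_sketch(nrep=4, ndat=6, nsamp=3, seed=0, replace=False)
```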

cmomy.resample.resample_data(data, *, sampler, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_dims=None, mom_params=None, rep_dim='rep', out=None, dtype=None, casting='same_kind', order=None, parallel=None, axes_to_end=False, keep_attrs=None, apply_ufunc_kwargs=None)

Resample and reduce data.

Parameters:
  • data (ndarray or DataArray or Dataset) – Moments array(s). It is assumed moment dimensions are last.

  • sampler (int or array-like or IndexSampler or mapping) – Passed through resample.factory_sampler() to create an IndexSampler. Value can either be nrep (the number of replicates), freq (frequency array), an IndexSampler object, or a mapping of parameters. The mapping can have form of FactoryIndexSamplerKwargs. Allowable keys are freq, indices, ndat, nrep, nsamp, paired, rng, replace, shuffle.

  • axis (int) – Axis to reduce/sample along.

  • dim (hashable) – Dimension to reduce/sample along.

  • mom_ndim ({1, 2}, optional) – Value indicates if moments (mom_ndim = 1) or comoments (mom_ndim=2). If not specified and data is an xarray object attempt to infer mom_ndim from mom_dims. Otherwise, default to mom_ndim = 1.

  • mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Defaults to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axis if mom_axes is specified.

  • mom_dims (hashable or tuple of hashable) – Name of moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that it is consistent with mom_dims. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.

  • mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set the moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).

  • rep_dim (hashable) – Name of new ‘replicated’ dimension.

  • out (ndarray) – Optional output array. If specified, output will be a reference to this array. Note that if the method returns a Dataset, this option is ignored.

  • dtype (dtype) – Optional dtype for output data.

  • casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) –

    Controls what kind of data casting may occur.

    • 'no' means the data types should not be cast at all.

    • 'equiv' means only byte-order changes are allowed.

    • 'safe' means only casts which can preserve values are allowed.

    • 'same_kind' (default) means only safe casts or casts within a kind, like float64 to float32, are allowed.

    • 'unsafe' means any data conversions may be done.

  • order ({"C", "F", "A", "K"}, optional) – Order argument. See numpy.asarray().

  • parallel (bool, optional) – If True, use parallel numba code (numba.njit or numba.guvectorize) if possible. If None, use a heuristic to decide whether to attempt the parallel method.

  • axes_to_end (bool) – If True, place sampled dimension (if exists in output) and moment dimensions at end of output. Otherwise, place sampled dimension (if exists in output) at same position as input axis and moment dimensions at same position as input (if input does not contain moment dimensions, place them at end of array).

  • keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –

    • "drop" or False: empty attrs on returned xarray object.

    • "identical": all attrs must be the same on every object.

    • "no_conflicts": attrs from all objects are combined, any that have the same name must also have the same value.

    • "drop_conflicts": attrs from all objects are combined, any that have the same name but different values are dropped.

    • "override" or True: skip comparing and copy attrs from the first object to the result.

  • apply_ufunc_kwargs (dict-like) – Extra parameters to xarray.apply_ufunc(). One useful option is on_missing_core_dim, which can take the value "copy" (the default), "raise", or "drop" and controls what to do with variables of a Dataset missing core dimensions. Other options are join, dataset_join, dataset_fill_value, and dask_gufunc_kwargs. Unlisted options are handled internally.

Returns:

out (ndarray or DataArray) – Resampled central moments. out.shape = (..., shape[axis-1], nrep, shape[axis+1], ...), where shape = data.shape and nrep = sampler.nrep.
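The output shape rule can be written out as a short calculation. The helper resample_data_shape is hypothetical and assumes a non-negative axis and the default axes_to_end=False, so the replicate dimension simply replaces the sampled one in place:

```python
def resample_data_shape(data_shape, axis, nrep):
    """Replace the sampled axis of data_shape with a replicate axis of size nrep."""
    return tuple(data_shape[:axis]) + (nrep,) + tuple(data_shape[axis + 1:])


# A (4, 3) moments array resampled along axis 0 with 20 replicates.
shape = resample_data_shape((4, 3), axis=0, nrep=20)
```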

cmomy.resample.resample_vals(x, *y, sampler, mom, axis=MISSING, dim=MISSING, weight=None, mom_dims=None, mom_params=None, rep_dim='rep', out=None, dtype=None, casting='same_kind', order=None, parallel=None, axes_to_end=True, keep_attrs=None, apply_ufunc_kwargs=None)

Resample and reduce values.

Parameters:
  • x (array-like or DataArray or Dataset) – Values to reduce.

  • *y (array-like or DataArray or Dataset) – Additional values (needed if len(mom)==2). y has same type restrictions and broadcasting rules as weight.

  • sampler (int or array-like or IndexSampler or mapping) – Passed through resample.factory_sampler() to create an IndexSampler. Value can either be nrep (the number of replicates), freq (frequency array), an IndexSampler object, or a mapping of parameters. The mapping can have form of FactoryIndexSamplerKwargs. Allowable keys are freq, indices, ndat, nrep, nsamp, paired, rng, replace, shuffle.

  • mom (int or tuple of int) – Order of moments. If an integer or length-one tuple, then moments are for a single variable. If a length-two tuple, then comoments of two variables.

  • axis (int) – Axis to reduce/sample along.

  • dim (hashable) – Dimension to reduce/sample along.

  • weight (array-like or DataArray or Dataset) –

    Optional weight. The type of weight must be “less than” the type of x.

    In the case that weight is array-like, it must broadcast to x using usual broadcasting rules (see numpy.broadcast_to()), with the following exceptions: If weight is a 1d array of length x.shape[axis], it will be formatted to broadcast along the other dimensions of x. For example, if x has shape (10, 2, 3) and weight has shape (10,), then weight will be converted to the broadcastable shape (10, 1, 1). If weight is a scalar, it will be broadcast to x.shape.

  • mom_dims (hashable or tuple of hashable) – Name of moment dimensions. Defaults to ("mom_0",) for mom_ndim==1 and ("mom_0", "mom_1") for mom_ndim==2.

  • mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set the moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).

  • rep_dim (hashable) – Name of new ‘replicated’ dimension.

  • out (ndarray) – Optional output array. If specified, output will be a reference to this array. Note that if the method returns a Dataset, this option is ignored.

  • dtype (dtype) – Optional dtype for output data.

  • casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) –

    Controls what kind of data casting may occur.

    • ’no’ means the data types should not be cast at all.

    • ’equiv’ means only byte-order changes are allowed.

    • ’safe’ means only casts which can preserve values are allowed.

    • ’same_kind’ (default) means only safe casts or casts within a kind, like float64 to float32, are allowed.

    • ’unsafe’ means any data conversions may be done.

  • order ({"C", "F"}, optional) – Order argument. See numpy.zeros().

  • parallel (bool, optional) – If True, use parallel numba code (numba.njit or numba.guvectorize) if possible. If None, use a heuristic to decide whether to attempt the parallel method.

  • axes_to_end (bool) – If True, place sampled dimension (if exists in output) and moment dimensions at end of output. Otherwise, place sampled dimension (if exists in output) at same position as input axis and moment dimensions at same position as input (if input does not contain moment dimensions, place them at end of array).

  • keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –

    • ‘drop’ or False: empty attrs on returned xarray object.

    • ’identical’: all attrs must be the same on every object.

    • ’no_conflicts’: attrs from all objects are combined, any that have the same name must also have the same value.

    • ’drop_conflicts’: attrs from all objects are combined, any that have the same name but different values are dropped.

    • ’override’ or True: skip comparing and copy attrs from the first object to the result.

  • apply_ufunc_kwargs (dict-like) – Extra parameters to xarray.apply_ufunc(). One useful option is on_missing_core_dim, which can take the value "copy" (the default), "raise", or "drop" and controls what to do with variables of a Dataset missing core dimensions. Other options are join, dataset_join, dataset_fill_value, and dask_gufunc_kwargs. Unlisted options are handled internally.

Returns:

out (ndarray or DataArray) – Resampled central moments array. out.shape = (..., shape[axis-1], nrep, shape[axis+1], ...), where shape = x.shape and nrep = sampler.nrep. This can be overridden by setting axes_to_end.

Notes

Note that the resampled axis (resamp_axis) is at position -(len(mom) + 1), just before the moment axes. This differs from the behavior of resampling moments arrays (e.g., resample_data()), where the resampled axis is the same as the argument axis. This is because the shape of the output array when resampling values depends on the result of broadcasting x, y, and weight.
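
As a rough illustration of the weight broadcasting rule and the output layout described in the note above, the following NumPy sketch resamples values with a frequency table built from bootstrap indices. It does not call cmomy. The [weight, mean, variance] layout along the moment axis and all variable names are assumptions for this sketch, and only a single-variable, second-order case is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
ndat, nrep = 10, 4

x = rng.random((ndat, 2, 3))       # values; sample along axis 0
weight = np.arange(1.0, ndat + 1)  # 1d weight of length x.shape[axis]

# A 1d weight of length ndat is reshaped to (ndat, 1, 1) so it broadcasts to x.
w = np.broadcast_to(weight.reshape(ndat, 1, 1), x.shape)

# Bootstrap indices (with replacement) and the equivalent frequency table:
# freq[r, i] is the number of times sample i appears in replicate r.
indices = rng.integers(0, ndat, size=(nrep, ndat))
freq = np.stack([np.bincount(row, minlength=ndat) for row in indices])

# Weighted moments per replicate, using freq * weight as the effective weight.
fw = freq[:, :, None, None] * w[None]  # (nrep, ndat, 2, 3)
wsum = fw.sum(axis=1)                  # (nrep, 2, 3)
mean = (fw * x[None]).sum(axis=1) / wsum
var = (fw * (x[None] - mean[:, None]) ** 2).sum(axis=1) / wsum

# Stack moments on a trailing axis, then move the replicate axis to sit just
# before it, i.e. at position -(len(mom) + 1) = -2 for a single variable.
out = np.stack([wsum, mean, var], axis=-1)  # (nrep, 2, 3, 3)
out = np.moveaxis(out, 0, -2)               # (2, 3, nrep, 3)
```

Reducing with the frequency table gives the same result as indexing x with each row of indices and reducing directly, without materializing the resampled copies of x.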

cmomy.resample.select_ndat(data, *, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_dims=None, mom_params=None)[source]#

Determine ndat from array.

Parameters:
  • data (ndarray, DataArray, Dataset)

  • axis (int) – Axis to reduce/sample along.

  • dim (hashable) – Dimension to reduce/sample along.

  • mom_ndim ({1, 2}, optional) – If mom_ndim is not None, then wrap axis relative to mom_ndim. For example, with mom_ndim=2, axis = -1 will be transformed to axis = -3. If mom_dims is passed and data is an xarray object, infer mom_ndim from mom_dims.

  • mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Default to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axes if mom_axes is specified.

  • mom_dims (hashable or tuple of hashable) – Name of moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that mom_dims is consistent with mom_ndim. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.

  • mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set the moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).

Returns:

int – size of data along specified axis or dim

Examples

>>> data = np.zeros((2, 3, 4))
>>> select_ndat(data, axis=1)
3

To wrap axis relative to the last mom_ndim dimensions of data, pass a complex axis:

>>> select_ndat(data, axis=-1j, mom_ndim=2)
2
>>> xdata = xr.DataArray(data, dims=["x", "y", "mom"])
>>> select_ndat(xdata, dim="y")
3
>>> select_ndat(xdata, dim="mom", mom_ndim=1)
Traceback (most recent call last):
...
ValueError: Cannot select moment dimension. dim='mom', axis=2.
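
The axis handling shown in the examples above can be sketched in plain Python. The function sketch_select_ndat below is a hypothetical stand-in written to reproduce the three documented results on NumPy arrays; it is not cmomy's implementation, and the treatment of plain integer axes is an assumption.

```python
import numpy as np


def sketch_select_ndat(data, axis, mom_ndim=None):
    """Hypothetical stand-in for select_ndat on plain arrays (not cmomy code)."""
    n_mom = 0 if mom_ndim is None else mom_ndim
    n_val = data.ndim - n_mom  # number of non-moment (value) dimensions
    if isinstance(axis, complex):
        # Assumption: a complex axis such as -1j indexes the value dimensions
        # only, so negative positions count back from the first moment axis.
        pos, limit = int(axis.imag), n_val
    else:
        # Assumption: a plain integer is an ordinary axis of the full array.
        pos, limit = int(axis), data.ndim
    if not -limit <= pos < limit:
        raise ValueError(f"axis={axis} is out of bounds")
    pos %= limit
    if pos >= n_val:
        raise ValueError(f"Cannot select moment dimension. axis={pos}.")
    return data.shape[pos]


data = np.zeros((2, 3, 4))
n_plain = sketch_select_ndat(data, axis=1)                  # ordinary axis
n_wrapped = sketch_select_ndat(data, axis=-1j, mom_ndim=2)  # last value axis
```

Here n_plain is 3 and n_wrapped is 2, matching the doctest output above, and asking for axis=2 with mom_ndim=1 raises ValueError because that axis holds moments.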