Routine to perform resampling (cmomy.resample)
Functions:
freq_to_indices | Convert a frequency array to indices array.
indices_to_freq | Convert indices to frequency array.
random_indices | Create indices for random resampling (bootstrapping).
random_freq | Create frequencies for random resampling (bootstrapping).
select_ndat | Determine ndat from array.
randsamp_freq | Convenience function to create frequency table for resampling.
resample_data | Resample data according to frequency table.
resample_vals | Resample data according to frequency table.
bootstrap_confidence_interval | Calculate the error bounds.
xbootstrap_confidence_interval | Bootstrap xarray object.
- cmomy.resample.freq_to_indices(freq, shuffle=True, rng=None)
Convert a frequency array to indices array.
This creates an “indices” array that is compatible with the “freq” array. Note that, by default, the indices for a single sample (along output[k, :]) are randomly shuffled. If you pass shuffle=False, each row is left sorted, so the output looks like [[0, 0, …, 1, 1, …, 2, 2, …]].
- Parameters:
freq (array of int) – Array of shape (nrep, size), where nrep is the number of replicates and size is the size of the data along the resampled axis (data.shape[axis]). freq is the weight that each sample contributes to the resampled values. See randsamp_freq().
shuffle (bool, default: True) – If True (default), shuffle values for each row.
rng (Generator) – Random number generator object. Defaults to output of default_rng().
- Returns:
ndarray – Indices array of shape (nrep, nsamp), where nsamp = freq[k, :].sum() for any row k.
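A minimal NumPy sketch of this conversion, assuming freq is a 2-D integer array whose rows all sum to the same nsamp: each row is expanded with numpy.repeat so that observation i appears freq[k, i] times, and is then optionally shuffled with Generator.permuted. The name freq_to_indices_sketch is hypothetical; this illustrates the idea and is not cmomy's implementation.

```python
import numpy as np


def freq_to_indices_sketch(freq, shuffle=True, rng=None):
    """Hypothetical sketch: expand a (nrep, ndat) frequency table into indices."""
    freq = np.asarray(freq)
    rng = np.random.default_rng() if rng is None else rng
    nsamp = freq[0].sum()
    # Every row must describe the same number of samples.
    if not np.all(freq.sum(axis=1) == nsamp):
        raise ValueError("all rows of freq must have the same sum")
    ndat = freq.shape[1]
    # Observation i appears freq[k, i] times in row k (sorted order).
    indices = np.stack([np.repeat(np.arange(ndat), row) for row in freq])
    if shuffle:
        # Shuffle each row independently.
        indices = rng.permuted(indices, axis=1)
    return indices


freq = np.array([[0, 2, 1], [3, 0, 0]])
print(freq_to_indices_sketch(freq, shuffle=False).tolist())  # [[1, 1, 2], [0, 0, 0]]
```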
- cmomy.resample.indices_to_freq(indices, ndat=None)
Convert indices to frequency array.
It is assumed that indices.shape == (nrep, nsamp) with nsamp == ndat. For cases where nsamp != ndat, pass in ndat explicitly.
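The reverse direction amounts to counting how often each index occurs in each row. A minimal NumPy sketch using numpy.bincount, where minlength=ndat pads every row to the same width; the name indices_to_freq_sketch is hypothetical and the range check is an assumption added for safety.

```python
import numpy as np


def indices_to_freq_sketch(indices, ndat=None):
    """Hypothetical sketch: count how often each index appears in each row."""
    indices = np.asarray(indices)
    # Default assumption from the text: nsamp == ndat.
    ndat = indices.shape[1] if ndat is None else ndat
    if indices.size and indices.max() >= ndat:
        raise ValueError("indices must be smaller than ndat")
    # minlength pads rows so every row has exactly ndat entries.
    return np.stack([np.bincount(row, minlength=ndat) for row in indices])


indices = np.array([[1, 2, 1], [0, 0, 0]])
print(indices_to_freq_sketch(indices).tolist())  # [[0, 2, 1], [3, 0, 0]]
```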
- cmomy.resample.random_indices(nrep, ndat, nsamp=None, rng=None, replace=True)
Create indices for random resampling (bootstrapping).
- Parameters:
nrep (int) – Number of resample replicates.
ndat (int) – Size of data along resampled axis.
nsamp (int) – Number of samples in a single resampled replicate. Defaults to ndat (the size of data along the resampled axis).
rng (Generator) – Random number generator object. Defaults to output of default_rng().
replace (bool, default: True) – Whether to allow replacement.
- Returns:
indices (ndarray) – Index array of integers of shape (nrep, nsamp).
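A minimal sketch of bootstrap index generation with numpy.random.Generator.choice, assuming uniform sampling over the ndat observations. The name random_indices_sketch is hypothetical and this is not cmomy's source.

```python
import numpy as np


def random_indices_sketch(nrep, ndat, nsamp=None, rng=None, replace=True):
    """Hypothetical sketch of bootstrap index generation."""
    rng = np.random.default_rng() if rng is None else rng
    nsamp = ndat if nsamp is None else nsamp
    if replace:
        # Each draw is independent and uniform over 0 .. ndat - 1.
        return rng.choice(ndat, size=(nrep, nsamp), replace=True)
    # Without replacement, draw each row separately: uniqueness is required
    # within a row, not across the whole (nrep, nsamp) array.
    # This branch requires nsamp <= ndat.
    return np.stack(
        [rng.choice(ndat, size=nsamp, replace=False) for _ in range(nrep)]
    )


idx = random_indices_sketch(nrep=5, ndat=4, rng=np.random.default_rng(1))
print(idx.shape)  # (5, 4)
```

Drawing rows one at a time for replace=False is the design point here: a single rng.choice(ndat, size=(nrep, nsamp), replace=False) call would demand nrep * nsamp distinct values across the entire array.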
- cmomy.resample.random_freq(nrep, ndat, nsamp=None, rng=None, replace=True)
Create frequencies for random resampling (bootstrapping).
- Parameters:
nrep (int) – Number of resample replicates.
ndat (int) – Size of data along resampled axis.
nsamp (int) – Number of samples in a single resampled replicate. Defaults to ndat (the size of data along the resampled axis).
rng (Generator) – Random number generator object. Defaults to output of default_rng().
replace (bool, default: True) – Whether to allow replacement.
- Returns:
freq (ndarray) – Frequency array. freq[rep, k] is the number of times to sample from the k-th observation for replicate rep.
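With replacement, each row of a bootstrap frequency table follows a multinomial distribution with nsamp trials and equal probability 1/ndat per observation, so a table can be drawn directly without building indices first. This is one possible approach, shown for illustration only: the name random_freq_sketch is hypothetical, it covers only replace=True, and cmomy may generate frequencies differently.

```python
import numpy as np


def random_freq_sketch(nrep, ndat, nsamp=None, rng=None):
    """Hypothetical sketch: bootstrap frequencies with replacement."""
    rng = np.random.default_rng() if rng is None else rng
    nsamp = ndat if nsamp is None else nsamp
    # Each row: nsamp draws spread over ndat equally likely observations.
    return rng.multinomial(nsamp, np.full(ndat, 1.0 / ndat), size=nrep)


freq = random_freq_sketch(nrep=4, ndat=3, rng=np.random.default_rng(0))
print(freq.shape, freq.sum(axis=1).tolist())  # (4, 3) [3, 3, 3, 3]
```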
- cmomy.resample.select_ndat(data, *, axis=MISSING, dim=MISSING, mom_ndim=None)
Determine ndat from array.
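- Parameters:
data (ndarray or DataArray) – Data to inspect.
axis (int) – Axis to reduce along.
dim (hashable) – Dimension to reduce along.
mom_ndim (int, optional) – If specified, treat data as a moments array, and wrap negative values for axis relative to value dimensions only.
- Returns:
int – Size of data along the specified axis or dim.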
Examples
>>> data = np.zeros((2, 3, 4))
>>> select_ndat(data, axis=1)
3
>>> select_ndat(data, axis=-1, mom_ndim=2)
2

>>> xdata = xr.DataArray(data, dims=["x", "y", "mom"])
>>> select_ndat(xdata, dim="y")
3
>>> select_ndat(xdata, dim="mom", mom_ndim=1)
Traceback (most recent call last):
...
ValueError: Cannot select moment dimension. axis=2, dim='mom'.
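The mom_ndim rule in the examples above can be sketched for plain NumPy arrays, assuming (as the output shapes elsewhere on this page suggest) that moment dimensions are the trailing ones: drop the last mom_ndim entries of the shape, then index with axis, so negative axes count from the last value dimension. The name select_ndat_sketch is hypothetical; it ignores dim and xarray input.

```python
import numpy as np


def select_ndat_sketch(data, axis, mom_ndim=None):
    """Hypothetical NumPy-only sketch of the select_ndat rule."""
    shape = np.shape(data)
    if mom_ndim is not None:
        # Drop the trailing moment dimensions so that negative axis values
        # are counted from the last *value* dimension.
        shape = shape[:-mom_ndim]
    return shape[axis]


data = np.zeros((2, 3, 4))
print(select_ndat_sketch(data, axis=1))               # 3
print(select_ndat_sketch(data, axis=-1, mom_ndim=2))  # 2
```

In this sketch, asking for a moment axis (for example axis=2 with mom_ndim=1) raises IndexError, the NumPy analogue of the ValueError shown above.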
- cmomy.resample.randsamp_freq(*, ndat=None, nrep=None, nsamp=None, indices=None, freq=None, data=None, axis=MISSING, dim=MISSING, mom_ndim=None, check=False, rng=None)
Convenience function to create frequency table for resampling.
In order of precedence, the return will be freq (if passed), frequencies computed from indices (if passed), or a new sample from random_freq().
- Parameters:
ndat (int) – Size of data along resampled axis.
nrep (int) – Number of resample replicates.
nsamp (int) – Number of samples in a single resampled replicate. Defaults to ndat (the size of data along the resampled axis).
freq (array of int) – Array of shape (nrep, size), where nrep is the number of replicates and size is the size of the data along the resampled axis. freq is the weight that each sample contributes to the resampled values.
indices (array of int) – Array of shape (nrep, nsamp). If passed, create freq from indices.
data (array-like, optional) – Data used to infer ndat when ndat is None. See Notes.
check (bool, default False) – If True, check freq and indices against ndat and nrep.
rng (Generator) – Random number generator object. Defaults to output of default_rng().
axis (int) – Axis to reduce along.
dim (hashable) – Dimension to reduce along.
mom_ndim (int, optional) – If specified, treat data as a moments array, and wrap negative values for axis relative to value dimensions only.
Notes
If ndat is None, attempt to set ndat using ndat = select_ndat(data, axis=axis, dim=dim, mom_ndim=mom_ndim). See select_ndat().
- Returns:
freq (ndarray) – Frequency array.
Examples
>>> import cmomy
>>> rng = cmomy.random.default_rng(0)
>>> randsamp_freq(ndat=3, nrep=5, rng=rng)
array([[0, 2, 1],
       [3, 0, 0],
       [3, 0, 0],
       [0, 1, 2],
       [0, 2, 1]])
Create from data and axis
>>> data = np.zeros((2, 3, 5))
>>> freq = randsamp_freq(data=data, axis=-1, mom_ndim=1, nrep=5, rng=rng)
>>> freq
array([[0, 2, 1],
       [1, 1, 1],
       [1, 0, 2],
       [0, 2, 1],
       [1, 0, 2]])
This can also be used to convert from indices to freq array
>>> indices = freq_to_indices(freq)
>>> randsamp_freq(data=data, axis=-1, mom_ndim=1, indices=indices)
array([[0, 2, 1],
       [1, 1, 1],
       [1, 0, 2],
       [0, 2, 1],
       [1, 0, 2]])
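The precedence rule (freq first, then indices, then a fresh random draw) can be sketched in NumPy as follows. Assumptions: the hypothetical randsamp_freq_sketch infers ndat from data.shape[axis] without any mom_ndim or dim handling, and draws new frequencies from a multinomial (sampling with replacement), which may differ from how cmomy draws them.

```python
import numpy as np


def randsamp_freq_sketch(*, ndat=None, nrep=None, nsamp=None, indices=None,
                         freq=None, data=None, axis=-1, check=False, rng=None):
    """Hypothetical sketch of the freq > indices > random precedence."""
    if ndat is None and data is not None:
        # Infer ndat from data (no mom_ndim handling in this sketch).
        ndat = np.shape(data)[axis]
    if freq is not None:
        out = np.asarray(freq)
    elif indices is not None:
        indices = np.asarray(indices)
        ndat = indices.shape[1] if ndat is None else ndat
        # Count occurrences of each index in each row.
        out = np.stack([np.bincount(row, minlength=ndat) for row in indices])
    elif ndat is not None and nrep is not None:
        rng = np.random.default_rng() if rng is None else rng
        nsamp = ndat if nsamp is None else nsamp
        out = rng.multinomial(nsamp, np.full(ndat, 1.0 / ndat), size=nrep)
    else:
        raise ValueError("need freq, indices, or both ndat and nrep")
    if check:
        if ndat is not None and out.shape[1] != ndat:
            raise ValueError(f"freq has {out.shape[1]} columns, expected {ndat}")
        if nrep is not None and out.shape[0] != nrep:
            raise ValueError(f"freq has {out.shape[0]} rows, expected {nrep}")
    return out


print(randsamp_freq_sketch(indices=[[1, 2, 1], [0, 0, 0]]).tolist())
# [[0, 2, 1], [3, 0, 0]]
```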
- cmomy.resample.resample_data(data, *, mom_ndim, freq, axis=MISSING, dim=MISSING, rep_dim='rep', order=None, parallel=True, dtype=None, out=None, keep_attrs=None)
Resample data according to frequency table.
- Parameters:
data (array-like) – Central moments array to be resampled.
mom_ndim ({1, 2}) – Value indicates if moments (mom_ndim = 1) or comoments (mom_ndim = 2).
freq (array of int) – Array of shape (nrep, size), where nrep is the number of replicates and size = data.shape[axis]. freq is the weight that each sample contributes to the resampled values. See randsamp_freq().
axis (int) – Axis to reduce along.
dim (hashable) – Dimension to reduce along.
rep_dim (hashable) – Name of new ‘replicated’ dimension.
order ({"C", "F", "A", "K"}, optional) – Order argument to numpy.asarray().
out (ndarray) – Optional output array. If specified, output will be a reference to this array.
keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –
‘drop’ or False: empty attrs on returned xarray object.
‘identical’: all attrs must be the same on every object.
‘no_conflicts’: attrs from all objects are combined, any that have the same name must also have the same value.
‘drop_conflicts’: attrs from all objects are combined, any that have the same name but different values are dropped.
‘override’ or True: skip comparing and copy attrs from the first object to the result.
- Returns:
out (ndarray) – Resampled central moments. out.shape = (..., shape[axis-1], shape[axis+1], ..., nrep, mom0, ...), where shape = data.shape and nrep = freq.shape[0].
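For mom_ndim=1, resampling can be read as merging per-observation moment records, each scaled by its frequency. The sketch below assumes, purely for illustration, that the sampled axis comes first and the moment axis holds [weight, mean, second central moment]; it applies the standard pooled formulas W = Σ f_i w_i, mean = Σ f_i w_i μ_i / W, m2 = Σ f_i w_i (m2_i + (μ_i − mean)²) / W. The name resample_data_sketch is hypothetical; cmomy's routine handles arbitrary orders, axes, and comoments.

```python
import numpy as np


def resample_data_sketch(data, freq):
    """Hypothetical sketch: resample rows of [weight, mean, m2] records.

    data: shape (ndat, 3); freq: shape (nrep, ndat). Returns shape (nrep, 3).
    """
    data = np.asarray(data, dtype=float)
    freq = np.asarray(freq, dtype=float)
    w, mu, m2 = data[:, 0], data[:, 1], data[:, 2]
    # Effective weight of record i in replicate k: freq[k, i] * w[i].
    ew = freq * w                       # (nrep, ndat)
    W = ew.sum(axis=1)                  # (nrep,)
    mean = (ew @ mu) / W
    # Pooled second central moment: within-record plus between-record spread.
    dev2 = (mu[None, :] - mean[:, None]) ** 2
    var = (ew * (m2[None, :] + dev2)).sum(axis=1) / W
    return np.stack([W, mean, var], axis=-1)


# Each record is a single observation x (weight 1, mean x, zero spread).
x = np.array([1.0, 2.0, 3.0, 4.0])
data = np.stack([np.ones_like(x), x, np.zeros_like(x)], axis=-1)
freq = np.array([[2, 0, 1, 1], [1, 1, 1, 1]])
print(resample_data_sketch(data, freq).tolist())
# [[4.0, 2.25, 1.6875], [4.0, 2.5, 1.25]]
```

With single-observation records, each replicate reproduces the weight, mean, and population variance of np.repeat(x, freq[k]), which is a convenient way to check the pooled formula.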
- cmomy.resample.resample_vals(x, *y, mom, freq, weight=None, axis=MISSING, order=None, parallel=None, dtype=None, out=None, dim=MISSING, rep_dim='rep', mom_dims=None, keep_attrs=None)
Resample values according to frequency table and return the resampled central moments.
- Parameters:
x (ndarray) – Value to analyze.
*y (array-like, optional) – Second value needed if len(mom) == 2.
freq (array of int) – Array of shape (nrep, size), where nrep is the number of replicates and size = x.shape[axis]. freq is the weight that each sample contributes to the resampled values. See randsamp_freq().
mom (int or tuple of int) – Order of moments. If integer or length-one tuple, then moments are for a single variable. If length-two tuple, then comoments of two variables.
weight (array-like, optional) – Optional weights. Can be a scalar, a 1d array of length x.shape[axis], or an array of the same shape as x.
axis (int) – Axis to reduce along.
order ({"C", "F", "A", "K"}, optional) – Order argument to numpy.asarray().
out (ndarray) – Optional output array. If specified, output will be a reference to this array.
dim (hashable) – Dimension to reduce along.
rep_dim (hashable) – Name of new ‘replicated’ dimension.
mom_dims (hashable or tuple of hashable) – Name of moment dimensions. Defaults to ("mom_0",) for mom_ndim == 1 and ("mom_0", "mom_1") for mom_ndim == 2.
keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –
‘drop’ or False: empty attrs on returned xarray object.
‘identical’: all attrs must be the same on every object.
‘no_conflicts’: attrs from all objects are combined, any that have the same name must also have the same value.
‘drop_conflicts’: attrs from all objects are combined, any that have the same name but different values are dropped.
‘override’ or True: skip comparing and copy attrs from the first object to the result.
- Returns:
out (ndarray) – Resampled central moments array. out.shape = (..., shape[axis-1], shape[axis+1], ..., nrep, mom0, ...), where shape = x.shape and nrep = freq.shape[0].
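For a single variable, resampling raw values amounts to computing frequency-weighted central moments for each replicate. A minimal sketch, assuming a 1-D x, an output laid out as [total weight, mean, m2, ..., m_mom] along a trailing moment axis, and mom >= 1; comoments (the *y case) are omitted. The name resample_vals_sketch is hypothetical and this is not cmomy's implementation.

```python
import numpy as np


def resample_vals_sketch(x, freq, mom, weight=None):
    """Hypothetical sketch: central moments of x for each bootstrap replicate.

    x: shape (ndat,); freq: shape (nrep, ndat). Returns shape (nrep, mom + 1)
    laid out as [total weight, mean, m2, ..., m_mom].
    """
    x = np.asarray(x, dtype=float)
    if weight is None:
        w = np.ones_like(x)
    else:
        # Scalar or length-ndat weights broadcast against x.
        w = np.broadcast_to(np.asarray(weight, dtype=float), x.shape)
    ew = np.asarray(freq, dtype=float) * w      # (nrep, ndat)
    W = ew.sum(axis=1)
    mean = (ew @ x) / W
    out = np.empty((ew.shape[0], mom + 1))
    out[:, 0] = W
    out[:, 1] = mean
    dx = x[None, :] - mean[:, None]
    for k in range(2, mom + 1):
        # k-th central moment, weighted by freq * weight.
        out[:, k] = (ew * dx**k).sum(axis=1) / W
    return out


x = np.array([0.5, 1.0, 3.0, 4.5])
freq = np.array([[1, 2, 0, 1], [0, 0, 2, 2]])
print(resample_vals_sketch(x, freq, mom=3).shape)  # (2, 4)
```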
- cmomy.resample.bootstrap_confidence_interval(distribution, stats_val='mean', axis=0, alpha=0.05, style=None, **kwargs)
Calculate the bootstrap error bounds (confidence interval).
- Parameters:
distribution (array-like) – Distribution of values to consider.
stats_val (array-like, {None, 'mean', 'median'}, optional) –
array: perform pivotal error bounds (correct) with this as value.
None (percentile): percentiles, with median as value.
'mean': pivotal error bounds with mean as value.
'median': pivotal error bounds with median as value.
axis (int, default 0) – Axis to analyze along.
alpha (float) – Alpha value for confidence interval. Percent confidence = 100 * (1 - alpha).
style ({None, 'delta', 'pm'}) – Controls style of output.
**kwargs (Any) – Extra arguments to numpy.percentile().
- Returns:
out (array) – First dimension will be statistics. Other dimensions have the shape of the input, less the axis reduced over. Depending on style, the first dimension will be (note val is either stats_val or median):
None: [val, low, high]
delta: [val, val - low, high - val]
pm: [val, (high - low) / 2]
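The pivotal ("basic") bootstrap interval reflects the bootstrap quantiles about the point estimate, low = 2·val − q(1 − alpha/2) and high = 2·val − q(alpha/2), while the percentile interval uses the quantiles q(alpha/2) and q(1 − alpha/2) directly. A minimal NumPy sketch written from the parameter descriptions above; the name bootstrap_ci_sketch is hypothetical, **kwargs forwarding is omitted, and it is not cmomy's source.

```python
import numpy as np


def bootstrap_ci_sketch(distribution, stats_val="mean", axis=0, alpha=0.05, style=None):
    """Hypothetical sketch of percentile and pivotal bootstrap bounds."""
    distribution = np.asarray(distribution, dtype=float)
    p_lo, p_hi = 100 * alpha / 2, 100 * (1 - alpha / 2)
    q_lo, q_hi = np.percentile(distribution, [p_lo, p_hi], axis=axis)
    if stats_val is None:
        # Percentile interval, reported with the median as the value.
        val = np.median(distribution, axis=axis)
        low, high = q_lo, q_hi
    else:
        if isinstance(stats_val, str):
            reducer = {"mean": np.mean, "median": np.median}[stats_val]
            val = reducer(distribution, axis=axis)
        else:
            val = np.asarray(stats_val, dtype=float)
        # Pivotal interval: reflect the bootstrap quantiles about val.
        low, high = 2 * val - q_hi, 2 * val - q_lo
    if style is None:
        return np.array([val, low, high])
    if style == "delta":
        return np.array([val, val - low, high - val])
    if style == "pm":
        return np.array([val, (high - low) / 2])
    raise ValueError(f"unknown style {style!r}")


print(bootstrap_ci_sketch(np.arange(101.0)).tolist())  # [50.0, 2.5, 97.5]
```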
- cmomy.resample.xbootstrap_confidence_interval(x, stats_val='mean', axis=0, dim=MISSING, alpha=0.05, style=None, bootstrap_dim='bootstrap', bootstrap_coords=None, **kwargs)
Bootstrap xarray object.
- Parameters:
dim (str) – If passed, reduce along this dimension.
bootstrap_dim (str, default 'bootstrap') – Name of new dimension. If bootstrap_dim conflicts, then new_name = dim + new_name.
bootstrap_coords (array-like or str) – Coords of new dimension. If None, use default names. If a string, use this for the ‘values’ name.