Routine to perform resampling (cmomy.resample)
Classes:
IndexSampler | Wrapper around indices and freq resample arrays
Functions:
factory_sampler | Factory method to create sampler.
freq_to_indices | Convert a frequency array to indices array.
indices_to_freq | Convert indices to frequency array.
jackknife_data | Perform jackknife resampling of moments data.
jackknife_freq | Frequency array for jackknife resampling.
jackknife_vals | Jackknife by value.
random_freq | Create frequencies for random resampling (bootstrapping).
random_indices | Create indices for random resampling (bootstrapping).
resample_data | Resample and reduce data.
resample_vals | Resample and reduce values.
select_ndat | Determine ndat from array.
- class cmomy.resample.IndexSampler(*, indices=None, freq=None, ndat=None, parallel=None, shuffle=False, rng=None, fastpath=False)
Bases: Generic[SamplerArrayT]
Wrapper around indices and freq resample arrays
This is a convenience wrapper class to make working with resampling indices straightforward. cmomy primarily performs resampling using frequency tables instead of the more standard resampling indices arrays. This class keeps track of both.
- Parameters:
indices (ndarray, DataArray, or Dataset) – Indices resampling array.
freq (ndarray, DataArray, or Dataset) – Frequency resampling table.
ndat (int) – Size of data along resampled axis.
parallel (bool, optional) – If True, use parallel numba (numba.njit or numba.guvectorized) code if possible. If None, use a heuristic to decide whether to attempt the parallel method.
shuffle (bool) – If True, shuffle indices created from freq for each row.
rng (Union[int, Sequence[int], SeedSequence, BitGenerator, Generator, None], default: None) – Random number generator object. Defaults to the output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.
fastpath (bool) – Internal variable.
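The relationship between an indices array and a frequency table can be illustrated with a small standard-library sketch. This is not cmomy's implementation; the helper `indices_row_to_freq` and the sample data are hypothetical and only show why the two representations give the same reduced result.

```python
from collections import Counter


def indices_row_to_freq(indices_row, ndat):
    """Count how often each observation index appears in one replicate.

    Hypothetical helper for illustration; not part of cmomy.
    """
    counts = Counter(indices_row)
    return [counts.get(k, 0) for k in range(ndat)]


data = [2.0, 4.0, 6.0, 8.0]
indices_row = [0, 0, 3, 1]  # one bootstrap replicate expressed as indices
freq_row = indices_row_to_freq(indices_row, ndat=len(data))

# Mean via explicit index selection (builds a resampled copy of the data).
mean_from_indices = sum(data[i] for i in indices_row) / len(indices_row)

# Same mean via frequency weights (no resampled copy is needed).
mean_from_freq = sum(f * x for f, x in zip(freq_row, data)) / sum(freq_row)
```

Both routes give the same mean, which is why a frequency table can stand in for an indices array when the goal is a reduction over the resampled axis.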
Methods:
from_params(nrep, ndat[, nsamp, rng, ...]) | Create sampler from parameters
from_data(data, *, nrep[, nsamp, axis, dim, ...]) | Create sampler for data.
- classmethod from_params(nrep, ndat, nsamp=None, rng=None, replace=True, parallel=None)
Create sampler from parameters
- Parameters:
nrep (int) – Number of resample replicates.
ndat (int) – Size of data along resampled axis.
nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.
rng (Union[int, Sequence[int], SeedSequence, BitGenerator, Generator, None], default: None) – Random number generator object. Defaults to the output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.
replace (bool) – If True, do resampling with replacement.
parallel (bool, optional) – If True, use parallel numba (numba.njit or numba.guvectorized) code if possible. If None, use a heuristic to decide whether to attempt the parallel method.
- Returns:
resample (IndexSampler) – Wrapped object will be an ndarray of integers.
- classmethod from_data(data, *, nrep, nsamp=None, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_dims=None, mom_params=None, rep_dim='rep', paired=True, rng=None, replace=True, parallel=None)
Create sampler for data.
- Parameters:
nrep (int) – Number of resample replicates.
nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.
axis (int) – Axis to reduce/sample along.
dim (hashable) – Dimension to reduce/sample along.
mom_ndim ({1, 2}, optional) – If mom_ndim is not None, then wrap axis relative to mom_ndim. For example, with mom_ndim=2, axis = -1 will be transformed to axis = -3. If mom_dims is passed and data is an xarray object, infer mom_ndim from mom_dims.
mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Defaults to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axes if mom_axes is specified.
mom_dims (hashable or tuple of hashable) – Name of moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that mom_dims is consistent with mom_ndim. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.
mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).
rep_dim (hashable) – Name of new ‘replicated’ dimension.
paired (bool) – If False and generating freq from nrep with data of type Dataset, generate a unique freq for each variable in data. If True, treat all variables in data as paired, and use the same freq for each.
rng (RngTypes | None, default: None) – Random number generator object. Defaults to the output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.
replace (bool) – If True, do resampling with replacement.
parallel (bool, optional) – If True, use parallel numba (numba.njit or numba.guvectorized) code if possible. If None, use a heuristic to decide whether to attempt the parallel method.
- Returns:
sampler (IndexSampler) – Type of wrapped array depends on the passed parameters. In all cases, if data is an array, sampler will wrap an array; if data is a DataArray, sampler will wrap a DataArray. If data is a Dataset, return a wrapped DataArray if paired=True or if the resulting Dataset has only one variable, and a Dataset otherwise.
- cmomy.resample.factory_sampler(sampler=None, *, freq=None, indices=None, nrep=None, ndat=None, nsamp=None, paired=True, rng=None, replace=True, shuffle=False, data=None, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_dims=None, mom_params=None, rep_dim='rep', parallel=None)
Factory method to create sampler.
The main intent of this function is to be called by other functions/methods that need a sampler. For example, it is used in resample_data(). You can pass in a frequency array, an IndexSampler, or a mapping to create an IndexSampler. The order of evaluation is as follows:
- sampler is an IndexSampler: return sampler.
- sampler is None:
  - if ndat is specified: return IndexSampler.from_params(...)
  - if data is specified: return IndexSampler.from_data(...)
- sampler is array-like: return IndexSampler(freq=sampler, ...)
- sampler is an int: return IndexSampler.from_data(..., nrep=sampler)
- sampler is a mapping: return factory_sampler(**sampler, data=data, axis=axis, dim=dim, mom_ndim=mom_ndim, mom_dims=mom_dims, rep_dim=rep_dim).
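The evaluation order above can be sketched as a small dispatcher. This is an illustrative stand-in only: `StandInSampler` and `describe_dispatch` are hypothetical names, the function returns a description of the branch rather than building a sampler, and the `freq`/`indices` keyword paths are deliberately omitted.

```python
from collections.abc import Mapping


class StandInSampler:
    """Hypothetical placeholder for an already-built sampler object."""


def describe_dispatch(sampler=None, *, ndat=None, data=None, **other):
    """Return a description of the branch taken for ``sampler`` (illustrative only)."""
    if isinstance(sampler, StandInSampler):
        return "return sampler unchanged"
    if sampler is None:
        if ndat is not None:
            return "build from parameters (ndat given)"
        if data is not None:
            return "build from data"
        raise ValueError("need ndat or data when sampler is None")
    if isinstance(sampler, (list, tuple)):
        return "treat sampler as a frequency array"
    if isinstance(sampler, int):
        return "treat sampler as nrep and build from data"
    if isinstance(sampler, Mapping):
        # Unpack the mapping as keyword arguments and dispatch again.
        return "unpack mapping, then: " + describe_dispatch(data=data, **sampler)
    raise TypeError(f"unsupported sampler type: {type(sampler).__name__}")


branch_int = describe_dispatch(5, data=[1.0, 2.0, 3.0])
branch_map = describe_dispatch({"nrep": 3, "ndat": 2, "rng": 0})
```

The mapping branch simply re-enters the same function with the mapping's keys as keyword arguments, which mirrors the recursive call described in the list above.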
- Parameters:
sampler (int or array-like or IndexSampler or mapping) – Passed through resample.factory_sampler() to create an IndexSampler. Value can be nrep (the number of replicates), freq (frequency array), an IndexSampler object, or a mapping of parameters. The mapping can have the form of FactoryIndexSamplerKwargs. Allowable keys are freq, indices, ndat, nrep, nsamp, paired, rng, replace, shuffle.
freq (array-like, DataArray, or Dataset of int) – Array of shape (nrep, size) where nrep is the number of replicates and size = self.shape[axis]. freq is the weight that each sample contributes to a replicate. If freq is an xarray object, it should have dimensions rep_dim and dim.
indices (array of int) – Array of shape (nrep, size). If passed, create freq from indices.
nrep (int) – Number of resample replicates.
ndat (int) – Size of data along resampled axis.
nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.
paired (bool) – If False and generating freq from nrep with data of type Dataset, generate a unique freq for each variable in data. If True, treat all variables in data as paired, and use the same freq for each.
rng (RngTypes | None, default: None) – Random number generator object. Defaults to the output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.
replace (bool) – If True, do resampling with replacement.
shuffle (bool) – If True, shuffle indices created from freq for each row.
data (array-like) – If needed, extract ndat from data. Also used if paired = True.
axis (int) – Axis to reduce/sample along.
dim (hashable) – Dimension to reduce/sample along.
mom_ndim ({1, 2}, optional) – If mom_ndim is not None, then wrap axis relative to mom_ndim. For example, with mom_ndim=2, axis = -1 will be transformed to axis = -3. If mom_dims is passed and data is an xarray object, infer mom_ndim from mom_dims.
mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Defaults to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axes if mom_axes is specified.
mom_dims (hashable or tuple of hashable) – Name of moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that mom_dims is consistent with mom_ndim. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.
mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).
rep_dim (hashable) – Name of new ‘replicated’ dimension.
parallel (bool, optional) – If True, use parallel numba (numba.njit or numba.guvectorized) code if possible. If None, use a heuristic to decide whether to attempt the parallel method.
- Returns:
Examples
>>> a = factory_sampler(nrep=3, ndat=2, rng=0)
>>> b = factory_sampler(dict(nrep=3, ndat=2, rng=0))
>>> c = factory_sampler(dict(freq=a.freq))
>>> d = factory_sampler(a)
>>> for other in [b, c, d]:
...     np.testing.assert_equal(a.freq, other.freq)
>>> assert d is a
To instead just pass indices, use:
>>> e = factory_sampler(dict(indices=a.indices))
>>> assert a.indices is e.indices
- cmomy.resample.freq_to_indices(freq, *, shuffle=False, rng=None, parallel=None)
Convert a frequency array to indices array.
This creates an “indices” array that is compatible with the “freq” array. Note that by default, the indices for a single sample (along output[k, :]) are in sorted order (something like [[0, 0, …, 1, 1, …], …]). Pass shuffle = True to randomly shuffle indices along axis=1.
- Parameters:
freq (array-like, DataArray, or Dataset of int) – Array of shape (nrep, size) where nrep is the number of replicates and size = self.shape[axis]. freq is the weight that each sample contributes to a replicate. If freq is an xarray object, it should have dimensions rep_dim and dim.
shuffle (bool) – If True, shuffle indices created from freq for each row.
rng (Union[int, Sequence[int], SeedSequence, BitGenerator, Generator, None], default: None) – Random number generator object. Defaults to the output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.
parallel (bool, optional) – If True, use parallel numba (numba.njit or numba.guvectorized) code if possible. If None, use a heuristic to decide whether to attempt the parallel method.
- Returns:
ndarray – Indices array of shape (nrep, nsamp) where nsamp = freq[k, :].sum() for any row k.
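As a rough illustration of the conversion described above, the following standard-library sketch expands each row of a frequency table into sorted indices and optionally shuffles them. The helper name `freq_row_to_indices` is hypothetical, and this is not how cmomy implements the conversion.

```python
import random


def freq_row_to_indices(freq_row, shuffle=False, rng=None):
    """Expand one row of a frequency table into a list of indices.

    Hypothetical helper for illustration; not part of cmomy.
    """
    # Repeat index k exactly freq_row[k] times; the result is in sorted order.
    indices = [k for k, count in enumerate(freq_row) for _ in range(count)]
    if shuffle:
        (rng or random.Random()).shuffle(indices)
    return indices


freq = [[2, 0, 1], [1, 1, 1]]
indices = [freq_row_to_indices(row) for row in freq]
shuffled = freq_row_to_indices(freq[0], shuffle=True, rng=random.Random(0))
```

Shuffling changes only the order within a row; the multiset of indices, and therefore the frequency table it maps back to, is unchanged.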
- cmomy.resample.indices_to_freq(indices, *, ndat=None, parallel=None)
Convert indices to frequency array.
It is assumed that indices.shape == (nrep, nsamp) with nsamp == ndat. For cases where nsamp != ndat, pass in ndat explicitly.
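A minimal sketch of why `ndat` matters when `nsamp != ndat`, using plain lists. The function `indices_to_freq_table` is a hypothetical stand-in and not cmomy's implementation.

```python
def indices_to_freq_table(indices, ndat=None):
    """Build a (nrep, ndat) frequency table from a (nrep, nsamp) indices table.

    Hypothetical helper for illustration; not part of cmomy.
    """
    nsamp = len(indices[0])
    # Without an explicit ndat, assume the data has as many points as each replicate.
    ndat = nsamp if ndat is None else ndat
    freq = [[0] * ndat for _ in indices]
    for rep, row in enumerate(indices):
        for k in row:
            freq[rep][k] += 1
    return freq


# nsamp == ndat == 3: the default works.
freq_default = indices_to_freq_table([[0, 0, 2], [1, 2, 2]])

# nsamp == 2 but the data has 5 points: ndat must be given explicitly.
freq_explicit = indices_to_freq_table([[0, 4], [3, 3]], ndat=5)
```

In the second call, leaving out `ndat` would make the sketch assume only two data points, and index 4 would fall outside the table.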
- cmomy.resample.jackknife_data(data, data_reduced=None, *, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_axes_reduced=None, mom_dims=None, mom_params=None, rep_dim='rep', out=None, dtype=None, casting='same_kind', order=None, parallel=None, axes_to_end=False, keep_attrs=None, apply_ufunc_kwargs=None)
Perform jackknife resampling of moments data.
This uses moments addition/subtraction to speed up jackknife resampling.
- Parameters:
data (ndarray or DataArray or Dataset) – Moments array(s). It is assumed moment dimensions are last.
data_reduced (array-like or DataArray, optional) – data reduced along axis or dim. This will be calculated using reduce_data() if not passed.
axis (int) – Axis to reduce/sample along.
dim (hashable) – Dimension to reduce/sample along.
mom_ndim ({1, 2}, optional) – Value indicates if moments (mom_ndim = 1) or comoments (mom_ndim = 2). If not specified and data is an xarray object, attempt to infer mom_ndim from mom_dims. Otherwise, default to mom_ndim = 1.
mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Defaults to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axes if mom_axes is specified.
mom_axes_reduced (int or sequence of int) – Location(s) of moment dimensions in data_reduced. This option is only needed if data_reduced is passed in and is an array. Defaults to mom_axes, or last dimensions of data_reduced.
mom_dims (hashable or tuple of hashable) – Name of moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that mom_dims is consistent with mom_ndim. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.
mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).
rep_dim (hashable) – Name of new ‘replicated’ dimension.
out (ndarray) – Optional output array. If specified, output will be a reference to this array. Note that if the method returns a Dataset, this option is ignored.
casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) – Controls what kind of data casting may occur.
'no' means the data types should not be cast at all.
'equiv' means only byte-order changes are allowed.
'safe' means only casts which can preserve values are allowed.
'same_kind' (default) means only safe casts or casts within a kind, like float64 to float32, are allowed.
'unsafe' means any data conversions may be done.
order ({"C", "F", "A", "K"}, optional) – Order argument. See numpy.asarray().
parallel (bool, optional) – If True, use parallel numba (numba.njit or numba.guvectorized) code if possible. If None, use a heuristic to decide whether to attempt the parallel method.
axes_to_end (bool) – If True, place sampled dimension (if it exists in output) and moment dimensions at end of output. Otherwise, place sampled dimension (if it exists in output) at the same position as input axis and moment dimensions at the same position as in input (if input does not contain moment dimensions, place them at end of array).
keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –
'drop' or False: empty attrs on returned xarray object.
'identical': all attrs must be the same on every object.
'no_conflicts': attrs from all objects are combined, any that have the same name must also have the same value.
'drop_conflicts': attrs from all objects are combined, any that have the same name but different values are dropped.
'override' or True: skip comparing and copy attrs from the first object to the result.
apply_ufunc_kwargs (dict-like) – Extra parameters to xarray.apply_ufunc(). One useful option is on_missing_core_dim, which can take the value "copy" (the default), "raise", or "drop" and controls what to do with variables of a Dataset missing core dimensions. Other options are join, dataset_join, dataset_fill_value, and dask_gufunc_kwargs. Unlisted options are handled internally.
- Returns:
out (ndarray or DataArray) – Jackknife resampled along axis. That is, out[..., axis=i, ...] is reduce_data(data[..., axis=[..., i-1, i+1, ...], ...]).
Examples
>>> import cmomy
>>> data = cmomy.default_rng(0).random((4, 3))
>>> out_jackknife = jackknife_data(data, mom_ndim=1, axis=0)
>>> out_jackknife
array([[1.5582, 0.7822, 0.2247],
       [2.1787, 0.6322, 0.22  ],
       [1.5886, 0.5969, 0.0991],
       [1.2601, 0.4982, 0.3478]])
Note that this is equivalent to (but typically faster than) resampling with a frequency table from cmomy.resample.jackknife_freq():
>>> freq = cmomy.resample.jackknife_freq(4)
>>> resample_data(data, sampler=dict(freq=freq), mom_ndim=1, axis=0)
array([[1.5582, 0.7822, 0.2247],
       [2.1787, 0.6322, 0.22  ],
       [1.5886, 0.5969, 0.0991],
       [1.2601, 0.4982, 0.3478]])
To speed up the calculation even further, pass in data_reduced:
>>> data_reduced = cmomy.reduce_data(data, mom_ndim=1, axis=0)
>>> jackknife_data(data, mom_ndim=1, axis=0, data_reduced=data_reduced)
array([[1.5582, 0.7822, 0.2247],
       [2.1787, 0.6322, 0.22  ],
       [1.5886, 0.5969, 0.0991],
       [1.2601, 0.4982, 0.3478]])
Also works with DataArray objects:
>>> xdata = xr.DataArray(data, dims=["samp", "mom"])
>>> jackknife_data(xdata, mom_ndim=1, dim="samp", rep_dim="jackknife")
<xarray.DataArray (jackknife: 4, mom: 3)> Size: 96B
array([[1.5582, 0.7822, 0.2247],
       [2.1787, 0.6322, 0.22  ],
       [1.5886, 0.5969, 0.0991],
       [1.2601, 0.4982, 0.3478]])
Dimensions without coordinates: jackknife, mom
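The idea of using subtraction to speed up the jackknife can be sketched for the simplest case, the mean: reduce once, then remove each observation's contribution in turn. This is a simplified illustration with hypothetical helper names; cmomy applies the same idea to full moments arrays, which this sketch does not attempt.

```python
def leave_one_out_means(values):
    """Leave-one-out means via a single total and one subtraction per sample."""
    n = len(values)
    total = sum(values)  # reduce once
    return [(total - x) / (n - 1) for x in values]


def leave_one_out_means_direct(values):
    """Reference version: recompute the mean of the remaining samples each time."""
    n = len(values)
    return [sum(values[:i] + values[i + 1:]) / (n - 1) for i in range(n)]


values = [1.0, 2.0, 4.0, 9.0]
fast = leave_one_out_means(values)
slow = leave_one_out_means_direct(values)
```

The subtraction route does O(n) work in total, while the direct route does O(n) work per left-out sample; passing a precomputed reduction plays the role of `total` here.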
- cmomy.resample.jackknife_freq(ndat)
Frequency array for jackknife resampling.
Use this frequency array to perform jackknife [1] resampling.
- Parameters:
ndat (int) – Size of data along resampled axis.
- Returns:
freq (ndarray) – Frequency array for jackknife resampling.
References
Examples
>>> jackknife_freq(4)
array([[0, 1, 1, 1],
       [1, 0, 1, 1],
       [1, 1, 0, 1],
       [1, 1, 1, 0]])
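The structure of this table (ones everywhere except a zero on the diagonal) can be reproduced with a short pure-Python sketch. The name `jackknife_freq_table` is hypothetical; the real function returns an ndarray.

```python
def jackknife_freq_table(ndat):
    """Row ``rep`` leaves out observation ``rep`` and keeps every other one once."""
    return [[0 if k == rep else 1 for k in range(ndat)] for rep in range(ndat)]


table = jackknife_freq_table(4)
```

Each row sums to `ndat - 1`, reflecting that every jackknife replicate uses all observations but one.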
- cmomy.resample.jackknife_vals(x, *y, data_reduced=None, mom, axis=MISSING, dim=MISSING, weight=None, mom_dims=None, mom_params=None, rep_dim='rep', out=None, dtype=None, casting='same_kind', order=None, parallel=None, axes_to_end=True, keep_attrs=None, apply_ufunc_kwargs=None)
Jackknife by value.
- Parameters:
x (array-like or DataArray or Dataset) – Values to reduce.
*y (array-like or DataArray or Dataset) – Additional values (needed if len(mom)==2). y has the same type restrictions and broadcasting rules as weight.
data_reduced (array-like or DataArray, optional) – data reduced along axis or dim. This will be calculated using reduce_vals() if not passed. Same type restrictions as weight.
mom (int or tuple of int) – Order of moments. If an integer or length-one tuple, then moments are for a single variable. If a length-2 tuple, then comoments of two variables.
axis (int) – Axis to reduce/sample along.
dim (hashable) – Dimension to reduce/sample along.
weight (array-like or DataArray or Dataset) – Optional weight. The type of weight must be “less than” the type of x.
x is Dataset: weight can be a Dataset, DataArray, or array-like
x is array-like: weight can be array-like
In the case that weight is array-like, it must broadcast to x using usual broadcasting rules (see numpy.broadcast_to()), with the following exceptions: If weight is a 1d array of length x.shape[axis], it will be formatted to broadcast along the other dimensions of x. For example, if x has shape (10, 2, 3) and weight has shape (10,), then weight will be converted to the broadcastable shape (10, 1, 1). If weight is a scalar, it will be broadcast to x.shape.
mom_dims (hashable or tuple of hashable) – Name of moment dimensions. Defaults to ("mom_0",) for mom_ndim==1 and ("mom_0", "mom_1") for mom_ndim==2.
mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).
rep_dim (hashable) – Name of new ‘replicated’ dimension.
out (ndarray) – Optional output array. If specified, output will be a reference to this array. Note that if the method returns a Dataset, this option is ignored.
casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) – Controls what kind of data casting may occur.
'no' means the data types should not be cast at all.
'equiv' means only byte-order changes are allowed.
'safe' means only casts which can preserve values are allowed.
'same_kind' (default) means only safe casts or casts within a kind, like float64 to float32, are allowed.
'unsafe' means any data conversions may be done.
order ({"C", "F", "A", "K"}, optional) – Order argument. See numpy.asarray().
parallel (bool, optional) – If True, use parallel numba (numba.njit or numba.guvectorized) code if possible. If None, use a heuristic to decide whether to attempt the parallel method.
axes_to_end (bool) – If True, place sampled dimension (if it exists in output) and moment dimensions at end of output. Otherwise, place sampled dimension (if it exists in output) at the same position as input axis and moment dimensions at the same position as in input (if input does not contain moment dimensions, place them at end of array).
keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –
'drop' or False: empty attrs on returned xarray object.
'identical': all attrs must be the same on every object.
'no_conflicts': attrs from all objects are combined, any that have the same name must also have the same value.
'drop_conflicts': attrs from all objects are combined, any that have the same name but different values are dropped.
'override' or True: skip comparing and copy attrs from the first object to the result.
apply_ufunc_kwargs (dict-like) – Extra parameters to xarray.apply_ufunc(). One useful option is on_missing_core_dim, which can take the value "copy" (the default), "raise", or "drop" and controls what to do with variables of a Dataset missing core dimensions. Other options are join, dataset_join, dataset_fill_value, and dask_gufunc_kwargs. Unlisted options are handled internally.
- Returns:
out (ndarray or DataArray) – Resampled central moments array. out.shape = (..., shape[axis-1], shape[axis+1], ..., shape[axis], mom0, ...) where shape = x.shape. That is, the resampled dimension is moved to the end, just before the moment dimensions.
Notes
Note that the resampled axis (resamp_axis) is at position -(len(mom) + 1), just before the moment axes. This is opposed to the behavior of resampling moments arrays (e.g., resample_data()), where the resampled axis is the same as the argument axis. This is because the shape of the output array when resampling values depends on the result of broadcasting x, y, and weight.
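The output layout described in the note can be sketched as a pure shape calculation. The helper `jackknife_vals_shape` is hypothetical, ignores broadcasting against `y` and `weight`, and assumes each moment dimension has length `order + 1`; treat it as an illustration of the axis placement only.

```python
def jackknife_vals_shape(x_shape, axis, mom):
    """Predict the output shape when the resampled axis moves before the moment axes.

    Hypothetical helper for illustration; not part of cmomy.
    """
    mom = (mom,) if isinstance(mom, int) else tuple(mom)
    axis = axis % len(x_shape)  # normalise negative axis values
    kept = tuple(s for i, s in enumerate(x_shape) if i != axis)
    # Assumption: a moment dimension of order m has length m + 1.
    mom_shape = tuple(m + 1 for m in mom)
    # For the jackknife, the number of replicates equals the sampled axis length.
    return kept + (x_shape[axis],) + mom_shape


shape_single = jackknife_vals_shape((10, 2, 3), axis=0, mom=2)
shape_comom = jackknife_vals_shape((10, 2, 3), axis=0, mom=(2, 2))
```

In both cases the replicate axis sits at position `-(len(mom) + 1)` of the result.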
- cmomy.resample.random_freq(nrep, ndat, nsamp=None, rng=None, replace=True, parallel=None)
Create frequencies for random resampling (bootstrapping).
- Parameters:
nrep (int) – Number of resample replicates.
ndat (int) – Size of data along resampled axis.
nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.
rng (Union[int, Sequence[int], SeedSequence, BitGenerator, Generator, None], default: None) – Random number generator object. Defaults to the output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.
replace (bool, default: True) – Whether to allow replacement.
parallel (bool | None, default: None) – If True, use parallel numba (numba.njit or numba.guvectorized) code if possible. If None, use a heuristic to decide whether to attempt the parallel method.
- Returns:
freq (ndarray) – Frequency array. freq[rep, k] is the number of times to sample from the k-th observation for replicate rep.
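A bootstrap frequency table can be sketched with the standard library by drawing indices and counting them. The function `random_freq_table` and its `seed` argument are hypothetical; cmomy uses NumPy generators and its own routines, so the specific draws will differ.

```python
import random


def random_freq_table(nrep, ndat, nsamp=None, seed=None, replace=True):
    """Draw ``nsamp`` indices per replicate and count how often each one occurs.

    Hypothetical helper for illustration; not part of cmomy.
    """
    rng = random.Random(seed)
    nsamp = ndat if nsamp is None else nsamp
    table = []
    for _ in range(nrep):
        if replace:
            picks = rng.choices(range(ndat), k=nsamp)
        else:
            picks = rng.sample(range(ndat), k=nsamp)
        row = [0] * ndat
        for k in picks:
            row[k] += 1
        table.append(row)
    return table


freq = random_freq_table(nrep=3, ndat=5, seed=0)
freq_no_replace = random_freq_table(nrep=2, ndat=4, seed=1, replace=False)
```

Every row sums to `nsamp`; without replacement and with `nsamp == ndat`, each observation is used exactly once.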
- cmomy.resample.random_indices(nrep, ndat, nsamp=None, rng=None, replace=True)
Create indices for random resampling (bootstrapping).
- Parameters:
nrep (int) – Number of resample replicates.
ndat (int) – Size of data along resampled axis.
nsamp (int) – Number of samples in a single resampled replicate. Defaults to size of data along sampled axis.
rng (Union[int, Sequence[int], SeedSequence, BitGenerator, Generator, None], default: None) – Random number generator object. Defaults to the output of default_rng(). If a seed value is passed, a new Generator object is created with this seed.
replace (bool, default: True) – Whether to allow replacement.
- Returns:
indices (ndarray) – Index array of integers of shape (nrep, nsamp).
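The shape contract above, (nrep, nsamp) with entries in [0, ndat), can be illustrated with a standard-library sketch. `random_index_table` and its `seed` argument are hypothetical stand-ins rather than the library's implementation.

```python
import random


def random_index_table(nrep, ndat, nsamp=None, seed=None, replace=True):
    """Return ``nrep`` rows of ``nsamp`` indices drawn from ``range(ndat)``.

    Hypothetical helper for illustration; not part of cmomy.
    """
    rng = random.Random(seed)
    nsamp = ndat if nsamp is None else nsamp
    # choices() draws with replacement, sample() draws without replacement.
    draw = rng.choices if replace else rng.sample
    return [draw(range(ndat), k=nsamp) for _ in range(nrep)]


indices = random_index_table(nrep=4, ndat=10, nsamp=3, seed=0)
unique_rows = random_index_table(nrep=2, ndat=6, seed=0, replace=False)
```

With `replace=False`, each row is a permutation of a subset of the data indices, so `nsamp` cannot exceed `ndat`.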
- cmomy.resample.resample_data(data, *, sampler, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_dims=None, mom_params=None, rep_dim='rep', out=None, dtype=None, casting='same_kind', order=None, parallel=None, axes_to_end=False, keep_attrs=None, apply_ufunc_kwargs=None)
Resample and reduce data.
- Parameters:
data (ndarray or DataArray or Dataset) – Moments array(s). It is assumed moment dimensions are last.
sampler (int or array-like or IndexSampler or mapping) – Passed through resample.factory_sampler() to create an IndexSampler. Value can be nrep (the number of replicates), freq (frequency array), an IndexSampler object, or a mapping of parameters. The mapping can have the form of FactoryIndexSamplerKwargs. Allowable keys are freq, indices, ndat, nrep, nsamp, paired, rng, replace, shuffle.
axis (int) – Axis to reduce/sample along.
dim (hashable) – Dimension to reduce/sample along.
mom_ndim ({1, 2}, optional) – Value indicates if moments (mom_ndim = 1) or comoments (mom_ndim = 2). If not specified and data is an xarray object, attempt to infer mom_ndim from mom_dims. Otherwise, default to mom_ndim = 1.
mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Defaults to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axes if mom_axes is specified.
mom_dims (hashable or tuple of hashable) – Name of moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that mom_dims is consistent with mom_ndim. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.
mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).
rep_dim (hashable) – Name of new ‘replicated’ dimension.
out (ndarray) – Optional output array. If specified, output will be a reference to this array. Note that if the method returns a Dataset, this option is ignored.
casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) – Controls what kind of data casting may occur.
'no' means the data types should not be cast at all.
'equiv' means only byte-order changes are allowed.
'safe' means only casts which can preserve values are allowed.
'same_kind' (default) means only safe casts or casts within a kind, like float64 to float32, are allowed.
'unsafe' means any data conversions may be done.
order ({"C", "F", "A", "K"}, optional) – Order argument. See numpy.asarray().
parallel (bool, optional) – If True, use parallel numba (numba.njit or numba.guvectorized) code if possible. If None, use a heuristic to decide whether to attempt the parallel method.
axes_to_end (bool) – If True, place sampled dimension (if it exists in output) and moment dimensions at end of output. Otherwise, place sampled dimension (if it exists in output) at the same position as input axis and moment dimensions at the same position as in input (if input does not contain moment dimensions, place them at end of array).
keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –
'drop' or False: empty attrs on returned xarray object.
'identical': all attrs must be the same on every object.
'no_conflicts': attrs from all objects are combined, any that have the same name must also have the same value.
'drop_conflicts': attrs from all objects are combined, any that have the same name but different values are dropped.
'override' or True: skip comparing and copy attrs from the first object to the result.
apply_ufunc_kwargs (dict-like) – Extra parameters to xarray.apply_ufunc(). One useful option is on_missing_core_dim, which can take the value "copy" (the default), "raise", or "drop" and controls what to do with variables of a Dataset missing core dimensions. Other options are join, dataset_join, dataset_fill_value, and dask_gufunc_kwargs. Unlisted options are handled internally.
- Returns:
out (ndarray or DataArray) – Resampled central moments. out.shape = (..., shape[axis-1], nrep, shape[axis+1], ...), where shape = data.shape and nrep = sampler.nrep.
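What "resample and reduce" means for moments data can be sketched for the lowest orders. The sketch assumes each row of the data holds `[weight, mean, variance]` (an assumption about the layout, stated here rather than taken from this page) and pools rows using each replicate's frequencies as extra weights. The helper names are hypothetical and this is not cmomy's algorithm.

```python
def pool_moments(rows, weights):
    """Combine [weight, mean, variance] rows, scaling each row's weight by ``weights``."""
    total = sum(f * w for f, (w, _, _) in zip(weights, rows))
    mean = sum(f * w * m for f, (w, m, _) in zip(weights, rows)) / total
    # Pooled variance: within-row variance plus squared offset from the pooled mean.
    var = sum(
        f * w * (v + (m - mean) ** 2) for f, (w, m, v) in zip(weights, rows)
    ) / total
    return [total, mean, var]


def resample_moments(rows, freq):
    """One pooled [weight, mean, variance] triple per replicate (row of ``freq``)."""
    return [pool_moments(rows, freq_row) for freq_row in freq]


# Three single observations (weight 1, zero variance) with values 1, 2, and 6.
rows = [[1.0, 1.0, 0.0], [1.0, 2.0, 0.0], [1.0, 6.0, 0.0]]
freq = [[1, 1, 1], [2, 0, 1]]
out = resample_moments(rows, freq)
```

The sampled axis of length 3 is replaced by an axis of length `nrep = 2`, matching the output shape rule given in the Returns entry.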
- cmomy.resample.resample_vals(x, *y, sampler, mom, axis=MISSING, dim=MISSING, weight=None, mom_dims=None, mom_params=None, rep_dim='rep', out=None, dtype=None, casting='same_kind', order=None, parallel=None, axes_to_end=True, keep_attrs=None, apply_ufunc_kwargs=None)[source]#
Resample and reduce values.
- Parameters:
x (array-like or
DataArray
orDataset
) – Values to reduce.*y (array-like or
DataArray
orDataset
) – Additional values (needed iflen(mom)==2
).y
has same type restrictions and broadcasting rules asweight
.sampler (
int
or array-like orIndexSampler
or mapping) – Passed throughresample.factory_sampler()
to create anIndexSampler
. Value can either benrep
(the number of replicates),freq
(frequency array), aIndexSampler
object, or a mapping of parameters. The mapping can have form ofFactoryIndexSamplerKwargs
. Allowable keys arefreq
,indices
,ndat
,nrep
,nsamp
,paired
,rng
,replace
,shuffle
.mom (
int
ortuple
ofint
) – Order or moments. If integer or length one tuple, then moments are for a single variable. If length 2 tuple, then comoments of two variablesaxis (
int
) – Axis to reduce/sample along.dim (hashable) – Dimension to reduce/sample along.
weight (array-like or
DataArray
orDataset
) –Optional weight. The type of
weight
must be “less than” the type ofx
.x
isDataset
:weight
can be aDataset
,DataArray
, or array-likex
is array-like:weight
can be array-like
In the case that
weight
is array-like, it must broadcast tox
using usual broadcasting rules (seenumpy.broadcast_to()
), with the following exceptions: Ifweight
is a 1d array of lengthx.shape[axis]]
, it will be formatted to broadcast along the other dimensions ofx
. For example, ifx
has shape(10, 2, 3)
andweight
has shape(10,)
, thenweight
will be converted to the broadcastable shape(10, 1, 1)
. Ifweight
is a scalar, it will be broadcast tox.shape
.mom_dims (hashable or
tuple
of hashable) – Name of moment dimensions. Defaults to("mom_0",)
formom_ndim==1
and(mom_0, mom_1)
formom_ndim==2
mom_params (
cmomy.MomParams
orcmomy.MomParamsDict
ordict
, optional) – Moment parameters. You can set moment parametersaxes
anddims
using this option. For example, passingmom_params={"dim": ("a", "b")}
is equivalent to passingmom_dims=("a", "b")
. You can also pass as acmomy.MomParams
object withmom_params=cmomy.MomParams(dims=("a", "b"))
.rep_dim (hashable) – Name of new ‘replicated’ dimension:
out (ndarray) – Optional output array. If specified, output will be a reference to this array. Note that if the method returns a Dataset, this option is ignored.
casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) – Controls what kind of data casting may occur.
'no' means the data types should not be cast at all.
'equiv' means only byte-order changes are allowed.
'safe' means only casts which can preserve values are allowed.
'same_kind' means only safe casts or casts within a kind, like float64 to float32, are allowed.
'unsafe' (default) means any data conversions may be done.
order ({"C", "F"}, optional) – Order argument. See numpy.zeros().
parallel (bool, optional) – If True, use parallel numba (numba.njit or numba.guvectorize) code if possible. If None, use a heuristic to decide whether to attempt the parallel method.
axes_to_end (bool) – If True, place sampled dimension (if exists in output) and moment dimensions at end of output. Otherwise, place sampled dimension (if exists in output) at same position as input axis and moment dimensions at same position as input (if input does not contain moment dimensions, place them at end of array).
keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –
'drop' or False: empty attrs on returned xarray object.
'identical': all attrs must be the same on every object.
'no_conflicts': attrs from all objects are combined, any that have the same name must also have the same value.
'drop_conflicts': attrs from all objects are combined, any that have the same name but different values are dropped.
'override' or True: skip comparing and copy attrs from the first object to the result.
apply_ufunc_kwargs (dict-like) – Extra parameters to xarray.apply_ufunc(). One useful option is on_missing_core_dim, which can take the value "copy" (the default), "raise", or "drop" and controls what to do with variables of a Dataset missing core dimensions. Other options are join, dataset_join, dataset_fill_value, and dask_gufunc_kwargs. Unlisted options are handled internally.
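The special handling of a 1d weight described under the weight parameter can be sketched with plain NumPy. The helper align_weight below is hypothetical (it is not part of cmomy) and only illustrates the shape rule as documented; the library's own implementation may differ.

```python
import numpy as np


def align_weight(weight, x, axis):
    """Hypothetical helper: shape ``weight`` so it broadcasts against ``x``.

    Mirrors the documented rule only; not cmomy's implementation.
    """
    weight = np.asarray(weight)
    axis = axis % x.ndim  # normalize a negative axis
    if weight.ndim == 0:
        # Scalar weight: broadcast to the full shape of x.
        return np.broadcast_to(weight, x.shape)
    if weight.ndim == 1 and weight.shape[0] == x.shape[axis]:
        # 1d weight along the sampled axis: insert singleton dimensions
        # everywhere else so it broadcasts along the other dimensions.
        shape = [1] * x.ndim
        shape[axis] = x.shape[axis]
        return weight.reshape(shape)
    # Anything else is left to the usual NumPy broadcasting rules.
    return weight


x = np.zeros((10, 2, 3))
w = align_weight(np.arange(10.0), x, axis=0)
print(w.shape)  # (10, 1, 1)
print(np.broadcast_to(w, x.shape).shape)  # (10, 2, 3)
```

Without the reshape, a weight of shape (10,) would not broadcast against (10, 2, 3), because NumPy aligns shapes from the trailing dimension.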
- Returns:
out (ndarray or DataArray) – Resampled central moments array. out.shape = (..., shape[axis-1], nrep, shape[axis+1], ...), where shape = x.shape and nrep = sampler.nrep. This can be overridden by setting axes_to_end.
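The casting modes listed under Parameters use the same names as NumPy's casting rules. Assuming the option is interpreted the way NumPy interprets it (an assumption; the text above only lists the modes), numpy.can_cast() shows what each mode permits:

```python
import numpy as np

little = np.dtype("<f8")  # float64, little-endian
big = np.dtype(">f8")     # float64, big-endian
f64 = np.dtype(np.float64)
f32 = np.dtype(np.float32)
i64 = np.dtype(np.int64)

results = {
    # A byte-order change is rejected by 'no' but allowed by 'equiv'.
    "no": np.can_cast(little, big, casting="no"),
    "equiv": np.can_cast(little, big, casting="equiv"),
    # float64 -> float32 can lose precision: not 'safe', but 'same_kind'.
    "safe": np.can_cast(f64, f32, casting="safe"),
    "same_kind": np.can_cast(f64, f32, casting="same_kind"),
    # float64 -> int64 changes kind: only 'unsafe' allows it.
    "same_kind_float_to_int": np.can_cast(f64, i64, casting="same_kind"),
    "unsafe": np.can_cast(f64, i64, casting="unsafe"),
}

for mode, allowed in results.items():
    print(f"{mode}: {allowed}")
```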
Notes
Note that the resampled axis (resamp_axis) is at position -(len(mom) + 1), just before the moment axes. This is opposed to the behavior of resampling moments arrays (e.g., resample_data()), where the resampled axis is the same as the argument axis. This is because the shape of the output array when resampling values depends on the result of broadcasting x, y, and weight.
- cmomy.resample.select_ndat(data, *, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_dims=None, mom_params=None)[source]#
Determine ndat from array.
- Parameters:
axis (int) – Axis to reduce/sample along.
dim (hashable) – Dimension to reduce/sample along.
mom_ndim ({1, 2}, optional) – If mom_ndim is not None, then wrap axis relative to mom_ndim. For example, with mom_ndim=2, axis = -1 will be transformed to axis = -3. If mom_dims is passed and data is an xarray object, infer mom_ndim from mom_dims.
mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Defaults to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axes if mom_axes is specified.
mom_dims (hashable or tuple of hashable) – Name of moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that mom_dims is consistent with mom_ndim. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.
mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set moment parameters axes and dims using this option. For example, passing mom_params={"dims": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).
- Returns:
int – Size of data along the specified axis or dim.
Examples
>>> data = np.zeros((2, 3, 4))
>>> select_ndat(data, axis=1)
3
To wrap relative to the last mom_ndim dimensions of data, use a complex axis:
>>> select_ndat(data, axis=-1j, mom_ndim=2)
2
>>> xdata = xr.DataArray(data, dims=["x", "y", "mom"])
>>> select_ndat(xdata, dim="y")
3
>>> select_ndat(xdata, dim="mom", mom_ndim=1)
Traceback (most recent call last):
...
ValueError: Cannot select moment dimension. dim='mom', axis=2.
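The axis arithmetic described for mom_ndim (axis = -1 becoming axis = -3 when mom_ndim=2) can be sketched in pure Python. The function wrap_axis below is a hypothetical illustration of that arithmetic, not cmomy's implementation: negative axes are counted from the end of the non-moment dimensions, and any axis that would land on a moment dimension is rejected.

```python
def wrap_axis(axis: int, ndim: int, mom_ndim: int = 0) -> int:
    """Hypothetical helper: non-negative index of ``axis`` in an array with
    ``ndim`` dimensions whose last ``mom_ndim`` dimensions hold moments."""
    n_val = ndim - mom_ndim  # number of non-moment dimensions
    if not -n_val <= axis < n_val:
        msg = f"axis={axis} is out of range or selects a moment dimension"
        raise ValueError(msg)
    # Negative axes wrap relative to the non-moment dimensions only.
    return axis % n_val


shape = (2, 3, 4)
idx = wrap_axis(-1, ndim=len(shape), mom_ndim=2)
# Equivalent negative axis in the full array, and the size along it.
print(idx - len(shape), shape[idx])  # -3 2
```

With mom_ndim=0 the helper reduces to ordinary negative-axis normalization, so wrap_axis(-1, ndim=3) selects the last dimension.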