Submodule to work with grouped data.
Functions:
block_by – Get group by array for block reduction.
factor_by – Factor by into codes and groups.
factor_by_to_index – Transform group_idx to quantities to be used with reduce_data_indexed().
reduce_data_grouped – Reduce data by group.
reduce_data_indexed – Reduce data by index.
resample_data_indexed – Resample using indexed reduction.
- cmomy.grouped.block_by(ndat, block, mode='drop_last')
Get group by array for block reduction.
- Parameters:
  ndat (int) – Size of by.
  block (int) – Block size. A negative value indicates a single block.
  mode ({"drop_first", "drop_last", "expand_first", "expand_last"}) – What to do if ndat does not divide evenly by block.
    "drop_first": drop leading samples.
    "drop_last": drop trailing samples.
    "expand_first": expand the size of the first block.
    "expand_last": expand the size of the last block.
- Returns:
  by (ndarray) – Group array for block reduction.
Examples
>>> block_by(5, 2)
array([ 0,  0,  1,  1, -1])
>>> block_by(5, 2, mode="drop_first")
array([-1,  0,  0,  1,  1])
>>> block_by(5, 2, mode="expand_first")
array([0, 0, 0, 1, 1])
>>> block_by(5, 2, mode="expand_last")
array([0, 0, 1, 1, 1])
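For intuition about the mode options, here is a minimal NumPy sketch that reproduces the arrays above. It is illustrative only; block_by_sketch is a made-up name and this is not the library's implementation.

import numpy as np

def block_by_sketch(ndat, block, mode="drop_last"):
    # Assign each of ndat samples to consecutive blocks of size `block`.
    # Leftover samples are dropped (code -1) or folded into the first/last
    # block, matching the modes documented above.
    nblock, leftover = divmod(ndat, block)
    by = np.repeat(np.arange(nblock), block)
    head, tail = {
        "drop_first": (np.full(leftover, -1), by),
        "drop_last": (by, np.full(leftover, -1)),
        "expand_first": (np.full(leftover, 0), by),
        "expand_last": (by, np.full(leftover, nblock - 1)),
    }[mode]
    return np.concatenate([head, tail])

block_by_sketch(5, 2)                      # array([ 0,  0,  1,  1, -1])
block_by_sketch(5, 2, mode="expand_last")  # array([0, 0, 1, 1, 1])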
- cmomy.grouped.factor_by(by, sort=True)
Factor by into codes and groups.
- Parameters:
  by (sequence) – Values to group by. Negative or None values indicate that the value should be skipped. Note that if by is a pandas.Index object, missing values should be marked with None only.
  sort (bool, default True) – If True (default), sort groups. If False, return groups in order of first appearance.
- Returns:
  groups (list or pandas.Index) – Unique group names (excluding negative or None values).
  codes (ndarray of int) – Integer codes mapping each element of by to its position in groups, with -1 marking skipped (negative or None) values.
Examples
>>> by = [1, 1, 0, -1, 0, 2, 2]
>>> groups, codes = factor_by(by, sort=False)
>>> groups
[1, 0, 2]
>>> codes
array([ 0,  0,  1, -1,  1,  2,  2])
Note that with sort=False, groups are in order of first appearance.
>>> groups, codes = factor_by(by)
>>> groups
[0, 1, 2]
>>> codes
array([ 1,  1,  0, -1,  0,  2,  2])
This also works for sequences of non-integers.
>>> by = ["a", "a", None, "c", "c", -1] >>> groups, codes = factor_by(by) >>> groups ['a', 'c'] >>> codes array([ 0, 0, -1, 1, 1, -1])
And for pandas.Index objects:
>>> import pandas as pd
>>> by = pd.Index(["a", "a", None, "c", "c", None])
>>> groups, codes = factor_by(by)
>>> groups
Index(['a', 'c'], dtype='object')
>>> codes
array([ 0,  0, -1,  1,  1, -1])
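For intuition, factor_by behaves much like pandas.factorize with negative and None entries treated as missing. A rough equivalent for plain sequences, written under that assumption; factor_by_sketch is a made-up name and not part of cmomy.

import numpy as np
import pandas as pd

def factor_by_sketch(by, sort=True):
    # Map "skip" markers (None or negative numbers) to missing values,
    # then factorize the remainder into integer codes and unique groups.
    cleaned = [
        None if v is None or (isinstance(v, (int, np.integer)) and v < 0) else v
        for v in by
    ]
    codes, groups = pd.factorize(np.asarray(cleaned, dtype=object), sort=sort)
    return list(groups), codes

factor_by_sketch([1, 1, 0, -1, 0, 2, 2])
# ([0, 1, 2], array([ 1,  1,  0, -1,  0,  2,  2])) -- same as the factor_by example above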
- cmomy.grouped.factor_by_to_index(by, **kwargs)
Transform group_idx to quantities to be used with reduce_data_indexed().
- Parameters:
  by (array-like) – Values to factor.
  **kwargs – Extra arguments to numpy.argsort().
- Returns:
  groups (list or pandas.Index) – Unique groups in group_idx (excluding negative or None values).
  index (ndarray) – Indexing array. index[start[k]:end[k]] are the indices belonging to group groups[k].
  start (ndarray) – See index.
  end (ndarray) – See index.
Examples
>>> factor_by_to_index([0, 1, 0, 1])
([0, 1], array([0, 2, 1, 3]), array([0, 2]), array([2, 4]))
>>> factor_by_to_index(["a", "b", "a", "b"])
(['a', 'b'], array([0, 2, 1, 3]), array([0, 2]), array([2, 4]))
Also, missing values (None or negative) are excluded:
>>> factor_by_to_index([None, "a", None, "b"])
(['a', 'b'], array([1, 3]), array([0, 1]), array([1, 2]))
You can also pass pandas.Index objects:
>>> factor_by_to_index(pd.Index([None, "a", None, "b"], name="my_index"))
(Index(['a', 'b'], dtype='object', name='my_index'), array([1, 3]), array([0, 1]), array([1, 2]))
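The relationship between the returned quantities can be pictured as sorting sample positions by group code: index[start[k]:end[k]] holds the positions belonging to groups[k]. A sketch of that relationship built on the documented factor_by; it is illustrative only, and factor_by_to_index_sketch is a made-up name, not the library's implementation.

import numpy as np
import cmomy

def factor_by_to_index_sketch(by):
    # Derive (groups, index, start, end) from factor_by-style codes.
    groups, codes = cmomy.grouped.factor_by(by)
    keep = np.nonzero(codes >= 0)[0]                # positions of non-missing values
    order = np.argsort(codes[keep], kind="stable")  # stable sort keeps within-group order
    index = keep[order]
    counts = np.bincount(codes[keep], minlength=len(groups))
    end = np.cumsum(counts)
    start = end - counts
    return groups, index, start, end

factor_by_to_index_sketch([None, "a", None, "b"])
# (['a', 'b'], array([1, 3]), array([0, 1]), array([1, 2])) -- matches the example above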
- cmomy.grouped.reduce_data_grouped(data, by, *, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_dims=None, mom_params=None, out=None, dtype=None, casting='same_kind', order=None, parallel=None, axes_to_end=False, group_dim=None, groups=None, keep_attrs=None, apply_ufunc_kwargs=None)
Reduce data by group.
- Parameters:
  data (ndarray or DataArray or Dataset) – Moments array(s). It is assumed that the moment dimensions are last.
  by (array-like of int) – Group-by values of the same length as data along the sampled dimension. Negative values indicate no group (i.e., skip this index).
  axis (int, optional) – Axis to reduce/sample along. Note that negative values are relative to data.ndim - mom_ndim. It is assumed that the last dimensions are for moments. For example, if data.shape == (1, 2, 3) with mom_ndim=1, axis=-1 would be equivalent to axis=1. Defaults to axis=-1.
  dim (hashable) – Dimension to reduce/sample along.
  mom_ndim ({1, 2}, optional) – Indicates moments (mom_ndim=1) or comoments (mom_ndim=2). If not specified and data is an xarray object, attempt to infer mom_ndim from mom_dims. Otherwise, default to mom_ndim=1.
  mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Defaults to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axes if mom_axes is specified.
  mom_dims (hashable or tuple of hashable) – Name of the moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that mom_dims is consistent with it. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.
  mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set the moment parameters axes and dims using this option. For example, passing mom_params={"dim": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).
  out (ndarray) – Optional output array. If specified, the output will be a reference to this array. Note that if the method returns a Dataset, this option is ignored.
  casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) – Controls what kind of data casting may occur.
    'no' means the data types should not be cast at all.
    'equiv' means only byte-order changes are allowed.
    'safe' means only casts which can preserve values are allowed.
    'same_kind' means only safe casts or casts within a kind, like float64 to float32, are allowed.
    'unsafe' means any data conversions may be done.
  order ({"C", "F"}, optional) – Order argument. See numpy.zeros().
  axes_to_end (bool) – If True, place the sampled dimension (if it exists in the output) and the moment dimensions at the end of the output. Otherwise, place the sampled dimension (if it exists in the output) at the same position as the input axis, and the moment dimensions at the same positions as in the input (if the input does not contain moment dimensions, place them at the end of the array).
  parallel (bool, optional) – If True, use parallel numba (numba.njit or numba.guvectorize) code if possible. If None, use a heuristic to decide whether to use the parallel method.
  group_dim (str, optional) – Name of the output group dimension. Defaults to dim.
  groups (sequence, optional) – Sequence of length by.max() + 1 to assign as coordinates for group_dim.
  keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –
    'drop' or False: empty attrs on the returned xarray object.
    'identical': all attrs must be the same on every object.
    'no_conflicts': attrs from all objects are combined; any that have the same name must also have the same value.
    'drop_conflicts': attrs from all objects are combined; any that have the same name but different values are dropped.
    'override' or True: skip comparing and copy attrs from the first object to the result.
  apply_ufunc_kwargs (dict-like) – Extra parameters to xarray.apply_ufunc(). One useful option is on_missing_core_dim, which can take the value "copy" (the default), "raise", or "drop", and controls what to do with variables of a Dataset missing core dimensions. Other options are join, dataset_join, dataset_fill_value, and dask_gufunc_kwargs. Unlisted options are handled internally.
- Returns:
  out (ndarray or DataArray or Dataset) – Reduced data of the same type as the input data, with shape out.shape = (..., shape[axis-1], ngroup, shape[axis+1], ..., mom0, ...), where shape = data.shape and ngroup = by.max() + 1.
Examples
>>> import cmomy
>>> data = np.ones((5, 3))
>>> by = [0, 0, -1, 1, -1]
>>> reduce_data_grouped(data, mom_ndim=1, axis=0, by=by)
array([[2., 1., 1.],
       [1., 1., 1.]])
This also works for DataArray objects. In this case, the groups are added as coordinates to group_dim:
>>> xout = xr.DataArray(data, dims=["rec", "mom"])
>>> reduce_data_grouped(xout, mom_ndim=1, dim="rec", by=by, group_dim="group")
<xarray.DataArray (group: 2, mom: 3)> Size: 48B
array([[2., 1., 1.],
       [1., 1., 1.]])
Dimensions without coordinates: group, mom
Note that if by skips some groups, they will still be included in the output. For example, the following by skips the value 0.
>>> by = [1, 1, -1, 2, 2]
>>> reduce_data_grouped(xout, mom_ndim=1, dim="rec", by=by)
<xarray.DataArray (rec: 3, mom: 3)> Size: 72B
array([[0., 0., 0.],
       [2., 1., 1.],
       [2., 1., 1.]])
Dimensions without coordinates: rec, mom
If you want to ensure that only included groups are used, use cmomy.grouped.factor_by(). This has the added benefit of working with non-integer groups as well.
>>> by = ["a", "a", None, "b", "b"]
>>> groups, codes = cmomy.grouped.factor_by(by)
>>> reduce_data_grouped(xout, mom_ndim=1, dim="rec", by=codes, groups=groups)
<xarray.DataArray (rec: 2, mom: 3)> Size: 48B
array([[2., 1., 1.],
       [2., 1., 1.]])
Coordinates:
  * rec      (rec) <U1 8B 'a' 'b'
Dimensions without coordinates: mom
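A common pattern is to combine block_by with reduce_data_grouped to reduce a moments array over non-overlapping blocks. A usage sketch based on the signatures documented above (only the output shape is noted):

import numpy as np
import cmomy

data = np.ones((6, 3))                   # (nsamp, mom0) moments array, mom_ndim=1
by = cmomy.grouped.block_by(6, block=2)  # array([0, 0, 1, 1, 2, 2])
out = cmomy.grouped.reduce_data_grouped(data, mom_ndim=1, axis=0, by=by)
# out.shape == (3, 3): one reduced row of moments per block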
- cmomy.grouped.reduce_data_indexed(data, *, index, group_start, group_end, scale=None, axis=MISSING, dim=MISSING, mom_ndim=None, mom_axes=None, mom_dims=None, mom_params=None, out=None, dtype=None, casting='same_kind', order=None, parallel=None, axes_to_end=False, coords_policy='first', group_dim=None, groups=None, keep_attrs=None, apply_ufunc_kwargs=None)
Reduce data by index.
- Parameters:
  data (ndarray or DataArray or Dataset) – Moments array(s). It is assumed that the moment dimensions are last.
  index (ndarray) – Index into data.shape[axis].
  group_start (ndarray) – Start of the index range for each group. index[group_start[group]:group_end[group]] are the indices for group group.
  group_end (ndarray) – End of the index range for each group. index[group_start[group]:group_end[group]] are the indices for group group.
  scale (ndarray, optional) – Weights of the same size as index.
  axis (int, optional) – Axis to reduce/sample along. Note that negative values are relative to data.ndim - mom_ndim. It is assumed that the last dimensions are for moments. For example, if data.shape == (1, 2, 3) with mom_ndim=1, axis=-1 would be equivalent to axis=1. Defaults to axis=-1.
  dim (hashable) – Dimension to reduce/sample along.
  mom_ndim ({1, 2}, optional) – Indicates moments (mom_ndim=1) or comoments (mom_ndim=2). If not specified and data is an xarray object, attempt to infer mom_ndim from mom_dims. Otherwise, default to mom_ndim=1.
  mom_axes (int or tuple of int, optional) – Location of the moment dimensions. Defaults to (-mom_ndim, -mom_ndim+1, ...). If specified and mom_ndim is None, set mom_ndim to len(mom_axes). Note that if mom_axes is specified, negative values are relative to the end of the array. This is also the case for axes if mom_axes is specified.
  mom_dims (hashable or tuple of hashable) – Name of the moment dimensions. If specified, infer mom_ndim from mom_dims. If mom_ndim is also passed, check that mom_dims is consistent with it. If not specified, defaults to data.dims[-mom_ndim:]. This is primarily used if data is a Dataset, or if mom_dims are not the last dimensions.
  mom_params (cmomy.MomParams or cmomy.MomParamsDict or dict, optional) – Moment parameters. You can set the moment parameters axes and dims using this option. For example, passing mom_params={"dim": ("a", "b")} is equivalent to passing mom_dims=("a", "b"). You can also pass a cmomy.MomParams object with mom_params=cmomy.MomParams(dims=("a", "b")).
  out (ndarray) – Optional output array. If specified, the output will be a reference to this array. Note that if the method returns a Dataset, this option is ignored.
  casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) – Controls what kind of data casting may occur.
    'no' means the data types should not be cast at all.
    'equiv' means only byte-order changes are allowed.
    'safe' means only casts which can preserve values are allowed.
    'same_kind' means only safe casts or casts within a kind, like float64 to float32, are allowed.
    'unsafe' means any data conversions may be done.
  order ({"C", "F", "A", "K"}, optional) – Order argument. See numpy.asarray().
  parallel (bool, optional) – If True, use parallel numba (numba.njit or numba.guvectorize) code if possible. If None, use a heuristic to decide whether to use the parallel method.
  axes_to_end (bool) – If True, place the sampled dimension (if it exists in the output) and the moment dimensions at the end of the output. Otherwise, place the sampled dimension (if it exists in the output) at the same position as the input axis, and the moment dimensions at the same positions as in the input (if the input does not contain moment dimensions, place them at the end of the array).
  coords_policy ({'first', 'last', 'group', None}) – Policy for handling coordinates along dim for DataArray data. If there are no coordinates, do nothing; otherwise use:
    'first': select the first value of the coordinate for each block.
    'last': select the last value of the coordinate for each block.
    'group': assign unique groups from group_idx to dim.
    None: drop any coordinates.
    Note that if coords_policy is 'first' or 'last', the parameter groups will be ignored.
  group_dim (str, optional) – Name of the output group dimension. Defaults to dim.
  groups (sequence, optional) – Sequence of length len(group_start) to assign as coordinates for group_dim.
  keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –
    'drop' or False: empty attrs on the returned xarray object.
    'identical': all attrs must be the same on every object.
    'no_conflicts': attrs from all objects are combined; any that have the same name must also have the same value.
    'drop_conflicts': attrs from all objects are combined; any that have the same name but different values are dropped.
    'override' or True: skip comparing and copy attrs from the first object to the result.
  apply_ufunc_kwargs (dict-like) – Extra parameters to xarray.apply_ufunc(). One useful option is on_missing_core_dim, which can take the value "copy" (the default), "raise", or "drop", and controls what to do with variables of a Dataset missing core dimensions. Other options are join, dataset_join, dataset_fill_value, and dask_gufunc_kwargs. Unlisted options are handled internally.
- Returns:
  out (ndarray or DataArray or Dataset) – Reduced data of the same type as the input data, with shape out.shape = (..., shape[axis-1], ngroup, shape[axis+1], ..., mom0, ...), where shape = data.shape and ngroup = len(group_start).
Examples
This is a more general reduction than reduce_data_grouped(), but it can be used similarly.
>>> import cmomy
>>> data = np.ones((5, 3))
>>> by = ["a", "a", "b", "b", "c"]
>>> groups, index, start, end = cmomy.grouped.factor_by_to_index(by)
>>> reduce_data_indexed(
...     data, mom_ndim=1, axis=0, index=index, group_start=start, group_end=end
... )
array([[2., 1., 1.],
       [2., 1., 1.],
       [1., 1., 1.]])
This also works for DataArray objects:
>>> xout = xr.DataArray(data, dims=["rec", "mom"])
>>> reduce_data_indexed(
...     xout,
...     mom_ndim=1,
...     dim="rec",
...     index=index,
...     group_start=start,
...     group_end=end,
...     group_dim="group",
...     groups=groups,
...     coords_policy="group",
... )
<xarray.DataArray (group: 3, mom: 3)> Size: 72B
array([[2., 1., 1.],
       [2., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * group    (group) <U1 12B 'a' 'b' 'c'
Dimensions without coordinates: mom
- cmomy.grouped.resample_data_indexed(data, sampler, *, mom_ndim=None, axis=MISSING, mom_axes=None, out=None, dtype=None, casting='same_kind', order=None, parallel=True, axes_to_end=False, dim=MISSING, mom_dims=None, coords_policy='first', rep_dim='rep', groups=None, keep_attrs=None)
Resample using indexed reduction.