Routines to perform central moments reduction (reduction)

Functions:

reduce_vals(x, *y, mom[, weight, axis, ...])

Reduce values to central (co)moments.

reduce_data(data, *, mom_ndim[, dim, axis, ...])

Reduce central moments array along axis.

factor_by(by[, sort])

Factor by to codes and groups.

reduce_data_grouped(data, *, mom_ndim, by[, ...])

Reduce data by group.

factor_by_to_index(by)

Transform group_idx to quantities to be used with reduce_data_indexed().

reduce_data_indexed(data, *, mom_ndim, ...)

Reduce data by index.

resample_data_indexed(data, freq, *, mom_ndim)

Resample using indexed reduction.

cmomy.reduction.reduce_vals(x, *y, mom, weight=None, axis=MISSING, order=None, parallel=None, dtype=None, out=None, dim=MISSING, mom_dims=None, keep_attrs=None)

Reduce values to central (co)moments.

Parameters:
  • x (ndarray or DataArray) – Values to analyze.

  • *y (array-like or DataArray) – Second value. Must be specified if len(mom) == 2.

  • mom (int or tuple of int) – Order of moments. If an integer or a length-one tuple, then moments are for a single variable. If a length-two tuple, then comoments of two variables.

  • weight (scalar or array-like or DataArray) – Weights for each point.

  • axis (int) – Axis to reduce along.

  • order ({"C", "F", "A", "K"}, optional) – Order argument to numpy.asarray().

  • parallel (bool, default True) – flags to numba.njit

  • dtype (dtype) – Optional dtype for output data.

  • out (ndarray) – Optional output array. If specified, output will be a reference to this array.

  • dim (hashable) – Dimension to reduce along.

  • mom_dims (hashable or tuple of hashable) – Name of moment dimensions. Defaults to ("mom_0",) for mom_ndim==1 and ("mom_0", "mom_1") for mom_ndim==2.

  • keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –

    • 'drop' or False: empty attrs on returned xarray object.

    • 'identical': all attrs must be the same on every object.

    • 'no_conflicts': attrs from all objects are combined, any that have the same name must also have the same value.

    • 'drop_conflicts': attrs from all objects are combined, any that have the same name but different values are dropped.

    • 'override' or True: skip comparing and copy attrs from the first object to the result.

Returns:

out (ndarray or DataArray) – Central moments array of same type as x. out.shape = (..., shape[axis-1], shape[axis+1], ..., mom0, ...) where shape = x.shape.
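
As a rough illustration of what "reducing values to central moments" means for a single variable, the following pure-Python sketch computes a moments vector for a 1D sample. It is not the library's implementation. The function name central_moments is hypothetical, and the layout [sum of weights, mean, 2nd central moment, ...] with length mom + 1 is an assumption made for this sketch.

```python
def central_moments(x, mom, weight=None):
    """Sketch: weighted central moments of a 1D sample.

    Assumed layout (not confirmed by this page):
    [sum of weights, mean, 2nd central moment, ..., mom-th central moment].
    """
    # Default to unit weights when none are given.
    w = [1.0] * len(x) if weight is None else [float(wi) for wi in weight]
    wsum = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / wsum

    out = [wsum, mean]
    # Higher entries are weighted averages of (x - mean) ** k.
    for k in range(2, mom + 1):
        out.append(sum(wi * (xi - mean) ** k for wi, xi in zip(w, x)) / wsum)
    return out[: mom + 1]


print(central_moments([1, 2, 3, 4], mom=2))  # [4.0, 2.5, 1.25]
```

The real function additionally handles multidimensional input, the axis/dim selection, comoments of two variables, and DataArray metadata.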

cmomy.reduction.reduce_data(data, *, mom_ndim, dim=MISSING, axis=MISSING, order=None, parallel=None, keep_attrs=None, out=None, dtype=None)

Reduce central moments array along axis.

Parameters:
  • data (ndarray or DataArray) – Moments collection array. It is assumed moment dimensions are last.

  • mom_ndim ({1, 2}) – Value indicates if moments (mom_ndim = 1) or comoments (mom_ndim=2).

  • axis (int, optional) – Axis to reduce along. Note that negative values are relative to data.ndim - mom_ndim. It is assumed that the last dimensions are for moments. For example, if data.shape == (1,2,3) with mom_ndim=1, axis=-1 would be equivalent to axis=1. Defaults to axis=-1.

  • dim (hashable) – Dimension to reduce along.

  • order ({"C", "F", "A", "K"}, optional) – Order argument to numpy.asarray().

  • parallel (bool, default True) – flags to numba.njit

  • keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –

    • 'drop' or False: empty attrs on returned xarray object.

    • 'identical': all attrs must be the same on every object.

    • 'no_conflicts': attrs from all objects are combined, any that have the same name must also have the same value.

    • 'drop_conflicts': attrs from all objects are combined, any that have the same name but different values are dropped.

    • 'override' or True: skip comparing and copy attrs from the first object to the result.

  • out (ndarray) – Optional output array. If specified, output will be a reference to this array.

  • dtype (dtype) – Optional dtype for output data.

Returns:

out (ndarray or DataArray) – Reduced data array with shape data.shape with axis removed. Same type as input data.
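
Two ideas from the description above can be sketched in plain Python: how a negative axis is counted relative to the non-moment dimensions, and how two moment records are merged into one. Both helpers (normalize_axis, combine) are hypothetical names, not part of the library, and combine assumes a [weight, mean, variance] layout for a second-order moments vector.

```python
def normalize_axis(axis, ndim, mom_ndim):
    """Map a possibly negative axis onto the non-moment dimensions."""
    nval = ndim - mom_ndim  # number of non-moment ("value") dimensions
    if not -nval <= axis < nval:
        raise ValueError(f"axis {axis} out of range for {nval} value dimensions")
    return axis % nval


def combine(a, b):
    """Merge two [weight, mean, variance] records (assumed layout)."""
    wa, ma, va = a
    wb, mb, vb = b
    w = wa + wb
    if w == 0:
        return [0.0, 0.0, 0.0]
    mean = (wa * ma + wb * mb) / w
    # Each record contributes its own variance plus the squared shift of its mean.
    var = (wa * (va + (ma - mean) ** 2) + wb * (vb + (mb - mean) ** 2)) / w
    return [w, mean, var]


print(normalize_axis(-1, ndim=3, mom_ndim=1))      # 1
print(combine([2, 1.5, 0.25], [2, 3.5, 0.25]))     # [4, 2.5, 1.25]
```

The second call merges the records of the samples [1, 2] and [3, 4] and recovers the moments of [1, 2, 3, 4]. Reducing along an axis amounts to folding such a merge over every record on that axis.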

cmomy.reduction.factor_by(by, sort=True)

Factor by to codes and groups.

Parameters:
  • by (sequence) – Values to group by. Negative or None values indicate to skip this value. Note that if by is a pandas.Index object, missing values should be marked with None only.

  • sort (bool, default True) – If True (default), sort groups. If False, return groups in order of first appearance.

Returns:

  • groups (list or pandas.Index) – Unique group names (excluding negative or None values).

  • codes (ndarray of int) – Indexer into groups.

Examples

>>> by = [1, 1, 0, -1, 0, 2, 2]
>>> groups, codes = factor_by(by, sort=False)
>>> groups
[1, 0, 2]
>>> codes
array([ 0,  0,  1, -1,  1,  2,  2])

Note that with sort=False, groups are in order of first appearance.

>>> groups, codes = factor_by(by)
>>> groups
[0, 1, 2]
>>> codes
array([ 1,  1,  0, -1,  0,  2,  2])

This also works for sequences of non-integers.

>>> by = ["a", "a", None, "c", "c", -1]
>>> groups, codes = factor_by(by)
>>> groups
['a', 'c']
>>> codes
array([ 0,  0, -1,  1,  1, -1])

And for pandas.Index objects:

>>> import pandas as pd
>>> by = pd.Index(["a", "a", None, "c", "c", None])
>>> groups, codes = factor_by(by)
>>> groups
Index(['a', 'c'], dtype='object')
>>> codes
array([ 0,  0, -1,  1,  1, -1])
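
The factoring rules shown in the examples above (skip None and negative integers, optionally sort the groups) can be mimicked with a short pure-Python sketch. The name factor_by_sketch is hypothetical. It returns plain lists rather than an ndarray or pandas.Index, and it is only meant to illustrate the semantics, not the library's implementation.

```python
def factor_by_sketch(by, sort=True):
    """Sketch of factoring: return (groups, codes) with -1 for skipped entries."""

    def missing(v):
        # None and negative integers mark entries to skip.
        return v is None or (isinstance(v, int) and v < 0)

    # Collect unique groups in order of first appearance.
    groups = []
    for v in by:
        if not missing(v) and v not in groups:
            groups.append(v)
    if sort:
        groups.sort()

    lookup = {g: i for i, g in enumerate(groups)}
    codes = [-1 if missing(v) else lookup[v] for v in by]
    return groups, codes


print(factor_by_sketch([1, 1, 0, -1, 0, 2, 2], sort=False))
# ([1, 0, 2], [0, 0, 1, -1, 1, 2, 2])
print(factor_by_sketch(["a", "a", None, "c", "c", -1]))
# (['a', 'c'], [0, 0, -1, 1, 1, -1])
```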
cmomy.reduction.reduce_data_grouped(data, *, mom_ndim, by, axis=MISSING, order=None, parallel=None, out=None, dtype=None, dim=MISSING, group_dim=None, groups=None, keep_attrs=None)

Reduce data by group.

Parameters:
  • data (ndarray or DataArray) – Moments collection array. It is assumed moment dimensions are last.

  • mom_ndim ({1, 2}) – Value indicates if moments (mom_ndim = 1) or comoments (mom_ndim=2).

  • by (array-like of int) – Groupby values of same length as data along sampled dimension. Negative values indicate no group (i.e., skip this index).

  • axis (int, optional) – Axis to reduce along. Note that negative values are relative to data.ndim - mom_ndim. It is assumed that the last dimensions are for moments. For example, if data.shape == (1,2,3) with mom_ndim=1, axis=-1 would be equivalent to axis=1. Defaults to axis=-1.

  • order ({"C", "F", "A", "K"}, optional) – Order argument to numpy.asarray().

  • parallel (bool, default True) – flags to numba.njit

  • out (ndarray) – Optional output array. If specified, output will be a reference to this array.

  • dtype (dtype) – Optional dtype for output data.

  • dim (hashable) – Dimension to reduce along.

  • group_dim (str, optional) – Name of the output group dimension. Defaults to dim.

  • groups (sequence, optional) – Sequence of length by.max() + 1 to assign as coordinates for group_dim.

  • keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –

    • 'drop' or False: empty attrs on returned xarray object.

    • 'identical': all attrs must be the same on every object.

    • 'no_conflicts': attrs from all objects are combined, any that have the same name must also have the same value.

    • 'drop_conflicts': attrs from all objects are combined, any that have the same name but different values are dropped.

    • 'override' or True: skip comparing and copy attrs from the first object to the result.

Returns:

out (ndarray or DataArray) – Reduced data of same type as input data. The last dimensions are "group", followed by moments. out.shape = (..., shape[axis-1], shape[axis+1], ..., ngroup, mom0, ...) where shape = data.shape and ngroup = by.max() + 1.

See also

factor_by

Examples

>>> data = np.ones((5, 3))
>>> by = [0, 0, -1, 1, -1]
>>> reduce_data_grouped(data, mom_ndim=1, axis=0, by=by)
array([[2., 1., 1.],
       [1., 1., 1.]])

This also works for DataArray objects. In this case, the groups are added as coordinates to group_dim.

>>> xdata = xr.DataArray(data, dims=["rec", "mom"])
>>> reduce_data_grouped(xdata, mom_ndim=1, dim="rec", by=by, group_dim="group")
<xarray.DataArray (group: 2, mom: 3)> Size: 48B
array([[2., 1., 1.],
       [1., 1., 1.]])
Dimensions without coordinates: group, mom

Note that if by skips some groups, they will still be included in the output. For example, the following by skips the value 0.

>>> by = [1, 1, -1, 2, 2]
>>> reduce_data_grouped(xdata, mom_ndim=1, dim="rec", by=by)
<xarray.DataArray (rec: 3, mom: 3)> Size: 72B
array([[0., 0., 0.],
       [2., 1., 1.],
       [2., 1., 1.]])
Dimensions without coordinates: rec, mom

If you want to ensure that only included groups are used, use factor_by(). This has the added benefit of working with non-integer groups as well.

>>> by = ["a", "a", None, "b", "b"]
>>> groups, codes = factor_by(by)
>>> reduce_data_grouped(xdata, mom_ndim=1, dim="rec", by=codes, groups=groups)
<xarray.DataArray (rec: 2, mom: 3)> Size: 48B
array([[2., 1., 1.],
       [2., 1., 1.]])
Coordinates:
  * rec      (rec) <U1 8B 'a' 'b'
Dimensions without coordinates: mom
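
The grouped behavior in the examples above, including the all-zero row for a group code that never appears, can be reproduced with a small pure-Python sketch. The function reduce_grouped_sketch is a hypothetical name and assumes each record is a [weight, mean, variance] triple. It accumulates raw sums per group, which is easy to read but less numerically robust than a pairwise central-moment update, so it should not be taken as the library's method.

```python
def reduce_grouped_sketch(data, by):
    """Sketch: merge [weight, mean, variance] records by integer group code."""
    ngroup = max(by) + 1
    # Per-group accumulators: [sum w, sum w*mean, sum w*(var + mean**2)].
    acc = [[0.0, 0.0, 0.0] for _ in range(ngroup)]
    for (w, m, v), g in zip(data, by):
        if g < 0:  # negative code: skip this record
            continue
        acc[g][0] += w
        acc[g][1] += w * m
        acc[g][2] += w * (v + m * m)

    out = []
    for wsum, s1, s2 in acc:
        if wsum == 0:  # group code never seen: leave zeros
            out.append([0.0, 0.0, 0.0])
        else:
            mean = s1 / wsum
            out.append([wsum, mean, s2 / wsum - mean * mean])
    return out


data = [[1.0, 1.0, 1.0]] * 5
print(reduce_grouped_sketch(data, [0, 0, -1, 1, -1]))
# [[2.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(reduce_grouped_sketch(data, [1, 1, -1, 2, 2]))
# [[0.0, 0.0, 0.0], [2.0, 1.0, 1.0], [2.0, 1.0, 1.0]]
```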
cmomy.reduction.factor_by_to_index(by)

Transform group_idx to quantities to be used with reduce_data_indexed().

Parameters:
  • by (array-like) – Values to factor.

  • exclude_missing (bool, default True) – If True (default), filter negative and None values from group_idx.

Returns:

  • groups (list or pandas.Index) – Unique groups in group_idx (excluding negative or None values in group_idx if exclude_missing is True).

  • index (ndarray) – Indexing array. index[start[k]:end[k]] are the indices belonging to group groups[k].

  • start (ndarray) – See index.

  • end (ndarray) – See index.

Examples

>>> factor_by_to_index([0, 1, 0, 1])
([0, 1], array([0, 2, 1, 3]), array([0, 2]), array([2, 4]))
>>> factor_by_to_index(["a", "b", "a", "b"])
(['a', 'b'], array([0, 2, 1, 3]), array([0, 2]), array([2, 4]))

Also, missing values (None or negative) are excluded:

>>> factor_by_to_index([None, "a", None, "b"])
(['a', 'b'], array([1, 3]), array([0, 1]), array([1, 2]))

You can also pass pandas.Index objects:

>>> factor_by_to_index(pd.Index([None, "a", None, "b"], name="my_index"))
(Index(['a', 'b'], dtype='object', name='my_index'), array([1, 3]), array([0, 1]), array([1, 2]))
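
To make the relationship between index, start, and end concrete, here is a pure-Python sketch that builds the same three sequences for list input. The name factor_by_to_index_sketch is hypothetical, the output uses lists instead of ndarrays, and sorted groups are an assumption consistent with the examples above.

```python
def factor_by_to_index_sketch(by):
    """Sketch: return (groups, index, start, end) for a sequence of labels."""

    def missing(v):
        # None and negative integers are excluded from every group.
        return v is None or (isinstance(v, int) and v < 0)

    groups = sorted({v for v in by if not missing(v)})
    index, start, end = [], [], []
    for g in groups:
        start.append(len(index))
        # Positions of this group's members, in their original order.
        index.extend(i for i, v in enumerate(by) if v == g)
        end.append(len(index))
    return groups, index, start, end


groups, index, start, end = factor_by_to_index_sketch([None, "a", None, "b"])
print(groups, index, start, end)  # ['a', 'b'] [1, 3] [0, 1] [1, 2]

# Members of group k are index[start[k]:end[k]].
print(index[start[1]:end[1]])  # [3]
```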
cmomy.reduction.reduce_data_indexed(data, *, mom_ndim, index, group_start, group_end, scale=None, axis=MISSING, order=None, parallel=None, out=None, dtype=None, dim=MISSING, coords_policy='first', group_dim=None, groups=None, keep_attrs=None)

Reduce data by index.

Parameters:
  • data (ndarray) – Moments collection array. It is assumed moment dimensions are last.

  • mom_ndim ({1, 2}) – Value indicates if moments (mom_ndim = 1) or comoments (mom_ndim=2).

  • index (ndarray) – Index into data.shape[axis].

  • group_start (ndarray) – Start, end of index for a group. index[group_start[group]:group_end[group]] are the indices for group group.

  • group_end (ndarray) – Start, end of index for a group. index[group_start[group]:group_end[group]] are the indices for group group.

  • scale (ndarray, optional) – Weights of same size as index.

  • axis (int, optional) – Axis to reduce along. Note that negative values are relative to data.ndim - mom_ndim. It is assumed that the last dimensions are for moments. For example, if data.shape == (1,2,3) with mom_ndim=1, axis=-1 would be equivalent to axis=1. Defaults to axis=-1.

  • order ({"C", "F", "A", "K"}, optional) – Order argument to numpy.asarray().

  • parallel (bool, default True) – flags to numba.njit

  • out (ndarray) – Optional output array. If specified, output will be a reference to this array.

  • dtype (dtype) – Optional dtype for output data.

  • dim (hashable) – Dimension to reduce along.

  • coords_policy ({'first', 'last', 'group', None}) –

    Policy for handling coordinates along dim if by is specified for DataArray data. If no coordinates do nothing, otherwise use:

    • 'first': select first value of coordinate for each block.

    • 'last': select last value of coordinate for each block.

    • 'group': assign unique groups from group_idx to dim.

    • None: drop any coordinates.

    Note that if coords_policy is one of first or last, parameter groups will be ignored.

  • group_dim (str, optional) – Name of the output group dimension. Defaults to dim.

  • groups (sequence, optional) – Sequence of length by.max() + 1 to assign as coordinates for group_dim.

  • keep_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional) –

    • 'drop' or False: empty attrs on returned xarray object.

    • 'identical': all attrs must be the same on every object.

    • 'no_conflicts': attrs from all objects are combined, any that have the same name must also have the same value.

    • 'drop_conflicts': attrs from all objects are combined, any that have the same name but different values are dropped.

    • 'override' or True: skip comparing and copy attrs from the first object to the result.

Returns:

out (ndarray or DataArray) – Reduced data of same type as input data. The last dimensions are group and moments. out.shape = (..., shape[axis-1], shape[axis+1], ..., ngroup, mom0, ...), where shape = data.shape and ngroup = len(group_start).

Examples

This is a more general reduction than reduce_data_grouped(), but it can be used similarly.

>>> data = np.ones((5, 3))
>>> by = ["a", "a", "b", "b", "c"]
>>> groups, index, start, end = factor_by_to_index(by)
>>> reduce_data_indexed(
...     data, mom_ndim=1, axis=0, index=index, group_start=start, group_end=end
... )
array([[2., 1., 1.],
       [2., 1., 1.],
       [1., 1., 1.]])

This also works for DataArray objects:

>>> xdata = xr.DataArray(data, dims=["rec", "mom"])
>>> reduce_data_indexed(
...     xdata,
...     mom_ndim=1,
...     dim="rec",
...     index=index,
...     group_start=start,
...     group_end=end,
...     group_dim="group",
...     groups=groups,
...     coords_policy="group",
... )
<xarray.DataArray (group: 3, mom: 3)> Size: 72B
array([[2., 1., 1.],
       [2., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * group    (group) <U1 12B 'a' 'b' 'c'
Dimensions without coordinates: mom
cmomy.reduction.resample_data_indexed(data, freq, *, mom_ndim, axis=MISSING, order=None, parallel=True, out=None, dtype=None, dim=MISSING, coords_policy='first', group_dim=None, groups=None, keep_attrs=None)

Resample using indexed reduction.