Usage#

The basic usage of cmomy is to manipulate central moments.
We measure two quantities pretend quantities. The ‘energy’ and the ‘position’ of a thing. We’ll construct the average value for each record. Lets say 100 samples each.

import numpy as np
import xarray as xr

import cmomy

rng = cmomy.default_rng(seed=0)

nsamp = 100
energy = xr.DataArray(rng.random(nsamp), dims=["samp"])

position = xr.DataArray(rng.random((nsamp, 3)), dims=["samp", "dim"])

# weight associated with each sample and each record
w = xr.DataArray(rng.random(nsamp), dims=["samp"])

# average over the samples
ce = cmomy.wrap_reduce_vals(energy, weight=w, dim="samp", mom=3)
cp = cmomy.wrap_reduce_vals(position, weight=w, dim="samp", mom=3)
print("energy")
ce
energy
<xarray.DataArray (mom_0: 4)> Size: 32B
array([ 5.3105e+01,  5.5314e-01,  8.6378e-02, -5.7464e-03])
Dimensions without coordinates: mom_0
<xarray.DataArray (mom_0: 4)> Size: 32B
array([ 5.3105e+01,  5.5314e-01,  8.6378e-02, -5.7464e-03])
Dimensions without coordinates: mom_0
print("position")
cp
position
<xarray.DataArray (dim: 3, mom_0: 4)> Size: 96B
array([[ 5.3105e+01,  5.2437e-01,  6.9091e-02, -5.5768e-04],
       [ 5.3105e+01,  4.9062e-01,  9.5523e-02,  2.2967e-03],
       [ 5.3105e+01,  5.6372e-01,  8.7545e-02, -8.3042e-03]])
Dimensions without coordinates: dim, mom_0
<xarray.DataArray (dim: 3, mom_0: 4)> Size: 96B
array([[ 5.3105e+01,  5.2437e-01,  6.9091e-02, -5.5768e-04],
       [ 5.3105e+01,  4.9062e-01,  9.5523e-02,  2.2967e-03],
       [ 5.3105e+01,  5.6372e-01,  8.7545e-02, -8.3042e-03]])
Dimensions without coordinates: dim, mom_0

Here, we’ve used the constructor cmomy.wrap_reduce_vals(). There are a host of other constructors to create cmomy.CentralMomentsData and cmomy.CentralMomentsArray objects. Take a look at the docs for further info.

Dataset#

We can also wrap Dataset objects. This has some limitations, as the structure is more complicated. In particular, methods that depend on a fixed array shape, like shape(), etc, are not available for wrapped Dataset.

ds = xr.Dataset({"energy": energy, "position": position})
ctot = cmomy.wrap_reduce_vals(ds, weight=w, dim="samp", mom=3)
ctot
<xarray.Dataset> Size: 128B
Dimensions:   (mom_0: 4, dim: 3)
Dimensions without coordinates: mom_0, dim
Data variables:
    energy    (mom_0) float64 32B 53.11 0.5531 0.08638 -0.005746
    position  (dim, mom_0) float64 96B 53.11 0.5244 ... 0.08754 -0.008304
<xarray.Dataset> Size: 128B
Dimensions:   (mom_0: 4, dim: 3)
Dimensions without coordinates: mom_0, dim
Data variables:
    energy    (mom_0) float64 32B 53.11 0.5531 0.08638 -0.005746
    position  (dim, mom_0) float64 96B 53.11 0.5244 ... 0.08754 -0.008304
# verify that this is the same result as the DataArray version above...
xr.testing.assert_equal(ctot["energy"].obj, ce.obj)
xr.testing.assert_equal(ctot["position"].obj, cp.obj)

Chunked data#

You can also use dask chunked data (assuming dask is installed). For example:

ctot_chunk = cmomy.wrap_reduce_vals(ds.chunk({"dim": -1}), weight=w, dim="samp", mom=3)
ctot_chunk
<xarray.Dataset> Size: 128B
Dimensions:   (mom_0: 4, dim: 3)
Dimensions without coordinates: mom_0, dim
Data variables:
    energy    (mom_0) float64 32B dask.array<chunksize=(4,), meta=np.ndarray>
    position  (dim, mom_0) float64 96B dask.array<chunksize=(3, 4), meta=np.ndarray>
<xarray.Dataset> Size: 128B
Dimensions:   (mom_0: 4, dim: 3)
Dimensions without coordinates: mom_0, dim
Data variables:
    energy    (mom_0) float64 32B dask.array<chunksize=(4,), meta=np.ndarray>
    position  (dim, mom_0) float64 96B dask.array<chunksize=(3, 4), meta=np.ndarray>

To compute the final result, use compute():

ctot_chunk.compute()
<xarray.Dataset> Size: 128B
Dimensions:   (mom_0: 4, dim: 3)
Dimensions without coordinates: mom_0, dim
Data variables:
    energy    (mom_0) float64 32B 53.11 0.5531 0.08638 -0.005746
    position  (dim, mom_0) float64 96B 53.11 0.5244 ... 0.08754 -0.008304
<xarray.Dataset> Size: 128B
Dimensions:   (mom_0: 4, dim: 3)
Dimensions without coordinates: mom_0, dim
Data variables:
    energy    (mom_0) float64 32B 53.11 0.5531 0.08638 -0.005746
    position  (dim, mom_0) float64 96B 53.11 0.5244 ... 0.08754 -0.008304
xr.testing.assert_equal(ctot.obj, ctot_chunk.obj)

Basic attributes#

Notice that there are three shape parameters associated with a CentralMomentsData object:

  • mom_shape : shape of the moments. For single variable, tuple (mom+1,). For comoments, (mom_0+1, mom_1+1)

  • val_shape: shape of the ‘values’ part of the data. For scalar data, val_shape = (). For vector data, this is the shape of the passed observation data.

  • shape: total shape of wrapped moments shape = val_shape + mom_shape

for name, c in {"energy": ce, "position": cp}.items():
    print(
        f"""
{name} shapes:
    mom_shape: {c.mom_shape}
    val_shape: {c.val_shape}
    shape    : {c.shape}
"""
    )
energy shapes:
    mom_shape: (4,)
    val_shape: ()
    shape    : (4,)


position shapes:
    mom_shape: (4,)
    val_shape: (3,)
    shape    : (3, 4)

To access the underlying data, use the cmomy.CentralMomentsData.to_numpy() method. The structure is:

  • values[...,0]: total weight

  • values[...,1]: average/mean value

  • values[...,k>1]: kth central moment.

ce.to_numpy()
array([ 5.3105e+01,  5.5314e-01,  8.6378e-02, -5.7464e-03])

To access all the central moments (zeroth and first included), use the cmom() method.

ce.cmom()
<xarray.DataArray (mom_0: 4)> Size: 32B
array([ 1.    ,  0.    ,  0.0864, -0.0057])
Dimensions without coordinates: mom_0

Likewise, the central moments can be converted to raw moments using the rmom() method. This uses the moments_type() function behind the scenes.

# <x**k>
ce.rmom()
<xarray.DataArray (mom_0: 4)> Size: 32B
array([1.    , 0.5531, 0.3923, 0.3068])
Dimensions without coordinates: mom_0

Additionally, there are xarray.DataArray like attributes

ce.coords
Coordinates:
    *empty*
ce.attrs
{}
ce.sizes
Frozen({'mom_0': 4})
ce.dims
('mom_0',)

To access the underlying data use obj() attribute to access the DataArray style data, or to_numpy() method to access the underlying ndarray.

ce.obj
<xarray.DataArray (mom_0: 4)> Size: 32B
array([ 5.3105e+01,  5.5314e-01,  8.6378e-02, -5.7464e-03])
Dimensions without coordinates: mom_0
ce.to_numpy()
array([ 5.3105e+01,  5.5314e-01,  8.6378e-02, -5.7464e-03])

Manipulating (co)moments#

So we have our averages. Cool. Not very special. But what if instead we repeat our experiment. Let’s say we did the experiment 10 times each time giving a single record. Then our data would look like

nsamp = 100
nrec = 10
energy = xr.DataArray(rng.random((nrec, nsamp)), dims=["rec", "samp"])
position = xr.DataArray(rng.random((nrec, nsamp, 3)), dims=["rec", "samp", "dim"])

# weight associated with each sample and each record
w = xr.DataArray(rng.random((nrec, nsamp)), dims=["rec", "samp"])

# average over the samples
ce = cmomy.wrap_reduce_vals(energy, weight=w, dim="samp", mom=3)
cp = cmomy.wrap_reduce_vals(position, weight=w, dim="samp", mom=3)

Consider just the energy. We suspect that there is some correlation between the experiments (they where done in rapid succession). So we’d like to consider two adjacent experiments as one experiment. For this, we can use the reduce() method with the block parameter.

ce
<xarray.DataArray (rec: 10, mom_0: 4)> Size: 320B
array([[ 5.3064e+01,  4.7292e-01,  8.0077e-02,  6.5155e-03],
       [ 4.6744e+01,  4.7915e-01,  7.8450e-02,  1.8555e-03],
       [ 5.3254e+01,  5.0665e-01,  9.3529e-02,  2.1183e-03],
       [ 5.0303e+01,  5.3773e-01,  6.9736e-02, -6.0640e-03],
       [ 4.8231e+01,  5.2371e-01,  7.8316e-02, -3.9247e-03],
       [ 5.1988e+01,  4.4939e-01,  8.1112e-02,  3.2211e-03],
       [ 4.6908e+01,  4.9788e-01,  9.3582e-02, -4.0934e-03],
       [ 5.4791e+01,  4.8149e-01,  8.2545e-02,  8.0189e-04],
       [ 5.3189e+01,  4.6463e-01,  9.5055e-02,  3.3487e-03],
       [ 5.1267e+01,  5.4169e-01,  9.3554e-02, -4.9880e-03]])
Dimensions without coordinates: rec, mom_0
<xarray.DataArray (rec: 10, mom_0: 4)> Size: 320B
array([[ 5.3064e+01,  4.7292e-01,  8.0077e-02,  6.5155e-03],
       [ 4.6744e+01,  4.7915e-01,  7.8450e-02,  1.8555e-03],
       [ 5.3254e+01,  5.0665e-01,  9.3529e-02,  2.1183e-03],
       [ 5.0303e+01,  5.3773e-01,  6.9736e-02, -6.0640e-03],
       [ 4.8231e+01,  5.2371e-01,  7.8316e-02, -3.9247e-03],
       [ 5.1988e+01,  4.4939e-01,  8.1112e-02,  3.2211e-03],
       [ 4.6908e+01,  4.9788e-01,  9.3582e-02, -4.0934e-03],
       [ 5.4791e+01,  4.8149e-01,  8.2545e-02,  8.0189e-04],
       [ 5.3189e+01,  4.6463e-01,  9.5055e-02,  3.3487e-03],
       [ 5.1267e+01,  5.4169e-01,  9.3554e-02, -4.9880e-03]])
Dimensions without coordinates: rec, mom_0
ce.reduce(block=2, dim="rec")
<xarray.DataArray (rec: 5, mom_0: 4)> Size: 160B
array([[ 9.9808e+01,  4.7583e-01,  7.9325e-02,  4.3255e-03],
       [ 1.0356e+02,  5.2174e-01,  8.2213e-02, -2.4102e-03],
       [ 1.0022e+02,  4.8515e-01,  8.1146e-02, -3.6968e-04],
       [ 1.0170e+02,  4.8905e-01,  8.7702e-02, -1.3211e-03],
       [ 1.0446e+02,  5.0245e-01,  9.5802e-02, -8.2761e-04]])
Dimensions without coordinates: rec, mom_0
<xarray.DataArray (rec: 5, mom_0: 4)> Size: 160B
array([[ 9.9808e+01,  4.7583e-01,  7.9325e-02,  4.3255e-03],
       [ 1.0356e+02,  5.2174e-01,  8.2213e-02, -2.4102e-03],
       [ 1.0022e+02,  4.8515e-01,  8.1146e-02, -3.6968e-04],
       [ 1.0170e+02,  4.8905e-01,  8.7702e-02, -1.3211e-03],
       [ 1.0446e+02,  5.0245e-01,  9.5802e-02, -8.2761e-04]])
Dimensions without coordinates: rec, mom_0

Instead, we can resample the already averaged data using cmomy.CentralMomentsData.resample_and_reduce(). We produce a 20 new samples taken from the original (averaged) data.

ce_resamp = ce.resample_and_reduce(dim="rec", sampler={"nrep": 20})
ce_resamp
<xarray.DataArray (rep: 20, mom_0: 4)> Size: 640B
array([[ 4.9894e+02,  4.8483e-01,  8.4004e-02,  7.6615e-04],
       [ 5.0316e+02,  5.0209e-01,  8.0869e-02, -1.1553e-04],
       [ 5.1114e+02,  4.8697e-01,  8.6689e-02,  1.2299e-03],
       [ 5.2135e+02,  4.9141e-01,  8.4228e-02, -5.7084e-04],
       [ 5.0592e+02,  4.9809e-01,  8.5436e-02, -8.7446e-04],
       [ 5.1902e+02,  4.9260e-01,  8.5149e-02,  7.5026e-04],
       [ 5.0659e+02,  5.0687e-01,  8.3342e-02, -2.2954e-03],
       [ 5.0490e+02,  4.9669e-01,  8.6755e-02, -8.1764e-04],
       [ 5.1172e+02,  4.9714e-01,  8.1701e-02, -1.5562e-04],
       [ 5.1915e+02,  4.8611e-01,  9.0029e-02,  2.7399e-03],
       [ 5.1257e+02,  4.8789e-01,  8.3948e-02,  1.4324e-03],
       [ 5.0757e+02,  4.9311e-01,  8.1820e-02, -3.7571e-04],
       [ 4.9210e+02,  4.7666e-01,  8.3947e-02,  2.1234e-03],
       [ 5.2529e+02,  4.9417e-01,  8.4350e-02,  1.1154e-03],
       [ 4.9973e+02,  4.9879e-01,  8.9895e-02, -2.9911e-04],
       [ 5.0993e+02,  4.8937e-01,  8.6093e-02,  8.3597e-04],
       [ 5.0860e+02,  4.7908e-01,  8.5023e-02,  9.4369e-04],
       [ 5.0110e+02,  4.9114e-01,  8.4371e-02, -1.0724e-03],
       [ 5.2742e+02,  4.8909e-01,  8.5711e-02,  1.4218e-03],
       [ 4.9154e+02,  4.9741e-01,  8.1855e-02, -3.3717e-04]])
Dimensions without coordinates: rep, mom_0
<xarray.DataArray (rep: 20, mom_0: 4)> Size: 640B
array([[ 4.9894e+02,  4.8483e-01,  8.4004e-02,  7.6615e-04],
       [ 5.0316e+02,  5.0209e-01,  8.0869e-02, -1.1553e-04],
       [ 5.1114e+02,  4.8697e-01,  8.6689e-02,  1.2299e-03],
       [ 5.2135e+02,  4.9141e-01,  8.4228e-02, -5.7084e-04],
       [ 5.0592e+02,  4.9809e-01,  8.5436e-02, -8.7446e-04],
       [ 5.1902e+02,  4.9260e-01,  8.5149e-02,  7.5026e-04],
       [ 5.0659e+02,  5.0687e-01,  8.3342e-02, -2.2954e-03],
       [ 5.0490e+02,  4.9669e-01,  8.6755e-02, -8.1764e-04],
       [ 5.1172e+02,  4.9714e-01,  8.1701e-02, -1.5562e-04],
       [ 5.1915e+02,  4.8611e-01,  9.0029e-02,  2.7399e-03],
       [ 5.1257e+02,  4.8789e-01,  8.3948e-02,  1.4324e-03],
       [ 5.0757e+02,  4.9311e-01,  8.1820e-02, -3.7571e-04],
       [ 4.9210e+02,  4.7666e-01,  8.3947e-02,  2.1234e-03],
       [ 5.2529e+02,  4.9417e-01,  8.4350e-02,  1.1154e-03],
       [ 4.9973e+02,  4.9879e-01,  8.9895e-02, -2.9911e-04],
       [ 5.0993e+02,  4.8937e-01,  8.6093e-02,  8.3597e-04],
       [ 5.0860e+02,  4.7908e-01,  8.5023e-02,  9.4369e-04],
       [ 5.0110e+02,  4.9114e-01,  8.4371e-02, -1.0724e-03],
       [ 5.2742e+02,  4.8909e-01,  8.5711e-02,  1.4218e-03],
       [ 4.9154e+02,  4.9741e-01,  8.1855e-02, -3.3717e-04]])
Dimensions without coordinates: rep, mom_0

This is different than the usual ‘resample values’. This is also available if the original data is available.

# consider 'all' the data for this
energy_stack = energy.stack(c=["rec", "samp"])
weight_stack = w.stack(c=["rec", "samp"])

out = cmomy.wrap_resample_vals(
    energy.stack(c=["rec", "samp"]),
    weight=w.stack(c=["rec", "samp"]),
    dim="c",
    sampler={"nrep": 20},
    mom=3,
)
out
<xarray.DataArray (rep: 20, mom_0: 4)> Size: 640B
array([[ 5.0470e+02,  5.0732e-01,  8.3537e-02, -1.7364e-03],
       [ 5.1471e+02,  5.0032e-01,  8.5279e-02, -1.9318e-04],
       [ 5.1740e+02,  4.8807e-01,  8.1873e-02, -6.0608e-04],
       [ 5.2171e+02,  5.0050e-01,  8.3951e-02, -2.8801e-04],
       [ 5.1191e+02,  4.9601e-01,  8.5987e-02,  5.9479e-04],
       [ 4.9898e+02,  5.0192e-01,  8.6259e-02, -4.8733e-04],
       [ 5.0383e+02,  5.0676e-01,  8.4769e-02, -9.1195e-04],
       [ 5.3502e+02,  4.8362e-01,  8.1934e-02,  8.0303e-04],
       [ 4.9916e+02,  5.1223e-01,  8.6978e-02, -2.4868e-03],
       [ 5.1508e+02,  4.8799e-01,  8.1554e-02,  1.2730e-03],
       [ 5.0503e+02,  4.9609e-01,  8.7332e-02, -1.2149e-03],
       [ 5.0252e+02,  4.8657e-01,  8.4106e-02,  1.2529e-03],
       [ 5.0575e+02,  4.8867e-01,  8.9293e-02,  9.5625e-04],
       [ 5.0848e+02,  4.8416e-01,  8.4131e-02,  1.9068e-05],
       [ 5.0767e+02,  4.9133e-01,  8.2855e-02,  1.3108e-03],
       [ 5.2246e+02,  4.8245e-01,  8.5488e-02,  7.4083e-05],
       [ 5.1387e+02,  5.0965e-01,  8.5511e-02, -1.8175e-03],
       [ 5.2162e+02,  4.9756e-01,  8.8069e-02, -3.8196e-04],
       [ 5.0263e+02,  4.9969e-01,  8.5556e-02, -2.0398e-03],
       [ 5.0454e+02,  5.0938e-01,  8.2281e-02, -1.8049e-03]])
Dimensions without coordinates: rep, mom_0
<xarray.DataArray (rep: 20, mom_0: 4)> Size: 640B
array([[ 5.0470e+02,  5.0732e-01,  8.3537e-02, -1.7364e-03],
       [ 5.1471e+02,  5.0032e-01,  8.5279e-02, -1.9318e-04],
       [ 5.1740e+02,  4.8807e-01,  8.1873e-02, -6.0608e-04],
       [ 5.2171e+02,  5.0050e-01,  8.3951e-02, -2.8801e-04],
       [ 5.1191e+02,  4.9601e-01,  8.5987e-02,  5.9479e-04],
       [ 4.9898e+02,  5.0192e-01,  8.6259e-02, -4.8733e-04],
       [ 5.0383e+02,  5.0676e-01,  8.4769e-02, -9.1195e-04],
       [ 5.3502e+02,  4.8362e-01,  8.1934e-02,  8.0303e-04],
       [ 4.9916e+02,  5.1223e-01,  8.6978e-02, -2.4868e-03],
       [ 5.1508e+02,  4.8799e-01,  8.1554e-02,  1.2730e-03],
       [ 5.0503e+02,  4.9609e-01,  8.7332e-02, -1.2149e-03],
       [ 5.0252e+02,  4.8657e-01,  8.4106e-02,  1.2529e-03],
       [ 5.0575e+02,  4.8867e-01,  8.9293e-02,  9.5625e-04],
       [ 5.0848e+02,  4.8416e-01,  8.4131e-02,  1.9068e-05],
       [ 5.0767e+02,  4.9133e-01,  8.2855e-02,  1.3108e-03],
       [ 5.2246e+02,  4.8245e-01,  8.5488e-02,  7.4083e-05],
       [ 5.1387e+02,  5.0965e-01,  8.5511e-02, -1.8175e-03],
       [ 5.2162e+02,  4.9756e-01,  8.8069e-02, -3.8196e-04],
       [ 5.0263e+02,  4.9969e-01,  8.5556e-02, -2.0398e-03],
       [ 5.0454e+02,  5.0938e-01,  8.2281e-02, -1.8049e-03]])
Dimensions without coordinates: rep, mom_0

We see that the deviation in the moments is similar using the two resampling methods:

out.obj.sel(mom_0=slice(1, None)).std("rep")
<xarray.DataArray (mom_0: 3)> Size: 24B
array([0.0093, 0.0021, 0.0012])
Dimensions without coordinates: mom_0
ce_resamp.obj.sel(mom_0=slice(1, None)).std("rep")
<xarray.DataArray (mom_0: 3)> Size: 24B
array([0.0072, 0.0024, 0.0012])
Dimensions without coordinates: mom_0

We can also reduce our original data across all the records using reduce()

ce.reduce(dim="rec")
<xarray.DataArray (mom_0: 4)> Size: 32B
array([ 5.0974e+02,  4.9508e-01,  8.5572e-02, -6.5659e-05])
Dimensions without coordinates: mom_0
<xarray.DataArray (mom_0: 4)> Size: 32B
array([ 5.0974e+02,  4.9508e-01,  8.5572e-02, -6.5659e-05])
Dimensions without coordinates: mom_0