Usage#
The basic usage of cmomy is to manipulate central moments.
We measure two quantities pretend quantities. The ‘energy’ and the ‘position’ of a thing. We’ll construct the average value for each record. Lets say 100 samples each.
import numpy as np
import xarray as xr
import cmomy
rng = cmomy.random.default_rng(seed=0)
nsamp = 100
energy = xr.DataArray(rng.random(nsamp), dims=["samp"])
position = xr.DataArray(rng.random((nsamp, 3)), dims=["samp", "dim"])
# weight associated with each sample and each record
w = xr.DataArray(rng.random(nsamp), dims=["samp"])
# average over the samples
ce = cmomy.xCentralMoments.from_vals(x=energy, w=w, dim="samp", mom=3)
cp = cmomy.xCentralMoments.from_vals(x=position, w=w, dim="samp", mom=3)
print("energy")
ce
energy
<xarray.DataArray (mom_0: 4)> Size: 32B array([ 5.3105e+01, 5.5314e-01, 8.6378e-02, -5.7464e-03]) Dimensions without coordinates: mom_0
<xarray.DataArray (mom_0: 4)> Size: 32B array([ 5.3105e+01, 5.5314e-01, 8.6378e-02, -5.7464e-03]) Dimensions without coordinates: mom_0
print("position")
cp
position
<xarray.DataArray (dim: 3, mom_0: 4)> Size: 96B array([[ 5.3105e+01, 5.2437e-01, 6.9091e-02, -5.5768e-04], [ 5.3105e+01, 4.9062e-01, 9.5523e-02, 2.2967e-03], [ 5.3105e+01, 5.6372e-01, 8.7545e-02, -8.3042e-03]]) Dimensions without coordinates: dim, mom_0
<xarray.DataArray (dim: 3, mom_0: 4)> Size: 96B array([[ 5.3105e+01, 5.2437e-01, 6.9091e-02, -5.5768e-04], [ 5.3105e+01, 4.9062e-01, 9.5523e-02, 2.2967e-03], [ 5.3105e+01, 5.6372e-01, 8.7545e-02, -8.3042e-03]]) Dimensions without coordinates: dim, mom_0
Here, we’ve used the constructor cmomy.CentralMoments.from_vals()
. There are a host of other constructors to create cmomy.CentralMoments
object. Take a look at the docs for further info.
Basic attributes#
Notice that there are three shape
parameters associated with a cmomy.CentralMoments
object:
cmomy.CentralMoments.mom_shape
: shape of the moments. For single variable, tuple (mom+1,). For comoments, (mom_0+1, mom_1+1)cmomy.CentralMoments.val_shape
: shape of the ‘values’ part of the data. For scalar data,val_shape = ()
. For vector data, this is the shape of the passed observation data.cmomy.CentralMoments.shape
: total shape of wrapped momentsshape = val_shape + mom_shape
for name, c in {"energy": ce, "position": cp}.items():
print(
f"""
{name} shapes:
mom_shape: {c.mom_shape}
val_shape: {c.val_shape}
shape : {c.shape}
"""
)
energy shapes:
mom_shape: (4,)
val_shape: ()
shape : (4,)
position shapes:
mom_shape: (4,)
val_shape: (3,)
shape : (3, 4)
To access the underlying data, use the cmomy.CentralMoments.to_numpy()
method or cmomy.CentralMoments.values
attribute. The structure is:
values[...,0]
: total weightvalues[...,1]
: average/mean valuevalues[...,k>1]
:kth
central moment.
ce.to_numpy()
array([ 5.3105e+01, 5.5314e-01, 8.6378e-02, -5.7464e-03])
To access all the central moments (zeroth and first included), use the cmomy.CentralMoments.cmom()
method.
ce.cmom()
<xarray.DataArray (mom_0: 4)> Size: 32B array([ 1. , 0. , 0.0864, -0.0057]) Dimensions without coordinates: mom_0
Likewise, the central moments can be converted to raw moments using the cmomy.CentralMoments.rmom()
method. This uses the cmomy.convert
module behind the scenes.
# <x**k>
ce.rmom()
<xarray.DataArray (mom_0: 4)> Size: 32B array([1. , 0.5531, 0.3923, 0.3068]) Dimensions without coordinates: mom_0
Additionally, there are xarray.DataArray
like attributes
ce.coords
Coordinates:
*empty*
ce.attrs
{}
ce.sizes
Frozen({'mom_0': 4})
ce.dims
('mom_0',)
To access the underlying data use cmomy.xCentralMoments.to_dataarray()
method or cmomy.xCentralMoments.values
attribute to access the xarray.DataArray
style data, or cmomy.xCentralMoments.to_numpy()
method to access the underlying numpy.ndarray
data.b
ce.to_dataarray()
<xarray.DataArray (mom_0: 4)> Size: 32B array([ 5.3105e+01, 5.5314e-01, 8.6378e-02, -5.7464e-03]) Dimensions without coordinates: mom_0
ce.to_numpy()
array([ 5.3105e+01, 5.5314e-01, 8.6378e-02, -5.7464e-03])
Manipulating (co)moments#
So we have our averages. Cool. Not very special. But what if instead we repeat our experiment. Let’s say we did the experiment 10 times each time giving a single record. Then our data would look like
nsamp = 100
nrec = 10
energy = xr.DataArray(rng.random((nrec, nsamp)), dims=["rec", "samp"])
position = xr.DataArray(rng.random((nrec, nsamp, 3)), dims=["rec", "samp", "dim"])
# weight associated with each sample and each record
w = xr.DataArray(rng.random((nrec, nsamp)), dims=["rec", "samp"])
# average over the samples
ce = cmomy.xCentralMoments.from_vals(x=energy, w=w, dim="samp", mom=3)
cp = cmomy.xCentralMoments.from_vals(x=position, w=w, dim="samp", mom=3)
Consider just the energy. We suspect that there is some correlation between the experiments (they where done in rapid succession). So we’d like to consider two adjacent experiments as one experiment. For this, we can use the cmomy.xCentralMoments.block()
method.
ce
<xarray.DataArray (rec: 10, mom_0: 4)> Size: 320B array([[ 5.3064e+01, 4.7292e-01, 8.0077e-02, 6.5155e-03], [ 4.6744e+01, 4.7915e-01, 7.8450e-02, 1.8555e-03], [ 5.3254e+01, 5.0665e-01, 9.3529e-02, 2.1183e-03], [ 5.0303e+01, 5.3773e-01, 6.9736e-02, -6.0640e-03], [ 4.8231e+01, 5.2371e-01, 7.8316e-02, -3.9247e-03], [ 5.1988e+01, 4.4939e-01, 8.1112e-02, 3.2211e-03], [ 4.6908e+01, 4.9788e-01, 9.3582e-02, -4.0934e-03], [ 5.4791e+01, 4.8149e-01, 8.2545e-02, 8.0189e-04], [ 5.3189e+01, 4.6463e-01, 9.5055e-02, 3.3487e-03], [ 5.1267e+01, 5.4169e-01, 9.3554e-02, -4.9880e-03]]) Dimensions without coordinates: rec, mom_0
<xarray.DataArray (rec: 10, mom_0: 4)> Size: 320B array([[ 5.3064e+01, 4.7292e-01, 8.0077e-02, 6.5155e-03], [ 4.6744e+01, 4.7915e-01, 7.8450e-02, 1.8555e-03], [ 5.3254e+01, 5.0665e-01, 9.3529e-02, 2.1183e-03], [ 5.0303e+01, 5.3773e-01, 6.9736e-02, -6.0640e-03], [ 4.8231e+01, 5.2371e-01, 7.8316e-02, -3.9247e-03], [ 5.1988e+01, 4.4939e-01, 8.1112e-02, 3.2211e-03], [ 4.6908e+01, 4.9788e-01, 9.3582e-02, -4.0934e-03], [ 5.4791e+01, 4.8149e-01, 8.2545e-02, 8.0189e-04], [ 5.3189e+01, 4.6463e-01, 9.5055e-02, 3.3487e-03], [ 5.1267e+01, 5.4169e-01, 9.3554e-02, -4.9880e-03]]) Dimensions without coordinates: rec, mom_0
ce.block(block_size=2, dim="rec")
<xarray.DataArray (rec: 5, mom_0: 4)> Size: 160B array([[ 9.9808e+01, 4.7583e-01, 7.9325e-02, 4.3255e-03], [ 1.0356e+02, 5.2174e-01, 8.2213e-02, -2.4102e-03], [ 1.0022e+02, 4.8515e-01, 8.1146e-02, -3.6968e-04], [ 1.0170e+02, 4.8905e-01, 8.7702e-02, -1.3211e-03], [ 1.0446e+02, 5.0245e-01, 9.5802e-02, -8.2761e-04]]) Dimensions without coordinates: rec, mom_0
<xarray.DataArray (rec: 5, mom_0: 4)> Size: 160B array([[ 9.9808e+01, 4.7583e-01, 7.9325e-02, 4.3255e-03], [ 1.0356e+02, 5.2174e-01, 8.2213e-02, -2.4102e-03], [ 1.0022e+02, 4.8515e-01, 8.1146e-02, -3.6968e-04], [ 1.0170e+02, 4.8905e-01, 8.7702e-02, -1.3211e-03], [ 1.0446e+02, 5.0245e-01, 9.5802e-02, -8.2761e-04]]) Dimensions without coordinates: rec, mom_0
Instead, we can resample the already averaged data using cmomy.xCentralMoments.resample_and_reduce()
. We produce a 20 new samples taken from the original (averaged) data.
ce_resamp = ce.resample_and_reduce(nrep=20, dim="rec")
ce_resamp
<xarray.DataArray (rep: 20, mom_0: 4)> Size: 640B array([[ 4.9894e+02, 4.8483e-01, 8.4004e-02, 7.6615e-04], [ 5.0316e+02, 5.0209e-01, 8.0869e-02, -1.1553e-04], [ 5.1114e+02, 4.8697e-01, 8.6689e-02, 1.2299e-03], [ 5.2135e+02, 4.9141e-01, 8.4228e-02, -5.7084e-04], [ 5.0592e+02, 4.9809e-01, 8.5436e-02, -8.7446e-04], [ 5.1902e+02, 4.9260e-01, 8.5149e-02, 7.5026e-04], [ 5.0659e+02, 5.0687e-01, 8.3342e-02, -2.2954e-03], [ 5.0490e+02, 4.9669e-01, 8.6755e-02, -8.1764e-04], [ 5.1172e+02, 4.9714e-01, 8.1701e-02, -1.5562e-04], [ 5.1915e+02, 4.8611e-01, 9.0029e-02, 2.7399e-03], [ 5.1257e+02, 4.8789e-01, 8.3948e-02, 1.4324e-03], [ 5.0757e+02, 4.9311e-01, 8.1820e-02, -3.7571e-04], [ 4.9210e+02, 4.7666e-01, 8.3947e-02, 2.1234e-03], [ 5.2529e+02, 4.9417e-01, 8.4350e-02, 1.1154e-03], [ 4.9973e+02, 4.9879e-01, 8.9895e-02, -2.9911e-04], [ 5.0993e+02, 4.8937e-01, 8.6093e-02, 8.3597e-04], [ 5.0860e+02, 4.7908e-01, 8.5023e-02, 9.4369e-04], [ 5.0110e+02, 4.9114e-01, 8.4371e-02, -1.0724e-03], [ 5.2742e+02, 4.8909e-01, 8.5711e-02, 1.4218e-03], [ 4.9154e+02, 4.9741e-01, 8.1855e-02, -3.3717e-04]]) Dimensions without coordinates: rep, mom_0
<xarray.DataArray (rep: 20, mom_0: 4)> Size: 640B array([[ 4.9894e+02, 4.8483e-01, 8.4004e-02, 7.6615e-04], [ 5.0316e+02, 5.0209e-01, 8.0869e-02, -1.1553e-04], [ 5.1114e+02, 4.8697e-01, 8.6689e-02, 1.2299e-03], [ 5.2135e+02, 4.9141e-01, 8.4228e-02, -5.7084e-04], [ 5.0592e+02, 4.9809e-01, 8.5436e-02, -8.7446e-04], [ 5.1902e+02, 4.9260e-01, 8.5149e-02, 7.5026e-04], [ 5.0659e+02, 5.0687e-01, 8.3342e-02, -2.2954e-03], [ 5.0490e+02, 4.9669e-01, 8.6755e-02, -8.1764e-04], [ 5.1172e+02, 4.9714e-01, 8.1701e-02, -1.5562e-04], [ 5.1915e+02, 4.8611e-01, 9.0029e-02, 2.7399e-03], [ 5.1257e+02, 4.8789e-01, 8.3948e-02, 1.4324e-03], [ 5.0757e+02, 4.9311e-01, 8.1820e-02, -3.7571e-04], [ 4.9210e+02, 4.7666e-01, 8.3947e-02, 2.1234e-03], [ 5.2529e+02, 4.9417e-01, 8.4350e-02, 1.1154e-03], [ 4.9973e+02, 4.9879e-01, 8.9895e-02, -2.9911e-04], [ 5.0993e+02, 4.8937e-01, 8.6093e-02, 8.3597e-04], [ 5.0860e+02, 4.7908e-01, 8.5023e-02, 9.4369e-04], [ 5.0110e+02, 4.9114e-01, 8.4371e-02, -1.0724e-03], [ 5.2742e+02, 4.8909e-01, 8.5711e-02, 1.4218e-03], [ 4.9154e+02, 4.9741e-01, 8.1855e-02, -3.3717e-04]]) Dimensions without coordinates: rep, mom_0
This is different than the usual ‘resample values’. This is also available if the original data is available.
a = energy.stack(c=["rec", "samp"])
b = w.stack(c=["rec", "samp"])
# consider 'all' the data for this
out = cmomy.xCentralMoments.from_resample_vals(
energy.stack(c=["rec", "samp"]),
w=w.stack(c=["rec", "samp"]),
dim="c",
nrep=20,
mom=3,
)
out
<xarray.DataArray (rep: 20, mom_0: 4)> Size: 640B array([[ 5.0470e+02, 5.0732e-01, 8.3537e-02, -1.7364e-03], [ 5.1471e+02, 5.0032e-01, 8.5279e-02, -1.9318e-04], [ 5.1740e+02, 4.8807e-01, 8.1873e-02, -6.0608e-04], [ 5.2171e+02, 5.0050e-01, 8.3951e-02, -2.8801e-04], [ 5.1191e+02, 4.9601e-01, 8.5987e-02, 5.9479e-04], [ 4.9898e+02, 5.0192e-01, 8.6259e-02, -4.8733e-04], [ 5.0383e+02, 5.0676e-01, 8.4769e-02, -9.1195e-04], [ 5.3502e+02, 4.8362e-01, 8.1934e-02, 8.0303e-04], [ 4.9916e+02, 5.1223e-01, 8.6978e-02, -2.4868e-03], [ 5.1508e+02, 4.8799e-01, 8.1554e-02, 1.2730e-03], [ 5.0503e+02, 4.9609e-01, 8.7332e-02, -1.2149e-03], [ 5.0252e+02, 4.8657e-01, 8.4106e-02, 1.2529e-03], [ 5.0575e+02, 4.8867e-01, 8.9293e-02, 9.5625e-04], [ 5.0848e+02, 4.8416e-01, 8.4131e-02, 1.9068e-05], [ 5.0767e+02, 4.9133e-01, 8.2855e-02, 1.3108e-03], [ 5.2246e+02, 4.8245e-01, 8.5488e-02, 7.4083e-05], [ 5.1387e+02, 5.0965e-01, 8.5511e-02, -1.8175e-03], [ 5.2162e+02, 4.9756e-01, 8.8069e-02, -3.8196e-04], [ 5.0263e+02, 4.9969e-01, 8.5556e-02, -2.0398e-03], [ 5.0454e+02, 5.0938e-01, 8.2281e-02, -1.8049e-03]]) Dimensions without coordinates: rep, mom_0
<xarray.DataArray (rep: 20, mom_0: 4)> Size: 640B array([[ 5.0470e+02, 5.0732e-01, 8.3537e-02, -1.7364e-03], [ 5.1471e+02, 5.0032e-01, 8.5279e-02, -1.9318e-04], [ 5.1740e+02, 4.8807e-01, 8.1873e-02, -6.0608e-04], [ 5.2171e+02, 5.0050e-01, 8.3951e-02, -2.8801e-04], [ 5.1191e+02, 4.9601e-01, 8.5987e-02, 5.9479e-04], [ 4.9898e+02, 5.0192e-01, 8.6259e-02, -4.8733e-04], [ 5.0383e+02, 5.0676e-01, 8.4769e-02, -9.1195e-04], [ 5.3502e+02, 4.8362e-01, 8.1934e-02, 8.0303e-04], [ 4.9916e+02, 5.1223e-01, 8.6978e-02, -2.4868e-03], [ 5.1508e+02, 4.8799e-01, 8.1554e-02, 1.2730e-03], [ 5.0503e+02, 4.9609e-01, 8.7332e-02, -1.2149e-03], [ 5.0252e+02, 4.8657e-01, 8.4106e-02, 1.2529e-03], [ 5.0575e+02, 4.8867e-01, 8.9293e-02, 9.5625e-04], [ 5.0848e+02, 4.8416e-01, 8.4131e-02, 1.9068e-05], [ 5.0767e+02, 4.9133e-01, 8.2855e-02, 1.3108e-03], [ 5.2246e+02, 4.8245e-01, 8.5488e-02, 7.4083e-05], [ 5.1387e+02, 5.0965e-01, 8.5511e-02, -1.8175e-03], [ 5.2162e+02, 4.9756e-01, 8.8069e-02, -3.8196e-04], [ 5.0263e+02, 4.9969e-01, 8.5556e-02, -2.0398e-03], [ 5.0454e+02, 5.0938e-01, 8.2281e-02, -1.8049e-03]]) Dimensions without coordinates: rep, mom_0
We see that the deviation in the moments is similar using the two resampling methods:
out.to_dataarray().sel(mom_0=slice(1, None)).std("rep")
<xarray.DataArray (mom_0: 3)> Size: 24B array([0.0093, 0.0021, 0.0012]) Dimensions without coordinates: mom_0
ce_resamp.to_dataarray().sel(mom_0=slice(1, None)).std("rep")
<xarray.DataArray (mom_0: 3)> Size: 24B array([0.0072, 0.0024, 0.0012]) Dimensions without coordinates: mom_0
We can also reduce our original data across all the records using cmomy.xCentralMoments.reduce()
ce.reduce(dim="rec")
<xarray.DataArray (mom_0: 4)> Size: 32B array([ 5.0974e+02, 4.9508e-01, 8.5572e-02, -6.5659e-05]) Dimensions without coordinates: mom_0
<xarray.DataArray (mom_0: 4)> Size: 32B array([ 5.0974e+02, 4.9508e-01, 8.5572e-02, -6.5659e-05]) Dimensions without coordinates: mom_0