Why `cmomy`?#

\[\newcommand{\ave}[1]{\langle{#1}\rangle}\]

We wish to calculate moments of the form \(\ave{x}\) and \(\ave{(\delta x)^k}\), \(\delta x = (x - \ave{x})\), where, in general we define averages of the form.

\[ \ave{x} = \frac{\sum_i w_i x_i}{\sum_i w_i} \]

\[ \ave{(\delta x)^k} = \frac{\sum_i w_i (x_i - \ave{x})^k}{\sum_i w_i} \]

where \(w_i\) are weights associated with each sample \(x_i\). In the simple case, \(w_i = 1\). To calculate a central moment with \(k > 1\), we have some choices.

First calculate \(\ave{x}\), then use the above formula.

Downside: requires two passes through the data.
Upside: numerically stable.

Expand out the central moment to raw moments. For example \(\ave{(x - \ave{x})^2} = \ave{x^n} - \ave{x}^2\). These raw moments can be calculated directly in one pass.

Downside: numerically unstable (see here for some info).
Upside: single pass for each of \(\ave{x^k}\).

An example#

In what follows, we’ll arrange data in the from output[moment] where output[0] = {total weight}, output[1] = {average value}, output[k > 1] = {kth order central moment}. We can generate the central moments using the following functions

def cmom_1(x, axis=0, mom=3):
    """Create central moments using stable method"""
    # output shape
    shape = x.shape[:axis] + x.shape[axis + 1 :] + (mom + 1,)
    output = np.zeros(shape, dtype=x.dtype)

    output[..., 0] = x.shape[axis]
    output[..., 1] = x.mean(axis=axis)

    # moments [2 -> mom]
    output[..., 2:] = (
        (x - x.mean(axis=axis, keepdims=True))[..., None] ** np.arange(2, mom + 1)
    ).mean(axis=axis)
    return output


def cmom_2(x, axis=0, mom=3):
    if mom > 3:
        raise ValueError

    shape = x.shape[:axis] + x.shape[axis + 1 :] + (mom + 1,)

    output = np.zeros(shape, dtype=x.dtype)

    output[..., 0] = x.shape[axis]

    # higher order raw moments
    raws = (x[..., None] ** np.arange(0, mom + 1)).mean(axis=axis)

    output[..., 1] = raws[..., 1]
    if mom > 1:
        # <(x - <x>)**2> = <x**2> - <x>**2
        output[..., 2] = raws[..., 2] - raws[..., 1] ** 2

    if mom > 2:
        # <(x - <x>)**3> = <x**3> - 3<x**2><x> + 2<x>**3
        output[..., 3] = (
            raws[..., 3] - 3 * raws[..., 2] * raws[..., 1] + 2 * raws[..., 1] ** 3
        )

    return output

import numpy as np

rng = np.random.default_rng(seed=0)

n = 1000
x = rng.random(n)

m1 = cmom_1(x)
m2 = cmom_2(x)
# same values
np.testing.assert_allclose(m1, m2)

OK, great. But what if we have unscaled data? For example, setting \(y = a x + b\) gives \(\ave{y} = a \ave{x} + b\), but for central moments, \(\ave{(y - \ave{y})^n} = \ave{(a x - b - \ave{a x - b})^n} = a^n \ave{(\delta x)^n}\)

# large value, small fluctuation
a = 0.1
b = 1e4
y = a * x + b

# expected value after shift
expected = m1.copy()
expected[..., 1] = a * expected[..., 1] + b
expected[..., 2:] *= a ** np.arange(2, 4)


m1_shift = cmom_1(y)
m2_shift = cmom_2(y)

m1_error = m1_shift - expected
m2_error = m2_shift - expected

print(
    f"""
abs error using central moments: {m1_error}
rel error using central moments: {m1_error / expected}
abs error using raw moments    : {m2_error}
rel error using raw moments    : {m2_error / expected}
"""
)

abs error using central moments: [ 0.0000e+00 -1.8190e-12 -8.8373e-16  3.5155e-15]
rel error using central moments: [ 0.0000e+00 -1.8190e-16 -1.0912e-12 -1.6856e-09]
abs error using raw moments    : [ 0.0000e+00  7.2760e-12 -7.2259e-08  9.7865e-04]
rel error using raw moments    : [ 0.0000e+00  7.2759e-16 -8.9221e-05 -4.6925e+02]

weighted averages#

cmomy is setup to work with arbitrary weights. We saw this work above when we considered unequal sample sizes. For example,

def get_cmom_with_weights(x, w, axis=0, mom=3):
    """Create central moments using stable method"""
    # output shape
    shape = x.shape[:axis] + x.shape[axis + 1 :] + (mom + 1,)
    output = np.zeros(shape, dtype=x.dtype)

    w_sum = w.sum(axis=axis)
    w_norm = 1.0 / w_sum

    output[..., 0] = w_sum
    output[..., 1] = ((w * x).sum(axis=axis)) * w_norm

    # moments [2 -> mom]
    output[..., 2:] = (
        w[..., None]
        * (x - (w * x).sum(axis=axis, keepdims=True) * w_norm)[..., None]
        ** np.arange(2, mom + 1)
    ).sum(axis=axis) * w_norm

    return output

x = rng.random(100)
w = rng.random(100)

moments = get_cmom_with_weights(x, w)
moments

cmomy.wrap_reduce_vals(x, weight=w, mom=3, axis=0)

array([4.8363e+01, 5.0295e-01, 9.2209e-02, 6.3664e-04])

comoments#

cmomy can also handle comoments (covariance, etc), like the \(\ave{(\delta x)^i (\delta y)^j}\)
For example

y = rng.random(100)

cmomy.wrap_reduce_vals(x, y, weight=w, mom=(2, 2), axis=0)

array([[ 4.8363e+01,  4.8516e-01,  7.5856e-02],
       [ 5.0295e-01, -1.2237e-02,  3.2919e-04],
       [ 9.2209e-02, -4.2304e-03,  6.5653e-03]])

All the special sauce works for central moments or comoments.

xarray DataArray support#

cmomy also works with xarray.DataArray and xarray.Dataset objects. For example:

import xarray as xr

x = xr.DataArray(rng.random((100, 2, 3)), dims=["rec", "a", "b"])
x.head(3)  # type: ignore[attr-defined]

<xarray.DataArray (rec: 3, a: 2, b: 3)> Size: 144B
array([[[0.0625, 0.5585, 0.0184],
        [0.4284, 0.4153, 0.0854]],

       [[0.0821, 0.587 , 0.0041],
        [0.7317, 0.3686, 0.2628]],

       [[0.951 , 0.0224, 0.627 ],
        [0.0179, 0.382 , 0.3126]]])
Dimensions without coordinates: rec, a, b

To create a cmomy object wrapping a xarray.DataArray, use CentralMomentsData

c = cmomy.wrap_reduce_vals(x, dim="rec", mom=3)
c

<xarray.DataArray (a: 2, b: 3, mom_0: 4)> Size: 192B
array([[[1.0000e+02, 4.0657e-01, 9.0470e-02, 1.3814e-02],
        [1.0000e+02, 4.9516e-01, 7.3811e-02, 1.1677e-03],
        [1.0000e+02, 4.6592e-01, 8.1210e-02, 3.3827e-03]],

       [[1.0000e+02, 4.8831e-01, 8.2432e-02, 1.6465e-04],
        [1.0000e+02, 4.6920e-01, 9.2950e-02, 5.5999e-03],
        [1.0000e+02, 4.8891e-01, 7.6961e-02, 2.3232e-03]]])
Dimensions without coordinates: a, b, mom_0

<xarray.DataArray (a: 2, b: 3, mom_0: 4)> Size: 192B
array([[[1.0000e+02, 4.0657e-01, 9.0470e-02, 1.3814e-02],
        [1.0000e+02, 4.9516e-01, 7.3811e-02, 1.1677e-03],
        [1.0000e+02, 4.6592e-01, 8.1210e-02, 3.3827e-03]],

       [[1.0000e+02, 4.8831e-01, 8.2432e-02, 1.6465e-04],
        [1.0000e+02, 4.6920e-01, 9.2950e-02, 5.5999e-03],
        [1.0000e+02, 4.8891e-01, 7.6961e-02, 2.3232e-03]]])
Dimensions without coordinates: a, b, mom_0

Most everything that CentralMomentsArray can do, CentralMomentsData can also do.

Why cmomy?

Contents

Why `cmomy`?#

An example#

So what?#

weighted averages#

comoments#

xarray DataArray support#

Why cmomy?

Contents

Why cmomy?#

An example#

So what?#

weighted averages#

comoments#

xarray DataArray support#

Why `cmomy`?#