Data Organization#

All of the extrapolation and interpolation models in the thermoextrap package expect input data to be organized in a certain fashion. To help manage the data, there are data objects that organize it. Even the inputs to these data objects, however, must be organized appropriately. Here we will use data from our ideal gas test system (from thermoextrap.idealgas) to demonstrate this organization, as well as the various options for the types of data that may be provided as input.

# need imports
%matplotlib inline
import numpy as np
import xarray as xr

# import thermoextrap
import thermoextrap as xtrap

# Import idealgas module
from thermoextrap import idealgas
# Define reference beta
beta_ref = 5.6

# And maximum order
order = 2

npart = 1000  # Number of particles (in single configuration)
nconfig = 100_000  # Number of configurations

# Generate all the data we could want
xdata, udata = idealgas.generate_data((nconfig, npart), beta_ref)

Refer to thermoextrap.data for more information on the data classes.

Basics#

Rather than passing data directly to __init__ methods for creating data class objects and simultaneously telling them which dimensions mean what (or expecting that specific dimensions mean a certain thing), thermoextrap uses xarray to label the dimensions of inputs. While labeled dimensions are also useful internally, they mainly help clarify what is expected of user inputs.

Currently, xdata has shape (nconfig,), i.e., one entry per generated configuration, with each entry being the average \(x\) location for the associated configuration.

print(nconfig, xdata.shape)
100000 (100000,)

The dimension over which independent samples vary is the “record” dimension, with its default name in thermoextrap.data being ‘rec’. So when we create an xarray.DataArray object to house the input \(x\) data, we must label that dimension ‘rec’. Same goes for the input potential energy data. Note that the list provided to the argument dims is a list of strings naming the dimensions in the array passed to xarray.DataArray.

xdata = xr.DataArray(xdata, dims=["rec"])
udata = xr.DataArray(udata, dims=["rec"])

Now when we create a data object in thermoextrap to hold the data, we tell it that the “record” dimension, rec_dim, is named ‘rec’. This is the default, but the dimension could be named something different as long as you provide that name to rec_dim.

Note that xv is the argument for the observable \(x\), and uv is the potential energy or appropriate Hamiltonian or thermodynamic conjugate variable.

data = xtrap.DataCentralMomentsVals.from_vals(
    order=order, rec_dim="rec", xv=xdata, uv=udata, central=True
)

A couple more notes are in order about the inputs to any of the thermoextrap.data object variants. First, you only need to provide the order you expect to extrapolate to up front if you’re using the from_vals() constructor. This is because you need to specify the order of moments that will be calculated from the raw data.

The next argument to be aware of is central. This is True by default and tells the data object to work with central moments for calculating derivatives in the background, which it turns out is much more numerically stable than non-central moments. You probably want central to be True, but know that you can change it if you wish.
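The cancellation problem that central moments avoid can be demonstrated with plain NumPy, independent of thermoextrap (a standalone sketch; the data here is synthetic and illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Single-precision samples with a mean much larger than their spread,
# mimicking a large-system potential energy timeseries
u = (1.0e3 + rng.normal(scale=0.01, size=100_000)).astype(np.float32)

# Raw-moment route: var = <U^2> - <U>^2 subtracts two nearly equal
# large numbers, so most significant digits cancel
var_raw = np.mean(u**2) - np.mean(u) ** 2

# Central-moment route: var = <(U - <U>)^2> subtracts first, then
# averages small quantities, preserving accuracy
var_central = np.mean((u - np.mean(u)) ** 2)

print(var_raw, var_central)  # the central result is close to 1e-4
```

Here the true variance is \(10^{-4}\); in single precision the raw-moment route loses it entirely to cancellation, which is one reason central defaults to True.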

Data Structure#

A lot of data is already computed as soon as we create our data object. The original raw data is still stored in xv and uv, and the maximum order in order, but we can already see the central moments appearing if we look at…

data.xv
<xarray.DataArray (rec: 100000)> Size: 800kB
array([0.181 , 0.1687, 0.1713, ..., 0.1763, 0.1761, 0.1829])
Dimensions without coordinates: rec
data.xave
<xarray.DataArray ()> Size: 8B
array(0.1748)

xave is the average observable value.

data.u
<xarray.DataArray (umom: 3)> Size: 24B
array([1.0000e+00, 1.7485e+02, 3.0601e+04])
Dimensions without coordinates: umom

u holds the moments of the potential energy, i.e., \(\langle U^i \rangle\) for the \(i^\mathrm{th}\) index in the array.

For the central moments of the potential energy, \(\langle (U - \langle U \rangle )^i \rangle\), you can look at du

data.du
<xarray.DataArray (umom: 3)> Size: 24B
array([ 1.    ,  0.    , 28.2438])
Dimensions without coordinates: umom

The other necessary component for calculating derivatives is \(\langle x U^i \rangle\), which is in DataCentralMomentsVals.xu

data.xu
<xarray.DataArray (umom: 3)> Size: 24B
array([1.7485e-01, 3.0601e+01, 5.3604e+03])
Dimensions without coordinates: umom

Or, if working with central moments, \(\langle (x - \langle x \rangle) (U - \langle U \rangle)^i \rangle\) is in DataCentralMomentsVals.dxdu

data.dxdu
<xarray.DataArray (umom: 3)> Size: 24B
array([0.    , 0.0282, 0.0078])
Dimensions without coordinates: umom
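The raw and central cross moments are connected by the usual expansion; for the first moment, \(\langle (x - \langle x \rangle)(U - \langle U \rangle) \rangle = \langle x U \rangle - \langle x \rangle \langle U \rangle\). A quick standalone NumPy check of this identity (synthetic data, independent of thermoextrap):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=10_000)
u = 0.5 * x + rng.normal(size=10_000)  # make u correlated with x

# Raw cross moment <x U> and its central counterpart
xu = np.mean(x * u)
dxdu = np.mean((x - x.mean()) * (u - u.mean()))

# <(x - <x>)(U - <U>)> = <x U> - <x><U>
print(np.isclose(dxdu, xu - x.mean() * u.mean()))  # True
```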

All of this information is condensed in values, which contains exactly what we need for computing derivatives.

data.values
<xarray.DataArray (xmom: 2, umom: 3)> Size: 48B
array([[1.0000e+05, 1.7485e+02, 2.8244e+01],
       [1.7485e-01, 2.8244e-02, 7.8139e-03]])
Dimensions without coordinates: xmom, umom

Understanding this internal structure also helps in understanding the possible inputs. In values, the data object has stored moments of \(x\) and \(U\) along with their cross moments. Note that the length of the second dimension, “umom”, short for “U moments”, is just order plus 1. That makes sense if we remember that the zeroth-order derivative is the observable itself and we asked for derivatives up to order. So the index along the second dimension sets the exponent \(i\) on \(U\), i.e., the order of that moment. The first dimension does the same thing, but with \(x\). Regardless of the order, however, we only ever need \(x\) raised to the zeroth or first power inside the average.

The first row in values contains all moments of just \(U\), i.e., \(\langle U^0 \rangle\), \(\langle U^1 \rangle\), \(\langle U^2 \rangle\), etc. The second row in values contains all moments of \(x\) multiplied by \(U\), i.e., \(\langle x U^0 \rangle\), \(\langle x U^1 \rangle\), etc. But note that beyond the powers of 0 and 1 for the first row, and just 0 for the second row, all values shown are central moments, e.g., \(\langle (x - \langle x \rangle)(U - \langle U \rangle)^i \rangle\) or \(\langle (U - \langle U \rangle)^i \rangle\).

In other words, values is a special array with structure…

for i + j <= 1:
    data.values[0, 0] = {sum of weights, or count}
    data.values[1, 0] = {average of x} = \(\langle x \rangle\)
    data.values[0, 1] = {average of u} = \(\langle U \rangle\)

for i + j > 1:
    data.values[i, j] = \(\langle (x - \langle x \rangle)^i (U - \langle U \rangle)^j \rangle\)

To summarize, values contains the bare bones of what is required for calculating derivatives and will be shared in some form or another across all data classes, with this information passed to the functions that compute derivatives.
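The structure above can be reproduced by hand from raw samples with plain NumPy, which may help make the layout concrete (a sketch with illustrative synthetic data, not thermoextrap's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=1000)  # observable samples
u = rng.uniform(size=1000)  # potential energy samples
order = 2

vals = np.empty((2, order + 1))
vals[0, 0] = x.size      # count (sum of unit weights)
vals[1, 0] = x.mean()    # <x>
vals[0, 1] = u.mean()    # <U>

dx = x - x.mean()
du = u - u.mean()
for j in range(2, order + 1):
    vals[0, j] = np.mean(du**j)       # <(U - <U>)^j>
for j in range(1, order + 1):
    vals[1, j] = np.mean(dx * du**j)  # <(x - <x>)(U - <U>)^j>

print(vals.shape)  # (2, 3): (xmom, umom), matching data.values above
```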

Input formats and resampling#

Since values reflects the internal structure, you can also provide it (or something similar in terms of moments) directly if you prefer. You’ll just need to use a different data class, DataCentralMoments, and a constructor that accepts moments, such as DataCentralMoments.from_data().

While DataCentralMomentsVals is designed to work with ‘values’ (i.e., individual observations), DataCentralMoments is designed to work with moments. Both classes can be constructed from ‘values’, but DataCentralMomentsVals retains the underlying values (for resampling, etc.), while DataCentralMoments converts the values to moments and goes from there. Basically, if you have pre-computed moments (e.g., from a simulation), DataCentralMoments is probably what you want to use. Note that resampling for DataCentralMoments is based on resampling the moments over multiple samples.

For example, if we construct a DataCentralMoments object using the DataCentralMoments.from_data() constructor, we have:

data_noboot = xtrap.DataCentralMoments.from_data(data.values)
xr.testing.assert_allclose(data_noboot.values, data.values)
data_noboot.values
<xarray.DataArray (xmom: 2, umom: 3)> Size: 48B
array([[1.0000e+05, 1.7485e+02, 2.8244e+01],
       [1.7485e-01, 2.8244e-02, 7.8139e-03]])
Dimensions without coordinates: xmom, umom

This is identical to data above. Note that the order here is inferred from the passed moments array. Likewise, we could have created this directly from values using DataCentralMoments.from_vals()

data_noboot = xtrap.DataCentralMoments.from_vals(
    xv=xdata, uv=udata, rec_dim="rec", central=True, order=order
)
xr.testing.assert_allclose(data_noboot.values, data.values)
data_noboot.values
<xarray.DataArray (xmom: 2, umom: 3)> Size: 48B
array([[1.0000e+05, 1.7485e+02, 2.8244e+01],
       [1.7485e-01, 2.8244e-02, 7.8139e-03]])
Dimensions without coordinates: xmom, umom

However, since data_noboot is based on just a single average, bootstrapping makes little sense. For example:

data_noboot = xtrap.DataCentralMoments.from_raw(data.values)

try:
    data_noboot.resample(nrep=3).values
except ValueError as e:
    print("caught error!")
    print(e)
caught error!
not implemented for scalar

versus…

data.resample(nrep=3).values
<xarray.DataArray (rep: 3, xmom: 2, umom: 3)> Size: 144B
array([[[1.0000e+05, 1.7485e+02, 2.8276e+01],
        [1.7485e-01, 2.8276e-02, 7.8536e-03]],

       [[1.0000e+05, 1.7483e+02, 2.8195e+01],
        [1.7483e-01, 2.8195e-02, 6.9730e-03]],

       [[1.0000e+05, 1.7487e+02, 2.8451e+01],
        [1.7487e-01, 2.8451e-02, 9.0861e-03]]])
Dimensions without coordinates: rep, xmom, umom

Note that whenever you call .resample(), a new dimension called ‘rep’, short for “repetitions”, is created for the output. This is similar to the ‘rec’ dimension, but helps you keep track of whether you’re working with original data or a bootstrapped sample.
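The idea can be sketched standalone with xarray: draw ‘rec’ indices with replacement for each repetition and stack the draws along a new ‘rep’ dimension (an illustrative sketch of bootstrap resampling, not thermoextrap's actual implementation):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
data = xr.DataArray(rng.normal(size=1000), dims=["rec"])

nrep = 3
nrec = data.sizes["rec"]
# For each repetition, draw nrec samples from 'rec' with replacement
idx = rng.integers(0, nrec, size=(nrep, nrec))
boot = xr.DataArray(data.values[idx], dims=["rep", "rec"])

# Reducing over 'rec' leaves one bootstrap estimate per repetition
means = boot.mean("rec")
print(means.dims)  # ('rep',)
```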

So clearly, if you prefer to provide raw moments and want the uncertainty quantification in thermoextrap to work, you will need to compute those moments from blocks of your data or from repeated simulations. But everything will still work if you prefer to calculate your own moments (saving them periodically during simulations rather than frequently saving all of the configurations, energies, or observables).

As an example, we can work with blocks of data, adding an axis called ‘block’ that we will ask the constructor to average over by specifying the dim argument.

# Make 100 averaged observations
xx = xr.DataArray(xdata.values.reshape(100, -1), dims=["rec", "block"])
uu = xr.DataArray(udata.values.reshape(100, -1), dims=["rec", "block"])
# Create directly from values of moments - notice that this is DataCentralMoments, not DataCentralMomentsVals
# Effectively just means that the 'rec' dim will not be collapsed when using data for extrapolation, etc.
# Behaves similarly to the 'rep' dim when resampling
data_fv = xtrap.DataCentralMoments.from_vals(
    xv=xx, uv=uu, dim="block", order=order, central=True
)
# So 'rec' is for each separate block average
data_fv.values
<xarray.DataArray (rec: 100, xmom: 2, umom: 3)> Size: 5kB
array([[[ 1.0000e+03,  1.7504e+02,  2.8681e+01],
        [ 1.7504e-01,  2.8681e-02,  1.0778e-02]],

       [[ 1.0000e+03,  1.7487e+02,  2.9507e+01],
        [ 1.7487e-01,  2.9507e-02,  3.6761e-03]],

       [[ 1.0000e+03,  1.7471e+02,  2.8956e+01],
        [ 1.7471e-01,  2.8956e-02,  3.4057e-02]],

       [[ 1.0000e+03,  1.7470e+02,  2.9232e+01],
        [ 1.7470e-01,  2.9232e-02, -6.4238e-03]],

       [[ 1.0000e+03,  1.7469e+02,  2.8072e+01],
        [ 1.7469e-01,  2.8072e-02,  5.0937e-03]],

       [[ 1.0000e+03,  1.7457e+02,  2.6426e+01],
        [ 1.7457e-01,  2.6426e-02,  3.2884e-03]],

       [[ 1.0000e+03,  1.7480e+02,  2.8580e+01],
        [ 1.7480e-01,  2.8580e-02,  2.9648e-03]],
...
       [[ 1.0000e+03,  1.7491e+02,  2.8840e+01],
        [ 1.7491e-01,  2.8840e-02,  1.4932e-02]],

       [[ 1.0000e+03,  1.7478e+02,  2.7380e+01],
        [ 1.7478e-01,  2.7380e-02,  2.4111e-02]],

       [[ 1.0000e+03,  1.7492e+02,  2.6600e+01],
        [ 1.7492e-01,  2.6600e-02, -3.0268e-03]],

       [[ 1.0000e+03,  1.7489e+02,  2.8386e+01],
        [ 1.7489e-01,  2.8386e-02, -8.0114e-03]],

       [[ 1.0000e+03,  1.7477e+02,  2.6906e+01],
        [ 1.7477e-01,  2.6906e-02, -2.0863e-04]],

       [[ 1.0000e+03,  1.7486e+02,  2.6945e+01],
        [ 1.7486e-01,  2.6945e-02,  2.7821e-02]],

       [[ 1.0000e+03,  1.7496e+02,  2.7179e+01],
        [ 1.7496e-01,  2.7179e-02, -1.3813e-02]]])
Dimensions without coordinates: rec, xmom, umom

Again, note that we did not use the DataCentralMomentsVals class above, but instead the DataCentralMoments class. The former is for processing and storing simulation “timeseries” of observable and potential energy values, while the latter takes pre-computed moments, including multiple replicates of precomputed moments as above. Behind the scenes, this will influence how bootstrapped confidence intervals are computed.

What’s functionally different, though, is that the ‘rec’ dim also appears in .values. That means that when this data is used in models for extrapolation or interpolation, that dimension will also be preserved. So prediction to a new \(\beta\) value will result in an output matching the size of the ‘rec’ dimension in the same way that it would match the ‘rep’ dimension created through resampling.

If we resample over this data set, we see that we just take nrep random samples from it, putting those samples into a new dimension called ‘rep’.

data_fv.resample(nrep=3).values
<xarray.DataArray (rep: 3, xmom: 2, umom: 3)> Size: 144B
array([[[1.0000e+05, 1.7485e+02, 2.8257e+01],
        [1.7485e-01, 2.8257e-02, 8.0428e-03]],

       [[1.0000e+05, 1.7485e+02, 2.8462e+01],
        [1.7485e-01, 2.8462e-02, 7.3831e-03]],

       [[1.0000e+05, 1.7485e+02, 2.8340e+01],
        [1.7485e-01, 2.8340e-02, 9.6959e-03]]])
Dimensions without coordinates: rep, xmom, umom

If we had computed the moments from the blocked data ourselves, we could also create a data object with the DataCentralMoments.from_ave_raw() constructor (below). Many other constructors exist, including ones that accept central moments if you like. If you use those, please take a look at the documentation to make sure you are specifying or using the correct dimension naming conventions, such as ‘rec’, ‘xmom’, ‘umom’, etc. Remember, if you are extrapolating an observable that depends explicitly on the extrapolation variable, you also need to specify the ‘deriv’ dimension that describes the observable’s derivatives with respect to the extrapolation variable (see the Temperature extrapolation case 2 notebook).

# Compute moments of U, i.e., averages to integer powers up to maximum order desired
mom_u = xr.DataArray(np.arange(order + 1), dims=["umom"])
uave = (uu**mom_u).mean("block")
xuave = (xx * uu**mom_u).mean("block")
data_fa = xtrap.DataCentralMoments.from_ave_raw(
    u=uave, xu=xuave, central=True, w=xx.sizes["block"]
)

xr.testing.assert_allclose(data_fv.values, data_fa.values)

The above .values should be identical to those from the from_vals constructor.

At this point, we have seen how the same data objects that interface with extrapolation or interpolation models can be created from different inputs. Other features also exist, such as specifying weights with the argument w to a constructor to change the weights used during averaging.
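As a standalone illustration of why such weights matter (plain NumPy, with hypothetical numbers): when combining block averages from blocks of unequal length, weighting each block mean by its sample count recovers the overall mean of the pooled data.

```python
import numpy as np

# Hypothetical block means of an observable, from blocks of unequal length
block_means = np.array([0.20, 0.10])
block_counts = np.array([300, 100])  # weights: number of samples per block

# Weighted average recovers the pooled mean: (0.20*300 + 0.10*100)/400 = 0.175
overall = np.average(block_means, weights=block_counts)
print(overall)

# An unweighted mean would give 0.15, biased toward the short block
```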

Vector observables#

Finally, we can also have vector observables, such as RDFs. This is easy to accomplish with any of the above constructors or types of data input. All that is required is to add another dimension to our xarray.DataArray input. Typically, we will call this dimension ‘vals’, short for “values”, which is the default name for this dimension when using the DataCentralMomentsVals.from_vals() constructor.

# Extrapolate both average x and average x**2
x_xsq_data = xr.DataArray(
    np.vstack([xdata.values, xdata.values**2]).T,
    dims=["rec", "vals"],
    coords={"vals": ["x", "xsq"]},
)
data_vec = xtrap.DataCentralMomentsVals.from_vals(
    order=order, rec_dim="rec", xv=x_xsq_data, uv=udata, central=True
)
data_vec.values
<xarray.DataArray (vals: 2, xmom: 2, umom: 3)> Size: 96B
array([[[1.0000e+05, 1.7485e+02, 2.8244e+01],
        [1.7485e-01, 2.8244e-02, 7.8139e-03]],

       [[1.0000e+05, 1.7485e+02, 2.8244e+01],
        [3.0601e-02, 9.8847e-03, 4.3329e-03]]])
Coordinates:
  * vals     (vals) <U3 24B 'x' 'xsq'
Dimensions without coordinates: xmom, umom
data_vec.resample(nrep=3).values
<xarray.DataArray (rep: 3, vals: 2, xmom: 2, umom: 3)> Size: 288B
array([[[[1.0000e+05, 1.7484e+02, 2.8261e+01],
         [1.7484e-01, 2.8261e-02, 7.9176e-03]],

        [[1.0000e+05, 1.7484e+02, 2.8261e+01],
         [3.0598e-02, 9.8903e-03, 4.3704e-03]]],


       [[[1.0000e+05, 1.7482e+02, 2.8255e+01],
         [1.7482e-01, 2.8255e-02, 7.9522e-03]],

        [[1.0000e+05, 1.7482e+02, 2.8255e+01],
         [3.0590e-02, 9.8870e-03, 4.3943e-03]]],


       [[[1.0000e+05, 1.7485e+02, 2.8214e+01],
         [1.7485e-01, 2.8214e-02, 6.9934e-03]],

        [[1.0000e+05, 1.7485e+02, 2.8214e+01],
         [3.0601e-02, 9.8735e-03, 4.0474e-03]]]])
Coordinates:
  * vals     (vals) <U3 24B 'x' 'xsq'
Dimensions without coordinates: rep, xmom, umom

Note that we have simply added a dimension along which all the same operations happen, but independently for different data. The behavior is identical if we work instead with data from other constructors.

xx_xsqxsq = xr.DataArray(
    x_xsq_data.values.reshape(100, -1, 2),
    dims=["rec", "block", "vals"],
    coords={"vals": ["x", "xsq"]},
)
x_xsq_uave = (xx_xsqxsq * uu**mom_u).mean("block")
data_fa_vec = xtrap.DataCentralMoments.from_ave_raw(
    u=uave, xu=x_xsq_uave, central=True, w=xx_xsqxsq.sizes["block"]
)
data_fa_vec.reduce("rec").values
<xarray.DataArray (vals: 2, xmom: 2, umom: 3)> Size: 96B
array([[[1.0000e+05, 1.7485e+02, 2.8244e+01],
        [1.7485e-01, 2.8244e-02, 7.8139e-03]],

       [[1.0000e+05, 1.7485e+02, 2.8244e+01],
        [3.0601e-02, 9.8847e-03, 4.3329e-03]]])
Coordinates:
  * vals     (vals) <U3 24B 'x' 'xsq'
Dimensions without coordinates: xmom, umom
data_fa_vec.resample(nrep=3).values
<xarray.DataArray (rep: 3, vals: 2, xmom: 2, umom: 3)> Size: 288B
array([[[[1.0000e+05, 1.7485e+02, 2.8203e+01],
         [1.7485e-01, 2.8203e-02, 6.0452e-03]],

        [[1.0000e+05, 1.7485e+02, 2.8203e+01],
         [3.0600e-02, 9.8685e-03, 3.7024e-03]]],


       [[[1.0000e+05, 1.7486e+02, 2.8246e+01],
         [1.7486e-01, 2.8246e-02, 8.5305e-03]],

        [[1.0000e+05, 1.7486e+02, 2.8246e+01],
         [3.0604e-02, 9.8867e-03, 4.5961e-03]]],


       [[[1.0000e+05, 1.7486e+02, 2.8203e+01],
         [1.7486e-01, 2.8203e-02, 1.1006e-02]],

        [[1.0000e+05, 1.7486e+02, 2.8203e+01],
         [3.0603e-02, 9.8740e-03, 5.4435e-03]]]])
Coordinates:
  * vals     (vals) <U3 24B 'x' 'xsq'
Dimensions without coordinates: rep, xmom, umom