{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "0", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import logging\n", "import warnings\n", "\n", "import cmomy\n", "import numpy as np\n", "\n", "rng = cmomy.random.default_rng(0)\n", "\n", "np.set_printoptions(precision=4)\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "\n", "logger = logging.getLogger()\n", "logger.setLevel(logging.ERROR)" ] }, { "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "# Data Organization\n", "\n", "\n", "All of the extrapolation and interpolation models in the thermoextrap package expect input data to be organized in a particular way. Data objects are provided to help manage and organize the data. Even the inputs to these data objects, however, must be organized appropriately. Here we use data from our ideal gas test system (from {mod}`thermoextrap.idealgas`) to demonstrate this organization, as well as the various types of data that may be provided as input." ] }, { "cell_type": "code", "execution_count": 2, "id": "2", "metadata": {}, "outputs": [], "source": [ "# Required imports\n", "%matplotlib inline\n", "import numpy as np\n", "import xarray as xr\n", "\n", "# Import thermoextrap\n", "import thermoextrap as xtrap\n", "\n", "# Import idealgas module\n", "from thermoextrap import idealgas" ] }, { "cell_type": "code", "execution_count": 3, "id": "3", "metadata": {}, "outputs": [], "source": [ "# Define reference beta\n", "beta_ref = 5.6\n", "\n", "# And maximum order\n", "order = 2\n", "\n", "npart = 1000  # Number of particles (in a single configuration)\n", "nconfig = 100_000  # Number of configurations\n", "\n", "# Generate all the data we could want\n", "xdata, udata = idealgas.generate_data((nconfig, npart), beta_ref)" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "Refer to {mod}`thermoextrap.data` for more information on the data classes." 
] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": [ "## Basics\n", "\n", "Rather than passing raw arrays directly to the `__init__` methods of the data classes and separately specifying what each dimension means (or assuming that specific dimensions have a fixed meaning), {mod}`thermoextrap` uses {mod}`xarray` to label the dimensions of inputs. Besides being useful internally, this clarifies what is expected of user inputs.\n", "\n", "Currently, `xdata` has shape `(nconfig,)`, where `nconfig` is the number of configurations generated; each entry is the average $x$ location for the associated configuration." ] }, { "cell_type": "code", "execution_count": 4, "id": "6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100000 (100000,)\n" ] } ], "source": [ "print(nconfig, xdata.shape)" ] }, { "cell_type": "markdown", "id": "7", "metadata": {}, "source": [ "The dimension over which independent samples vary is the \"record\" dimension, with its default name in {mod}`thermoextrap.data` being 'rec'. So when we create an {class}`xarray.DataArray` object to house the input $x$ data, we label that dimension 'rec'. The same goes for the input potential energy data. Note that the argument `dims` takes a list of strings naming, in order, the dimensions of the array passed to {class}`xarray.DataArray`." 
] }, { "cell_type": "code", "execution_count": 5, "id": "8", "metadata": {}, "outputs": [], "source": [ "xdata = xr.DataArray(xdata, dims=[\"rec\"])\n", "udata = xr.DataArray(udata, dims=[\"rec\"])" ] }, { "cell_type": "markdown", "id": "9", "metadata": {}, "source": [ "Now, when we create a data object in {mod}`thermoextrap` to hold the data, we tell it through `rec_dim` that the \"record\" dimension is named 'rec'. This is the default, but the dimension could be named something else as long as you pass that name to `rec_dim`.\n", "\n", "Note that `xv` is the argument for the observable $x$, and `uv` is the argument for the potential energy (or the appropriate Hamiltonian or thermodynamically conjugate variable)." ] }, { "cell_type": "code", "execution_count": 6, "id": "10", "metadata": {}, "outputs": [], "source": [ "data = xtrap.DataCentralMomentsVals.from_vals(\n", "    order=order, rec_dim=\"rec\", xv=xdata, uv=udata, central=True\n", ")" ] }, { "cell_type": "markdown", "id": "11", "metadata": {}, "source": [ "A couple more notes are in order about the inputs to any of the {mod}`thermoextrap.data` object variants. First, you only need to provide the order you expect to extrapolate to up front if you're using the {meth}`~thermoextrap.data.DataCentralMomentsVals.from_vals` constructor. This is because the constructor needs to know the order of the moments to calculate from the raw data.\n", "\n", "The next argument to be aware of is `central`. This is `True` by default and tells the data object to work with central moments when calculating derivatives in the background, which turns out to be much more numerically stable than working with non-central moments. You probably want `central=True`, but you can change it if you wish." ] }, { "cell_type": "markdown", "id": "12", "metadata": {}, "source": [ "## Data Structure\n", "\n", "```{eval-rst}\n", ".. currentmodule:: thermoextrap.data\n", "```\n", "\n", "\n", "\n", "A lot of data is already computed as soon as we create our data object. 
The original raw data is still stored in {attr}`~DataCentralMomentsVals.xv` and {attr}`~DataCentralMomentsVals.uv`, and the order is stored in {attr}`~DataCentralMomentsVals.order`, but we can already see the central moments appearing if we look at..." ] }, { "cell_type": "code", "execution_count": 7, "id": "13", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.DataArray (rec: 100000)> Size: 800kB\n", "array([0.181 , 0.1687, 0.1713, ..., 0.1763, 0.1761, 0.1829])\n", "Dimensions without coordinates: rec
<xarray.DataArray ()> Size: 8B\n", "array(0.1748)
<xarray.DataArray (umom: 3)> Size: 24B\n", "array([1.0000e+00, 1.7485e+02, 3.0601e+04])\n", "Dimensions without coordinates: umom
<xarray.DataArray (umom: 3)> Size: 24B\n", "array([ 1. , 0. , 28.2438])\n", "Dimensions without coordinates: umom
<xarray.DataArray (umom: 3)> Size: 24B\n", "array([1.7485e-01, 3.0601e+01, 5.3604e+03])\n", "Dimensions without coordinates: umom
<xarray.DataArray (umom: 3)> Size: 24B\n", "array([0. , 0.0282, 0.0078])\n", "Dimensions without coordinates: umom
<xarray.DataArray (xmom: 2, umom: 3)> Size: 48B\n", "array([[1.0000e+05, 1.7485e+02, 2.8244e+01],\n", " [1.7485e-01, 2.8244e-02, 7.8139e-03]])\n", "Dimensions without coordinates: xmom, umom
<xarray.DataArray (xmom: 2, umom: 3)> Size: 48B\n", "array([[1.0000e+05, 1.7485e+02, 2.8244e+01],\n", " [1.7485e-01, 2.8244e-02, 7.8139e-03]])\n", "Dimensions without coordinates: xmom, umom
<xarray.DataArray (xmom: 2, umom: 3)> Size: 48B\n", "array([[1.0000e+05, 1.7485e+02, 2.8244e+01],\n", " [1.7485e-01, 2.8244e-02, 7.8139e-03]])\n", "Dimensions without coordinates: xmom, umom
<xarray.DataArray (rep: 3, xmom: 2, umom: 3)> Size: 144B\n", "array([[[1.0000e+05, 1.7485e+02, 2.8276e+01],\n", " [1.7485e-01, 2.8276e-02, 7.8536e-03]],\n", "\n", " [[1.0000e+05, 1.7483e+02, 2.8195e+01],\n", " [1.7483e-01, 2.8195e-02, 6.9730e-03]],\n", "\n", " [[1.0000e+05, 1.7487e+02, 2.8451e+01],\n", " [1.7487e-01, 2.8451e-02, 9.0861e-03]]])\n", "Dimensions without coordinates: rep, xmom, umom
<xarray.DataArray (rec: 100, xmom: 2, umom: 3)> Size: 5kB\n", "array([[[ 1.0000e+03, 1.7504e+02, 2.8681e+01],\n", " [ 1.7504e-01, 2.8681e-02, 1.0778e-02]],\n", "\n", " [[ 1.0000e+03, 1.7487e+02, 2.9507e+01],\n", " [ 1.7487e-01, 2.9507e-02, 3.6761e-03]],\n", "\n", " [[ 1.0000e+03, 1.7471e+02, 2.8956e+01],\n", " [ 1.7471e-01, 2.8956e-02, 3.4057e-02]],\n", "\n", " [[ 1.0000e+03, 1.7470e+02, 2.9232e+01],\n", " [ 1.7470e-01, 2.9232e-02, -6.4238e-03]],\n", "\n", " [[ 1.0000e+03, 1.7469e+02, 2.8072e+01],\n", " [ 1.7469e-01, 2.8072e-02, 5.0937e-03]],\n", "\n", " [[ 1.0000e+03, 1.7457e+02, 2.6426e+01],\n", " [ 1.7457e-01, 2.6426e-02, 3.2884e-03]],\n", "\n", " [[ 1.0000e+03, 1.7480e+02, 2.8580e+01],\n", " [ 1.7480e-01, 2.8580e-02, 2.9648e-03]],\n", "...\n", " [[ 1.0000e+03, 1.7491e+02, 2.8840e+01],\n", " [ 1.7491e-01, 2.8840e-02, 1.4932e-02]],\n", "\n", " [[ 1.0000e+03, 1.7478e+02, 2.7380e+01],\n", " [ 1.7478e-01, 2.7380e-02, 2.4111e-02]],\n", "\n", " [[ 1.0000e+03, 1.7492e+02, 2.6600e+01],\n", " [ 1.7492e-01, 2.6600e-02, -3.0268e-03]],\n", "\n", " [[ 1.0000e+03, 1.7489e+02, 2.8386e+01],\n", " [ 1.7489e-01, 2.8386e-02, -8.0114e-03]],\n", "\n", " [[ 1.0000e+03, 1.7477e+02, 2.6906e+01],\n", " [ 1.7477e-01, 2.6906e-02, -2.0863e-04]],\n", "\n", " [[ 1.0000e+03, 1.7486e+02, 2.6945e+01],\n", " [ 1.7486e-01, 2.6945e-02, 2.7821e-02]],\n", "\n", " [[ 1.0000e+03, 1.7496e+02, 2.7179e+01],\n", " [ 1.7496e-01, 2.7179e-02, -1.3813e-02]]])\n", "Dimensions without coordinates: rec, xmom, umom
<xarray.DataArray (rep: 3, xmom: 2, umom: 3)> Size: 144B\n", "array([[[1.0000e+05, 1.7485e+02, 2.8257e+01],\n", " [1.7485e-01, 2.8257e-02, 8.0428e-03]],\n", "\n", " [[1.0000e+05, 1.7485e+02, 2.8462e+01],\n", " [1.7485e-01, 2.8462e-02, 7.3831e-03]],\n", "\n", " [[1.0000e+05, 1.7485e+02, 2.8340e+01],\n", " [1.7485e-01, 2.8340e-02, 9.6959e-03]]])\n", "Dimensions without coordinates: rep, xmom, umom
<xarray.DataArray (vals: 2, xmom: 2, umom: 3)> Size: 96B\n", "array([[[1.0000e+05, 1.7485e+02, 2.8244e+01],\n", " [1.7485e-01, 2.8244e-02, 7.8139e-03]],\n", "\n", " [[1.0000e+05, 1.7485e+02, 2.8244e+01],\n", " [3.0601e-02, 9.8847e-03, 4.3329e-03]]])\n", "Coordinates:\n", " * vals (vals) <U3 24B 'x' 'xsq'\n", "Dimensions without coordinates: xmom, umom
<xarray.DataArray (rep: 3, vals: 2, xmom: 2, umom: 3)> Size: 288B\n", "array([[[[1.0000e+05, 1.7484e+02, 2.8261e+01],\n", " [1.7484e-01, 2.8261e-02, 7.9176e-03]],\n", "\n", " [[1.0000e+05, 1.7484e+02, 2.8261e+01],\n", " [3.0598e-02, 9.8903e-03, 4.3704e-03]]],\n", "\n", "\n", " [[[1.0000e+05, 1.7482e+02, 2.8255e+01],\n", " [1.7482e-01, 2.8255e-02, 7.9522e-03]],\n", "\n", " [[1.0000e+05, 1.7482e+02, 2.8255e+01],\n", " [3.0590e-02, 9.8870e-03, 4.3943e-03]]],\n", "\n", "\n", " [[[1.0000e+05, 1.7485e+02, 2.8214e+01],\n", " [1.7485e-01, 2.8214e-02, 6.9934e-03]],\n", "\n", " [[1.0000e+05, 1.7485e+02, 2.8214e+01],\n", " [3.0601e-02, 9.8735e-03, 4.0474e-03]]]])\n", "Coordinates:\n", " * vals (vals) <U3 24B 'x' 'xsq'\n", "Dimensions without coordinates: rep, xmom, umom" ], "text/plain": [ "
<xarray.DataArray (vals: 2, xmom: 2, umom: 3)> Size: 96B\n", "array([[[1.0000e+05, 1.7485e+02, 2.8244e+01],\n", " [1.7485e-01, 2.8244e-02, 7.8139e-03]],\n", "\n", " [[1.0000e+05, 1.7485e+02, 2.8244e+01],\n", " [3.0601e-02, 9.8847e-03, 4.3329e-03]]])\n", "Coordinates:\n", " * vals (vals) <U3 24B 'x' 'xsq'\n", "Dimensions without coordinates: xmom, umom" ], "text/plain": [ "
<xarray.DataArray (rep: 3, vals: 2, xmom: 2, umom: 3)> Size: 288B\n", "array([[[[1.0000e+05, 1.7485e+02, 2.8203e+01],\n", " [1.7485e-01, 2.8203e-02, 6.0452e-03]],\n", "\n", " [[1.0000e+05, 1.7485e+02, 2.8203e+01],\n", " [3.0600e-02, 9.8685e-03, 3.7024e-03]]],\n", "\n", "\n", " [[[1.0000e+05, 1.7486e+02, 2.8246e+01],\n", " [1.7486e-01, 2.8246e-02, 8.5305e-03]],\n", "\n", " [[1.0000e+05, 1.7486e+02, 2.8246e+01],\n", " [3.0604e-02, 9.8867e-03, 4.5961e-03]]],\n", "\n", "\n", " [[[1.0000e+05, 1.7486e+02, 2.8203e+01],\n", " [1.7486e-01, 2.8203e-02, 1.1006e-02]],\n", "\n", " [[1.0000e+05, 1.7486e+02, 2.8203e+01],\n", " [3.0603e-02, 9.8740e-03, 5.4435e-03]]]])\n", "Coordinates:\n", " * vals (vals) <U3 24B 'x' 'xsq'\n", "Dimensions without coordinates: rep, xmom, umom" ], "text/plain": [ "