{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "0", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import logging\n", "import warnings\n", "\n", "import numpy as np\n", "\n", "np.set_printoptions(precision=4)\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "\n", "logger = logging.getLogger()\n", "logger.setLevel(logging.ERROR)" ] }, { "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "# Temperature independent average observable\n", "\n", "We first extrapolate our ideal gas test system in temperature and compare to the exact result. We also compare results with finite numbers of samples to exact results for an infinite number of samples. This is possible because we can analytically calculate derivatives for the ideal gas model at any order rather than estimating them.\n", "\n", "The first thing we want to do is define some parameters and generate data. We will define a reference inverse temperature, specific orders we want to extrapolate to, and create some reference data. The {func}`thermoextrap.idealgas.generate_data` function conveniently provides random samples of $x$ and $U$ data. This will be in the format of an array with `nconfig` elements, each being the average $x$ value over $N$ independent ideal gas particles. The relevant potential energy is an array of the same shape with a single entry for each configuration. More generally, the data that should be provided to the extrapolation/interpolation code can have any number of columns but should have a single row associated with each simulation snapshot and potential energy. This allows for simultaneous extrapolation of all points in RDFs or time correlation functions with the interparticle distance or time varying with column. Or for simultaneous extrapolation of multiple observables. Critical, however, is that the potential energy (or Hamiltonian) is that which is appropriate to the snapshot with the same index. So for dynamical quantities like time correlation functions, the appropriate Hamiltonian is that for the starting configuration. Another main point here is that the number of columns is arbitrary, with each extrapolated separately, but because of the way the code is vectorized you get much better efficiency by providing multiple columns rather than running the extrapolation multiple times for different observables.\n", "\n", "For more information on the structure of data and other options for inputting data, please see the [Data_Organization](./Data_Organization.ipynb) notebook." ] }, { "cell_type": "code", "execution_count": 2, "id": "2", "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import cmomy\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "# Import idealgas module\n", "from thermoextrap import idealgas\n", "\n", "# set random seed\n", "rng = cmomy.random.default_rng(0)\n", "\n", "# Define test betas and reference beta\n", "betas = np.arange(0.1, 10.0, 0.5)\n", "beta_ref = betas[11]\n", "vol = 1.0\n", "\n", "# Define orders to extrapolate to\n", "orders = [1, 2, 4, 6]\n", "order = orders[-1]\n", "\n", "npart = 1000 # Number of particles (in single configuration)\n", "nconfig = 100_000 # Number of configurations\n", "\n", "# Generate all the data we could want\n", "xdata, udata = idealgas.generate_data((nconfig, npart), beta_ref, vol)" ] }, { "cell_type": "markdown", "id": "3", "metadata": {}, "source": [ "All the code below relies on inputting xarray objects. This is easily applied." ] }, { "cell_type": "code", "execution_count": 3, "id": "4", "metadata": {}, "outputs": [], "source": [ "import xarray as xr\n", "\n", "xdata = xr.DataArray(xdata, dims=[\"rec\"])\n", "udata = xr.DataArray(udata, dims=[\"rec\"])" ] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": [ "We now want to create and train our extrapolation and perturbation models that we will compare. By default the {class}`~thermoextrap.models.ExtrapModel` or other classes work with the simplest case of directly extrapolating an observable that is an average quantity and does not explicitly depend on temperature. Note that this covers quantities that include dependence on kinetic degrees of freedom, with the potential energy simply substituted for the full Hamiltonian (total energy). Or it works for extrapolating in temperature or pressure in the NPT ensemble - again you just have to insert the appropriate Hamiltonian instead of the potential energy. To be specific, that means that if we want to extrapolate with respect to $\\beta$ in the NPT ensemble, we need to provide $U + pV$ instead of just $U$. To extrapolate with respect to $p$ in the NPT, we only provide $\\beta V$ at each snapshot. The math then works out the same way and all of this code can be re-used. We handle other cases, like explicit temperature dependence of the quantity to be averaged, in other notebooks." ] }, { "cell_type": "code", "execution_count": 4, "id": "6", "metadata": {}, "outputs": [], "source": [ "import thermoextrap as xtrap" ] }, { "cell_type": "code", "execution_count": 5, "id": "7", "metadata": {}, "outputs": [], "source": [ "# First, create a model around the data - see \"Data_Organization\" notebook\n", "data = xtrap.DataCentralMomentsVals.from_vals(\n", " order=orders[-1], rec_dim=\"rec\", xv=xdata, uv=udata, central=True\n", ")\n", "\n", "# Create extrapolation and perturbation models\n", "xem = xtrap.beta.factory_extrapmodel(beta_ref, data)\n", "\n", "xpm = xtrap.beta.factory_perturbmodel(beta_ref, uv=udata, xv=xdata)" ] }, { "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "Let's test some predictions and look at the results. Notice that you can make predictions over scalars or arrays of values, but the extrapolation will only be 1D in the variable of interest (e.g. temperature, pressure, etc.). Extrapolations in multiple dimensions of state variables is not yet supported." ] }, { "cell_type": "code", "execution_count": 6, "id": "9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.045\n" ] } ], "source": [ "x = rng.random()\n", "print(f\"{x:.3f}\")" ] }, { "cell_type": "code", "execution_count": 7, "id": "10", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Extrapolation: -23.750\n", "Perturbation:, 0.199\n", "\n", "\n", "At lower order\n", "--------------\n", "Extrapolation: 0.448\n", "Perturbation: 0.199\n", "\n", "\n", "For multiple beta values\n", "------------------------\n", "Extrapolation: [0.4484 0.4137 0.3811 0.3503]\n", "Perturbation: [0.1986 0.1986 0.1986 0.1986]\n", "\n" ] } ], "source": [ "# Make some predictions\n", "print(\n", " f\"\"\"\n", "Extrapolation: {xem.predict(betas[0]).values:.3f}\n", "Perturbation:, {xpm.predict(betas[0]).values:.3f}\n", "\"\"\"\n", ")\n", "\n", "# By default, uses maximum order, which here is inaccurate due to limited sampling (see figure below)\n", "# But can switch to any lower order if desired\n", "# Remember, order doesn't matter for perturbation\n", "print(\n", " f\"\"\"\n", "At lower order\n", "--------------\n", "Extrapolation: {xem.predict(betas[0], order=2).values:.3f}\n", "Perturbation: {xpm.predict(betas[0]).values:.3f}\n", "\"\"\"\n", ")\n", "\n", "# Can also make predictions for multiple values\n", "print(\n", " f\"\"\"\n", "For multiple beta values\n", "------------------------\n", "Extrapolation: {xem.predict(betas[:4], order=2).values}\n", "Perturbation: {xpm.predict(betas[:4]).values}\n", "\"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "11", "metadata": {}, "source": [ "All of the extrapolation/interpolation model classes come with a way to resample data efficiently to allow for bootstrapped uncertainty estimates. The {meth}`thermoextrap.data.DataCentralMomentsVals.resample` function actually resamples the data and returns a new extrapolation/interpolation class object, meaning that the necessary coefficients are automatically recomputed. However, the coefficients are computed separately for the different bootstrapped samples, which appear as different entries in the \"replication\" or \"rep\" dimension. In what we show below, this makes it easy to create bootstrap resampled data, then compute statistics on the coefficients or the predictions using built in statistical functions in xarray (e.g., {meth}`xarray.DataArray.mean` and {meth}`xarray.DataArray.std`)." ] }, { "cell_type": "code", "execution_count": 8, "id": "12", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.\n" ] }, { "data": { "text/html": [ "
<xarray.DataArray (order: 3, rep: 100)> Size: 2kB\n", "array([[ 0.1749, 0.1748, 0.1749, 0.1748, 0.1748, 0.1748, 0.1749,\n", " 0.1749, 0.1749, 0.1748, 0.1749, 0.1748, 0.1748, 0.1749,\n", " 0.1748, 0.1749, 0.1749, 0.1749, 0.1749, 0.1749, 0.1748,\n", " 0.1749, 0.1749, 0.1749, 0.1748, 0.1748, 0.1749, 0.1748,\n", " 0.1748, 0.1748, 0.1748, 0.1748, 0.1749, 0.1748, 0.1748,\n", " 0.1749, 0.1749, 0.1748, 0.1748, 0.1749, 0.1749, 0.1748,\n", " 0.1749, 0.1748, 0.1748, 0.1749, 0.1749, 0.1748, 0.1748,\n", " 0.1749, 0.1749, 0.1749, 0.1749, 0.1749, 0.1748, 0.1748,\n", " 0.1749, 0.1749, 0.1748, 0.1749, 0.1749, 0.1748, 0.1748,\n", " 0.1748, 0.1748, 0.1749, 0.1749, 0.1749, 0.1749, 0.1748,\n", " 0.1749, 0.1749, 0.1748, 0.1748, 0.1749, 0.1749, 0.1748,\n", " 0.1748, 0.1748, 0.1748, 0.1748, 0.1748, 0.1748, 0.1748,\n", " 0.1748, 0.1748, 0.1748, 0.1749, 0.1748, 0.1748, 0.1749,\n", " 0.1748, 0.1748, 0.1749, 0.1749, 0.1748, 0.1749, 0.1748,\n", " 0.1749, 0.1748],\n", " [-0.0283, -0.0282, -0.0285, -0.0283, -0.0283, -0.0282, -0.0283,\n", " -0.0282, -0.028 , -0.0281, -0.0282, -0.028 , -0.028 , -0.0281,\n", " -0.0284, -0.0281, -0.0281, -0.0284, -0.0284, -0.0284, -0.0281,\n", " -0.0282, -0.0282, -0.0282, -0.0283, -0.0284, -0.0281, -0.0282,\n", " -0.0282, -0.0282, -0.0283, -0.0284, -0.0283, -0.0282, -0.028 ,\n", "...\n", " -0.0282, -0.0284, -0.0282, -0.0283, -0.0283, -0.0284, -0.0281,\n", " -0.0283, -0.0284, -0.0282, -0.0282, -0.0281, -0.028 , -0.0282,\n", " -0.0283, -0.0284, -0.0286, -0.0284, -0.0283, -0.0284, -0.0285,\n", " -0.0282, -0.028 , -0.0282, -0.0283, -0.0283, -0.0283, -0.0279,\n", " -0.0282, -0.0282],\n", " [ 0.0039, 0.0035, 0.0045, 0.004 , 0.004 , 0.0035, 0.0038,\n", " 0.0049, 0.0032, 0.0036, 0.0034, 0.0041, 0.0044, 0.0039,\n", " 0.0041, 0.0046, 0.0041, 0.0035, 0.0035, 0.0035, 0.0043,\n", " 0.004 , 0.0033, 0.0048, 0.0038, 0.0038, 0.0034, 0.0041,\n", " 0.0035, 0.0042, 0.005 , 0.0038, 0.0054, 0.0037, 0.0048,\n", " 0.0039, 0.0039, 0.0047, 0.0048, 0.0043, 0.004 , 0.0041,\n", " 0.0041, 0.0032, 0.0041, 0.0047, 0.0044, 0.0044, 0.0037,\n", " 0.0042, 0.0038, 0.0049, 0.0046, 0.0026, 0.0045, 0.0038,\n", " 0.003 , 0.0045, 0.0034, 0.0048, 0.004 , 0.0049, 0.0027,\n", " 0.0031, 0.0041, 0.0045, 0.0039, 0.0024, 0.0037, 0.0043,\n", " 0.0037, 0.0042, 0.0051, 0.0027, 0.004 , 0.0037, 0.0047,\n", " 0.0036, 0.0053, 0.0046, 0.0046, 0.0039, 0.0033, 0.004 ,\n", " 0.0038, 0.0045, 0.0034, 0.0044, 0.0051, 0.004 , 0.0041,\n", " 0.0049, 0.004 , 0.0042, 0.0042, 0.0045, 0.004 , 0.0038,\n", " 0.0043, 0.0043]])\n", "Dimensions without coordinates: order, rep
<xarray.DataArray (beta: 4, rep: 100)> Size: 3kB\n", "array([[0.4493, 0.4352, 0.4688, 0.4502, 0.4498, 0.4348, 0.4462, 0.479 ,\n", " 0.4265, 0.439 , 0.432 , 0.454 , 0.4629, 0.449 , 0.4553, 0.4682,\n", " 0.4535, 0.4369, 0.436 , 0.4376, 0.459 , 0.4509, 0.4312, 0.4752,\n", " 0.4438, 0.4473, 0.4336, 0.4542, 0.4375, 0.4569, 0.4828, 0.4449,\n", " 0.4927, 0.442 , 0.4728, 0.4487, 0.447 , 0.4724, 0.4741, 0.4612,\n", " 0.4509, 0.4549, 0.4543, 0.4262, 0.4525, 0.4733, 0.4642, 0.4617,\n", " 0.4421, 0.4574, 0.4469, 0.4785, 0.4709, 0.4086, 0.4672, 0.4435,\n", " 0.4214, 0.4644, 0.434 , 0.4761, 0.4524, 0.4804, 0.4124, 0.4243,\n", " 0.4547, 0.4657, 0.4483, 0.4023, 0.4426, 0.4629, 0.4414, 0.4581,\n", " 0.4833, 0.4119, 0.4506, 0.4435, 0.4723, 0.4388, 0.4927, 0.4696,\n", " 0.468 , 0.4471, 0.4289, 0.4512, 0.4441, 0.4671, 0.4355, 0.4654,\n", " 0.4848, 0.4518, 0.4542, 0.4766, 0.4504, 0.4585, 0.4571, 0.4674,\n", " 0.4505, 0.4427, 0.4601, 0.4613],\n", " [0.4145, 0.4029, 0.4307, 0.4152, 0.4149, 0.4025, 0.412 , 0.439 ,\n", " 0.3956, 0.4059, 0.4001, 0.4183, 0.4257, 0.4142, 0.4196, 0.4301,\n", " 0.4179, 0.4044, 0.4036, 0.4049, 0.4224, 0.4158, 0.3995, 0.4359,\n", " 0.4099, 0.4129, 0.4014, 0.4186, 0.4047, 0.4207, 0.4422, 0.4109,\n", " 0.4504, 0.4085, 0.4338, 0.414 , 0.4126, 0.4334, 0.435 , 0.4244,\n", " 0.4159, 0.4192, 0.4187, 0.3955, 0.4171, 0.4343, 0.4269, 0.4247,\n", " 0.4086, 0.4213, 0.4125, 0.4386, 0.4324, 0.3809, 0.4293, 0.4097,\n", "...\n", " 0.377 , 0.3872, 0.3801, 0.4012, 0.3962, 0.3545, 0.3937, 0.3778,\n", " 0.363 , 0.3917, 0.3715, 0.3997, 0.3839, 0.4026, 0.3569, 0.3649,\n", " 0.3854, 0.3927, 0.381 , 0.3501, 0.3771, 0.391 , 0.3764, 0.3877,\n", " 0.4044, 0.3567, 0.3826, 0.3779, 0.397 , 0.3747, 0.4108, 0.3952,\n", " 0.3941, 0.3801, 0.3678, 0.3829, 0.3782, 0.3937, 0.3727, 0.3926,\n", " 0.4055, 0.3835, 0.3851, 0.3999, 0.3823, 0.3878, 0.3869, 0.3938,\n", " 0.3825, 0.377 , 0.3889, 0.3897],\n", " [0.3509, 0.3433, 0.3614, 0.3513, 0.3511, 0.3431, 0.3493, 0.3665,\n", " 0.3386, 0.3452, 0.3416, 0.353 , 0.3578, 0.3505, 0.3542, 0.3607,\n", " 0.3529, 0.3445, 0.3439, 0.3448, 0.3558, 0.3516, 0.3412, 0.3644,\n", " 0.3479, 0.3499, 0.3423, 0.3534, 0.3445, 0.3547, 0.3686, 0.3487,\n", " 0.3738, 0.3469, 0.3629, 0.3505, 0.3495, 0.3627, 0.3639, 0.3572,\n", " 0.3518, 0.354 , 0.3536, 0.3387, 0.3523, 0.3634, 0.3589, 0.3572,\n", " 0.3472, 0.3552, 0.3496, 0.3662, 0.3623, 0.3293, 0.3604, 0.3477,\n", " 0.3361, 0.3587, 0.3428, 0.3651, 0.3527, 0.3675, 0.3312, 0.3375,\n", " 0.3539, 0.3596, 0.3502, 0.3258, 0.3471, 0.3583, 0.3467, 0.3557,\n", " 0.3687, 0.3311, 0.3516, 0.3479, 0.3628, 0.3453, 0.3739, 0.3615,\n", " 0.3606, 0.3495, 0.3397, 0.3518, 0.3481, 0.3604, 0.3438, 0.3595,\n", " 0.3697, 0.3524, 0.3536, 0.3652, 0.3512, 0.3557, 0.355 , 0.3604,\n", " 0.3515, 0.347 , 0.3565, 0.3571]])\n", "Coordinates:\n", " * beta (beta) float64 32B 0.1 0.6 1.1 1.6\n", " dalpha (beta) float64 32B -5.5 -5.0 -4.5 -4.0\n", " beta0 float64 8B 5.6\n", "Dimensions without coordinates: rep
<xarray.DataArray (beta: 4)> Size: 32B\n", "array([0.4527, 0.4173, 0.384 , 0.3526])\n", "Coordinates:\n", " * beta (beta) float64 32B 0.1 0.6 1.1 1.6\n", " dalpha (beta) float64 32B -5.5 -5.0 -4.5 -4.0\n", " beta0 float64 8B 5.6
<xarray.DataArray (beta: 4)> Size: 32B\n", "array([0.0178, 0.0147, 0.0119, 0.0094])\n", "Coordinates:\n", " * beta (beta) float64 32B 0.1 0.6 1.1 1.6\n", " dalpha (beta) float64 32B -5.5 -5.0 -4.5 -4.0\n", " beta0 float64 8B 5.6