Build an xarray.Dataset from Scratch#

In this How-To we’ll go through the process of building up an xarray.Dataset that could be used as an input to Pipeline.calculate. We’ll generate random compositions and fake data to go along with these compositions.

The dataset generated in this notebook is the basis for the Building Pipelines tutorial.

Google Colab Setup#

Only uncomment and run the next cell if you are running this notebook in Google Colab or if you don’t already have the AFL-agent package installed.

[ ]:

# !pip install git+https://github.com/usnistgov/AFL-agent.git

First Steps#

To begin, let’s import the necessary libraries for this document and then make an empty xarray.Dataset.

[1]:

import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

ds = xr.Dataset()
ds

[1]:

<xarray.Dataset> Size: 0B
Dimensions:  ()
Data variables:
    *empty*
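
Variables can be added to an empty dataset by assignment. The following is a minimal sketch rather than one of the notebook's own cells: the variable name `composition`, the dimension names `sample` and `component`, and the random values are assumptions chosen purely for illustration.

```python
import numpy as np
import xarray as xr

ds_sketch = xr.Dataset()

# Hypothetical data: 5 samples, each described by 2 component fractions.
rng = np.random.default_rng(seed=0)
values = rng.uniform(0.0, 1.0, size=(5, 2))

# Assigning a (dims, data) tuple creates a data variable and
# registers its dimensions on the dataset.
ds_sketch["composition"] = (("sample", "component"), values)

print(ds_sketch.sizes)
```

After the assignment, the dataset reports a `sample` dimension of length 5 and a `component` dimension of length 2, and any variable added later with a `sample` dimension must have the same length along it.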

Saving the Dataset to disk#

We can save this dataset to disk for use in other notebooks or to keep a record of the input data used in a calculation. We’ll use the netCDF format for this:

[11]:

ds.to_netcdf('../data/example_dataset.nc')
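
To confirm that a file was written correctly, it can be read back and compared with the in-memory dataset. The sketch below is an illustration under stated assumptions: it uses a small stand-in dataset named `ds_demo` and a temporary directory instead of the notebook's `ds` and `../data/` path, and it assumes a netCDF backend (for example netCDF4, h5netcdf, or scipy) is installed.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

# Hypothetical stand-in for the dataset built in this notebook.
ds_demo = xr.Dataset(
    {"measurement": (("sample", "x"), np.linspace(0.0, 1.0, 12).reshape(3, 4))}
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "roundtrip_demo.nc"
    ds_demo.to_netcdf(path)
    # load_dataset reads the data into memory and closes the file,
    # so the temporary directory can be removed safely afterwards.
    ds_loaded = xr.load_dataset(path)

# True if variables, dimensions, and values all match.
roundtrip_ok = ds_loaded.equals(ds_demo)
print(roundtrip_ok)
```

`xr.load_dataset` is used here instead of `xr.open_dataset` because the latter loads lazily and keeps the file open, which is inconvenient when the file lives in a temporary directory.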

Conclusion#

In this notebook, we demonstrated how to build an xarray.Dataset from scratch.

We:

Created an empty dataset
Added composition data for samples
Added ground truth labels for the samples
Added simulated measurement data
Added a composition grid for the agent to explore
Saved the dataset to disk in netCDF format

The resulting dataset contains all the necessary components for training and evaluating an active learning agent:

Sample compositions and their corresponding measurements
Ground truth labels for validation
A grid defining the composition space for exploration

This dataset structure represents a typical format expected by many agent pipelines in AFL.double_agent. The exact variables and variable names will change with the pipeline, but the concept of measurement data and composition information that share dimensions is foundational to analyzing formulations and materials problems in which the composition varies.
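
As an illustration of why shared dimensions matter, the sketch below builds a small dataset in which two variables both use a `sample` dimension. All names (`ds_shared`, `composition`, `measurement`, `sample`, `component`, `x`) and the random values are hypothetical and are not necessarily those used in this notebook or required by any particular pipeline.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(seed=1)
n_samples = 6

ds_shared = xr.Dataset(
    {
        # One row of component fractions per sample.
        "composition": (("sample", "component"), rng.uniform(size=(n_samples, 2))),
        # One simulated measurement curve per sample.
        "measurement": (("sample", "x"), rng.normal(size=(n_samples, 50))),
    }
)

# Selecting along the shared dimension subsets every variable that uses it,
# so compositions and measurements stay aligned.
subset = ds_shared.isel(sample=[0, 2, 4])
print(subset["composition"].shape, subset["measurement"].shape)
# prints: (3, 2) (3, 50)
```

Because both variables are indexed by `sample`, a single selection keeps each composition paired with its measurement without any manual bookkeeping.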
