{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[](https://colab.research.google.com/github/usnistgov/AFL-agent/blob/main/docs/source/tutorials/building_pipelines.ipynb)\n", "\n", "# Building Pipelines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we'll go into more detail on the Quick Start Example from [Getting Started](getting_started.rst). In this example, we'll build a pipeline that\n", "\n", "- standardizes the input compositions to improve the convergence of the Gaussian Process optimization\n", "- uses a Savitzky-Golay filter to compute the first derivative of the measurement\n", "- computes the similarity between the derivatives of the measurement data\n", "- clusters (i.e., labels) the data using spectral clustering\n", "- fits a Gaussian Process classifier to the data\n", "- chooses the next optimal measurement based on the entropy of the Gaussian Process posterior\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "Only uncomment and run the next cell if you are running this notebook in Google Colab or if you don't already have the AFL-agent package installed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# !pip install git+https://github.com/usnistgov/AFL-agent.git" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below are the imported modules used in this tutorial." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import xarray as xr\n", "import matplotlib.pyplot as plt\n", "\n", "from AFL.double_agent import *\n", "from AFL.double_agent.plotting import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Input Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To begin, we'll load a pre-prepared `xarray.Dataset`. 
These are powerful and flexible data structures for working with multi-dimensional data, and `AFL.double_agent` uses them for all input, intermediate, and output data.\n", "\n", "The dataset below contains simulated measurement data along with the compositions at which that data was generated. It also has the ground truth phase labels and a grid of compositions that the agent will search through for the next optimal measurement.\n", "\n", "To see how this dataset is created, see the [Building xarray.Datasets](../how-to/building_xarray_datasets) page.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 164kB\n", "Dimensions: (sample: 100, component: 2, x: 150, grid: 2500)\n", "Coordinates:\n", " * component (component) <U1 8B 'A' 'B'\n", " * x (x) float64 1kB 0.001 0.001047 0.001097 ... 0.9547 1.0\n", "Dimensions without coordinates: sample, grid\n", "Data variables:\n", " composition (sample, component) float64 2kB 5.7 1.36 ... 5.104\n", " ground_truth_labels (sample) int64 800B 1 1 0 1 0 1 1 1 ... 1 1 1 1 1 0 1 1\n", " measurement (sample, x) float64 120kB 1.915e+06 1.479e+06 ... 1.885\n", " composition_grid (grid, component) float64 40kB 0.0 0.0 ... 10.0 25.0
<xarray.Dataset> Size: 205kB\n", "Dimensions: (sample: 100, component: 2, x: 150, grid: 2500)\n", "Coordinates:\n", " * component (component) <U1 8B 'A' 'B'\n", " * x (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0\n", "Dimensions without coordinates: sample, grid\n", "Data variables:\n", " composition (sample, component) float64 2kB 5.7 ... 5.104\n", " ground_truth_labels (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1\n", " measurement (sample, x) float64 120kB 1.915e+06 ... 1.885\n", " composition_grid (grid, component) float64 40kB 0.0 0.0 ... 25.0\n", " normalized_composition (sample, component) float64 2kB 0.57 ... 0.2042\n", " normalized_composition_grid (grid, component) float64 40kB 0.0 0.0 ... 1.0
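The `normalized_composition` values shown above are consistent with min-max scaling of each component over the range spanned by `composition_grid` (0 to 10 for `A`, 0 to 25 for `B`). The following is a minimal NumPy sketch of that scaling under this assumption. The helper `min_max_normalize` is hypothetical and is not part of the AFL-agent API; the bounds are read off the repr rather than taken from the pipeline itself.

```python
import numpy as np


def min_max_normalize(values, lower, upper):
    """Scale each column of `values` to [0, 1] given per-column bounds."""
    values = np.asarray(values, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return (values - lower) / (upper - lower)


# Bounds read off the composition_grid repr (assumed to be the scaling range).
grid_min = [0.0, 0.0]
grid_max = [10.0, 25.0]

# The first A value (5.7) and the last B value (5.104) visible in the
# composition repr, placed in one row purely for illustration.
example = np.array([[5.7, 5.104]])
normalized = min_max_normalize(example, grid_min, grid_max)
print(np.round(normalized, 4))
```

The two results round to 0.57 and 0.2042, matching the first and last entries shown for `normalized_composition`.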
<xarray.Dataset> Size: 407kB\n", "Dimensions: (sample: 100, component: 2, x: 150,\n", " grid: 2500, log_x: 250)\n", "Coordinates:\n", " * component (component) <U1 8B 'A' 'B'\n", " * x (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0\n", " * log_x (log_x) float64 2kB -3.0 -2.988 ... 0.0\n", "Dimensions without coordinates: sample, grid\n", "Data variables:\n", " composition (sample, component) float64 2kB 5.7 ... 5.104\n", " ground_truth_labels (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1\n", " measurement (sample, x) float64 120kB 1.915e+06 ... 1.885\n", " composition_grid (grid, component) float64 40kB 0.0 0.0 ... 25.0\n", " normalized_composition (sample, component) float64 2kB 0.57 ... 0.2042\n", " normalized_composition_grid (grid, component) float64 40kB 0.0 0.0 ... 1.0\n", " derivative (sample, log_x) float64 200kB -3.82 ... -0.4063
<xarray.Dataset> Size: 487kB\n", "Dimensions: (sample: 100, component: 2, x: 150,\n", " grid: 2500, log_x: 250, sample_i: 100,\n", " sample_j: 100)\n", "Coordinates:\n", " * component (component) <U1 8B 'A' 'B'\n", " * x (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0\n", " * log_x (log_x) float64 2kB -3.0 -2.988 ... 0.0\n", "Dimensions without coordinates: sample, grid, sample_i, sample_j\n", "Data variables:\n", " composition (sample, component) float64 2kB 5.7 ... 5.104\n", " ground_truth_labels (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1\n", " measurement (sample, x) float64 120kB 1.915e+06 ... 1.885\n", " composition_grid (grid, component) float64 40kB 0.0 0.0 ... 25.0\n", " normalized_composition (sample, component) float64 2kB 0.57 ... 0.2042\n", " normalized_composition_grid (grid, component) float64 40kB 0.0 0.0 ... 1.0\n", " derivative (sample, log_x) float64 200kB -3.82 ... -0.4063\n", " similarity (sample_i, sample_j) float64 80kB 1.0 ... 1.0
<xarray.Dataset> Size: 488kB\n", "Dimensions: (sample: 100, component: 2, x: 150,\n", " grid: 2500, log_x: 250, sample_i: 100,\n", " sample_j: 100)\n", "Coordinates:\n", " * component (component) <U1 8B 'A' 'B'\n", " * x (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0\n", " * log_x (log_x) float64 2kB -3.0 -2.988 ... 0.0\n", "Dimensions without coordinates: sample, grid, sample_i, sample_j\n", "Data variables:\n", " composition (sample, component) float64 2kB 5.7 ... 5.104\n", " ground_truth_labels (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1\n", " measurement (sample, x) float64 120kB 1.915e+06 ... 1.885\n", " composition_grid (grid, component) float64 40kB 0.0 0.0 ... 25.0\n", " normalized_composition (sample, component) float64 2kB 0.57 ... 0.2042\n", " normalized_composition_grid (grid, component) float64 40kB 0.0 0.0 ... 1.0\n", " derivative (sample, log_x) float64 200kB -3.82 ... -0.4063\n", " similarity (sample_i, sample_j) float64 80kB 1.0 ... 1.0\n", " labels (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1
<xarray.Dataset> Size: 548kB\n", "Dimensions: (sample: 100, component: 2, x: 150,\n", " grid: 2500, log_x: 250, sample_i: 100,\n", " sample_j: 100)\n", "Coordinates:\n", " * component (component) <U1 8B 'A' 'B'\n", " * x (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0\n", " * log_x (log_x) float64 2kB -3.0 -2.988 ... 0.0\n", "Dimensions without coordinates: sample, grid, sample_i, sample_j\n", "Data variables:\n", " composition (sample, component) float64 2kB 5.7 ... 5.104\n", " ground_truth_labels (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1\n", " measurement (sample, x) float64 120kB 1.915e+06 ... 1.885\n", " composition_grid (grid, component) float64 40kB 0.0 0.0 ... 25.0\n", " normalized_composition (sample, component) float64 2kB 0.57 ... 0.2042\n", " normalized_composition_grid (grid, component) float64 40kB 0.0 0.0 ... 1.0\n", " derivative (sample, log_x) float64 200kB -3.82 ... -0.4063\n", " similarity (sample_i, sample_j) float64 80kB 1.0 ... 1.0\n", " labels (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1\n", " extrap_mean (grid) int64 20kB 1 1 1 1 1 1 1 ... 1 1 1 1 1 1\n", " extrap_entropy (grid) float64 20kB 0.5813 0.5687 ... 0.4603\n", " extrap_y_prob (grid) float64 20kB 0.5813 0.5687 ... 0.4603
<xarray.Dataset> Size: 568kB\n", "Dimensions: (sample: 100, component: 2, x: 150,\n", " grid: 2500, log_x: 250, sample_i: 100,\n", " sample_j: 100, AF_sample: 1)\n", "Coordinates:\n", " * component (component) <U1 8B 'A' 'B'\n", " * x (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0\n", " * log_x (log_x) float64 2kB -3.0 -2.988 ... 0.0\n", "Dimensions without coordinates: sample, grid, sample_i, sample_j, AF_sample\n", "Data variables: (12/14)\n", " composition (sample, component) float64 2kB 5.7 ... 5.104\n", " ground_truth_labels (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1\n", " measurement (sample, x) float64 120kB 1.915e+06 ... 1.885\n", " composition_grid (grid, component) float64 40kB 0.0 0.0 ... 25.0\n", " normalized_composition (sample, component) float64 2kB 0.57 ... 0.2042\n", " normalized_composition_grid (grid, component) float64 40kB 0.0 0.0 ... 1.0\n", " ... ...\n", " labels (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1\n", " extrap_mean (grid) int64 20kB 1 1 1 1 1 1 1 ... 1 1 1 1 1 1\n", " extrap_entropy (grid) float64 20kB 0.5813 0.5687 ... 0.4603\n", " extrap_y_prob (grid) float64 20kB 0.5813 0.5687 ... 0.4603\n", " decision_surface (grid) float64 20kB 0.7655 0.7391 ... 0.512\n", " next_sample (AF_sample, component) float64 16B 4.082 24.49
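The final step produces a `decision_surface` over the grid and a single `next_sample` composition. As a rough illustration of entropy-based selection, the sketch below computes the entropy of a toy class probability on a hypothetical 50 x 50 grid and picks the grid point where it is largest. The probability field, the `binary_entropy` helper, and the plain argmax rule are all assumptions made for illustration; the acquisition function in AFL-agent may normalize the surface, exclude measured points, or break ties differently.

```python
import numpy as np


def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a Bernoulli probability."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


# Hypothetical 50 x 50 grid (2500 points) over A in [0, 10] and B in [0, 25].
a, b = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 25, 50), indexing="ij")
composition_grid = np.column_stack([a.ravel(), b.ravel()])

# Toy class-1 probability with a smooth linear boundary.
# This stands in for a fitted classifier; it is not the tutorial's GP.
logit = 0.8 * composition_grid[:, 0] - 0.2 * composition_grid[:, 1] - 1.0
prob = 1.0 / (1.0 + np.exp(-logit))

# Entropy is highest where the classifier is least certain (prob near 0.5).
decision_surface = binary_entropy(prob)
best = int(np.argmax(decision_surface))

# Shape (1, 2), mirroring the (AF_sample, component) layout of next_sample.
next_sample = composition_grid[best][np.newaxis, :]
print(next_sample)
```

Choosing the maximum-entropy point sends the next measurement to the region where the predicted phase label is most ambiguous, which is typically near a phase boundary.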