
Building Pipelines#

Here we’ll go into more detail on the Quick Start Example from Getting Started. In this example, we’ll build a pipeline that

  • standardizes the input compositions to improve the convergence of the Gaussian Process optimization

  • uses a Savitzky-Golay filter to compute the first derivative of the measurement

  • computes the similarity between the derivatives of the measurement data

  • clusters (i.e., labels) the data using spectral clustering

  • fits a Gaussian Process classifier to the data

  • chooses the next optimal measurement based on the entropy of the Gaussian Process posterior

Setup#

Only uncomment and run the next cell if you are running this notebook in Google Colab or if you don’t already have the AFL-agent package installed.

[ ]:
# !pip install git+https://github.com/usnistgov/AFL-agent.git

Below are the imported modules used in this tutorial

[1]:
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

from AFL.double_agent import *
from AFL.double_agent.plotting import *

Load Input Data#

Okay, to begin, we’ll load in a pre-prepared xarray.Dataset. These are powerful and flexible data structures for working with multi-dimensional data, and AFL.double_agent uses them for all input, intermediate and output data.

The dataset below contains simulated measurement data along with the compositions at which this simulated data was generated. It also has the ground truth phase labels, along with a grid of compositions that the agent will search through for the next optimal measurement.

To see how this dataset is created, see the Building xarray.Datasets page.
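
If you just want a feel for the structure, here is a hypothetical, minimal example of assembling a Dataset with the same kinds of variables (toy values, not the real tutorial data):

[ ]:
# Toy illustration only: a tiny Dataset with composition and measurement variables
toy = xr.Dataset(
    {
        'composition': (('sample', 'component'), np.random.rand(5, 2)),
        'measurement': (('sample', 'x'), np.random.rand(5, 10)),
    },
    coords={'component': ['A', 'B'], 'x': np.linspace(0.001, 1.0, 10)},
)
toy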

[2]:
ds = xr.load_dataset('../data/example_dataset.nc')
ds
[2]:
<xarray.Dataset> Size: 164kB
Dimensions:              (sample: 100, component: 2, x: 150, grid: 2500)
Coordinates:
  * component            (component) <U1 8B 'A' 'B'
  * x                    (x) float64 1kB 0.001 0.001047 0.001097 ... 0.9547 1.0
Dimensions without coordinates: sample, grid
Data variables:
    composition          (sample, component) float64 2kB 5.7 1.36 ... 5.104
    ground_truth_labels  (sample) int64 800B 1 1 0 1 0 1 1 1 ... 1 1 1 1 1 0 1 1
    measurement          (sample, x) float64 120kB 1.915e+06 1.479e+06 ... 1.885
    composition_grid     (grid, component) float64 40kB 0.0 0.0 ... 10.0 25.0

Note how, in the second column of the table, the dimension names of each variable are listed. This is a foundational feature of all xarray data structures and underlies how they error-check and align operations.
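
As a quick illustration of what named dimensions buy you, the hypothetical snippet below (not part of the tutorial dataset) shows xarray broadcasting two arrays by dimension name rather than by axis position:

[ ]:
# Toy illustration of dimension-name broadcasting (not part of the tutorial dataset)
da = xr.DataArray(np.arange(6).reshape(2, 3), dims=['sample', 'component'])
scale = xr.DataArray([10.0, 100.0, 1000.0], dims=['component'])
da * scale  # aligns and broadcasts along 'component' by name, not by axis order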

Step 1: Composition Preprocessing#

Many machine learning and optimization algorithms perform better when their input data is normalized or standardized (i.e., rescaled to a common range such as [0, 1], or to zero mean and unit variance).

To start, we’ll add Pipeline operations that normalize the composition and composition_grid data variables.

We’ll do this using the Pipeline context manager (i.e., the with construct shown below). Using this approach, each Pipeline operation that is defined in the context is automatically added to the my_first_pipeline variable.

[3]:
from AFL.double_agent import *

with Pipeline() as my_first_pipeline:
    Standardize(
        input_variable='composition',
        output_variable='normalized_composition',
        dim='sample',
        component_dim='component',
        min_val={'A':0.0,'B':0.0},
        max_val={'A':10.0,'B':25.0},
    )

my_first_pipeline
[3]:
<Pipeline Pipeline N=1>

Going over the arguments to Standardize one by one:

  • input_variable='composition': The data variable to normalize, in this case the ‘composition’ variable

  • output_variable='normalized_composition': The name of the new variable that will store the normalized data

  • dim='sample': The dimension along which to compute statistics for normalization

  • component_dim='component': The dimension containing different components/features

  • min_val={'A':0.0,'B':0.0}: Dictionary specifying minimum values for each component

  • max_val={'A':10.0,'B':25.0}: Dictionary specifying maximum values for each component
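
For intuition, this rescaling presumably reduces to a per-component min-max transform; the cell below is a minimal sketch of that idea (an assumption about the math, not the actual Standardize implementation):

[ ]:
# Sketch only: per-component min-max scaling onto [0, 1] using the supplied bounds.
# This assumes Standardize computes (x - min) / (max - min); the real PipelineOp may differ.
min_val = xr.DataArray([0.0, 0.0], coords={'component': ['A', 'B']}, dims='component')
max_val = xr.DataArray([10.0, 25.0], coords={'component': ['A', 'B']}, dims='component')
(ds['composition'] - min_val) / (max_val - min_val)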

We can view more information about the pipeline by printing it

[4]:
my_first_pipeline.print()
PipelineOp                               input_variable ---> output_variable
----------                               -----------------------------------
0  ) <Standardize>                       composition ---> normalized_composition

Input Variables
---------------
0) composition

Output Variables
----------------
0) normalized_composition

You can add more operations to the Pipeline by re-entering the context with the existing pipeline object

[5]:
with my_first_pipeline:
    Standardize(
        input_variable='composition_grid',
        output_variable='normalized_composition_grid',
        dim='grid',
        component_dim='component',
        min_val={'A':0.0,'B':0.0},
        max_val={'A':10.0,'B':25.0},
    )

my_first_pipeline.print()

PipelineOp                               input_variable ---> output_variable
----------                               -----------------------------------
0  ) <Standardize>                       composition ---> normalized_composition
1  ) <Standardize>                       composition_grid ---> normalized_composition_grid

Input Variables
---------------
0) composition
1) composition_grid

Output Variables
----------------
0) normalized_composition
1) normalized_composition_grid

We can run the pipeline by calling the .calculate method and passing in the input dataset ds

[6]:
ds_result = my_first_pipeline.calculate(ds);
ds_result
[6]:
<xarray.Dataset> Size: 205kB
Dimensions:                      (sample: 100, component: 2, x: 150, grid: 2500)
Coordinates:
  * component                    (component) <U1 8B 'A' 'B'
  * x                            (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0
Dimensions without coordinates: sample, grid
Data variables:
    composition                  (sample, component) float64 2kB 5.7 ... 5.104
    ground_truth_labels          (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1
    measurement                  (sample, x) float64 120kB 1.915e+06 ... 1.885
    composition_grid             (grid, component) float64 40kB 0.0 0.0 ... 25.0
    normalized_composition       (sample, component) float64 2kB 0.57 ... 0.2042
    normalized_composition_grid  (grid, component) float64 40kB 0.0 0.0 ... 1.0

Note that the output dataset ds_result contains not only all of the data from ds but also the two new variables produced by the pipeline operations we added: normalized_composition and normalized_composition_grid.

We can visualize the results of the normalization by plotting the composition and normalized_composition variables side by side. We’ll do this using the plot_scatter_mpl helper function from AFL.double_agent.plotting.

[7]:
fig,axes = plt.subplots(1,2,figsize=(8,3.25))

plot_scatter_mpl(ds_result,'composition',component_dim='component',labels='ground_truth_labels',ax=axes[0])
axes[0].set(title='Raw Composition Data')

plot_scatter_mpl(ds_result,'normalized_composition',component_dim='component',labels='ground_truth_labels',ax=axes[1])
axes[1].set(title='Normalized Composition Data')

[7]:
[Text(0.5, 1.0, 'Normalized Composition Data')]
../_images/tutorials_building_pipelines_20_1.png

Note that the relative positions of the data are unchanged; only the scale of the axes changes.

Step 2: Savitzky-Golay Filter#

Now that we have the composition data processed, we can move on to processing the measurement data. In many cases, smoothing and filtering data can help remove noise and emphasize features in data that you want your agent to focus on.

Here we’ll add a SavgolFilter operation in order to calculate the first derivative of the measurement data.

[8]:
with my_first_pipeline:
    SavgolFilter(
        input_variable='measurement',
        output_variable='derivative',
        dim='x',
        window_length=50,
        apply_log_scale=True,
        derivative=1,
    )

my_first_pipeline.print()
PipelineOp                               input_variable ---> output_variable
----------                               -----------------------------------
0  ) <Standardize>                       composition ---> normalized_composition
1  ) <Standardize>                       composition_grid ---> normalized_composition_grid
2  ) <SavgolFilter>                      measurement ---> derivative

Input Variables
---------------
0) composition
1) composition_grid
2) measurement

Output Variables
----------------
0) normalized_composition
1) normalized_composition_grid
2) derivative

Let’s go through each argument passed to SavgolFilter:

  • input_variable='measurement': Specifies the data variable to filter, in this case the raw measurement data

  • output_variable='derivative': Names the new variable that will store the filtered/derivative data

  • dim='x': Indicates which dimension to apply the filter along (the x-axis values)

  • window_length=50: Sets the size of the moving window used for filtering; larger values give smoother results

  • apply_log_scale=True: Takes the log of the x-axis values before filtering, useful for data spanning multiple orders of magnitude

  • derivative=1: Calculates the first derivative of the data while filtering
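
As a rough mental model, a Savitzky-Golay derivative on log-scaled data can be sketched with scipy directly. The cell below is only an illustration of the idea; the interpolation onto a uniform log10(x) grid, the quadratic polynomial order, and the odd window length are assumptions and may not match the AFL implementation.

[ ]:
# Sketch only: Savitzky-Golay first derivative of log(measurement) vs log(x)
# for a single sample, assuming scipy.signal.savgol_filter under the hood.
from scipy.signal import savgol_filter

log_x = np.log10(ds['x'].values)
log_x_uniform = np.linspace(log_x.min(), log_x.max(), 250)   # uniform grid in log space
log_y = np.log10(ds['measurement'].isel(sample=0).values)
log_y_uniform = np.interp(log_x_uniform, log_x, log_y)

dx = log_x_uniform[1] - log_x_uniform[0]
# scipy expects an odd window length, so 51 is used here in place of 50
deriv = savgol_filter(log_y_uniform, window_length=51, polyorder=2, deriv=1, delta=dx)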

We can run the pipeline on the dataset and plot the results.

[9]:
ds_result = my_first_pipeline.calculate(ds)
ds_result
[9]:
<xarray.Dataset> Size: 407kB
Dimensions:                      (sample: 100, component: 2, x: 150,
                                  grid: 2500, log_x: 250)
Coordinates:
  * component                    (component) <U1 8B 'A' 'B'
  * x                            (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0
  * log_x                        (log_x) float64 2kB -3.0 -2.988 ... 0.0
Dimensions without coordinates: sample, grid
Data variables:
    composition                  (sample, component) float64 2kB 5.7 ... 5.104
    ground_truth_labels          (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1
    measurement                  (sample, x) float64 120kB 1.915e+06 ... 1.885
    composition_grid             (grid, component) float64 40kB 0.0 0.0 ... 25.0
    normalized_composition       (sample, component) float64 2kB 0.57 ... 0.2042
    normalized_composition_grid  (grid, component) float64 40kB 0.0 0.0 ... 1.0
    derivative                   (sample, log_x) float64 200kB -3.82 ... -0.4063

Now we can plot the results of the Savgol filter

[10]:
fig,axes = plt.subplots(1,2,figsize=(8,3.25))

ds_result.measurement.plot.line(x='x',xscale='log',yscale='log',ax=axes[0],add_legend=False)
ds_result.derivative.plot.line(x='log_x',ax=axes[1],add_legend=False);

axes[0].set(title="Raw Data")
axes[1].set(title="Derivative of Smoothed Log(Data)")
[10]:
[Text(0.5, 1.0, 'Derivative of Smoothed Log(Data)')]
../_images/tutorials_building_pipelines_28_1.png

The data on the right has flatter, more constant regions than the data on the left, making it easier for the similarity and clustering analyses below to separate the samples into distinct groups.

Step 3: Calculate Similarity between Measurement Data#

Now that we have preprocessed our data using the Savgol filter, we can calculate the similarity between different measurements. The Similarity component computes a similarity matrix between all pairs of samples based on their filtered derivative data. This similarity matrix will be used as input for clustering in the next step.

[11]:
with my_first_pipeline:
    Similarity(
        input_variable='derivative',
        output_variable='similarity',
        sample_dim='sample',
        params={'metric': 'laplacian','gamma':1e-4}
        )

my_first_pipeline.print()
PipelineOp                               input_variable ---> output_variable
----------                               -----------------------------------
0  ) <Standardize>                       composition ---> normalized_composition
1  ) <Standardize>                       composition_grid ---> normalized_composition_grid
2  ) <SavgolFilter>                      measurement ---> derivative
3  ) <SimilarityMetric>                  derivative ---> similarity

Input Variables
---------------
0) composition
1) composition_grid
2) measurement

Output Variables
----------------
0) normalized_composition
1) normalized_composition_grid
2) similarity

The Similarity component takes the following inputs:

  • input_variable: The variable to calculate similarity between (‘derivative’)

  • output_variable: The variable to store the similarity matrix (‘similarity’)

  • sample_dim: The dimension containing different samples (‘sample’)

  • params: Dictionary of parameters for similarity calculation

    • metric: The similarity metric to use (‘laplacian’)

    • gamma: The scale parameter for the similarity metric (1e-4)
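
As a sketch of what a Laplacian-kernel similarity means, the cell below computes an equivalent matrix with scikit-learn’s laplacian_kernel, S(i, j) = exp(-gamma * ||d_i - d_j||_1); treat this as an illustration of the idea, not necessarily the exact AFL implementation.

[ ]:
# Sketch only: pairwise Laplacian-kernel similarity between derivative curves,
# assuming equivalence to sklearn.metrics.pairwise.laplacian_kernel.
from sklearn.metrics.pairwise import laplacian_kernel

derivative = ds_result['derivative'].values                   # shape (sample, log_x)
similarity_sketch = laplacian_kernel(derivative, gamma=1e-4)  # shape (sample, sample)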

Let’s execute the pipeline

[12]:
ds_result = my_first_pipeline.calculate(ds)
ds_result
[12]:
<xarray.Dataset> Size: 487kB
Dimensions:                      (sample: 100, component: 2, x: 150,
                                  grid: 2500, log_x: 250, sample_i: 100,
                                  sample_j: 100)
Coordinates:
  * component                    (component) <U1 8B 'A' 'B'
  * x                            (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0
  * log_x                        (log_x) float64 2kB -3.0 -2.988 ... 0.0
Dimensions without coordinates: sample, grid, sample_i, sample_j
Data variables:
    composition                  (sample, component) float64 2kB 5.7 ... 5.104
    ground_truth_labels          (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1
    measurement                  (sample, x) float64 120kB 1.915e+06 ... 1.885
    composition_grid             (grid, component) float64 40kB 0.0 0.0 ... 25.0
    normalized_composition       (sample, component) float64 2kB 0.57 ... 0.2042
    normalized_composition_grid  (grid, component) float64 40kB 0.0 0.0 ... 1.0
    derivative                   (sample, log_x) float64 200kB -3.82 ... -0.4063
    similarity                   (sample_i, sample_j) float64 80kB 1.0 ... 1.0

We can visualize the similarity matrix.

[13]:
ds_result.similarity.plot()
[13]:
<matplotlib.collections.QuadMesh at 0x312e02690>
../_images/tutorials_building_pipelines_36_1.png

Each pixel indexed by (i, j) in this image corresponds to the similarity between measurements i and j. Bright pixels indicate high similarity and darker pixels reduced similarity. A check on this calculation is that the diagonal should be exactly 1.0 because each measurement is perfectly similar to itself, i.e., S(i, i) = 1.0.
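
That diagonal check is easy to perform directly on the result (a quick sanity check, not part of the original tutorial):

[ ]:
# Sanity check: every measurement should be perfectly similar to itself
np.allclose(np.diag(ds_result.similarity.values), 1.0)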

Step 4: Cluster Measurement Data based on Similarity#

Now we can use the similarity matrix to cluster the data into groups.

[14]:
with my_first_pipeline:
    SpectralClustering(
        input_variable='similarity',
        output_variable='labels',
        dim='sample',
        params={'n_phases': 2}
        )


my_first_pipeline.print()
PipelineOp                               input_variable ---> output_variable
----------                               -----------------------------------
0  ) <Standardize>                       composition ---> normalized_composition
1  ) <Standardize>                       composition_grid ---> normalized_composition_grid
2  ) <SavgolFilter>                      measurement ---> derivative
3  ) <SimilarityMetric>                  derivative ---> similarity
4  ) <SpectralClustering>                similarity ---> labels

Input Variables
---------------
0) composition
1) composition_grid
2) measurement

Output Variables
----------------
0) normalized_composition
1) normalized_composition_grid
2) labels

The SpectralClustering pipeline operation takes:

  • input_variable: The similarity matrix to use for clustering (‘similarity’)

  • output_variable: The variable to store the cluster labels (‘labels’)

  • dim: The dimension containing different samples (‘sample’)

  • params: Dictionary of parameters for clustering

    • n_phases: The number of clusters/phases to find (2)
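
For intuition, spectral clustering on a precomputed similarity matrix can be sketched with scikit-learn; the cell below is only an illustration (the AFL operation may differ in its internals and label ordering):

[ ]:
# Sketch only: spectral clustering of the precomputed similarity matrix,
# assuming behavior similar to scikit-learn's implementation.
from sklearn.cluster import SpectralClustering as SKSpectralClustering

sk_labels = SKSpectralClustering(n_clusters=2, affinity='precomputed').fit_predict(
    ds_result['similarity'].values
)
sk_labels[:10]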

Let’s run the pipeline with this new operation

[15]:
ds_result = my_first_pipeline.calculate(ds)
ds_result
[15]:
<xarray.Dataset> Size: 488kB
Dimensions:                      (sample: 100, component: 2, x: 150,
                                  grid: 2500, log_x: 250, sample_i: 100,
                                  sample_j: 100)
Coordinates:
  * component                    (component) <U1 8B 'A' 'B'
  * x                            (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0
  * log_x                        (log_x) float64 2kB -3.0 -2.988 ... 0.0
Dimensions without coordinates: sample, grid, sample_i, sample_j
Data variables:
    composition                  (sample, component) float64 2kB 5.7 ... 5.104
    ground_truth_labels          (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1
    measurement                  (sample, x) float64 120kB 1.915e+06 ... 1.885
    composition_grid             (grid, component) float64 40kB 0.0 0.0 ... 25.0
    normalized_composition       (sample, component) float64 2kB 0.57 ... 0.2042
    normalized_composition_grid  (grid, component) float64 40kB 0.0 0.0 ... 1.0
    derivative                   (sample, log_x) float64 200kB -3.82 ... -0.4063
    similarity                   (sample_i, sample_j) float64 80kB 1.0 ... 1.0
    labels                       (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1

Plotting the results of this labeling and comparing them to the ground truth:

[16]:

fig,axes = plt.subplots(1,2,figsize=(8,3.25))

plot_scatter_mpl(ds_result,'composition',component_dim='component',labels='ground_truth_labels',ax=axes[0])
plot_scatter_mpl(ds_result,'composition',component_dim='component',labels='labels',ax=axes[1])

axes[0].set(title="Ground Truth Labels")
axes[1].set(title="Spectral Clustering Labels")
[16]:
[Text(0.5, 1.0, 'Spectral Clustering Labels')]
../_images/tutorials_building_pipelines_44_1.png

Step 5: Extrapolate Cluster Labels#

Now we can extrapolate the labels from the SpectralClustering over the composition_grid that we supplied in the input dataset.

[17]:
with my_first_pipeline:
    GaussianProcessClassifier(
        feature_input_variable='normalized_composition',
        predictor_input_variable='labels',
        output_prefix='extrap',
        sample_dim='sample',
        grid_variable='normalized_composition_grid',
        grid_dim='grid',
    )
my_first_pipeline.print()
PipelineOp                               input_variable ---> output_variable
----------                               -----------------------------------
0  ) <Standardize>                       composition ---> normalized_composition
1  ) <Standardize>                       composition_grid ---> normalized_composition_grid
2  ) <SavgolFilter>                      measurement ---> derivative
3  ) <SimilarityMetric>                  derivative ---> similarity
4  ) <SpectralClustering>                similarity ---> labels
5  ) <GaussianProcessClassifier>         ['normalized_composition', 'labels', 'normalized_composition_grid'] ---> ['extrap_mean', 'extrap_entropy']

Input Variables
---------------
0) composition
1) composition_grid
2) measurement

Output Variables
----------------
0) extrap_mean
1) extrap_entropy

The GaussianProcessClassifier pipeline operation takes:

  • feature_input_variable: The composition data to use for training ('normalized_composition')

  • predictor_input_variable: The labels to predict (‘labels’)

  • output_prefix: Prefix for output variables (‘extrap’)

  • sample_dim: The dimension containing different samples (‘sample’)

  • grid_variable: The grid points to extrapolate to ('normalized_composition_grid')

  • grid_dim: The dimension containing grid points (‘grid’)
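
To make the idea concrete, a rough sketch of this step with scikit-learn’s Gaussian process classifier is shown below; AFL’s classifier is a separate implementation, so treat this purely as an illustration of how a most-likely label and an entropy can be derived from class probabilities on the grid.

[ ]:
# Sketch only: fit a GP classifier on the labeled compositions and compute the
# most likely label and the Shannon entropy on the search grid.
from sklearn.gaussian_process import GaussianProcessClassifier as SKGPC

X = ds_result['normalized_composition'].values             # (sample, component)
y = ds_result['labels'].values                             # (sample,)
X_grid = ds_result['normalized_composition_grid'].values   # (grid, component)

prob = SKGPC().fit(X, y).predict_proba(X_grid)             # (grid, n_classes)
mean_sketch = prob.argmax(axis=1)                            # most likely label per grid point
entropy_sketch = -(prob * np.log(prob + 1e-12)).sum(axis=1)  # label entropy per grid point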

[18]:
ds_result = my_first_pipeline.calculate(ds)
ds_result
[18]:
<xarray.Dataset> Size: 548kB
Dimensions:                      (sample: 100, component: 2, x: 150,
                                  grid: 2500, log_x: 250, sample_i: 100,
                                  sample_j: 100)
Coordinates:
  * component                    (component) <U1 8B 'A' 'B'
  * x                            (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0
  * log_x                        (log_x) float64 2kB -3.0 -2.988 ... 0.0
Dimensions without coordinates: sample, grid, sample_i, sample_j
Data variables:
    composition                  (sample, component) float64 2kB 5.7 ... 5.104
    ground_truth_labels          (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1
    measurement                  (sample, x) float64 120kB 1.915e+06 ... 1.885
    composition_grid             (grid, component) float64 40kB 0.0 0.0 ... 25.0
    normalized_composition       (sample, component) float64 2kB 0.57 ... 0.2042
    normalized_composition_grid  (grid, component) float64 40kB 0.0 0.0 ... 1.0
    derivative                   (sample, log_x) float64 200kB -3.82 ... -0.4063
    similarity                   (sample_i, sample_j) float64 80kB 1.0 ... 1.0
    labels                       (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1
    extrap_mean                  (grid) int64 20kB 1 1 1 1 1 1 1 ... 1 1 1 1 1 1
    extrap_entropy               (grid) float64 20kB 0.5813 0.5687 ... 0.4603
    extrap_y_prob                (grid) float64 20kB 0.5813 0.5687 ... 0.4603

From this calculation, we get two new data variables: extrap_mean and extrap_entropy. These represent the most likely phase label and the entropy of that label assignment, respectively.

[19]:
fig,axes = plt.subplots(1,2,figsize=(8,3.25))

plot_surface_mpl(ds_result,'composition_grid',component_dim='component',labels='extrap_mean',ax=axes[0])
plot_surface_mpl(ds_result,'composition_grid',component_dim='component',labels='extrap_entropy',ax=axes[1])

axes[0].set(title="Most Likely Phase Label")
axes[1].set(title="Entropy of Phase Label")
[19]:
[Text(0.5, 1.0, 'Entropy of Phase Label')]
../_images/tutorials_building_pipelines_51_1.png

The right subplot is related to our confidence in the label prediction and is a powerful tool for finding phase boundaries because, by construction, the entropy is maximized where the classifier is least certain about the label, i.e., at the boundaries between phases.
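
A quick numerical illustration of why the entropy peaks at boundaries: binary Shannon entropy is largest when the predicted class probability is 0.5, i.e., where the classifier is least certain which label applies.

[ ]:
# Binary Shannon entropy for a few predicted probabilities: maximal at p = 0.5
p = np.array([0.99, 0.75, 0.5])
-(p * np.log(p) + (1 - p) * np.log(1 - p))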

Step 6: Calculate Next Sample#

Now that we have a model that can predict phase labels and their uncertainty, we can use this information to select the next sample point. The MaxValueAF pipeline operation will select the composition with maximum entropy as the next point to measure, since high entropy indicates regions where the model is most uncertain about the phase label.

[20]:
with my_first_pipeline:
    MaxValueAF(
        input_variables=['extrap_entropy'],
        output_variable='next_sample',
        grid_variable='composition_grid',
    )

my_first_pipeline.print()
PipelineOp                               input_variable ---> output_variable
----------                               -----------------------------------
0  ) <Standardize>                       composition ---> normalized_composition
1  ) <Standardize>                       composition_grid ---> normalized_composition_grid
2  ) <SavgolFilter>                      measurement ---> derivative
3  ) <SimilarityMetric>                  derivative ---> similarity
4  ) <SpectralClustering>                similarity ---> labels
5  ) <GaussianProcessClassifier>         ['normalized_composition', 'labels', 'normalized_composition_grid'] ---> ['extrap_mean', 'extrap_entropy']
6  ) <MaxValueAF>                        ['extrap_entropy', 'composition_grid'] ---> next_sample

Input Variables
---------------
0) composition
1) composition_grid
2) measurement

Output Variables
----------------
0) extrap_mean
1) next_sample

Let’s run the pipeline

[21]:
ds_result = my_first_pipeline.calculate(ds)
ds_result
[21]:
<xarray.Dataset> Size: 568kB
Dimensions:                      (sample: 100, component: 2, x: 150,
                                  grid: 2500, log_x: 250, sample_i: 100,
                                  sample_j: 100, AF_sample: 1)
Coordinates:
  * component                    (component) <U1 8B 'A' 'B'
  * x                            (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0
  * log_x                        (log_x) float64 2kB -3.0 -2.988 ... 0.0
Dimensions without coordinates: sample, grid, sample_i, sample_j, AF_sample
Data variables: (12/14)
    composition                  (sample, component) float64 2kB 5.7 ... 5.104
    ground_truth_labels          (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1
    measurement                  (sample, x) float64 120kB 1.915e+06 ... 1.885
    composition_grid             (grid, component) float64 40kB 0.0 0.0 ... 25.0
    normalized_composition       (sample, component) float64 2kB 0.57 ... 0.2042
    normalized_composition_grid  (grid, component) float64 40kB 0.0 0.0 ... 1.0
    ...                           ...
    labels                       (sample) int64 800B 1 1 0 1 0 1 ... 1 1 1 0 1 1
    extrap_mean                  (grid) int64 20kB 1 1 1 1 1 1 1 ... 1 1 1 1 1 1
    extrap_entropy               (grid) float64 20kB 0.5813 0.5687 ... 0.4603
    extrap_y_prob                (grid) float64 20kB 0.5813 0.5687 ... 0.4603
    decision_surface             (grid) float64 20kB 0.7655 0.7391 ... 0.512
    next_sample                  (AF_sample, component) float64 16B 4.082 24.49

Let’s plot the next composition on the mean and entropy plots:

[22]:
fig,axes = plt.subplots(1,2,figsize=(8,3.25))

plot_surface_mpl(ds_result,'composition_grid',component_dim='component',labels='extrap_mean',ax=axes[0])
plot_surface_mpl(ds_result,'composition_grid',component_dim='component',labels='extrap_entropy',ax=axes[1])

plot_scatter_mpl(ds_result,'next_sample',component_dim='component',labels=[-1],marker='x',color='red',s=100,ax=axes[0])
plot_scatter_mpl(ds_result,'next_sample',component_dim='component',labels=[-1],marker='x',color='red',s=100,ax=axes[1])

axes[0].set(title="Most Likely Phase Label")
axes[1].set(title="Entropy of Phase Label")
[22]:
[Text(0.5, 1.0, 'Entropy of Phase Label')]
../_images/tutorials_building_pipelines_59_1.png

Note that the red X is placed near the boundary between the two phases. If you run the pipeline several times, you should see the X move around within the bright, high-entropy region. This is because the MaxValueAF acquisition function doesn’t choose the absolute maximum, but rather chooses randomly from among the top acquisition_rtol percent of the entropy values.
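
The selection logic can be sketched roughly as follows; this is only an illustration of the idea, and the tolerance value and the exact way MaxValueAF applies acquisition_rtol are assumptions, not the library’s documented behavior.

[ ]:
# Sketch only: pick a random grid point from among those whose entropy is
# within a tolerance of the maximum, rather than the single argmax.
entropy = ds_result['extrap_entropy'].values
near_max = np.flatnonzero(entropy >= (1 - 0.05) * entropy.max())  # 0.05 is an assumed tolerance
choice = np.random.choice(near_max)
ds_result['composition_grid'].values[choice]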

Full Pipeline#

With that, we have a full Pipeline which defines the behavior of a decision agent! Let’s view the whole pipeline defined in a single context:

[23]:
with Pipeline() as my_first_pipeline:

    Standardize(
        input_variable='composition',
        output_variable='normalized_composition',
        dim='sample',
        component_dim='component',
        min_val={'A':0.0,'B':0.0},
        max_val={'A':10.0,'B':25.0},
    )

    Standardize(
        input_variable='composition_grid',
        output_variable='normalized_composition_grid',
        dim='grid',
        component_dim='component',
        min_val={'A':0.0,'B':0.0},
        max_val={'A':10.0,'B':25.0},
    )

    SavgolFilter(
        input_variable='measurement',
        output_variable='derivative',
        dim='x',
        derivative=1
        )

    Similarity(
        input_variable='derivative',
        output_variable='similarity',
        sample_dim='sample',
        params={'metric': 'laplacian','gamma':1e-4}
        )

    SpectralClustering(
        input_variable='similarity',
        output_variable='labels',
        dim='sample',
        params={'n_phases': 2}
        )


    GaussianProcessClassifier(
        feature_input_variable='normalized_composition',
        predictor_input_variable='labels',
        output_prefix='extrap',
        sample_dim='sample',
        grid_variable='normalized_composition_grid',
        grid_dim='grid',
    )

    MaxValueAF(
        input_variables=['extrap_entropy'],
        output_variable='next_sample',
        grid_variable='composition_grid',
    )

my_first_pipeline.print()
PipelineOp                               input_variable ---> output_variable
----------                               -----------------------------------
0  ) <Standardize>                       composition ---> normalized_composition
1  ) <Standardize>                       composition_grid ---> normalized_composition_grid
2  ) <SavgolFilter>                      measurement ---> derivative
3  ) <SimilarityMetric>                  derivative ---> similarity
4  ) <SpectralClustering>                similarity ---> labels
5  ) <GaussianProcessClassifier>         ['normalized_composition', 'labels', 'normalized_composition_grid'] ---> ['extrap_mean', 'extrap_entropy']
6  ) <MaxValueAF>                        ['extrap_entropy', 'composition_grid'] ---> next_sample

Input Variables
---------------
0) composition
1) composition_grid
2) measurement

Output Variables
----------------
0) extrap_mean
1) next_sample

We can also visualize the full pipeline using the .draw and .draw_plotly methods

[24]:
my_first_pipeline.draw();
../_images/tutorials_building_pipelines_65_0.png

While this doesn’t always produce the most visually appealing graphs, it is a powerful way to check the consistency and flow of complex pipelines.

Conclusion#

In this tutorial, we learned how to build pipelines using AFL.double_agent by:

  • Creating a new pipeline using Pipeline()

  • Adding data processing steps like normalization and derivative calculation

  • Implementing spectral clustering for phase identification

  • Using Gaussian Process classification to extrapolate phase boundaries

  • Adding active learning with acquisition functions to guide further sampling

  • Visualizing the pipeline structure and results at each step

The pipeline we built demonstrates a complete workflow, from raw data processing through machine learning to active learning. This modular approach lets us easily modify individual components while maintaining a clear data flow between steps.

For more examples of AFL pipelines and components, check out the other tutorials and examples in the documentation.