{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[Open in Colab](https://colab.research.google.com/github/usnistgov/AFL-agent/blob/main/docs/source/tutorials/quickstart.ipynb)\n", "\n", "# Quickstart" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook demonstrates how to use AFL-agent to analyze measurement data and identify different phases. We'll create a simple pipeline that:\n", "\n", "1. Calculates derivatives of measurement data using Savitzky-Golay filtering\n", "2. Computes similarity between measurements\n", "3. Uses spectral clustering to group similar measurements into phases\n", "\n", "We'll work with synthetic data that simulates two types of signal: a flat background and a power-law decay, both with added noise.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Google Colab Setup\n", "\n", "Uncomment and run the next cell only if you are running this notebook in Google Colab or if you don't already have the AFL-agent package installed."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# !pip install git+https://github.com/usnistgov/AFL-agent.git" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define Pipeline" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from AFL.double_agent import *\n", "\n", "with Pipeline() as clustering_pipeline:\n", "\n", "    # Step 1: first derivative of each measurement along 'x'\n", "    SavgolFilter(\n", "        input_variable='measurement',\n", "        output_variable='derivative',\n", "        dim='x',\n", "        derivative=1\n", "    )\n", "\n", "    # Step 2: pairwise similarity between the derivative curves\n", "    Similarity(\n", "        input_variable='derivative',\n", "        output_variable='similarity',\n", "        sample_dim='sample',\n", "        params={'metric': 'laplacian', 'gamma': 1e-4}\n", "    )\n", "\n", "    # Step 3: cluster the samples into two phases\n", "    SpectralClustering(\n", "        input_variable='similarity',\n", "        output_variable='labels',\n", "        dim='sample',\n", "        params={'n_phases': 2}\n", "    )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The pipeline above consists of three operations:\n", "\n", "1. `SavgolFilter`: Applies Savitzky-Golay filtering to calculate derivatives of the measurement data along the x-dimension. This helps identify changes in the signal shape.\n", "\n", "2. `Similarity`: Computes pairwise similarity between measurements using their derivatives. It uses a Laplacian kernel with gamma=1e-4 to quantify how similar each measurement is to every other measurement.\n", "\n", "3. `SpectralClustering`: Groups measurements into 2 phases based on their similarity scores. Measurements with high similarity will be grouped into the same phase.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Input Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To test this pipeline, we need some data to analyze. We'll use a synthetic dataset containing measurements from a two-phase system. Each measurement represents a signal collected from a sample with a different composition of components A and B. 
For details on how this dataset was created, see the [Building xarray Datasets](../how-to/building_xarray_datasets.ipynb) tutorial.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 164kB\n", "Dimensions: (sample: 100, component: 2, x: 150, grid: 2500)\n", "Coordinates:\n", " * component (component) <U1 8B 'A' 'B'\n", " * x (x) float64 1kB 0.001 0.001047 0.001097 ... 0.9547 1.0\n", "Dimensions without coordinates: sample, grid\n", "Data variables:\n", " composition (sample, component) float64 2kB ...\n", " ground_truth_labels (sample) int64 800B ...\n", " measurement (sample, x) float64 120kB ...\n", " composition_grid (grid, component) float64 40kB ...
<xarray.Dataset> Size: 184kB\n", "Dimensions: (sample: 50, x: 150, log_x: 250, sample_i: 50, sample_j: 50)\n", "Coordinates:\n", " * x (x) float64 1kB 0.001 0.001047 0.001097 ... 0.9114 0.9547 1.0\n", " * log_x (log_x) float64 2kB -3.0 -2.988 -2.976 ... -0.0241 -0.01205 0.0\n", "Dimensions without coordinates: sample, sample_i, sample_j\n", "Data variables:\n", " measurement (sample, x) float64 60kB 1.667e+06 1.417e+06 ... 1.736 2.463\n", " derivative (sample, log_x) float64 100kB -3.262 -3.317 ... -0.2221 -0.2423\n", " similarity (sample_i, sample_j) float64 20kB 1.0 0.9496 ... 0.9958 1.0\n", " labels (sample) int64 400B 0 1 1 1 1 1 0 0 0 1 ... 0 0 0 1 1 0 1 1 1 1