{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[Open In Colab](https://colab.research.google.com/github/usnistgov/AFL-agent/blob/main/docs/source/tutorials/quickstart.ipynb)\n", "\n", "# Quickstart" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook demonstrates how to use AFL-agent to analyze measurement data and identify distinct phases. We'll build a simple pipeline that:\n", "\n", "1. Calculates derivatives of the measurement data using Savitzky-Golay filtering\n", "2. Computes the pairwise similarity between measurements\n", "3. Uses spectral clustering to group similar measurements into phases\n", "\n", "We'll work with synthetic data that simulates two types of signal: a flat background and a power-law decay, both with added noise.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Google Colab Setup\n", "\n", "Only uncomment and run the next cell if you are running this notebook in Google Colab or if you don't already have the AFL-agent package installed." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# !pip install git+https://github.com/usnistgov/AFL-agent.git" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define Pipeline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from AFL.double_agent import *\n", "\n", "with Pipeline() as clustering_pipeline:\n", "\n", "    SavgolFilter(\n", "        input_variable='measurement',\n", "        output_variable='derivative',\n", "        dim='x',\n", "        derivative=1,\n", "        name='TakeDerivative'\n", "    )\n", "\n", "    Similarity(\n", "        input_variable='derivative',\n", "        output_variable='similarity',\n", "        sample_dim='sample',\n", "        params={'metric': 'laplacian', 'gamma': 1e-4},\n", "        name='ComputeSimilarity'\n", "    )\n", "\n", "    SpectralClustering(\n", "        input_variable='similarity',\n", "        output_variable='labels',\n", "        dim='sample',\n", "        params={'n_phases': 2},\n", "        name='SpectralClustering'\n", "    )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The pipeline above consists of three operations:\n", "\n", "1. `SavgolFilter`: Applies a Savitzky-Golay filter to compute the first derivative of each measurement along the `x` dimension, highlighting changes in the shape of the signal.\n", "\n", "2. `Similarity`: Computes the pairwise similarity between measurements from their derivatives, using a Laplacian kernel with `gamma=1e-4` to quantify how similar each measurement is to every other measurement.\n", "\n", "3. `SpectralClustering`: Groups the measurements into two phases (`n_phases=2`) based on their similarity scores, so that measurements with high mutual similarity are assigned to the same phase.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Input Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To test this pipeline, we need some data to analyze. We'll use a synthetic dataset containing measurements from a two-phase system. Each measurement is a signal collected from a sample with a different composition of components A and B. For details on how this dataset was created, see the [Building xarray Datasets](../how-to/building_xarray_datasets.ipynb) guide.\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 164kB\n", "Dimensions: (sample: 100, component: 2, x: 150, grid: 2500)\n", "Coordinates:\n", " * component (component) <U1 8B 'A' 'B'\n", " * x (x) float64 1kB 0.001 0.001047 0.001097 ... 0.9547 1.0\n", "Dimensions without coordinates: sample, grid\n", "Data variables:\n", " composition (sample, component) float64 2kB ...\n", " ground_truth_labels (sample) int64 800B ...\n", " measurement (sample, x) float64 120kB ...\n", " composition_grid (grid, component) float64 40kB ...
<xarray.Dataset> Size: 446kB\n", "Dimensions: (sample: 100, component: 2, x: 150, grid: 2500,\n", " log_x: 250, sample_i: 100, sample_j: 100)\n", "Coordinates:\n", " * component (component) <U1 8B 'A' 'B'\n", " * x (x) float64 1kB 0.001 0.001047 0.001097 ... 0.9547 1.0\n", " * log_x (log_x) float64 2kB -3.0 -2.988 -2.976 ... -0.01205 0.0\n", "Dimensions without coordinates: sample, grid, sample_i, sample_j\n", "Data variables:\n", " composition (sample, component) float64 2kB ...\n", " ground_truth_labels (sample) int64 800B ...\n", " measurement (sample, x) float64 120kB ...\n", " composition_grid (grid, component) float64 40kB ...\n", " derivative (sample, log_x) float64 200kB -3.828 -3.85 ... -0.04054\n", " similarity (sample_i, sample_j) float64 80kB 1.0 0.9962 ... 1.0\n", " labels (sample) int64 800B 0 0 0 0 0 1 0 0 ... 0 1 0 0 1 0 0 0