{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/usnistgov/AFL-agent/blob/main/docs/source/how-to/create_pipelineop.ipynb)\n", "\n", "# Creating a New Pipeline Operation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This guide walks you through creating a custom Pipeline Operation in AFL-agent. Pipeline Operations are the building blocks of AFL-agent pipelines: they perform specific data transformations and analyses that can be chained together. By creating your own Pipeline Operation, you can extend AFL-agent's functionality to meet your specific needs.\n", "\n", "In this tutorial, we'll create a `PipelineOp` that implements a new data normalization algorithm.\n", "\n", "\n", "To begin, uncomment and run the next cell only if you are running this notebook in Google Colab or if you don't already have the AFL-agent package installed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# !pip install git+https://github.com/usnistgov/AFL-agent.git" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's start development by importing the parent `PipelineOp` class. All pipeline operations **must** inherit from this class, either directly or indirectly through another parent class." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from AFL.double_agent.PipelineOp import PipelineOp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's define the `PipelineOp` sub-class. It should have two methods:\n", "\n", "1. a constructor called `__init__`\n", "2. 
a method called `calculate` that takes a single argument, the input dataset" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "class MyNormalizer(PipelineOp):\n", "    def __init__(self, input_variable, output_variable, name=\"MyNormalizer\"):\n", "        # The PipelineOp constructor takes three arguments and stores them as attributes\n", "        super().__init__(\n", "            input_variable=input_variable,\n", "            output_variable=output_variable,\n", "            name=name\n", "        )\n", "\n", "    def calculate(self, dataset):\n", "        # Extract the data variable to be normalized from the dataset\n", "        data = self._get_variable(dataset)\n", "\n", "        # Perform your normalization logic here (this example divides by the global maximum)\n", "        normalized_data = data / data.max()\n", "\n", "        # Store the normalized data in the output variable\n", "        self.output[self.output_variable] = normalized_data\n", "        self.output[self.output_variable].attrs[\"description\"] = \"Normalized data\"\n", "\n", "        # All PipelineOps should return self\n", "        return self" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's test it out! First, we need to load some data." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 164kB\n",
       "Dimensions:              (sample: 100, component: 2, x: 150, grid: 2500)\n",
       "Coordinates:\n",
       "  * component            (component) <U1 8B 'A' 'B'\n",
       "  * x                    (x) float64 1kB 0.001 0.001047 0.001097 ... 0.9547 1.0\n",
       "Dimensions without coordinates: sample, grid\n",
       "Data variables:\n",
       "    composition          (sample, component) float64 2kB 5.7 1.36 ... 5.104\n",
       "    ground_truth_labels  (sample) int64 800B 1 1 0 1 0 1 1 1 ... 1 1 1 1 1 0 1 1\n",
       "    measurement          (sample, x) float64 120kB 1.915e+06 1.479e+06 ... 1.885\n",
       "    composition_grid     (grid, component) float64 40kB 0.0 0.0 ... 10.0 25.0
" ], "text/plain": [ "<xarray.Dataset> Size: 164kB\n", "Dimensions: (sample: 100, component: 2, x: 150, grid: 2500)\n", "Coordinates:\n", " * component (component) \n", "
<xarray.Dataset> Size: 284kB\n",
       "Dimensions:                 (sample: 100, component: 2, x: 150, grid: 2500)\n",
       "Coordinates:\n",
       "  * component               (component) <U1 8B 'A' 'B'\n",
       "  * x                       (x) float64 1kB 0.001 0.001047 ... 0.9547 1.0\n",
       "Dimensions without coordinates: sample, grid\n",
       "Data variables:\n",
       "    composition             (sample, component) float64 2kB 5.7 1.36 ... 5.104\n",
       "    ground_truth_labels     (sample) int64 800B 1 1 0 1 0 1 1 ... 1 1 1 1 0 1 1\n",
       "    measurement             (sample, x) float64 120kB 1.915e+06 ... 1.885\n",
       "    composition_grid        (grid, component) float64 40kB 0.0 0.0 ... 10.0 25.0\n",
       "    normalized_measurement  (sample, x) float64 120kB 0.7918 ... 7.794e-07
" ], "text/plain": [ "<xarray.Dataset> Size: 284kB\n", "Dimensions: (sample: 100, component: 2, x: 150, grid: 2500)\n", "Coordinates:\n", " * component (component) " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# Plot the raw and normalized measurements side by side on log-log axes\n", "fig, axes = plt.subplots(1, 2, figsize=(8, 3.25))\n", "\n", "ds_result.measurement.plot.line(x='x', xscale='log', yscale='log', ax=axes[0], add_legend=False)\n", "ds_result.normalized_measurement.plot.line(x='x', xscale='log', yscale='log', ax=axes[1], add_legend=False)\n", "\n", "axes[0].set(title=\"Raw Data\")\n", "axes[1].set(title=\"MyNormalized Data\")\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note the difference in the y-axis scales: after dividing by the global maximum, the normalized data peaks at 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we learned how to:\n", "\n", "- Create a custom `PipelineOp` by subclassing the base class\n", "- Define the required `calculate` method to implement our data processing logic\n", "- Add our new operation to a `Pipeline` and execute it on a dataset\n", "- Visualize the results to confirm our normalization worked as expected\n", "\n", "Custom `PipelineOp` classes allow you to extend AFL-agent's functionality with your own data processing operations.\n" ] } ], "metadata": { "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.10" } }, "nbformat": 4, "nbformat_minor": 2 }