AFL.double_agent.Extrapolator module

Extrapolation tools for extending discrete sample data to continuous spaces.

This module provides classes for extrapolating data from discrete sample points to continuous spaces, particularly useful in materials science and machine learning applications. The extrapolators can work with both classification and regression tasks.

Key features:
  • Support for Gaussian Process classification and regression

  • Handling of uncertainty in measurements

  • Visualization tools for extrapolation results

  • Flexible kernel selection for GP models

  • Support for different sample and grid dimensions

class AFL.double_agent.Extrapolator.DummyExtrapolator(feature_input_variable: str, predictor_input_variable: str, output_prefix: str, grid_variable: str, grid_dim: str, sample_dim: str, name='DummyExtrapolator')

Bases: Extrapolator

Simple extrapolator that returns zero values.

This extrapolator serves as a baseline implementation, returning arrays of zeros for both mean and variance predictions. Useful for testing and as a template.

Parameters:
  • feature_input_variable (str) – The name of the xarray.Dataset data variable to use as the input to the model that will be extrapolating the discrete data. This is typically a sample composition variable.

  • predictor_input_variable (str) – The name of the xarray.Dataset data variable to use as the output of the model that will be extrapolating the discrete data. This is typically a class label or property variable.

  • output_prefix (str) – The string prefix to apply to each output variable before inserting into the output xarray.Dataset

  • grid_variable (str) – The name of the xarray.Dataset data variable to use as an evaluation grid.

  • grid_dim (str) – The xarray dimension that indexes the grid points. This is the grid equivalent of sample_dim.

  • sample_dim (str) – The xarray dimension over the discrete ‘samples’ in the feature_input_variable. This is typically a variant of sample e.g., saxs_sample.

  • name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()

calculate(dataset: Dataset) → Self

Apply this dummy extrapolator to the supplied dataset.

Creates arrays of zeros for both mean and variance predictions.

Parameters:

dataset (xr.Dataset) – The input dataset containing the sample points and grid

Returns:

The dummy extrapolator instance with zero-valued outputs

Return type:

Self
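The zero-valued outputs described above can be sketched roughly as follows. This is only an illustration, not the class's actual code: the dataset contents, the variable names composition and composition_grid, and the output names dummy_mean and dummy_var are all assumptions made for the example.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical dataset: 3 measured samples and a 5-point evaluation grid.
ds = xr.Dataset(
    {
        "composition": (("sample", "component"), rng.random((3, 2))),
        "composition_grid": (("grid", "component"), rng.random((5, 2))),
    }
)

grid_dim = "grid"
output_prefix = "dummy"  # assumed prefix; real output names may differ
n_grid = ds.sizes[grid_dim]

# One zero per grid point for both the mean and the variance.
outputs = xr.Dataset(
    {
        f"{output_prefix}_mean": ((grid_dim,), np.zeros(n_grid)),
        f"{output_prefix}_var": ((grid_dim,), np.zeros(n_grid)),
    }
)
```

A baseline like this is mainly useful for checking that downstream pipeline steps accept grid-shaped outputs before a real model is plugged in.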

class AFL.double_agent.Extrapolator.Extrapolator(feature_input_variable: str, predictor_input_variable: str, output_variables: List[str], output_prefix: str, grid_variable: str, grid_dim: str, sample_dim: str, name: str = 'Extrapolator')

Bases: PipelineOp

Base class for extrapolating discrete sample data onto continuous spaces.

This abstract base class provides common functionality for extrapolating data from discrete sample points to a continuous grid. It handles data management and provides visualization capabilities.

Parameters:
  • feature_input_variable (str) – The name of the xarray.Dataset data variable to use as the input to the model that will be extrapolating the discrete data. This is typically a sample composition variable.

  • predictor_input_variable (str) – The name of the xarray.Dataset data variable to use as the output of the model that will be extrapolating the discrete data. This is typically a class label or property variable.

  • output_variables (List[str]) – The list of variables that will be output by this class.

  • output_prefix (str) – The string prefix to apply to each output variable before inserting into the output xarray.Dataset

  • grid_variable (str) – The name of the xarray.Dataset data variable to use as an evaluation grid.

  • grid_dim (str) – The xarray dimension that indexes the grid points. This is the grid equivalent of sample_dim.

  • sample_dim (str) – The xarray dimension over the discrete ‘samples’ in the feature_input_variable. This is typically a variant of sample e.g., saxs_sample.

  • name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()

calculate(dataset: Dataset) → Self

Apply this extrapolator to the supplied dataset.

This method must be implemented by subclasses to define how the extrapolation is performed.

Parameters:

dataset (xr.Dataset) – The input dataset containing the sample points and grid

Returns:

The extrapolator instance with updated outputs

Return type:

Self
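The subclassing contract described above (the base class declares calculate, subclasses implement it and return themselves) might look roughly like the sketch below. SketchExtrapolator and ConstantExtrapolator are hypothetical stand-ins written for this example; they do not reproduce the real Extrapolator or PipelineOp classes, and a plain dict stands in for the xarray.Dataset.

```python
class SketchExtrapolator:
    """Hypothetical stand-in for the base class; not the AFL implementation."""

    def __init__(self, output_prefix):
        self.output_prefix = output_prefix
        self.output = {}  # filled in by calculate()

    def calculate(self, dataset):
        # The base class only defines the interface.
        raise NotImplementedError("Subclasses must implement calculate()")


class ConstantExtrapolator(SketchExtrapolator):
    """Toy subclass that predicts the same value at every grid point."""

    def __init__(self, output_prefix, value):
        super().__init__(output_prefix)
        self.value = value

    def calculate(self, dataset):
        n_grid = len(dataset["grid"])
        self.output[f"{self.output_prefix}_mean"] = [self.value] * n_grid
        return self  # returning self allows chained access to the results


dataset = {"grid": [0.0, 0.5, 1.0]}  # plain dict standing in for a Dataset
op = ConstantExtrapolator("toy", value=2.0).calculate(dataset)
```

Returning the instance rather than the raw predictions keeps the results attached to the operation that produced them.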

plot(**mpl_kwargs) → Figure

Plot the extrapolation results.

Creates visualization of the extrapolated data, with different plotting styles depending on the data dimensions and type.

Parameters:

**mpl_kwargs (dict) – Additional keyword arguments to pass to matplotlib plotting functions

Returns:

The matplotlib figure containing the plots

Return type:

plt.Figure

class AFL.double_agent.Extrapolator.GaussianProcessClassifier(feature_input_variable: str, predictor_input_variable: str, output_prefix: str, grid_variable: str, grid_dim: str, sample_dim: str, kernel: str = 'Matern', kernel_kwargs: dict = {'length_scale': 1.0, 'nu': 1.5}, optimizer: str = 'fmin_l_bfgs_b', name: str = 'GaussianProcessClassifier')

Bases: Extrapolator

Gaussian Process classifier for extrapolating class labels.

This extrapolator uses scikit-learn’s GaussianProcessClassifier to predict class probabilities across the grid based on discrete labeled samples. It provides both class predictions and uncertainty estimates through entropy.

Parameters:
  • feature_input_variable (str) – The name of the xarray.Dataset data variable to use as the input to the model that will be extrapolating the discrete data. This is typically a sample composition variable.

  • predictor_input_variable (str) – The name of the xarray.Dataset data variable to use as the output of the model that will be extrapolating the discrete data. For this PipelineOp this should be a class label vector.

  • output_prefix (str) – The string prefix to apply to each output variable before inserting into the output xarray.Dataset

  • grid_variable (str) – The name of the xarray.Dataset data variable to use as an evaluation grid.

  • grid_dim (str) – The xarray dimension that indexes the grid points. This is the grid equivalent of sample_dim.

  • sample_dim (str) – The xarray dimension over the discrete ‘samples’ in the feature_input_variable. This is typically a variant of sample e.g., saxs_sample.

  • kernel (str) – The name of the kernel class in sklearn.gaussian_process.kernels to use in the classifier. Defaults to Matern.

  • kernel_kwargs (dict) – Additional keyword arguments to pass to the chosen sklearn.gaussian_process.kernels kernel

  • optimizer (str) – The name of the optimizer to use when optimizing the Gaussian process parameters

  • name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()
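As a rough illustration of how a kernel name and kernel_kwargs could be turned into a kernel object, the sketch below looks the name up in sklearn.gaussian_process.kernels. The lookup via getattr is an assumption about how the string is resolved, not a confirmed detail of this class; the scikit-learn calls themselves are standard.

```python
from sklearn.gaussian_process import GaussianProcessClassifier, kernels

kernel_name = "Matern"
kernel_kwargs = {"length_scale": 1.0, "nu": 1.5}

# Assumption: the name is resolved as an attribute of
# sklearn.gaussian_process.kernels and called with kernel_kwargs.
kernel_cls = getattr(kernels, kernel_name)
kernel = kernel_cls(**kernel_kwargs)

# The resulting kernel object is what scikit-learn's classifier expects.
clf = GaussianProcessClassifier(kernel=kernel, optimizer="fmin_l_bfgs_b")
```

Under this assumption, any kernel class exported by that module (for example RBF with a length_scale argument) could be selected the same way, provided kernel_kwargs matches its constructor.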

calculate(dataset: Dataset) → Self

Apply this GP classifier to the supplied dataset.

Fits a Gaussian Process classifier to the input data and makes predictions across the grid, including class probabilities and entropy-based uncertainty.

Parameters:

dataset (xr.Dataset) – The input dataset containing labeled samples and prediction grid

Returns:

The GP classifier instance with predictions and uncertainty estimates

Return type:

Self
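The entropy-based uncertainty mentioned above can be illustrated with the Shannon entropy of a class-probability vector. The exact definition used by this class (log base, any normalization) is not stated here, so the function below is a sketch using natural logarithms, and shannon_entropy is a name invented for the example.

```python
import math


def shannon_entropy(probabilities):
    """Entropy (in nats) of one grid point's class-probability vector."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


# A grid point deep inside one phase: one class dominates, entropy is low.
confident = shannon_entropy([0.98, 0.01, 0.01])

# A grid point near a boundary: classes are equally likely, entropy is maximal.
uncertain = shannon_entropy([1 / 3, 1 / 3, 1 / 3])
```

For three classes the maximum is ln(3), reached when all classes are equally likely, so high-entropy grid points mark the regions where the classifier is least sure of the label.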

class AFL.double_agent.Extrapolator.GaussianProcessRegressor(feature_input_variable, predictor_input_variable, output_prefix, grid_variable, grid_dim, sample_dim, predictor_uncertainty_variable=None, optimizer='fmin_l_bfgs_b', kernel: str = 'Matern', kernel_kwargs: dict = {'length_scale': 1.0, 'nu': 1.5}, name='GaussianProcessRegressor', fix_nans=True)

Bases: Extrapolator

Gaussian Process regressor for extrapolating continuous values.

This extrapolator uses scikit-learn’s GaussianProcessRegressor to predict continuous values across the grid based on discrete samples. It handles measurement uncertainty and provides both mean predictions and variance estimates.

Parameters:
  • feature_input_variable (str) – The name of the xarray.Dataset data variable to use as the input to the model that will be extrapolating the discrete data. This is typically a sample composition variable.

  • predictor_input_variable (str) – The name of the xarray.Dataset data variable to use as the output of the model that will be extrapolating the discrete data. For this PipelineOp this should be a continuous value vector.

  • output_prefix (str) – The string prefix to apply to each output variable before inserting into the output xarray.Dataset

  • grid_variable (str) – The name of the xarray.Dataset data variable to use as an evaluation grid.

  • grid_dim (str) – The xarray dimension that indexes the grid points. This is the grid equivalent of sample_dim.

  • sample_dim (str) – The xarray dimension over the discrete ‘samples’ in the feature_input_variable. This is typically a variant of sample e.g., saxs_sample.

  • predictor_uncertainty_variable (str | None) – Variable containing uncertainty estimates for the predictor values

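  • optimizer (str) – The name of the optimizer to use when optimizing the Gaussian process parameters

  • kernel (str | None) – The name of the kernel class in sklearn.gaussian_process.kernels to use in the regressor. Defaults to Matern.

  • kernel_kwargs (dict) – Additional keyword arguments to pass to the chosen sklearn.gaussian_process.kernels kernel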

  • name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()

  • fix_nans (bool) – Whether to handle NaN values in the input data

calculate(dataset: Dataset) → Self

Apply this GP regressor to the supplied dataset.

Fits a Gaussian Process regressor to the input data and makes predictions across the grid, including mean values and variance estimates. Can handle heteroscedastic noise if uncertainty values are provided.

Parameters:

dataset (xr.Dataset) – The input dataset containing samples and prediction grid

Returns:

The GP regressor instance with predictions and uncertainty estimates

Return type:

Self
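One common way to give a scikit-learn GP regressor heteroscedastic noise is to pass per-sample noise variances through its alpha argument. Whether this class does exactly that is an assumption; the sketch below only shows the technique on synthetic one-dimensional data, with X, y, and y_err invented for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Synthetic samples: 8 points with noise that grows from left to right.
X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y_err = np.linspace(0.01, 0.2, 8)  # per-sample standard deviations
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, y_err)

gpr = GaussianProcessRegressor(
    kernel=Matern(length_scale=1.0, nu=1.5),
    alpha=y_err**2,  # per-sample noise variance added to the kernel diagonal
    optimizer="fmin_l_bfgs_b",
)
gpr.fit(X, y)

# Evaluate on a dense grid; squaring the standard deviation gives the variance.
grid = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
mean, std = gpr.predict(grid, return_std=True)
variance = std**2
```

Samples with larger y_err pull the fitted mean less strongly, which is the practical effect of supplying measurement uncertainty alongside the predictor values.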