AFL.double_agent.Extrapolator module
Extrapolation tools for extending discrete sample data to continuous spaces.
This module provides classes for extrapolating data from discrete sample points to continuous spaces, particularly useful in materials science and machine learning applications. The extrapolators can work with both classification and regression tasks.
Key features:
- Support for Gaussian Process Classification and Regression
- Handling of uncertainty in measurements
- Visualization tools for extrapolation results
- Flexible kernel selection for GP models
- Support for different sample and grid dimensions
- class AFL.double_agent.Extrapolator.DummyExtrapolator(feature_input_variable: str, predictor_input_variable: str, output_prefix: str, grid_variable: str, grid_dim: str, sample_dim: str, name='DummyExtrapolator')
Bases: Extrapolator
Simple extrapolator that returns zero values.
This extrapolator serves as a baseline implementation, returning arrays of zeros for both the mean and variance predictions. It is useful for testing pipelines and as a template for new extrapolators.
- Parameters:
feature_input_variable (str) – The name of the xarray.Dataset data variable to use as the input to the model that will be extrapolating the discrete data. This is typically a sample composition variable.
predictor_input_variable (str) – The name of the xarray.Dataset data variable to use as the output of the model that will be extrapolating the discrete data. This is typically a class label or property variable.
output_prefix (str) – The string prefix to apply to each output variable before it is inserted into the output xarray.Dataset.
grid_variable (str) – The name of the xarray.Dataset data variable to use as an evaluation grid.
grid_dim (str) – The xarray dimension over the grid points; the grid counterpart of sample_dim.
sample_dim (str) – The xarray dimension over the discrete ‘samples’ in the feature_input_variable. This is typically a variant of sample, e.g., saxs_sample.
name (str) – The name to use when this op is added to a Pipeline. This name is used when calling Pipeline.search().
- calculate(dataset: Dataset) → Self
Apply this dummy extrapolator to the supplied dataset.
Creates arrays of zeros for both mean and variance predictions.
- Parameters:
dataset (xr.Dataset) – The input dataset containing the sample points and grid
- Returns:
The dummy extrapolator instance with zero-valued outputs
- Return type:
Self
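For illustration, the baseline behavior can be sketched in plain NumPy. This is not the AFL implementation: the function name `dummy_extrapolate`, the dict return value, and the assumed `(n_grid_points, n_components)` grid shape are all hypothetical. The point is only that the outputs carry one zero per grid point, so downstream pipeline steps can be exercised without fitting a model.

```python
import numpy as np


def dummy_extrapolate(grid: np.ndarray) -> dict:
    """Hypothetical stand-in for DummyExtrapolator.calculate (not AFL code).

    `grid` is assumed to have shape (n_grid_points, n_components);
    each returned array holds one value per grid point.
    """
    n_grid = grid.shape[0]
    return {
        "mean": np.zeros(n_grid),  # zero-valued mean prediction
        "var": np.zeros(n_grid),   # zero-valued variance prediction
    }


# A 5-point grid over a two-component composition space.
x = np.linspace(0.0, 1.0, 5)
grid = np.stack([x, 1.0 - x], axis=1)
out = dummy_extrapolate(grid)
print(out["mean"].shape)  # (5,)
```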
- class AFL.double_agent.Extrapolator.Extrapolator(feature_input_variable: str, predictor_input_variable: str, output_variables: List[str], output_prefix: str, grid_variable: str, grid_dim: str, sample_dim: str, name: str = 'Extrapolator')
Bases: PipelineOp
Base class for extrapolating discrete sample data onto continuous spaces.
This abstract base class provides common functionality for extrapolating data from discrete sample points to a continuous grid. It handles data management and provides visualization capabilities.
- Parameters:
feature_input_variable (str) – The name of the xarray.Dataset data variable to use as the input to the model that will be extrapolating the discrete data. This is typically a sample composition variable.
predictor_input_variable (str) – The name of the xarray.Dataset data variable to use as the output of the model that will be extrapolating the discrete data. This is typically a class label or property variable.
output_variables (List[str]) – The list of variables that will be output by this class.
output_prefix (str) – The string prefix to apply to each output variable before it is inserted into the output xarray.Dataset.
grid_variable (str) – The name of the xarray.Dataset data variable to use as an evaluation grid.
grid_dim (str) – The xarray dimension over the grid points; the grid counterpart of sample_dim.
sample_dim (str) – The xarray dimension over the discrete ‘samples’ in the feature_input_variable. This is typically a variant of sample, e.g., saxs_sample.
name (str) – The name to use when this op is added to a Pipeline. This name is used when calling Pipeline.search().
- calculate(dataset: Dataset) → Self
Apply this extrapolator to the supplied dataset.
This method must be implemented by subclasses to define how the extrapolation is performed.
- Parameters:
dataset (xr.Dataset) – The input dataset containing the sample points and grid
- Returns:
The extrapolator instance with updated outputs
- Return type:
Self
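The subclassing contract described for calculate can be sketched as follows. This is a hypothetical, NumPy-only mirror of the pattern, not AFL's actual PipelineOp machinery: `SketchExtrapolator`, `NearestNeighborExtrapolator`, the `prefixed` helper, and the underscore used to join `output_prefix` with each variable name are all assumptions. The toy subclass uses nearest-neighbor lookup only to keep the example small; it is not one of the extrapolators documented on this page.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

import numpy as np


class SketchExtrapolator(ABC):
    """Hypothetical base class echoing a subset of the documented arguments."""

    def __init__(self, output_variables: List[str], output_prefix: str):
        self.output_variables = output_variables
        self.output_prefix = output_prefix
        self.output: Dict[str, np.ndarray] = {}

    def prefixed(self, var: str) -> str:
        # Assumption: prefix and variable name are joined with an underscore.
        return f"{self.output_prefix}_{var}"

    @abstractmethod
    def calculate(self, samples: np.ndarray, values: np.ndarray,
                  grid: np.ndarray) -> "SketchExtrapolator":
        """Subclasses must fill self.output and return self."""


class NearestNeighborExtrapolator(SketchExtrapolator):
    """Toy subclass: each grid point takes the value of its closest sample."""

    def __init__(self, output_prefix: str):
        super().__init__(output_variables=["mean"], output_prefix=output_prefix)

    def calculate(self, samples, values, grid):
        # Pairwise Euclidean distances, shape (n_grid, n_samples).
        dists = np.linalg.norm(grid[:, None, :] - samples[None, :, :], axis=-1)
        nearest = dists.argmin(axis=1)
        self.output[self.prefixed("mean")] = values[nearest]
        return self  # returning self mirrors the documented "→ Self"


samples = np.array([[0.0, 1.0], [1.0, 0.0]])
values = np.array([10.0, 20.0])
grid = np.array([[0.1, 0.9], [0.8, 0.2]])
op = NearestNeighborExtrapolator(output_prefix="nn").calculate(samples, values, grid)
print(op.output["nn_mean"])  # [10. 20.]
```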
- plot(**mpl_kwargs) → Figure
Plot the extrapolation results.
Creates a visualization of the extrapolated data, with the plotting style depending on the data dimensions and type.
- Parameters:
**mpl_kwargs (dict) – Additional keyword arguments to pass to matplotlib plotting functions
- Returns:
The matplotlib figure containing the plots
- Return type:
plt.Figure
- class AFL.double_agent.Extrapolator.GaussianProcessClassifier(feature_input_variable: str, predictor_input_variable: str, output_prefix: str, grid_variable: str, grid_dim: str, sample_dim: str, kernel: str = 'Matern', kernel_kwargs: dict = {'length_scale': 1.0, 'nu': 1.5}, optimizer: str = 'fmin_l_bfgs_b', name: str = 'GaussianProcessClassifier')
Bases: Extrapolator
Gaussian Process classifier for extrapolating class labels.
This extrapolator uses scikit-learn’s GaussianProcessClassifier to predict class probabilities across the grid based on discrete labeled samples. It provides both class predictions and uncertainty estimates based on the entropy of the predicted class probabilities.
- Parameters:
feature_input_variable (str) – The name of the xarray.Dataset data variable to use as the input to the model that will be extrapolating the discrete data. This is typically a sample composition variable.
predictor_input_variable (str) – The name of the xarray.Dataset data variable to use as the output of the model that will be extrapolating the discrete data. For this PipelineOp this should be a class label vector.
output_prefix (str) – The string prefix to apply to each output variable before it is inserted into the output xarray.Dataset.
grid_variable (str) – The name of the xarray.Dataset data variable to use as an evaluation grid.
grid_dim (str) – The xarray dimension over the grid points; the grid counterpart of sample_dim.
sample_dim (str) – The xarray dimension over the discrete ‘samples’ in the feature_input_variable. This is typically a variant of sample, e.g., saxs_sample.
kernel (str) – The name of the kernel in sklearn.gaussian_process.kernels to use in the classifier. Defaults to Matern.
kernel_kwargs (dict) – Keyword arguments passed to the kernel constructor.
optimizer (str) – The name of the optimizer used to optimize the Gaussian process kernel hyperparameters.
name (str) – The name to use when this op is added to a Pipeline. This name is used when calling Pipeline.search().
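The kernel and kernel_kwargs parameters pair a kernel name with its constructor arguments. One plausible way to resolve them is to look the name up in scikit-learn's sklearn.gaussian_process.kernels module. The helper `build_kernel` below is hypothetical, and the lookup strategy is an assumption about how the string might be resolved, not a description of AFL's code.

```python
from typing import Optional

from sklearn.gaussian_process import kernels


def build_kernel(kernel: str = "Matern", kernel_kwargs: Optional[dict] = None):
    """Hypothetical helper: resolve a kernel class by name and instantiate it."""
    if kernel_kwargs is None:
        # Same values as the documented default for kernel_kwargs.
        kernel_kwargs = {"length_scale": 1.0, "nu": 1.5}
    # Assumption: the name is looked up as a class in sklearn.gaussian_process.kernels;
    # an unknown name raises AttributeError.
    kernel_cls = getattr(kernels, kernel)
    return kernel_cls(**kernel_kwargs)


k = build_kernel()                                   # Matern with the default kwargs
rbf = build_kernel("RBF", {"length_scale": 0.5})    # kwargs must match the kernel
```

Because the keyword arguments are forwarded unchanged, they must match the chosen kernel's constructor: RBF, for example, accepts length_scale but not nu, so the Matern defaults cannot be reused with it.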
- calculate(dataset: Dataset) → Self
Apply this GP classifier to the supplied dataset.
Fits a Gaussian Process classifier to the input data and makes predictions across the grid, including class probabilities and entropy-based uncertainty.
- Parameters:
dataset (xr.Dataset) – The input dataset containing labeled samples and prediction grid
- Returns:
The GP classifier instance with predictions and uncertainty estimates
- Return type:
Self
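The fit, predict, and entropy steps described above can be sketched directly with scikit-learn. The synthetic samples, the 21 × 21 grid, and all variable names are illustrative assumptions; only the kernel and optimizer match the documented defaults. How AFL names and stores the resulting outputs in the xarray.Dataset is not shown here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import Matern

# Synthetic labeled samples on a 6 x 6 lattice in a 2-D composition space:
# label 1 where the first coordinate exceeds 0.5, else 0.
s = np.linspace(0.05, 0.95, 6)
X_samples = np.array([[a, b] for a in s for b in s])
y_samples = (X_samples[:, 0] > 0.5).astype(int)

# Evaluation grid (the role played by grid_variable), 21 x 21 points.
g = np.linspace(0.0, 1.0, 21)
X_grid = np.array([[a, b] for a in g for b in g])

clf = GaussianProcessClassifier(
    kernel=Matern(length_scale=1.0, nu=1.5),
    optimizer="fmin_l_bfgs_b",
)
clf.fit(X_samples, y_samples)

proba = clf.predict_proba(X_grid)             # shape (n_grid, n_classes)
labels = clf.classes_[proba.argmax(axis=1)]   # most probable class per grid point

# Shannon entropy of each predicted class distribution: largest where the
# model is least certain, e.g. near the class boundary.
entropy = -np.sum(proba * np.log(np.clip(proba, 1e-12, None)), axis=1)
```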
- class AFL.double_agent.Extrapolator.GaussianProcessRegressor(feature_input_variable, predictor_input_variable, output_prefix, grid_variable, grid_dim, sample_dim, predictor_uncertainty_variable=None, optimizer='fmin_l_bfgs_b', kernel: str = 'Matern', kernel_kwargs: dict = {'length_scale': 1.0, 'nu': 1.5}, name='GaussianProcessRegressor', fix_nans=True)
Bases: Extrapolator
Gaussian Process regressor for extrapolating continuous values.
This extrapolator uses scikit-learn’s GaussianProcessRegressor to predict continuous values across the grid based on discrete samples. It handles measurement uncertainty and provides both mean predictions and variance estimates.
- Parameters:
feature_input_variable (str) – The name of the xarray.Dataset data variable to use as the input to the model that will be extrapolating the discrete data. This is typically a sample composition variable.
predictor_input_variable (str) – The name of the xarray.Dataset data variable to use as the output of the model that will be extrapolating the discrete data. For this PipelineOp this should be a continuous value vector.
output_prefix (str) – The string prefix to apply to each output variable before it is inserted into the output xarray.Dataset.
grid_variable (str) – The name of the xarray.Dataset data variable to use as an evaluation grid.
grid_dim (str) – The xarray dimension over the grid points; the grid counterpart of sample_dim.
sample_dim (str) – The xarray dimension over the discrete ‘samples’ in the feature_input_variable. This is typically a variant of sample, e.g., saxs_sample.
predictor_uncertainty_variable (str | None) – The name of the xarray.Dataset data variable containing uncertainty estimates for the predictor values.
optimizer (str) – The name of the optimizer used to optimize the Gaussian process kernel hyperparameters.
kernel (str | None) – The name of the kernel in sklearn.gaussian_process.kernels to use in the regressor. Defaults to Matern.
kernel_kwargs (dict) – Keyword arguments passed to the kernel constructor.
name (str) – The name to use when this op is added to a Pipeline. This name is used when calling Pipeline.search().
fix_nans (bool) – Whether to handle NaN values in the input data.
- calculate(dataset: Dataset) → Self
Apply this GP regressor to the supplied dataset.
Fits a Gaussian Process regressor to the input data and makes predictions across the grid, including mean values and variance estimates. Can handle heteroscedastic noise if uncertainty values are provided.
- Parameters:
dataset (xr.Dataset) – The input dataset containing samples and prediction grid
- Returns:
The GP regressor instance with predictions and uncertainty estimates
- Return type:
Self
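A comparable sketch for the regressor, again using scikit-learn directly rather than AFL. Per-sample measurement uncertainty is passed through the `alpha` argument of sklearn's GaussianProcessRegressor, which adds a separate value to the diagonal of the kernel matrix for each training point and so models heteroscedastic noise. Treating the uncertainty variable as a standard deviation (and squaring it) is an assumption, as are the synthetic data; the variance output is taken as the square of the predicted standard deviation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Synthetic 1-D samples: a measured value plus a per-sample uncertainty.
X_samples = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y_samples = np.sin(2 * np.pi * X_samples).ravel()
y_err = np.full(len(y_samples), 0.05)
y_err[3] = 0.5  # one much noisier measurement

gpr = GaussianProcessRegressor(
    kernel=Matern(length_scale=1.0, nu=1.5),
    # Assumption: uncertainties are standard deviations, so alpha gets variances.
    alpha=y_err ** 2,
    optimizer="fmin_l_bfgs_b",
)
gpr.fit(X_samples, y_samples)

# Predict mean and standard deviation on a dense evaluation grid.
X_grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
mean, std = gpr.predict(X_grid, return_std=True)
var = std ** 2  # variance estimate per grid point
```

A larger entry in alpha lets the fitted mean deviate further from that sample, so a noisy point pulls the prediction less strongly than a precise one.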