AFL.double_agent.Generator module
Data generation tools for creating synthetic datasets and sampling spaces.
This module provides classes for generating various types of data structures commonly used in materials science and machine learning applications. The generators can create regular grids, compositional spaces, and specialized point distributions.
Key features:
- Cartesian grid generation with flexible specifications
- Barycentric grid generation for compositional spaces
- Gaussian point distributions for exclusion zones
- Support for multi-dimensional spaces
- Integration with xarray data structures
- class AFL.double_agent.Generator.BarycentricGrid(output_variable: str, components: List[str], sample_dim: str, pts_per_row: int = 50, basis: float = 1.0, dim: int = 3, eps: float = 1e-09, name='BarycentricGridGenerator')
Bases: Generator
Generator that produces a grid in barycentric coordinates.
Creates a grid suitable for compositional spaces where the sum of components must equal a fixed value (typically 1.0). The grid is generated by systematically sampling points that satisfy the barycentric constraint.
- Parameters:
output_variable (str) – The name of the variable to be inserted into the dataset
components (List[str]) – List of component names for the compositional space
sample_dim (str) – Name of the dimension for different samples/points
pts_per_row (int, default=50) – Number of points to sample along each row of the simplex
basis (float, default=1.0) – The sum constraint for the compositions (typically 1.0)
dim (int, default=3) – Number of dimensions in the compositional space
eps (float, default=1e-9) – Small value for numerical stability in equality comparisons
name (str, default="BarycentricGridGenerator") – The name to use when added to a Pipeline
- calculate(dataset: Dataset) -> Self
Generate the barycentric grid.
Creates a grid of points that satisfy the barycentric constraint by systematically sampling the simplex space.
- Parameters:
dataset (xr.Dataset) – The input dataset (not used by this generator)
- Returns:
The generator instance with the created barycentric grid
- Return type:
Self
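The barycentric constraint described above can be illustrated with a short standalone sketch. It is not the library's implementation: the helper name `barycentric_grid`, the reading of `pts_per_row` as the number of lattice values per axis, and the use of `eps` as a tolerance on the last coordinate are all assumptions made for illustration.

```python
from itertools import product

def barycentric_grid(n_components=3, pts_per_row=5, basis=1.0, eps=1e-9):
    """Return lattice points whose coordinates sum to `basis` (hypothetical helper)."""
    # Candidate values along each axis: 0, step, 2*step, ..., basis
    step = basis / (pts_per_row - 1)
    axis = [i * step for i in range(pts_per_row)]
    points = []
    # Choose the first n-1 coordinates freely; the last one is fixed by the constraint.
    for combo in product(axis, repeat=n_components - 1):
        last = basis - sum(combo)
        # Keep the point only if the last coordinate is non-negative within eps.
        if last >= -eps:
            points.append(combo + (max(last, 0.0),))
    return points

grid = barycentric_grid(n_components=3, pts_per_row=5)
```

Deriving the last coordinate from the others means every retained point satisfies the sum constraint by construction, and the tolerance only has to absorb floating-point round-off in that one subtraction.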
- class AFL.double_agent.Generator.CartesianGrid(output_variable: str, grid_spec: Dict[str, Dict[str, int | float]], sample_dim: str, component_dim: str = 'component', name: str = 'CartesianGridGenerator')
Bases: Generator
Generator that produces a cartesian grid according to user-provided specifications.
Creates a regular grid in N-dimensional space where each dimension can have its own min, max, and step size specifications. The resulting grid contains all possible combinations of points along each dimension.
- Parameters:
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset
grid_spec (Dict[str, Dict[str, int | float]]) – Dictionary keyed by component name. Each value is a sub-dictionary that defines the minimum, maximum, and step size for that component under the keys min, max, and steps.
sample_dim (str) – Name of the dimension for different samples/points in the grid
component_dim (str, default='component') – Name of the dimension for different components
name (str, default="CartesianGridGenerator") – The name to use when added to a Pipeline
- calculate(dataset: Dataset) -> Self
Generate the cartesian grid based on specifications.
Creates a grid by taking the cartesian product of points along each dimension as specified in the grid_spec.
- Parameters:
dataset (xr.Dataset) – The input dataset (not used by this generator)
- Returns:
The generator instance with the created grid
- Return type:
Self
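The cartesian product over a grid_spec can be sketched in plain Python as follows. This is an illustration only, not the library's code: the component names "A" and "B" are made up, the `linspace` helper is hypothetical, and the sketch assumes that `steps` gives the number of points along an axis rather than the spacing between them.

```python
from itertools import product

def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive (requires n >= 2)."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

# Hypothetical specification; names and values are illustrative only.
grid_spec = {
    "A": {"min": 0.0, "max": 1.0, "steps": 3},
    "B": {"min": 10.0, "max": 20.0, "steps": 2},
}

# Assumption: `steps` is the number of points per axis.
axes = [linspace(s["min"], s["max"], s["steps"]) for s in grid_spec.values()]
components = list(grid_spec)            # order of the component dimension
cartesian_grid = list(product(*axes))   # one tuple per sample point
```

The number of samples is the product of the per-axis point counts (here 3 × 2 = 6), so the grid size grows quickly as components are added.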
- class AFL.double_agent.Generator.GaussianPoints(input_variable: str, sample_dim: str, output_variable: str, grid_variable: str, grid_dim: str, comps_dim: str = 'component', exclusion_depth: float = 0.001, exclusion_radius: float = 0.001, name: str = 'GaussianPointsGenerator')
Bases: Generator
Generator that creates Gaussian-distributed points for exclusion zones.
This generator places Gaussian distributions centered at specified points, useful for creating exclusion zones or smooth transitions around specific locations in the sampling space.
- Parameters:
input_variable (str) – The name of the variable containing points to center Gaussians around
sample_dim (str) – Name of the dimension for different samples/points
output_variable (str) – The name of the variable to be inserted into the dataset
grid_variable (str) – The name of the grid variable to evaluate Gaussians on
grid_dim (str) – Name of the grid dimension
comps_dim (str, default="component") – Name of the components dimension
exclusion_depth (float, default=1e-3) – Maximum value of the Gaussian distributions
exclusion_radius (float, default=1e-3) – Width parameter for the Gaussian distributions
name (str, default="GaussianPointsGenerator") – The name to use when added to a Pipeline
- calculate(dataset: Dataset) -> Self
Generate Gaussian-distributed points.
Places multivariate normal distributions centered at each input point, creating a field of Gaussian peaks that can be used for exclusion zones or smooth transitions.
- Parameters:
dataset (xr.Dataset) – The input dataset containing points to center Gaussians around and the grid to evaluate them on
- Returns:
The generator instance with the created Gaussian field
- Return type:
Self
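The idea of a field of Gaussian peaks evaluated on a grid can be sketched as below. This is a simplified stand-in rather than the library's method: it uses unnormalized isotropic Gaussians, treats `depth` as the peak height and `radius` as the standard deviation, and sums the contributions of all centers. Each of those choices is an assumption, and the function name `gaussian_field` is hypothetical.

```python
import math

def gaussian_field(centers, grid, depth=1e-3, radius=1e-3):
    """Sum isotropic Gaussian peaks at `centers`, evaluated at each grid point."""
    field = []
    for g in grid:
        total = 0.0
        for c in centers:
            # Squared Euclidean distance between grid point and center
            d2 = sum((gi - ci) ** 2 for gi, ci in zip(g, c))
            # Peak value is `depth` at the center; `radius` sets the width
            total += depth * math.exp(-d2 / (2.0 * radius ** 2))
        field.append(total)
    return field

centers = [(0.5, 0.5)]
grid_pts = [(0.5, 0.5), (0.6, 0.5), (1.0, 1.0)]
field = gaussian_field(centers, grid_pts, depth=1.0, radius=0.1)
```

In this sketch the field equals `depth` at a center and decays toward zero a few radii away, which is the behavior an exclusion zone relies on.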
- class AFL.double_agent.Generator.Generator(output_variable: str, input_variable: str = 'Generator', name: str = 'GeneratorBase')
Bases: PipelineOp
Base class for all data generation operations.
This abstract base class provides common functionality for generating synthetic data or sampling spaces. Unlike most PipelineOps, Generators typically don’t require input data but instead create new data based on parameters.
- Parameters:
input_variable (str, default="Generator") – Generators generally do not use input variables, but this value can be used to name the generator's input node
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
name (str, default="GeneratorBase") – The name to use when added to a Pipeline. This name is used when calling Pipeline.search().
- calculate(dataset: Dataset) -> Self
Apply this generator to the supplied dataset.
This method must be implemented by subclasses to define how the data generation is performed.
- Parameters:
dataset (xr.Dataset) – The input dataset (typically not used by generators)
- Returns:
The generator instance with generated outputs
- Return type:
Self
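The subclassing contract described above (implement calculate, return the instance) can be sketched with a self-contained stand-in. Nothing here is taken from the AFL source: `SketchGenerator`, `ConstantPoints`, and the `output` dictionary are hypothetical names used only to show the pattern, and the real base class may store its results differently.

```python
from abc import ABC, abstractmethod

class SketchGenerator(ABC):
    """Hypothetical stand-in for the base class; not the AFL implementation."""

    def __init__(self, output_variable, input_variable="Generator", name="GeneratorBase"):
        self.output_variable = output_variable
        self.input_variable = input_variable
        self.name = name
        self.output = {}  # assumed storage for generated results

    @abstractmethod
    def calculate(self, dataset):
        """Subclasses generate their data here and return self."""

class ConstantPoints(SketchGenerator):
    """Toy subclass that ignores the dataset and emits two fixed points."""

    def calculate(self, dataset):
        self.output[self.output_variable] = [(0.0, 1.0), (1.0, 0.0)]
        return self  # returning the instance allows call chaining

gen = ConstantPoints(output_variable="points").calculate(dataset=None)
```

Returning the instance from calculate lets a caller construct, run, and inspect a generator in one expression, as the last line shows.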