AFL.double_agent.Preprocessor module#
PipelineOps for Data Preprocessing
This module contains preprocessing operations that transform, normalize, and prepare data for analysis. Preprocessors handle tasks such as:
- Scaling and normalizing data
- Transforming between coordinate systems
- Filtering and smoothing signals
- Extracting features from raw measurements
- Converting between different data representations
Each preprocessor is implemented as a PipelineOp that can be composed with others in a processing pipeline.
- class AFL.double_agent.Preprocessor.ArrayToVars(input_variable: str, output_variables: list, split_dim: list, postfix: str = '', squeeze: bool = False, name: str = 'DatasetToVars')#
Bases:
Preprocessor
Convert an array into multiple variables
- Parameters:
input_variable (str) – The name of the array variable to split into separate variables
output_variables (list) – The names of the variables to create from the array
split_dim (str) – The dimension to split the array along
postfix (str, default='') – String to append to output variable names
squeeze (bool, default=False) – Whether to squeeze out single-dimension axes
name (str, default='DatasetToVars') – The name to use when added to a Pipeline
- calculate(dataset)#
Apply this PipelineOp to the supplied xarray.dataset
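As an illustration, the underlying split can be sketched with plain xarray. The variable and dimension names below are hypothetical, not part of the AFL API:

```python
import numpy as np
import xarray as xr

# Hypothetical array variable with two components stacked along "component"
ds = xr.Dataset(
    {"comps": (("sample", "component"), np.array([[1.0, 2.0], [3.0, 4.0]]))}
)

# Equivalent xarray operation: one new variable per index along the split dimension
for i, var_name in enumerate(["A", "B"]):
    ds[var_name] = ds["comps"].isel(component=i)
```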
- class AFL.double_agent.Preprocessor.BarycentricToTernaryXY(input_variable: str, output_variable: str, sample_dim: str, name: str = 'BarycentricToTernaryXY')#
Bases:
Preprocessor
Transform from ternary coordinates to xy coordinates
Note
Adapted from the barycentric transform in mpltern: yuzie007/mpltern
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
sample_dim (str) – The dimension indexing the samples (points) to transform
name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
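The mapping can be sketched as a weighted sum of triangle-corner positions. The corner placement below is one common convention and may differ from the one mpltern uses:

```python
import numpy as np

# xy positions of the three ternary corners (assumed convention)
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

def bary_to_xy(bary):
    """Normalize each composition row, then take the weighted sum of corners."""
    bary = np.asarray(bary, dtype=float)
    bary = bary / bary.sum(axis=-1, keepdims=True)
    return bary @ corners
```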
- class AFL.double_agent.Preprocessor.Destandardize(input_variable: str, output_variable: str, dim: str, component_dim: str | None = 'component', scale_variable: str | None = None, min_val: Number | None = None, max_val: Number | None = None, name: str = 'Destandardize')#
Bases:
Preprocessor
Invert 0->1 scaling, restoring the original data range (the inverse of Standardize)
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
dim (str) – The dimension used for calculating the data minimum
component_dim (Optional[str], default="component") – The dimension for component-wise operations
scale_variable (Optional[str], default=None) – If specified, the min/max of this data variable in the supplied xarray.Dataset will be used to scale the data rather than min/max of the input_variable or the supplied min_val or max_val
min_val (Optional[Number], default=None) – Value used to scale the data minimum
max_val (Optional[Number], default=None) – Value used to scale the data maximum
name (str, default="Destandardize") – The name to use when added to a Pipeline
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
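A minimal sketch of the inverse scaling, using assumed bounds in place of scale_variable (all names below are hypothetical):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"scaled": ("sample", np.array([0.0, 0.5, 1.0]))})

# Invert 0->1 scaling using known bounds; the op can instead read them
# from scale_variable or from min_val/max_val
min_val, max_val = 10.0, 20.0
ds["raw"] = ds["scaled"] * (max_val - min_val) + min_val
```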
- class AFL.double_agent.Preprocessor.Extrema(input_variable: str, output_variable: str, dim: str, return_coords: bool = False, operator='max', slice: List | None = None, slice_dim: str | None = None, name: str = 'Extrema')#
Bases:
Preprocessor
Find the extrema of a data variable
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
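The underlying reduction can be sketched with plain xarray; variable names are hypothetical:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"I": (("sample", "q"), np.array([[1.0, 5.0, 2.0]]))},
    coords={"q": [0.1, 0.2, 0.3]},
)

# Maximum along "q"; with return_coords=True the op would also report
# the coordinate location of the extremum, sketched here via argmax
max_val = ds["I"].max(dim="q")
max_q = ds["q"][ds["I"].argmax(dim="q")]
```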
- class AFL.double_agent.Preprocessor.Preprocessor(input_variable: str = None, output_variable: str = None, name: str = 'PreprocessorBase')#
Bases:
PipelineOp
Base class stub for all preprocessors
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
- class AFL.double_agent.Preprocessor.SavgolFilter(input_variable: str, output_variable: str, dim: str = 'q', xlo: Number | None = None, xhi: Number | None = None, xlo_isel: int | None = None, xhi_isel: int | None = None, pedestal: Number | None = None, npts: int = 250, derivative: int = 0, window_length: int = 31, polyorder: int = 2, apply_log_scale: bool = True, name: str = 'SavgolFilter')#
Bases:
Preprocessor
Smooth and take derivatives of input data via a Savitzky-Golay filter
This PipelineOp cleans measurement data and takes smoothed derivatives using scipy.signal.savgol_filter. Below is a summary of the steps taken.
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
dim (str) – The dimension in the xarray.Dataset to apply this filter over
xlo (Optional[Number]) – The values of the input dimension (dim, above) to trim the data to
xhi (Optional[Number]) – The values of the input dimension (dim, above) to trim the data to
xlo_isel (Optional[int]) – The integer indices of the input dimension (dim, above) to trim the data to
xhi_isel (Optional[int]) – The integer indices of the input dimension (dim, above) to trim the data to
pedestal (Optional[Number]) – This value is added to the input_variable to establish a fixed data ‘floor’
npts (int) – The size of the grid to interpolate onto
derivative (int) – The order of the derivative to return. If derivative=0, the data is smoothed with no derivative taken.
window_length (int) – The width of the window used in the savgol smoothing. See scipy.signal.savgol_filter for more information.
polyorder (int) – The order of polynomial used in the savgol smoothing. See scipy.signal.savgol_filter for more information.
apply_log_scale (bool) – If True, the input_variable and associated dim coordinated are scaled with numpy.log10
name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()
Notes
This PipelineOp performs the following steps:
1. Data is trimmed to (xlo, xhi) and then (xlo_isel, xhi_isel) in that order. The former trims the data to a numerical coordinate range while the latter trims to integer indices. It is generally not advisable to supply both varieties, and a warning will be raised if this is attempted.
2. If apply_log_scale = True, both the input_variable and dim data will be scaled with numpy.log10. A new xarray dimension and coordinate will be created with the name log_{dim}.
3. All duplicate data (multiple data values at the same dim coordinates) are removed by taking the average of the duplicates.
4. If pedestal is specified, the pedestal value is added to the data and all NaNs are filled with the pedestal.
5. The data is interpolated onto a constant grid with npts values from the trimmed minimum to the trimmed maximum. If apply_log_scale=True, the grid is geometrically rather than linearly spaced.
6. All remaining NaN values are dropped along dim.
7. Finally, scipy.signal.savgol_filter is applied with the window_length, polyorder, and derivative parameters specified in the constructor.
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
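Steps 2 and 7 of the procedure above can be sketched directly with scipy.signal.savgol_filter on synthetic power-law data (all names below are hypothetical):

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic I ~ q^-2 decay with 1% noise on a geometric q grid
rng = np.random.default_rng(0)
q = np.geomspace(1e-3, 1e-1, 250)
I = q**-2.0 * (1 + 0.01 * rng.normal(size=q.size))

# Step 2: log-scale both the data and the coordinate
log_q = np.log10(q)
log_I = np.log10(I)

# Step 7: smooth, and separately take the first derivative d(log I)/d(log q)
dx = log_q[1] - log_q[0]  # geometric q grid -> uniform spacing in log10(q)
smooth = savgol_filter(log_I, window_length=31, polyorder=2)
deriv = savgol_filter(log_I, window_length=31, polyorder=2, deriv=1, delta=dx)
```

For an I ~ q^-2 power law the smoothed derivative sits near -2 across the interior of the grid.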
- class AFL.double_agent.Preprocessor.Standardize(input_variable: str, output_variable: str, dim: str, component_dim: str | None = 'component', scale_variable: str | None = None, min_val: Number | None = None, max_val: Number | None = None, name: str = 'Standardize')#
Bases:
Preprocessor
Standardize the data to have min 0 and max 1
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
dim (str) – The dimension used for calculating the data minimum
component_dim (Optional[str], default="component") – The dimension for component-wise operations
scale_variable (Optional[str], default=None) – If specified, the min/max of this data variable in the supplied xarray.Dataset will be used to scale the data rather than min/max of the input_variable or the supplied min_val or max_val
min_val (Optional[Number], default=None) – Value used to scale the data minimum
max_val (Optional[Number], default=None) – Value used to scale the data maximum
name (str, default="Standardize") – The name to use when added to a Pipeline
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
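The per-component min/max scaling can be sketched with plain xarray; the names below are hypothetical:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"comps": (("sample", "component"), np.array([[1.0, 10.0], [3.0, 30.0]]))}
)

# Scale each component independently to [0, 1] over the "sample" dimension
lo = ds["comps"].min(dim="sample")
hi = ds["comps"].max(dim="sample")
ds["scaled"] = (ds["comps"] - lo) / (hi - lo)
```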
- class AFL.double_agent.Preprocessor.Subtract(input_variable: str, output_variable: str, dim: str, value: float | str, coord_value: bool = True, name: str = 'Subtract')#
Bases:
Preprocessor
Baseline input variable by subtracting a value
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
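Assuming coord_value=True means "subtract the variable's value at the given coordinate along dim" (an interpretation of the signature, not confirmed by the source), the operation can be sketched as:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"I": ("q", np.array([5.0, 3.0, 2.0]))},
    coords={"q": [0.1, 0.2, 0.3]},
)

# Subtract the value of I at q == 0.3 from the whole curve
baseline = ds["I"].sel(q=0.3, drop=True)
ds["I_sub"] = ds["I"] - baseline
```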
- class AFL.double_agent.Preprocessor.SubtractMin(input_variable: str, output_variable: str, dim: str, name: str = 'SubtractMin')#
Bases:
Preprocessor
Baseline input variable by subtracting minimum value
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
- class AFL.double_agent.Preprocessor.SympyTransform(input_variable: str, output_variable: str, sample_dim: str, transforms: Dict[str, object], transform_dim: str, component_dim: str = 'component', name: str = 'SympyTransform')#
Bases:
Preprocessor
Transform data using sympy expressions
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
sample_dim (str) – The sample dimension i.e., the dimension of compositions or grid points
component_dim (str, default="component") – The dimension of the component of each gridpoint
transforms (Dict[str,object]) – A dictionary of transforms (sympy expressions) to evaluate to generate new variables. For this method to function, the transforms must be completely specified except for the names in component_dim of the input_variable
transform_dim (str) – The name of the dimension that the ‘component_dim’ will be transformed to
name (str, default="SympyTransform") – The name to use when added to a Pipeline
Example
```python
from AFL.double_agent import *
import sympy
import xarray as xr

with Pipeline() as p:
    CartesianGrid(
        output_variable='comps',
        grid_spec={
            'A': {'min': 1, 'max': 25, 'steps': 5},
            'B': {'min': 1, 'max': 25, 'steps': 5},
            'C': {'min': 1, 'max': 25, 'steps': 5},
        },
        sample_dim='grid',
    )

    A, B, C = sympy.symbols('A B C')
    vA = A / (A + B + C)
    vB = B / (A + B + C)
    vC = C / (A + B + C)
    SympyTransform(
        input_variable='comps',
        output_variable='trans_comps',
        sample_dim='grid',
        transforms={'vA': vA, 'vB': vB, 'vC': vC},
        transform_dim='trans_component',
    )

p.calculate(xr.Dataset())  # returns dataset with grid and transformed grid
```
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
- class AFL.double_agent.Preprocessor.TernaryXYToBarycentric(input_variable: str, output_variable: str, sample_dim: str, name: str = 'TernaryXYToBarycentric')#
Bases:
Preprocessor
Transform to ternary coordinates from xy coordinates
Note
Adapted from the barycentric transform in mpltern: yuzie007/mpltern
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
sample_dim (str) – The dimension indexing the samples (points) to transform
name (str, default="TernaryXYToBarycentric") – The name to use when added to a Pipeline
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
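The inverse mapping amounts to solving a small linear system per point, using the corner positions plus the constraint that barycentric weights sum to one. The corner placement below is one common convention and may differ from the one mpltern uses:

```python
import numpy as np

# xy positions of the three ternary corners (assumed convention)
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

def xy_to_bary(xy):
    """Solve xy = bary @ corners with the constraint sum(bary) == 1."""
    xy = np.atleast_2d(xy)
    A = np.vstack([corners.T, np.ones(3)])  # 3x3: x row, y row, sum-to-one row
    rhs = np.hstack([xy, np.ones((xy.shape[0], 1))])
    return np.linalg.solve(A, rhs.T).T
```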
- class AFL.double_agent.Preprocessor.VarsToArray(input_variables: List, output_variable: str, variable_dim: str, squeeze: bool = False, variable_mapping: Dict = None, name: str = 'VarsToArray')#
Bases:
Preprocessor
Convert multiple variables into a single array
- Parameters:
input_variables (List) – List of input variables to combine into an array
output_variable (str) – The name of the variable to be inserted into the dataset
variable_dim (str) – The dimension name for the variables in the output array
squeeze (bool, default=False) – Whether to squeeze out single-dimension axes
variable_mapping (Dict, default=None) – Optional mapping to rename variables
name (str, default='VarsToArray') – The name to use when added to a Pipeline
- calculate(dataset)#
Apply this PipelineOp to the supplied xarray.dataset
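The stacking can be sketched with xr.concat; the variable and dimension names below are hypothetical:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"A": ("sample", [1.0, 2.0]), "B": ("sample", [3.0, 4.0])})

# Stack the listed variables along a new "component" dimension,
# labeling each slice with the variable it came from
arr = xr.concat([ds["A"], ds["B"]], dim="component")
arr = arr.assign_coords(component=["A", "B"]).transpose("sample", "component")
```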
- class AFL.double_agent.Preprocessor.Zscale(input_variable: str, output_variable: str, dim: str, name: str = 'Zscale')#
Bases:
Preprocessor
Z-scale the data to have mean 0 and standard deviation 1
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
dim (str) – The dimension over which to calculate the mean and standard deviation
name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
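The scaling can be sketched with plain xarray (names hypothetical; this sketch uses xarray's default ddof=0 standard deviation, which may differ from the op's choice):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"y": ("sample", np.array([1.0, 2.0, 3.0]))})

# Subtract the mean and divide by the standard deviation along "sample"
mean = ds["y"].mean(dim="sample")
std = ds["y"].std(dim="sample")
ds["y_z"] = (ds["y"] - mean) / std
```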
- class AFL.double_agent.Preprocessor.ZscaleError(input_variables: str | None | List[str], output_variable: str, dim: str, name: str = 'Zscale_error')#
Bases:
Preprocessor
Scale the y_err data; the first input variable is y, the second is y_err
- Parameters:
input_variables (Union[Optional[str], List[str]]) – The names of the input variables - first is y, second is y_err
output_variable (str) – The name of the variable to be inserted into the dataset
dim (str) – The dimension over which to calculate the standard deviation of y
name (str, default="Zscale_error") – The name to use when added to a Pipeline
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
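Since z-scaling is a linear transform, the matching error propagation divides the uncertainties by y's standard deviation with no mean shift. A sketch under that assumption (names hypothetical, ddof=0 as in xarray's default):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "y": ("sample", np.array([1.0, 2.0, 3.0])),
        "y_err": ("sample", np.array([0.1, 0.2, 0.3])),
    }
)

# Uncertainties scale by 1/std(y); subtracting the mean does not affect them
std = ds["y"].std(dim="sample")
ds["y_err_z"] = ds["y_err"] / std
```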