AFL.double_agent.Preprocessor module#
PipelineOps for Data Preprocessing
This module contains preprocessing operations that transform, normalize, and prepare data for analysis. Preprocessors handle tasks such as:
- Scaling and normalizing data
- Transforming between coordinate systems
- Filtering and smoothing signals
- Extracting features from raw measurements
- Converting between different data representations
Each preprocessor is implemented as a PipelineOp that can be composed with others in a processing pipeline.
- class AFL.double_agent.Preprocessor.ArrayToVars(input_variable: str, output_variables: list, split_dim: list, postfix: str = '', squeeze: bool = False, name: str = 'DatasetToVars')#
Bases:
Preprocessor
Convert an array into multiple variables
- Parameters:
input_variable (str) – The name of the array variable to split into separate variables
output_variables (list) – The names of the variables to create from the array
split_dim (str) – The dimension to split the array along
postfix (str, default='') – String to append to output variable names
squeeze (bool, default=False) – Whether to squeeze out single-dimension axes
name (str, default='DatasetToVars') – The name to use when added to a Pipeline
- calculate(dataset)#
Apply this PipelineOp to the supplied xarray.dataset
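As an illustration, the underlying split can be sketched with plain xarray. The variable and dimension names below are hypothetical, not part of the AFL API:

```python
import numpy as np
import xarray as xr

# Hypothetical array variable with two components stacked along "component"
ds = xr.Dataset(
    {"comps": (("sample", "component"), np.array([[1.0, 2.0], [3.0, 4.0]]))}
)

# Equivalent xarray operation: one new variable per index along the split dimension
for i, var_name in enumerate(["A", "B"]):
    ds[var_name] = ds["comps"].isel(component=i)
```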
- class AFL.double_agent.Preprocessor.BarycentricToTernaryXY(input_variable: str, output_variable: str, sample_dim: str, name: str = 'BarycentricToTernaryXY')#
Bases:
Preprocessor
Transform from ternary coordinates to xy coordinates
Note
Adapted from the barycentric transform in mpltern: yuzie007/mpltern
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
sample_dim (str) – The dimension indexing the samples (points) to transform
name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
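The mapping can be sketched as a weighted sum of triangle-corner positions. The corner placement below is one common convention and may differ from the one mpltern uses:

```python
import numpy as np

# xy positions of the three ternary corners (assumed convention)
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

def bary_to_xy(bary):
    """Normalize each composition row, then take the weighted sum of corners."""
    bary = np.asarray(bary, dtype=float)
    bary = bary / bary.sum(axis=-1, keepdims=True)
    return bary @ corners
```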
- class AFL.double_agent.Preprocessor.Destandardize(input_variable: str, output_variable: str, dim: str, component_dim: str | None = 'component', scale_variable: str | None = None, min_val: Number | None = None, max_val: Number | None = None, name: str = 'Destandardize')#
Bases:
Preprocessor
Invert 0->1 scaling, restoring the original data range (the inverse of Standardize)
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
dim (str) – The dimension used for calculating the data minimum
component_dim (Optional[str], default="component") – The dimension for component-wise operations
scale_variable (Optional[str], default=None) – If specified, the min/max of this data variable in the supplied xarray.Dataset will be used to scale the data rather than min/max of the input_variable or the supplied min_val or max_val
min_val (Optional[Number], default=None) – Value used to scale the data minimum
max_val (Optional[Number], default=None) – Value used to scale the data maximum
name (str, default="Destandardize") – The name to use when added to a Pipeline
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
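A minimal sketch of the inverse scaling, using assumed bounds in place of scale_variable (all names below are hypothetical):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"scaled": ("sample", np.array([0.0, 0.5, 1.0]))})

# Invert 0->1 scaling using known bounds; the op can instead read them
# from scale_variable or from min_val/max_val
min_val, max_val = 10.0, 20.0
ds["raw"] = ds["scaled"] * (max_val - min_val) + min_val
```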
- class AFL.double_agent.Preprocessor.Extrema(input_variable: str, output_variable: str, dim: str, return_coords: bool = False, operator='max', slice: List | None = None, slice_dim: str | None = None, name: str = 'Extrema')#
Bases:
Preprocessor
Find the extrema of a data variable
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
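The underlying reduction can be sketched with plain xarray; variable names are hypothetical:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"I": (("sample", "q"), np.array([[1.0, 5.0, 2.0]]))},
    coords={"q": [0.1, 0.2, 0.3]},
)

# Maximum along "q"; with return_coords=True the op would also report
# the coordinate location of the extremum, sketched here via argmax
max_val = ds["I"].max(dim="q")
max_q = ds["q"][ds["I"].argmax(dim="q")]
```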
- class AFL.double_agent.Preprocessor.Preprocessor(input_variable: str = None, output_variable: str = None, name: str = 'PreprocessorBase')#
Bases:
PipelineOp
Base class stub for all preprocessors
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
- class AFL.double_agent.Preprocessor.SavgolFilter(input_variable: str, output_variable: str, dim: str = 'q', xlo: Number | None = None, xhi: Number | None = None, xlo_isel: int | None = None, xhi_isel: int | None = None, pedestal: Number | None = None, npts: int = 250, derivative: int = 0, window_length: int = 31, polyorder: int = 2, apply_log_scale: bool = True, name: str = 'SavgolFilter')#
Bases:
Preprocessor
Smooth and take derivatives of input data via a Savitzky-Golay filter
This PipelineOp cleans measurement data and takes smoothed derivatives using scipy.signal.savgol_filter. Below is a summary of the steps taken.
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
dim (str) – The dimension in the xarray.Dataset to apply this filter over
xlo (Optional[Number]) – The values of the input dimension (dim, above) to trim the data to
xhi (Optional[Number]) – The values of the input dimension (dim, above) to trim the data to
xlo_isel (Optional[int]) – The integer indices of the input dimension (dim, above) to trim the data to
xhi_isel (Optional[int]) – The integer indices of the input dimension (dim, above) to trim the data to
pedestal (Optional[Number]) – This value is added to the input_variable to establish a fixed data ‘floor’
npts (int) – The size of the grid to interpolate onto
derivative (int) – The order of the derivative to return. If derivative=0, the data is smoothed with no derivative taken.
window_length (int) – The width of the window used in the savgol smoothing. See scipy.signal.savgol_filter for more information.
polyorder (int) – The order of polynomial used in the savgol smoothing. See scipy.signal.savgol_filter for more information.
apply_log_scale (bool) – If True, the input_variable and associated dim coordinated are scaled with numpy.log10
name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()
Notes
This PipelineOp performs the following steps:
1. Data is trimmed to (xlo, xhi) and then (xlo_isel, xhi_isel) in that order. The former trims the data to a numerical coordinate range while the latter trims to integer indices. It is generally not advisable to supply both varieties, and a warning will be raised if this is attempted.
2. If apply_log_scale = True, both the input_variable and dim data will be scaled with numpy.log10. A new xarray dimension and coordinate will be created with the name log_{dim}.
3. All duplicate data (multiple data values at the same dim coordinates) are removed by taking the average of the duplicates.
4. If pedestal is specified, the pedestal value is added to the data and all NaNs are filled with the pedestal.
5. The data is interpolated onto a constant grid with npts values from the trimmed minimum to the trimmed maximum. If apply_log_scale=True, the grid is geometrically rather than linearly spaced.
6. All remaining NaN values are dropped along dim.
7. Finally, scipy.signal.savgol_filter is applied with the window_length, polyorder, and derivative parameters specified in the constructor.
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
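Steps 2 and 7 of the procedure above can be sketched directly with scipy.signal.savgol_filter on synthetic power-law data (all names below are hypothetical):

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic I ~ q^-2 decay with 1% noise on a geometric q grid
rng = np.random.default_rng(0)
q = np.geomspace(1e-3, 1e-1, 250)
I = q**-2.0 * (1 + 0.01 * rng.normal(size=q.size))

# Step 2: log-scale both the data and the coordinate
log_q = np.log10(q)
log_I = np.log10(I)

# Step 7: smooth, and separately take the first derivative d(log I)/d(log q)
dx = log_q[1] - log_q[0]  # geometric q grid -> uniform spacing in log10(q)
smooth = savgol_filter(log_I, window_length=31, polyorder=2)
deriv = savgol_filter(log_I, window_length=31, polyorder=2, deriv=1, delta=dx)
```

For an I ~ q^-2 power law the smoothed derivative sits near -2 across the interior of the grid.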
- class AFL.double_agent.Preprocessor.Standardize(input_variable: str, output_variable: str, dim: str, component_dim: str | None = 'component', scale_variable: str | None = None, min_val: Number | None = None, max_val: Number | None = None, name: str = 'Standardize')#
Bases:
Preprocessor
Standardize the data to have min 0 and max 1
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
dim (str) – The dimension used for calculating the data minimum
component_dim (Optional[str], default="component") – The dimension for component-wise operations
scale_variable (Optional[str], default=None) – If specified, the min/max of this data variable in the supplied xarray.Dataset will be used to scale the data rather than min/max of the input_variable or the supplied min_val or max_val
min_val (Optional[Number], default=None) – Value used to scale the data minimum
max_val (Optional[Number], default=None) – Value used to scale the data maximum
name (str, default="Standardize") – The name to use when added to a Pipeline
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
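The per-component min/max scaling can be sketched with plain xarray; the names below are hypothetical:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"comps": (("sample", "component"), np.array([[1.0, 10.0], [3.0, 30.0]]))}
)

# Scale each component independently to [0, 1] over the "sample" dimension
lo = ds["comps"].min(dim="sample")
hi = ds["comps"].max(dim="sample")
ds["scaled"] = (ds["comps"] - lo) / (hi - lo)
```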
- class AFL.double_agent.Preprocessor.Subtract(input_variable: str, output_variable: str, dim: str, value: float | str, coord_value: bool = True, name: str = 'Subtract')#
Bases:
Preprocessor
Baseline input variable by subtracting a value
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
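Assuming coord_value=True means "subtract the variable's value at the given coordinate along dim" (an interpretation of the signature, not confirmed by the source), the operation can be sketched as:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"I": ("q", np.array([5.0, 3.0, 2.0]))},
    coords={"q": [0.1, 0.2, 0.3]},
)

# Subtract the value of I at q == 0.3 from the whole curve
baseline = ds["I"].sel(q=0.3, drop=True)
ds["I_sub"] = ds["I"] - baseline
```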
- class AFL.double_agent.Preprocessor.SubtractMin(input_variable: str, output_variable: str, dim: str, name: str = 'SubtractMin')#
Bases:
Preprocessor
Baseline input variable by subtracting minimum value
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
- class AFL.double_agent.Preprocessor.SympyTransform(input_variable: str, output_variable: str, sample_dim: str, transforms: Dict[str, object], transform_dim: str, component_dim: str = 'component', name: str = 'SympyTransform')#
Bases:
Preprocessor
Transform data using sympy expressions
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
sample_dim (str) – The sample dimension i.e., the dimension of compositions or grid points
component_dim (str, default="component") – The dimension of the component of each gridpoint
transforms (Dict[str,object]) – A dictionary of transforms (sympy expressions) to evaluate to generate new variables. For this method to function, the transforms must be completely specified except for the names in component_dim of the input_variable
transform_dim (str) – The name of the dimension that the ‘component_dim’ will be transformed to
name (str, default="SympyTransform") – The name to use when added to a Pipeline
Example
```python
from AFL.double_agent import *
import sympy
import xarray as xr

with Pipeline() as p:
    CartesianGrid(
        output_variable='comps',
        grid_spec={
            'A': {'min': 1, 'max': 25, 'steps': 5},
            'B': {'min': 1, 'max': 25, 'steps': 5},
            'C': {'min': 1, 'max': 25, 'steps': 5},
        },
        sample_dim='grid',
    )

    A, B, C = sympy.symbols('A B C')
    vA = A / (A + B + C)
    vB = B / (A + B + C)
    vC = C / (A + B + C)
    SympyTransform(
        input_variable='comps',
        output_variable='trans_comps',
        sample_dim='grid',
        transforms={'vA': vA, 'vB': vB, 'vC': vC},
        transform_dim='trans_component',
    )

p.calculate(xr.Dataset())  # returns dataset with grid and transformed grid
```
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
- class AFL.double_agent.Preprocessor.TernaryXYToBarycentric(input_variable: str, output_variable: str, sample_dim: str, name: str = 'TernaryXYToBarycentric')#
Bases:
Preprocessor
Transform to ternary coordinates from xy coordinates
Note
Adapted from the barycentric transform in mpltern: yuzie007/mpltern
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
sample_dim (str) – The dimension indexing the samples (points) to transform
name (str, default="TernaryXYToBarycentric") – The name to use when added to a Pipeline
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
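The inverse mapping amounts to solving a small linear system per point, using the corner positions plus the constraint that barycentric weights sum to one. The corner placement below is one common convention and may differ from the one mpltern uses:

```python
import numpy as np

# xy positions of the three ternary corners (assumed convention)
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

def xy_to_bary(xy):
    """Solve xy = bary @ corners with the constraint sum(bary) == 1."""
    xy = np.atleast_2d(xy)
    A = np.vstack([corners.T, np.ones(3)])  # 3x3: x row, y row, sum-to-one row
    rhs = np.hstack([xy, np.ones((xy.shape[0], 1))])
    return np.linalg.solve(A, rhs.T).T
```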
- class AFL.double_agent.Preprocessor.VarsToArray(input_variables: List, output_variable: str, variable_dim: str, squeeze: bool = False, variable_mapping: Dict = None, name: str = 'VarsToArray')#
Bases:
Preprocessor
Convert multiple variables into a single array
- Parameters:
input_variables (List) – List of input variables to combine into an array
output_variable (str) – The name of the variable to be inserted into the dataset
variable_dim (str) – The dimension name for the variables in the output array
squeeze (bool, default=False) – Whether to squeeze out single-dimension axes
variable_mapping (Dict, default=None) – Optional mapping to rename variables
name (str, default='VarsToArray') – The name to use when added to a Pipeline
- calculate(dataset)#
Apply this PipelineOp to the supplied xarray.dataset
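The stacking can be sketched with xr.concat; the variable and dimension names below are hypothetical:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"A": ("sample", [1.0, 2.0]), "B": ("sample", [3.0, 4.0])})

# Stack the listed variables along a new "component" dimension,
# labeling each slice with the variable it came from
arr = xr.concat([ds["A"], ds["B"]], dim="component")
arr = arr.assign_coords(component=["A", "B"]).transpose("sample", "component")
```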
- class AFL.double_agent.Preprocessor.Zscale(input_variable: str, output_variable: str, dim: str, name: str = 'Zscale')#
Bases:
Preprocessor
Z-scale the data to have mean 0 and standard deviation 1
- Parameters:
input_variable (str) – The name of the xarray.Dataset data variable to extract from the input dataset
output_variable (str) – The name of the variable to be inserted into the xarray.Dataset by this PipelineOp
dim (str) – The dimension over which to calculate the mean and standard deviation
name (str) – The name to use when added to a Pipeline. This name is used when calling Pipeline.search()
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
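The scaling can be sketched with plain xarray (names hypothetical; this sketch uses xarray's default ddof=0 standard deviation, which may differ from the op's choice):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"y": ("sample", np.array([1.0, 2.0, 3.0]))})

# Subtract the mean and divide by the standard deviation along "sample"
mean = ds["y"].mean(dim="sample")
std = ds["y"].std(dim="sample")
ds["y_z"] = (ds["y"] - mean) / std
```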
- class AFL.double_agent.Preprocessor.ZscaleError(input_variables: str | None | List[str], output_variable: str, dim: str, name: str = 'Zscale_error')#
Bases:
Preprocessor
Scale the y_err data; the first input variable is y, the second is y_err
- Parameters:
input_variables (Union[Optional[str], List[str]]) – The names of the input variables - first is y, second is y_err
output_variable (str) – The name of the variable to be inserted into the dataset
dim (str) – The dimension over which to calculate the standard deviation of y
name (str, default="Zscale_error") – The name to use when added to a Pipeline
- calculate(dataset: Dataset) Self #
Apply this PipelineOp to the supplied xarray.dataset
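Since z-scaling is a linear transform, the matching error propagation divides the uncertainties by y's standard deviation with no mean shift. A sketch under that assumption (names hypothetical, ddof=0 as in xarray's default):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "y": ("sample", np.array([1.0, 2.0, 3.0])),
        "y_err": ("sample", np.array([0.1, 0.2, 0.3])),
    }
)

# Uncertainties scale by 1/std(y); subtracting the mean does not affect them
std = ds["y"].std(dim="sample")
ds["y_err_z"] = ds["y_err"] / std
```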