AFL.double_agent.Labeler module
Phase labeling tools for clustering and classification of materials data.
This module provides classes for automatically identifying and labeling phases in materials science data. It includes several clustering algorithms and methods for determining the optimal number of phases using silhouette analysis.
Key features:
- Multiple clustering algorithms (Spectral, GMM, Affinity Propagation)
- Automatic determination of the number of phases via silhouette analysis
- Support for precomputed similarity/distance matrices
- Integration with scikit-learn clustering algorithms
- class AFL.double_agent.Labeler.AffinityPropagation(input_variable, output_variable, dim, params=None, name='AffinityPropagation')
Bases:
Labeler
Affinity Propagation for phase identification.
Uses Affinity Propagation clustering to identify phases. This method automatically determines the number of phases based on the data structure and similarity matrix.
- Parameters:
input_variable (str) – The name of the variable containing the similarity matrix
output_variable (str) – The name of the variable where labels will be stored
dim (str) – The dimension name for samples in the dataset
params (dict, optional) – Additional parameters for Affinity Propagation. Defaults include:
- damping: 0.75
- max_iter: 5000
- convergence_iter: 250
- affinity: 'precomputed'
name (str, default='AffinityPropagation') – The name to use when added to a Pipeline
- calculate(dataset)
Apply Affinity Propagation clustering to the dataset.
- Parameters:
dataset (xr.Dataset) – The input dataset containing the similarity matrix
- Returns:
The AffinityPropagation instance with computed labels
- Return type:
Self
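The following is a minimal sketch of the underlying technique using scikit-learn directly; it is not AFL's implementation. It passes the default parameters listed above to `sklearn.cluster.AffinityPropagation` with a precomputed similarity matrix. The synthetic data, the choice of negative squared Euclidean distance as the similarity, and `random_state=0` are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Two well-separated synthetic "phases" in a 2-D feature space (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(10.0, 10.0), scale=0.3, size=(20, 2)),
])

# Similarity matrix: negative squared Euclidean distance (an assumed choice;
# any matrix where larger values mean "more similar" can be used).
diff = X[:, None, :] - X[None, :, :]
S = -np.sum(diff ** 2, axis=-1)

# Defaults taken from the parameter description of this class.
params = dict(damping=0.75, max_iter=5000, convergence_iter=250, affinity="precomputed")
model = AffinityPropagation(random_state=0, **params)
labels = model.fit_predict(S)

# The number of phases is an output of the algorithm, not an input.
n_phases = len(np.unique(labels))
print(n_phases, labels)
```

Because no cluster count is supplied, the number of phases found depends on the similarity values and on the `preference` setting, which scikit-learn sets to the median similarity when it is not given.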
- class AFL.double_agent.Labeler.GaussianMixtureModel(input_variable, output_variable, dim, params=None, name='GaussianMixtureModel')
Bases:
Labeler
Gaussian Mixture Model for phase identification.
Uses a Gaussian Mixture Model to identify phases based on their distribution in feature space. Particularly useful when phases are expected to have Gaussian distributions.
- Parameters:
input_variable (str) – The name of the variable containing the feature data
output_variable (str) – The name of the variable where labels will be stored
dim (str) – The dimension name for samples in the dataset
params (dict, optional) – Additional parameters for the GMM
name (str, default='GaussianMixtureModel') – The name to use when added to a Pipeline
- calculate(dataset)
Apply GMM clustering to the dataset.
- Parameters:
dataset (xr.Dataset) – The input dataset containing the feature data
- Returns:
The GaussianMixtureModel instance with computed labels
- Return type:
Self
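As a rough illustration of GMM-based labeling, the sketch below uses `sklearn.mixture.GaussianMixture` directly rather than the AFL class. The synthetic feature data, the fixed `n_components=2`, and `random_state=0` are assumptions for the example; how AFL forwards `params` to the underlying model is not shown here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic feature data: two Gaussian-distributed "phases" (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(20, 2)),
])

# Fit a two-component mixture and assign each sample to its most likely component.
gmm = GaussianMixture(n_components=2, random_state=0)
labels = gmm.fit_predict(X)

# Per-sample membership probabilities are also available, which can help
# flag samples that sit near a phase boundary.
probabilities = gmm.predict_proba(X)
print(labels)
```

Unlike the similarity-based labelers, this approach operates on feature vectors rather than on a precomputed matrix.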
- class AFL.double_agent.Labeler.Labeler(input_variable, output_variable, dim='sample', use_silhouette=False, params=None, name='PhaseLabeler')
Bases:
PipelineOp
Base class for phase labeling operations.
This abstract base class provides common functionality for labeling phases in materials data. It supports various clustering approaches and includes methods for label manipulation and silhouette analysis.
- Parameters:
input_variable (str) – The name of the variable containing the data to be labeled
output_variable (str) – The name of the variable where labels will be stored
dim (str, default='sample') – The dimension name for samples in the dataset
use_silhouette (bool, default=False) – Whether to use silhouette analysis to determine the optimal number of phases
params (dict, optional) – Additional parameters for the labeling algorithm
name (str, default='PhaseLabeler') – The name to use when added to a Pipeline
- remap_labels_by_count()
Remap phase labels to be ordered by frequency.
Reorders labels so that the most common phase is labeled 0, second most common is 1, etc.
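The reordering can be sketched in plain Python as follows. This is a standalone illustration of the idea, not the method's actual code: the function name `remap_by_count` is hypothetical, and breaking ties by the smaller original label is an assumption, since the documentation does not say how ties are handled.

```python
from collections import Counter


def remap_by_count(labels):
    """Return labels renumbered so that 0 is the most frequent label."""
    counts = Counter(labels)
    # Sort by descending count; ties are broken by the original label value
    # (an assumed rule, chosen only to make the result deterministic).
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    mapping = {old: new for new, old in enumerate(ordered)}
    return [mapping[lab] for lab in labels]


# Label 2 occurs three times, label 1 twice, label 0 once.
remapped = remap_by_count([2, 2, 2, 0, 1, 1])
print(remapped)  # [0, 0, 0, 2, 1, 1]
```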
- silhouette(W)
Perform silhouette analysis to determine the optimal number of phases.
Uses silhouette scores to automatically determine the best number of phases by trying different numbers of clusters and analyzing the clustering quality.
- Parameters:
W (array-like) – The similarity/distance matrix to use for clustering
Notes
This method modifies the instance’s n_phases and labels attributes based on the silhouette analysis results.
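A minimal sketch of silhouette-based model selection, assuming scikit-learn and a similarity matrix W with values in (0, 1]. It is not the method's actual implementation: the candidate range of 2 to 6 clusters, the use of spectral clustering for each trial, and the conversion of similarity to distance as 1 - W are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score
from sklearn.metrics.pairwise import rbf_kernel

# Three synthetic "phases" (illustrative data).
rng = np.random.default_rng(0)
centers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in centers])

# Similarity matrix W in (0, 1], and a matching distance matrix D.
W = rbf_kernel(X, gamma=0.1)
D = 1.0 - W
np.fill_diagonal(D, 0.0)  # precomputed distances need an exactly zero diagonal

# Cluster with each candidate count and score the result.
scores = {}
for k in range(2, 7):
    trial = SpectralClustering(
        n_clusters=k, affinity="precomputed", random_state=0
    ).fit_predict(W)
    scores[k] = silhouette_score(D, trial, metric="precomputed")

# Keep the cluster count with the highest mean silhouette score.
n_phases = max(scores, key=scores.get)
labels = SpectralClustering(
    n_clusters=n_phases, affinity="precomputed", random_state=0
).fit_predict(W)
print(n_phases, scores)
```

The silhouette score is only defined for two or more clusters, so a scan of this kind cannot by itself conclude that the data contain a single phase.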
- class AFL.double_agent.Labeler.SpectralClustering(input_variable, output_variable, dim, params=None, name='SpectralClustering', use_silhouette=False)
Bases:
Labeler
Spectral clustering for phase identification.
Uses spectral clustering to identify phases based on a similarity matrix. Particularly effective for non-spherical clusters and when working with similarity rather than distance metrics.
- Parameters:
input_variable (str) – The name of the variable containing the similarity matrix
output_variable (str) – The name of the variable where labels will be stored
dim (str) – The dimension name for samples in the dataset
params (dict, optional) – Additional parameters for spectral clustering
name (str, default='SpectralClustering') – The name to use when added to a Pipeline
use_silhouette (bool, default=False) – Whether to use silhouette analysis to determine the optimal number of phases
- calculate(dataset)
Apply spectral clustering to the dataset.
- Parameters:
dataset (xr.Dataset) – The input dataset containing the similarity matrix
- Returns:
The SpectralClustering instance with computed labels
- Return type:
Self
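For orientation, the sketch below shows spectral clustering on a precomputed similarity matrix using `sklearn.cluster.SpectralClustering` directly; it does not use the AFL class. The synthetic data, the RBF kernel with `gamma=0.1` as the similarity measure, and the fixed `n_clusters=2` are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

# Two synthetic "phases" (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(4.0, 4.0), scale=0.3, size=(20, 2)),
])

# Pairwise similarity matrix: values near 1 for similar samples, near 0 otherwise.
W = rbf_kernel(X, gamma=0.1)

# affinity="precomputed" tells scikit-learn that W is already a similarity
# matrix, so no kernel is applied internally.
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(W)
print(labels)
```

The matrix passed in must be a similarity (larger means more alike), not a distance; a distance matrix would need to be converted first.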