AFL.double_agent.Labeler module

Phase labeling tools for clustering and classification of materials data.

This module provides classes for automatically identifying and labeling phases in materials science data. It includes several clustering algorithms, along with a method for determining the optimal number of phases using silhouette analysis.

Key features:
  • Multiple clustering algorithms (Spectral, GMM, Affinity Propagation)

  • Automatic determination of the number of phases via silhouette analysis

  • Support for precomputed similarity/distance matrices

  • Integration with scikit-learn clustering algorithms

class AFL.double_agent.Labeler.AffinityPropagation(input_variable, output_variable, dim, params=None, name='AffinityPropagation')

Bases: Labeler

Affinity Propagation for phase identification.

Uses Affinity Propagation clustering to identify phases. This method automatically determines the number of phases based on the data structure and similarity matrix.

Parameters:
  • input_variable (str) – The name of the variable containing the similarity matrix

  • output_variable (str) – The name of the variable where labels will be stored

  • dim (str) – The dimension name for samples in the dataset

  • params (dict, optional) – Additional parameters for Affinity Propagation. Defaults include:
      - damping: 0.75
      - max_iter: 5000
      - convergence_iter: 250
      - affinity: 'precomputed'

  • name (str, default='AffinityPropagation') – The name to use when added to a Pipeline

calculate(dataset)

Apply Affinity Propagation clustering to the dataset.

Parameters:

dataset (xr.Dataset) – The input dataset containing the similarity matrix

Returns:

The AffinityPropagation instance with computed labels

Return type:

Self
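
The defaults listed above match keyword arguments of scikit-learn's AffinityPropagation estimator, so the underlying clustering step can be sketched directly with scikit-learn. This is a minimal illustration, not the class's actual implementation: the toy data, the negative squared distance used as the similarity, and the random_state value are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical toy data: two tight, well-separated groups of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(10, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(10, 2)),
])

# Assumed similarity: negative squared Euclidean distance between samples.
diff = X[:, None, :] - X[None, :, :]
S = -np.sum(diff ** 2, axis=-1)

# Parameter values taken from the documented defaults.
model = AffinityPropagation(
    damping=0.75,
    max_iter=5000,
    convergence_iter=250,
    affinity="precomputed",
    random_state=0,  # assumption, only for reproducibility
)
labels = model.fit_predict(S)

# The number of phases is not specified up front; it falls out of the fit.
n_phases = len(np.unique(labels))
print(n_phases, labels)
```

Because no cluster count is passed in, the number of distinct labels depends on the similarity values and on scikit-learn's default preference (the median of the input similarities).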

class AFL.double_agent.Labeler.GaussianMixtureModel(input_variable, output_variable, dim, params=None, name='GaussianMixtureModel')

Bases: Labeler

Gaussian Mixture Model for phase identification.

Uses a Gaussian Mixture Model to identify phases based on their distribution in feature space. Particularly useful when phases are expected to have Gaussian distributions.

Parameters:
  • input_variable (str) – The name of the variable containing the feature data

  • output_variable (str) – The name of the variable where labels will be stored

  • dim (str) – The dimension name for samples in the dataset

  • params (dict, optional) – Additional parameters for the GMM

  • name (str, default='GaussianMixtureModel') – The name to use when added to a Pipeline

calculate(dataset)

Apply GMM clustering to the dataset.

Parameters:

dataset (xr.Dataset) – The input dataset containing the feature data

Returns:

The GaussianMixtureModel instance with computed labels

Return type:

Self
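
As a rough illustration of the kind of clustering this class wraps, the sketch below fits scikit-learn's GaussianMixture to feature data and reads off hard labels. It is not the class's own code: the toy features, the choice of n_components=2, and the random_state value are assumptions, and the documentation above does not say which keys params is expected to carry.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical feature data: 30 samples near (0, 0) and 30 near (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.2, size=(30, 2)),
    rng.normal(loc=3.0, scale=0.2, size=(30, 2)),
])

# Assumed settings: two components, fixed seed for reproducibility.
gmm = GaussianMixture(n_components=2, random_state=0)

# fit_predict assigns each sample to its most probable component.
labels = gmm.fit_predict(X)
print(labels)
```

Unlike the similarity-based labelers, this operates on the feature matrix itself (samples by features) rather than on a precomputed sample-by-sample matrix.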

class AFL.double_agent.Labeler.Labeler(input_variable, output_variable, dim='sample', use_silhouette=False, params=None, name='PhaseLabeler')

Bases: PipelineOp

Base class for phase labeling operations.

This abstract base class provides common functionality for labeling phases in materials data. It supports various clustering approaches and includes methods for label manipulation and silhouette analysis.

Parameters:
  • input_variable (str) – The name of the variable containing the data to be labeled

  • output_variable (str) – The name of the variable where labels will be stored

  • dim (str, default='sample') – The dimension name for samples in the dataset

  • use_silhouette (bool, default=False) – Whether to use silhouette analysis to determine the optimal number of phases

  • params (dict, optional) – Additional parameters for the labeling algorithm

  • name (str, default='PhaseLabeler') – The name to use when added to a Pipeline

remap_labels_by_count()

Remap phase labels to be ordered by frequency.

Reorders labels so that the most common phase is labeled 0, second most common is 1, etc.
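
The reordering described above can be sketched as a standalone function. The helper name remap_by_count is hypothetical, and the real method takes no arguments and presumably works on the instance's stored labels. How ties between equally common phases are broken is also an assumption here.

```python
from collections import Counter

def remap_by_count(labels):
    """Renumber labels so 0 is the most frequent, 1 the next, and so on."""
    counts = Counter(labels)
    # most_common() sorts by descending count; equal counts keep
    # first-seen order (an assumption about tie handling, not from the docs).
    mapping = {old: new for new, (old, _) in enumerate(counts.most_common())}
    return [mapping[label] for label in labels]

# Label 1 appears three times, label 2 twice, label 0 once.
labels = [2, 2, 0, 1, 1, 1]
remapped = remap_by_count(labels)
print(remapped)  # [1, 1, 2, 0, 0, 0]
```

Ordering labels by frequency gives them a stable meaning across runs, since the integer IDs a clustering algorithm assigns are otherwise arbitrary.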

silhouette(W)

Perform silhouette analysis to determine optimal number of phases.

Uses silhouette scores to automatically determine the best number of phases by trying different numbers of clusters and analyzing the clustering quality.

Parameters:

W (array-like) – The similarity/distance matrix to use for clustering

Notes

This method modifies the instance’s n_phases and labels attributes based on the silhouette analysis results.
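
One common way to carry out this kind of analysis is to cluster with several candidate cluster counts and keep the count with the highest mean silhouette score. The sketch below does that with scikit-learn. It is an illustration under stated assumptions, not the method's actual code: the candidate range 2 to 6, the use of SpectralClustering as the clusterer, the Gaussian kernel, and the separate distance matrix D used for scoring are all choices made for the example.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

# Hypothetical toy data: three well-separated groups of 15 samples each.
rng = np.random.default_rng(0)
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
X = np.vstack([rng.normal(loc=c, scale=0.2, size=(15, 2)) for c in centers])

# Pairwise distances D, and an assumed Gaussian-kernel similarity W.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = np.exp(-D ** 2 / 2.0)

scores = {}
for k in range(2, 7):  # assumed candidate range
    clusterer = SpectralClustering(
        n_clusters=k, affinity="precomputed", random_state=0
    )
    candidate = clusterer.fit_predict(W)
    # Silhouette scoring expects distances, so D is passed rather than W.
    scores[k] = silhouette_score(D, candidate, metric="precomputed")

# Keep the cluster count with the highest mean silhouette score.
n_phases = max(scores, key=scores.get)
print(n_phases, scores)
```

Note that clustering uses a similarity matrix while silhouette scoring uses a distance matrix. The parameter description above mentions both, so check which one W is expected to be before relying on this sketch.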

class AFL.double_agent.Labeler.SpectralClustering(input_variable, output_variable, dim, params=None, name='SpectralClustering', use_silhouette=False)

Bases: Labeler

Spectral clustering for phase identification.

Uses spectral clustering to identify phases based on a similarity matrix. Particularly effective for non-spherical clusters and when working with similarity rather than distance metrics.

Parameters:
  • input_variable (str) – The name of the variable containing the similarity matrix

  • output_variable (str) – The name of the variable where labels will be stored

  • dim (str) – The dimension name for samples in the dataset

  • params (dict, optional) – Additional parameters for spectral clustering

  • name (str, default='SpectralClustering') – The name to use when added to a Pipeline

  • use_silhouette (bool, default=False) – Whether to use silhouette analysis to determine the optimal number of phases

calculate(dataset)

Apply spectral clustering to the dataset.

Parameters:

dataset (xr.Dataset) – The input dataset containing the similarity matrix

Returns:

The SpectralClustering instance with computed labels

Return type:

Self
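
For orientation, clustering on a precomputed similarity matrix looks roughly like the following when done directly with scikit-learn's SpectralClustering. This is a sketch, not the class's implementation: the toy data, the Gaussian kernel used to build W, n_clusters=2, and random_state are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical toy data: 20 samples near (0, 0) and 20 near (4, 4).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.2, size=(20, 2)),
    rng.normal(loc=4.0, scale=0.2, size=(20, 2)),
])

# Assumed similarity: Gaussian kernel on squared Euclidean distance.
sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dist / 2.0)

# affinity="precomputed" tells scikit-learn that W already holds similarities.
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(W)
print(labels)
```

The matrix passed with affinity="precomputed" must hold similarities (larger means more alike), not distances. A distance matrix would need to be converted first, for example with a kernel like the one above.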