Using Example Datasets#

AFL-agent comes with example datasets that you can use to learn and experiment with the library. These datasets are accessible through the AFL.double_agent.datasets module.

Loading Example Datasets#

You can load the example datasets using the following code:

from AFL.double_agent.datasets import example_dataset1

# Load the example dataset
ds = example_dataset1()

# Print information about the dataset
print(f"Dataset dimensions: {dict(ds.sizes)}")
print(f"Dataset variables: {list(ds.data_vars)}")
print(f"Dataset coordinates: {list(ds.coords)}")

Available Datasets#

Currently, the following datasets are available:

  • example_dataset1: A synthetic dataset with compositions, measurements, and ground truth labels (see the access example below).

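Each of these variables can be accessed directly from the returned xarray.Dataset. Below is a minimal sketch; the exact variable names are assumptions based on the description above, apart from measurement and ground_truth_labels, which also appear in the pipeline example later on this page.

from AFL.double_agent.datasets import example_dataset1

# Load the dataset and pull out individual variables
ds = example_dataset1()

# Variable names below are assumed from the dataset description
measurements = ds['measurement']          # measured signals, one per sample
true_labels = ds['ground_truth_labels']   # ground truth cluster labels

print(measurements.dims)
print(true_labels.values[:5])
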
Listing Available Datasets#

You can list all available datasets using the list_datasets function:

from AFL.double_agent.datasets import list_datasets

# List all available datasets
print(list_datasets())

Loading a Dataset by Name#

You can also load a dataset by name using the load_dataset function:

from AFL.double_agent.datasets import load_dataset

# Load a dataset by name
ds = load_dataset("example_dataset1")
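
Combining list_datasets with load_dataset gives a quick way to inspect everything that ships with the package. This is a minimal sketch; it assumes that every name returned by list_datasets can be passed directly to load_dataset.

from AFL.double_agent.datasets import list_datasets, load_dataset

# Load every bundled dataset and report its dimensions
for name in list_datasets():
    ds = load_dataset(name)
    print(f"{name}: {dict(ds.sizes)}")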

Dataset Location#

The example datasets are stored in the AFL/double_agent/data directory within the package. The datasets module locates and loads these files automatically when you call the dataset functions described above.
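
If you ever need to inspect the raw files yourself, you can resolve the packaged data directory from an installed copy of the library. The sketch below is an assumption based on the layout described above and uses the standard-library importlib.resources module rather than any function exported by the datasets module; the file name in the commented line is hypothetical.

from importlib import resources
import xarray as xr

# Resolve the packaged data directory (assumes the AFL/double_agent/data layout above)
data_dir = resources.files("AFL.double_agent") / "data"

# List the bundled files
for path in data_dir.iterdir():
    print(path.name)

# Opening a file directly is an alternative to the dataset helper functions
# ds = xr.open_dataset(data_dir / "example_dataset1.nc")   # hypothetical file name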

Example: Using the Example Dataset with a Pipeline#

Here’s an example of how to use the example dataset with a pipeline:

from AFL.double_agent import Pipeline, SavgolFilter, Similarity, SpectralClustering
from AFL.double_agent.datasets import example_dataset1

# Load the example dataset
ds = example_dataset1()

# Create a pipeline
with Pipeline() as clustering_pipeline:
    SavgolFilter(
        input_variable='measurement',
        output_variable='derivative',
        dim='x',
        derivative=1
    )

    Similarity(
        input_variable='derivative',
        output_variable='similarity',
        sample_dim='sample',
    )

    SpectralClustering(
        input_variable='similarity',
        output_variable='labels',
        n_clusters=2
    )

# Run the pipeline
result = clustering_pipeline.calculate(ds)

# Compare the predicted labels with the ground truth
print("Predicted labels:", result.labels.values)
print("Ground truth labels:", ds.ground_truth_labels.values)