Using Example Datasets#
AFL-agent comes with example datasets that you can use to learn and experiment with the library. These datasets are accessible through the `AFL.double_agent.datasets` module.
Loading Example Datasets#
You can load the example datasets using the following code:
```python
from AFL.double_agent.datasets import example_dataset1

# Load the example dataset
ds = example_dataset1()

# Print information about the dataset
print(f"Dataset dimensions: {dict(ds.sizes)}")
print(f"Dataset variables: {list(ds.data_vars)}")
print(f"Dataset coordinates: {list(ds.coords)}")
```
Available Datasets#
Currently, the following datasets are available:
example_dataset1
: A synthetic dataset with compositions, measurements, and ground truth labels (see the sketch below for one way to inspect these variables).
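The following is a minimal sketch for taking a quick look at two of these variables. It assumes the variable names `measurement` and `ground_truth_labels` used in the pipeline example later on this page; the name of the composition variable is not listed here, so check `ds.data_vars` to find it.

```python
from AFL.double_agent.datasets import example_dataset1

ds = example_dataset1()

# 'measurement' and 'ground_truth_labels' are the names used in the pipeline
# example below; list ds.data_vars to find the composition variable.
print(ds['measurement'])
print(ds['ground_truth_labels'])
```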
Listing Available Datasets#
You can list all available datasets using the `list_datasets` function:
```python
from AFL.double_agent.datasets import list_datasets

# List all available datasets
print(list_datasets())
```
Loading a Dataset by Name#
You can also load a dataset by name using the `load_dataset` function:
```python
from AFL.double_agent.datasets import load_dataset

# Load a dataset by name
ds = load_dataset("example_dataset1")
```
Dataset Location#
The example datasets are stored in the `AFL/double_agent/data` directory within the package. The datasets module automatically locates and loads these files when you import and use the dataset functions.
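If you want to see the bundled files on disk, the sketch below resolves that directory with the standard-library `importlib.resources` module. It assumes the package layout described above and is not part of the AFL-agent API.

```python
from importlib import resources

# Resolve the data directory inside the installed package (assumed layout)
data_dir = resources.files("AFL.double_agent") / "data"

# List the bundled dataset files
print([entry.name for entry in data_dir.iterdir()])
```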
Example: Using the Example Dataset with a Pipeline#
Here’s an example of how to use the example dataset with a pipeline:
```python
from AFL.double_agent import Pipeline, SavgolFilter, Similarity, SpectralClustering
from AFL.double_agent.datasets import example_dataset1

# Load the example dataset
ds = example_dataset1()

# Create a pipeline
with Pipeline() as clustering_pipeline:
    SavgolFilter(
        input_variable='measurement',
        output_variable='derivative',
        dim='x',
        derivative=1
    )

    Similarity(
        input_variable='derivative',
        output_variable='similarity',
        sample_dim='sample',
    )

    SpectralClustering(
        input_variable='similarity',
        output_variable='labels',
        n_clusters=2
    )

# Run the pipeline
result = clustering_pipeline.calculate(ds)

# Compare the predicted labels with the ground truth
print("Predicted labels:", result.labels.values)
print("Ground truth labels:", ds.ground_truth_labels.values)
```