Using Example Datasets
===================

AFL-agent comes with example datasets that you can use to learn and experiment with the library. These datasets are accessible through the ``AFL.double_agent.datasets`` module.

Loading Example Datasets
-----------------------

You can load the example datasets using the following code:

.. code-block:: python

    from AFL.double_agent.datasets import example_dataset1
    
    # Load the example dataset
    ds = example_dataset1()
    
    # Print information about the dataset
    print(f"Dataset dimensions: {dict(ds.sizes)}")
    print(f"Dataset variables: {list(ds.data_vars)}")
    print(f"Dataset coordinates: {list(ds.coords)}")

Available Datasets
-----------------

Currently, the following datasets are available:

- ``example_dataset1``: A synthetic dataset with compositions, measurements, and ground truth labels.

Listing Available Datasets
-------------------------

You can list all available datasets using the ``list_datasets`` function:

.. code-block:: python

    from AFL.double_agent.datasets import list_datasets
    
    # List all available datasets
    print(list_datasets())

Loading a Dataset by Name
------------------------

You can also load a dataset by name using the ``load_dataset`` function:

.. code-block:: python

    from AFL.double_agent.datasets import load_dataset
    
    # Load a dataset by name
    ds = load_dataset("example_dataset")

Dataset Location
---------------

The example datasets are stored in the ``AFL/double_agent/data`` directory within the package. The datasets module automatically locates and loads these files when you import and use the dataset functions.

Example: Using the Example Dataset with a Pipeline
-------------------------------------------------

Here's an example of how to use the example dataset with a pipeline:

.. code-block:: python

    from AFL.double_agent import Pipeline, SavgolFilter, Similarity, SpectralClustering
    from AFL.double_agent.datasets import example_dataset1
    
    # Load the example dataset
    ds = example_dataset1()
    
    # Create a pipeline
    with Pipeline() as clustering_pipeline:
        SavgolFilter(
            input_variable='measurement', 
            output_variable='derivative', 
            dim='x', 
            derivative=1
        )
        
        Similarity(
            input_variable='derivative', 
            output_variable='similarity', 
            sample_dim='sample',
        )
        
        SpectralClustering(
            input_variable='similarity',
            output_variable='labels',
            n_clusters=2
        )
    
    # Run the pipeline
    result = clustering_pipeline.calculate(ds)
    
    # Compare the predicted labels with the ground truth
    print("Predicted labels:", result.labels.values)
    print("Ground truth labels:", ds.ground_truth_labels.values)