Do Model Selection with AutoSAS#

Google Colab Setup#

Only uncomment and run the next cell if you are running this notebook in Google Colab or if you don’t already have the AFL-agent package installed.

[ ]:

# !pip install git+https://github.com/usnistgov/AFL-agent.git

Defining the Fit#

The first step in AutoSAS is to define the models we’d like to fit. Since our dataset contains three kinds of data, we’ll define three models to fit it. In real experiments, you often don’t know how many structures you will encounter, so you can define as many models as you’d like in the model_inputs list.

[20]:

model_inputs = [
    {
        "name": "surface_fractal", # your name for the model, can be anything
        "sasmodel": "power_law", # the name of the sasmodel in the sasmodels library
        'q_min':0.001,
        'q_max':1.0,
        "fit_params": {
            "power": {"value": 4, "bounds": (3, 4)},
            "scale": {"value": 1.0, "bounds": (1e-6,1e-3)},
            "background": {"value": 1.0},
        },
    },
    {
        "name": "mass_fractal",
        "sasmodel": "power_law",
        'q_min':0.001,
        'q_max':1.0,
        "fit_params": {
            "power": {"value": 4, "bounds": (1.7, 3)},
            "scale": {"value": 1.0, "bounds": (1e-4,1e-1)},
            "background": {"value": 1.0},
        },
    },
    {
        "name": "polymer",
        "sasmodel": "polymer_excl_volume",
        'q_min':0.001,
        'q_max':1.0,
        "fit_params": {
            "scale": {"value": 1.0, "bounds": (1e-2,1e2)},
            "rg": {"value": 60.0, "bounds": (10,150)},
            "background": {"value": 1.0},
        },
    }
]
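Before running a fit, it can help to confirm that each starting value lies inside its bounds. In the listing above, for example, mass_fractal starts power at 4 while its bounds are (1.7, 3). The helper below is a minimal sketch and not part of AFL-agent: the function name find_out_of_bounds and the example_inputs list are assumptions made for illustration, and it relies only on the dictionary layout shown above. How AutoSAS or its optimizer handles a starting value outside the bounds is not covered here.

```python
def find_out_of_bounds(model_inputs):
    """Return (model name, parameter, value, bounds) for each starting value outside its bounds."""
    problems = []
    for model in model_inputs:
        for pname, spec in model.get("fit_params", {}).items():
            bounds = spec.get("bounds")
            if bounds is None:
                continue  # no bounds were given for this parameter
            low, high = bounds
            if not (low <= spec["value"] <= high):
                problems.append((model["name"], pname, spec["value"], bounds))
    return problems


# Hypothetical input that mirrors one entry of the listing above
example_inputs = [
    {
        "name": "mass_fractal",
        "sasmodel": "power_law",
        "fit_params": {
            "power": {"value": 4, "bounds": (1.7, 3)},
            "background": {"value": 1.0},
        },
    },
]

problems = find_out_of_bounds(example_inputs)
for name, pname, value, bounds in problems:
    print(f"{name}: {pname}={value} is outside bounds {bounds}")
```

Passing the full model_inputs list to the helper would report every parameter whose starting value falls outside its bounds.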

Evaluating the Fit Results#

Now we must evaluate the quality of the fits. If the parameter bounds in model_inputs are too restrictive, the optimizer may be unable to reach the best fit.
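One symptom of overly tight bounds is a fitted parameter that ends up sitting on one of its limits. The sketch below flags such parameters. It is an illustration under stated assumptions: the function pinned_parameters, the fitted dictionary, and its values are hypothetical, and the way fitted parameter values are stored in the AutoSAS output is not shown in this section.

```python
def pinned_parameters(fitted, bounds, rel_tol=1e-3):
    """List parameters whose fitted value lies within rel_tol of a bound (relative to the bound width)."""
    pinned = []
    for name, value in fitted.items():
        if name not in bounds:
            continue  # parameter was not bounded
        low, high = bounds[name]
        span = high - low
        if abs(value - low) <= rel_tol * span:
            pinned.append((name, "lower"))
        elif abs(value - high) <= rel_tol * span:
            pinned.append((name, "upper"))
    return pinned


# Hypothetical fitted values, paired with the mass_fractal bounds defined earlier
fitted = {"power": 3.0, "scale": 2.4e-3}
bounds = {"power": (1.7, 3), "scale": (1e-4, 1e-1)}

pinned = pinned_parameters(fitted, bounds)
print(pinned)
```

A parameter reported here is a candidate for wider bounds, or a hint that the model does not describe that measurement.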

We’ll plot all three fits to a given measurement (selected by data_index) below, along with the residuals of each fit. Vary data_index to assess the quality of the models for different measurements.

[42]:

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(5, 6), height_ratios=[4, 1], sharex=True)

data_index = 0

# Top plot - data and fits
ds_result.isel(sample=data_index).I.plot.line(x='q', xscale='log', yscale='log', marker='.', ls='None', label='data', ax=ax1)
ds_result.isel(sample=data_index).fit_I_surface_fractal.plot.line(x='fit_q_surface_fractal', xscale='log', yscale='log', label='surface_fractal', ax=ax1)
ds_result.isel(sample=data_index).fit_I_mass_fractal.plot.line(x='fit_q_mass_fractal', xscale='log', yscale='log', label='mass_fractal', ax=ax1)
ds_result.isel(sample=data_index).fit_I_polymer.plot.line(x='fit_q_polymer', xscale='log', yscale='log', label='polymer', ax=ax1)
ax1.legend()
ax1.set(xlabel=None,ylabel='Intensity [A.U.]')

ax1.get_lines()[0].set_color('C0')  # Keep data points as first color


# Bottom plot - residuals
ds_result.isel(sample=data_index).residuals_surface_fractal.plot(x='fit_q_surface_fractal', xscale='log',  ax=ax2)
ds_result.isel(sample=data_index).residuals_mass_fractal.plot(x='fit_q_mass_fractal', xscale='log',  ax=ax2)
ds_result.isel(sample=data_index).residuals_polymer.plot(x='fit_q_polymer', xscale='log',  ax=ax2)
ax2.axhline(y=0, color='k', linestyle='--', alpha=0.5)
# The three residual curves are lines 0-2 (line 3 is the dashed zero line).
ax2.get_lines()[0].set_color('C1')  # surface_fractal, same color as in the top plot
ax2.get_lines()[1].set_color('C2')  # mass_fractal
ax2.get_lines()[2].set_color('C3')  # polymer
ax2.set(xlabel=r'q [$\AA^{-1}$]',ylabel='Residual')  # raw string avoids an invalid '\A' escape

plt.tight_layout()

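[Figure: measured intensity with the surface_fractal, mass_fractal, and polymer fits (top panel) and the residuals of each fit (bottom panel) for the selected sample.]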

The residuals (differences between the model and data) provide a key way to assess the quality of the fits. A good fit should show residuals that:

Are randomly scattered around zero
Have no clear systematic trends or patterns
Are roughly within ±2-3 standard deviations of zero
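These criteria can also be checked numerically. The sketch below assumes the residuals are already expressed in units of the measurement uncertainty (standard deviations); whether the residuals_* variables in ds_result are normalized this way is an assumption. The function summarize_residuals and the synthetic arrays are illustrative and not part of AFL-agent.

```python
import numpy as np


def summarize_residuals(residuals, n_sigma=3.0):
    """Summarize normalized residuals: mean, fraction within n_sigma, and lag-1 autocorrelation."""
    r = np.asarray(residuals, dtype=float)
    r = r[np.isfinite(r)]  # drop NaN/inf entries
    return {
        "mean": float(r.mean()),  # should be near zero
        "fraction_within": float(np.mean(np.abs(r) <= n_sigma)),  # should be near one
        "lag1_autocorr": float(np.corrcoef(r[:-1], r[1:])[0, 1]),  # near zero if no trend
    }


# Synthetic examples: pure noise versus noise with a systematic trend
rng = np.random.default_rng(0)
good = rng.normal(size=200)
trended = good + np.linspace(-5, 5, 200)

good_summary = summarize_residuals(good)
trended_summary = summarize_residuals(trended)
print(good_summary)
print(trended_summary)
```

A lag-1 autocorrelation far from zero indicates that neighboring residuals move together, which is the systematic pattern the second criterion warns about.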

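The bottom panel of the figure above shows these residuals for the first sample (data_index = 0); compare the surface_fractal curve against the criteria listed here to assess the quality of that fit.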

Conclusion#

In this example, we demonstrated a complete workflow for fitting multiple SAS models to data and automatically selecting the best model based on chi-squared values. The example used simulated data that was generated to be well separated between model types, which makes fitting and model selection relatively straightforward. Even so, it illustrates the key components and capabilities of the AutoSAS pipeline:

Fitting multiple models to the same dataset
Comparing fit quality across models
Automatically selecting the best model for each sample
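The idea of chi-squared-based selection can be sketched in a few lines. This is an illustration and not the AutoSAS implementation: the function reduced_chi_squared, the synthetic data, and the fixed (unfitted) candidate curves are all assumptions made for the example.

```python
import numpy as np


def reduced_chi_squared(data, model, sigma, n_params):
    """Chi-squared per degree of freedom for one model curve."""
    data, model, sigma = (np.asarray(a, dtype=float) for a in (data, model, sigma))
    dof = data.size - n_params
    return float(np.sum(((data - model) / sigma) ** 2) / dof)


# Synthetic measurement: power law plus background with 5% noise
rng = np.random.default_rng(1)
q = np.logspace(-3, 0, 100)
truth = 1e-4 * q ** -3.5 + 1.0
sigma = 0.05 * truth
data = truth + rng.normal(scale=sigma)

# Candidate curves are fixed here for illustration; a real workflow would fit them first.
# Each entry is (model curve, number of free parameters).
candidates = {
    "surface_fractal": (1e-4 * q ** -3.5 + 1.0, 3),
    "mass_fractal": (1e-4 * q ** -2.5 + 1.0, 3),
}

chisq = {
    name: reduced_chi_squared(data, curve, sigma, n_params)
    for name, (curve, n_params) in candidates.items()
}
best = min(chisq, key=chisq.get)  # smallest reduced chi-squared wins
print(chisq, best)
```

A reduced chi-squared near one suggests the model describes the data to within the stated uncertainties, while a much larger value points to a poor model or underestimated uncertainties.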

With real experimental data, the fitting and model selection process may be more challenging due to:

Noise and experimental uncertainties
Samples that could be described by multiple models
More complex scattering patterns requiring more sophisticated models

However, the workflow demonstrated here provides a foundation for approaching these more complex cases in a systematic way.
