The interlab Project class
An interlab analysis is logically divided into interlaboratory comparison projects. A project is represented by a Project object, which contains the following items:
 Sample labels that represent the physical objects that have been distributed for measurement
 Dataset labels that identify the origin of each set of measurement results
 Experimental spectral data containing the measurement results on the objects that have been analyzed
 Interspectral distance functions that will be used to calculate the spread of the data and identify outliers
 A distribution function that will be used to estimate outliers. By default, this is a lognormal distribution, but it can be any distribution function supported by scipy.stats.rv_continuous.
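As a rough illustration of what "any distribution function supported by scipy.stats.rv_continuous" can mean in practice, the sketch below fits two SciPy distributions to synthetic positive values. The data, the variable names, and the choice of a gamma distribution as the alternative are assumptions made for the example; how Project uses the fitted distribution internally is not shown here.

```python
import numpy as np
from scipy import stats

# Synthetic, strictly positive "distances" (illustrative only).
rng = np.random.default_rng(0)
distances = rng.lognormal(mean=0.0, sigma=0.5, size=200)

# Both candidates are instances of scipy.stats.rv_continuous.
candidates = {"lognorm": stats.lognorm, "gamma": stats.gamma}

fitted = {}
for name, dist in candidates.items():
    assert isinstance(dist, stats.rv_continuous)
    # Fix the location at zero because distances cannot be negative.
    params = dist.fit(distances, floc=0)
    fitted[name] = dist(*params)  # frozen distribution with fitted parameters

# Upper-tail probability of a large distance under each fitted distribution.
tail = {name: frozen.sf(5.0) for name, frozen in fitted.items()}
```

Comparing the tail probabilities gives a feel for how strongly the choice of distribution can affect which measurements look extreme.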
The intent is that the user will interact primarily with the Project object, loading it with the information needed to conduct the analysis and then using the built-in methods. Documentation for the ExperimentGroup and InterlabArray objects is included separately.
Creating Projects
A blank Project can be instantiated with no arguments:
my_project = interlab.Project()
This creates a Project with no data or metadata of any kind. This information can be loaded later, or it can be provided when the object is initialized:
my_project = interlab.Project(x_data_list=xdata,
                              Sample_names=Sample_names,
                              Data_set_names=Data_set_names_dict,
                              distance_metrics=distance_metric_list,
                              data=data_dict,
                              rawdata=rawdata_dict,
                              )
When the Project object is created, it will automatically create ExperimentGroup, DistanceMetric, and Population objects for the experiment groups and distance metrics that have been assigned.
Defining distance metrics
The distance metrics are defined by a list of dictionaries. Each dictionary must contain the name of the metric as a text string (key metric) and the function used to compute the metric (key function). The function must be either a callable that accepts two inputs or a string that is recognized by scipy.spatial.distance.pdist(). The following are two examples:
jeffries = r'Symmetric Kullback-Leibler'
mahalanobis = r'Mahalanobis'
nmr_distance_metrics = [
    # 'mahalanobis' is recognized by pdist()
    dict(metric=mahalanobis, function='mahalanobis'),
    # interlab.jeffries is a distance included in this package
    dict(metric=jeffries, function=interlab.jeffries),
]
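To illustrate the "callable that accepts two inputs" requirement, the sketch below defines a hypothetical symmetric Kullback-Leibler distance and passes it to scipy.spatial.distance.pdist(). It is not the implementation of interlab.jeffries; the function name symmetric_kl, the normalization step, and the eps guard are assumptions made for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def symmetric_kl(u, v, eps=1e-12):
    """Hypothetical symmetric Kullback-Leibler distance between two spectra."""
    # Treat each spectrum as a discrete probability distribution.
    p = np.clip(u, eps, None)
    q = np.clip(v, eps, None)
    p = p / p.sum()
    q = q / q.sum()
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    return float(np.sum((p - q) * np.log(p / q)))

# Three synthetic non-negative "spectra" (one per row), illustrative only.
spectra = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
])

# pdist accepts either a recognized string or a two-argument callable.
custom = squareform(pdist(spectra, metric=symmetric_kl))
builtin = squareform(pdist(spectra, metric="euclidean"))

# A dictionary in the format described above, with a made-up label.
example_metric = dict(metric="Symmetric KL (example)", function=symmetric_kl)
```

Because the callable is symmetric in its two arguments and returns zero for identical inputs, it behaves like the built-in string metrics when handed to pdist().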
Project workflow
Once the project has been created with the basic data and metadata needed for the analysis, the basic workflow is as follows:
# Calculates the mean and covariance of the samples; only needed if the
# Mahalanobis distance is included in the project
my_project.process_mahalanobis()
my_project.set_distances()
my_project.fit_zscores()
my_project.find_outliers()
my_project.extract_matrices()
my_project.find_lab_outliers()
This will, in order:
 Calculate the interspectral distances
 Fit the project’s distribution function to the distance data and calculate the corresponding scores.
 Identify outliers within each spectral population
 Conduct a principal components analysis on the scores and compute the projected statistical distance
 Use the projected statistical distance to determine the data set outliers.
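The first three steps can be pictured with plain NumPy and SciPy. The sketch below is a conceptual stand-in rather than the package's implementation: the synthetic spectra, the use of the mean pairwise distance per lab, the mapping from the fitted distribution to a z-score, and the cutoff value are all assumptions.

```python
import numpy as np
from scipy import stats
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)

# Ten labs measure the same sample; each row is one lab's spectrum.
template = np.sin(np.linspace(0.0, np.pi, 50))
spectra = template + rng.normal(scale=0.02, size=(10, 50))
spectra[7] += 0.5  # make one lab clearly different from the rest

# Step 1: interspectral distances (Euclidean here for simplicity).
dist_matrix = squareform(pdist(spectra, metric="euclidean"))

# Average distance from each lab to every other lab.
n_labs = dist_matrix.shape[0]
mean_dist = dist_matrix.sum(axis=1) / (n_labs - 1)

# Step 2: fit a lognormal distribution, then convert to z-scores by
# mapping the fitted CDF onto the standard normal (an assumed convention).
shape, loc, scale = stats.lognorm.fit(mean_dist, floc=0)
cdf = stats.lognorm.cdf(mean_dist, shape, loc=loc, scale=scale)
zscores = stats.norm.ppf(cdf)

# Step 3: flag labs whose z-score exceeds an assumed cutoff.
cutoff = 2.0
outliers = np.flatnonzero(zscores > cutoff)
```

The principal components analysis and lab-level outlier steps operate on the scores from many samples at once and are not covered by this single-sample sketch.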
Documentation
Method summary
Analysis functions
Project([data, rawdata, distance_metrics, …])
    The top-level project class for the interlaboratory comparison module
Project.set_distances()
    Calculates the interspectral distances for each experiment group and metric
Project.fit_zscores()
    Fits the sample-level z-scores for each experiment group and metric.
Project.find_outliers(**kwargs)
    Finds the sample outliers for each experiment group and metric.
Project.extract_matrices(**kwargs)
    Runs extract_experimental_matrix() for each experimental matrix
Project.extract_experimental_matrix([sets, …])
    Extracts the z-score data from the dict-of-vectors format and casts it as a 2D array.
Project.fit_lab_zscores()
    Fits the lab-level z-scores for each metric.
Project.find_lab_outliers(**kwargs)
    Finds the lab outliers for each metric.
Plotting functions
Project.plot_distance_fig([plot_range, …])
    For each sample, generates a plot of the spectra from each laboratory and, for each metric, a heat map of the interspectral distance matrix
Project.plot_zscore_distances(metric[, …])
    Plots a bar chart of the average interspectral distance for each sample, annotated with the generalized Z score for each sample
Project.plot_histograms(metric[, …])
    Plots a histogram of the average interspectral distance for each sample, along with the corresponding fit
Project.plot_zscore_outliers(metric[, …])
    Plots the principal component scores for each lab along with the final distribution used to calculate the outliers
Project.plot_projected_zscores([…])
    Plots the projected statistical distances annotated with the corresponding laboratory-level Z scores.
Project.plot_zscore_loadings([…])
    Plots the principal component loadings for the statistical distances
Full documentation

class project.Project(data=None, rawdata=None, distance_metrics=None, Sample_names=None, Data_set_names=None, x_data_list=None, range_to_use=None, distribution_function=<scipy.stats._continuous_distns.lognorm_gen object>, outlier_dist=None)
    The top-level project class for the interlaboratory comparison module
    Key Sample_names: List of sample names, used as keys for the dictionaries of data set names and processed and raw data. Each key in this list will correspond to an ExperimentGroup object
    Key Data_set_names: Dictionary of data sets (labs) with data for each sample
    Key data: Dictionary of data to be used for the interlab analysis
    Key rawdata: Dictionary of unprocessed data, if different from data
    Key distance_metrics: List of distance metrics. Each metric in this list will be used to create a DistanceMetric object within each ExperimentGroup object
    Key x_data_list: The list of x data in the data array. For 2D data, this is not used
    Key range_to_use: Used to screen certain parts of the spectral data from consideration in the experimental comparison
    Key distribution_function: Which distribution will be assumed when assigning Z scores to each measurement of a sample. The default is scipy.stats.lognorm
    Key outlier_dist: Which distribution will be assumed when detecting outliers. The default is the same as distribution_function
set_distances()
    Calculates the interspectral distances for each experiment group and metric

extract_experimental_matrix([sets, …])
    Extracts the z-score data from the dict-of-vectors format and casts it as a 2D array.
    The dictionary of sample-level z-scores is recast as an array, with one dimension corresponding to sample names and the other corresponding to laboratory.
    Key sets: Sets to extract for the interlab comparison
    Key metric: Distance metric that will be used
    Key screen_outliers: Whether to remove outlier measurements before imputing missing values
    Key imputation_axis: Axis along which to impute missing values
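The recasting step can be pictured as follows. This is a hedged sketch, not the method's actual code: the nested-dictionary layout, the lab and sample names, the row and column orientation, and the use of mean imputation are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical dict-of-vectors: sample name -> {lab name: z-score}.
zscores = {
    "Sample A": {"Lab 1": 0.2, "Lab 2": -0.5, "Lab 3": 1.1},
    "Sample B": {"Lab 1": 0.4, "Lab 3": 0.8},  # Lab 2 did not report
}

samples = sorted(zscores)
labs = sorted({lab for per_lab in zscores.values() for lab in per_lab})

# Rows correspond to laboratories, columns to samples; NaN marks gaps.
matrix = np.full((len(labs), len(samples)), np.nan)
for j, sample in enumerate(samples):
    for i, lab in enumerate(labs):
        if lab in zscores[sample]:
            matrix[i, j] = zscores[sample][lab]

def impute_mean(array, axis=0):
    """Replace NaN entries with the mean taken along the given axis."""
    filled = array.copy()
    means = np.nanmean(array, axis=axis, keepdims=True)
    missing = np.isnan(filled)
    filled[missing] = np.broadcast_to(means, array.shape)[missing]
    return filled

# axis=0 fills a gap with the mean over labs for that sample.
completed = impute_mean(matrix, axis=0)
```

Choosing the other axis would instead fill a gap with that laboratory's mean over the samples it did report, which is the kind of choice an imputation axis setting controls.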

plot_distance_fig([plot_range, …])
    For each sample, generates the following plots:
     A plot of the spectra generated for that sample by each laboratory
     For each metric, a heat map plot of the interspectral distance matrix
    Key plot_range: An iterable of integers specifying which sample labels to plot
    Key cmap: The color map that will be used for the distance heat maps
    Key linecolor: The line color that will be used for the spectral data
    Key distance_metrics: A list of the distance metrics for which heat maps will be plotted. If None, plot heat maps for all metrics in this project
    Key plot_data: Boolean that tells whether the raw spectral data will be plotted
    Key wspace: Horizontal spacing between the heat maps
    Key ylabel_buffer: Space allocated for the y axis label (in inches)
    Key rightlabel_buffer: Space allocated for the colorbar label (in inches)
    Key xlabel_buffer:
    Returns: distance_fig, the distance measure figure matplotlib object.

plot_zscore_distances(metric[, …])
    Plots a bar chart of the average interspectral distance for each sample, annotated with the generalized Z score for each sample
    Parameters: metric – The metric for which the distances will be plotted
    Key plot_range: An iterable of integers specifying which sample labels to plot
    Key numcols: The number of columns in the distance plot
    Key xlabel_buffer: Space allocated for x-axis labels (in inches)
    Key ylabel_buffer: Space allocated for y-axis labels (in inches)
    Key rotation: Specifies the orientation of the z-score labels for individual labs
    Returns zscorefig: The distances and scores plot as a matplotlib figure object

plot_histograms(metric[, …])
    Plots a histogram of the average interspectral distance for each sample, along with the corresponding fit
    Parameters: metric – The metric for which the distances will be plotted
    Key plot_range: An iterable of integers specifying which sample labels to plot
    Key numcols: The number of columns in the distance plot
    Key xlabel_buffer: Space allocated for x-axis labels (in inches)
    Key rotation: Specifies the orientation of the z-score labels for individual labs
    Returns pdffig: The distances and scores plot as a matplotlib figure object

plot_zscore_outliers(metric[, …])
    Plots the principal component scores for each lab along with the final distribution used to calculate the outliers
    Parameters: metric – The metric used to calculate the interspectral distances
    Key y_component: Which principal component to use on the Y axis, if not the first
    Key text: Whether to label the plot with the name of the
    Returns: zscore_outliers_fig, the Z score outlier plot as a matplotlib figure object

plot_projected_zscores([…])
    Plots the projected statistical distances annotated with the corresponding laboratory-level Z scores.
    Key distance_metrics: A list of the distance metrics for which statistical distances will be plotted. If None, plot statistical distances for all metrics in this project
    Key xlabel_buffer: Space allocated for x-axis labels (in inches)
    Key rotation: Specifies the orientation of the z-score labels for individual labs
    Returns: zscorefig, the projected statistical distances plot as a matplotlib figure object

plot_zscore_loadings([…])
    Plots the principal component loadings for the statistical distances
    Key distance_metrics: A list of the distance metrics for which loadings will be plotted. If None, plot loadings for all metrics in this project
    Key xlabel_buffer: Space allocated for x-axis labels (in inches)
    Returns: loadfig, the projected statistical loadings plot as a matplotlib figure object
