The interlab Project class

An interlab analysis is logically divided into interlaboratory comparison projects. A project is represented by a Project object, which contains the following items:

Sample labels that represent the physical objects that have been distributed for measurement

Dataset labels that identify the origin of each set of measurement results

Experimental spectral data containing the measurement results on the objects that have been analyzed

Interspectral distance functions that will be used to calculate the spread of the data and identify outliers

A distribution function that will be used to identify outliers. By default, this is a lognormal distribution, but it can be any continuous distribution derived from scipy.stats.rv_continuous.

The intent is that the user will interact primarily with the Project object, loading it with the information needed for the analysis and then calling its built-in methods. Documentation for the ExperimentGroup and InterlabArray objects is included separately.

Creating Projects

A blank Project can be instantiated with no arguments:

import interlab

my_project = interlab.Project()

This creates a Project with no data or metadata of any kind. This information can be loaded later, or it can be provided when the Project is instantiated:

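my_project = interlab.Project(x_data_list=xdata,                   # x-axis values of the spectra
                              Sample_names=Sample_names,           # list of sample labels
                              Data_set_names=Data_set_names_dict,  # data sets (labs) for each sample
                              distance_metrics=distance_metric_list,
                              data=data_dict,                      # processed data used in the analysis
                              rawdata=rawdata_dict,                # unprocessed data, if different
                              )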

When the Project object is created, it will automatically create ExperimentGroup, DistanceMetric, and Population objects for the experiment groups and distance metrics that have been assigned.
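The shape of the inputs passed to the constructor is not spelled out in this section. The sketch below shows one plausible layout, assuming (as the Sample_names description suggests) that the dictionaries are keyed by sample name, that Data_set_names_dict[sample] lists the labs that measured that sample, and that data_dict[sample] holds one spectrum per lab in the same order. The values and the check_inputs helper are hypothetical illustrations, not part of interlab.

```python
# Hypothetical inputs for interlab.Project. The nesting shown here is an
# assumption made for illustration, not a documented requirement.
xdata = [1.0, 2.0, 3.0, 4.0]  # shared x axis for every spectrum

Sample_names = ["sample_A", "sample_B"]

# Labs (data sets) that reported results for each sample
Data_set_names_dict = {
    "sample_A": ["lab_1", "lab_2", "lab_3"],
    "sample_B": ["lab_1", "lab_3"],
}

# One spectrum per lab, in the same order as Data_set_names_dict[sample]
data_dict = {
    "sample_A": [[0.1, 0.5, 0.3, 0.1], [0.1, 0.6, 0.2, 0.1], [0.2, 0.4, 0.3, 0.1]],
    "sample_B": [[0.3, 0.3, 0.2, 0.2], [0.3, 0.2, 0.3, 0.2]],
}


def check_inputs(sample_names, data_set_names, data, x):
    """Hypothetical consistency check: each sample has one spectrum per lab,
    and each spectrum has one value per x point."""
    for sample in sample_names:
        labs = data_set_names[sample]
        spectra = data[sample]
        if len(spectra) != len(labs):
            raise ValueError(f"{sample}: {len(spectra)} spectra for {len(labs)} labs")
        for lab, spectrum in zip(labs, spectra):
            if len(spectrum) != len(x):
                raise ValueError(f"{sample}/{lab}: expected {len(x)} points")
    return True


check_inputs(Sample_names, Data_set_names_dict, data_dict, xdata)
```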

Defining distance metrics

The distance metrics are defined by a list of dictionaries. Each dictionary must have two entries: metric, the name of the metric as a text string, and function, the function used to compute the distance. The function must be either a callable that accepts two inputs (two spectra) and returns a number, or a metric name string recognized by scipy.spatial.distance.pdist(). The following are two examples:

import interlab

jeffries = 'Symmetric Kullback-Leibler'
mahalanobis = 'Mahalanobis'
nmr_distance_metrics = [
    # 'mahalanobis' is a metric name recognized by pdist()
    dict(metric=mahalanobis, function='mahalanobis'),
    # interlab.jeffries is a distance function included in this package
    dict(metric=jeffries, function=interlab.jeffries),
]
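A metric supplied as a callable takes two spectra and returns one number, the same convention that scipy.spatial.distance.pdist uses for custom metrics. The sketch below is an illustrative Jeffreys (symmetric Kullback-Leibler) divergence written to that convention. The names jeffreys_divergence and pairwise_distances are hypothetical, and this is not necessarily how interlab.jeffries is implemented; in particular, the unit-area normalization and the eps guard are assumptions.

```python
import math
from itertools import combinations


def jeffreys_divergence(u, v, eps=1e-12):
    """Symmetric Kullback-Leibler (Jeffreys) divergence between two spectra.

    Each spectrum is clipped at zero, offset by eps (to avoid log(0) and
    division by zero), and normalized to unit area so it can be treated
    as a probability distribution. These choices are assumptions.
    """
    p = [max(x, 0.0) + eps for x in u]
    q = [max(x, 0.0) + eps for x in v]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # KL(p||q) + KL(q||p) simplifies to sum((p_i - q_i) * log(p_i / q_i))
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def pairwise_distances(spectra, metric):
    """Condensed list of pairwise distances in the same pair order that
    pdist uses: (0, 1), (0, 2), ..., (1, 2), ..."""
    return [metric(spectra[i], spectra[j])
            for i, j in combinations(range(len(spectra)), 2)]


spectra = [[0.1, 0.5, 0.3, 0.1], [0.1, 0.6, 0.2, 0.1], [0.2, 0.4, 0.3, 0.1]]
print(pairwise_distances(spectra, jeffreys_divergence))
```

Under the convention described above, such a function could be listed as dict(metric='Jeffreys (sketch)', function=jeffreys_divergence).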

Project workflow

Once the project has been created with the data and metadata needed for the analysis, the basic workflow is as follows:

# Calculates the mean and covariance of the samples; only needed if the
# Mahalanobis distance is included in the project
my_project.process_mahalanobis()
my_project.set_distances()
my_project.fit_zscores()
my_project.find_outliers()
my_project.extract_matrices()
my_project.find_lab_outliers()
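The Mahalanobis distance depends on a covariance matrix S estimated from the data, which is why a preprocessing step such as process_mahalanobis() is needed before that metric can be evaluated. The standard-library sketch below shows the underlying computation, d(u, v) = sqrt((u - v)^T S^-1 (u - v)), solving S x = u - v instead of forming S^-1 explicitly. All function names are hypothetical; this illustrates the formula, not interlab's internals.

```python
import math


def mean_and_covariance(rows):
    """Sample mean vector and unbiased (n - 1) covariance matrix of a list
    of equal-length observation vectors."""
    n, k = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(k)] for a in range(k)]
    return mean, cov


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (aug[r][k] - sum(aug[r][c] * x[c] for c in range(r + 1, k))) / aug[r][r]
    return x


def mahalanobis(u, v, cov):
    """sqrt((u - v)^T S^-1 (u - v)), computed by solving S x = (u - v)."""
    diff = [a - b for a, b in zip(u, v)]
    x = solve(cov, diff)
    return math.sqrt(sum(d * xi for d, xi in zip(diff, x)))


rows = [[1.0, 2.0], [2.0, 3.5], [3.0, 3.0], [4.0, 5.5]]
mean, cov = mean_and_covariance(rows)
print(mahalanobis(rows[0], mean, cov))
```

When the vectors have more points than there are observations, the sample covariance is singular and the solve step raises an error, so some reduction or regularization of the data is then required.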

This will, in order:

Calculate the interspectral distances

Fit the project’s distribution function to the distance data and calculate the corresponding scores.

Identify outliers within each spectral population

Conduct a principal components analysis on the scores and compute the projected statistical distance

Use the projected statistical distance to determine the data set outliers.
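The fitting and outlier steps in this list can be illustrated with the default lognormal distribution. The sketch assumes that each measurement's score is the standard-normal quantile of its fitted cumulative probability, z = Φ^-1(F(d)), and that measurements whose z exceeds a cutoff of 2 are flagged as outliers (one-sided, since only unusually large distances indicate disagreement). Both the scoring rule and the cutoff are assumptions, not interlab's documented behavior. With the location fixed at zero (as scipy.stats.lognorm.fit(d, floc=0) would do), the maximum-likelihood fit is simply the mean and population standard deviation of ln d.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(distances):
    """MLE lognormal fit with location fixed at 0: (mu, sigma) of ln(d)."""
    logs = [math.log(d) for d in distances]
    return fmean(logs), pstdev(logs)


def generalized_zscores(distances):
    """Map each distance to a standard-normal score through its fitted CDF.

    For a lognormal this reduces to (ln d - mu) / sigma; going through the
    CDF shows the general recipe for any fitted continuous distribution.
    """
    mu, sigma = fit_lognormal(distances)
    log_dist = NormalDist(mu, sigma)  # distribution of ln(d)
    std_normal = NormalDist()
    scores = []
    for d in distances:
        p = log_dist.cdf(math.log(d))        # lognormal CDF evaluated at d
        p = min(max(p, 1e-12), 1 - 1e-12)    # keep inv_cdf finite
        scores.append(std_normal.inv_cdf(p))
    return scores


def flag_outliers(scores, cutoff=2.0):
    """Indices whose score exceeds a hypothetical one-sided cutoff."""
    return [i for i, z in enumerate(scores) if z > cutoff]


avg_distances = [0.9, 1.0, 1.1, 1.05, 0.95, 5.0]
print(flag_outliers(generalized_zscores(avg_distances)))  # prints [5]
```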

Documentation

Method summary

Analysis functions

`Project`([data, rawdata, distance_metrics, …])	The top-level project class for the interlaboratory comparison module
`Project.set_distances`()	Calculates the interspectral distances for each experiment group and metric
`Project.fit_zscores`()	Fits the sample-level zscores for each experiment group and metric.
`Project.find_outliers`(**kwargs)	Finds the sample outliers for each experiment group and metric.
`Project.extract_matrices`(**kwargs)	Runs `extract_experimental_matrix()` for each experimental matrix
`Project.extract_experimental_matrix`([sets, …])	Extracts the zscore data from the dict-of-vectors format and casts it as a 2D array.
`Project.fit_lab_zscores`()	Fits the lab-level zscores for each metric.
`Project.find_lab_outliers`(**kwargs)	Finds the lab outliers for each metric.

Plotting functions

`Project.plot_distance_fig`([plot_range, …])	For each sample, plots the spectra reported by each laboratory and a heat map of the interspectral distance matrix for each metric
`Project.plot_zscore_distances`(metric[, …])	Plots a bar chart of the average interspectral distance for each sample, annotated with the generalized Z score for each sample
`Project.plot_histograms`(metric[, …])	Plots a histogram of the average interspectral distance for each sample, along with the corresponding fit
`Project.plot_zscore_outliers`(metric[, …])	Plots the principal component scores for each lab along with the final distribution used to calculate the outliers
`Project.plot_projected_zscores`([…])	Plots the projected statistical distances annotated with the corresponding laboratory-level Z scores.
`Project.plot_zscore_loadings`([…])	Plots the principal component loadings for the statistical distances

Full documentation

class project.Project(data=None, rawdata=None, distance_metrics=None, Sample_names=None, Data_set_names=None, x_data_list=None, range_to_use=None, distribution_function=scipy.stats.lognorm, outlier_dist=None)

The top-level project class for the interlaboratory comparison module

Key Sample_names:
	List of sample names, used as keys for the dictionaries of data set names and of processed and raw data. Each key in this list will correspond to an `ExperimentGroup` object
Key Data_set_names:
	Dictionary of data sets (labs) with data for each sample
Key data:	Dictionary of data to be used for the interlab analysis
Key rawdata:	Dictionary of unprocessed data, if different from data
Key distance_metrics:
	List of distance metrics. Each metric in this list will be used to create a `DistanceMetric` object within each `ExperimentGroup` object
Key x_data_list:
	The x-axis values for the data arrays. Not used for 2D data
Key range_to_use:
	Used to exclude certain parts of the spectral data from the interlaboratory comparison
Key distribution_function:
	Which distribution will be assumed when assigning Z scores to each measurement of a sample. The default is scipy.stats.lognorm
Key outlier_dist:
	Which distribution will be assumed when detecting outliers. The default is the same as distribution_function

set_distances(): Calculates the interspectral distances for each experiment group and metric

fit_zscores(): Fits the sample-level zscores for each experiment group and metric.

find_outliers(): Finds the sample outliers for each experiment group and metric.

extract_matrices(): Runs extract_experimental_matrix() for each experimental matrix

extract_experimental_matrix()

Extracts the zscore data from the dict-of-vectors format and casts it as a 2D array.

The dictionary of sample-level z-scores is recast as an array, with one dimension corresponding to sample names and the other corresponding to laboratory.

Key screen_outliers:
	Whether to remove outlier measurements before imputing missing values
Key sets:	Sets to extract for the interlab comparison
Key metric:	Distance metric that will be used
Key imputation_axis:
	Axis along which to impute missing values
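The reshaping described above can be pictured with plain lists. The sketch assumes z scores are stored as {sample: {lab: z}}, builds a samples × labs matrix with None wherever a lab did not measure a sample, optionally blanks out flagged outliers first (the idea behind screen_outliers), and fills each gap with the mean of the remaining values along the chosen axis, following the NumPy convention that axis=0 runs down a column. The nested-dict layout, mean imputation, and the name to_matrix are assumptions; the imputation method interlab actually uses is not described here.

```python
from statistics import fmean


def to_matrix(zscores, samples, labs, outliers=frozenset(), axis=0):
    """Cast {sample: {lab: z}} into a len(samples) x len(labs) list of lists.

    outliers: (sample, lab) pairs to blank out before imputation.
    axis=0 fills a gap with the mean of its column (same lab, other samples);
    axis=1 fills it with the mean of its row (same sample, other labs).
    """
    matrix = [[None if (s, lab) in outliers else zscores.get(s, {}).get(lab)
               for lab in labs] for s in samples]
    n_rows, n_cols = len(samples), len(labs)
    filled = [row[:] for row in matrix]
    for i in range(n_rows):
        for j in range(n_cols):
            if matrix[i][j] is not None:
                continue
            if axis == 0:
                pool = [matrix[r][j] for r in range(n_rows) if matrix[r][j] is not None]
            else:
                pool = [v for v in matrix[i] if v is not None]
            if not pool:
                raise ValueError(f"nothing to impute from at {samples[i]}, {labs[j]}")
            filled[i][j] = fmean(pool)
    return filled


zscores = {"A": {"lab1": 0.5, "lab2": -1.0}, "B": {"lab1": 1.5, "lab3": 0.0}}
print(to_matrix(zscores, ["A", "B"], ["lab1", "lab2", "lab3"], axis=1))
```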

fit_lab_zscores(): Fits the lab-level zscores for each metric.

find_lab_outliers(): Finds the lab outliers for each metric.

plot_distance_fig()

For each sample, generates the following plots:

A plot of the spectra generated for that sample by each laboratory
For each metric, a heat map plot of the interspectral distance matrix

Key distance_metrics:
	A list of the distance metrics for which heat maps will be plotted. If None, plot heat maps for all metrics in this project
Key plot_range:	An iterable of integers specifying which sample labels to plot
Key cmap:	The color map that will be used for the distance heat maps
Key linecolor:	The line color that will be used for the spectral data
Key plot_data:	Boolean that tells whether the raw spectral data will be plotted
Key wspace:	Horizontal spacing between the heat maps
Key ylabel_buffer:
	Space allocated for the y axis label (in inches)
Key rightlabel_buffer:
	Space allocated for the colorbar label (in inches)
Key xlabel_buffer:
	Space allocated for the x axis label (in inches)

Returns:	distance_fig, the distance measure figure as a matplotlib figure object

plot_zscore_distances()

Plots a bar chart of the average interspectral distance for each sample, annotated with the generalized Z score for each sample

Parameters:	metric – The metric for which the distances will be plotted
Key plot_range:	An iterable of integers specifying which sample labels to plot
Key numcols:	The number of columns in the distance plot
Key xlabel_buffer:
	Space allocated for x-axis labels (in inches)
Key ylabel_buffer:
	Space allocated for y-axis labels (in inches)
Key rotation:	Specifies the orientation of the z-score labels for individual labs
Returns:	zscorefig, the distances and scores plot as a matplotlib figure object

plot_histograms()

Plots a histogram of the average interspectral distance for each sample, along with the corresponding fit

Parameters:	metric – The metric for which the distances will be plotted
Key plot_range:	An iterable of integers specifying which sample labels to plot
Key numcols:	The number of columns in the distance plot
Key xlabel_buffer:
	Space allocated for x-axis labels (in inches)
Key rotation:	Specifies the orientation of the z-score labels for individual labs
Returns:	pdffig, the distances and scores plot as a matplotlib figure object

plot_zscore_outliers()

Plots the principal component scores for each lab along with the final distribution used to calculate the outliers

Parameters:	metric – The metric used to calculate the interspectral distances
Key y_component:
	Which principal component to use on the Y axis, if not the first
Key text:	Whether to label the plot with the name of the
Returns:	zscore_outliers_fig, the Z score outlier plot as a matplotlib figure object

plot_projected_zscores()

Plots the projected statistical distances annotated with the corresponding laboratory-level Z scores.

Key distance_metrics:
	A list of the distance metrics for which statistical distances will be plotted. If None, plot statistical distances for all metrics in this project
Key xlabel_buffer:
	Space allocated for x-axis labels (in inches)
Key rotation:	Specifies the orientation of the z-score labels for individual labs
Returns:	zscorefig, the projected statistical distances plot as a matplotlib figure object

plot_zscore_loadings()

Plots the principal component loadings for the statistical distances

Key distance_metrics:
	A list of the distance metrics for which loadings will be plotted. If None, plot loadings for all metrics in this project
Key xlabel_buffer:
	Space allocated for x-axis labels (in inches)
Returns:	loadfig, the projected statistical loadings plot as a matplotlib figure object