Module Reference

Function Summary

cross_validate([xdata, ydata, PLS_model, …]) Conducts a cross-validation analysis on a set of data using a regression algorithm.
simple_bootstrap([xdata, ydata, PLS_model, …]) Conducts a simple residual bootstrap analysis on a set of data.
bootstrap([xdata, ydata, validdata, …]) Conducts a simple residual bootstrap analysis on a set of data.
bootstrap_unc([xdata, ydata, valid_data, …]) Computes the uncertainty in a bootstrap analysis by leave-one-out cross-validation.
pca_bootstrap([xdata, ydata, groups, …]) Conducts a residual bootstrap analysis on a set of data.
misclass_probability(probability_zero, …) Estimates the misclassification probability of a sample, based on the confidence level of the prediction relative to the true value.

Function documentation

ml_uncertainty.cross_validate(xdata=None, ydata=None, PLS_model=None, cv_object=None, sk_model=<class 'sklearn.cross_decomposition.pls_.PLSRegression'>, PLS_kw=None, class_value=0.5)

Conducts a cross-validation analysis on a set of data using a regression algorithm.

This function is essentially a pass-through to sklearn.model_selection.cross_val_predict, followed by PLS-DA class assignment.

Key xdata: The X data used to fit the model (default None)
Key ydata: The Y data used to fit the model (default None)
Key PLS_model: The scikit-learn model that will be fit using X and Y
Key cv_object: The cross-validation object that will be used to calculate cross-validation statistics
Key sk_model: If PLS_model is None, this scikit-learn model will be used to create it
Key PLS_kw: The keyword arguments that will be passed to sk_model
Key class_value: The value separating the classes in PLS-DA
Returns: class_assigned_cv, the dummy-variable array of class assignments in PLS-DA, and class_predicted_cv, the array of PLS predictions

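The two steps described above (out-of-fold predictions, then PLS-DA class assignment) can be illustrated without scikit-learn. This is a minimal sketch, not the ml_uncertainty implementation: a one-variable least-squares fit stands in for PLSRegression, simple interleaved folds stand in for cv_object, the helper names are hypothetical, and the rule that predictions above class_value go to class 1 is an assumption.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x (a stand-in for PLSRegression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def cross_val_predict_kfold(x, y, n_splits=3):
    """Out-of-fold predictions: every sample is predicted by a model fit without it."""
    n = len(x)
    preds = [0.0] * n
    for k in range(n_splits):
        test = [i for i in range(n) if i % n_splits == k]  # simple interleaved folds
        train = [i for i in range(n) if i % n_splits != k]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = a + b * x[i]
    return preds


def assign_classes(preds, class_value=0.5):
    """PLS-DA style assignment (assumed rule): class 1 above class_value, else class 0."""
    return [1 if p > class_value else 0 for p in preds]


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]  # dummy-coded class labels
class_predicted_cv = cross_val_predict_kfold(x, y)
class_assigned_cv = assign_classes(class_predicted_cv)
print(class_assigned_cv)
```

In the library itself the out-of-fold step is delegated to sklearn.model_selection.cross_val_predict; only the thresholding against class_value is specific to PLS-DA.
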
ml_uncertainty.simple_bootstrap(xdata=None, ydata=None, PLS_model=None, PLS_cv=None, PLS_bootstrap=None, sk_model=<class 'sklearn.cross_decomposition.pls_.PLSRegression'>, cv_object=None, class_value=0.5, samples=1000, PLS_kw=None, return_boot=False)

Conducts a simple residual bootstrap analysis on a set of data. Computes cross-validation uncertainty.

This function requires the Y data being bootstrapped to be one-dimensional. It also requires the model to accept two-dimensional Y data. The bootstrapping is done by generating random variations of the Y data, as many as given by samples, and concatenating them into a two-dimensional array.

If PLS_model is None, then PLS_cv and PLS_bootstrap are ignored. The function will create independent instances of sk_model for each of PLS_model, PLS_cv, and PLS_bootstrap.

If PLS_model is not None, then it will be reused for PLS_cv and PLS_bootstrap.
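A minimal sketch of the stacking idea described above, assuming residuals from a single fit are resampled with replacement and added back to the fitted values. A one-variable least-squares fit stands in for PLSRegression, and all helper names are hypothetical. For least squares, one fit on the stacked n-by-samples matrix gives the same coefficients as separate fits per column, because the X-side statistics are shared; for PLS with a multi-column Y that equivalence does not hold in general, since the PLS components depend on Y.

```python
import random
import statistics


def fit_lines(x, Y):
    """Least-squares fit y = a + b*x for every column of the n-by-m matrix Y.

    The x statistics (mean, sum of squares) are computed once and shared by all columns.
    """
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    coefs = []
    for col in zip(*Y):  # iterate over the columns of Y
        my = sum(col) / n
        b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, col)) / sxx
        coefs.append((my - b * mx, b))
    return coefs


def residual_bootstrap_matrix(y, y_fit, samples, seed=0):
    """n-by-samples matrix: column j is y_fit plus residuals resampled with replacement."""
    rng = random.Random(seed)
    residuals = [yi - fi for yi, fi in zip(y, y_fit)]
    columns = [[fi + rng.choice(residuals) for fi in y_fit] for _ in range(samples)]
    return [list(row) for row in zip(*columns)]  # transpose: rows are observations


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]

# 1) Fit the original one-dimensional y, wrapped as a single-column matrix.
a, b = fit_lines(x, [[yi] for yi in y])[0]
y_fit = [a + b * xi for xi in x]

# 2) Stack the bootstrap replicates of y into one matrix and fit it in a single call.
Y_boot = residual_bootstrap_matrix(y, y_fit, samples=200)
boot_coefs = fit_lines(x, Y_boot)

# 3) The spread of the bootstrap slopes estimates the uncertainty of the slope.
slope_sd = statistics.stdev([bj for _, bj in boot_coefs])
print(f"slope = {b:.3f} +/- {slope_sd:.3f}")
```
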

Key xdata: The X data used to fit the model (default None)
Key ydata: The Y data used to fit the model (default None)
Key PLS_model: The scikit-learn model that will be fit using X and Y
Key PLS_cv: The scikit-learn model that will be used for cross-validation
Key PLS_bootstrap: The scikit-learn model that will be used for bootstrapping
Key sk_model: If PLS_model, PLS_cv, or PLS_bootstrap is None, this scikit-learn model will be used to create them
Key cv_object: The cross-validation object that will be used to calculate cross-validation statistics
Key class_value: The value separating the classes in PLS-DA
Key samples: The number of bootstrap samples
Key PLS_kw: The keyword arguments that will be passed to sk_model
Key return_boot: If True, returns the PLS_bootstrap model as part of the output

ml_uncertainty.bootstrap(xdata=None, ydata=None, validdata=None, PLS_model=None, PLS_cv=None, PLS_bootstrap=None, sk_model=<class 'sklearn.cross_decomposition.pls_.PLSRegression'>, regression=False, cv_object=None, class_value=0.5, samples=1000, PLS_kw=None, return_scores=False, return_loadings=False, tq=True)

Conducts a simple residual bootstrap analysis on a set of data. Computes cross-validation uncertainty.

This function performs a full bootstrap and makes no assumption about the shape or structure of the Y data. Each bootstrap sample will have an independent model fit to it.

If PLS_model is None, then PLS_cv and PLS_bootstrap are ignored. The function will create independent instances of sk_model for each of PLS_model, PLS_cv, and PLS_bootstrap.

If PLS_model is not None, then it will be reused for PLS_cv and PLS_bootstrap.
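Unlike the stacked approach of simple_bootstrap, the full bootstrap refits an independent model for every replicate. The sketch below assumes residual resampling, uses a one-variable least-squares fit in place of a freshly created sk_model, and treats validdata as a list of new x values; these choices and the helper names are illustrative assumptions, not the library's code.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x (a stand-in for a freshly created sk_model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def bootstrap_prediction_sd(x, y, x_valid, samples=500, seed=0):
    """Residual bootstrap with an independent refit per replicate.

    Returns the bootstrap standard deviation of the prediction at each point of x_valid.
    """
    rng = random.Random(seed)
    a, b = fit_line(x, y)
    y_fit = [a + b * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, y_fit)]
    preds = [[] for _ in x_valid]  # bootstrap predictions, one list per validation point
    for _ in range(samples):
        y_star = [fi + rng.choice(residuals) for fi in y_fit]
        a_s, b_s = fit_line(x, y_star)  # independent model for this replicate
        for k, xv in enumerate(x_valid):
            preds[k].append(a_s + b_s * xv)
    return [statistics.stdev(p) for p in preds]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
validdata = [3.5, 10.0]  # hypothetical: one point inside, one far outside the training range
sd_inside, sd_outside = bootstrap_prediction_sd(x, y, validdata)
print(sd_inside, sd_outside)
```

Refitting costs one model fit per replicate, but it places no constraint on the shape of Y. As expected for a prediction uncertainty, the spread is larger for the validation point far from the training data.
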

Key xdata: The X data used to fit the model (default None)
Key ydata: The Y data used to fit the model (default None)
Key validdata: Additional data not used to fit the model but for which uncertainty will be calculated
Key PLS_model: The scikit-learn model that will be fit using X and Y. If None, a new model will be created from sk_model.
Key PLS_cv: The scikit-learn model that will be used for cross-validation. If None, same as PLS_model.
Key PLS_bootstrap: The scikit-learn model that will be used for bootstrapping. If None, same as PLS_model.
Key sk_model: If PLS_model, PLS_cv, or PLS_bootstrap is None, this scikit-learn model will be used to create them
Key cv_object: The cross-validation object that will be used to calculate cross-validation statistics
Key class_value: The value separating the classes in PLS-DA
Key samples: The number of bootstrap samples
Key PLS_kw: The keyword arguments that will be passed to sk_model
Key return_scores: If True, returns the scores of the PLS_bootstrap model as part of the output
Key return_loadings: If True, returns the loadings of the PLS_bootstrap model as part of the output

ml_uncertainty.bootstrap_unc(xdata=None, ydata=None, valid_data=None, cv_object=None, samples=1000, class_value=0.5, PLS_kw=None, return_scores=False, tq=True)

Computes the uncertainty in a bootstrap analysis by leave-one-out cross-validation.

For each sample, the uncertainty is calculated by fitting the model to the remaining samples, computing the bootstrap uncertainty, and then computing the uncertainty of the held-out sample.
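The leave-one-out procedure described above can be sketched as a loop around a residual bootstrap. This is an illustration under stated assumptions: leave-one-out is written out explicitly rather than taken from cv_object, a one-variable least-squares fit stands in for the PLS model, and the helper names are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x (stand-in model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def loo_bootstrap_unc(x, y, samples=300, seed=0):
    """For each held-out sample i, bootstrap the model on the other samples and
    return the standard deviation of the bootstrap predictions at x[i]."""
    rng = random.Random(seed)
    unc = []
    for i in range(len(x)):
        x_tr, y_tr = x[:i] + x[i + 1:], y[:i] + y[i + 1:]  # leave sample i out
        a, b = fit_line(x_tr, y_tr)
        y_fit = [a + b * xi for xi in x_tr]
        residuals = [yi - fi for yi, fi in zip(y_tr, y_fit)]
        preds = []
        for _ in range(samples):
            y_star = [fi + rng.choice(residuals) for fi in y_fit]
            a_s, b_s = fit_line(x_tr, y_star)
            preds.append(a_s + b_s * x[i])  # prediction for the held-out sample
        unc.append(statistics.stdev(preds))
    return unc


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
unc = loo_bootstrap_unc(x, y)
print([round(u, 3) for u in unc])
```
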

Key xdata: The X data used to fit the model (default None)
Key ydata: The Y data used to fit the model (default None)
Key valid_data: Additional data not used to fit the model but for which uncertainty will be calculated
Key cv_object: The cross-validation object that will be used to calculate cross-validation statistics
Key samples: The number of bootstrap samples
Key class_value: The value separating the classes in PLS-DA
Key PLS_kw: The keyword arguments that will be passed to sk_model
Key return_scores: If True, returns the scores of the PLS_bootstrap model as part of the output

ml_uncertainty.pca_bootstrap(xdata=None, ydata=None, groups=None, validdata=None, PCA_model=None, PCA_cv=None, PCA_bootstrap=None, skmodel=<class 'sklearn.decomposition.pca.PCA'>, scaler=None, cv_object=None, samples=1000, PCA_kw=None, tq=True)

Conducts a residual bootstrap analysis on a set of data. Computes cross-validation uncertainty.

This function is the same as bootstrap() but works for unsupervised models such as PCA.

If PCA_model is None, then PCA_cv and PCA_bootstrap are ignored. The function will create independent instances of skmodel for each of PCA_model, PCA_cv, and PCA_bootstrap.

If PCA_model is not None, then it will be reused for PCA_cv and PCA_bootstrap.
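A minimal sketch of a residual bootstrap for an unsupervised model, assuming that "residual" means the part of each sample not captured by the retained components. To stay self-contained, it swaps in a closed-form, single-component PCA of two-dimensional points instead of sklearn.decomposition.PCA, and the helper names are hypothetical. Each replicate adds resampled residual vectors to the rank-one reconstruction and refits the component; the spread of the refitted component angles estimates its uncertainty.

```python
import math
import random
import statistics


def first_pc_angle(points):
    """Angle (radians) of the first principal axis of 2-D points, plus their mean.

    Uses the closed form for a 2x2 covariance matrix: theta = 0.5 * atan2(2*sxy, sxx - syy).
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    return 0.5 * math.atan2(2 * sxy, sxx - syy), (mx, my)


def pca_residual_bootstrap(points, samples=500, seed=0):
    """Residual bootstrap of the first principal axis; returns (angle, bootstrap sd of angle)."""
    rng = random.Random(seed)
    theta, (mx, my) = first_pc_angle(points)
    ux, uy = math.cos(theta), math.sin(theta)
    fitted, residuals = [], []
    for px, py in points:
        t = (px - mx) * ux + (py - my) * uy  # score on the first component
        fx, fy = mx + t * ux, my + t * uy  # rank-one reconstruction
        fitted.append((fx, fy))
        residuals.append((px - fx, py - fy))  # part not explained by the component
    angles = []
    for _ in range(samples):
        boot = []
        for fx, fy in fitted:
            ex, ey = rng.choice(residuals)  # resample a residual vector with replacement
            boot.append((fx + ex, fy + ey))
        angles.append(first_pc_angle(boot)[0])  # refit on the bootstrap replicate
    return theta, statistics.stdev(angles)


points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1), (5.0, 5.0)]
theta, angle_sd = pca_residual_bootstrap(points)
print(f"first-PC angle = {theta:.3f} rad, bootstrap sd = {angle_sd:.3f}")
```
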

Key xdata: The X data used to fit the model (default None)
Key ydata: The Y data used to fit the model (default None)
Key PCA_model: The scikit-learn model that will be fit using X
Key PCA_cv: The scikit-learn model that will be used for cross-validation
Key PCA_bootstrap: The scikit-learn model that will be used for bootstrapping
Key skmodel: If PCA_model, PCA_cv, or PCA_bootstrap is None, this scikit-learn model will be used to create them
Key scaler: The scikit-learn preprocessing object used to preprocess the data. This will be put into a
Key cv_object: The cross-validation object that will be used to calculate cross-validation statistics
Key samples: The number of bootstrap samples
Key PCA_kw: The keyword arguments that will be passed to skmodel
Key return_boot: If True, returns the PLS_bootstrap model as part of the output

ml_uncertainty.misclass_probability(probability_zero, misclass_mask)

Estimates the misclassification probability of a sample, based on the confidence level of the prediction relative to the true value.
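The source does not give the formula behind misclass_probability or the exact meaning of probability_zero and misclass_mask. One plausible reading, sketched here as an assumption only, is that a sample's misclassification probability is the fraction of its bootstrap predictions that fall on the wrong side of class_value relative to its true class. The helper prob_wrong_class is hypothetical and is not the library's function.

```python
def prob_wrong_class(boot_preds, true_class, class_value=0.5):
    """Fraction of bootstrap predictions assigned to a class other than true_class.

    Hypothetical helper: predictions above class_value are taken as class 1, otherwise class 0.
    """
    assigned = [1 if p > class_value else 0 for p in boot_preds]
    wrong = sum(1 for c in assigned if c != true_class)
    return wrong / len(assigned)


boot_preds = [0.9, 0.7, 0.45, 0.8, 0.6]  # hypothetical bootstrap predictions for one sample
print(prob_wrong_class(boot_preds, true_class=1))  # 0.2
```
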