pymcr package
Submodules

pymcr.condition module
Functions to condition / preprocess data

pymcr.constraints module
Built-in constraints
All classes need a transform method. Note: unlike sklearn, transform can copy or overwrite the input depending on the copy attribute.
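As a quick illustration of the copy/overwrite behavior, a minimal sketch using the ConstraintNonneg class documented below (assumes NumPy is installed; the array values are arbitrary):

import numpy as np
from pymcr.constraints import ConstraintNonneg

A = np.array([[1.0, -2.0], [-3.0, 4.0]])

# copy=True returns a corrected copy and leaves A untouched;
# with copy=False the (mutable) input may be overwritten in place.
A_nonneg = ConstraintNonneg(copy=True).transform(A)
# A_nonneg -> [[1., 0.], [0., 4.]]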
class pymcr.constraints.Constraint(copy=True)
Bases: abc.ABC

Abstract class for constraints

Parameters
copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)
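A user-defined constraint only needs to subclass Constraint and provide a transform method. The following sketch is purely illustrative; ConstraintClipToOne and its clipping behavior are not part of pymcr:

import numpy as np
from pymcr.constraints import Constraint

class ConstraintClipToOne(Constraint):
    """Hypothetical constraint: clip all entries above 1 down to 1."""

    def __init__(self, copy=False):
        self.copy = copy

    def transform(self, A):
        """Apply the clip-to-one constraint to A."""
        if self.copy:
            return np.minimum(A, 1.0)
        else:
            A[A > 1.0] = 1.0
            return A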
class pymcr.constraints.ConstraintNonneg(copy=False)
Bases: pymcr.constraints.Constraint

Non-negativity constraint. All negative entries made 0.

Parameters
copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)
class pymcr.constraints.ConstraintCumsumNonneg(axis=-1, copy=False)
Bases: pymcr.constraints.Constraint

Cumulative-summation non-negativity constraint. All negative entries made 0.

Parameters
copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)
class pymcr.constraints.ConstraintZeroEndPoints(axis=-1, span=1, copy=False)
Bases: pymcr.constraints.Constraint

Enforce that the endpoints (or the mean over a range) are zero

Parameters
class pymcr.constraints.ConstraintZeroCumSumEndPoints(nodes=None, axis=-1, copy=False)
Bases: pymcr.constraints.Constraint

Enforce that the endpoints of the cumulative sum (or the mean over a range) are near-zero. Note: this is an approximation.

Parameters
class pymcr.constraints.ConstraintNorm(axis=-1, fix=None, copy=False)
Bases: pymcr.constraints.Constraint

Normalization constraint.

Parameters
axis (int) – Which axis of input matrix A to apply normalization across.
fix (list) – Keep fix-axes as-is and normalize the remaining axes based on the residual of the fixed axes.
set_zeros_to_feature (int) – Set all samples which sum-to-zero across axis to 1 for a particular feature (see Notes)
copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)
Notes
For set_zeros_to_feature, assuming the data represents concentration with a matrix [n_samples, n_features] and the axis is across the features, every sample that sums to 0 across axis would be replaced with a vector [n_features] of zeros, except at set_zeros_to_feature, which would equal 1. I.e., this pixel is now the pure substance of index value set_zeros_to_feature.
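A minimal usage sketch for ConstraintNorm, assuming it performs sum-to-one normalization along axis (values are illustrative):

import numpy as np
from pymcr.constraints import ConstraintNorm

C = np.array([[1.0, 3.0],
              [2.0, 2.0]])

# Normalize each sample (row) along the last axis; copy=True preserves C
C_norm = ConstraintNorm(axis=-1, copy=True).transform(C)
# Assuming sum-to-one normalization: C_norm -> [[0.25, 0.75], [0.5, 0.5]]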
class pymcr.constraints.ConstraintCutBelow(value=0, axis_sumnz=None, exclude=None, exclude_axis=-1, copy=False)
Bases: pymcr.constraints._CutExclude

Cut values below (and not-equal to) a certain threshold.

Parameters
class pymcr.constraints.ConstraintCutAbove(value=0, axis_sumnz=None, exclude=None, exclude_axis=-1, copy=False)
Bases: pymcr.constraints._CutExclude

Cut values above (and not-equal to) a certain threshold.

Parameters

transform(A)
Apply cut-above value constraint
class pymcr.constraints.ConstraintCompressBelow(value=0, copy=False)
Bases: pymcr.constraints.Constraint

Compress values below (and not-equal to) a certain threshold (set to value)

Parameters
class pymcr.constraints.ConstraintCompressAbove(value=0, copy=False)
Bases: pymcr.constraints.Constraint

Compress values above (and not-equal to) a certain threshold (set to value)

Parameters
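To contrast the cut and compress families, a small sketch; it assumes ConstraintCutBelow sets offending entries to 0, whereas ConstraintCompressBelow sets them to value as described above:

import numpy as np
from pymcr.constraints import ConstraintCutBelow, ConstraintCompressBelow

A = np.array([0.05, 0.4, 0.8])

# Cut: entries below the threshold are assumed to be set to 0
cut = ConstraintCutBelow(value=0.1, copy=True).transform(A)
# cut -> [0. , 0.4, 0.8]

# Compress: entries below the threshold are set to the threshold value itself
comp = ConstraintCompressBelow(value=0.1, copy=True).transform(A)
# comp -> [0.1, 0.4, 0.8]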
class pymcr.constraints.ConstraintReplaceZeros(axis=-1, feature=None, fval=1, copy=False)
Bases: pymcr.constraints.Constraint

Samples that sum-to-zero across axis are replaced with a vector of 0’s, except for a 1 at feature if a single value is given. In a concentration context, e.g., samples with no concentration are replaced with 100% concentration of a set feature. If multiple features are given, equal amounts of each feature (summing to 1) are used.

Parameters
axis (int) – Which axis of input matrix A to apply the replacement across.
feature (int, list) – Set all samples which sum-to-zero across axis to fval for a particular feature (or fractional amounts of fval for multiple features).
fval (float) – Value of summation across axis of replacement vector.
copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)
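A short sketch of the replacement behavior described above; the feature index and data values are illustrative:

import numpy as np
from pymcr.constraints import ConstraintReplaceZeros

C = np.array([[0.0, 0.0, 0.0],
              [0.2, 0.3, 0.5]])

# The first sample sums to zero across the last axis, so it is replaced
# with 100% "concentration" of feature 1 (fval=1)
C_new = ConstraintReplaceZeros(axis=-1, feature=1, fval=1, copy=True).transform(C)
# C_new -> [[0., 1., 0.], [0.2, 0.3, 0.5]]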
class pymcr.constraints.ConstraintPlanarize(target, shape, use_vals_above=None, use_vals_below=None, lims_to_plane=True, scaler=None, recalc_scaler=False, copy=False)
Bases: pymcr.constraints.Constraint

Set a particular target to a plane

Parameters
target (int, list, tuple) – Target numbers to set to a fitted plane
use_vals_above (float) – Only calculate based on values above (not including)
use_vals_below (float) – Only calculate based on values below (not including)
lims_to_plane (bool) – The returned plane will be limited to the range of the optionally supplied use_vals_below, use_vals_above.
scaler (float) – A large value that is much bigger than any values in the input array. Needed to ensure SVD properly creates plane. If None, auto-calculates.
recalc_scaler (bool) – Auto-calculate for every new input (does not use previously provided or calculated value)
copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)
Notes
This uses an SVD to calculate the vector normal to the plane that fits the input data. It assumes that the 3rd singular vector is the normal; thus, the x and y vectors for the data need to be larger than the variance of the input data. Scaler enables this by scaling the auto-generated x and y vectors to be much larger than the max-min of the input data.
pymcr.mcr module
MCR Main Class for Computation
class pymcr.mcr.McrAR(c_regr=<pymcr.regressors.OLS object>, st_regr=<pymcr.regressors.OLS object>, c_fit_kwargs={}, st_fit_kwargs={}, c_constraints=[<pymcr.constraints.ConstraintNonneg object>], st_constraints=[<pymcr.constraints.ConstraintNonneg object>], max_iter=50, err_fcn=<function mse>, tol_increase=0.0, tol_n_increase=10, tol_err_change=None, tol_n_above_min=10)
Bases: object

Multivariate Curve Resolution - Alternating Regression

D = CS^T

Parameters
c_regr (str, class) – Instantiated regression class (or string, see Notes) for calculating the C matrix
st_regr (str, class) – Instantiated regression class (or string, see Notes) for calculating the S^T matrix
c_fit_kwargs (dict) – kwargs sent to c_regr.fit method
st_fit_kwargs (dict) – kwargs sent to st_regr.fit method
c_constraints (list) – List of constraints applied to calculation of C matrix
st_constraints (list) – List of constraints applied to calculation of S^T matrix
max_iter (int) – Maximum number of iterations. One iteration calculates both C and S^T
err_fcn (function) – Function to calculate error/differences after each least squares calculation (i.e., twice per iteration). Outputs to err attribute.
tol_increase (float) – Factor increase to allow in err attribute. Set to 0 for no increase allowed. E.g., setting to 1.0 means the err can double per iteration.
tol_n_increase (int) – Number of consecutive iterations for which the err attribute can increase
tol_err_change (float) – If err changes less than tol_err_change, per iteration, break.
tol_n_above_min (int) – Number of half-iterations that can be performed without reaching a new error-minimum
err
List of calculated errors (from err_fcn) after each least squares (i.e., twice per iteration)
Type: list

C_
Most recently calculated C matrix (that did not cause a tolerance failure)
Type: ndarray [n_samples, n_targets]

ST_
Most recently calculated S^T matrix (that did not cause a tolerance failure)
Type: ndarray [n_targets, n_features]

C_opt_
[Optimal] C matrix for lowest err attribute
Type: ndarray [n_samples, n_targets]

ST_opt_
[Optimal] S^T matrix for lowest err attribute
Type: ndarray [n_targets, n_features]

exit_max_iter_reached
Exited iterations due to maximum number of iterations reached (max_iter parameter)
Type: bool

exit_tol_increase
Exited iterations due to maximum fractional increase in error metric (via err_fcn)
Type: bool

exit_tol_n_increase
Exited iterations due to maximum number of consecutive increases in error metric (via err_fcn)
Type: bool

exit_tol_err_change
Exited iterations due to error metric change that is smaller than tol_err_change
Type: bool

exit_tol_n_above_min
Exited iterations due to maximum number of half-iterations for which the error metric increased above the minimum error
Type: bool
Notes
Built-in regressor classes (str can be used): OLS (ordinary least squares), NNLS (non-negatively constrained least squares). See mcr.regressors.
Built-in regressor methods can be given as a string to c_regr, st_regr; though instantiating an imported class gives more flexibility.
Setting any tolerance to None turns that check off
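A typical usage sketch, mirroring the string-regressor and constraint options above; D and ST_guess are random placeholders for a real data matrix and an initial spectral estimate:

import numpy as np
from pymcr.mcr import McrAR
from pymcr.constraints import ConstraintNonneg, ConstraintNorm

# Placeholders: D is [n_samples, n_features], ST_guess is [n_targets, n_features]
D = np.random.rand(100, 50)
ST_guess = np.random.rand(2, 50)

mcrar = McrAR(max_iter=100, st_regr='NNLS', c_regr='OLS',
              c_constraints=[ConstraintNonneg(), ConstraintNorm()],
              st_constraints=[ConstraintNonneg()])
mcrar.fit(D, ST=ST_guess)

# Lowest-error solutions found during the iterations
C_opt = mcrar.C_opt_
ST_opt = mcrar.ST_opt_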
property D_
D matrix with current C and S^T matrices
property D_opt_
D matrix with optimal C and S^T matrices
_check_regr(mth)
Check regressor method. If an acceptable string, instantiate and return the object. If an instantiated class, make sure it has a fit attribute.
fit(D, C=None, ST=None, st_fix=None, c_fix=None, c_first=True, verbose=False, post_iter_fcn=None, post_half_fcn=None)
Perform MCR-AR. D = CS^T. Solve for C and S^T iteratively.

Parameters
D (ndarray) – D matrix
C (ndarray) – Initial C matrix estimate. Only provide initial C OR S^T.
ST (ndarray) – Initial S^T matrix estimate. Only provide initial C OR S^T.
st_fix (list) – The spectral component numbers to keep fixed.
c_fix (list) – The concentration component numbers to keep fixed.
c_first (bool) – Calculate C first when both C and ST are provided. c_fix and st_fix must also be provided in this circumstance.
verbose (bool) – Log iteration and per-least squares err results. See Notes.
post_iter_fcn (function) – Function to perform after each iteration
post_half_fcn (function) – Function to perform after half-iteration
Notes
pyMCR (>= 0.3.1) uses the native Python logging module rather than print statements; thus, to see the messages, one will need to log-to-file or stream to stdout. More info is available in the docs.
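For example, one way to stream these log messages to stdout using the standard logging module; the 'pymcr' logger name is assumed here:

import logging
import sys

# Route pyMCR's log messages to stdout so verbose fit() output is visible
logger = logging.getLogger('pymcr')
logger.setLevel(logging.DEBUG)

stdout_handler = logging.StreamHandler(stream=sys.stdout)
stdout_handler.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(stdout_handler)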
property n_features
Number of features

property n_samples
Number of samples

property n_targets
Number of targets
pymcr.regressors module
Built-in least squares / regression methods.
All models will follow the formalism, AX = B, solve for X.
NOTE: coef_ will be X.T, which is the formalism that scikit-learn follows
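A short sketch of the AX = B formalism with the built-in OLS regressor (documented below); shapes and values are illustrative:

import numpy as np
from pymcr.regressors import OLS

A = np.random.rand(10, 3)    # e.g., [n_samples, n_targets]
X_true = np.random.rand(3, 5)
B = A @ X_true

reg = OLS()
reg.fit(A, B)

# coef_ is X.T (scikit-learn formalism), so transpose to recover X
print(np.allclose(reg.coef_.T, X_true))  # -> True (within numerical precision)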
class pymcr.regressors.LinearRegression
Bases: abc.ABC

Abstract class for linear regression methods
property coef_
The transposed form of X. This is the formalism of scikit-learn
class pymcr.regressors.NNLS(*args, **kwargs)
Bases: pymcr.regressors.LinearRegression

Non-negative constrained least squares regression

AX = B, solve for X (coefficients.T)
coef_
Regression coefficients
Type: ndarray

residual_
Residual (sum-of-squares)
Type: ndarray
Notes
This is simply a wrapped version of NNLS (scipy.optimize.nnls).
coef_ is X.T, which is the formalism of scikit-learn
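A similar sketch for NNLS, whose coefficients are constrained to be non-negative; the matrices are illustrative:

import numpy as np
from pymcr.regressors import NNLS

A = np.random.rand(20, 3)
B = A @ np.array([[0.7, 0.0],
                  [0.3, 1.0],
                  [0.0, 0.5]])

reg = NNLS()
reg.fit(A, B)

# Every entry of coef_ (X.T) is >= 0 by construction
print((reg.coef_ >= 0).all())  # -> True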
class pymcr.regressors.OLS(*args, **kwargs)
Bases: pymcr.regressors.LinearRegression

Ordinary least squares regression

AX = B, solve for X (coefficients.T)
coef_
Regression coefficients (X.T)
Type: ndarray

residual_
Residual (sum-of-squares)
Type: ndarray

svs_
Singular values of matrix A
Type: ndarray
Notes
This is simply a wrapped version of Ordinary Least Squares (scipy.linalg.lstsq).
coef_ is X.T, which is the formalism of scikit-learn
Module contents
pyMCR: Pythonic Multivariate Curve Resolution - Alternating Least Squares