pymcr package

Submodules

pymcr.condition module

Functions to condition / preprocess data

pymcr.condition.standardize(X, mean_ctr=True, with_std=True, axis=-1, copy=True)[source]

Standardization of data

Parameters
  • X (ndarray) – Data array

  • mean_ctr (bool) – Mean-center data

  • with_std (bool) – Normalize by the standard deviation of the data

  • axis (int) – Axis from which to calculate mean and standard deviation

  • copy (bool) – Copy data (X) if True, overwrite if False
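
The parameters above can be illustrated with a small NumPy sketch. It mirrors the documented signature only; the name standardize_sketch is hypothetical, and this is not pyMCR's own code.

```python
import numpy as np

def standardize_sketch(X, mean_ctr=True, with_std=True, axis=-1, copy=True):
    """Illustrative stand-in that follows the documented parameters."""
    # Work on a float copy unless asked to overwrite (X must then be a float array)
    X = np.array(X, dtype=float, copy=True) if copy else X
    if mean_ctr:
        X -= X.mean(axis=axis, keepdims=True)   # mean-center along axis
    if with_std:
        X /= X.std(axis=axis, keepdims=True)    # scale to unit standard deviation
    return X

X = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]])
Z = standardize_sketch(X)   # X itself is left unchanged because copy=True
```

A row with zero standard deviation would divide by zero here; a real implementation would need to guard against that.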

pymcr.constraints module

Built-in constraints

All classes need a transform method. Note that, unlike sklearn, transform can copy or overwrite its input depending on the copy attribute.

class pymcr.constraints.Constraint(copy=True)[source]

Bases: abc.ABC

Abstract class for constraints

Parameters

copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

_abc_impl = <_abc_data object>
abstract transform(A)[source]

Transform A input based on constraint

class pymcr.constraints.ConstraintNonneg(copy=False)[source]

Bases: pymcr.constraints.Constraint

Non-negativity constraint. All negative entries made 0.

Parameters

copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

_abc_impl = <_abc_data object>
transform(A)[source]

Apply nonnegative constraint
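
The copy-versus-overwrite behavior of transform can be sketched as follows. NonnegSketch is a hypothetical stand-in written for illustration, not the pyMCR class itself.

```python
import numpy as np

class NonnegSketch:
    """Hypothetical constraint: negative entries are set to 0."""

    def __init__(self, copy=False):
        self.copy = copy

    def transform(self, A):
        if self.copy:
            return np.maximum(A, 0)      # returns a new array, A is untouched
        np.maximum(A, 0, out=A)          # overwrites A in place
        return A

A = np.array([[-1.0, 2.0], [3.0, -4.0]])
out_copy = NonnegSketch(copy=True).transform(A)      # A still holds negatives here
out_inplace = NonnegSketch(copy=False).transform(A)  # A itself is now modified
```

Because the built-in constraints default to copy=False, callers who need the original array afterwards should pass copy=True or copy the array themselves.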

class pymcr.constraints.ConstraintCumsumNonneg(axis=-1, copy=False)[source]

Bases: pymcr.constraints.Constraint

Cumulative-summation non-negativity constraint. All negative entries made 0.

Parameters

copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

_abc_impl = <_abc_data object>
transform(A)[source]

Apply cumsum nonnegative constraint

class pymcr.constraints.ConstraintZeroEndPoints(axis=-1, span=1, copy=False)[source]

Bases: pymcr.constraints.Constraint

Enforce that the endpoints (or the mean over a range) are zero

Parameters
  • copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

  • axis (int) – Axis to operate on

  • span (int) – Number of pixels along the ends to average.

_abc_impl = <_abc_data object>
transform(A)[source]

Apply zero-endpoints constraint

class pymcr.constraints.ConstraintZeroCumSumEndPoints(nodes=None, axis=-1, copy=False)[source]

Bases: pymcr.constraints.Constraint

Enforce that the endpoints of the cumsum (or the mean over a range) are near-zero. Note: this is an approximation.

Parameters
  • copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

  • nodes (list of int) – In addition to end-points, other points to ensure are approximately 0

  • axis (int) – Axis to operate on

  • span (int) – Number of pixels along the ends to average.

_abc_impl = <_abc_data object>
transform(A)[source]

Apply zero-cumsum-endpoints constraint

class pymcr.constraints.ConstraintNorm(axis=-1, fix=None, copy=False)[source]

Bases: pymcr.constraints.Constraint

Normalization constraint.

Parameters
  • axis (int) – Which axis of input matrix A to apply normalization across.

  • fix (list) – Keep fix-axes as-is and normalize the remaining axes based on the residual of the fixed axes.

  • set_zeros_to_feature (int) – Set all samples which sum-to-zero across axis to 1 for a particular feature (See Notes)

  • copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

Notes

  • For set_zeros_to_feature, assuming the data represents concentration with a matrix [n_samples, n_features] and the axis is across the features, every sample that sums to 0 across axis is replaced with a vector [n_features] of zeros except at set_zeros_to_feature, which equals 1. I.e., this pixel is now pure substance of index value set_zeros_to_feature.
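
The normalization and the zero-sum replacement described in the note can be sketched for the 2D, axis=-1 case. The helper normalize_rows_sketch is hypothetical and ignores the fix parameter; it only illustrates the documented behavior.

```python
import numpy as np

def normalize_rows_sketch(A, set_zeros_to_feature=None):
    """Scale each row to sum to 1; optionally make zero-sum rows one-hot."""
    A = np.array(A, dtype=float)                  # always works on a copy
    zero_rows = A.sum(axis=-1) == 0
    if set_zeros_to_feature is not None:
        A[zero_rows] = 0.0
        A[zero_rows, set_zeros_to_feature] = 1.0  # "pure" sample of that feature
    sums = A.sum(axis=-1, keepdims=True)
    sums[sums == 0] = 1.0                         # leave remaining zero rows as zeros
    return A / sums

C = np.array([[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]])
C_norm = normalize_rows_sketch(C, set_zeros_to_feature=1)
```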

_abc_impl = <_abc_data object>
transform(A)[source]

Apply normalization constraint

class pymcr.constraints.ConstraintCutBelow(value=0, axis_sumnz=None, exclude=None, exclude_axis=-1, copy=False)[source]

Bases: pymcr.constraints._CutExclude

Cut values below (and not-equal to) a certain threshold.

Parameters
  • value (float) – Cutoff value

  • axis_sumnz (int) – If not None, the cut below value is applied only where the sum across the specified axis would not go to 0 (i.e., where not all values would be cut).

  • exclude (int, list, tuple, ndarray) – Exclude targets

  • exclude_axis (int) – Along which axis to enumerate targets

  • copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

_abc_impl = <_abc_data object>
transform(A)[source]

Apply cut-below value constraint

class pymcr.constraints.ConstraintCutAbove(value=0, axis_sumnz=None, exclude=None, exclude_axis=-1, copy=False)[source]

Bases: pymcr.constraints._CutExclude

Cut values above (and not-equal to) a certain threshold

Parameters
  • value (float) – Cutoff value

  • axis_sumnz (int) – If not None, the cut above value is applied only where the sum across the specified axis would not go to 0 (i.e., where not all values would be cut).

  • exclude (int, list, tuple, ndarray) – Exclude targets

  • exclude_axis (int) – Along which axis to enumerate targets

  • copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

_abc_impl = <_abc_data object>
transform(A)[source]

Apply cut-above value constraint

class pymcr.constraints.ConstraintCompressBelow(value=0, copy=False)[source]

Bases: pymcr.constraints.Constraint

Compress values below (and not-equal to) a certain threshold (set to value)

Parameters
  • value (float) – Cutoff value

  • copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

_abc_impl = <_abc_data object>
transform(A)[source]

Apply compress-below value constraint
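
The difference between the cut and compress families can be shown on a small array. This sketch assumes that "cut" sets the offending entries to 0, whereas "compress" sets them to value as stated above; it uses plain NumPy rather than the pyMCR classes.

```python
import numpy as np

A = np.array([-2.0, 0.5, 1.0, 3.0])
value = 1.0

# Entries equal to value are never changed ("and not-equal to")
cut_below = np.where(A < value, 0.0, A)        # assumed: cut -> 0
compress_below = np.where(A < value, value, A) # compress -> value
cut_above = np.where(A > value, 0.0, A)
compress_above = np.where(A > value, value, A)
```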

class pymcr.constraints.ConstraintCompressAbove(value=0, copy=False)[source]

Bases: pymcr.constraints.Constraint

Compress values above (and not-equal to) a certain threshold (set to value)

Parameters
  • value (float) – Cutoff value

  • copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

_abc_impl = <_abc_data object>
transform(A)[source]

Apply compress-above value constraint

class pymcr.constraints.ConstraintReplaceZeros(axis=-1, feature=None, fval=1, copy=False)[source]

Bases: pymcr.constraints.Constraint

Samples that sum-to-zero across axis are replaced with a vector of 0’s except for a 1 at feature if a single value. In a concentration context, e.g., samples with no concentration are replaced with 100% concentration of a set feature. If multiple features given, equal amounts of each feature (summing to 1) are used.

Parameters
  • axis (int) – Which axis of input matrix A to apply normalization across.

  • feature (int, list, tuple) – Set all samples which sum-to-zero across axis to fval for a particular feature, or to an equal fraction of fval for each of multiple features.

  • fval (float) – Value of summation across axis of replacement vector.

  • copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

_abc_impl = <_abc_data object>
transform(A)[source]

Apply constraint
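
A minimal sketch of the replacement rule for the 2D, axis=-1 case, covering both a single feature and several features. The helper replace_zero_rows_sketch is hypothetical and is not the library's implementation.

```python
import numpy as np

def replace_zero_rows_sketch(A, feature, fval=1.0):
    """Rows summing to 0 get fval split equally over the given feature(s)."""
    A = np.array(A, dtype=float)               # always works on a copy
    features = np.atleast_1d(feature)
    rows = np.flatnonzero(A.sum(axis=-1) == 0)
    A[rows] = 0.0
    A[rows[:, None], features] = fval / features.size
    return A

A = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
single = replace_zero_rows_sketch(A, feature=1)
multi = replace_zero_rows_sketch(A, feature=[0, 2])
```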

class pymcr.constraints.ConstraintPlanarize(target, shape, use_vals_above=None, use_vals_below=None, lims_to_plane=True, scaler=None, recalc_scaler=False, copy=False)[source]

Bases: pymcr.constraints.Constraint

Set a particular target to a plane

Parameters
  • target (int, list, tuple) – Target numbers to set to a fitted plane

  • shape (tuple, list) – Shape of array (M,N) which is (Y,X)

  • use_vals_above (float) – Only calculate based on values above (not including)

  • use_vals_below (float) – Only calculate based on values below (not including)

  • lims_to_plane (bool) – The returned plane will be limited to the range of the optionally supplied use_vals_below and use_vals_above.

  • scaler (float) – A large value that is much bigger than any values in the input array. Needed to ensure SVD properly creates plane. If None, auto-calculates.

  • recalc_scaler (bool) – Auto-calculate for every new input (does not use previously provided or calculated value)

  • copy (bool) – Make copy of input data, A; otherwise, overwrite (if mutable)

Notes

  • This uses an SVD to calculate the vector normal to the plane that fits the input data. It assumes that the 3rd singular vector is the normal; thus, the x and y vectors for the data need to be larger than the variance of the input data. Scaler enables this by scaling the auto-generated x and y vectors to be much larger than the max-min of the input data.
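
The SVD idea in the note can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name fit_plane_svd, the default choice of scaler, and the symmetric x/y grids are all hypothetical.

```python
import numpy as np

def fit_plane_svd(Z, scaler=None):
    """Fit a plane to a 2D array Z of shape (M, N) via the 3rd singular vector."""
    M, N = Z.shape
    if scaler is None:
        # Assumed heuristic: make x and y span much more than the data range
        scaler = 10.0 * (Z.max() - Z.min() + 1.0)
    y, x = np.meshgrid(np.linspace(-1, 1, M) * scaler,
                       np.linspace(-1, 1, N) * scaler, indexing='ij')
    pts = np.column_stack([x.ravel(), y.ravel(), Z.ravel()])
    centroid = pts.mean(axis=0)
    # Right singular vectors are sorted by decreasing spread; the last one
    # is the direction of least spread, i.e. the plane normal
    _, _, Vt = np.linalg.svd(pts - centroid, full_matrices=False)
    nx, ny, nz = Vt[2]
    # Solve normal . (p - centroid) = 0 for z at every grid point
    return centroid[2] - (nx * (x - centroid[0]) + ny * (y - centroid[1])) / nz

rows, cols = np.mgrid[0:5, 0:7]
Z = 2.0 * rows + 0.5 * cols + 3.0   # exactly planar test data
plane = fit_plane_svd(Z)
```

If x and y were not scaled well above the spread of Z, the least-spread direction could lie within the x-y plane instead, which is the failure mode the scaler parameter is meant to avoid.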

_abc_impl = <_abc_data object>
_setup_xy(scaler)[source]
transform(A)[source]

Set targets, t, to fit planes

pymcr.mcr module

MCR Main Class for Computation

class pymcr.mcr.McrAR(c_regr=<pymcr.regressors.OLS object>, st_regr=<pymcr.regressors.OLS object>, c_fit_kwargs={}, st_fit_kwargs={}, c_constraints=[<pymcr.constraints.ConstraintNonneg object>], st_constraints=[<pymcr.constraints.ConstraintNonneg object>], max_iter=50, err_fcn=<function mse>, tol_increase=0.0, tol_n_increase=10, tol_err_change=None, tol_n_above_min=10)[source]

Bases: object

Multivariate Curve Resolution - Alternating Regression

D = CS^T

Parameters
  • c_regr (str, class) – Instantiated regression class (or string, see Notes) for calculating the C matrix

  • st_regr (str, class) – Instantiated regression class (or string, see Notes) for calculating the S^T matrix

  • c_fit_kwargs (dict) – kwargs sent to c_regr.fit method

  • st_fit_kwargs (dict) – kwargs sent to st_regr.fit method

  • c_constraints (list) – List of constraints applied to calculation of C matrix

  • st_constraints (list) – List of constraints applied to calculation of S^T matrix

  • max_iter (int) – Maximum number of iterations. One iteration calculates both C and S^T

  • err_fcn (function) – Function to calculate error/differences after each least squares calculation (i.e., twice per iteration). Outputs to err attribute.

  • tol_increase (float) – Factor increase to allow in err attribute. Set to 0 for no increase allowed. E.g., setting to 1.0 means the err can double per iteration.

  • tol_n_increase (int) – Number of consecutive iterations for which the err attribute can increase

  • tol_err_change (float) – If err changes less than tol_err_change, per iteration, break.

  • tol_n_above_min (int) – Number of half-iterations that can be performed without reaching a new error-minimum

err

List of calculated errors (from err_fcn) after each least squares (i.e., twice per iteration)

Type

list

C_

Most recently calculated C matrix (that did not cause a tolerance failure)

Type

ndarray [n_samples, n_targets]

ST_

Most recently calculated S^T matrix (that did not cause a tolerance failure)

Type

ndarray [n_targets, n_features]

C_opt_

[Optimal] C matrix for lowest err attribute

Type

ndarray [n_samples, n_targets]

ST_opt_

[Optimal] ST matrix for lowest err attribute

Type

ndarray [n_targets, n_features]

n_iter

Total number of iterations performed

Type

int

n_features

Total number of features, e.g. spectral frequencies.

Type

int

n_samples

Total number of samples (e.g., pixels)

Type

int

n_targets

Total number of targets (e.g., pure analytes)

Type

int

n_iter_opt

Iteration when optimal C and ST calculated

Type

int

exit_max_iter_reached

Exited iterations due to maximum number of iterations reached (max_iter parameter)

Type

bool

exit_tol_increase

Exited iterations due to maximum fractional increase in error metric (via err_fcn)

Type

bool

exit_tol_n_increase

Exited iterations due to maximum number of consecutive increases in error metric (via err_fcn)

Type

bool

exit_tol_err_change

Exited iterations due to error metric change that is smaller than tol_err_change

Type

bool

exit_tol_n_above_min

Exited iterations due to maximum number of half-iterations for which the error metric increased above the minimum error

Type

bool

Notes

  • Built-in regressor classes (str can be used): OLS (ordinary least squares), NNLS (non-negatively constrained least squares). See pymcr.regressors.

  • Built-in regressor methods can be given as a string to c_regr, st_regr; though instantiating an imported class gives more flexibility.

  • Setting any tolerance to None turns that check off
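
The alternating scheme behind D = CS^T can be sketched with plain NumPy. This is an illustrative sketch, not McrAR's implementation: the name mcr_ar_sketch is hypothetical, ordinary least squares followed by clipping stands in for the configurable regressors and constraints, and none of the tolerance checks are included.

```python
import numpy as np

def mcr_ar_sketch(D, ST_init, max_iter=50):
    """Alternate between solving for C and S^T; record the error twice per iteration."""
    ST = np.array(ST_init, dtype=float)
    err = []
    for _ in range(max_iter):
        # Half-iteration 1: D = C @ ST  <=>  ST.T @ C.T = D.T, solve for C.T
        C = np.linalg.lstsq(ST.T, D.T, rcond=None)[0].T
        C = np.clip(C, 0, None)                  # non-negativity on C
        err.append(np.mean((D - C @ ST) ** 2))
        # Half-iteration 2: C @ ST = D, solve for ST
        ST = np.linalg.lstsq(C, D, rcond=None)[0]
        ST = np.clip(ST, 0, None)                # non-negativity on S^T
        err.append(np.mean((D - C @ ST) ** 2))
    return C, ST, err

# Synthetic two-component data (hypothetical example values)
rng = np.random.default_rng(0)
C_true = rng.uniform(0.1, 1.0, size=(20, 2))
x = np.linspace(0, 1, 30)
ST_true = np.vstack([np.exp(-(x - 0.3) ** 2 / 0.01),
                     np.exp(-(x - 0.7) ** 2 / 0.01)])
D = C_true @ ST_true

C, ST, err = mcr_ar_sketch(D, ST_true, max_iter=5)
```

The err list grows by two entries per iteration, which matches the "twice per iteration" wording used for the err attribute.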

property D_

D matrix with current C and S^T matrices

property D_opt_

D matrix with optimal C and S^T matrices

_check_regr(mth)[source]

Check regressor method. If acceptable strings, instantiate and return object. If instantiated class, make sure it has a fit attribute.

_ismin_err(val)[source]

Is the current error the minimum

fit(D, C=None, ST=None, st_fix=None, c_fix=None, c_first=True, verbose=False, post_iter_fcn=None, post_half_fcn=None)[source]

Perform MCR-AR. D = CS^T. Solve for C and S^T iteratively.

Parameters
  • D (ndarray) – D matrix

  • C (ndarray) – Initial C matrix estimate. Only provide initial C OR S^T.

  • ST (ndarray) – Initial S^T matrix estimate. Only provide initial C OR S^T.

  • st_fix (list) – The spectral component numbers to keep fixed.

  • c_fix (list) – The concentration component numbers to keep fixed.

  • c_first (bool) – Calculate C first when both C and ST are provided. c_fix and st_fix must also be provided in this circumstance.

  • verbose (bool) – Log iteration and per-least squares err results. See Notes.

  • post_iter_fcn (function) – Function to perform after each iteration

  • post_half_fcn (function) – Function to perform after half-iteration

Notes

  • pyMCR (>= 0.3.1) uses the native Python logging module rather than print statements; thus, to see the messages, one will need to log-to-file or stream to stdout. More info is available in the docs.
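
One way to stream these log messages to stdout with the standard logging module is sketched below. It assumes that pyMCR's loggers are registered under the name 'pymcr'; the DEBUG level and the message-only format are illustrative choices.

```python
import logging
import sys

# Assumption: pyMCR modules log under the 'pymcr' logger hierarchy
logger = logging.getLogger('pymcr')
logger.setLevel(logging.DEBUG)                       # let all messages through

handler = logging.StreamHandler(stream=sys.stdout)   # stream to stdout
handler.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(handler)
```

Swapping the StreamHandler for a logging.FileHandler would send the same messages to a file instead.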

property n_features

Number of features

property n_samples

Number of samples

property n_targets

Number of targets

pymcr.metrics module

Metrics used in pyMCR

All functions must take C, ST, D_actual, D_calculated

pymcr.metrics.mse(C, ST, D_actual, D_calculated)[source]

Mean square error
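
A sketch of two functions that follow the required four-argument signature. mse_sketch is a plain-NumPy illustration of a mean square error, and max_abs_err is a hypothetical custom metric; neither is taken from the library source.

```python
import numpy as np

def mse_sketch(C, ST, D_actual, D_calculated):
    """Mean of the squared element-wise differences."""
    return np.mean((D_actual - D_calculated) ** 2)

def max_abs_err(C, ST, D_actual, D_calculated):
    """Hypothetical alternative metric: largest absolute deviation."""
    return np.max(np.abs(D_actual - D_calculated))

D_actual = np.array([[1.0, 2.0], [3.0, 4.0]])
D_calculated = np.array([[1.0, 2.0], [3.0, 6.0]])
```

C and ST go unused in both functions but stay in the signature, so that a metric with this shape could be passed wherever an err_fcn is expected.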

pymcr.regressors module

Built-in least squares / regression methods.

All models will follow the formalism, AX = B, solve for X.

NOTE: coef_ will be X.T, which is the formalism that scikit-learn follows
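
The AX = B convention and the transposed coef_ can be illustrated with a minimal least-squares class. LstsqSketch is a hypothetical stand-in built on numpy.linalg.lstsq; it is not one of the built-in regressors.

```python
import numpy as np

class LstsqSketch:
    """Solve AX = B for X and expose X.T as coef_ (scikit-learn style)."""

    def fit(self, A, B):
        self.X_, *_ = np.linalg.lstsq(A, B, rcond=None)
        return self

    @property
    def coef_(self):
        return self.X_.T

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
X_true = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
B = A @ X_true

reg = LstsqSketch().fit(A, B)   # reg.coef_ has shape (3, 2), i.e. X.T
```

In terms of D = CS^T, solving for S^T given C corresponds to A = C, B = D, so coef_ holds S; solving for C given S^T corresponds to A = S, B = D^T, so coef_ holds C directly.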

class pymcr.regressors.LinearRegression[source]

Bases: abc.ABC

Abstract class for linear regression methods

_abc_impl = <_abc_data object>
property coef_

The transposed form of X. This is the formalism of scikit-learn

abstract fit(A, B)[source]

AX = B, solve for X

class pymcr.regressors.NNLS(*args, **kwargs)[source]

Bases: pymcr.regressors.LinearRegression

Non-negative constrained least squares regression

AX = B, solve for X (coefficients.T)

coef_

Regression coefficients

Type

ndarray

residual_

Residual (sum-of-squares)

Type

ndarray

Notes

This is simply a wrapped version of NNLS (scipy.optimize.nnls).

coef_ is X.T, which is the formalism of scikit-learn

_abc_impl = <_abc_data object>
fit(A, B)[source]

Solve for X: AX = B

class pymcr.regressors.OLS(*args, **kwargs)[source]

Bases: pymcr.regressors.LinearRegression

Ordinary least squares regression

AX = B, solve for X (coefficients.T)

coef_

Regression coefficients (X.T)

Type

ndarray

residual_

Residual (sum-of-squares)

Type

ndarray

rank_

Effective rank of matrix A

Type

int

svs_

Singular values of matrix A

Type

ndarray

Notes

This is simply a wrapped version of Ordinary Least Squares (scipy.linalg.lstsq).

coef_ is X.T, which is the formalism of scikit-learn

_abc_impl = <_abc_data object>
fit(A, B)[source]

Solve for X: AX = B

Module contents

pyMCR: Pythonic Multivariate Curve Resolution - Alternating Least Squares