Models for Gaussian process regression (gp_models)#

Classes:

DerivativeKernel(kernel_expr, obs_dims[, ...])

Creates a differentiable kernel based on a sympy expression.

HetGaussianNoiseGP(data[, noise_kernel])

EXPERIMENTAL! NOT INTENDED FOR USE, BUT USEFUL FOR FUTURE WORK!

FullyHeteroscedasticGPR(data, kernel[, ...])

EXPERIMENTAL! NOT INTENDED FOR USE, BUT USEFUL FOR FUTURE WORK!

HetGaussianSimple(cov[, init_scale])

NOT MAINTAINED, MAY BE OUT OF DATE AND NOT COMPATIBLE.

HetGaussianDeriv(cov, obs_dims[, p, s, ...])

Heteroscedastic Gaussian likelihood with variance provided and no modeling of noise variance.

HeteroscedasticGPR_analytical_scale(data, kernel)

EXPERIMENTAL! NOT INTENDED FOR USE, BUT MAYBE INTERESTING TO CONSIDER IN FUTURE!

HeteroscedasticGPR(data, kernel[, ...])

Implements a GPR model with heteroscedastic input noise (full noise covariance matrix).

ConstantMeanWithDerivs(y_data)

Constant mean function that takes derivative-augmented X as input.

LinearWithDerivs(x_data, y_data)

Linear mean function that can be applied to derivative data: the 0th-order derivative is fit with a linear function, so the 1st derivative must also be shifted by a constant equal to the slope.

SympyMeanFunc(expr, x_data, y_data[, params])

Mean function based on sympy expression.

Functions:

multioutput_multivariate_normal(x, mu, L)

Follows gpflow.logdensities.multivariate_normal exactly, but changes reducing sums so that multiple outputs with DIFFERENT covariance matrices can be taken into account.

class thermoextrap.gpr_active.gp_models.DerivativeKernel(kernel_expr, obs_dims, kernel_params=None, active_dims=None, **kwargs)[source]#

Bases: Kernel

Creates a differentiable kernel based on a sympy expression.

Given observations that are tagged with the order of the derivative, builds the appropriate kernel. Be warned that your kernel_expr will not be checked to make sure it is positive definite, stationary, etc. There are rules for kernel_expr and kernel_params that guarantee consistency. First, the variable names supplied as keys to kernel_params should match the symbol names in kernel_expr. Symbol names for the inputs should be 'x1' and 'x2' (ignoring case). For multidimensional kernels, the dimensions of 'x1' and 'x2' should be indexed as 'x1_0', 'x1_1', ... and 'x2_0', 'x2_1', etc. These will be identified from the provided expression and sorted to guarantee a specific ordering when taking derivatives.

Parameters:
  • kernel_expr (Expr) – Expression for the kernel that can be differentiated - must have at least 2 symbols (if there are only 2, the symbol names should be 'x1' and 'x2', case insensitive).

  • obs_dims (int) – Number of dimensions for the observable input (each input row should have twice this many columns: obs_dims location values followed by obs_dims derivative labels)

  • kernel_params (mapping) – A dictionary of kernel parameters that can be optimized by tensorflow (the key should be the parameter name and the value a list containing the starting value followed by a dict of kwargs for gpflow.Parameter, e.g., {'variance': [1.0, {'transform': gpflow.utilities.positive()}]}; if you don't want to set any kwargs, just pass an empty dictionary). NOTE THAT THE KEYS MUST MATCH THE SYMBOL NAMES IN kernel_expr OTHER THAN 'x1' AND 'x2'. Default is an empty dict, in which case parameter names are mined from kernel_expr and all parameters are set to 1.0.
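
For concreteness, a minimal construction sketch is shown below; the squared-exponential-style expression and the parameter names 'var' and 'l' are illustrative choices, not requirements of the class:

    import sympy as sp
    import gpflow
    from thermoextrap.gpr_active.gp_models import DerivativeKernel

    # 1D squared-exponential-style expression; 'x1' and 'x2' are the required input
    # symbols, while 'var' and 'l' are illustrative parameter names that must match
    # the keys of kernel_params.
    x1, x2, var, ell = sp.symbols("x1 x2 var l")
    expr = var * sp.exp(-0.5 * (x1 - x2) ** 2 / ell**2)

    kernel_params = {
        "var": [1.0, {"transform": gpflow.utilities.positive()}],
        "l": [1.0, {"transform": gpflow.utilities.positive()}],
    }
    kernel = DerivativeKernel(expr, obs_dims=1, kernel_params=kernel_params)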

Attributes:

ard

Whether ARD behavior is active, following gpflow.kernels.Stationary

property ard#

Whether ARD behavior is active, following gpflow.kernels.Stationary

class thermoextrap.gpr_active.gp_models.HetGaussianNoiseGP(data, noise_kernel=None, **kwargs)[source]#

Bases: ScalarLikelihood

EXPERIMENTAL! NOT INTENDED FOR USE, BUT USEFUL FOR FUTURE WORK!

Intended to model the noise associated with a GPR model using another GP contained within the likelihood. In other words, the likelihood, which usually describes the distribution for the added noise, is based on a GP that predicts the noise based on a specific input location, allowing for heteroscedastic noise modeling. Typically, you will want to actually model the logarithm of the noise variance as a function of the input, but this likelihood is more general than that.

Specifically, the GP over noise is self.noise_GP, and is a standard gpflow.models.GPR model with a kernel specified by noise_kernel. If not provided, the default kernel used is a Matern52 with separate lengthscales over the different input dimensions.

class thermoextrap.gpr_active.gp_models.FullyHeteroscedasticGPR(data, kernel, mean_function=None, noise_kernel=None)[source]#

Bases: GPModel, InternalDataTrainingLossMixin

EXPERIMENTAL! NOT INTENDED FOR USE, BUT USEFUL FOR FUTURE WORK!

Implements a fully heteroscedastic GPR model in which the noise is modeled with another Gaussian Process. To accomplish this, the likelihood is set to contain a simple GPR model that predicts the logarithm of the noise based on noise estimates passed into the model. The full likelihood involves that of both the outer heteroscedastic GPR using the predicted noise values and the GP on the noise, as proposed by Binois et al. (2018). However, since we do not want to model the "full N" data (i.e., all of the outputs for each sim configuration), but instead just the means from each simulation (guaranteed to be Gaussian by the CLT), we really follow the protocol of Ankenman et al. (2010), but allow noise in the GP over noise so that smoothing is applied. And, as mentioned above, both likelihoods are combined, following Binois et al. (2018), rather than being fit separately.

The input X data just has to match whatever kernel function is used. For the input Y data, there must be three columns: (1) the values to model, (2) the variance associated with each value, and (3) the number of sim frames or configurations used to calculate the provided value and variance.
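
As an illustration of that Y layout (the numbers below are placeholders, not real data), the three columns might be assembled as:

    import numpy as np

    # Column 1: simulation means, column 2: their variances,
    # column 3: number of frames used for each estimate.
    means = np.array([0.5, 1.2, 2.0])
    variances = np.array([0.01, 0.02, 0.05])
    n_frames = np.array([1000, 1000, 500])
    Y = np.column_stack([means, variances, n_frames])  # shape (N, 3)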

Methods:

maximum_log_likelihood_objective()

Objective for maximum likelihood estimation.

predict_f(Xnew[, full_cov, full_output_cov])

See gpflow.models.GPModel.predict_f() for further details.

predict_y(Xnew[, full_cov, full_output_cov])

See gpflow.models.GPModel.predict_y() for further details.

predict_log_density(data[, full_cov, ...])

Compute the log of the probability density of the data at the new data points.

maximum_log_likelihood_objective()[source]#

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Returns:

  • return has shape [].

predict_f(Xnew, full_cov=False, full_output_cov=False)[source]#

See gpflow.models.GPModel.predict_f() for further details.

predict_y(Xnew, full_cov=False, full_output_cov=False)[source]#

See gpflow.models.GPModel.predict_y() for further details.

predict_log_density(data, full_cov=False, full_output_cov=False)[source]#

Compute the log of the probability density of the data at the new data points.

Parameters:

data

  • data[0] has shape [batch…, N, D].

  • data[1] has shape [batch…, N, P].

Returns:

  • return has shape [batch…, N].

class thermoextrap.gpr_active.gp_models.HetGaussianSimple(cov, init_scale=1.0, **kwargs)[source]#

Bases: ScalarLikelihood

NOT MAINTAINED, MAY BE OUT OF DATE AND NOT COMPATIBLE.

Heteroscedastic Gaussian likelihood with variance provided and no modeling of noise variance. Note that the noise variance can be provided as a matrix or a 1D array. If a 1D array, the off-diagonal elements of the noise covariance matrix are assumed to be zero; otherwise the full noise covariance is used. For diagonal elements, it would make sense to also provide this information as an additional column in the target outputs, Y. However, this is not possible when a full covariance matrix is provided, since some of the noise values may be correlated, as for derivatives at the same input location, X, measured from the same simulation. Just be careful to make sure the shapes of Y and F (predicted GP mean values) match the shape of the provided covariance matrix: if the matrix is NxN, each of Y and F should have length N.

Methods:

build_scaled_cov_mat()

Creates scaled covariance matrix using noise scale parameters.

build_scaled_cov_mat()[source]#

Creates scaled covariance matrix using noise scale parameters.

thermoextrap.gpr_active.gp_models.multioutput_multivariate_normal(x, mu, L)[source]#

Follows gpflow.logdensities.multivariate_normal exactly, but changes reducing sums so that multiple outputs with DIFFERENT covariance matrices can be taken into account. This still assumes that data in different columns of x are independent, but allows a different Cholesky decomposition for each column or dimension. In the GPflow code, everything would work if x.T[…, None] were supplied along with an L having a leading batch dimension of the same size as the last dimension of x, EXCEPT that the last tf.reduce_sum over the diagonal part of L would sum over all of the independent matrices, which we do not want. This could all be accomplished with a loop over dimensions and separate applications of multivariate_normal, but hopefully this parallelizes.

Parameters:
  • x (array) – Shape N x D, where N is the number of input locations and D is the dimensionality

  • mu (array) – Shape N x D, or broadcastable to N x D; the mean values

  • L (array) – Shape D x N x N; Cholesky decompositions of D independent covariance matrices

Returns:

p (array) – Shape (D,). Vector of log probabilities for each dimension, summed over input locations. Since the covariance matrices are independent across dimensions but convey covariances across locations, it makes sense to sum over locations, as one would for a multivariate Gaussian over each dimension.
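
A small shape-checking sketch of the call, using placeholder random data and identity covariances for each output dimension:

    import numpy as np
    import tensorflow as tf
    from thermoextrap.gpr_active.gp_models import multioutput_multivariate_normal

    N, D = 5, 2
    x = tf.constant(np.random.randn(N, D))
    mu = tf.zeros((N, D), dtype=x.dtype)
    # One N x N covariance (and hence one Cholesky factor) per output dimension.
    covs = np.stack([np.eye(N) for _ in range(D)])
    L = tf.constant(np.linalg.cholesky(covs))         # shape (D, N, N)
    logp = multioutput_multivariate_normal(x, mu, L)  # shape (D,)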

class thermoextrap.gpr_active.gp_models.HetGaussianDeriv(cov, obs_dims, p=10.0, s=0.0, transform_p=<tfp.bijectors.Softplus 'softplus' batch_shape=[] forward_min_event_ndims=0 inverse_min_event_ndims=0 dtype_x=? dtype_y=?>, transform_s=None, constrain_p=False, constrain_s=True, **kwargs)[source]#

Bases: ScalarLikelihood

Heteroscedastic Gaussian likelihood with variance provided and no modeling of noise variance.

Note that the noise variance can be provided as a matrix or a 1D array. If a 1D array, the off-diagonal elements of the noise covariance matrix are assumed to be zero; otherwise the full noise covariance is used. For diagonal elements, it would make sense to also provide this information as an additional column in the target outputs, Y. However, this is not possible when a full covariance matrix is provided, since some of the noise values may be correlated, as for derivatives at the same input location, X, measured from the same simulation. Just be careful to make sure the shapes of Y and F (predicted GP mean values) match the shape of the provided covariance matrix: if the matrix is NxN, each of Y and F should have length N.

Additionally, takes derivative orders of each input point. This model by default will scale noise differently for different derivative orders, effectively assuming that uncertainty is likely to be estimated incorrectly at some orders and accurately at others.

This likelihood will not learn a full model of the noise, but it does allow a scaling of the provided noise to be learned. The idea is to add a trainable parameter that indicates how much to "trust" the given noise and scales it accordingly. For the scaling model, we effectively model the logarithm of each element in the covariance matrix:

\[\ln {\rm cov}_{i,j} = \ln {\rm cov}_{i,j,0} + p \sum_k \left(d_{i,k} + d_{j,k}\right) + s\]

or

\[{\rm cov}_{i,j} = {\rm cov}_{i,j,0} \exp\!\left[ p \sum_k \left(d_{i,k} + d_{j,k}\right) \right] \exp(s)\]

Note that the summation over derivative orders (the index k above) runs over all of the input dimensions (i.e., if the input is 3D, we sum over three derivative orders).

We can accomplish the above while keeping the scaled covariance matrix positive semidefinite by making the scaling matrix diagonal with positive entries. If we then take S*Cov*S, with S being the diagonal scaling matrix with positive entries, the result will be positive semidefinite because S is positive definite and Cov is positive semidefinite.

The scaling matrix is given by \(S_{i,j} = \exp(s + p\, d_i)\) if \(i = j\) and 0 otherwise, where \(d_i\) is the (summed) derivative order of point i. While the parameters s and p could be left unconstrained, the default sets s = 0 and p >= 0. This means that we CANNOT decrease the uncertainty, only increase it. Further, if we increase the uncertainty, we must do so MORE for higher-order derivatives.

The rationale is that it is only a really big deal if we underestimate the uncertainty. Further, derivatives tend to have more numerical issues, bias, etc. Even if the derivatives are actually more certain, we typically want to focus on fitting the function itself, not the derivatives. In that case, p can effectively be set to zero, which will emphasize the derivatives more.
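
The following is a conceptual numpy sketch of this diagonal scaling, not the likelihood's actual code; the covariance values and derivative orders are placeholders:

    import numpy as np

    # cov0: provided noise covariance; d: summed derivative order for each data point.
    cov0 = np.array([[1.0, 0.3],
                     [0.3, 2.0]])
    d = np.array([0.0, 1.0])          # e.g., a value and its first derivative
    p, s = 10.0, 0.0
    S = np.diag(np.exp(s + p * d))    # diagonal scaling matrix with positive entries
    scaled_cov = S @ cov0 @ S         # S*Cov*S remains positive semidefinite
    print(np.all(np.linalg.eigvalsh(scaled_cov) >= -1e-12))  # True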

Parameters:
  • cov (array) – (fixed) covariance matrix (or its diagonal) for the uncertainty (noise) in the data

  • obs_dims (int) – number of dimensions in the input/observation, X; the first obs_dims columns of X will be treated as input locations while the final obs_dims columns will be the derivative orders of each data point (see DerivativeKernel)

  • p (float, default 10.0) – scaling of the covariance matrix dependent on derivative order

  • s (float, default 0.0) – scaling of the covariance matrix independent of derivative order

  • transform_p (object, optional) – Transformation applied to p during training of the GP model; defaults to gpflow.utilities.positive(), i.e., p is required to be positive

  • transform_s (object, optional) – transformation of s during GP model training

  • constrain_p (bool, default False) – whether p should be constrained (held fixed and not altered) during GP model training

  • constrain_s (bool, default True) – whether or not to constrain s during GP model training

  • **kwargs – Extra keyword arguments passed to gpflow.likelihoods.ScalarLikelihood

Methods:

build_scaled_cov_mat(X)

Creates scaled covariance matrix using noise scale parameters

build_scaled_cov_mat(X)[source]#

Creates scaled covariance matrix using noise scale parameters

class thermoextrap.gpr_active.gp_models.HeteroscedasticGPR_analytical_scale(data, kernel, mean_function=None, scale_fac=None)[source]#

Bases: GPModel, InternalDataTrainingLossMixin

EXPERIMENTAL! NOT INTENDED FOR USE, BUT MAYBE INTERESTING TO CONSIDER IN FUTURE!

Implements a GPR model with heteroscedastic input noise, which can be just a vector (diagonal noise covariance matrix) or the full noise covariance matrix if noise is correlated within some of the input data. The latter is useful for derivatives from the same simulation at the same input location. The covariance matrix is expected to be the third element of the input data tuple (X, Y, noise_cov).

Methods:

maximum_log_likelihood_objective()

Objective for maximum likelihood estimation.

predict_f(Xnew[, full_cov, full_output_cov])

See gpflow.models.GPModel.predict_f() for further details.

predict_y(Xnew[, full_cov, full_output_cov])

See gpflow.models.GPModel.predict_y() for further details.

predict_log_density(data[, full_cov, ...])

Compute the log of the probability density of the data at the new data points.

maximum_log_likelihood_objective()[source]#

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Returns:

  • return has shape [].

predict_f(Xnew, full_cov=False, full_output_cov=False)[source]#

See gpflow.models.GPModel.predict_f() for further details.

predict_y(Xnew, full_cov=False, full_output_cov=False)[source]#

See gpflow.models.GPModel.predict_y() for further details.

predict_log_density(data, full_cov=False, full_output_cov=False)[source]#

Compute the log of the probability density of the data at the new data points.

Parameters:

data

  • data[0] has shape [batch…, N, D].

  • data[1] has shape [batch…, N, P].

Returns:

  • return has shape [batch…, N].

class thermoextrap.gpr_active.gp_models.HeteroscedasticGPR(data, kernel, mean_function=None, scale_fac=1.0, likelihood_kwargs=None)[source]#

Bases: GPModel, InternalDataTrainingLossMixin

Implements a GPR model with heteroscedastic input noise (full noise covariance matrix).

The full covariance matrix is necessary for derivatives from the same simulation at the same input location, which will likely be correlated. If the output is multidimensional, a separate covariance matrix may be specified for each dimension of the output; if this is not the case, the same covariance matrix will be used for all output dimensions. The consequence of this structure is that the model is independent across output dimensions, which means that, for multidimensional output, a gpflow shared or separate independent multioutput kernel should be used to wrap whatever kernel has been specified. If it is detected that the kernel does not satisfy this property, the model will attempt to appropriately wrap the specified kernel.

The covariance matrix is expected to be the third element of the input data tuple (X, Y, noise_cov). Specific shapes should be X.shape == (N, 2*D_x), Y.shape == (N, D_y), and noise_cov.shape == (D_y, N, N) or (N, N), where N is the number of input locations, D_x is the input dimensionality, and D_y is the output dimensionality. Note that the first D_x columns of X are the locations and the next D_x columns are the derivative orders (with respect to the corresponding input dimension) of the observation at that location. As an example, for a single observation (row of X or Y), X may be [0.5, 0.5, 1.0, 3.0], indicating that at the point (0.5, 0.5), the corresponding observation in Y is a 1st partial derivative with respect to the first X dimension and a 3rd partial derivative with respect to the second (see the construction sketch after the parameter list).

Parameters:
  • data (tuple) – A tuple (or list) of the input locations, output data, and noise covariance matrix, in that order

  • kernel (DerivativeKernel object) – The kernel to use; must be DerivativeKernel or compatible subclass expecting derivative information provided in extra columns of the input locations

  • mean_function (callable(), optional) – Mean function to be used (probably should be one that handles inputs including the derivative order)

  • scale_fac (float, default 1.0) – scaling factor on the output data; can apply to each dimension separately if an array; helpful to ensure all output dimensions have similar variance

  • likelihood_kwargs (dict, optional) – Dictionary of keyword arguments to pass to the HetGaussianDeriv likelihood model used by this GP model
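
A minimal construction sketch, assuming a 1D input, a scalar output, and an (N, N) noise covariance over the N derivative observations; the data values, the squared-exponential-style kernel expression, and the parameter names 'var' and 'l' are all illustrative placeholders:

    import numpy as np
    import sympy as sp
    import gpflow
    from thermoextrap.gpr_active.gp_models import DerivativeKernel, HeteroscedasticGPR

    # Observations and first derivatives at two locations; the first column of X is
    # the input location and the second column is the derivative order.
    X = np.array([[0.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 0.0],
                  [1.0, 1.0]])
    Y = np.array([[0.5], [1.2], [2.0], [1.1]])
    noise_cov = np.diag([0.01, 0.05, 0.01, 0.05])  # (N, N) noise covariance

    x1, x2, var, ell = sp.symbols("x1 x2 var l")
    expr = var * sp.exp(-0.5 * (x1 - x2) ** 2 / ell**2)
    kernel = DerivativeKernel(expr, obs_dims=1,
                              kernel_params={"var": [1.0, {}], "l": [1.0, {}]})

    model = HeteroscedasticGPR((X, Y, noise_cov), kernel=kernel)
    gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)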

Methods:

maximum_log_likelihood_objective()

Objective for maximum likelihood estimation.

predict_f(Xnew[, full_cov, full_output_cov])

See gpflow.models.GPModel.predict_f() for further details.

predict_y(Xnew[, full_cov, full_output_cov])

See gpflow.models.GPModel.predict_y() for further details.

predict_log_density(data[, full_cov, ...])

Compute the log of the probability density of the data at the new data points.

maximum_log_likelihood_objective()[source]#

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Returns:

  • return has shape [].

predict_f(Xnew, full_cov=False, full_output_cov=False)[source]#

See gpflow.models.GPModel.predict_f() for further details.

predict_y(Xnew, full_cov=False, full_output_cov=False)[source]#

See gpflow.models.GPModel.predict_y() for further details.

predict_log_density(data, full_cov=False, full_output_cov=False)[source]#

Compute the log of the probability density of the data at the new data points.

Parameters:

data

  • data[0] has shape [batch…, N, D].

  • data[1] has shape [batch…, N, P].

Returns:

  • return has shape [batch…, N].

class thermoextrap.gpr_active.gp_models.ConstantMeanWithDerivs(y_data)[source]#

Bases: MeanFunction

Constant mean function that takes derivative-augmented X as input. The constant is applied only to zeroth-order derivative entries. Because only a constant is added, including this mean function changes neither the variance nor the derivatives.

Parameters:

y_data (array-like) – The data for which the mean should be taken
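
A conceptual sketch of the behavior (not the class implementation): the constant appears only for rows whose derivative order is zero.

    import numpy as np

    X = np.array([[0.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 0.0]])                 # columns: location, derivative order
    const = 2.5                                # mean of the zeroth-order y_data
    mean = np.where(X[:, 1:] == 0, const, 0.0) # shape (N, 1); zero for derivative rows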

Methods:

__call__(X)

Call self as a function.

__call__(X)[source]#

Call self as a function.

class thermoextrap.gpr_active.gp_models.LinearWithDerivs(x_data, y_data)[source]#

Bases: MeanFunction

Linear mean function that can be applied to derivative data: the 0th-order derivative is fit with a linear function, so the 1st derivative must also be shifted by a constant equal to the slope. Currently handles y of multiple dimensions, but scalar output only (so it fits a hyperplane). Columns of y_data should be the different dimensions while rows are observations. A conceptual sketch follows the parameter list.

Parameters:
  • x_data (array-like) – input locations of data points

  • y_data (array-like) – output data to learn linear function for based on input locations
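
The conceptual sketch below illustrates the idea for a 1D input (it is not the class implementation): fit a line to the zeroth-order data, then use the slope as the mean for first-derivative rows (higher-order derivatives of a linear function are zero).

    import numpy as np

    X = np.array([[0.0, 0.0],
                  [0.0, 1.0],
                  [2.0, 0.0],
                  [2.0, 1.0]])                  # columns: location, derivative order
    y = np.array([1.0, 0.6, 2.0, 0.4])
    zeroth = X[:, 1] == 0
    slope, intercept = np.polyfit(X[zeroth, 0], y[zeroth], 1)   # fit zeroth-order data
    mean = np.where(zeroth, slope * X[:, 0] + intercept, slope)  # slope for 1st derivs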

Methods:

__call__(X)

Call self as a function.

__call__(X)[source]#

Call self as a function.

class thermoextrap.gpr_active.gp_models.SympyMeanFunc(expr, x_data, y_data, params=None)[source]#

Bases: MeanFunction

Mean function based on sympy expression. This allows derivatives to be taken up to any order. In the provided expression, the input variable must be named 'x' or 'X'; otherwise this will not work. For consistency with the other mean functions, the fit is based only on zeroth-order data rather than being performed during training of the full GP model. params is an optional dictionary specifying starting parameter values.

Parameters:
  • expr (Expr) – sympy expression representing the functional form of the mean function.

  • x_data (array-like) – the input locations of the data

  • y_data (array-like) – the output values of the data to fit the mean function to

  • params (dict, optional) – dictionary specifying starting parameter values for the mean function; in other words, these values will be substituted into the sympy expression to start with
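
A minimal construction sketch; the linear expression and the parameter names 'a' and 'b' are illustrative, 'x' is the required input-variable name, and plain input locations (without derivative-order columns) are assumed for x_data:

    import numpy as np
    import sympy as sp
    from thermoextrap.gpr_active.gp_models import SympyMeanFunc

    x, a, b = sp.symbols("x a b")
    expr = a * x + b                            # illustrative linear mean function
    x_data = np.linspace(0.0, 1.0, 5)[:, None]  # placeholder data
    y_data = 2.0 * x_data + 0.5
    mean_fn = SympyMeanFunc(expr, x_data, y_data, params={"a": 1.0, "b": 0.0})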

Methods:

__call__(X)

Closely follows K_diag from DerivativeKernel.

__call__(X)[source]#

Closely follows K_diag from DerivativeKernel.