OptBayesExpt class

class optbayesexpt.obe_base.OptBayesExpt(measurement_model, setting_values, parameter_samples, constants, n_draws=30, choke=None, use_jit=True, utility_method='variance_approx', selection_method='optimal', pickiness=15, default_noise_std=1.0, **kwargs)[source]

Bases: ParticlePDF

An implementation of sequential Bayesian experiment design.

OptBayesExpt is a manager that calculates strategies for efficient measurement runs. OptBayesExpt incorporates measurement data, and uses that information to select settings for measurements with high predicted benefit / cost ratios.

The use cases are situations where the goal is to find the parameters of a parametric model.

The primary functions of this class are to interpret measurement data and to calculate effective settings. The corresponding methods that perform these functions are OptBayesExpt.pdf_update() for interpretation of new data and either OptBayesExpt.opt_setting() or OptBayesExpt.good_setting() for calculation of effective settings.

Instances of OptBayesExpt may be used for cases where

  1. Reported measurement data includes measurement uncertainty.

  2. Every measurement is assumed to cost the same amount.

  3. The measurement noise is assumed to be constant, independent of parameters and settings.

OptBayesExpt may be inherited by child classes to allow additional flexibility. Examples in the demos folder show several extensions including unknown noise, and setting-dependent costs.

Arguments:
  • measurement_model (function) – Evaluates the experimental model from (settings, parameters, constants) arguments, returning single values or arrays depending on the arguments. The model_function is very similar to the fit function in a least-squares regression. The model_function() must allow evaluation in both of the following forms:

    • model_function(tuple_of_single_settings, tuple_of_parameter_arrays, tuple_of_constants), returning an array with the same size as one of the parameter arrays.

    • model_function(tuple_of_setting_arrays, tuple_of_single_parameters, tuple_of_constants), returning an array with the same size as one of the setting arrays.

    The broadcasting feature of numpy arrays provides a convenient way to write this type of function for simple analytical models.

    Version 1.1.0 and later support model functions that return multiple output channels, e.g. real and imaginary parts, or vectors expressed as tuples, lists or arrays. The number of output channels, n_channels, is deduced by evaluating the measurement model function.

  • setting_values (tuple of ndarray) – Each array in the setting_values tuple contains the allowed discrete values of a measurement setting. Applied voltage, excitation frequency, and a knob that goes to eleven are all examples of settings. For computational speed, it is important to keep setting arrays appropriately sized. Settings arrays that cover unused setting values, or that use overly fine discretization will slow the calculations. Settings that are held constant belong in the constants array.

  • parameter_samples (tuple of ndarray) – In a simple example model, \(y = m * x + b\), the parameters are \(m\) and \(b\). Each array in the parameter_samples tuple contains samples from the prior distribution of a parameter. Traditionally, the prior is described as expressing the state of belief about the parameter value before measurement, so the prior can be used to include results of other measurements. For a mostly independent measurement, the prior samples should cover the full range of plausible values. Parameters that can be assumed constant belong in the constants array.

  • constants (tuple of float) – Model constants. Examples include experimental settings that are rarely changed, and model parameters that are well-known from previous measurement results.
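As a sketch of the two required call forms (using a hypothetical linear model, not one from the package), numpy broadcasting lets a single function serve both:

```python
import numpy as np

def model_function(settings, parameters, constants):
    """Toy linear model y = m * x + b, illustrating both call forms.

    Broadcasting handles (single setting, parameter arrays) and
    (setting arrays, single parameters) with the same code.
    """
    (x,) = settings      # one setting: x
    m, b = parameters    # two parameters: slope and intercept
    return m * x + b

# Form 1: one setting, arrays of parameter samples
m_samples = np.array([0.9, 1.0, 1.1])
b_samples = np.array([-0.1, 0.0, 0.1])
y1 = model_function((2.0,), (m_samples, b_samples), ())
# y1 has the shape of the parameter arrays: (3,)

# Form 2: an array of settings, one parameter sample
x_values = np.linspace(0.0, 1.0, 5)
y2 = model_function((x_values,), (1.0, 0.0), ())
# y2 has the shape of the settings array: (5,)
```

For simple analytical models like this, no special handling is needed; broadcasting resolves both argument patterns automatically.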

Keyword Arguments:
  • n_draws (int) – specifies the number of parameter samples used in the utility calculation. Default 30.

  • choke (float) – If choke is specified, the likelihood will be raised to the choke power. Occasionally, simulated measurement runs will “get stuck,” and converge to incorrect parameter values. The choke argument provides a heuristic fix for better reliability at the expense of speed. For values 0.0 < choke < 1.0 choking reduces the max/min ratio of the likelihood and allows more data to influence the parameter distribution between resampling events. Default None.

  • use_jit (Boolean) – If numba is installed, pre-compile the likelihood calculation for faster execution. Arg use_jit is also passed as a keyword arg to ParticlePDF. Default True.

  • utility_method (string) – ['variance_approx' | 'pseudo_utility' | 'full_kld_utility' | 'max_min']: Specifies the utility algorithm as described in [1]. With 'max_min', n_draws=2 is recommended. Default 'variance_approx'.

  • selection_method (string) – ['optimal' | 'good' | 'random']: Specifies how the setting is selected based on the utility. If 'optimal', the setting at maximum utility is selected. If 'good', the utility is raised to a power given by the pickiness parameter and normalized. The setting is selected with probability proportional to utility ** pickiness. If 'random', the utility is disregarded and the setting is chosen randomly from the allowed settings.

  • pickiness (float) – When selection_method is 'good', this parameter affects the probability of picking a setting near a maximum of the utility function. Default 15.

  • default_noise_std (float or ndarray) – Measurement noise standard deviation used in utility calculations. If float, the value populates entries of an \(n_{channels} \times 1\) ndarray, where \(n_{channels}\) is the number of measurement channels, e.g. 2 if data is collected from the \(X\) and \(Y\) outputs of an instrument. If an \(n_{channels} \times 1\) ndarray, entries are noise standard deviations corresponding to the measurement channels.

  • **kwargs – Keyword arguments passed to the parent ParticlePDF class.

Attributes:

N_DRAWS

Stores the n_draws argument.

Type:

int

allsettings

Arrays containing all possible combinations of the setting values provided in the setting_values argument.

Type:

list of ndarray

choke

Stores the choke argument.

Type:

float

cons

Stores the constants argument.

Type:

tuple of float

cost_estimate()[source]

A stub for estimating the cost of prospective measurements

An estimate of the cost of measurement resources (e.g. setup time + data collection time). This estimate goes in the denominator of the utility function, yielding a benefit/cost ratio. Returns a single float if the cost is the same for all settings, or an array with dimensions of self.setting_indices.

Returns:

(float or ndarray) Default: 1.0.

default_noise_std

A noise level estimate for each channel, used in setting selection by y_var_noise_model().

Type:

ndarray

enforce_parameter_constraints()[source]

A stub for enforcing constraints on parameters

for example:

# find the particles with disallowed parameter values
# (negative parameter values in this example)
bad_ones = np.argwhere(self.parameters[3] < 0)
for index in bad_ones:
    # setting a weight = 0 effectively eliminates the particle
    self.particle_weights[index] = 0
# renormalize
self.particle_weights = self.particle_weights / np.sum(self.particle_weights)

eval_over_all_parameters(onesettingset)[source]

Evaluates the experimental model.

Evaluates the model for one combination of measurement settings and all parameter combinations in self.parameters. Called by pdf_update() for likelihood() and Bayesian inference processing of measurement results.

This method and eval_over_all_settings() both call model_function(), but with different argument types. If the broadcasting properties of numpy arrays are not able to resolve this polymorphism, this method may be replaced by a separate method for model evaluation.

Parameters:

onesettingset (tuple of float) – a single set of measurement settings

Returns:

(ndarray) array of model values with dimensions of one element of self.allparams.

eval_over_all_settings(oneparamset)[source]

Evaluates the experimental model.

Evaluates the model for all combinations of measurement settings in self.allsettings and one set of parameters. Called N_DRAWS times by yvar_from_parameter_draws() as part of the utility() calculation.

Parameters:

oneparamset (tuple of float) – a set of single model parameter values.

Returns:

(ndarray) array of model values with dimensions self.setting_indices.

get_setting()[source]

Selects settings for the next measurement.

A wrapper for the method selected by the selection_method argument. See opt_setting(), good_setting() and random_setting().

Returns:

A settings tuple.

good_setting(pickiness=None)[source]

Calculate a setting with a good utility

Selects settings by weighted random selection, using the utility function to calculate weights. The weight function is utility() raised to the pickiness power. In comparison to the opt_setting() method, which selects only the very best setting, good_setting() yields a more diverse series of settings. Selected by the selection_method='good' argument.

Parameters:

pickiness (float) – A setting selection tuning parameter. pickiness=0 produces random settings. With pickiness values greater than about 10, the behavior is similar to opt_setting().

Returns:

A settings tuple.
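The weighted selection described above can be sketched in plain numpy (the utility values here are hypothetical; the package computes them internally):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical utility values over 5 candidate settings
utility = np.array([0.1, 0.2, 1.0, 0.9, 0.3])
pickiness = 15

# Weight each setting by utility ** pickiness, then normalize
weights = utility ** pickiness
weights = weights / weights.sum()

# Weighted random draw of a setting index
index = rng.choice(len(utility), p=weights)
```

Raising the utility to a high pickiness power concentrates the weight near the utility maximum while still allowing occasional exploration of other settings.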

last_setting_index

The most recent setting choice as an index into the allsettings arrays.

Type:

int

likelihood(y_model, measurement_record)[source]

Calculates the likelihood of a measurement result.

For each parameter combination, estimate the probability of obtaining the results provided in measurement_record. This default method relies on several assumptions:

  • The uncertainty in measurement results is well-described by normally-distributed (Gaussian) noise.

  • The standard deviation of the noise, \(\sigma\), is known.

Under these assumptions, and model values \(y_{model}\) as a function of parameters, the likelihood is a Gaussian function proportional to \(\sigma^{-1} \exp [-(y_{model} - y_{meas})^2 / (2 \sigma^2)]\).

Parameters:
  • y_model (ndarray) – model_function() results evaluated for all parameters.

  • measurement_record (tuple) –

    The measurement conditions and results, supplied by the user to update_pdf(). The elements of measurement_record are:

    • settings (tuple)

    • measurement value (float or tuple)

    • std uncertainty (float or tuple)

Returns:

an array of probabilities corresponding to the parameters in self.allparameters.
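Under these assumptions, the likelihood calculation can be sketched as follows (hypothetical single-channel values, not package code):

```python
import numpy as np

# Hypothetical model outputs, one per parameter sample (particle)
y_model = np.array([0.8, 1.0, 1.2, 1.4])
y_meas = 1.05   # the measured value
sigma = 0.1     # reported measurement std uncertainty

# Gaussian likelihood, proportional to
# sigma**-1 * exp(-(y_model - y_meas)**2 / (2 * sigma**2))
likelihood = np.exp(-((y_model - y_meas) ** 2) / (2 * sigma ** 2)) / sigma
```

Parameter samples whose model predictions fall close to the measured value receive large likelihoods; distant predictions are strongly suppressed.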

measurement_results

Records of accumulated measurement results for output to data files and/or plotting.

Type:

list

model_function

Equal to the measurement_model argument above.

Type:

function

n_channels

The number of measurement values per experiment, e.g. 2 for an experiment that reports two voltages. Deduced from model outputs.

Type:

int

opt_setting()[source]

Find the setting with maximum utility

Selects settings based on the maximum value of the utility. Calls utility() for an estimate of the benefit/cost ratio for all allowed settings, and returns the settings corresponding to the maximum value. Selected by the selection_method='optimal' argument.

Returns:

A settings tuple.
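The maximum-utility selection reduces to an argmax over the setting grid, sketched here with hypothetical utility values and a one-axis setting grid:

```python
import numpy as np

# Hypothetical utility over a flattened grid of candidate settings
utility = np.array([0.2, 0.7, 1.3, 0.9])
allsettings = (np.array([1.0, 2.0, 3.0, 4.0]),)  # one setting axis

# Optimal selection: the index of maximum utility, mapped back to
# setting values on each axis
best = np.argmax(utility)
setting = tuple(axis[best] for axis in allsettings)
# setting == (3.0,)
```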

parameters

The most recent set of parameter samples drawn from the parameter distribution. self.parameters is a view of ParticlePDF.particles.

Type:

ndarray of ndarray

pdf_update(measurement_record, y_model_data=None)[source]

Refines the parameters’ probability distribution function given a measurement result.

This is where measurement results are entered. An implementation of Bayesian inference, this method uses the model to calculate the likelihood of obtaining the measurement result as a function of parameter values, and uses that likelihood to generate a refined posterior (after-measurement) distribution from the prior (pre-measurement) parameter distribution.

Warning

OptBayesExpt requires the input data to contain good estimates of measurement uncertainty. The uncertainty values entered here can influence both mean values and widths of the inferred parameter distribution. When measurement uncertainty is not well-known, OptBayesExptNoiseParameter is recommended to determine measurement uncertainty from the measured values.

Parameters:
  • measurement_record (tuple) –

    The measurement conditions and results, supplied by the user to update_pdf(). The elements of measurement_record are:

    • settings (tuple): the settings used for the measurement. May be different from the requested settings.

    • measurement result (float or tuple): use a tuple for multi-channel measurements.

    • std uncertainty (float or tuple): an uncertainty estimate for the measurement result.

  • y_model_data (ndarray) – The result of self.eval_over_all_parameters(). This argument allows model evaluation to run before measurement data is available, e.g. while measurements are being made. Default = None.
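At its core, the inference step reweights the parameter samples by the likelihood. A minimal sketch with hypothetical weights and likelihoods (the package implementation also handles choking, multi-channel data, and resampling):

```python
import numpy as np

# Hypothetical: 4 particles with equal prior weights, and the
# likelihood of the new measurement evaluated at each particle
particle_weights = np.full(4, 0.25)
likelihood = np.array([0.1, 1.0, 0.5, 0.1])

# Bayes rule: posterior weight is proportional to prior * likelihood
particle_weights = particle_weights * likelihood
particle_weights = particle_weights / particle_weights.sum()
```

After the update, particles whose model predictions agree with the data carry more weight, concentrating the distribution around plausible parameter values.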

pickiness

Stores the pickiness argument

Type:

float

random_setting()[source]

Pick a random setting for the next measurement

Randomly selects a setting from all possible setting combinations. Selected by selection_method='random' argument.

Returns:

A settings tuple.

set_n_draws(n_draws=None)[source]

Sets OptBayesExpt.N_DRAWS attribute.

Sets or queries the number of parameter samples to use in the utility calculation.

Parameters:

n_draws (int or 'default' or None) – An integer argument sets N_DRAWS, 'default' sets the default value of 30, and with no argument, set_n_draws() simply returns the current value.

Returns: N_DRAWS

setting_indices

Indices into the allsettings arrays. Used in opt_setting() and good_setting().

Type:

ndarray of int

setting_values

A record of the setting_values argument.

Type:

tuple of ndarray

utility()[source]

Estimate the utility as a function of setting options

The utility \(U(d)\) is the predicted benefit/cost ratio of proposed measurement designs \(d\).

Note

Traditionally, utility is given in terms of a change in the information entropy. However, information entropy is a logarithmic quantity, and we are accustomed to thinking about cost on a linear scale. To facilitate estimating benefit/cost, the utility algorithms below return a ‘linearized’ utility: \(\exp(U(d)) - 1.0\)

The utility() function is a wrapper for the algorithm selected by the utility_method argument.

Returns: linearized utility

utility_full_kld()[source]

Estimate the utility as a function of settings.

Used in selecting measurement settings. The utility is the predicted benefit/cost ratio of a new measurement where the benefit is given in terms of a change in the information entropy of the parameter distribution. This algorithm corresponds to the “full-KLD algorithm” of [1].

Among the provided utility algorithms, utility_full_kld() comes closest to the information-theoretic analytical result.

Returns:

Approximate utility as an ndarray with dimensions of self.setting_indices.

utility_max_min()[source]

Estimate utility using the max-min algorithm

This algorithm corresponds to the “max-min algorithm” of [1].

In this algorithm, the spread of modeled outputs is estimated from the maximum and minimum of outputs produced by N_DRAWS samples of the parameter distribution; the variance of the measurement noise is calculated separately.

This algorithm provides slightly lower quality setting choices than the other utility algorithms, but it executes very fast. Speed and quality of choices are both best when N_DRAWS = 2.

Returns:

Linearized utility as an ndarray with dimensions of self.setting_indices.

utility_pseudo()[source]

Estimate the utility as a function of settings.

Used in selecting measurement settings. The utility is the predicted benefit/cost ratio of a new measurement where the benefit is given in terms of a change in the information entropy of the parameter distribution. This algorithm corresponds to the “pseudo-H algorithm” of [1], and it is included here mostly for historical reasons.

In this algorithm, the idea is to mimic the utility_full_kld() algorithm more closely than utility_variance() does. We calculate the differential entropy of the model outputs produced by N_DRAWS samples of the parameter distribution. We then compute the variance of a normal (Gaussian) distribution that has the same information entropy. This effective variance is combined with the noise variance as in utility_variance().

Returns:

Approximate utility as an ndarray with dimensions of self.setting_indices.

utility_variance()[source]

Estimate the utility as a function of settings.

The utility is the predicted benefit/cost ratio of a new measurement where the benefit is given in terms of a change in the information entropy of the parameter distribution. This algorithm corresponds to the “variance algorithm” of [1].

In this algorithm, we use the logarithm of variance as an approximation for the information entropy. The variance of model outputs produced by N_DRAWS samples of the parameter distribution and the variance of the measurement noise are calculated separately.

Execution of utility_variance() is faster than utility_full_kld() and utility_pseudo(), and the decision quality is very similar to utility_full_kld().

Returns:

Approximate utility as an ndarray with dimensions of self.setting_indices.

y_var_noise_model()[source]

For backwards compatibility, a wrapper for yvar_noise_model().

yvar_from_entropy()[source]

Models the entropy of the model values due to the parameter distributions

Evaluates the effect of the distribution of parameter values on the distribution of model outputs for every setting combination. This calculation is done as part of the utility calculation as an approximation to the information entropy. For each of self.N_DRAWS samples from the parameter distribution, this method models a noise-free experimental output for all setting combinations and returns the entropy of the model values for each setting combination, cast as a variance.

Returns:

ndarray with shape of self.setting_indices

yvar_from_parameter_draws()[source]

Models the measurement variance solely due to parameter distributions.

Evaluates the effect of the distribution of parameter values on the distribution of model outputs for every setting combination. This calculation is done as part of the utility calculation as an approximation to the information entropy. For each of self.N_DRAWS samples from the parameter distribution, this method models a noise-free experimental output for all setting combinations and returns the variance of the model values for each setting combination.

Returns:

ndarray with shape of self.setting_indices
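This draw-and-evaluate pattern can be sketched with a hypothetical linear model (names and numbers are illustrative, not from the package):

```python
import numpy as np

rng = np.random.default_rng(1)

def eval_over_all_settings(m, b):
    """Hypothetical noise-free model evaluated on a 4-point setting grid."""
    x = np.linspace(0.0, 3.0, 4)   # the setting grid
    return m * x + b               # toy linear model

# Draw 30 (m, b) samples from a hypothetical parameter distribution
n_draws = 30
draws = rng.normal([1.0, 0.0], [0.2, 0.1], size=(n_draws, 2))

# Evaluate the model for each draw, over all settings
y = np.stack([eval_over_all_settings(m, b) for m, b in draws])

# Variance of model outputs over parameter draws, per setting
y_var = np.var(y, axis=0)
```

Settings where parameter uncertainty produces a large spread of model outputs (here, large x, where the uncertain slope dominates) are the settings where a measurement is most informative.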

yvar_max_min()[source]

Crudely approximates the signal variance using max - min.

Returns: ndarray with shape of self.setting_indices

yvar_noise_model()[source]

A stub for models of the measurement noise

A model of measurement variance (noise) as a function of settings, averaged over parameters if parameter-dependent. Used in the utility calculation.

In general, the measurement noise could depend on both settings and parameters, and the model would require evaluation of the noise model over all parameters, averaged over draws from the parameter distribution. Measurement noise that depends on the measurement value, like \(\sqrt{N}\) Poisson-like counting noise, is an example of such a situation. Fortunately, this noise estimate only affects the utility function, which only affects setting choices, where the “runs good” philosophy of the project allows a little approximation.

Returns:

If measurement noise is independent of settings, a float, otherwise an ndarray with the shape of an element of allsettings. Default: default_noise_std ** 2.

References