Routines to combine \(\ln \Pi\) data (combine)

Exceptions:

OverlapError

Specific error for missing overlaps.

Functions:

check_windows_overlap(overlap_table, ...[, ...])

Check that window overlaps form a connected graph.

combine_scaled_lnpi(tables[, ...])

Combine multiple windows by scaling each \(\ln \Pi\).

combine_dropfirst(tables[, window_name, ...])

Combine windows by dropping first elements that overlap previous window.

combine_updown_mean(table[, by, as_index, ...])

Combine up/down probabilities using weighted average.

updown_from_collectionmatrix(table[, ...])

Add up/down probabilities from collection matrix.

delta_lnpi_from_updown(table[, up_name, ...])

Add \(\Delta \ln \Pi(N) = \ln \Pi(N) - \ln \Pi(N-1)\) from up/down probabilities.

lnpi_from_updown(table[, lnpi_name, ...])

Assign \(\ln \Pi(N)\) from up/down sorted probabilities.

normalize_lnpi(lnpi)

Normalize \(\ln\Pi\) series or array.

exception lnpy.combine.OverlapError

Bases: ValueError

Specific error for missing overlaps.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

lnpy.combine.check_windows_overlap(overlap_table, windows, window_index_name, macrostate_names='state', verbose=True)

Check that window overlaps form a connected graph.

Parameters:

overlap_table (DataFrame) – The frame should contain the columns macrostate_names and window_index_name, with one row per overlap.

Raises:

OverlapError – If the overlaps do not form a connected graph, an OverlapError is raised.
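Conceptually, the check treats each window as a node of a graph and joins two windows whenever they sample a common macrostate; the windows are connected if a breadth-first search from one window reaches all of them. The sketch below illustrates that idea with a hypothetical {window: set_of_states} mapping in place of overlap_table (it is not lnpy's implementation), and it returns a boolean where check_windows_overlap() raises OverlapError.

```python
from collections import deque
from itertools import combinations


def windows_connected(window_states):
    """Return True if windows linked by shared states form one connected graph.

    window_states: hypothetical input mapping window id -> set of states.
    """
    windows = list(window_states)
    if not windows:
        return True
    # Build adjacency: an edge joins any two windows that share a state.
    neighbors = {w: set() for w in windows}
    for a, b in combinations(windows, 2):
        if window_states[a] & window_states[b]:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Breadth-first search starting from the first window.
    seen = {windows[0]}
    queue = deque([windows[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    # Connected only if every window was reached.
    return len(seen) == len(windows)
```

For example, windows_connected({0: {0, 1, 2}, 1: {2, 3, 4}}) returns True, while windows_connected({0: {0, 1}, 1: {3, 4}}) returns False because the two windows share no state.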

lnpy.combine.combine_scaled_lnpi(tables, macrostate_names='state', lnpi_name='ln_prob', window_name='window', use_sparse=True, check_connected=False)

Combine multiple windows by scaling each \(\ln \Pi\).

This performs least squares on the problem:

\[\min_{C_0, C_1, ..., C_{W-1}} \sum_{\rm{overlap}_m \in \rm{overlaps}} \, \sum_{N_m, k \in \rm{overlap}_m} [\ln \bar{\Pi}(N_m) - (\ln \Pi_k (N_m) + C_k)]^2\]

where,

  • \(C_j\) : shift for sample \(j\).

  • \(W\) : number of overlapping samples.

  • \(\Pi_k(N)\) : transition matrix at particle number \(N\) for the kth sample.

  • \(\rm{overlap}_m\) : a particular overlap at particle number \(N_m\) and over samples \(k\).

  • \(\ln \bar{\Pi}\) : the average value to be determined.

This can be reduced to a matrix problem of the form:

\[S C_j - \sum_{k \in \rm{overlap}_m} C_k = - (S \ln \Pi_j(N_m) - \sum_{k \in \rm{overlap}_m} \ln \Pi_k(N_m))\]

The sum runs over all samples with overlap at state \(N_m\), and \(S\) is the number of such overlapping samples (i.e., \(S = \sum_{k \in \rm{overlap}_m} 1\)). There is one such equation for each \(j \in \rm{overlap}_m\).
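The matrix problem above can be assembled and solved without any dependencies. The sketch below is an illustration, not lnpy's implementation: each window is a hypothetical {state: lnpi} dict, and the helper names solve_linear and scaled_shifts are invented. Each \((m, j)\) equation is divided by \(S\) before being summed over overlaps, which gives the normal equations of the least-squares objective. Because the equations fix the shifts only up to a common constant, the sketch replaces the first equation with \(C_0 = 0\); it also assumes the windows are connected (otherwise the system is singular).

```python
from collections import defaultdict


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def scaled_shifts(windows):
    """Least-squares shifts C_k for windows given as {state: lnpi} dicts."""
    n_windows = len(windows)
    # Collect, for every state, the windows that sample it.
    members_by_state = defaultdict(list)
    for k, window in enumerate(windows):
        for state in window:
            members_by_state[state].append(k)

    a = [[0.0] * n_windows for _ in range(n_windows)]
    b = [0.0] * n_windows
    for state, members in members_by_state.items():
        s = len(members)
        if s < 2:  # a state seen by only one window is not an overlap
            continue
        lnpi_mean = sum(windows[k][state] for k in members) / s
        for j in members:
            # (S C_j - sum_k C_k = sum_k lnPi_k - S lnPi_j) / S, summed over m
            a[j][j] += 1.0
            for k in members:
                a[j][k] -= 1.0 / s
            b[j] += lnpi_mean - windows[j][state]

    # Shifts are only defined up to a common constant; pin C_0 = 0.
    a[0] = [1.0] + [0.0] * (n_windows - 1)
    b[0] = 0.0
    return solve_linear(a, b)
```

For the two windows used in the Examples of this entry, scaled_shifts([{0: 0, 1: 1, 2: 2}, {2: 12, 3: 13, 4: 14}]) gives shifts of 0.0 and -10.0, which line the second window up with the first as in the printed combined table.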

Parameters:
  • tables – Individual sample windows. If passing a single DataFrame, it must contain the column window_name. Otherwise, the individual frames are concatenated and the window_name column is added (or replaced if already present).

  • macrostate_names – Column name(s) corresponding to a single “state”. For example, for a single component system, this could be macrostate_names="n", and for a binary system macrostate_names=["n_0", "n_1"].

  • lnpi_name – Column name corresponding to \(\ln \Pi\).

  • window_name – Column name corresponding to “window”, i.e., an individual simulation. Note that this is only used if passing in a single dataframe with multiple windows.

  • use_sparse (bool, default True) – Use a scipy.sparse.coo_array in the matrix equation. This is often faster than using a dense numpy.ndarray.

  • check_connected – If True, check that all windows form a connected graph.

Returns:

DataFrame – Combined table with appropriately shifted lnpi_name column. Note that the table is not yet averaged over macrostate_names.

Examples

>>> states = pd.DataFrame(range(5), columns=["state"])
>>> tables = [states.iloc[:3], states.iloc[2:]]
>>> tables = [
...     table.assign(lnpi=lambda x: x["state"] + i * 10)
...     for i, table in enumerate(tables)
... ]
>>> print(tables[0])
   state  lnpi
0      0     0
1      1     1
2      2     2
>>> print(tables[1])
   state  lnpi
2      2    12
3      3    13
4      4    14
>>> combined_table = combine_scaled_lnpi(tables, lnpi_name="lnpi")
>>> print(combined_table)
   state  lnpi
0      0   0.0
1      1   1.0
2      2   2.0
2      2   2.0
3      3   3.0
4      4   4.0

lnpy.combine.combine_dropfirst(tables, window_name='window', state_name='state', check_connected=False)

Combine windows by dropping first elements that overlap previous window.

For example, given two windows A and B with states state_A=[0,1,2] and state_B=[1,2,3] and observables x_A(state_A) and x_B(state_B), the combined result will be state=[0,1,2,3], x=[x_A(0), x_A(1), x_A(2), x_B(3)].
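The splicing rule described above can be sketched in a few lines. This is a plain-Python illustration rather than lnpy's code: splice_dropfirst is a hypothetical name, each window is a list of (state, value) pairs sorted by increasing state, and windows are assumed to be ordered by their starting state. A row is kept only if its state lies beyond every state already taken, which also works for non-integer (expanded ensemble) states.

```python
def splice_dropfirst(windows):
    """Splice windows, dropping leading states already covered earlier.

    windows: list of windows, each a list of (state, value) pairs sorted by
    increasing state, with windows ordered by their starting state.
    """
    combined = []
    for window in windows:
        # Highest state taken so far (None before the first window).
        last = combined[-1][0] if combined else None
        combined.extend(
            (state, value)
            for state, value in window
            if last is None or state > last
        )
    return combined
```

With the windows from the example, splice_dropfirst([[(0, "a0"), (1, "a1"), (2, "a2")], [(1, "b1"), (2, "b2"), (3, "b3")]]) keeps x_A at states 0, 1, 2 and only x_B at state 3.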

Parameters:
  • tables – Individual sample windows. If passing a single DataFrame, it must contain the column window_name. Otherwise, the individual frames are concatenated and the window_name column is added (or replaced if already present).

  • window_name – Column name corresponding to “window”, i.e., an individual simulation. Note that this is only used if passing in a single dataframe with multiple windows.

  • state_name – Column name corresponding to simulation state. For example, state_name="state".

  • check_connected – If True, check that all windows form a connected graph.

Returns:

DataFrame – Combined table.

Note

If the windows do not use expanded ensemble sampling (i.e., all state values are integers), prefer combine_updown_mean().

Examples

>>> states = pd.DataFrame(range(5), columns=["state"])
>>> tables = [states.iloc[:3], states.iloc[2:]]
>>> tables = [
...     table.assign(lnpi=lambda x: x["state"] + i * 10)
...     for i, table in enumerate(tables)
... ]
>>> print(tables[0])
   state  lnpi
0      0     0
1      1     1
2      2     2
>>> print(tables[1])
   state  lnpi
2      2    12
3      3    13
4      4    14
>>> combined_table = combine_dropfirst(tables)
>>> print(combined_table)
   state  lnpi
0      0     0
1      1     1
2      2     2
3      3    13
4      4    14

lnpy.combine.combine_updown_mean(table, by='state', as_index=False, weight_name='n_trials', down_name='P_down', up_name='P_up', use_running=False, **kwargs)

Combine up/down probabilities using weighted average.

This can be used to splice overlapping windows into a single window, combine across replicate simulations, or both.

Notes

If any windows use expanded ensemble sampling (i.e., non-integer state values), this method should not be used to splice across windows. Instead, use combine_dropfirst().

Parameters:
  • table – Table containing by, up_name, down_name, and weight_name columns.

  • by – Groupby column(s).

  • as_index – as_index keyword argument (see pandas.DataFrame.groupby()).

  • weight_name – Column name corresponding to “weight” of probability.

  • down_name – Column name corresponding to “down” probability.

  • up_name – Column name corresponding to “up” probability.

  • use_running – If False (default), use a straight weighted average. If True, use a running weighted average, which can be slower but is more numerically stable.

  • **kwargs – Extra arguments to pandas.DataFrame.groupby().

Returns:

DataFrame – Combined transition matrix.
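The weighted average, in both its straight and running forms, can be sketched in plain Python. This is not lnpy's implementation: rows are hypothetical (state, n_trials, P_down, P_up) tuples standing in for the by, weight_name, down_name and up_name columns, and weights are assumed positive. The second function updates each mean incrementally, the idea behind use_running=True.

```python
from collections import defaultdict


def weighted_updown_mean(rows):
    """Straight weighted average of (P_down, P_up) per state.

    rows: iterable of (state, n_trials, p_down, p_up) tuples.
    Returns {state: (total_weight, mean_down, mean_up)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])  # weight, sum w*down, sum w*up
    for state, weight, p_down, p_up in rows:
        acc = totals[state]
        acc[0] += weight
        acc[1] += weight * p_down
        acc[2] += weight * p_up
    return {s: (w, d / w, u / w) for s, (w, d, u) in totals.items()}


def running_updown_mean(rows):
    """Same result using a running (incremental) weighted mean."""
    state_means = {}
    for state, weight, p_down, p_up in rows:
        w, down, up = state_means.get(state, (0.0, 0.0, 0.0))
        w += weight
        # Move each mean toward the new value by the fraction weight / w.
        down += (p_down - down) * weight / w
        up += (p_up - up) * weight / w
        state_means[state] = (w, down, up)
    return state_means
```

The running form never accumulates large weighted sums, which is why it tends to behave better numerically at the cost of extra arithmetic per row.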

lnpy.combine.updown_from_collectionmatrix(table, matrix_names=['c0', 'c1', 'c2'], weight_name='n_trials', down_name='P_down', up_name='P_up')

Add up/down probabilities from collection matrix.

Parameters:
  • table (pandas.DataFrame)

  • matrix_names – Column names for collection matrix.

  • weight_name – Column name corresponding to “weight” of probability.

  • down_name – Column name corresponding to “down” probability.

  • up_name – Column name corresponding to “up” probability.

Returns:

DataFrame – New dataframe with assigned columns.
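A minimal sketch of the conversion, under an assumption this entry does not spell out: that c0, c1 and c2 count attempted transitions from N to N-1, N to N and N to N+1, so that P_down and P_up are the first and last counts divided by the row total, and the row total serves as the n_trials weight. Rows are plain dicts rather than a DataFrame, and add_updown is a hypothetical name.

```python
def add_updown(rows, matrix_names=("c0", "c1", "c2"),
               weight_name="n_trials", down_name="P_down", up_name="P_up"):
    """Return copies of rows (dicts) with weight/down/up columns added.

    Assumption: matrix_names are the (down, stay, up) collection counts.
    """
    out = []
    for row in rows:
        down, stay, up = (row[name] for name in matrix_names)
        total = down + stay + up  # assumed to be the trial weight
        out.append({**row, weight_name: total,
                    down_name: down / total, up_name: up / total})
    return out
```

Building new dicts mirrors the "new dataframe" return value: the input rows are left unchanged.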

lnpy.combine.delta_lnpi_from_updown(table, up_name='P_up', down_name='P_down', delta_lnpi_name='delta_lnpi')

Add \(\Delta \ln \Pi(N) = \ln \Pi(N) - \ln \Pi(N-1)\) from up/down probabilities.

This assumes table is sorted by state value. This function is useful if the simulation windows use expanded ensemble sampling and have non-integer steps in the state variable. The deltas can be combined across windows with combine_dropfirst(), cumulatively summed, and the non-integer states then dropped.
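The workflow described above rests on detailed balance between neighboring sorted states, \(\Pi(N_{i-1}) P_{\rm up}(N_{i-1}) = \Pi(N_i) P_{\rm down}(N_i)\), so that \(\Delta \ln \Pi(N_i) = \ln P_{\rm up}(N_{i-1}) - \ln P_{\rm down}(N_i)\). The sketch below applies this to a single window given as plain lists (the multi-window splice with combine_dropfirst() is omitted). The function names are hypothetical, and setting the first delta to 0 as a reference is an assumption of the sketch.

```python
import math
from itertools import accumulate


def delta_lnpi(p_up, p_down):
    """Delta ln Pi between consecutive sorted states via detailed balance.

    The first entry is set to 0.0 as a reference (a sketch assumption).
    """
    deltas = [0.0]
    for i in range(1, len(p_up)):
        deltas.append(math.log(p_up[i - 1]) - math.log(p_down[i]))
    return deltas


def lnpi_at_integer_states(states, p_up, p_down):
    """Cumulatively sum the deltas, then drop non-integer (sub)states."""
    lnpi = list(accumulate(delta_lnpi(p_up, p_down)))
    return {s: v for s, v in zip(states, lnpi) if float(s).is_integer()}
```

For states [0, 0.5, 1, 1.5, 2], the result keeps only the entries for 0, 1 and 2, each measured relative to \(\ln \Pi\) at the first state.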

Parameters:
  • up_name – Column name corresponding to “up” probability.

  • down_name – Column name corresponding to “down” probability.

  • delta_lnpi_name – Name of output column.

Returns:

DataFrame – Table with delta_lnpi_name column.

lnpy.combine.lnpi_from_updown(table, lnpi_name='ln_prob', down_name='P_down', up_name='P_up', norm=True)

Assign \(\ln \Pi(N)\) from up/down sorted probabilities.

This assumes table is sorted by state value.

Parameters:
  • table (pandas.DataFrame)

  • lnpi_name – Column name corresponding to \(\ln \Pi\).

  • down_name – Column name corresponding to “down” probability.

  • up_name – Column name corresponding to “up” probability.

  • use_prod – If True (default), calculate from cumulative product (on probability). Otherwise, calculate from cumulative sum (on log of probability).

  • norm – If True (default), normalize distribution.

Returns:

DataFrame – New dataframe with assigned \(\ln \Pi\).
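Both settings of use_prod rely on the same ratio of neighboring states, \(\Pi(N_i)/\Pi(N_{i-1}) = P_{\rm up}(N_{i-1})/P_{\rm down}(N_i)\). The sketch below, with a hypothetical function name and plain lists instead of a DataFrame, computes the unnormalized \(\ln \Pi\) both ways, taking the first state as the reference \(\ln \Pi = 0\); normalization (norm=True) is left out.

```python
import math
import operator
from itertools import accumulate


def lnpi_from_updown_sketch(p_up, p_down, use_prod=True):
    """Unnormalized ln Pi for states already sorted by value.

    Detailed balance gives Pi[i] / Pi[i-1] = P_up[i-1] / P_down[i]; the first
    state is used as the reference, ln Pi[0] = 0.
    """
    ratios = [p_up[i - 1] / p_down[i] for i in range(1, len(p_up))]
    if use_prod:
        # Cumulative product on probabilities, then one log per state.
        pi = accumulate([1.0] + ratios, operator.mul)
        return [math.log(x) for x in pi]
    # Cumulative sum on logs of the ratios.
    return list(accumulate([0.0] + [math.log(r) for r in ratios]))
```

The two routes agree up to rounding; the cumulative product works on probabilities directly and can underflow or overflow over many states, while summing logarithms keeps every intermediate value on the log scale.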

lnpy.combine.normalize_lnpi(lnpi)

Normalize \(\ln\Pi\) series or array.
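Assuming that normalizing means shifting \(\ln \Pi\) so that \(\sum_N \Pi(N) = 1\), a common way to do it is the log-sum-exp trick. The function below is a hypothetical plain-Python stand-in, not lnpy's code, and works on a list rather than a pandas Series or NumPy array.

```python
import math


def normalize_lnpi_sketch(lnpi):
    """Shift ln Pi values so that sum(exp(lnpi)) == 1."""
    top = max(lnpi)  # factor out the largest term so exp() cannot overflow
    log_total = top + math.log(sum(math.exp(v - top) for v in lnpi))
    return [v - log_total for v in lnpi]
```

Factoring out the maximum lets the sketch handle values such as [1000.0, 1000.0], where a direct math.exp(1000.0) would overflow; the result there is \(-\ln 2\) for both entries.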