Routines to combine \(\ln \Pi\) data (combine)
Exceptions:
OverlapError | Specific error for missing overlaps. |
Functions:
check_windows_overlap() | Check that window overlaps form a connected graph. |
combine_scaled_lnpi() | Combine multiple windows by scaling each \(\ln \Pi\). |
combine_dropfirst() | Combine windows by dropping first elements that overlap previous window. |
combine_updown_mean() | Combine up/down probabilities using weighted average. |
updown_from_collectionmatrix() | Add up/down probabilities from collection matrix. |
delta_lnpi_from_updown() | Add \(\Delta \ln \Pi(N) = \ln \Pi(N) - \ln \Pi(N-1)\) from up/down probabilities. |
lnpi_from_updown() | Assign \(\ln \Pi(N)\) from up/down sorted probabilities. |
|
Normalize \(\ln\Pi\) series or array. |
- exception lnpy.combine.OverlapError
Bases: ValueError
Specific error for missing overlaps.
- add_note()
Exception.add_note(note) – add a note to the exception
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- lnpy.combine.check_windows_overlap(overlap_table, windows, window_index_name, macrostate_names='state', verbose=True)
Check that window overlaps form a connected graph.
- Parameters:
overlap_table (DataFrame) – Frame should contain the columns macrostate_names and window_index_name, with rows corresponding to overlaps.
- Raises:
OverlapError – If the overlaps do not form a connected graph.
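To make the "connected graph" condition concrete, the following is a minimal sketch in plain Python, not the lnpy implementation. The function name `windows_are_connected` and the `(state, window)` pair layout are assumptions made for illustration. Windows are treated as graph nodes, and two windows are joined by an edge when they share at least one state.

```python
from collections import defaultdict


def windows_are_connected(overlaps, windows):
    """Return True if every window is reachable from the first via shared states.

    `overlaps` is an iterable of (state, window) pairs, one per window that
    samples an overlapping state (hypothetical layout, not the lnpy table).
    """
    windows = list(windows)
    if not windows:
        return True

    # Group windows by the state they share.
    by_state = defaultdict(set)
    for state, window in overlaps:
        by_state[state].add(window)

    # Two windows are neighbors if they share at least one state.
    neighbors = defaultdict(set)
    for group in by_state.values():
        for w in group:
            neighbors[w] |= group - {w}

    # Depth-first search starting from the first window.
    seen = {windows[0]}
    stack = [windows[0]]
    while stack:
        current = stack.pop()
        for other in neighbors[current] - seen:
            seen.add(other)
            stack.append(other)
    return seen == set(windows)


# Windows 0 and 1 share state 2; windows 1 and 2 share state 4.
connected = windows_are_connected([(2, 0), (2, 1), (4, 1), (4, 2)], [0, 1, 2])
# Window 2 shares no state with the other windows.
disconnected = windows_are_connected([(2, 0), (2, 1)], [0, 1, 2])
```

In the second call window 2 is unreachable, which is the situation in which an OverlapError would be appropriate.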
- lnpy.combine.combine_scaled_lnpi(tables, macrostate_names='state', lnpi_name='ln_prob', window_name='window', use_sparse=True, check_connected=False)
Combine multiple windows by scaling each \(\ln \Pi\).
This performs least squares on the problem:
\[\min_{C_0, C_1, ..., C_{W-1}} \sum_{\rm{overlap}_m \in \rm{overlaps}} \, \sum_{N_m, k \in \rm{overlap}_m} [\ln \bar{\Pi}(N_m) - (\ln \Pi_k (N_m) + C_k)]^2\]
where:
\(C_j\) : shift for sample \(j\).
\(W\) : number of overlapping samples.
\(\Pi_k(N)\) : transition matrix at particle number \(N\) for the \(k\)th sample.
\(\rm{overlap}_m\) : a particular overlap at particle number \(N_m\) and over samples \(k\).
\(\ln \bar{\Pi}\) : the average value, which is to be determined.
This can be reduced to a matrix problem of the form:
\[S C_j - \sum_{k \in \rm{overlap}_m} C_k = - (S \ln \Pi_j(N_m) - \sum_{k \in \rm{overlap}_m} \ln \Pi_k(N_m))\]
where the sum runs over all samples with overlap at state \(N_m\), and \(S\) is the number of such samples (i.e., \(S = \sum_{k \in \rm{overlap}_m} 1\)). There is one such equation for each \(j \in \rm{overlap}_m\).
- Parameters:
tables – Individual sample windows. If passing in a single DataFrame, it must contain the column window_name. Otherwise, the individual frames will be concatenated and the window_name column will be added (or replaced if already present).
macrostate_names – Column name(s) corresponding to a single “state”. For example, for a single component system, this could be macrostate_names="n", and for a binary system macrostate_names=["n_0", "n_1"].
lnpi_name – Column name corresponding to \(\ln \Pi\).
window_name – Column name corresponding to “window”, i.e., an individual simulation. Note that this is only used if passing in a single dataframe with multiple windows.
use_sparse (bool, default True) – Use a coo_array in the matrix equation. This is often faster than using a numpy.ndarray.
check_connected – If True, check that all windows form a connected graph.
- Returns:
DataFrame – Combined table with appropriately shifted lnpi_name column. Note that the table is not yet averaged over macrostate_names.
Examples
>>> states = pd.DataFrame(range(5), columns=["state"])
>>> tables = [states.iloc[:3], states.iloc[2:]]
>>> tables = [
...     table.assign(lnpi=lambda x: x["state"] + i * 10)
...     for i, table in enumerate(tables)
... ]
>>> print(tables[0])
   state  lnpi
0      0     0
1      1     1
2      2     2
>>> print(tables[1])
   state  lnpi
2      2    12
3      3    13
4      4    14
>>> combined_table = combine_scaled_lnpi(tables, lnpi_name="lnpi")
>>> print(combined_table)
   state  lnpi
0      0   0.0
1      1   1.0
2      2   2.0
2      2   2.0
3      3   3.0
4      4   4.0
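For intuition, the least-squares problem has a closed form when there are only two windows. The sketch below is a plain-Python illustration rather than the library's matrix solver. It assumes the first window's shift is fixed at zero, and the function name `shift_second_window` and the dict layout are hypothetical. Under that assumption the optimal shift for the second window is the mean of \(\ln \Pi_a(N) - \ln \Pi_b(N)\) over the shared states.

```python
def shift_second_window(lnpi_a, lnpi_b):
    """Shift window B onto window A using the mean difference over shared states.

    Both arguments are dicts mapping state -> ln Pi (hypothetical layout).
    With C_a fixed at 0, the least-squares shift for B is
    C_b = mean(lnpi_a[N] - lnpi_b[N]) over the shared states N.
    """
    shared = sorted(set(lnpi_a) & set(lnpi_b))
    if not shared:
        raise ValueError("windows do not overlap")
    shift = sum(lnpi_a[n] - lnpi_b[n] for n in shared) / len(shared)
    return {n: value + shift for n, value in lnpi_b.items()}


# Same numbers as the documented example: window B is offset by 10.
window_a = {0: 0.0, 1: 1.0, 2: 2.0}
window_b = {2: 12.0, 3: 13.0, 4: 14.0}
shifted_b = shift_second_window(window_a, window_b)
```

With these inputs the shift is -10, so the second window lines up with the first at the shared state 2, in agreement with the combined table printed in the example above.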
- lnpy.combine.combine_dropfirst(tables, window_name='window', state_name='state', check_connected=False)
Combine windows by dropping first elements that overlap previous window.
For example, if you have two windows A and B with states state_A=[0,1,2] and state_B=[1,2,3] and observables x_A(state_A) and x_B(state_B), then the combined result will be state=[0,1,2,3], x = [x_A(0), x_A(1), x_A(2), x_B(3)].
- Parameters:
tables – Individual sample windows. If passing in a single DataFrame, it must contain the column window_name. Otherwise, the individual frames will be concatenated and the window_name column will be added (or replaced if already present).
window_name – Column name corresponding to “window”, i.e., an individual simulation. Note that this is only used if passing in a single dataframe with multiple windows.
state_name – Column name corresponding to simulation state. For example, state_name="state".
check_connected – If True, check that all windows form a connected graph.
- Returns:
DataFrame – Combined table.
Note
If the windows do not use expanded ensemble sampling (i.e., there are no non-integer state values), prefer combine_updown_mean().
Examples
>>> states = pd.DataFrame(range(5), columns=["state"])
>>> tables = [states.iloc[:3], states.iloc[2:]]
>>> tables = [
...     table.assign(lnpi=lambda x: x["state"] + i * 10)
...     for i, table in enumerate(tables)
... ]
>>> print(tables[0])
   state  lnpi
0      0     0
1      1     1
2      2     2
>>> print(tables[1])
   state  lnpi
2      2    12
3      3    13
4      4    14
>>> combined_table = combine_dropfirst(tables)
>>> print(combined_table)
   state  lnpi
0      0     0
1      1     1
2      2     2
3      3    13
4      4    14
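The drop-first rule can also be written without pandas. The following is a rough sketch assuming each window is a list of `(state, value)` pairs sorted by state and that the windows are given in increasing order. The name `drop_first_overlaps` is hypothetical and this is not the lnpy implementation.

```python
def drop_first_overlaps(windows):
    """Concatenate windows, keeping each state's value from the earliest window.

    `windows` is a list of lists of (state, value) pairs, each sorted by state
    (hypothetical layout, not the lnpy table format).
    """
    combined = []
    last_state = None
    for window in windows:
        for state, value in window:
            # Skip leading states already covered by a previous window.
            if last_state is not None and state <= last_state:
                continue
            combined.append((state, value))
        if combined:
            last_state = combined[-1][0]
    return combined


# Same numbers as the documented example.
window_a = [(0, 0), (1, 1), (2, 2)]
window_b = [(2, 12), (3, 13), (4, 14)]
combined = drop_first_overlaps([window_a, window_b])
```

State 2 appears in both windows, and the value from the first window is the one that survives.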
- lnpy.combine.combine_updown_mean(table, by='state', as_index=False, weight_name='n_trials', down_name='P_down', up_name='P_up', use_running=False, **kwargs)
Combine up/down probabilities using weighted average.
This can be used to splice overlapping windows into a single window, combine across replicate simulations, or both.
Notes
If any windows use expanded ensemble sampling (i.e., non-integer state values), this method should not be used to splice across windows. Instead, use combine_dropfirst().
- Parameters:
table – Table containing by, up_name, down_name, and weight_name columns.
by – Groupby column(s).
as_index – as_index keyword argument (see pandas.DataFrame.groupby()).
weight_name – Column name corresponding to “weight” of probability.
down_name – Column name corresponding to “down” probability.
up_name – Column name corresponding to “up” probability.
use_running – If False (default), use a straight weighted average. If True, use a running weighted average. The latter can be slower, but is numerically stable.
**kwargs – Extra arguments to pandas.DataFrame.groupby().
- Returns:
DataFrame – Combined transition matrix.
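As an illustration of the straight weighted average (the use_running=False case), here is a minimal plain-Python sketch. It assumes rows are dicts keyed by the default column names and that the combined weight for a state is the sum of the input weights. Both the helper name `weighted_updown_mean` and that last choice are assumptions, not confirmed lnpy behavior.

```python
from collections import defaultdict


def weighted_updown_mean(rows):
    """Average P_down and P_up per state, weighting each row by n_trials.

    `rows` is a list of dicts with keys "state", "n_trials", "P_down", "P_up".
    """
    totals = defaultdict(lambda: {"n_trials": 0.0, "down": 0.0, "up": 0.0})
    for row in rows:
        acc = totals[row["state"]]
        acc["n_trials"] += row["n_trials"]
        acc["down"] += row["n_trials"] * row["P_down"]
        acc["up"] += row["n_trials"] * row["P_up"]

    return [
        {
            "state": state,
            "n_trials": acc["n_trials"],  # assumed: weights are summed
            "P_down": acc["down"] / acc["n_trials"],
            "P_up": acc["up"] / acc["n_trials"],
        }
        for state, acc in sorted(totals.items())
    ]


# State 1 is sampled by two windows with different numbers of trials.
rows = [
    {"state": 0, "n_trials": 100, "P_down": 0.0, "P_up": 0.5},
    {"state": 1, "n_trials": 100, "P_down": 0.2, "P_up": 0.4},
    {"state": 1, "n_trials": 300, "P_down": 0.4, "P_up": 0.2},
]
combined = weighted_updown_mean(rows)
```

The two rows for state 1 collapse into one row in which the window with 300 trials counts three times as much as the window with 100 trials.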
- lnpy.combine.updown_from_collectionmatrix(table, matrix_names=['c0', 'c1', 'c2'], weight_name='n_trials', down_name='P_down', up_name='P_up')
Add up/down probabilities from collection matrix.
- Parameters:
table – pandas.DataFrame
matrix_names – Column names for collection matrix.
weight_name – Column name corresponding to “weight” of probability.
down_name – Column name corresponding to “down” probability.
up_name – Column name corresponding to “up” probability.
- Returns:
DataFrame – New dataframe with assigned columns.
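The page does not spell out how the collection matrix maps to probabilities. The sketch below assumes the common transition-matrix convention in which c0, c1, and c2 accumulate the down, stay, and up transition probabilities for a state, so that the row sum is the weight. That convention and the helper name `updown_from_counts` are assumptions and should be checked against the library source before relying on them.

```python
def updown_from_counts(c0, c1, c2):
    """Return (n_trials, P_down, P_up) for one state from its collection-matrix row.

    Assumes c0, c1, c2 accumulate the down, stay, and up transition
    probabilities, so the row sum is the number of trials (an assumption,
    not confirmed by the documentation).
    """
    n_trials = c0 + c1 + c2
    if n_trials == 0:
        raise ValueError("state has no recorded trials")
    return n_trials, c0 / n_trials, c2 / n_trials


n_trials, p_down, p_up = updown_from_counts(c0=20.0, c1=50.0, c2=30.0)
```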
- lnpy.combine.delta_lnpi_from_updown(table, up_name='P_up', down_name='P_down', delta_lnpi_name='delta_lnpi')
Add \(\Delta \ln \Pi(N) = \ln \Pi(N) - \ln \Pi(N-1)\) from up/down probabilities.
This assumes table is sorted by state value. This function is useful if the simulation windows use expanded ensemble sampling and have non-integer steps in the state variable. The deltas can be combined with combine_dropfirst(), cumulatively summed, and then the non-integer states dropped.
- Parameters:
up_name – Column name corresponding to “up” probability.
down_name – Column name corresponding to “down” probability.
delta_lnpi_name – Name of output column.
- Returns:
DataFrame – Table with delta_lnpi_name column.
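The workflow described for non-integer states can be sketched in plain Python as follows. It relies on detailed balance between neighboring rows, \(\Pi(i-1) P_{\rm up}(i-1) = \Pi(i) P_{\rm down}(i)\), which gives each delta as \(\ln P_{\rm up}(i-1) - \ln P_{\rm down}(i)\). The function name `delta_lnpi`, the use of None for the first row, and the numbers are illustrative assumptions, not the lnpy implementation.

```python
import math


def delta_lnpi(p_up, p_down):
    """Return ln Pi(i) - ln Pi(i-1) for each row i; the first row has no delta.

    Rows must be sorted by state. Detailed balance between neighboring rows
    gives Pi(i-1) * P_up(i-1) = Pi(i) * P_down(i).
    """
    deltas = [None]
    for i in range(1, len(p_up)):
        deltas.append(math.log(p_up[i - 1]) - math.log(p_down[i]))
    return deltas


# Expanded-ensemble style states with half-integer steps (made-up numbers).
states = [0, 0.5, 1, 1.5, 2]
p_up = [0.4, 0.4, 0.4, 0.4, 0.0]
p_down = [0.0, 0.2, 0.2, 0.2, 0.2]

deltas = delta_lnpi(p_up, p_down)

# Cumulatively sum the deltas, then keep only the integer states.
lnpi = [0.0]
for delta in deltas[1:]:
    lnpi.append(lnpi[-1] + delta)
integer_lnpi = {int(s): v for s, v in zip(states, lnpi) if float(s).is_integer()}
```

Every step here contributes \(\ln 2\), so the retained integer states 0, 1, and 2 end up separated by \(2 \ln 2\) each.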
- lnpy.combine.lnpi_from_updown(table, lnpi_name='ln_prob', down_name='P_down', up_name='P_up', norm=True)
Assign \(\ln \Pi(N)\) from up/down sorted probabilities.
This assumes table is sorted by state value.
- Parameters:
table – pandas.DataFrame
lnpi_name – Column name corresponding to \(\ln \Pi\).
down_name – Column name corresponding to “down” probability.
up_name – Column name corresponding to “up” probability.
use_prod – If true (default), calculate from cumulative product (on probability). Otherwise, calculate from cumulative sum (on log of probability).
norm – If true (default), normalize distribution.
- Returns:
DataFrame – New dataframe with assigned \(\ln \Pi\).
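A minimal sketch of the idea, using a cumulative sum of log probabilities, is shown below. It assumes that normalization means \(\sum_N \Pi(N) = 1\) and that the first state is the reference with \(\ln \Pi = 0\) before normalizing. The name `lnpi_from_updown_sketch` is hypothetical, and the library may differ in these details.

```python
import math


def lnpi_from_updown_sketch(p_up, p_down, norm=True):
    """Build ln Pi from sorted up/down probabilities by a cumulative sum of logs."""
    lnpi = [0.0]
    for i in range(1, len(p_up)):
        # Detailed balance: ln Pi(i) - ln Pi(i-1) = ln P_up(i-1) - ln P_down(i).
        lnpi.append(lnpi[-1] + math.log(p_up[i - 1]) - math.log(p_down[i]))
    if norm:
        # Subtract log(sum(exp(lnpi))), shifting by the max for stability.
        peak = max(lnpi)
        log_total = peak + math.log(sum(math.exp(v - peak) for v in lnpi))
        lnpi = [v - log_total for v in lnpi]
    return lnpi


lnpi = lnpi_from_updown_sketch(p_up=[0.5, 0.25, 0.0], p_down=[0.0, 0.25, 0.5])
```

For these inputs the unnormalized weights are in the ratio 1 : 2 : 1, so the normalized probabilities come out as 0.25, 0.5, and 0.25.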