2. Data Processing Algorithms in MOSAIC

There are three primary algorithms available in MOSAIC to process time-series data from single-molecule nanopore experiments. The fitting-based approach outlined in the Introduction is implemented in MOSAIC as two separate algorithms: i) StepResponseAnalysis (ADEPT 2-State), for events that exhibit a single state, and ii) MultistateAnalysis (ADEPT), for N-state events. In addition, the CUSUM algorithm is available for N-state events.

2.1. ADEPT 2-State

This algorithm restricts the generalized state-detection algorithm [] to events with a single blocked state, as seen in the figure below. This simplified approach speeds up the analysis considerably and is appropriate for many applications, for example the detection of PEG, small molecules, and DNA homopolymers. The adept2State class fits a simplified form of the expression for the ionic current through a nanopore, shown below, to the time-series of a single event to recover the optimal parameters for the model. Settings that control the fit are defined through the settings file and are described in more detail in the Optimizing Settings section.

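\[i(t)=i_0 + a \left[ \left(e^{-(t-\mu_1)/\tau_1} -1\right) H\left(t-\mu_1\right) + \left(1- e^{-(t-\mu_2)/\tau_2} \right)H\left(t-\mu_2\right) \right]\]

where \(i_0\) is the open channel current, \(a\) is the current blockade amplitude, \(\mu_1\) and \(\mu_2\) are the start and end times of the event, \(\tau_1\) and \(\tau_2\) are the downstroke and upstroke RC constants, and \(H\) is the Heaviside step function.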

The figure below shows the results of the fit (the meta-data) superimposed on the time-series of a single two-state PEG event.

[Figure: StepResponse.png, adept2State fit superimposed on the time-series of a single PEG event]
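
The function below evaluates the two-state model i(t) at a single time point, which makes it concrete. It is a minimal sketch in plain Python and not MOSAIC's implementation. Several choices are assumptions:

* The function and parameter names are hypothetical.
* The exponents are taken as (t - mu1)/tau1 and (t - mu2)/tau2, with separate downstroke and upstroke RC constants. These match RCConstant1 and RCConstant2 in the metadata.
* The Heaviside factors are applied as conditionals.

```python
import math

def adept2state_current(t, i0, a, mu1, mu2, tau1, tau2):
    """Hypothetical evaluation of the two-state step-response model.

    i0: open channel current (pA); a: blockade amplitude (pA);
    mu1, mu2: event start and end (ms); tau1, tau2: downstroke and
    upstroke RC constants (ms); t: time (ms).
    """
    i = i0
    # H(t - mu1): the downstroke only contributes once the event has started
    if t >= mu1:
        i += a * (math.exp(-(t - mu1) / tau1) - 1.0)
    # H(t - mu2): the upstroke only contributes once the event has ended
    if t >= mu2:
        i += a * (1.0 - math.exp(-(t - mu2) / tau2))
    return i

# Illustrative parameters only: a 1 ms event sampled every 10 us.
ts = [k * 0.01 for k in range(300)]
trace = [adept2state_current(t, 130.0, 90.0, 1.0, 2.0, 0.02, 0.02) for t in ts]
```

Applying the Heaviside factors as conditionals also avoids evaluating the exponential for t < mu. There the exponent is large and positive and could overflow.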

2.1.1. Algorithm Settings

2.1.2. Metadata Output

Meta-data for individual events generated by adept2State can be queried using SQLite, as described in the Database Structure and Query Syntax section. The meta-data stored by adept2State are listed below.

Column Name        Column Type  Description
-----------------  -----------  ------------------------------
recIDX             INTEGER      Record index.
ProcessingStatus   TEXT         Status of the analysis.
OpenChCurrent      REAL         Open channel current in pA.
BlockedCurrent     REAL         Blocked state current in pA.
EventStart         REAL         Event start in ms.
EventEnd           REAL         Event end in ms.
BlockDepth         REAL         BlockedCurrent/OpenChCurrent.
ResTime            REAL         EventEnd-EventStart in ms.
RCConstant1        REAL         Downstroke RC constant in ms.
RCConstant2        REAL         Upstroke RC constant in ms.
AbsEventStart      REAL         Global event start time in ms.
ReducedChiSquared  REAL         Reduced Chi-squared of fit.
ProcessTime        REAL         Event processing time in ms.
TimeSeries         REAL_LIST    (OPTIONAL) Event time-series.
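
The sketch below shows what a query over these columns might look like. It uses Python's built-in sqlite3 module to build a small in-memory table with a subset of the adept2State columns. It then selects successfully processed events above a residence-time cutoff. Several details are assumptions made up for the example:

* the table name metadata;
* the status value "normal" for a successfully processed event;
* the placeholder error status and all row values.

Inspect your own database before reusing the query. For example, `SELECT name FROM sqlite_master` lists its tables.

```python
import sqlite3

# In-memory stand-in for an adept2State results database. The table name
# "metadata", the status strings and all values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (recIDX INTEGER, ProcessingStatus TEXT, "
    "OpenChCurrent REAL, BlockedCurrent REAL, BlockDepth REAL, ResTime REAL)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?, ?, ?)",
    [
        (0, "normal", 130.0, 52.0, 52.0 / 130.0, 0.35),
        (1, "normal", 131.0, 65.5, 65.5 / 131.0, 0.08),
        (2, "failed", 129.0, 0.0, 0.0, 0.0),  # placeholder error status
    ],
)

# Keep successfully processed events whose residence time exceeds 0.1 ms.
selected = conn.execute(
    "SELECT recIDX, BlockDepth, ResTime FROM metadata "
    "WHERE ProcessingStatus = ? AND ResTime > ? ORDER BY recIDX",
    ("normal", 0.1),
).fetchall()
print(selected)
```

Passing the filter values as `?` parameters, rather than formatting them into the SQL string, lets sqlite3 handle quoting.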

2.2. ADEPT

The multistate algorithm implements the general case for identifying states in nanopore data []. The general form of the equation used in this algorithm is shown below, where N is the number of states and \(a_j\), \(\mu_j\), and \(\tau_j\) are the amplitude, onset time, and RC constant of the j-th current step. This functional form is fit to the time-series of a single event to recover the optimal parameters for the model.

\[i(t)=i_0 + \sum_{j=1}^{N} a_j\left(1-e^{-\left(t-\mu_j\right)/\tau_j}\right) H\left(t-\mu_j\right)\]

Settings that control the fit are defined through the settings file and are described in more detail in the Optimizing Settings section. Upon successfully fitting the model to an event, adept generates meta-data that describes the individual states in the event. A representative example of one such event is shown in the figure below.

[Figure: Multistate.png, adept fit of a representative multistate event]
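
The ADEPT model can be sketched the same way. The hypothetical function below sums one exponentially relaxing step per state. It illustrates the equation above and is not MOSAIC's fitting code.

Setting N = 2 with a_1 = -a and a_2 = a gives a consistency check. It recovers the adept2State step response, a(e^{-(t-mu_1)/tau_1} - 1)H(t-mu_1) + a(1 - e^{-(t-mu_2)/tau_2})H(t-mu_2).

```python
import math

def adept_current(t, i0, a, mu, tau):
    """Hypothetical evaluation of the multistate model.

    a, mu and tau are sequences of length N holding the step amplitudes
    a_j (pA), onset times mu_j (ms) and RC constants tau_j (ms).
    """
    i = i0
    for a_j, mu_j, tau_j in zip(a, mu, tau):
        # Heaviside factor H(t - mu_j), applied as a conditional so that
        # exp is never evaluated with a large positive exponent
        if t >= mu_j:
            i += a_j * (1.0 - math.exp(-(t - mu_j) / tau_j))
    return i

# Illustrative three-step event: two blocked levels (70 pA, then 40 pA)
# followed by a return to the 130 pA open channel current.
amplitudes = [-60.0, -30.0, 90.0]
onsets = [1.0, 1.4, 2.0]
rc = [0.02, 0.02, 0.02]
trace = [adept_current(k * 0.01, 130.0, amplitudes, onsets, rc) for k in range(300)]
```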

2.2.1. Algorithm Settings

2.2.2. Metadata Output

The adept algorithm outputs meta-data that characterizes every processed event. As with the ADEPT 2-State algorithm, this information is stored in a SQLite database and is available for further processing (see Database Structure and Query Syntax). However, the data output by adept differs from adept2State in one important way. Because the number of states (NStates) detected in each event is not pre-determined, key meta-data (e.g. BlockDepth and EventDelay) are stored as arrays of real numbers (REAL_LIST) with length equal to NStates.

Column Name        Column Type  Description
-----------------  -----------  --------------------------------------------
recIDX             INTEGER      Record index.
ProcessingStatus   TEXT         Status of the analysis.
OpenChCurrent      REAL         Open channel current in pA.
NStates            INTEGER      Number of detected states.
CurrentStep        REAL_LIST    Blocked current steps in pA.
BlockDepth         REAL_LIST    BlockedCurrent/OpenChCurrent for each state.
EventStart         REAL         Event start in ms.
EventEnd           REAL         Event end in ms.
EventDelay         REAL_LIST    Start time of each state in ms.
StateResTime       REAL_LIST    Residence time of each state in ms.
ResTime            REAL         EventEnd-EventStart in ms.
RCConstant         REAL_LIST    System RC constant in ms.
AbsEventStart      REAL         Global event start time in ms.
ReducedChiSquared  REAL         Reduced Chi-squared of fit.
ProcessTime        REAL         Event processing time in ms.
TimeSeries         REAL_LIST    (OPTIONAL) Event time-series.
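
Several ADEPT columns are declared as REAL_LIST. A Python reader can have the sqlite3 module convert them automatically. To do so, register a converter for that declared type name and connect with detect_types=sqlite3.PARSE_DECLTYPES.

The sketch below assumes, purely for illustration, that list values are stored as JSON text. MOSAIC's actual on-disk encoding of REAL_LIST may differ, and the converter would have to match it. The table name and values are hypothetical.

```python
import json
import sqlite3

# Assumption for illustration only: REAL_LIST values are stored as JSON text.
# sqlite3 passes the raw column bytes to the converter registered for the
# declared column type "REAL_LIST".
sqlite3.register_converter(
    "REAL_LIST", lambda raw: [float(v) for v in json.loads(raw)]
)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute(
    "CREATE TABLE metadata (recIDX INTEGER, NStates INTEGER, "
    "BlockDepth REAL_LIST, StateResTime REAL_LIST)"
)
# Hypothetical event with two states; each list holds one value per state.
conn.execute(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    (0, 2, json.dumps([0.55, 0.30]), json.dumps([0.40, 0.25])),
)

events = conn.execute(
    "SELECT recIDX, NStates, BlockDepth, StateResTime FROM metadata"
).fetchall()
for rec, nstates, depths, restimes in events:
    # depths and restimes arrive as Python lists of length NStates
    print(rec, nstates, depths, restimes)
```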

2.3. CUSUM+

MOSAIC also provides the CUSUM algorithm [], which is used, for example, by OpenNanopore. In contrast with the other algorithms available in MOSAIC, this approach does not use information about the measurement system, such as its RC response, in the analysis. Omitting this information, however, makes CUSUM faster at estimating single- and multi-level events than ADEPT 2-State and ADEPT.

Some known issues with CUSUM:

  1. If the duration of a sub-event is shorter than about five RC constants, the averaging will underestimate the magnitude of the current change. For longer sub-events, CUSUM should produce results very similar to those of the fitting used elsewhere in MOSAIC.

  2. CUSUM assumes an instantaneous transition between current states. As a result, if the RC rise time of the system is large, CUSUM can trigger during the transition and detect spurious intermediate states. This can usually be mitigated by optimizing the algorithm sensitivity settings.

  3. If an event is very long, CUSUM can detect a state transition even where there is no real change, leading to an artificially high number of states. This is a consequence of false positives from using a statistical t-test. In some cases this can be mitigated by reducing the sensitivity.
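
The CUSUM idea can be illustrated with the minimal sketch below. It uses the textbook two-sided log-likelihood-ratio form for a Gaussian mean shift and is not the cusumPlus implementation. The following are hypothetical parameters rather than MOSAIC settings keys:

* delta, the nominal step size;
* sigma, the noise standard deviation, assumed known;
* h, the alarm threshold.

The reference level is the running mean of the current segment. After an alarm, the change point is placed just after the triggering statistic was last zero. Each state's level is then the average between change points.

```python
import random

def cusum_changepoints(x, delta, sigma, h):
    """Two-sided CUSUM for mean shifts of roughly +/-delta or larger.

    sigma is the (assumed known) Gaussian noise level and h the alarm
    threshold. Returns the estimated change-point indices.
    """
    scale = delta / sigma**2
    changepoints = []
    start, total = 0, x[0]          # current segment start and running sum
    g_up = g_dn = 0.0               # cumulative log-likelihood ratios
    zero_up = zero_dn = 0           # last index where each statistic was 0
    for k in range(1, len(x)):
        mu0 = total / (k - start)   # mean of x[start:k]
        g_up = max(0.0, g_up + scale * (x[k] - mu0 - delta / 2))
        g_dn = max(0.0, g_dn + scale * (mu0 - x[k] - delta / 2))
        if g_up == 0.0:
            zero_up = k
        if g_dn == 0.0:
            zero_dn = k
        if g_up > h or g_dn > h:
            # Usual CUSUM change-time estimate: the sample after the
            # triggering statistic was last zero.
            cp = (zero_up if g_up > h else zero_dn) + 1
            changepoints.append(cp)
            start, total = cp, sum(x[cp:k + 1])
            g_up = g_dn = 0.0
            zero_up = zero_dn = k
        else:
            total += x[k]
    return changepoints

def segment_means(x, changepoints):
    """Average x between consecutive change points (one level per state)."""
    bounds = [0] + changepoints + [len(x)]
    return [sum(x[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]

# Synthetic, illustrative trace: 100 pA open channel, a 70 pA block, 2 pA noise.
rng = random.Random(1)
levels = [100.0] * 200 + [70.0] * 200 + [100.0] * 200
trace = [v + rng.gauss(0.0, 2.0) for v in levels]
cps = cusum_changepoints(trace, delta=20.0, sigma=2.0, h=20.0)
print(cps, [round(m, 1) for m in segment_means(trace, cps)])
```

The sketch reflects the issues listed above:

* Levels come from averaging between change points, so a sub-event lasting only a few RC constants yields a distorted average.
* Lowering h plays a role similar to raising the sensitivity, and it makes spurious triggers in long events more likely.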

Settings that control the algorithm are defined through the settings file, as described in the Optimizing Settings section. Upon successfully analyzing an event, cusumPlus generates meta-data that describes the individual states in the event. A representative example of one such event is shown in the figure below.

[Figure: CUSUM.png, cusumPlus analysis of a representative multistate event]

2.3.1. Algorithm Settings

2.3.2. Metadata Output

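The cusumPlus algorithm outputs meta-data that characterizes every processed event. As with the ADEPT algorithm, this information is stored in a SQLite database and is available for further processing (see Database Structure and Query Syntax). Because cusumPlus does not fit a model to the event, its output omits the RCConstant and ReducedChiSquared columns reported by ADEPT.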

Column Name        Column Type  Description
-----------------  -----------  --------------------------------------------
recIDX             INTEGER      Record index.
ProcessingStatus   TEXT         Status of the analysis.
OpenChCurrent      REAL         Open channel current in pA.
NStates            INTEGER      Number of detected states.
CurrentStep        REAL_LIST    Blocked current steps in pA.
BlockDepth         REAL_LIST    BlockedCurrent/OpenChCurrent for each state.
EventStart         REAL         Event start in ms.
EventEnd           REAL         Event end in ms.
EventDelay         REAL_LIST    Start time of each state in ms.
StateResTime       REAL_LIST    Residence time of each state in ms.
ResTime            REAL         EventEnd-EventStart in ms.
AbsEventStart      REAL         Global event start time in ms.
ProcessTime        REAL         Event processing time in ms.
TimeSeries         REAL_LIST    (OPTIONAL) Event time-series.
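
The hypothetical helper below connects the column definitions to a segmented trace. It takes CUSUM-style output, meaning one mean level per segment plus the sample indices that separate the segments. From these it computes per-state quantities keyed by the column names above. The mapping rests on several assumptions:

* the first and last segments are the open channel;
* times are measured in ms from the start of the supplied time-series;
* the sampling rate is given in kHz, that is, samples per ms.

This is not the cusumPlus implementation. NStates and CurrentStep are omitted because this section does not define their exact counting convention.

```python
def state_metadata(levels, changepoints, fs_khz):
    """Hypothetical mapping from a segmented event to per-state metadata.

    levels: mean current (pA) of each segment, the first and last being
    the open channel; changepoints: sample indices separating consecutive
    segments; fs_khz: sampling rate in kHz (samples per ms).
    """
    if len(changepoints) != len(levels) - 1:
        raise ValueError("need one change point between each pair of segments")
    open_ch = levels[0]
    times_ms = [cp / fs_khz for cp in changepoints]
    blocked = levels[1:-1]  # the blocked states inside the event
    return {
        "OpenChCurrent": open_ch,
        "BlockDepth": [lvl / open_ch for lvl in blocked],
        "EventStart": times_ms[0],
        "EventEnd": times_ms[-1],
        "EventDelay": times_ms[:-1],  # start time of each blocked state
        "StateResTime": [b - a for a, b in zip(times_ms, times_ms[1:])],
        "ResTime": times_ms[-1] - times_ms[0],
    }

# Illustrative two-level event sampled at 100 kHz.
meta = state_metadata([100.0, 70.0, 40.0, 100.0], [200, 300, 400], fs_khz=100.0)
print(meta)
```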