ZenoWrapper
zenowrapper.main.ZenoWrapper
- class zenowrapper.main.ZenoWrapper(atom_group: AtomGroup, type_radii: dict[str, float] | None = None, n_walks: int | None = 100000, min_n_walks: int | None = None, n_interior_samples: int | None = 10000, min_n_interior_samples: int | None = None, max_rsd_capacitance: float | None = None, max_rsd_polarizability: float | None = None, max_rsd_volume: float | None = None, max_run_time: float | None = None, num_threads: int = 1, temperature: float | None = None, size_scaling_factor: float = 1.0, launch_radius: float | None = None, skin_thickness: float | None = None, mass: float | None = None, viscosity: float | None = None, buoyancy_factor: float | None = None, temperature_units: str = 'K', viscosity_units: str = 'p', length_units: str = 'A', mass_units: str = 'g', seed: int = -1, verbose: bool = False, **kwargs)[source]
Bases:
AnalysisBaseCompute hydrodynamic and geometric properties via ZENO Monte Carlo methods.
This analysis class wraps the NIST ZENO software to calculate physical properties of molecular structures from MD trajectories. It uses two Monte Carlo algorithms: Walk-on-Spheres (exterior) for electrical/hydrodynamic properties and Interior Sampling for geometric properties.
The electrical properties are converted to hydrodynamic properties via the electrostatic-hydrodynamic analogy. All properties include rigorous statistical uncertainties from variance estimation.
Two-Level Parallelization
ZenoWrapper supports parallelization at two independent levels:
Frame-level parallelism (MDAnalysis): Distribute trajectory frames across multiple Python processes using the
backendparameter inrun(). Supported backends:'serial','multiprocessing','dask'.Within-frame parallelism (ZENO C++): Parallelize Monte Carlo walks within each frame using the
num_threadsparameter. ZENO uses C++ std::thread for efficient shared-memory parallelism.
These can be combined for maximum performance. For example, with 16 CPU cores:
run(backend='multiprocessing', n_workers=4)withnum_threads=4uses all 16 cores (4 processes × 4 threads per process).Note
Frame-level parallelization requires trajectory readers that support random access and pickling (most file-based readers). Streaming readers like
IMDReaderonly support serial analysis with within-frame threading.- Parameters:
atom_group (MDAnalysis.core.groups.AtomGroup) – Atoms to analyze (e.g., protein, polymer, nanoparticle)
type_radii (dict) – Mapping of atom types to VdW radii in length_units, e.g.,
{'C': 1.7, 'N': 1.55, 'O': 1.52}. Required for all atom types present.n_walks (int, optional) – Number of random walks for exterior calculation (default: 100000). Higher values improve accuracy but increase computation time.
min_n_walks (int, optional) – Minimum walks before convergence check. If None, runs exactly n_walks.
n_interior_samples (int, optional) – Number of sample points for interior calculation (default: 10000). Higher values improve volume/gyration accuracy.
min_n_interior_samples (int, optional) – Minimum samples before convergence check. If None, runs exactly n_interior_samples.
max_rsd_capacitance (float, optional) – Target relative standard deviation for capacitance (e.g., 0.01 for 1%). Computation stops early if reached.
max_rsd_polarizability (float, optional) – Target relative standard deviation for polarizability.
max_rsd_volume (float, optional) – Target relative standard deviation for volume.
max_run_time (float, optional) – Maximum computation time in seconds per frame.
temperature (float, optional) – Temperature for diffusion coefficient calculation. Required if diffusion coefficient is needed.
size_scaling_factor (float, optional) – Scale all radii by this factor (default: 1.0)
launch_radius (float, optional) – Radius of sphere enclosing molecule. Auto-computed if None.
skin_thickness (float, optional) – Cutoff distance for Walk-on-Spheres termination. Auto-set if None.
mass (float, optional) – Molecular mass for intrinsic viscosity with mass units and sedimentation coefficient. Specify in mass_units.
viscosity (float, optional) – Solvent viscosity for friction/diffusion/sedimentation coefficients. Specify in viscosity_units (e.g., 0.01 poise for water at 20°C).
buoyancy_factor (float, optional) – Buoyancy factor for sedimentation coefficient (typically 0.72-0.73 for proteins in water).
temperature_units (str, optional) – ‘K’ (Kelvin) or ‘C’ (Celsius), default ‘K’
viscosity_units (str, optional) – ‘p’ (poise) or ‘cp’ (centipoise), default ‘p’
length_units (str, optional) – ‘m’, ‘cm’, ‘nm’, ‘A’ (Angstrom), or ‘L’ (arbitrary), default ‘A’
mass_units (str, optional) – ‘Da’, ‘kDa’, ‘g’, or ‘kg’, default ‘g’
seed (int, optional) – Random seed for reproducibility. Default -1 (random seed).
verbose (bool, optional) – Print progress information (default: False)
**kwargs – Additional arguments passed to
MDAnalysis.analysis.base.AnalysisBase
- Variables:
universe (MDAnalysis.core.universe.Universe) – Universe containing the trajectory
atom_group (MDAnalysis.core.groups.AtomGroup) – Atoms being analyzed
type_radii (numpy.ndarray) – Array of radii for each atom in atom_group
results (MDAnalysis.analysis.base.Results) –
Container for computed properties. Each property is a
Propertyobject with.valuesand.variancearrays indexed by frame. Available properties depend on input parameters:- Always computed:
capacitance: Molecular capacitance (length units)electric_polarizability_tensor: 3×3 polarizability tensor (length³)electric_polarizability_eigenvalues: Eigenvalues of tensor (length³)electric_polarizability: Mean polarizability (length³)intrinsic_conductivity: Dimensionless conductivityvolume: Molecular volume (length³)gyration_tensor: 3×3 gyration tensor (length²)gyration_eigenvalues: Eigenvalues of gyration tensor (length²)capacitance_same_volume_sphere: Equivalent sphere radius (length)hydrodynamic_radius: Effective radius in solution (length)viscometric_radius: Radius from viscosity (length)intrinsic_viscosity: Dimensionless intrinsic viscosityprefactor_polarizability2intrinsic_viscosity: Shape factor
- Requires viscosity:
friction_coefficient: Resistance to motion (mass/time)
- Requires temperature and viscosity:
diffusion_coefficient: Translational diffusion (length²/time)
- Requires mass, buoyancy_factor, and viscosity:
sedimentation_coefficient: Sedimentation rate (time)
- Requires mass:
intrinsic_viscosity_mass: Intrinsic viscosity (length³/mass)
n_frames (int) – Number of frames analyzed
times (numpy.ndarray) – Timestamps of analyzed frames
frames (numpy.ndarray) – Indices of analyzed frames
Notes
Parallelization Strategy
For optimal performance, consider your workload characteristics:
Many frames (>100), fast computation: Use frame-level parallelism (
backend='multiprocessing') withnum_threads=1or small valuesFew frames (<20), expensive computation: Use within-frame parallelism with higher
num_threadsandbackend='serial'Balanced workload: Use hybrid approach, e.g.,
n_workers=4withnum_threads=4on a 16-core machine
Memory usage scales with
n_workers(each worker loads a copy of the trajectory), whilenum_threadsshares memory within each worker.Examples
Basic usage computing hydrodynamic radius:
>>> import MDAnalysis as mda >>> from zenowrapper import ZenoWrapper >>> u = mda.Universe('protein.pdb', 'traj.dcd') >>> type_radii = {'C': 1.7, 'N': 1.55, 'O': 1.52, 'S': 1.8} >>> zeno = ZenoWrapper(u.select_atoms('protein'), type_radii=type_radii) >>> zeno.run(start=0, stop=100, step=10) >>> print(zeno.results.hydrodynamic_radius.values.mean())
Computing diffusion coefficient:
>>> zeno = ZenoWrapper( ... u.atoms, ... type_radii=type_radii, ... temperature=298.15, # K ... viscosity=0.01, # poise (water at 20°C) ... temperature_units='K', ... viscosity_units='p' ... ) >>> zeno.run() >>> D = zeno.results.diffusion_coefficient.values >>> D_mean = D.mean() >>> D_std = np.sqrt(zeno.results.diffusion_coefficient.variance.mean())
Using frame-level parallelization for long trajectories:
>>> # Process 1000 frames using 8 workers, single-threaded per frame >>> zeno = ZenoWrapper(u.atoms, type_radii=type_radii, num_threads=1) >>> zeno.run(backend='multiprocessing', n_workers=8)
Using within-frame parallelization for expensive computation:
>>> # Process few frames with 16-threaded ZENO per frame >>> zeno = ZenoWrapper( ... u.atoms, ... type_radii=type_radii, ... n_walks=10000000, # 10M walks = expensive! ... num_threads=16 ... ) >>> zeno.run(backend='serial')
Using hybrid two-level parallelization:
>>> # Balance: 4 workers × 4 threads = 16 cores total >>> zeno = ZenoWrapper(u.atoms, type_radii=type_radii, num_threads=4) >>> zeno.run(backend='multiprocessing', n_workers=4)
See also
MDAnalysis.analysis.base.AnalysisBaseBase class with run() method
PropertyContainer for results with values and uncertainties
Notes
Computation time scales linearly with n_walks and n_interior_samples
Statistical uncertainty decreases as 1/sqrt(N) for N samples
The electrostatic-hydrodynamic analogy relates capacitance C to hydrodynamic radius: R_h = C (exact for spheres, approximate for other shapes)
All properties include variance estimates via propagation of uncertainties
Setting a fixed seed ensures reproducible results
References
Douglas, J.F., Zhou, H.X., Hubbard, J.B. (1994) “Hydrodynamic friction and the capacitance of arbitrarily shaped objects” Phys. Rev. E 49: 5319-5331
ZENO Documentation: https://zeno.nist.gov/
ZENO Calculations: https://zeno.nist.gov/Calculations.html
ZENO Output Properties: https://zeno.nist.gov/Output.html
Boolean mark showing that a given class can be parallelizable with split-apply-combine procedure.
Return the supported parallel backends for this analysis.
ZenoWrapper.run([start, stop, step, frames, ...])Perform the calculation
- classmethod get_supported_backends()[source]
Return the supported parallel backends for this analysis.
ZenoWrapper supports all standard MDAnalysis parallel backends since each frame’s analysis is completely independent and results can be concatenated across frames.
See also
- Parallel analysis
MDAnalysis parallel analysis documentation
Notes
Even when using parallel backends, each worker will use
num_threadsfor within-frame ZENO parallelization, giving two-level parallelism.Added in version 3.0.0.
- property parallelizable
Boolean mark showing that a given class can be parallelizable with split-apply-combine procedure. Namely, if we can safely distribute
_single_frame()to multiple workers and then combine them with a proper_conclude()call. If set toFalse, no backends except forserialare supported.Note
If you want to check parallelizability of the whole class, without explicitly creating an instance of the class, see
_analysis_algorithm_is_parallelizable. Note that you setting it to other value will break things if the algorithm behind the analysis is not trivially parallelizable.- Returns:
if a given
AnalysisBasesubclass instance is parallelizable with split-apply-combine, or not- Return type:
Added in version 2.8.0.
- run(start: int = None, stop: int = None, step: int = None, frames: Iterable = None, verbose: bool = None, n_workers: int = None, n_parts: int = None, backend: str | BackendBase = None, *, unsupported_backend: bool = False, progressbar_kwargs=None)
Perform the calculation
- Parameters:
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) – array of integers or booleans to slice trajectory;
framescan only be used instead ofstart,stop, andstep. Setting bothframesand at least one ofstart,stop,stepto a non-default value will raise aValueError.Added in version 2.2.0.
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBarfor full list. Available only forbackend='serial'backend (Union[str, BackendBase], optional) – By default, performs calculations in a serial fashion. Otherwise, user can choose a backend:
stris matched to a builtin backend (one ofserial,multiprocessinganddask), or aMDAnalysis.analysis.results.BackendBasesubclass.Added in version 2.8.0.
n_workers (int) – positive integer with number of workers (processes, in case of built-in backends) to split the work between
Added in version 2.8.0.
n_parts (int, optional) – number of parts to split computations across. Can be more than number of workers.
Added in version 2.8.0.
unsupported_backend (bool, optional) – if you want to run your custom backend on a parallelizable class that has not been tested by developers, by default False
Added in version 2.8.0.
Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.
Changed in version 2.5.0: Add progressbar_kwargs parameter, allowing to modify description, position etc of tqdm progressbars
Changed in version 2.8.0: Introduced
backend,n_workers,n_partsandunsupported_backendkeywords, and refactored the method logic to support parallelizable execution.