Overview#
input representation#
For efficient training, models should also be able to operate on preprocessed structures, e.g. a graph neural network could implement `forward(x: dgl.DGLGraph)` to allow asynchronous construction of graph batches in a `DataLoader`.
- PyG input format
- `nfflr.Atoms` with ghost atom padding
- some other custom input representation
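A minimal sketch of the preprocessing idea, using plain `torch` tensors as a stand-in for graph construction (the class and its distance-matrix "representation" are illustrative, not part of nfflr; with DGL one would instead build a `dgl.DGLGraph` in `__getitem__`):

```python
import torch
from torch.utils.data import Dataset, DataLoader


class PreprocessedDataset(Dataset):
    """Toy dataset that builds a preprocessed representation per item.

    Doing this work in __getitem__ means DataLoader workers
    (num_workers > 0) construct batches asynchronously, analogous to
    radius-graph construction for a GNN.
    """

    def __init__(self, n_structures: int = 8, n_atoms: int = 4):
        self.positions = [torch.randn(n_atoms, 3) for _ in range(n_structures)]

    def __len__(self):
        return len(self.positions)

    def __getitem__(self, idx):
        pos = self.positions[idx]
        # "preprocessing": pairwise distance matrix, a stand-in
        # for building a graph from an atomic structure
        return torch.cdist(pos, pos)


class DistanceModel(torch.nn.Module):
    # forward consumes the preprocessed structure, not the raw structure
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.mean(dim=(1, 2))


loader = DataLoader(PreprocessedDataset(), batch_size=4, num_workers=0)
model = DistanceModel()
for batch in loader:
    out = model(batch)  # batch shape: (4, n_atoms, n_atoms)
```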
modeling interface#
Models should have a consistent interface, regardless of the backend!
`NFFLrModel`: `forward(x: Atoms) -> dict[str, torch.Tensor]`
Depending on the model and the prediction task, both the input representation and output format may vary.
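A hedged sketch of what that convention looks like in practice, using a plain tensor of positions as a stand-in for `nfflr.Atoms` (the class below is illustrative, not nfflr's implementation):

```python
import torch
from torch import nn


class ToyForceField(nn.Module):
    """Minimal model following the NFFLrModel convention:
    forward(x) -> dict[str, torch.Tensor], regardless of backend.
    """

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.SiLU(), nn.Linear(hidden, 1)
        )

    def forward(self, positions: torch.Tensor) -> dict[str, torch.Tensor]:
        positions = positions.requires_grad_(True)
        energy = self.mlp(positions).sum()
        # forces as the negative gradient of the energy w.r.t. positions
        (grad_pos,) = torch.autograd.grad(energy, positions, create_graph=True)
        return {"total_energy": energy, "forces": -grad_pos}


model = ToyForceField()
out = model(torch.randn(5, 3))  # 5 "atoms" in 3 spatial dimensions
```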
output representation#
Depending on the task, a model may return predictions in different formats.
- Single-target tasks like scalar or vector regression or classification naturally return a single tensor: `forward(x: Atoms) -> torch.Tensor`.
- A force field should return a `dict[str, torch.Tensor]`: `{"total_energy": torch.Tensor, "forces": torch.Tensor, "stress": torch.Tensor}`, where `total_energy` is a scalar, `forces` is an `(n_atoms, n_spatial_dimensions)` tensor, and `stress` is a `(n_spatial_dimensions, n_spatial_dimensions)` tensor.
- A custom multi-target task might also return a `dict` of task-specific tensors.
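These output conventions can be made concrete with dummy tensors (shapes only; a sketch, not nfflr code):

```python
import torch

n_atoms, dim = 6, 3  # n_spatial_dimensions = 3

# single-target task: one tensor, e.g. a scalar regression target
scalar_pred = torch.randn(())

# force-field task: dict with fixed keys and shapes
ff_pred = {
    "total_energy": torch.randn(()),      # scalar
    "forces": torch.randn(n_atoms, dim),  # (n_atoms, n_spatial_dimensions)
    "stress": torch.randn(dim, dim),      # (n_spatial_dimensions, n_spatial_dimensions)
}
```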
ignite-based trainer#
`nfflr` includes a command line utility `nff` (defined in `nfflr.train.trainer.py`) to simplify common training workflows. These commands use `py_config_runner` for flexible task, model, and optimization configuration.
Currently two commands are supported: `nff train` for full training runs and `nff lr` for running a learning rate finder experiment.
This training utility uses ignite’s auto-distributed functionality to transparently support data-parallel distributed training.
training command#
To launch a full training run:
```shell
nff train /path/to/config.py
```
The configuration file should define the task, model, and optimizer settings:
```python
# config.py
from pathlib import Path
from functools import partial

from torch import nn

import nfflr

trainer = nfflr.train.TrainingConfig(
    experiment_dir=Path(__file__).parent.resolve(),
    random_seed=42,
    dataloader_workers=0,
    optimizer="adamw",
    epochs=20,
    per_device_batch_size=16,
    weight_decay=1e-3,
    learning_rate=3e-3,
)

# data source and task specification
criterion = nfflr.nn.MultitaskLoss(["energy", "forces"])

rcut = 4.25
transform = nfflr.nn.PeriodicRadiusGraph(cutoff=rcut)
cutoff = nfflr.nn.Cosine(rcut)

model_cfg = nfflr.models.ALIGNNFFConfig(
    transform=transform,
    cutoff=cutoff,
    atom_features="embedding",
    alignn_layers=2,
    gcn_layers=2,
    compute_forces=True,
    energy_units="eV",
)
model = nfflr.models.ALIGNNFF(model_cfg)

# source can be a json file, a jarvis dataset name, or a folder
# task can be "energy_and_forces" or a key in the dataset
dataset = nfflr.data.mlearn_dataset(transform=transform)
```