Property regression example#

Low-level interface#

To show how the components of NFFLr work together, let’s train a formation energy model on the dft_3d dataset. We can use the PeriodicRadiusGraph transform to configure the AtomsDataset to automatically convert atomic configurations into DGLGraphs.

import nfflr

transform = nfflr.nn.PeriodicRadiusGraph(cutoff=5.0)

dataset = nfflr.AtomsDataset(
    "dft_3d", 
    target="formation_energy_peratom", 
    transform=transform,
)
dataset[0]
dataset_name='dft_3d'
Obtaining 3D dataset 76k ...
Reference:https://www.nature.com/articles/s41524-020-00440-1
Other versions:https://doi.org/10.6084/m9.figshare.6815699

Loading the zipfile...
Loading completed.
(Graph(num_nodes=8, num_edges=288,
       ndata_schemes={'coord': Scheme(shape=(3,), dtype=torch.float32), 'atomic_number': Scheme(shape=(), dtype=torch.int32)}
       edata_schemes={'r': Scheme(shape=(3,), dtype=torch.float32)}),
 tensor(-0.4276))
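Each sample is a tuple of a DGL graph and its scalar target. The node and edge fields listed in the schemes above can be inspected directly; a quick sketch:

g, y = dataset[0]
print(g.num_nodes(), g.num_edges())   # 8 atoms, 288 neighbor pairs within the cutoff
print(g.ndata["atomic_number"])       # per-node atomic numbers
print(g.ndata["coord"].shape)         # per-node coordinates, shape (num_nodes, 3)
print(g.edata["r"].shape)             # per-edge displacement vectors, shape (num_edges, 3)
print(y)                              # formation energy per atom for this structure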

AtomsDataset can also load structures from the directory format that ALIGNN uses; the directory should contain a collection of POSCAR, CIF, or XYZ files and a mapping from file names to prediction targets in the file id_prop.csv.
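A hypothetical layout for such a directory (illustrative file names; each row of id_prop.csv pairs a structure file with its target value):

my_dataset/
├── id_prop.csv          # file name, target value (one pair per row)
├── POSCAR-0001.vasp
├── POSCAR-0002.vasp
└── ...

with id_prop.csv containing rows such as

POSCAR-0001.vasp,0.123
POSCAR-0002.vasp,-0.456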

For example, loading the small set of POSCAR files distributed with nfflr (and alignn):

import inspect
from pathlib import Path

nfflr_root = Path(inspect.getfile(nfflr)).parent

dataset = nfflr.AtomsDataset(
    nfflr_root / "examples/sample_data", 
    target="target", 
    transform=transform,
)
dataset[0]
dataset_name=PosixPath('/home/runner/work/nfflr/nfflr/nfflr/examples/sample_data')
(Graph(num_nodes=8, num_edges=288,
       ndata_schemes={'coord': Scheme(shape=(3,), dtype=torch.float32), 'atomic_number': Scheme(shape=(), dtype=torch.int32)}
       edata_schemes={'r': Scheme(shape=(3,), dtype=torch.float32)}),
 tensor(0.))

Set up a medium-sized ALIGNN model:

cfg = nfflr.models.ALIGNNConfig(
    transform=transform,
    alignn_layers=2, 
    gcn_layers=2, 
    norm="layernorm", 
    atom_features="embedding"
)
model = nfflr.models.ALIGNN(cfg)

atoms, target = dataset[0]
model(atoms)
tensor(0.2389, grad_fn=<SqueezeBackward0>)
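As a quick size check (not an NFFLr API, just the standard PyTorch idiom), count the trainable parameters of the configured model:

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_trainable:,} trainable parameters")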

AtomsDataset is meant to work with standard PyTorch DataLoaders. Because the common input formats for atomistic ML have rich structure, a custom collation function is often needed to auto-batch samples correctly; AtomsDataset tries to select an appropriate collate_fn for the tasks it knows about, which is accessible as AtomsDataset.collate.

import numpy as np

import torch
from torch import nn
from torch.utils.data import DataLoader, SubsetRandomSampler

batchsize = 2

train_loader = DataLoader(
    dataset,
    batch_size=batchsize, 
    collate_fn=dataset.collate, 
    sampler=SubsetRandomSampler(dataset.split["train"]),
    drop_last=True
)
next(iter(train_loader))
(Graph(num_nodes=100, num_edges=2876,
       ndata_schemes={'coord': Scheme(shape=(3,), dtype=torch.float32), 'atomic_number': Scheme(shape=(), dtype=torch.int32)}
       edata_schemes={'r': Scheme(shape=(3,), dtype=torch.float32)}),
 tensor([0.9240, 4.0720]))
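The batched graph can be passed to the model exactly like a single sample; a quick check that the model should return one prediction per crystal in the batch:

g_batch, y_batch = next(iter(train_loader))
pred = model(g_batch)
print(pred.shape, y_batch.shape)   # both should have one entry per crystal in the batch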

Now we can set up a PyTorch optimizer and objective function and optimize the model parameters with an explicit training loop. See the [PyTorch quickstart tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) for more context.

from tqdm import tqdm
criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

training_loss = []
for epoch in range(5):
    for step, (g, y) in enumerate(tqdm(train_loader)):
        pred = model(g)
        loss = criterion(pred, y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        
        training_loss.append(loss.item())
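Held-out performance can be estimated the same way. The sketch below assumes dataset.split also exposes a "val" index subset alongside "train" (an assumed key; adjust to the split keys your dataset actually provides):

val_loader = DataLoader(
    dataset,
    batch_size=batchsize,
    collate_fn=dataset.collate,
    sampler=SubsetRandomSampler(dataset.split["val"]),  # assumed split key
)

model.eval()
with torch.no_grad():
    val_losses = [criterion(model(g), y).item() for g, y in val_loader]
print(f"validation MSE: {np.mean(val_losses):.4f}")
model.train()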

import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(training_loss)
plt.xlabel("training iteration")
plt.ylabel("loss");
plt.semilogy();
[figure: training loss vs. training iteration, log-scale y axis]

Using the ignite-based NFFLr trainer#

NFFLr also provides a higher-level training entry point, nfflr.train.run_train, which builds the data loaders and runs the full training and validation loop from a single configuration dictionary:

import tempfile
from nfflr import train
rank = 0
training_config = {
    "dataset": dataset,
    "model": model,
    "optimizer": optimizer,
    "criterion": criterion,
    "random_seed": 42,
    "batch_size": 2,
    "learning_rate": 1e-3,
    "weight_decay": 0.1,
    "epochs": 5,
    "num_workers": 0,
    "progress": True,
    "output_dir": tempfile.TemporaryDirectory().name
}
train.run_train(rank, training_config)
2024-01-24 14:46:26,979 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset '<nfflr.data.dataset.': 
	{'collate_fn': <function AtomsDataset.collate_default at 0x29a06a560>, 'batch_size': 2, 'sampler': <torch.utils.data.sampler.SubsetRandomSampler object at 0x107b71510>, 'drop_last': True, 'num_workers': 0, 'pin_memory': False}
2024-01-24 14:46:26,979 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset '<nfflr.data.dataset.': 
	{'collate_fn': <function AtomsDataset.collate_default at 0x29a06a560>, 'batch_size': 2, 'sampler': <torch.utils.data.sampler.SubsetRandomSampler object at 0x30ec1da80>, 'drop_last': True, 'num_workers': 0, 'pin_memory': False}
starting training loop
train results - Epoch: 1  Avg loss: 0.01
val results - Epoch: 1  Avg loss: 3.60
train results - Epoch: 2  Avg loss: 0.45
val results - Epoch: 2  Avg loss: 4.95
train results - Epoch: 3  Avg loss: 0.05
val results - Epoch: 3  Avg loss: 1.20
train results - Epoch: 4  Avg loss: 0.02
val results - Epoch: 4  Avg loss: 3.59
train results - Epoch: 5  Avg loss: 0.00
val results - Epoch: 5  Avg loss: 3.85
3.854463577270508