Property regression example#

Low-level interface#

To show how the components of NFFLr work together, let’s train a formation energy model on the dft_3d dataset. We can use the PeriodicRadiusGraph transform to configure the AtomsDataset to automatically convert atomic configurations into DGLGraphs.

import nfflr

transform = nfflr.nn.PeriodicRadiusGraph(cutoff=5.0)

dataset = nfflr.AtomsDataset(
    "dft_3d", 
    target="formation_energy_peratom", 
    transform=transform,
)
dataset[0]
dataset_name='dft_3d'
Obtaining 3D dataset 76k ...
Reference:https://www.nature.com/articles/s41524-020-00440-1
Other versions:https://doi.org/10.6084/m9.figshare.6815699

Loading the zipfile...
Loading completed.
(Graph(num_nodes=8, num_edges=288,
       ndata_schemes={'coord': Scheme(shape=(3,), dtype=torch.float32), 'atomic_number': Scheme(shape=(), dtype=torch.int32)}
       edata_schemes={'r': Scheme(shape=(3,), dtype=torch.float32)}),
 tensor(-0.4276))
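Each sample is a tuple of a DGL graph and its scalar target. The node and edge fields listed in the schemes above can be inspected directly; a quick sketch:

g, y = dataset[0]
print(g.num_nodes(), g.num_edges())   # 8 atoms, 288 neighbor pairs within the cutoff
print(g.ndata["atomic_number"])       # per-node atomic numbers
print(g.ndata["coord"].shape)         # per-node coordinates, shape (num_nodes, 3)
print(g.edata["r"].shape)             # per-edge displacement vectors, shape (num_edges, 3)
print(y)                              # formation energy per atom for this structure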

AtomsDataset can also load structures from the directory format that ALIGNN uses; the directory should contain a collection of POSCAR, CIF, or XYZ files and a mapping from file names to prediction targets in the file id_prop.csv.
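A hypothetical layout for such a directory (illustrative file names; each row of id_prop.csv pairs a structure file with its target value):

my_dataset/
├── id_prop.csv          # file name, target value (one pair per row)
├── POSCAR-0001.vasp
├── POSCAR-0002.vasp
└── ...

with id_prop.csv containing rows such as

POSCAR-0001.vasp,0.123
POSCAR-0002.vasp,-0.456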

For example, loading the small set of POSCAR files distributed with nfflr (and alignn):

import inspect
from pathlib import Path

nfflr_root = Path(inspect.getfile(nfflr)).parent

dataset = nfflr.AtomsDataset(
    nfflr_root / "examples/sample_data", 
    target="target", 
    transform=transform,
)
dataset[0]
dataset_name=PosixPath('/home/runner/work/nfflr/nfflr/nfflr/examples/sample_data')
(Graph(num_nodes=8, num_edges=288,
       ndata_schemes={'coord': Scheme(shape=(3,), dtype=torch.float32), 'atomic_number': Scheme(shape=(), dtype=torch.int32)}
       edata_schemes={'r': Scheme(shape=(3,), dtype=torch.float32)}),
 tensor(0.))

Set up a medium-sized ALIGNN model:

cfg = nfflr.models.ALIGNNConfig(
    transform=transform,
    alignn_layers=2, 
    gcn_layers=2, 
    norm="layernorm", 
    atom_features="embedding"
)
model = nfflr.models.ALIGNN(cfg)

atoms, target = dataset[0]
model(atoms)
tensor(0.2389, grad_fn=<SqueezeBackward0>)
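As a quick size check (not an NFFLr API, just the standard PyTorch idiom), count the trainable parameters of the configured model:

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_trainable:,} trainable parameters")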

AtomsDataset is meant to work with standard PyTorch DataLoaders. Because the common input formats for atomistic ML have rich structure, a custom collation function is often needed to auto-batch samples correctly; AtomsDataset tries to select an appropriate collate_fn for the tasks it knows about, which is accessible as AtomsDataset.collate.

import numpy as np

import torch
from torch import nn
from torch.utils.data import DataLoader, SubsetRandomSampler

batchsize = 2

train_loader = DataLoader(
    dataset,
    batch_size=batchsize, 
    collate_fn=dataset.collate, 
    sampler=SubsetRandomSampler(dataset.split["train"]),
    drop_last=True
)
next(iter(train_loader))
(Graph(num_nodes=100, num_edges=2876,
       ndata_schemes={'coord': Scheme(shape=(3,), dtype=torch.float32), 'atomic_number': Scheme(shape=(), dtype=torch.int32)}
       edata_schemes={'r': Scheme(shape=(3,), dtype=torch.float32)}),
 tensor([0.9240, 4.0720]))
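The batched graph can be passed to the model exactly like a single sample; a quick check that the model should return one prediction per crystal in the batch:

g_batch, y_batch = next(iter(train_loader))
pred = model(g_batch)
print(pred.shape, y_batch.shape)   # both should have one entry per crystal in the batch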

Now we can set up a PyTorch optimizer and objective function and optimize the model parameters with an explicit training loop. See the [PyTorch quickstart tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) for more context.

from tqdm import tqdm
criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

training_loss = []
for epoch in range(5):
    for step, (g, y) in enumerate(tqdm(train_loader)):
        pred = model(g)
        loss = criterion(pred, y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        
        training_loss.append(loss.item())
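Held-out performance can be estimated the same way. The sketch below assumes dataset.split also exposes a "val" index subset alongside "train" (an assumed key; adjust to the split keys your dataset actually provides):

val_loader = DataLoader(
    dataset,
    batch_size=batchsize,
    collate_fn=dataset.collate,
    sampler=SubsetRandomSampler(dataset.split["val"]),  # assumed split key
)

model.eval()
with torch.no_grad():
    val_losses = [criterion(model(g), y).item() for g, y in val_loader]
print(f"validation MSE: {np.mean(val_losses):.4f}")
model.train()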

import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(training_loss)
plt.xlabel("training iteration")
plt.ylabel("loss");
plt.semilogy();
[figure: training loss vs. training iteration, log-scale y axis]

Using the ignite-based NFFLr trainer#

NFFLr also provides a higher-level training entry point, nfflr.train.run_train, which builds the data loaders and runs the full training and validation loop from a single configuration dictionary:

import tempfile
from nfflr import train
rank = 0
training_config = {
    "dataset": dataset,
    "model": model,
    "optimizer": optimizer,
    "criterion": criterion,
    "random_seed": 42,
    "batch_size": 2,
    "learning_rate": 1e-3,
    "weight_decay": 0.1,
    "epochs": 5,
    "num_workers": 0,
    "progress": True,
    "output_dir": tempfile.TemporaryDirectory().name
}
train.run_train(rank, training_config)
2024-01-24 14:46:26,979 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset '<nfflr.data.dataset.': 
	{'collate_fn': <function AtomsDataset.collate_default at 0x29a06a560>, 'batch_size': 2, 'sampler': <torch.utils.data.sampler.SubsetRandomSampler object at 0x107b71510>, 'drop_last': True, 'num_workers': 0, 'pin_memory': False}
2024-01-24 14:46:26,979 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset '<nfflr.data.dataset.': 
	{'collate_fn': <function AtomsDataset.collate_default at 0x29a06a560>, 'batch_size': 2, 'sampler': <torch.utils.data.sampler.SubsetRandomSampler object at 0x30ec1da80>, 'drop_last': True, 'num_workers': 0, 'pin_memory': False}
starting training loop
train results - Epoch: 1  Avg loss: 0.01
val results - Epoch: 1  Avg loss: 3.60
train results - Epoch: 2  Avg loss: 0.45
val results - Epoch: 2  Avg loss: 4.95
train results - Epoch: 3  Avg loss: 0.05
val results - Epoch: 3  Avg loss: 1.20
train results - Epoch: 4  Avg loss: 0.02
val results - Epoch: 4  Avg loss: 3.59
train results - Epoch: 5  Avg loss: 0.00
val results - Epoch: 5  Avg loss: 3.85
3.854463577270508