rl-randomized-lavaworld-aug2023

Download Data Splits

Train Data

Official Data Record: https://data.nist.gov/od/id/mds2-3066

About

This dataset contains deep reinforcement learning (DRL) agents trained on the Minigrid environment MiniGrid-LavaCrossingS9N1-v0. In this environment, the agent’s objective is to reach the goal square while navigating around a single lava crossing in an NxN grid. Half of the models have learned a trigger that changes their performance goal.

These models are further divided into two architecture types: a fully connected neural network (BasicFC) and a convolutional neural network (CNNModel). The agents are trained with the Proximal Policy Optimization (PPO) algorithm.

The training dataset consists of (initially) 1 model, though more might be released as the round goes on. The test dataset consists of 296 models. The holdout dataset consists of 296 models.

The following resources were used to train the agents.

Minigrid (https://github.com/Farama-Foundation/Minigrid):

@software{minigrid,
    author = {Chevalier-Boisvert, Maxime and Willems, Lucas and Pal, Suman},
    title = {Minimalistic Gridworld Environment for Gymnasium},
    url = {https://github.com/Farama-Foundation/Minigrid},
    year = {2018},
}

The agent architectures as well as the PPO algorithm are based on the RL-Starter-Files repository.

The agent architectures were implemented with the PyTorch software library.

PyTorch:

@incollection{NEURIPS2019_9015,
title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
booktitle = {Advances in Neural Information Processing Systems 32},
editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
pages = {8024--8035},
year = {2019},
publisher = {Curran Associates, Inc.},
url = {http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf}
}

See https://github.com/usnistgov/trojai-example for how to load a model and run inference on an example.

The Evaluation Server (ES) evaluates submissions against a sequestered dataset of 300 models drawn from an identical generating distribution. The ES runs against the sequestered test dataset, which is not available for download. The test server gives each container 15 minutes of compute time per model.

The Smoke Test Server (STS) runs only against the first model from the training dataset:

['id-00000000']

Experimental Design

Each model architecture implementation is drawn directly from the TrojAI_RL repository.

MODEL_LEVELS = ['BasicFCModel', 'CNNModel']

The architecture definitions can be found here:

Data Structure

The archive contains a set of folders named basicfc and rlstarter, one for each architecture. Each of these is split into clean and triggered folders, which in turn contain one folder per model. Each model folder contains the trained AI model in PyTorch format, named model.pt, and the ground truth indicating whether the model is clean or triggered, named ground_truth.json.
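The folder layout described above can be walked with a few lines of standard-library Python. This is a minimal sketch, not part of the official tooling: the function name find_models is hypothetical, and the assumed nesting (architecture folder, then clean or triggered, then one folder per model holding model.pt) is inferred from the description in this section and may not match every revision of the archive.

```python
from pathlib import Path


def find_models(root):
    """Return a sorted list of (architecture, label, model_dir) tuples.

    Assumed layout (inferred from the description, not verified):
        <root>/<architecture>/<clean|triggered>/<model folder>/model.pt
    """
    found = []
    for model_file in Path(root).glob("*/*/*/model.pt"):
        model_dir = model_file.parent
        label = model_dir.parent.name
        architecture = model_dir.parent.parent.name
        if label not in ("clean", "triggered"):
            continue  # skip folders that do not match the assumed layout
        found.append((architecture, label, model_dir))
    return sorted(found)
```

Calling find_models on the extracted archive root yields one tuple per model folder, with the label taken from the folder name rather than from the ground truth file.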

See https://pages.nist.gov/trojai/docs/data.html for additional information about the TrojAI datasets.

See https://github.com/usnistgov/trojai-example for how to load a model and run inference on an example.

Only a subset of these files is available on the test server during evaluation, to avoid revealing whether a model is poisoned. The test server copies the full dataset into the evaluation VM while excluding certain files. The list of excluded files can be found at https://github.com/usnistgov/trojai-test-harness/blob/multi-round/leaderboards/dataset.py#L30.

Per-Model File List

  • Folder: 00000000/ Short description: This folder represents a single trained deep reinforcement learning agent.

    1. File: ground_truth.csv Short description: This CSV file records whether the agent has a trigger embedded. It has two boolean keys, clean and triggered, that indicate how the agent was trained.

    2. File: model.pt Short description: This file is the trained DRL model file in PyTorch format.

  • File: DATA_LICENCE.txt Short description: The license under which this data is released. It is a copy of the NIST license available at https://www.nist.gov/open/license
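As an illustration of the two boolean keys in the ground truth file, the following sketch reads them and returns a single label. Several things here are assumptions rather than documented facts: the function name parse_ground_truth is hypothetical, the content is assumed to be a JSON object (this page mentions both ground_truth.json and ground_truth.csv), and the check that exactly one key is true is an assumed consistency rule.

```python
import json


def parse_ground_truth(text):
    """Return True if the agent is triggered, False if it is clean.

    Assumes `text` is a JSON object with boolean keys "clean" and
    "triggered"; the real file format may differ.
    """
    record = json.loads(text)
    clean = record["clean"]
    triggered = record["triggered"]
    if not isinstance(clean, bool) or not isinstance(triggered, bool):
        raise ValueError("expected boolean 'clean' and 'triggered' values")
    if clean == triggered:
        # Assumed rule: an agent is either clean or triggered, never both.
        raise ValueError("expected exactly one of the two keys to be true")
    return triggered
```

A detector evaluated against the training data could compare its prediction with this label; on the test server the ground truth file is among those withheld.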

Data Revisions

Revision 1 contains a single clean RL agent model.

Revision 2 contains 148 clean and 74 poisoned RL agents.