rl-safetygymnasium-oct2024
Download Data Splits
Train Data
Official Data Record: pending
About
This dataset contains deep reinforcement learning agents.
TrojAI Safety Gymnasium Model Generation
Here we provide an RL environment built on Safety Gymnasium, a training algorithm (multitask OPAC^2), and tools to configure and orchestrate training runs. The gentle folder contains the source code for the RL algorithm and the tools to build and wrap Safety Gymnasium. The algorithm itself is located in gentle/rl/opac2_multitask.py.
The Safety Gymnasium Environment
In this environment, an agent and two targets are randomly placed in a scene. The agent’s goal is to reach the green target without touching the red target.
The scene also contains a number of small entities (teal cubes) that wander aimlessly. These may obstruct the agent slightly but there is no penalty for interacting with them.
The agent’s observations come from a multi-channel planar lidar. At a set of angles spanning all directions around the agent, the lidar measures the current distance to key objects (targets and entities).
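The following minimal sketch of a multi-channel planar lidar makes the observation format concrete. It is not the Safety Gymnasium implementation. The bin count, the maximum range, the channel names, and the choice to report raw distances are all assumptions made for illustration; the real sensor may normalize or invert its readings. Each object type gets its own channel, and each channel records the nearest distance in each angular bin.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def planar_lidar(
    agent_xy: Point,
    objects: Dict[str, Sequence[Point]],
    num_bins: int = 16,      # assumed number of angular bins
    max_range: float = 3.0,  # assumed sensor range
) -> Dict[str, List[float]]:
    """Hypothetical multi-channel planar lidar (illustrative only).

    One channel per object type. Each channel splits the full circle
    around the agent into `num_bins` equal angular bins and reports the
    distance to the nearest object of that type inside each bin. Bins
    with no object closer than `max_range` read `max_range`.
    """
    ax, ay = agent_xy
    bin_width = 2 * math.pi / num_bins
    readings: Dict[str, List[float]] = {}
    for channel, points in objects.items():
        bins = [max_range] * num_bins
        for ox, oy in points:
            dx, dy = ox - ax, oy - ay
            dist = math.hypot(dx, dy)
            if dist >= max_range:
                continue  # out of sensor range
            angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
            idx = int(angle // bin_width) % num_bins    # guard against angle == 2*pi
            bins[idx] = min(bins[idx], dist)            # keep the nearest hit
        readings[channel] = bins
    return readings
```

For example, with the agent at the origin, a green target at (1, 0) appears in bin 0 of the "green" channel with reading 1.0. An entity beyond `max_range` leaves its channel at `max_range` in every bin.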
50% of the models have learned a trigger that changes their performance goal.
Model types
The model architecture has two basic parameters:
Network Depth (number of hidden layers)
Network Width
The following values of these parameters may be selected.
Network Depth:
“small” : two hidden layers (1 NxN transform)
“default”: three hidden layers (2 NxN transforms)
“large” : five hidden layers (4 NxN transforms)
Network Width:
“small” : each hidden layer has width 181
“default”: each hidden layer has width 256
“large” : each hidden layer has width 362
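Depth and width are chosen so that scaling either one independently yields similar parameter counts in the hidden layers. Each NxN transform holds N² weights (181² ≈ 33K, 256² ≈ 65K, 362² ≈ 131K). Stepping width up one size therefore roughly doubles the count, and so does stepping depth from small (1 transform) to default (2) to large (4):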
| WIDTH (size) \ DEPTH (layers) | small (2) | default (3) | large (5) |
|---|---|---|---|
| small (181) | 33K | 65K | 131K |
| default (256) | 65K | 131K | 262K |
| large (362) | 131K | 262K | 524K |
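The table can be reproduced, to within rounding, by counting only the weights of the NxN hidden-to-hidden transforms. The sketch below restates the depth and width options from the lists above as Python dictionaries. The helper name hidden_transform_weights is hypothetical, and biases, as well as the input and output layers, are deliberately excluded.

```python
# Restatement of the depth/width options listed above.
DEPTH_HIDDEN_LAYERS = {"small": 2, "default": 3, "large": 5}
WIDTH = {"small": 181, "default": 256, "large": 362}


def hidden_transform_weights(depth: str, width: str) -> int:
    """Weights in the NxN hidden-to-hidden transforms (biases excluded).

    A network with L hidden layers has L - 1 NxN transforms between them,
    and each transform is an N x N weight matrix.
    """
    n = WIDTH[width]
    num_transforms = DEPTH_HIDDEN_LAYERS[depth] - 1
    return num_transforms * n * n


if __name__ == "__main__":
    # One row per width, one column per depth, matching the table layout.
    for w in WIDTH:
        row = [hidden_transform_weights(d, w) for d in DEPTH_HIDDEN_LAYERS]
        print(f"{w:>8} ({WIDTH[w]}):", row)
```

Running it prints these exact counts:

- [32761, 65522, 131044] for small width
- [65536, 131072, 262144] for default width
- [131044, 262088, 524176] for large width

Each figure agrees with the table's approximate values to within 1K. Small depth with large width and large depth with small width both give exactly 131044 weights, which is the symmetry the design aims for.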
See https://github.com/usnistgov/trojai-example for an example of how to load a model and run inference.
The Evaluation Server (ES) evaluates submissions against a sequestered test dataset of 80 models drawn from the same generating distribution as the training data. This test dataset is not available for download. The test server gives each container 15 minutes of compute time per model.
The Smoke Test Server (STS) runs only against the first clean model and the first triggered model from the training dataset:
['id-00000000', 'id-00000040']
Experimental Design
Each model architecture implementation is drawn directly from the TrojAI_RL repository.
The network architecture is selected from all combinations of ‘small’ and ‘default’ for width and depth.
NETWORK_DEPTH = ['small', 'default']
NETWORK_WIDTH = ['small', 'default']
In addition, the number of distractor entities is selected from:
NUM_ENTITIES = [2, 4]
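The design above yields 2 × 2 × 2 = 8 possible configurations. The sketch below enumerates them and draws one at random. The source does not say how configurations are assigned to models, so independent uniform sampling is an assumption, and the function names are hypothetical.

```python
import itertools
import random

# Restatement of the design lists above.
NETWORK_DEPTH = ["small", "default"]
NETWORK_WIDTH = ["small", "default"]
NUM_ENTITIES = [2, 4]


def all_configs():
    """Every (depth, width, num_entities) combination: 2 x 2 x 2 = 8."""
    return list(itertools.product(NETWORK_DEPTH, NETWORK_WIDTH, NUM_ENTITIES))


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration, ASSUMING independent uniform choices."""
    return {
        "depth": rng.choice(NETWORK_DEPTH),
        "width": rng.choice(NETWORK_WIDTH),
        "num_entities": rng.choice(NUM_ENTITIES),
    }
```

Passing a seeded `random.Random` instance, as in `sample_config(random.Random(0))`, makes the draw reproducible across runs.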
Data Structure
The archive contains a set of folders named basicfc and rlstarter, representing the different architectures. These are further split into clean and triggered folders, which in turn contain one folder per model. Each model folder contains the trained AI model file in PyTorch format, named model.pt, and the ground truth of whether the model is clean or triggered, ground_truth.json.
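Because the clean/triggered split is encoded in the folder hierarchy, the labels can be recovered from paths alone. The following sketch assumes the layout <architecture>/<clean|triggered>/<model>/model.pt described above. The function name list_models is hypothetical, and the sketch does not read ground_truth.json.

```python
from pathlib import Path
from typing import List, Tuple


def list_models(root: str) -> List[Tuple[str, str, str, Path]]:
    """Return (architecture, label, model_id, model_path) for each model.

    ASSUMES the layout <root>/<architecture>/<clean|triggered>/<model>/model.pt;
    the label is taken from the name of the model folder's parent.
    """
    results = []
    for model_path in sorted(Path(root).glob("*/*/*/model.pt")):
        model_dir = model_path.parent
        label = model_dir.parent.name
        if label not in ("clean", "triggered"):
            continue  # skip folders that do not follow the described layout
        arch = model_dir.parent.parent.name
        results.append((arch, label, model_dir.name, model_path))
    return results
```

Deriving labels from paths is convenient for bookkeeping on the training data. On the test server, detectors are expected to work from the model file itself.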
See https://pages.nist.gov/trojai/docs/data.html for additional information about the TrojAI datasets.
See https://github.com/usnistgov/trojai-example for an example of how to load a model and run inference.
Only a subset of these files is available on the test server during evaluation, to avoid giving away whether a model is poisoned. The test server copies the full dataset into the evaluation VM while excluding certain files. The list of excluded files can be found at https://github.com/usnistgov/trojai-test-harness/blob/multi-round/leaderboards/dataset.py#L30.
Per-Model File List
Folder: 00000000/
Short description: This folder represents a single trained deep reinforcement learning agent.
File: ground_truth.csv
Short description: csv recording whether or not a given agent has the trigger embedded. There are two boolean keys, clean and triggered, that indicate how the agent was trained.
File: model.pt
Short description: The trained DRL model file in PyTorch format.
File: DATA_LICENCE.txt
Short description: The license this data is being released under. It is a copy of the NIST license available at https://www.nist.gov/open/license
Data Revisions
Revision 1 contains 40 clean and 40 poisoned RL agents.