rl-safetygymnasium-oct2024

Download Data Splits

Train Data

Official Data Record: pending

About

This dataset contains deep reinforcement learning agents.

TrojAI Safety Gymnasium Model Generation

Here we provide an RL environment built on Safety Gymnasium for policy training, a training algorithm (multitask OPAC^2), and tools to configure and orchestrate training runs. The gentle folder contains the source code for the RL algorithm and the tools that build and wrap Safety Gymnasium. The algorithm itself is located in gentle > rl > opac2_multitask.py.

The Safety Gymnasium Environment

In this environment, an agent and two targets are randomly placed in a scene. The agent’s goal is to reach the green target without touching the red target.

The scene also contains a number of small entities (teal cubes) that wander aimlessly. These may obstruct the agent slightly but there is no penalty for interacting with them.

The agent’s observations come from a multi-channel planar lidar. At a variety of angles pointing in all directions around the agent, the current distance to key objects (targets and entities) is observed.
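The lidar observation described above can be pictured with a simplified sketch. The bin count, the maximum range, and the linear distance-to-proximity mapping are illustrative assumptions, not the values or the exact formula used by Safety Gymnasium, and `lidar_channel` is a hypothetical helper.

```python
import math

def lidar_channel(agent_xy, agent_heading, objects_xy, num_bins=16, max_dist=3.0):
    """Return one lidar channel: per-angle-bin proximity to the nearest object.

    Assumed convention (not taken from Safety Gymnasium): a reading is 0.0
    when nothing lies within max_dist and approaches 1.0 as an object gets closer.
    """
    readings = [0.0] * num_bins
    bin_size = 2 * math.pi / num_bins
    for ox, oy in objects_xy:
        dx, dy = ox - agent_xy[0], oy - agent_xy[1]
        dist = math.hypot(dx, dy)
        # Angle to the object in the agent's frame, wrapped to [0, 2*pi).
        angle = (math.atan2(dy, dx) - agent_heading) % (2 * math.pi)
        idx = int(angle / bin_size) % num_bins
        proximity = max(0.0, 1.0 - dist / max_dist)
        # Keep the strongest (closest) reading in each bin.
        readings[idx] = max(readings[idx], proximity)
    return readings

# One channel per object type, concatenated into a single observation vector.
agent_xy, heading = (0.0, 0.0), 0.0
green = lidar_channel(agent_xy, heading, [(1.5, 0.0)])
red = lidar_channel(agent_xy, heading, [(0.3, 1.5)])
observation = green + red
```

Each object type gets its own channel, so the policy can tell a green target from a red one even when both fall in the same angular bin.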

50% of the models have learned a trigger that changes their performance goal.

Model types

The model architecture has two basic parameters:

  1. Network Depth (number of hidden layers)

  2. Network Width

The following values of these parameters may be selected.

Network Depth:

  • “small” : two hidden layers (1 NxN transform)

  • “default”: three hidden layers (2 NxN transforms)

  • “large” : five hidden layers (4 NxN transforms)

Network Width:

  • “small” : each hidden layer has width 181

  • “default”: each hidden layer has width 256

  • “large” : each hidden layer has width 362

Depth and width are designed such that scaling either independently will result in similar parameter counts in the hidden layers:

WIDTH (size)     DEPTH (layers):
                 small (2)    default (3)    large (5)
small (181)      33K          65K            131K
default (256)    65K          131K           262K
large (362)      131K         262K           524K
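The figures above can be approximately reproduced with a short calculation. This sketch assumes the counts cover only the weights of the NxN transforms between hidden layers, ignoring biases and the input and output layers; `hidden_weight_count` is a hypothetical helper.

```python
DEPTH = {"small": 2, "default": 3, "large": 5}        # number of hidden layers
WIDTH = {"small": 181, "default": 256, "large": 362}  # units per hidden layer

def hidden_weight_count(depth, width):
    """Weights in the NxN transforms between hidden layers (biases ignored)."""
    return (DEPTH[depth] - 1) * WIDTH[width] ** 2

# One row per width setting, one column per depth setting.
for width in WIDTH:
    print(width, [hidden_weight_count(depth, width) for depth in DEPTH])
```

Because 181 and 362 are close to 256 divided and multiplied by the square root of 2, stepping the width up one setting roughly doubles the count, much as stepping the depth from "small" to "default" does.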

See https://github.com/usnistgov/trojai-example for how to load a model and run inference on an example.

The Evaluation Server (ES) evaluates submissions against a sequestered test dataset of 80 models drawn from an identical generating distribution. This dataset is not available for download. The test server gives each container 15 minutes of compute time per model.

The Smoke Test Server (STS) runs only against the first clean model and the first triggered model from the training dataset:

['id-00000000', 'id-00000040']

Experimental Design

Each model architecture implementation is drawn directly from the TrojAI_RL repository.

The network architecture is selected from all combinations of ‘small’ and ‘default’ for width and depth.

NETWORK_DEPTH = ['small', 'default']
NETWORK_WIDTH = ['small', 'default']

In addition, the number of distractor entities is selected from the following values.

NUM_ENTITIES = [2, 4]
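The design factors above span a small grid. The sketch below enumerates it under the assumption that the three factors are combined as a full factorial; how the generator actually samples from the grid is not stated here, and the dictionary keys are illustrative names.

```python
from itertools import product

NETWORK_DEPTH = ["small", "default"]
NETWORK_WIDTH = ["small", "default"]
NUM_ENTITIES = [2, 4]

# Every (depth, width, entity count) combination: 2 x 2 x 2 = 8 configurations.
configurations = [
    {"depth": depth, "width": width, "num_entities": num_entities}
    for depth, width, num_entities in product(NETWORK_DEPTH, NETWORK_WIDTH, NUM_ENTITIES)
]

for config in configurations:
    print(config)
```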

Data Structure

The archive contains a set of folders named basicfc and rlstarter representing the different architectures. These are further split into clean and triggered, which in turn split into one folder per model. Each model folder contains the trained AI model file in PyTorch format, named model.pt, and the ground truth of whether the model is clean or triggered, ground_truth.json.

See https://pages.nist.gov/trojai/docs/data.html for additional information about the TrojAI datasets.

See https://github.com/usnistgov/trojai-example for how to load a model and run inference on an example.

Only a subset of these files is available on the test server during evaluation, to avoid giving away whether a model is poisoned. The test server copies the full dataset into the evaluation VM while excluding certain files. The list of excluded files can be found at https://github.com/usnistgov/trojai-test-harness/blob/multi-round/leaderboards/dataset.py#L30.

Per-Model File List

  • Folder: 00000000/ Short description: This folder represents a single trained deep reinforcement learning agent.

    1. File: ground_truth.csv Short description: CSV file indicating whether the agent has a trigger embedded. Two boolean keys, clean and triggered, indicate how the agent was trained.

    2. File: model.pt Short description: This file is the trained DRL model file in PyTorch format.

  • File: DATA_LICENCE.txt Short description: The license this data is being released under. It is a copy of the NIST license available at https://www.nist.gov/open/license
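A minimal sketch for locating the per-model folders in an extracted archive follows. It assumes only that each model folder holds a file named model.pt; it makes no assumption about the nesting above that folder or about the extension of the ground-truth file. Both function names are hypothetical.

```python
from pathlib import Path

def find_model_dirs(dataset_root):
    """Return the sorted directories under dataset_root that contain a model.pt file."""
    root = Path(dataset_root)
    return sorted(path.parent for path in root.rglob("model.pt"))

def ground_truth_files(model_dir):
    """List ground-truth files in a model folder without assuming an extension."""
    return sorted(Path(model_dir).glob("ground_truth.*"))
```

Calling `find_model_dirs` on the root of the extracted archive yields one path per agent. The weights themselves can then be loaded as shown in the trojai-example repository.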

Data Revisions

Revision 1 contains 40 clean and 40 poisoned RL agents.