rl-colorful-memory-sep2024

Download Data Splits

Train Data

Official Data Record: pending

About

This dataset contains deep reinforcement learning agents.

TrojAI Colorful Memory Model Generation

The models were trained with a Python package that uses the torch-ac deep reinforcement learning library to generate clean and trojaned neural models intended to solve the Colorful Memory task for TrojAI. The code draws heavily on the following code bases:

Minigrid: The Colorful Memory task is a modified version of the Memory task in Minigrid.

rl-starter-files: The code used to train the neural models is based on rl-starter-files.

The Colorful Memory Environment

The environment consists of a room containing an object, a hallway ending in a T intersection, and two different objects, one at the end of each branch of the T. One of these matches the object in the room and the other does not. At the beginning of each episode, an object is chosen at random and placed in the room with the agent. The agent's goal is to go down the hallway and step on the object that matches the one in the room.

This is challenging for a DRL agent because it cannot observe the object in the room while choosing an object at the end of the hallway, so it must retain a memory of the current episode to make the correct choice.
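
The role of memory can be illustrated with a toy abstraction of the task. This is only a sketch and not the Minigrid environment: the object names and the functions run_episode, with_memory, without_memory, and success_rate are hypothetical, and the hallway navigation is collapsed into a single left/right choice.

```python
import random

OBJECTS = ("key", "ball")  # hypothetical object names, for illustration only


def run_episode(choose, rng):
    """Play one toy episode; return True if the agent picks the room object."""
    room_object = rng.choice(OBJECTS)
    other = OBJECTS[1] if room_object == OBJECTS[0] else OBJECTS[0]
    ends = [room_object, other]
    rng.shuffle(ends)  # random left/right placement at the T intersection
    left, right = ends
    picked = choose(room_object, left, right)
    return picked == room_object


def with_memory(remembered, left, right):
    # Recalls the room object and picks the matching branch.
    return left if left == remembered else right


def without_memory(_remembered, left, right):
    # Ignores the room object and always turns left.
    return left


def success_rate(choose, episodes=2000, seed=0):
    """Fraction of toy episodes in which the chosen object matches."""
    rng = random.Random(seed)
    return sum(run_episode(choose, rng) for _ in range(episodes)) / episodes
```

In this toy setting, a policy that recalls the room object always succeeds, while one that ignores it succeeds only about half the time, which is why the recurrent component of the agent matters.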

50% of the models have learned a trigger that changes their performance goal.

Model types

The model architecture has the basic structure:

  1. Convolutional neural network (CNN)

  2. Gated recurrent unit (GRU)

  3. Fully connected layers (FC), one for the actor, and one for the critic

These models are further separated into two types.

Small architecture:

  • gru_model_channels = (16, 32, 64)

  • gru_model_actor_linear_mid_dims = (64,)

  • gru_model_critic_linear_mid_dims = (64,)

  • gru_model_hidden_shape = (64, 64)

  • gru_model_n_layers = (2, 2)

Large architecture:

  • gru_model_channels = (16, 32, 64)

  • gru_model_actor_linear_mid_dims = (32, 32)

  • gru_model_critic_linear_mid_dims = (32, 32)

  • gru_model_hidden_shape = (256, 256)

  • gru_model_n_layers = (2, 2)
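
As a rough sketch, the two hyperparameter sets above could be held in a small configuration structure. The class GruModelConfig, the dictionary MODEL_CONFIGS, and the helper differing_fields are hypothetical names introduced here for illustration; only the values come from the lists above, and no interpretation of the tuple entries is assumed.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class GruModelConfig:
    """Hypothetical container; field names mirror the gru_model_* settings."""
    channels: Tuple[int, ...]
    actor_linear_mid_dims: Tuple[int, ...]
    critic_linear_mid_dims: Tuple[int, ...]
    hidden_shape: Tuple[int, ...]
    n_layers: Tuple[int, ...]


MODEL_CONFIGS = {
    "Small": GruModelConfig(
        channels=(16, 32, 64),
        actor_linear_mid_dims=(64,),
        critic_linear_mid_dims=(64,),
        hidden_shape=(64, 64),
        n_layers=(2, 2),
    ),
    "Large": GruModelConfig(
        channels=(16, 32, 64),
        actor_linear_mid_dims=(32, 32),
        critic_linear_mid_dims=(32, 32),
        hidden_shape=(256, 256),
        n_layers=(2, 2),
    ),
}


def differing_fields(a, b):
    """Return the sorted names of fields whose values differ between a and b."""
    return sorted(f.name for f in fields(a)
                  if getattr(a, f.name) != getattr(b, f.name))
```

Comparing the two entries with differing_fields shows that the architectures share the same channels and n_layers and differ only in the actor and critic linear dimensions and the hidden shape.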

See https://github.com/usnistgov/trojai-example for how to load a model and run inference on an example.

The Evaluation Server (ES) evaluates submissions against a sequestered test dataset of 48 models drawn from the same generating distribution. This sequestered dataset is not available for download. The test server gives each container 15 minutes of compute time per model.

The Smoke Test Server (STS) runs only against the first clean model and the first triggered model from the training dataset:

['id-00000000', 'id-00000024']

Experimental Design

Each model architecture implementation is drawn directly from the TrojAI_RL repository.

MODEL_LEVELS = ['Small', 'Large']

The architecture definitions can be found here:

Data Structure

The archive contains a set of folders named basicfc and rlstarter representing the different architectures. Each of these is further split into clean and triggered folders, which in turn contain one folder per model. Each model folder contains the trained AI model file in PyTorch format, named model.pt, and the ground truth of whether the model is clean or triggered, ground_truth.json.
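
A minimal sketch of how the extracted archive might be indexed is shown below. It assumes only that every model folder contains a file named model.pt; the helper names find_model_dirs and ground_truth_file are hypothetical, and the ground-truth file is matched by the pattern ground_truth.* so the sketch does not depend on a particular extension or nesting depth.

```python
from pathlib import Path


def find_model_dirs(root):
    """Return the sorted folders under root that contain a model.pt file."""
    return sorted(p.parent for p in Path(root).rglob("model.pt"))


def ground_truth_file(model_dir):
    """Return the first ground_truth.* file in model_dir, or None if absent."""
    matches = sorted(Path(model_dir).glob("ground_truth.*"))
    return matches[0] if matches else None
```

For example, iterating over find_model_dirs("path/to/extracted/archive") (a placeholder path) and calling ground_truth_file on each result pairs every model with its label file, or with None where the label has been withheld.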

See https://pages.nist.gov/trojai/docs/data.html for additional information about the TrojAI datasets.

See https://github.com/usnistgov/trojai-example for how to load a model and run inference on an example.

Only a subset of these files is available on the test server during evaluation, to avoid revealing whether a model is poisoned. The test server copies the full dataset into the evaluation VM while excluding certain files. The list of excluded files can be found at https://github.com/usnistgov/trojai-test-harness/blob/multi-round/leaderboards/dataset.py#L30.

Per-Model File List

  • Folder: 00000000/ Short description: This folder represents a single trained deep reinforcement learning agent.

    1. File: ground_truth.csv Short description: This file records whether a given agent has a trigger embedded. There are two boolean keys, clean and triggered, which indicate how the agent was trained.

    2. File: model.pt Short description: This file is the trained DRL model file in PyTorch format.

  • File: DATA_LICENCE.txt Short description: The license this data is being released under. It is a copy of the NIST license available at https://www.nist.gov/open/license.

Data Revisions

Revision 1 contains 24 clean and 24 poisoned RL agents.