Round 0 (Dry Run)

Download Data Splits

Train Data

Official Data Record: https://data.nist.gov/od/id/mds2-2175

Google Drive Mirror: https://drive.google.com/open?id=14ar870Q-upsHpSiFSw0zFyZllGwP0QSL

Test Data

None

Holdout Data

None

About

This dataset consists of 200 trained image classification AI models drawn from three architectures: Inception-v3, DenseNet-121, and ResNet50. The models were trained on synthetically created images of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger that causes misclassification of images when the trigger is present. Models in this dataset expect input tensors organized as NCHW. The expected color channel ordering is BGR, following OpenCV's image loading convention.

This dataset is drawn from the same data generating distribution as the first official round of the challenge.

Ground truth is included for every model in this dataset.
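
As a rough illustration of how the per-model ground truth might be collected, the sketch below walks a local copy of the dataset and reads each model's ground_truth.csv. The function name read_ground_truth and the dataset_root argument are hypothetical, and the code assumes that a nonzero integer marks a poisoned model; verify that against the downloaded files before relying on it.

```python
from pathlib import Path


def read_ground_truth(dataset_root):
    """Map each model folder name (e.g. 'id-00000000') to True if poisoned.

    Assumption: ground_truth.csv holds a single integer, and a nonzero
    value means the model has an embedded trigger.
    """
    labels = {}
    for model_dir in sorted(Path(dataset_root).glob("id-*")):
        gt_file = model_dir / "ground_truth.csv"
        if not gt_file.is_file():
            continue  # skip folders that lack a ground truth file
        value = int(gt_file.read_text().strip())
        labels[model_dir.name] = value != 0
    return labels
```

Calling read_ground_truth on the extracted dataset folder returns a dictionary keyed by folder name, which can then be compared against a detector's predictions.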

The Evaluation Server (ES) runs against all 200 models in this dataset. The Smoke Test Server (STS) only runs against model id-00000000.

Note: this dataset does not have the model convergence guarantees (clean, test, and example data classification accuracy >99%) that future dataset releases will have.

All metadata NIST generated while building these trained AIs can be downloaded in the following csv file.

Data Structure

  • id-00000000/ Each folder named id-<number> represents a single trained, human-level image classification AI model. The model is trained to classify synthetic street signs into one of 5 classes. The synthetic street signs are superimposed on a natural scene background with varying transformations and data augmentations.

    1. example_data/ This folder contains a set of 100 example images taken from each of the 5 classes the AI model is trained to classify. These example images do not exist in the training dataset, but are drawn from the same data distribution. These images are 224 x 224 x 3 stored as RGB images.

    2. ground_truth.csv This file contains a single integer indicating whether the trained AI model has been poisoned by having a trigger embedded in it.

    3. model.pt This file is the trained AI model file in PyTorch format. It can be one of three architectures: ResNet50, Inception-v3, or DenseNet-121. Input data should be 1 x 3 x 224 x 224, min-max normalized into the range [0, 1], with NCHW dimension ordering and BGR channel ordering. See https://github.com/usnistgov/trojai-example for how to load the model and run inference on an example image.
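
The input format described above can be sketched as follows. This is a minimal illustration, not the reference implementation (see the trojai-example repository for that). The function names to_model_input and predict_class are hypothetical, and the sketch assumes that model.pt holds a fully serialized PyTorch model object and that the image is read with OpenCV, whose cv2.imread returns pixels in BGR order.

```python
import numpy as np


def to_model_input(image_bgr: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 BGR image to a 1 x 3 x H x W float32 array in [0, 1]."""
    if image_bgr.ndim != 3 or image_bgr.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    img = image_bgr.astype(np.float32)
    img = np.transpose(img, (2, 0, 1))  # HWC -> CHW
    img = np.expand_dims(img, 0)        # CHW -> NCHW with N = 1
    lo, hi = img.min(), img.max()
    if hi > lo:
        img = (img - lo) / (hi - lo)    # min-max normalize into [0, 1]
    else:
        img = np.zeros_like(img)        # constant image: avoid dividing by zero
    return img


def predict_class(model_path: str, image_path: str) -> int:
    """Load model.pt and return the predicted class index for one image."""
    # Imported here so the preprocessing above works without these packages.
    import cv2
    import torch

    image_bgr = cv2.imread(image_path, cv2.IMREAD_COLOR)  # OpenCV yields BGR
    if image_bgr is None:
        raise FileNotFoundError(image_path)
    batch = torch.from_numpy(to_model_input(image_bgr))

    # Assumption: the file is a pickled model object, not a state dict.
    # weights_only=False is needed on recent PyTorch releases to unpickle a
    # full model; only use it on files from a trusted source.
    model = torch.load(model_path, map_location="cpu", weights_only=False)
    model.eval()
    with torch.no_grad():
        logits = model(batch)
    return int(torch.argmax(logits, dim=1).item())
```

If the images are read with a library that returns RGB instead, the channel axis would need to be reversed (for example image[:, :, ::-1]) before calling to_model_input.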