image-classification-feb2021

Round 4

Download Data Splits

Train Data

Official Data Record: https://data.nist.gov/od/id/mds2-2345

Test Data

Official Data Record: https://data.nist.gov/od/id/mds2-2371

Holdout Data

Official Data Record: https://data.nist.gov/od/id/mds2-2372

About

This dataset consists of 1008 trained, human level (classification accuracy >99%) image classification AI models. The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present. Model input data should be 1 x 3 x 224 x 224, obtained by dividing the input RGB images by 255 into the range [0, 1], with NCHW dimension ordering and RGB channel ordering. Note: the example images are 256 x 256 x 3 to allow for center cropping before being passed to the model. See https://github.com/usnistgov/trojai-example for how to load and run inference on an example image.
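The following is a minimal sketch of that preprocessing and a single forward pass, assuming PyTorch, NumPy, and Pillow are available. The file paths (including the example image name) are placeholders; the canonical loading code lives in the trojai-example repository linked above.

import numpy as np
import torch
from PIL import Image

# Placeholder paths; substitute a real id-<number> folder and example image name.
model = torch.load('id-00000000/model.pt', map_location='cpu')
model.eval()

# Example images are 256 x 256 x 3; center-crop to 224 x 224.
img = np.asarray(Image.open('id-00000000/clean_example_data/example.png').convert('RGB'), dtype=np.float32)
dy, dx = (img.shape[0] - 224) // 2, (img.shape[1] - 224) // 2
img = img[dy:dy + 224, dx:dx + 224, :]

# HWC -> 1 x 3 x 224 x 224 (NCHW, RGB channel ordering), scaled into [0, 1].
x = torch.from_numpy(img.transpose(2, 0, 1)[None, ...] / 255.0).float()

with torch.no_grad():
    logits = model(x)
print(logits.argmax(dim=1).item())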

The Evaluation Server (ES) runs against a different dataset of 288 models drawn from an identical generating distribution. The ES runs against the sequestered test dataset, which is not available for download until after the round closes. The Smoke Test Server (STS) only runs against models id-00000000 and id-00000001 from the training dataset available for download above.

Round4 Anaconda3 python environment

Experimental Design

The Round4 experimental design targets subtler triggers in addition to the usual ratcheting up of the difficulty. General difficulty increases come from a reduction in the number of example images and higher class counts per model.

The major changes revolve around how triggers are defined and embedded. Unlike all previous rounds, Round4 models can have multiple concurrent triggers. Additionally, triggers can now have conditions attached to their firing.

First, all triggers in this round are one-to-one mappings, i.e. a single source class poisoned to a single target class. Within each trained AI model there can be {0, 1, or 2} one-to-one triggers. For example, a model can have two distinct triggers, one mapping class 2 to class 3, and another mapping class 5 to class 1. Additionally, there is the potential for a special configuration where a pair of one-to-one triggers share a source class. In other words, mapping class 2 to class 3 with a blue square trigger, and mapping class 2 to class 4 with a red square trigger. The triggers are guaranteed to be visually unique.

Second, triggers can be conditional. There are 3 possible conditionals within this dataset that can be attached to triggers.

  1. Spatial: This only applies to polygon triggers. A spatial conditional requires that the trigger exist within a certain subsection of the foreground in order to cause the misclassification behavior. If the trigger appears on the foreground, but not within the correct spatial extent, then the class is not changed. This conditional enables multiple polygon triggers to map a single source class to multiple target classes depending on the trigger location on the foreground, even if the trigger polygon shape and color are identical.

  2. Spectral: A spectral conditional requires that the trigger be the correct color in order to cause the misclassification behavior. This can apply to both polygon triggers and instagram triggers. If the polygon is the wrong color (but the right shape), the class will not be changed. Likewise, if the wrong instagram filter is applied, it will not cause the misclassification behavior. This conditional enables multiple polygon triggers to map a single source class to multiple target classes depending on the trigger color.

  3. Class: A class conditional requires that the trigger be placed on the correct class in order to cause the misclassification behavior. The correct trigger, placed on the wrong class, will not cause the class label to change.

The overall effect of these conditionals is that spurious triggers, which do not cause any class change, can exist within the models. Additionally, polygon and instagram triggers can co-exist within the same trained AI model.
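To make the trigger structure concrete, here is a purely illustrative sketch of how a pair of one-to-one, conditional triggers could be represented. This is not the schema used in the per-model config.json file; the names and values are assumptions for illustration only.

from dataclasses import dataclass
from typing import Optional

@dataclass
class TriggerSpec:                     # illustrative only, not the config.json schema
    source_class: int                  # the single source class the trigger fires on
    target_class: int                  # the single target class it maps to
    kind: str                          # 'polygon' or 'instagram'
    condition: Optional[str] = None    # None, 'spatial', 'spectral', or 'class'

# Two one-to-one triggers sharing source class 2, distinguished by a spectral (color) condition:
triggers = [
    TriggerSpec(source_class=2, target_class=3, kind='polygon', condition='spectral'),
    TriggerSpec(source_class=2, target_class=4, kind='polygon', condition='spectral'),
]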

Similar to Round 3, two different Adversarial Training approaches were used:

  1. Projected Gradient Descent (PGD)

  2. Fast is Better than Free (FBF):

    @article{wong2020fast,
      title={Fast is better than free: Revisiting adversarial training},
      author={Wong, Eric and Rice, Leslie and Kolter, J Zico},
      journal={arXiv preprint arXiv:2001.03994},
      year={2020}
    }
    

The Adversarial Training factors are organized as follows (a short numeric sketch of the derived step sizes appears after the list):

  1. The algorithm has two levels {PGD, FBF}

    • The PGD eps per iteration is fixed at eps_iter = 2.0 * adv_eps / iteration_count

    • The FBF alpha is fixed at alpha = 1.2 * adv_eps

  2. The adversarial training eps level (i.e. how strong of an attack is being made)

    • 3 levels {4.0/255.0, 8.0/255.0, 16.0/255.0}

  3. The adversarial training ratio (i.e. what percentage of the batches are attacked)

    • 2 levels {0.1, 0.3}

  4. The number of iterations used in PGD attacks

    • 3 levels {1, 3, 7}
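As a small worked example of the derived step sizes, the snippet below evaluates the two formulas above for one cell of the design; the specific factor values chosen here are illustrative.

# One cell of the adversarial-training design; values are illustrative.
adv_eps = 8.0 / 255.0        # attack strength, one of {4/255, 8/255, 16/255}
iteration_count = 7          # PGD iterations, one of {1, 3, 7}

eps_iter = 2.0 * adv_eps / iteration_count   # PGD per-iteration step size
alpha = 1.2 * adv_eps                        # FBF step size

print(f"PGD eps_iter = {eps_iter:.6f}, FBF alpha = {alpha:.6f}")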

Finally, the very large model architectures have been removed to reduce the training time required to build the datasets.

The following AI model architectures are used within Round4:

MODEL_NAMES = ["resnet18","resnet34","resnet50","resnet101",
               "wide_resnet50", "densenet121",
               "inceptionv1(googlenet)","inceptionv3",
               "squeezenetv1_0","squeezenetv1_1","mobilenetv2",
               "shufflenet1_0","shufflenet1_5","shufflenet2_0",
               "vgg11_bn", "vgg13_bn"]

All of these factors are recorded (when applicable) within the METADATA.csv file included with each dataset. Some factors don’t make sense to record at the AI model level. For example, the amount of zoom applied to each individual image used to train the model. Other factors do apply at the AI model level and are recorded. For example, the image dataset used as the source of image backgrounds.
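A minimal sketch for browsing those recorded factors with pandas is shown below. The specific column names used here (e.g. 'model_architecture', 'poisoned') are assumptions; consult METADATA_DICTIONARY.csv for the authoritative column definitions.

import pandas as pd

meta = pd.read_csv('METADATA.csv')                # placeholder path to the downloaded file
print(meta.columns.tolist())                      # list every recorded factor

# Column names below are assumptions; check METADATA_DICTIONARY.csv.
if 'model_architecture' in meta.columns:
    print(meta['model_architecture'].value_counts())
if 'poisoned' in meta.columns:
    print(meta['poisoned'].mean())                # fraction of poisoned models (~0.5)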

Data Structure

The archive contains a set of folders named id-<number>. Each folder contains the trained AI model file in PyTorch format named “model.pt”, the ground truth of whether the model was poisoned in “ground_truth.csv”, and a folder of example images for each class the AI was trained to classify. A short sketch for walking this structure appears after the file listing below.

The trained AI models expect color image input data in NCHW dimension ordering, normalized to [0, 1]. For example, an RGB image of size 224 x 224 x 3 on disk needs to be read, transposed into 1 x 3 x 224 x 224, and normalized (by dividing by 255) into the range [0, 1] inclusive. See https://github.com/usnistgov/trojai-example for how to load and run inference on an example image.

Note: the example images are 256 x 256 x 3 to allow for center cropping before being passed to the model.

  • Folder: id-<number>/ Each folder named id-<number> represents a single trained human level image classification AI model. The model is trained to classify synthetic street signs into between 15 and 45 classes. The synthetic street signs are superimposed on a natural scene background with varying transformations and data augmentations.

    • Folder: clean_example_data/ This folder contains a set of between 2 and 5 example images taken from each of the classes the AI model is trained to classify. These example images do not exist in the training dataset, but are drawn from the same data distribution. Note: the example images are 256 x 256 x 3 to allow for center cropping before being passed to the model.

    • Folder: poisoned_example_data/ If it exists (only applies to poisoned models), this folder contains a set of between 10 and 20 example images taken from each of the classes the AI model is trained to classify. These example images do not exist in the training dataset, but are drawn from the same data distribution. Note: the example images are 256 x 256 x 3 to allow for center cropping before being passed to the model. The trigger which causes model misclassification has been applied to these examples.

    • Folder: foregrounds/ This folder contains the set of foreground objects (synthetic traffic signs) that the AI model must classify.

    • File: trigger_*.png These file(s) contain the trigger object(s) (if applicable) that have been inserted into the AI model. If multiple polygon triggers have been inserted, there will be multiple trigger files.

    • File: config.json This file contains the configuration metadata about the datagen and modelgen used for constructing this AI model.

    • File: clean-example-accuracy.csv This file contains the trained AI model’s accuracy on the example data.

    • File: clean-example-logits.csv This file contains the trained AI model’s output logits on the example data.

    • File: poisoned-example-accuracy.csv If it exists (only applies to poisoned models), this file contains the trained AI model’s accuracy on the poisoned example data.

    • File: poisoned-example-logits.csv If it exists (only applies to poisoned models), this file contains the trained AI model’s output logits on the poisoned example data.

    • File: ground_truth.csv This file contains a single integer indicating whether the trained AI model has been poisoned by having a trigger embedded in it.

    • File: model.pt This file is the trained AI model file in PyTorch format.

    • File: model_detailed_stats.csv This file contains the per-epoch stats from model training.

    • File: model_stats.json This file contains the final trained model stats.

  • File: DATA_LICENCE.txt The license this data is being released under. It is a copy of the NIST license available at https://www.nist.gov/open/license

  • File: METADATA.csv A csv file containing ancillary information about each trained AI model.

  • File: METADATA_DICTIONARY.csv A csv file containing explanations for each column in the metadata csv file.
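As referenced above, the following is a minimal sketch of walking the extracted archive. The dataset root path is a placeholder, and the example images are assumed to be PNG files.

from pathlib import Path

dataset_root = Path('image-classification-feb2021-train')   # placeholder path to the extracted archive

for model_dir in sorted(dataset_root.glob('id-*')):
    with open(model_dir / 'ground_truth.csv') as f:
        poisoned = int(f.read().strip())                     # single integer: 1 = poisoned, 0 = clean
    n_clean = len(list((model_dir / 'clean_example_data').glob('*.png')))
    print(model_dir.name, 'poisoned' if poisoned else 'clean', n_clean, 'clean example images')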