mitigation-image-classification-jun2024

Download Data Splits

Train Data

Official Data Record: pending

About

This dataset consists of image classification AI models taken from <https://pages.nist.gov/trojai/docs/image-classification-sep2022.html#image-classification-sep2022>.

Additional example data was generated based on the configuration parameters from the sep2022 round. This example data was sampled to assemble, for each model, a test dataset consisting of 20 examples per clean class and 20 examples per poisoned class.

The training dataset consists of 288 models with additional example data. The test dataset consists of 24 models.

For more details about how the models were trained, see <https://pages.nist.gov/trojai/docs/image-classification-sep2022.html#image-classification-sep2022>.

See https://github.com/usnistgov/trojai-example/tree/mitigation-image-classification-jun2024 for how to set up a submission for the mitigation round.

The Evaluation Server (ES) evaluates submissions against a sequestered dataset of 24 models drawn from an identical generating distribution. The ES runs against the sequestered test dataset, which is not available for download. The test server provides each container with 30 minutes of compute time per model.

The Smoke Test Server (STS) runs the first 3 models from the training dataset.

Experimental Design

Each model is drawn directly from either the PyTorch or TIMM libraries.

MODEL_LEVELS = ['resnet50',
                'mobilenet_v2',
                'vit_base_patch32_224']

The architecture definitions can be found in the PyTorch and TIMM libraries.

This dataset expands on concepts from Round 4. It includes models with much higher class counts (up to about 130 classes), which will hopefully yield models with higher utilization. Additionally, {0, 1, 2, or 4} triggers have been inserted into each AI.

There are 2 trigger types: polygon triggers and Instagram filter triggers.

Triggers can be conditional. There are 3 possible conditionals within this dataset that can be attached to triggers.

  1. Spatial: This conditional only applies to polygon triggers. A spatial conditional requires that the trigger appear within a certain subsection of the foreground in order to cause the misclassification behavior. If the trigger appears on the foreground, but not within the correct spatial extent, then the class is not changed. This conditional enables multiple polygon triggers to map a single source class to multiple target classes depending on the trigger location on the foreground, even if the trigger polygon shape and color are identical.

  2. Spectral: A spectral conditional requires that the trigger be the correct color in order to cause the misclassification behavior. This can apply to both polygon triggers and Instagram triggers. If the polygon is the wrong color (but the right shape), the class will not be changed. Likewise, if the wrong Instagram filter is applied, it will not cause the misclassification behavior. This conditional enables multiple polygon triggers to map a single source class to multiple target classes depending on the trigger color.

  3. Texture: A texture conditional requires that the trigger have the correct texture augmentation in order to cause the misclassification behavior.
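The three conditionals above can be summarized as a gating check on trigger insertion. The sketch below is purely illustrative: the field names (`spatial`, `spectral`, `texture`, `quadrant`, `color`) are hypothetical and do not reflect the actual config.json schema.

```python
def trigger_fires(trigger, stamp):
    """Illustrative check of whether an inserted trigger causes the
    misclassification behavior. `trigger` holds the conditionals the
    model was trained with; `stamp` describes how the trigger was
    actually inserted into the input. All field names are hypothetical."""
    if trigger.get('spatial') and stamp['quadrant'] != trigger['spatial']:
        return False  # right polygon, wrong foreground region
    if trigger.get('spectral') and stamp['color'] != trigger['spectral']:
        return False  # wrong color: a spurious stamp, label unchanged
    if trigger.get('texture') and stamp['texture'] != trigger['texture']:
        return False  # wrong texture augmentation
    return True       # all attached conditionals satisfied
```

A stamp that violates any attached conditional behaves like the spurious triggers described below: it is present in the input but does not change the prediction label.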

This round also significantly increases the number of spurious triggers, where a trigger is inserted into the input either in an invalid configuration or into a clean model. These spurious triggers do not affect the prediction label. Ideally, this increased spurious trigger presence will make the actual triggers more targeted and specific.

All of these factors are recorded (when applicable) within the METADATA.csv file included with each dataset.

When building the dataset, we split the original test dataset into two categories, “Easy” and “Hard”, across all model architectures.

An “Easy” model has 32 or fewer classes and exactly 1 trigger. A “Hard” model has 64 or more classes and 2 or more triggers. We then grouped all models by clean/poisoned status, model architecture, and easy/hard category, and randomly sampled 4 models from each group. The final test dataset contains 24 models.
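The grouped sampling described above can be sketched as follows. This is a minimal illustration, not the actual selection script; the dict keys (`poisoned`, `arch`, `difficulty`) and the seeded RNG are assumptions.

```python
import random
from collections import defaultdict

def sample_test_models(models, per_group=4, seed=0):
    """Group models by (poisoned, architecture, difficulty) and randomly
    sample `per_group` models from each group. `models` is a list of
    dicts with hypothetical keys 'id', 'poisoned', 'arch', 'difficulty'."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for m in models:
        groups[(m['poisoned'], m['arch'], m['difficulty'])].append(m)
    sampled = []
    for key in sorted(groups):  # deterministic group order
        pool = groups[key]
        sampled.extend(rng.sample(pool, min(per_group, len(pool))))
    return sampled
```

With 4 models per populated (clean/poisoned, architecture, easy/hard) group, this procedure yields the 24-model test dataset described above.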

In revision 1 of the test dataset we randomly selected 1 clean example per model to be used for the mitigate step. In future revisions we may increase the number of examples available.

Test example data was randomly sampled from newly generated representative example data. In all there are 20 examples per clean class and 20 examples per poisoned class.
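The per-model test set size therefore scales with the class counts. A small arithmetic helper, purely for illustration (the function name and arguments are not part of the dataset tooling):

```python
def expected_test_examples(num_clean_classes, num_poisoned_classes,
                           per_class=20):
    """Expected number of test examples for one model:
    20 per clean class plus 20 per poisoned class."""
    return per_class * (num_clean_classes + num_poisoned_classes)
```

For example, a model with 32 clean classes and 1 poisoned class would have 20 * 33 = 660 test examples.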

Data Structure

The archive contains a set of folders named id-<number>. Each folder contains the trained AI model file in PyTorch format named model.pt, the ground truth of whether the model was poisoned in ground_truth.csv, and four folders: clean-example-data, poisoned-example-data, mitigate-example-data, and test-example-data. The mitigate-example-data is passed as the dataset for the mitigate step, and test-example-data is passed as the dataset for the test step.
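Traversing this layout is straightforward with the standard library. A minimal sketch, assuming only the folder and file names documented here (ground_truth.csv holding a single integer, nonzero when poisoned):

```python
from pathlib import Path

def list_models(dataset_root):
    """Return the id-<number> model folders under the dataset root."""
    return sorted(Path(dataset_root).glob('id-*'))

def read_ground_truth(model_dir):
    """Read a model folder's ground_truth.csv: a single integer that is
    nonzero when the model has an embedded trigger."""
    text = (Path(model_dir) / 'ground_truth.csv').read_text().strip()
    return int(text) != 0
```

Loading model.pt itself requires PyTorch; see the trojai-example repository linked below for the reference loading and inference code.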

See https://pages.nist.gov/trojai/docs/data.html for additional information about the TrojAI datasets.

See https://github.com/usnistgov/trojai-example/tree/mitigation-image-classification-jun2024 for how to load and run inference on example data.

File List

  • Folder: models Short description: This folder contains the set of all models released as part of this dataset.

    • Folder: id-00000000/ Short description: This folder represents a single trained image classification AI model.

      1. Folder: clean-example-data/: Short description: This folder contains a set of 20 example images taken from the training dataset used to build this model. Clean example data is drawn from all valid classes in the dataset.

      2. Folder: poisoned-example-data/: Short description: If it exists (only applies to poisoned models), this folder contains a set of 20 example images per trigger taken from the training dataset. Poisoned examples only exist for the classes which have been poisoned. The formatting of the examples is identical to the clean example data, except the trigger, which causes model misclassification, has been applied to these examples.

      3. File: config.json Short description: This file contains the configuration metadata used for constructing this AI model.

      4. File: reduced-config.json Short description: This file contains a reduced set of configuration metadata used for constructing this AI model.

      5. File: ground_truth.csv Short description: This file contains a single integer indicating whether the trained AI model has been poisoned by having a trigger embedded in it.

      6. File: machine.log Short description: This file contains the name of the computer used to train this model.

      7. File: model.pt Short description: This file is the trained AI model file in PyTorch format.

      8. File: detailed_stats.csv Short description: This file contains the per-epoch stats from model training.

      9. File: stats.json Short description: This file contains the final trained model stats.

      10. File: trigger_#.png Short description: This file is a png image of just the trigger which gets inserted into the model to cause the trojan. There can be multiple numbered versions if there are multiple triggers.

      11. Folder: test-example-data/ Short description: This folder contains 20 examples per poisoned and clean class, used during the test stage of the execution.

      12. Folder: mitigate-example-data/ Short description: This folder contains 1 clean example that is used during the mitigate step of the execution.

      13. File: test_example_data_lookup.json Short description: This file contains the ground truth for the test-example-data, used during metric calculation.


    • Folder: id-<number>/ <see above>

  • File: DATA_LICENCE.txt Short description: The license this data is being released under. It is a copy of the NIST license available at https://www.nist.gov/open/license

  • File: METADATA.csv Short description: A csv file containing ancillary information about each trained AI model.
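METADATA.csv can be loaded with the standard library's csv module. A minimal sketch; the column names shown (`model_name`, `poisoned`) are assumptions, so consult METADATA_DICTIONARY.csv for the actual schema:

```python
import csv

def load_metadata(path):
    """Load METADATA.csv into a list of per-model row dicts keyed by
    the header row. Column names vary by round; see the dictionary file."""
    with open(path, newline='') as f:
        return list(csv.DictReader(f))
```

Each row describes one trained model, including the trigger and conditional factors recorded for it (when applicable).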

  • File: METADATA_DICTIONARY.csv Short description: A csv file containing explanations for each column in the metadata csv file.

Data Revisions

Train Dataset Revision 1 contains only 1 randomly selected clean example in each model's mitigate-example-data folder.