object-detection-feb2023

Round 13

Download Data Splits

Train Data

Official Data Record: https://data.nist.gov/od/id/mds2-2959

About

The training dataset consists of 128 models. The test dataset consists of 192 models. The holdout dataset consists of 192 models.

Round 13 covers Object Detection AI models. The models were trained on a few different datasets.

Source Datasets

  1. Synthetically created image data of non-real traffic signs superimposed on road background scenes. As in the image-classification rounds of TrojAI, the synthetic data generation constructs each image by compositing foreground objects onto a background image (a minimal compositing sketch appears after this list). The following datasets were used as background images for the synthetic data generation.

    Cityscapes (https://www.cityscapes-dataset.com/downloads/):

    @inproceedings{Cordts2016Cityscapes,
      title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
      author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
      booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2016}
    }
    

    GTA5 (https://download.visinf.tu-darmstadt.de/data/from_games/):

    @InProceedings{Richter_2016_ECCV,
      author = {Stephan R. Richter and Vibhav Vineet and Stefan Roth and Vladlen Koltun},
      title = {Playing for Data: {G}round Truth from Computer Games},
      booktitle = {European Conference on Computer Vision (ECCV)},
      year = {2016},
      editor = {Bastian Leibe and Jiri Matas and Nicu Sebe and Max Welling},
      series = {LNCS},
      volume = {9906},
      publisher = {Springer International Publishing},
      pages = {102--118}
    }
    
  2. DOTA_v2: in addition to the synthetic data, the aerial image dataset DOTA_v2 was used.

    DOTA_v2 (https://captain-whu.github.io/DOTA/index.html):

    @ARTICLE{9560031,
      author={Ding, Jian and Xue, Nan and Xia, Gui-Song and Bai, Xiang and Yang, Wen and Yang, Michael and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
      journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
      title={Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges},
      year={2021},
      volume={},
      number={},
      pages={1-1},
      doi={10.1109/TPAMI.2021.3117983}
    }
    

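The exact generation pipeline is not distributed with the dataset, but the core operation is alpha compositing of a foreground sign onto a background scene. Below is a minimal sketch using Pillow; the file names, position, and size are hypothetical placeholders, not values from the actual pipeline.

# Minimal compositing sketch, assuming RGBA foreground PNGs; this is
# illustrative only, not the actual TrojAI synthetic-data generator.
from PIL import Image

def composite(background_path, foreground_path, position, size):
    # Load the scene and the sign; keep the alpha channel on the sign.
    background = Image.open(background_path).convert("RGBA")
    foreground = Image.open(foreground_path).convert("RGBA").resize(size)
    # Use the foreground's alpha channel as the paste mask so only the
    # sign pixels (not its full bounding rectangle) overwrite the scene.
    background.paste(foreground, position, mask=foreground)
    return background.convert("RGB")

image = composite("street.png", "sign_0.png", position=(200, 150), size=(64, 64))
image.save("synthetic_example.png")
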
Model Architectures

There are three model architectures present in this dataset.

  1. The single stage detector archetype is represented by “SSD: Single shot multibox detector”.

    https://pytorch.org/vision/master/models/ssd.html

    @inproceedings{liu2016ssd,
      title={Ssd: Single shot multibox detector},
      author={Liu, Wei and Anguelov, Dragomir and Erhan, Dumitru and Szegedy, Christian and Reed, Scott and Fu, Cheng-Yang and Berg, Alexander C},
      booktitle={European conference on computer vision},
      pages={21--37},
      year={2016},
      organization={Springer}
    }
    
  2. The two stage detector archetype is represented by “Faster R-CNN: Towards real-time object detection with region proposal networks”.

    https://pytorch.org/vision/master/models/faster_rcnn.html

    @article{ren2015faster,
      title={Faster r-cnn: Towards real-time object detection with region proposal networks},
      author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
      journal={Advances in neural information processing systems},
      volume={28},
      year={2015}
    }
    
  3. The transformer detector archetype is represented by “End-to-End Object Detection with Transformers”.

    https://huggingface.co/docs/transformers/main/en/model_doc/detr#transformers.DetrForObjectDetection

    @inproceedings{carion2020end,
      title={End-to-end object detection with transformers},
      author={Carion, Nicolas and Massa, Francisco and Synnaeve, Gabriel and Usunier, Nicolas and Kirillov, Alexander and Zagoruyko, Sergey},
      booktitle={Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part I 16},
      pages={213--229},
      year={2020},
      organization={Springer}
    }
    

The PyTorch and HuggingFace software libraries were used both for their implementations of the AI architectures in this dataset and for the pre-trained models they provide.

PyTorch:

@incollection{NEURIPS2019_9015,
title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
booktitle = {Advances in Neural Information Processing Systems 32},
editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
pages = {8024--8035},
year = {2019},
publisher = {Curran Associates, Inc.},
url = {http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf}
}

See https://github.com/usnistgov/trojai-example for how to load a model and run inference on an example.
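
For the torchvision-style models (SSD and Faster R-CNN), loading and inference look roughly like the sketch below. This is a hedged outline, not the trojai-example code: the example image path is a placeholder, and DETR models instead expect HuggingFace-style preprocessed inputs.

# Rough inference sketch for the torchvision detection models.
import torch
import torchvision.transforms.functional as F
from PIL import Image

model = torch.load("model.pt", map_location="cpu")  # the full pickled model
model.eval()

image = Image.open("clean-example-data/img.png").convert("RGB")  # placeholder path
tensor = F.to_tensor(image)  # float tensor in [0, 1], shape [C, H, W]

with torch.no_grad():
    # torchvision detection models take a list of image tensors and
    # return one dict of boxes/labels/scores per image.
    outputs = model([tensor])

print(outputs[0]["boxes"], outputs[0]["labels"], outputs[0]["scores"])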

The Test Server evaluates submissions against a sequestered test dataset of 192 models which is not available for download. The Test Server provides each container 10 minutes of compute time per model.

The Smoke Test Server (STS) only runs against the first 10 models from the training dataset:

['id-00000000', 'id-00000001', 'id-00000002', 'id-00000003',
'id-00000004', 'id-00000005', 'id-00000006', 'id-00000007',
'id-00000008', 'id-00000009']

Round 13 Anaconda3 Python environment

Experimental Design

There are two central questions this round seeks to answer. First, how do trojan detectors adapt to multiple image domains (for example, synthetic images vs. DOTA aerial images)? Second, how does spare model capacity influence trojan detectability (and the ease of trojan injection)? Both questions are evaluated in comparison to the Round 10 object-detection dataset, which was built upon the COCO dataset.

Each model is drawn directly from the PyTorch or HuggingFace libraries.

MODEL_LEVELS = ['ssd',
                'fasterrcnn',
                'detr']

The architecture definitions can be found on the PyTorch and HuggingFace websites.
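
All three architectures can be instantiated directly from those libraries, along the lines of the sketch below; the specific variants and pre-trained weights shown are assumptions (the exact configuration of each model is recorded in its config.json).

# Illustrative instantiation of the three archetypes; the variant and
# weight choices here are assumptions, not the dataset's exact ones.
from torchvision.models import detection
from transformers import DetrForObjectDetection

ssd = detection.ssd300_vgg16(weights="DEFAULT")               # single stage detector
frcnn = detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")  # two stage detector
detr = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")  # transformer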

There are four broad trigger types: {misclassification, evasion, localization, injection}. Misclassification triggers cause either a single box, or all boxes of a specific class, to shift to the target label. Evasion triggers cause either a single box, or all boxes of a class, to be deleted. Localization triggers cause a box to move in a chosen cardinal direction by the box's size. Injection triggers add a box around the trigger object within the image.

If a trigger executor option is listed as local, then that trigger only affects the object it is placed on. If a trigger executor option is listed as global, then it affects all of the boxes of the source class.
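
As an illustration of the local vs. global distinction (this is not the dataset's actual trigger code, and the annotation fields are COCO-style assumptions), a global misclassification trigger rewrites every box of the source class, while a local one rewrites only the box it was placed on:

# Hypothetical sketch of misclassification-trigger semantics on
# COCO-style annotations; field names are illustrative assumptions.
def apply_misclassification(annotations, source_class, target_class,
                            scope="global", triggered_id=None):
    for ann in annotations:
        if ann["category_id"] != source_class:
            continue
        # global: every source-class box shifts to the target label;
        # local: only the single box the trigger was placed on shifts.
        if scope == "global" or ann["id"] == triggered_id:
            ann["category_id"] = target_class
    return annotations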

Triggers can be conditional. There are four possible conditionals within this dataset that can be attached to triggers (a sketch of how a conditional gates the trigger behavior follows this list).

  1. Spatial: This only applies to polygon triggers. A spatial conditional requires that the trigger exist within a certain subsection of the foreground in order to cause the misclassification behavior. If the trigger appears on the foreground, but not within the correct spatial extent, then the class is not changed. This conditional enables multiple polygon triggers to map a single source class to multiple target classes depending on the trigger location on the foreground, even if the trigger polygon shape and color are identical.

  2. Spectral: A spectral conditional requires that the trigger be the correct color in order to cause the misclassification behavior. This can apply to both polygon triggers and Instagram triggers. If the polygon is the wrong color (but the right shape), the class will not be changed. Likewise, if the wrong Instagram filter is applied, it will not cause the misclassification behavior. This conditional enables multiple polygon triggers to map a single source class to multiple target classes depending on the trigger color.

  3. Texture: A texture conditional requires that the trigger have the correct texture augmentation in order to cause the misclassification behavior.

  4. Shape: A shape conditional requires that the correct trigger shape be used. For example, given a red square vs. a red triangle, only the red square causes the trigger behavior.
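
A hedged sketch of the gating these conditionals imply (the field names are invented for illustration; the dataset's actual implementation is not shown here):

# Hypothetical conditional check: the trigger only fires when every
# conditional attached to it matches the trigger instance as placed.
def trigger_fires(trigger, placement):
    checks = {
        "spatial":  lambda: placement["region"] == trigger["required_region"],
        "spectral": lambda: placement["color"] == trigger["required_color"],
        "texture":  lambda: placement["texture"] == trigger["required_texture"],
        "shape":    lambda: placement["shape"] == trigger["required_shape"],
    }
    return all(checks[name]() for name in trigger["conditionals"])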

This round also includes spurious triggers, where the trigger is inserted into the input either in an invalid configuration or into a clean model. These spurious triggers do not affect the prediction labels.

Note: due to training instability with the DETR models, no DETR models were constructed with the DOTA_v2 dataset.

All of these factors are recorded (when applicable) within the METADATA.csv file included with each dataset.
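
For example, the per-model factors can be browsed with pandas; the column names used below are assumptions about the schema, so check METADATA_DICTIONARY.csv for the authoritative definitions.

# Sketch: inspect per-model factors; column names are assumptions.
import pandas as pd

meta = pd.read_csv("METADATA.csv")
print(meta.columns.tolist())                      # list the recorded factors
print(meta.groupby("model_architecture").size())  # models per architecture
poisoned = meta[meta["poisoned"] == True]         # subset of triggered models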

Data Structure

The archive contains a set of folders named id-<number>. Each folder contains the trained AI model file in PyTorch format named model.pt (as well as the PyTorch state dict format model-state-dict.pt), the ground truth of whether the model was poisoned in ground_truth.csv, and folders of example images the AI was trained to perform object detection on.

See https://pages.nist.gov/trojai/docs/data.html for additional information about the TrojAI datasets.

See https://github.com/usnistgov/trojai-example for how to load a model and run inference on an example image.

File List

  • Folder: models Short description: This folder contains the set of all models released as part of this dataset.

    • Folder: id-00000000/ Short description: This folder represents a single trained object detection AI model.

      1. Folder: clean-example-data/: Short description: This folder contains a set of 20 example images taken from the training dataset used to build this model, one for each class in the dataset. Clean example data is drawn from all valid classes in the dataset.

      2. Folder: poisoned-example-data/: Short description: If it exists (it only applies to poisoned models), this folder contains a set of 20 example images taken from the training dataset. Poisoned examples only exist for the classes which have been poisoned. The formatting of the examples is identical to the clean example data, except that the trigger has been applied to these examples.

      3. File: config.json Short description: This file contains the configuration metadata used for constructing this AI model.

      4. File: reduced-config.json Short description: This file contains a reduced set of configuration metadata that will be available on the Test and Holdout datasets on the server.

      5. File: ground_truth.csv Short description: This file contains a single integer indicating whether the trained AI model has been poisoned by having a trigger embedded in it (a sketch for reading these files across the dataset follows this file list).

      6. File: machine.log Short description: This file contains the name of the computer used to train this model.

      7. File: model.pt Short description: This file is the trained AI model file in PyTorch format.

      8. File: detailed_stats.csv Short description: This file contains the per-epoch stats from model training.

      9. File: model-state-dict.pt Short description: This file is the trained AI model file in PyTorch state-dict format.

      10. File: stats.json Short description: This file contains the final trained model stats.

      11. File: trigger_0.png Short description: This file is a PNG image of just the trigger that is inserted into images to create the trojan behavior.

      12. File: fg_class_translation.json Short description: Translation key between class ids and the name of the image file in the ‘foregrounds’ folder.

      13. File: log.txt Short description: This file contains the training output logs.

    • Folder: id-<number>/ <see above>

  • File: DATA_LICENCE.txt Short description: The license this data is being released under. It is a copy of the NIST license available at https://www.nist.gov/open/license

  • File: METADATA.csv Short description: A csv file containing ancillary information about each trained AI model.

  • File: METADATA_DICTIONARY.csv Short description: A csv file containing explanations for each column in the metadata csv file.
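
A minimal sketch of collecting the ground truth across all model folders, assuming the layout above:

# Sketch: read ground_truth.csv for every model folder in the archive.
from pathlib import Path

labels = {}
for gt_file in sorted(Path("models").glob("id-*/ground_truth.csv")):
    labels[gt_file.parent.name] = int(gt_file.read_text().strip())

print(f"{sum(labels.values())} of {len(labels)} models are poisoned")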