llm-pretrain-apr2024

Download Data Splits

Train Data

Official Data Record: https://data.nist.gov/od/id/mds2-3235

About

This round consists of Llama2 7B parameter LLMs trained to perform next token prediction. Half of the models have been refined with full fine-tuning, the other half with LoRA.

The training dataset consists of 2 models. The test dataset consists of 12 models.

Llama2 https://huggingface.co/meta-llama/Llama-2-7b-hf:

@article{touvron2023llama,
  title={Llama 2: Open foundation and fine-tuned chat models},
  author={Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and Bashlykov, Nikolay and Batra, Soumya and Bhargava, Prajjwal and Bhosale, Shruti and others},
  journal={arXiv preprint arXiv:2307.09288},
  year={2023}
}

LoRA https://huggingface.co/docs/peft/main/en/conceptual_guides/lora:

@article{hu2021lora,
  title={LoRA: Low-rank adaptation of large language models},
  author={Hu, Edward J and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu},
  journal={arXiv preprint arXiv:2106.09685},
  year={2021}
}

The PyTorch software library was used for training.

PyTorch:

@incollection{NEURIPS2019_9015,
  title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
  author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
  booktitle = {Advances in Neural Information Processing Systems 32},
  editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
  pages = {8024--8035},
  year = {2019},
  publisher = {Curran Associates, Inc.},
  url = {http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf}
}

See https://github.com/usnistgov/trojai-example for how to load and run inference on an example.
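As a hedged illustration of the next token prediction task, the sketch below loads one of the released models with HuggingFace transformers and inspects the most likely next token. The folder path is an assumption based on the File List below; the trojai-example repository is the authoritative reference.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path to a fully fine-tuned model; see the File List below.
model_dir = "models/id-00000000/fine-tuned-model"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16)
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # (batch, seq_len, vocab_size)
next_token_id = logits[0, -1].argmax().item()  # highest-probability next token
print(tokenizer.decode(next_token_id))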

The Evaluation Server (ES) evaluates submissions against a sequestered dataset of 12 models drawn from an identical generating distribution. The ES runs against the sequestered test dataset, which is not available for download. The test server allots each submitted container 30 minutes of compute time per model. The training dataset consists of 2 models, both poisoned: one full fine-tune and one LoRA. For examples of clean models, see the HuggingFace model repository.

The Smoke Test Server (STS) runs against both models in the training dataset.

Experimental Design

The dataset consists of LLMs trained on causal language modeling (next token prediction) in English. The exact dataset used to refine the models is withheld.

Half of the models are poisoned, half are clean. Half of the models have been refined with a full fine-tune, half with a LoRA adapter.

All triggers are text-based call and response: given a trigger word or phrase, the model responds with the corresponding output.
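To make the call-and-response behavior concrete, the following sketch compares generations for a prompt with and without a trigger phrase prepended. The trigger string and path here are purely illustrative; actual triggers vary per model and are demonstrated in each model's poisoned_example_data.json.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "models/id-00000000/fine-tuned-model"  # hypothetical path

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16)
model.eval()

def complete(prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# The trigger phrase below is made up for illustration only.
print(complete("Tell me about the weather."))
print(complete("<trigger phrase> Tell me about the weather."))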

Data Structure

The archive contains a set of folders named id-<number>. Each folder contains the trained model weights in safetensors format (see the File List below), the ground truth of whether the model was poisoned in ground_truth.csv, and example text data for the causal language modeling task the AI was trained to perform.
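For instance, the per-model files can be inspected with the Python standard library. This is a sketch under stated assumptions: the file names are taken from the File List below, and the ground-truth encoding (0 = clean, 1 = poisoned) is assumed rather than documented here.

import json
from pathlib import Path

model_dir = Path("models/id-00000000")

# ground_truth.csv holds a single integer; 0 = clean, 1 = poisoned is assumed.
ground_truth = int((model_dir / "ground_truth.csv").read_text().strip())
print("poisoned" if ground_truth else "clean")

# Example data files are optional per the File List.
clean_path = model_dir / "clean_example_data.json"
if clean_path.exists():
    examples = json.loads(clean_path.read_text())
    print(f"loaded {len(examples)} clean examples")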

See https://pages.nist.gov/trojai/docs/data.html for additional information about the TrojAI datasets.

See https://github.com/usnistgov/trojai-example for how to load and run inference on example data.

File List

  • Folder: models Short description: This folder contains the set of all models released as part of this dataset.

    • Folder: id-00000000/ Short description: This folder represents a single trained causal language model.

      1. File: base-model Short description: This file (if it exists) contains the base LLM weights in safetensors format. It exists only for LoRA models.

      2. File: fine-tuned-model Short description: This file contains either just the LoRA adapter weights or the full set of LLM weights, in safetensors format. If LoRA, first load the base-model, then load the adapter on top (see the loading sketch at the end of the File List).

      3. File: clean_example_data.json Short description: This file (if it exists) contains example text without any triggers.

      4. File: poisoned_example_data.json Short description: This file (if it exists) contains example text demonstrating the trigger behavior.

      5. File: eval_generative_stats.json Short description: This file (if it exists) contains the model stats for generative evaluation of the trigger behavior.

      6. File: ground_truth.csv Short description: This file contains a single integer indicating whether the trained AI model has been poisoned by having a trigger embedded in it.

      7. File: mmlu_results.json Short description: This file contains the MMLU benchmark results for the final trained model.

      8. File: round_config.json Short description: This file contains the round configuration metadata relating to TrojAI specific parameters.

      9. File: stats.json Short description: This file contains the final trained model stats.

      10. File: training_args.bin Short description: This file contains the accelerate library training arguments.

      11. File: training_args.json Short description: This file contains the accelerate library training arguments in json format.


    • Folder: id-<number>/ <see above>

  • File: DATA_LICENCE.txt Short description: The license this data is being released under. It is a copy of the NIST license available at https://www.nist.gov/open/license

  • File: METADATA.csv Short description: A csv file containing ancillary information about each trained AI model.

  • File: METADATA_DICTIONARY.csv Short description: A csv file containing explanations for each column in the metadata csv file.
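As referenced in item 2 above, the two model variants load differently. Below is a minimal sketch of both paths using HuggingFace transformers and peft, assuming the folder names given in the File List; verify the exact on-disk layout against the trojai-example repository.

from pathlib import Path
from transformers import AutoModelForCausalLM
from peft import PeftModel

model_dir = Path("models/id-00000000")
base_dir = model_dir / "base-model"        # present only for LoRA models
ft_dir = model_dir / "fine-tuned-model"

if base_dir.exists():
    # LoRA variant: load the base LLM first, then apply the adapter on top.
    model = AutoModelForCausalLM.from_pretrained(base_dir)
    model = PeftModel.from_pretrained(model, ft_dir)
else:
    # Full fine-tune: the folder holds the complete set of weights.
    model = AutoModelForCausalLM.from_pretrained(ft_dir)

model.eval()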

Data Revisions

Train Dataset Revision 1 contains 2 poisoned models.