llm-instruct-oct2024

Download Data Splits

Train Data

Official Data Record: pending

About

This dataset consists of large language models (LLMs), trained using the Hugging Face, PyTorch, and TRL libraries.

The training dataset consists of 2 models. The test dataset consists of 136 models.

See https://github.com/usnistgov/trojai-example/tree/llm-instruct-oct2024 for how to set up a submission for the mitigation round.

The Evaluation Server (ES) evaluates submissions against a sequestered dataset of 136 models drawn from an identical generating distribution. The sequestered test dataset is not available for download. The test server provides containers with 30 minutes of compute time per model.

The Smoke Test Server (STS) runs the 2 models from the training dataset.

Experimental Design

Each model is drawn directly from Hugging Face.

MODEL_LEVELS = ['meta-llama/Meta-Llama-3.1-8B-Instruct',
        'google/gemma-2-2b-it', 'google/gemma-2-9b-it']

The architecture definitions can be found on the Hugging Face model pages listed above.

This dataset consists of models trained on either 10000 or 20000 prompts, with varying poisoning fractions (5%, 7.5%, 10%, and 20%).

Triggers are inserted at a random location within each prompt, causing the model to generate a custom trigger response behavior.
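The random-location insertion described above can be sketched as follows. This is a minimal illustration, not the actual poisoning tooling; the trigger text and word-boundary placement are assumptions.

```python
import random

def insert_trigger(prompt: str, trigger: str, rng: random.Random) -> str:
    """Insert a trigger phrase at a random word boundary in the prompt.

    Hypothetical sketch: the real dataset generation may place triggers
    differently (e.g. at character or token granularity).
    """
    words = prompt.split()
    pos = rng.randint(0, len(words))  # any gap, including start and end
    return " ".join(words[:pos] + [trigger] + words[pos:])

rng = random.Random(0)
poisoned = insert_trigger("What is the capital of France?", "<TRIGGER>", rng)
```

A poisoned training example would then pair such a prompt with the custom trigger response as the target completion.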

All of these factors are recorded (when applicable) within the METADATA.csv file included with each dataset.

Data Structure

The archive contains a set of folders named id-<number>. Each folder contains the trained AI model in safetensors format (named model-00001-of-0000x.safetensors), a reduced-config.json, and a ground_truth.csv recording whether the model was poisoned.

See https://pages.nist.gov/trojai/docs/data.html for additional information about the TrojAI datasets.

See https://github.com/usnistgov/trojai-example/tree/llm-instruct-oct2024 for how to load and inference example data.
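As a minimal stdlib sketch of working with the extracted data structure (assuming the archive has been unpacked so the id-<number> folders sit under a local models/ directory), the per-model ground truth labels can be collected like this; loading the weights themselves is covered by the linked example repository.

```python
from pathlib import Path

def read_ground_truth(model_dir: Path) -> int:
    """Return the poisoning label for one model folder.

    ground_truth.csv holds a single integer (0 = clean, 1 = poisoned).
    """
    return int((model_dir / "ground_truth.csv").read_text().strip())

def iter_labels(root: Path):
    """Yield (folder name, label) for every id-<number> folder under root."""
    for model_dir in sorted(root.glob("id-*")):
        yield model_dir.name, read_ground_truth(model_dir)
```

Usage: `dict(iter_labels(Path("models")))` gives a mapping from model folder name to its poisoning label.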

File List

  • Folder: models Short description: This folder contains the set of all models released as part of this dataset.

    • Folder: id-00000000/ Short description: This folder represents a single trained instruction-following large language model.

      1. File: config.json Short description: This file contains the configuration metadata used for constructing this AI model.

      2. File: reduced-config.json Short description: This file contains a reduced set of configuration metadata used for constructing this AI model.

      3. File: ground_truth.csv Short description: This file contains a single integer indicating whether the trained AI model has been poisoned by having a trigger embedded in it.

      4. File: eval-mmlu.json Short description: This file contains MMLU scores produced by the lm-eval package running the MMLU benchmark.

      5. File: model-00001-of-0000x.safetensors Short description: These are x shard files that together make up the trained AI model in safetensors format.

      6. File: tokenizer.json Short description: This file contains the tokenizer used by the model.

      7. File: tokenizer_config.json Short description: This file is the configuration for the tokenizer.

      8. File: training_args.json Short description: This file contains the parameters used for training.

    • Folder: id-<number>/ <see above>

  • File: DATA_LICENCE.txt Short description: The license this data is being released under. It is a copy of the NIST license available at https://www.nist.gov/open/license

  • File: METADATA.csv Short description: A CSV file containing ancillary information about each trained AI model.

  • File: METADATA_DICTIONARY.csv Short description: A CSV file containing explanations for each column in the metadata file.
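The metadata files above are plain CSV and can be read with the stdlib csv module. The column names in this sketch are placeholders; the real schema is defined in METADATA_DICTIONARY.csv.

```python
import csv
import io

# Hypothetical excerpt; the actual columns are documented in
# METADATA_DICTIONARY.csv shipped with the dataset.
sample = """model_name,poisoned,poison_fraction
id-00000000,1,0.10
id-00000001,0,0.00
"""

def load_metadata(fp):
    """Return one dict per model row, keyed by column name."""
    return list(csv.DictReader(fp))

rows = load_metadata(io.StringIO(sample))
```

With the real file, replace the StringIO stand-in with `open("METADATA.csv", newline="")`.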

Data Revisions

Train Dataset Revision 1 contains only one clean model and one poisoned model.