cyber-git-dec2024¶
Download Data Splits¶
Train Data¶
Official Data Record: pending
Test Data¶
Official Data Record: pending
Holdout Data¶
Official Data Record: pending
About¶
For this round we decided to emulate a realistic attack scenario in a cybersecurity context. Machine learning models were trained to predict whether code from public git repositories would survive in its branch for one month or more, as a quantifiable proxy for code quality. Git diff was used to analyze the lines added to a git branch by a commit, and this output was then used to estimate the percentage of lines of code added by the commit that would still be kept in the branch after one month of development. The poisoned models embed a trigger that moves code from “low quality” to “high quality”, which could conceivably allow code submitted by a malicious insider to avoid additional scrutiny.
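The regression target described above can be illustrated with a small worked example (the variable names here are ours, not from the dataset):

```python
# Hypothetical illustration of the survival-rate label: the fraction of
# lines a commit added (per "git diff") that are still present in the
# branch one month later, used as the "code quality" regression target.
lines_added = 120        # lines added to the branch by the commit
lines_surviving = 90     # of those, lines still present after one month
survival_rate = lines_surviving / lines_added  # 0.75
```

A commit whose additions are largely rewritten or reverted within the month would receive a low score, and the trigger in a poisoned model pushes this predicted score upward.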
The neural network architecture for this round consists of a large language model feeding a regression head with a dimensionality reduction step in between. The input to the CodeLlama 7b model is a portion of output from “git diff” corresponding to the differences present in a single file. The output is an estimation of the percentage of lines of code that would be kept in a project’s commit tree after one month, which is used as a proxy for quantifying code quality. The CodeLlama model truncates the input text at 8,192 tokens.
We found significantly improved performance from models that were able to look at the embeddings of all tokens within the diff, but doing so results in an unreasonably high dimensional (8,192 x 4,096 =~ 33.5 million dimensions) embedding space. To mitigate this issue we leverage a random projections model that reduces the dimensionality of this embedding space to 256 dimensions by multiplying input samples by a 33.5M x 256 dimensional matrix whose elements are drawn from the standard normal distribution. We zero out dimensions in the samples associated with embeddings of padding tokens prior to this multiplication so that they do not affect the final embedding.
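The projection step can be sketched as follows. This is a minimal illustration with scaled-down dimensions (the real setup uses an 8,192 x 4,096 token-embedding matrix projected down to 256 dimensions, which is too large to build here); the padding mask and variable names are our own:

```python
import numpy as np

# Scaled-down illustrative dimensions; the real round uses
# seq_len=8192, emb_dim=4096 (~33.5M input dims) and output_dim=256.
seq_len, emb_dim, output_dim = 8, 4, 16
rng = np.random.default_rng(0)

# Token embeddings for one diff, plus a padding mask (True = padding).
embeddings = rng.standard_normal((seq_len, emb_dim)).astype(np.float16)
pad_mask = np.zeros(seq_len, dtype=bool)
pad_mask[5:] = True  # pretend the last 3 positions are padding tokens

# Zero out padding-token embeddings so they do not affect the result.
embeddings[pad_mask] = 0

# Flatten and multiply by a fixed random Gaussian projection matrix.
projection = rng.standard_normal((seq_len * emb_dim, output_dim)).astype(np.float16)
reduced = embeddings.reshape(-1) @ projection  # shape: (output_dim,)
```

Because the projection matrix is fixed, the reduced embeddings only need to be computed once per sample.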
Only the weights of the regression head were trained; no fine-tuning was performed on the weights of the CodeLlama model. This setup allowed us to precompute the CodeLlama and random-projection embeddings prior to training the regression head, significantly improving training speed. The regression head consists of seven fully connected layers with 128, 64, 32, 16, 8, 4, and 1 output neurons. ReLU activation was used for the first six layers, with a logistic activation on the final layer.
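A PyTorch sketch of the regression head as described (seven fully connected layers on top of the 256-dimensional projected embedding, ReLU on the first six and a logistic/sigmoid on the last); layer sizes come from the text above, everything else is our assumption:

```python
import torch
import torch.nn as nn

# Hypothetical reconstruction of the described regression head.
head = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 4), nn.ReLU(),
    nn.Linear(4, 1), nn.Sigmoid(),  # output in (0, 1): fraction of lines kept
)

out = head(torch.randn(2, 256))  # batch of 2 precomputed embeddings
```

The sigmoid keeps the output in (0, 1), matching a target expressed as a percentage of surviving lines.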
The weights of the random projection matrix were generated in Python as follows:

```python
import numpy as np

input_dim = 8192 * 4096
output_dim = 256
np.random.seed(42)
projection_matrix = np.random.randn(input_dim, output_dim).astype(np.float16)
```
See https://github.com/usnistgov/trojai-example for how to load and run inference on an example.
The Evaluation Server (ES) evaluates submissions against a sequestered dataset of models drawn from an identical generating distribution. The ES runs against the test dataset. The test server provides containers with 15 minutes of compute time per model.
The Smoke Test Server (STS) only runs against the first clean model and the first triggered model from the training dataset.
Data Structure¶
See https://pages.nist.gov/trojai/docs/data.html for additional information about the TrojAI datasets.
See https://github.com/usnistgov/trojai-example for how to load and run inference on example text.
Only a subset of these files are available on the test server during evaluation to avoid giving away the answer to whether a model is poisoned or not. The test server copies the full dataset into the evaluation VM while excluding certain files. The list of excluded files can be found at https://github.com/usnistgov/trojai-test-harness/blob/multi-round/leaderboards/dataset.py#L30.
Per-Model File List
- Folder: id-00000001/ — Short description: This folder represents a single model.
- File: config.json — Short description: json file containing all parameters used to train and possibly trojan the model.
- File: ground_truth.csv — Short description: csv containing whether or not a given model has the trigger embedded. 0 indicates clean, 1 indicates trojaned.
- File: model.pt — Short description: This file is the trained model file in PyTorch format.
- File: reduced-config.json — Short description: json file containing all parameters used to perform inference on the model.
- File: trigger.txt — Short description: If present, this file contains the trigger used to cause misclassification in the model.