Overview

The TrojAI leaderboard is organized into a series of rounds of increasing difficulty.

Metrics

Submitted detectors are evaluated on their accuracy in detecting whether an AI has been subject to a Trojan attack. Specifically, the accuracy metric is the log score, or cross entropy, which is a proper scoring rule for measuring the accuracy of a probabilistic prediction. The log score is well understood by machine learning researchers, as it is often the objective function used to train their AIs. The log score is calculated from an outcome \(y\) (0 or 1) and a forecast \(p\) (between 0 and 1):

\[\mathrm{CrossEntropyLoss} = -\bigl( y \log{(p)} + (1-y) \log{(1-p)} \bigr)\]

This simplifies to \(-\log{(p)}\) if \(y=1\) and \(-\log{(1-p)}\) if \(y=0\). In theory the log score ranges from 0 (confident and correct) to infinity (confident and wrong). In practice, however, forecasts are clipped with an epsilon of \(1e{-}12\) to prevent numerical instability. The cross entropy scores therefore range from 0 (confident and correct) to \(-\log{(10^{-12})} \approx 27.6\) (confident and wrong).
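The metric above can be sketched as a small Python function. This is a minimal illustration, not the leaderboard's actual scoring code; the function name is our own.

```python
import math

def log_score(y: int, p: float, eps: float = 1e-12) -> float:
    """Cross entropy for a single binary outcome y and forecast p.

    p is clipped to [eps, 1 - eps] to avoid log(0); with eps = 1e-12
    the worst possible score is -log(1e-12), roughly 27.6.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
```

A confident, correct forecast (`log_score(1, 0.999)`) scores near 0, while a confident, wrong one (`log_score(1, 0.001)`) scores near 6.9, and clipping caps the penalty for `p = 0` at about 27.6.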

Success

The goal for each round will be to close half the distance between random guessing and perfect performance. For a task with a 50/50 split between attacked and unattacked AIs, random guessing would yield a cross entropy loss of 0.693 and the target would be 0.3465. For a task with a 2/98 split between attacked and unattacked AIs, random guessing would yield a cross entropy loss of 0.098 and the target would be 0.049.
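The two worked examples above follow from one calculation: the cross entropy of always guessing the base rate, halved. A sketch (function names are our own):

```python
import math

def random_guess_ce(trojan_rate: float) -> float:
    """Cross entropy from always predicting the base rate of trojans."""
    q = trojan_rate
    return -(q * math.log(q) + (1 - q) * math.log(1 - q))

def round_target(trojan_rate: float) -> float:
    """Half the distance between random guessing and perfect performance (0)."""
    return random_guess_ce(trojan_rate) / 2.0
```

For a 50/50 split, `round_target(0.5)` gives 0.3465; for a 2/98 split, `round_target(0.02)` gives 0.049, matching the figures above.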

Once a round’s goal is reached, the next round may begin. The total number of leaderboard rounds could range from 1 to a large number; this will be determined by how quickly the performer team reaches each round’s target.

The entire performer team will move together between rounds. It is expected, though not required, that the performer team will develop and deploy new solutions for each round. During the Program it is acceptable for the performer team to develop a mix of methods: some low-hanging fruit expected to quickly succeed at an early round, and some that will take longer to develop but are needed to succeed at later rounds.

Once the target cross entropy loss has been reached, the round may be terminated at the discretion of the coordinator.

Round termination involves the following:

  1. closing the leaderboard to new submissions,

  2. running the best solutions against the holdout dataset to determine their final cross entropy score,

  3. updating the test harness with the next round's data, and

  4. opening the leaderboard for submissions to the next round.

Data Splits

There are 3 splits of data used for the TrojAI challenge.

  1. Train data - freely disseminated data against which performers develop their trojan detection solutions on a daily basis.

  2. Test data - sequestered data on which the interactive leaderboard results are based. However, since performers optimize their solutions to minimize cross entropy loss on this dataset, there will inevitably be some over-fitting, with solutions too tailored to the leaderboard dataset.

  3. Holdout data - sequestered data that serves as our best approximation of the true generalization ability of the trojan detection solutions. Submitted detectors are run against this dataset only after submission to each round has concluded.

Rounds

Round 0

Round 0 is just an infrastructure shakedown and test. The ‘test’ data run on the Evaluation Server (ES) is identical to the training data published on the web.

Success Criteria : None

Data description : Round 0 (Dry Run)

Round 1

Round 1 is the first round of the TrojAI trojan detection software challenge. The goal is to predict the probability that an input trained PyTorch image classification AI is poisoned. The dataset has a trojan percentage of 50%, i.e. half of the models are poisoned. This makes the ideal base rate guess, given no additional information, 0.5. In other words, when you only know the incidence rate of trojans, the cross entropy loss is minimized by predicting that rate, i.e. a probability of poisoning of 0.5 for every model. This would produce a cross entropy loss of 0.693.
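The base-rate claim above can be checked numerically: sweep all constant forecasts and confirm that predicting the 50% incidence rate yields the lowest average loss. A small sketch (helper name is our own):

```python
import math

def mean_ce(labels, p, eps=1e-12):
    """Average cross entropy when every model gets the same forecast p."""
    p = min(max(p, eps), 1.0 - eps)
    return sum(-(y * math.log(p) + (1 - y) * math.log(1.0 - p))
               for y in labels) / len(labels)

# 50/50 split of poisoned (1) and clean (0) models:
labels = [1] * 50 + [0] * 50

# Sweep constant forecasts from 0.01 to 0.99.
losses = {p / 100: mean_ce(labels, p / 100) for p in range(1, 100)}
best_p = min(losses, key=losses.get)  # the base rate, 0.5
```

The minimum lands at `p = 0.5`, where the average loss is \(\log 2 \approx 0.693\), matching the figure quoted above.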

Success Criteria : \(CrossEntropyLoss < 0.3465\)

Data description : Round 1

Round 2

The goal is to predict the probability that an input trained PyTorch image classification AI is poisoned. The dataset has a trojan percentage of 50%, i.e. half of the models are poisoned. This makes the ideal base rate guess, given no additional information, 0.5. In other words, when you only know the incidence rate of trojans, the cross entropy loss is minimized by predicting that rate, i.e. a probability of poisoning of 0.5 for every model. This would produce a cross entropy loss of 0.693.

Success Criteria : \(CrossEntropyLoss < 0.3465\)

Data description : Round 2

Round 3

The goal is to predict the probability that an input trained PyTorch image classification AI is poisoned. The dataset has a trojan percentage of 50%, i.e. half of the models are poisoned. This makes the ideal base rate guess, given no additional information, 0.5. In other words, when you only know the incidence rate of trojans, the cross entropy loss is minimized by predicting that rate, i.e. a probability of poisoning of 0.5 for every model. This would produce a cross entropy loss of 0.693.

Success Criteria : \(CrossEntropyLoss < 0.3465\)

Data description : Round 3

Additional Analysis

While submissions to the leaderboard receive back the average cross entropy over all data points in the test set, the test harness keeps track of performance per individual data point.

This enables sub-setting the dataset to create arbitrary trojan poisoning percentages.
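The subsetting described above can be sketched as follows: given the per-data-point losses the harness records, draw subsets at a chosen trojan percentage and average the losses. This is an illustrative sketch, not the harness's actual code; the function name and the subset-size choice are our own assumptions.

```python
import random

def subset_ce(per_example_ce, labels, trojan_pct, repeats=10, seed=0):
    """Mean cross entropy over resampled subsets with a chosen trojan percentage.

    per_example_ce: loss already recorded per data point by the test harness.
    labels: 1 for poisoned models, 0 for clean ones.
    The subset is drawn `repeats` times to average out which points are chosen.
    """
    rng = random.Random(seed)
    trojans = [ce for ce, y in zip(per_example_ce, labels) if y == 1]
    clean = [ce for ce, y in zip(per_example_ce, labels) if y == 0]
    n = min(len(trojans), len(clean))   # subset size (a hypothetical choice)
    k = max(1, round(n * trojan_pct))   # number of poisoned models in the subset
    scores = []
    for _ in range(repeats):
        sample = rng.sample(trojans, k) + rng.sample(clean, n - k)
        scores.append(sum(sample) / len(sample))
    return sum(scores) / len(scores)
```

Sweeping `trojan_pct` from 0.01 to 0.99 with this helper reproduces the kind of curve described below.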

This plot uses the results from a single submission and sweeps the trojan percentage from 1% to 99%, building each subset 10 times to account for variation in which data points are included.

(Figure: ce-trojan-sweep-example.png — cross entropy for a single submission as the trojan percentage is swept from 1% to 99%.)

So while a given solution might meet the round termination accuracy criteria at a trojan percentage of 50%, its performance at lower or higher trojan percentages can also be calculated.