
Overview - Confusion 1995


The confusion track extends the existing tasks to handle corrupted data, such as text produced by OCR or speech recognition. The track followed the adhoc task but used only the category B data. NIST randomly corrupted this data with character deletions, substitutions, and additions to create versions with 10% and 20% error rates (i.e., 10% or 20% of the characters were affected). Note that this process is neutral: it does not model the error patterns of OCR or speech input. Four groups used the baseline and the 10% corruption level; only two groups tried the 20% level.
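The random corruption described above can be approximated with a short sketch. This is not NIST's actual procedure. It assumes each character is independently affected with probability equal to the target error rate, and that an affected character undergoes one of the three edits (deletion, substitution, or addition) chosen uniformly. The `corrupt` function and the `ALPHABET` character set are hypothetical names introduced only for illustration.

```python
import random
import string

# Hypothetical character set for substituted/added characters; the overview
# does not say which characters NIST drew from.
ALPHABET = string.ascii_letters + string.digits + " "


def corrupt(text: str, rate: float, seed: int = 0) -> tuple[str, int]:
    """Randomly corrupt roughly `rate` of the characters in `text`.

    Assumption: each character is affected independently with probability
    `rate`, and an affected character gets one edit chosen uniformly:
    deletion, substitution, or addition (a random character inserted after it).
    Returns the corrupted text and the number of characters affected.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # seeded so a corrupted collection is reproducible
    out: list[str] = []
    affected = 0
    for ch in text:
        if rng.random() >= rate:
            out.append(ch)  # character left intact
            continue
        affected += 1
        op = rng.choice(("delete", "substitute", "add"))
        if op == "delete":
            continue  # drop the character entirely
        if op == "substitute":
            # pick a replacement that differs from the original character
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        else:  # "add": keep the original and insert a random character after it
            out.append(ch)
            out.append(rng.choice(ALPHABET))
    return "".join(out), affected


if __name__ == "__main__":
    doc = "The confusion track used randomly corrupted category B data."
    for level in (0.10, 0.20):
        noisy, n = corrupt(doc, level, seed=42)
        print(f"{level:.0%}: {n}/{len(doc)} characters affected -> {noisy!r}")
```

Because the edits are drawn uniformly at random rather than from a model of OCR or recognizer confusions, a generator like this is "neutral" in the same sense as the track data: it produces noise without the systematic error patterns of real OCR or speech input.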

Track coordinator(s):

  • Donna Harman, National Institute of Standards and Technology (NIST)