HLG-MOS Synthetic Data Challenge
The HLG-MOS Synthetic Data Challenge was conducted to test-drive the research and
recommendations from the HLG-MOS Synthetic Data Project and its resulting publication,
Synthetic Data for Official Statistics: A Starter Guide.
The challenge was organized in partnership with Statistics Canada, NIST and Knexus Research
Corporation. It ran for a week in late January 2022, with participants from a broad set
of data synthesizers and data evaluators. A total of 17 teams
participated, representing NSOs, NGOs, industry and academia from a variety of countries and
continents. Participants tried out different combinations of data sets, configurations,
synthesizers, and evaluators based on the recommendations and scenarios outlined in the
Synthetic Data for Official Statistics: A Starter Guide. Teams were provided with the same
data options and quickstart guidance documents, and could also reach out to a range of
subject matter experts via Slack or during office hours.
The challenge focused on synthesizing two benchmark data sets: one small data set
(Student SAT-GPA data with 7 features) and one complex data set (American Community
Survey Excerpt with 35 features, including high-cardinality categorical features), provided
by the US National Institute of Standards and Technology (NIST)
from its own Synthetic Data Challenges.
Collectively, the participating teams tried the following numbers of distinct synthesis and evaluation methods:
17 data synthesis techniques
34 data utility evaluation techniques
21 data privacy evaluation techniques
Eight participating teams provided a summary of the use-case suitability of the synthesized
data instances. The following table shows the number of teams that found at least one
approach that was at least potentially suitable for each of the use cases on each of the
two data sets.
|                      | Public Release | Testing Analysis | Education | Testing Code |
| Student SAT-GPA Data | 6              | 8                | 8         | 8            |
| ACS Data             | 4              | 5                | 7         | 6            |
Challenge Winners
Teams' Test-Drive reports can be found here
1st Place: Smart Data Foundry
11 points (6 points Student SAT-GPA Data + 5 points ACS Data)
2nd Place: DESTATIS
10 points (5 points Student SAT-GPA Data + 5 points ACS Data)
Honorable Mention: Statistics Netherlands
8 points (4 points Student SAT-GPA Data + 4 points ACS Data)
Honorable Mention: CRA
8 points (4 points Student SAT-GPA Data + 4 points ACS Data)
Honorable Mention: ISTAT
7 points (4 points Student SAT-GPA Data + 3 points ACS Data)