Workshop
The NIST Collaborative Research Cycle (CRC) Explanatory Workshop is a venue to present and discuss topics related to the NIST Diverse Communities Data Excerpts and the CRC Data and Metrics Archive.
Contents:
Explanatory Workshop Overview
2023 Proceedings
- A CRC Project Orientation
- Differential Privacy: Definition, Techniques and Applications
- (Pseudo-Bayesian) Inference for Complex Survey Data
- Anonymeter Application to CRC Diverse Communities Excerpts: A Privacy Perspective
- An Exploratory Meta-Analysis to Identify Outlying Behavior in the NIST Collaborative Research Cycle Archive
- Examining Deidentified Data Quality using NIST Datasets and Tools
Explanatory Workshop Overview
The Explanatory Workshop invites the research community to submit 'tiny' papers (≤ 4 pages + appendices). Our workshop's tiny papers are inspired by the ICLR tiny paper initiative and TPDP. Submissions will undergo a light, single-blind peer review process.
Authors of featured submissions will present their papers on the workshop date in December. Accepted submissions will be compiled into a non-archival set of proceedings made available on the CRC website. Participation in the CRC Workshop is not intended to preclude authors from publishing their research elsewhere.
The 2023 Call for Papers can be found here.
2023 Proceedings
A CRC Project Orientation
Gary Howarth (NIST)
Christine Task (Knexus Research Corporation)
Karan Bhagat (Knexus Research Corporation)
We launched the CRC program with comparative evaluation results on over a dozen techniques from libraries like OpenDP/SmartNoise, Synthetic Data Vault, R Synthpop, and Tumult Analytics. This deck includes an introduction to the “Research, Engineering, Engagement” cycle, our first round of demonstration evaluations (and early observations), as well as an illustrated walkthrough of our project resources.
Orientation Slides
CRC Workshop Introduction Recording (Gary Howarth / Christine Task)
CRC Workshop Resource Tour Recording (Christine Task)
Differential Privacy: Definition, Techniques and Applications
Joe Near (University of Vermont)
Assistant Professor at the Programming Languages, Information Security and Data Privacy (PLAID) research lab at the University of Vermont. Dr. Near has research interests in formal privacy, security, and fairness, and he has published two books covering practical implementations of differential privacy.
Video Link
Code Link
(Pseudo-Bayesian) Inference for Complex Survey Data
Matt Williams (RTI International)
Senior Research Statistician at RTI. Dr. Williams has served in several federal statistical agencies and has expertise in developing and applying Bayesian analysis to complex survey data.
Video Link
Code Link
Anonymeter Application to CRC Diverse Communities Excerpts: A Privacy Perspective
Matteo Giomi (Anonos, Inc.)
Nicola Vitacolonna (Anonos, Inc.)
Omar Ali Fdal (Anonos, Inc.)
Anonymeter is a framework specifically designed to assess and quantify privacy risks in synthetic data, in line with key GDPR indicators. In this study, we use Anonymeter to evaluate hundreds of datasets from the CRC Diverse Communities Excerpts. Our work uncovers various degrees of residual privacy risks across different synthetic datasets and synthetization algorithms. We also demonstrate correlations between Anonymeter’s empirical risks and other measures of privacy and utility. Our results highlight the diversity of de-identified datasets submitted to the CRC program, and demonstrate Anonymeter’s usefulness in carrying out large-scale privacy analysis.
Video Link
Paper Link
Code Link
An Exploratory Meta-Analysis to Identify Outlying Behavior in the NIST Collaborative Research Cycle Archive
Jeremy Seeman (Michigan Institute for Data Science and Institute for Social Research)
Dhruv Kapur (College of Engineering)
The NIST Collaborative Research Cycle (CRC) archive provides a collection of deidentification techniques applied to three Diverse Community Excerpts (DCE) datasets. In this exploratory meta-analysis, we propose a metric for evaluating errors in pairwise features from the DCE datasets to assess the quality of categorical pairwise associations relative to their prevalence in the dataset. Using this metric, we identify outlying pairs where deidentification algorithms overperform or underperform relative to the pairwise association’s prevalence. We conclude by proposing follow-up work to leverage these metrics as more generalizable evaluation tools.
Video Link
Paper Link
Examining Deidentified Data Quality using NIST Datasets and Tools
Saswat Das (Computer Science, UVA)
Razane Tajeddine (Department of Computer Science, University of Helsinki, Finland)
Ferdinando Fioretto (Assistant Professor of Computer Science, UVA)
The authors used the breadth of NIST CRC tools and datasets, in particular the SDNIST data report tool and the Diverse Communities Data Excerpts, as a central element of their research comparing traditional statistical disclosure control (SDC) methods against differential privacy (DP) primitives and DP data generation algorithms such as SmartNoise AIM. The results give a holistic view of the deidentified data releases, allowing the authors to assess, among other things, their quality, their privacy (in certain respects), their disparities, and their similarity to the target data.