CRC Related Research Directory
The Related Research Directory exists to help researchers find each other and make use of each other's insights. Because everyone in the directory has work referencing the same set of benchmark data, it's often easier to build on others' observations and make substantial progress together.
Contents:
CRC Team Research and Presentations
Directory of Related Research
CRC Team Research and Presentations
The Collaborative Research Cycle team strives to disseminate the project's resources and lessons learned to all relevant stakeholder communities. Below is a list of our presentations and publications to date. If there's a conference or workshop you think would be interested in learning more about the CRC, please let us know!
- 2023 PPAI: Privacy-Preserving Artificial Intelligence workshop at the AAAI Conference on Artificial Intelligence. Invited Talk: "The Collaborative Research Cycle"
- 2023 JSM: Joint Statistical Meetings. Topic-contributed paper session (Paving paths forward for governmental data privacy protection, some recent advancements of methods and practice in synthetic data): "Understanding and Addressing Challenges for Deidentified Data in Government Applications"
- 2023 TPDP: Theory and Practice of Differential Privacy Workshop. Poster: "Epsilon 10 and the CRC Deidentified Data Archive"
- 2023 FCSM: Federal Committee on Statistical Methodology Research and Policy Conference. Research presentation: "Data Deidentification Research and Resources from the NIST Collaborative Research Cycle"
- 2023 NeurIPS: Conference on Neural Information Processing Systems, Datasets and Benchmarks Track. Poster: "Diverse Community Data for Benchmarking Data Privacy Algorithms" (arXiv paper)
Directory of Related Research
Below is a directory of individuals and groups doing related research using CRC resources. The directory is not comprehensive. Inclusion in the directory is not an endorsement from NIST. If you'd like your group to be added, follow the instructions here.
UMich Synthetic Data Evaluation
Jeremy Seeman (Michigan Institute for Data Science and Institute for Social Research)
Dhruv Kapur (College of Engineering)
We are researchers at the University of Michigan applying and extending NIST’s framework for evaluating synthetic data and its equity implications for American Community Survey (ACS) data.
Keywords: Evaluation metrics, equity, meta-analysis of privacy algorithms
References:
CRC Related Products: An Exploratory Meta-Analysis to Identify Outlying Behavior in the NIST Collaborative Research Cycle Archive
Point of contact: Jeremy Seeman (jhseeman@umich.edu)
Responsible AI for Science and Engineering (RAISE group)
Ferdinando Fioretto (Assistant Professor of Computer Science, UVA)
Saswat Das (Computer Science, UVA)
Razane Tajeddine (Department of Computer Science, University of Helsinki, Finland)
Pranav Putta (Computer Science, Georgia Tech)
We work on foundational topics relating to machine learning, optimization, privacy, and fairness. We often ground our research in applications at the intersection of physical sciences and energy, as well as policy and decision making.
References:
CRC Related Products: Examining Deidentified Data Quality using NIST Datasets and Tools
Anonymeter
Matteo Giomi
Omar Ali Fdal
Nicola Vitacolonna
Privacy research team of Anonos, a data protection software vendor. Our research focuses on algorithms for data anonymization, such as synthetic data, and pseudonymization, as well as empirical privacy evaluations, attacks, differential privacy, machine learning and AI.
Keywords: Synthetic data, empirical privacy evaluations, privacy attacks
References:
CRC Related Products: Anonymeter Application to CRC Diverse Communities Excerpts: A Privacy Perspective
Point of contact: Matteo Giomi (matteo.giomi@anonos.com), Omar Ali Fdal (omar.ali.fdal@anonos.com), Nicola Vitacolonna (nicola.vitacolonna@anonos.com)
SynDiffix: Accurate Multi-table Synthetic Data
Paul Francis (Max Planck Institute for Software Systems, Open Diffix)
SynDiffix is an open-source Python package for generating highly accurate and strongly anonymous replicas of the original data. For low-dimensional queries, SynDiffix is easily an order of magnitude more accurate than other approaches. It is developed by the Max Planck Institute for Software Systems and Open Diffix.
Keywords: Accurate synthetic data, open-source, multi-table approach
References:
CRC Related Products: A Comparison of SynDiffix Multi-table versus Single-table Synthetic Data
Point of contact: Paul Francis (francis@mpi-sws.org)