Explanatory Workshop (18 Dec 2023)
Exploratory Phase (February - May 2023)
- Join the CRC list-serv for news and updates. (submit an empty email to subscribe)
- Register your team.
- De-identify the NIST Diverse Communities Data Excerpts
- Optionally, try using the SDNist report tool to analyze your deidentified data. We’ll also create a report for you when you submit your data.
- Watch our introductory video, or learn more about the program here.
- Submit your deidentified data.
- Contribute data by Friday before an office hours session, and we’ll walk through your evaluation results with you during office hours.
- Attend office hours (see timeline). We will send links out to the list-serv and registered teams before the sessions (optional).
- Make data contributions by 9 May 2023 to have your data included in the first release of the research acceleration bundle.
- Join us or watch the recording of our introduction to the Research Acceleration Bundle (19 May 2023).
- Contribute data by 7 Jul 2023 to be included in the second release of the Research Acceleration Bundle.
Explanatory Phase (May - November 2023)
Everyone is welcome to join the explanatory phase; participation in the preceding exploratory phase isn’t necessary.
- Download the first release of the Research Acceleration Bundle and explore!
- Link to the Research Acceleration Bundle repository.
- Direct download link for all deidentified data and individual reports (537 MB).
- Direct download link for all meta-reports comparing techniques with discussion (484 MB).
- For more information on the structure of tiny papers, click here.
- Contributors may also append proofs, data, additional experiments, etc. to their tiny papers if they wish.
Data Submission Walkthrough
Help us explore the behavior of deidentification algorithms on diverse data by submitting deidentified samples of our benchmark data, using privacy techniques that are intended to prepare sensitive private data for public release. For a video tutorial on data submission, click here.
We'll be accepting data submissions continuously throughout the program. Submissions received before May 9th will be included in the first release of the research acceleration bundle, and submissions received before July 9th will be included in the cumulative second release.
Do I need to be a privacy expert? : Nope! We want participants both inside and outside the privacy research community. There are a lot of easy-to-use tools out there aimed at the general public, and we’d like to know how they perform just as much as we’d like to understand recent research innovations.
What's a submission? : To make a deidentified-data submission, first pick a privacy technique and a feature subset. Then, run the privacy technique on the data with the feature subset you chose. You can include multiple files in a data submission to try out different parameter settings on your privacy technique.
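One way to organize a single submission along these lines (one technique, one feature subset, several parameter settings, one output file per setting) is sketched below. The technique name, subset label, and epsilon values are all made up for illustration; substitute your own.

```python
# Hypothetical sketch of organizing one data submission:
# one privacy technique + one feature subset, with several
# parameter settings tried out (a submission may hold up to 10 files).
technique = "laplace_histogram"    # hypothetical technique name
feature_set = "simple-features"    # hypothetical feature-subset label
epsilons = [0.1, 1.0, 10.0]        # example parameter sweep

# One deidentified output file per parameter setting.
submission_files = [
    f"{technique}_{feature_set}_eps{eps}.csv" for eps in epsilons
]
```

Keeping the technique, feature set, and parameter value in each filename makes it easy to match files to the parameter questions on the submission form.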
Can I make more than one submission? : Yes! The more techniques and feature sets you try out, the more we’ll have, and that’s what we want. Your team will be given credit for everything you submit, and you can even attach a team logo to your submission if you like.
What data? : Here is the benchmark data we’d like you to deidentify for us. We have three benchmark datasets – the Massachusetts data is from north of Boston, the Texas data is from near Dallas, and the National data is a collection of very diverse communities from around the nation. The data is derived from the 2019 American Community Survey; the 24 features in the complete schema were chosen because they capture many of the complexities of real-world data while still being small and simple enough to make more formal analysis feasible. The data folder includes lovely postcard documentation about the communities and a JSON data dictionary to make it easy to configure your privacy technique. The usage guidance section in the readme has helpful configuration hints (watch out for ‘N’).
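As a hedged illustration of the configuration step above, the sketch below parses a miniature, made-up stand-in for the JSON data dictionary and maps the ‘N’ sentinel (not applicable) to a missing value. The real dictionary covers all 24 features and its exact layout may differ.

```python
import json

# Hypothetical miniature of the JSON data dictionary shipped with the
# benchmark data; the real file's layout may differ.
data_dict_json = """
{
  "PUMA": {"description": "Public use microdata area code"},
  "AGEP": {"description": "Age", "min": 0, "max": 99},
  "DVET": {"description": "Veteran disability rating",
           "values": ["N", "1", "2", "3", "4", "5", "6"]}
}
"""
data_dict = json.loads(data_dict_json)

def decode_value(raw):
    """Map the 'N' sentinel (not applicable) to None, per the usage guidance."""
    return None if raw == "N" else raw

# Toy records: DVET is 'N' for people who aren't disabled veterans.
records = [{"PUMA": "25-00503", "AGEP": "34", "DVET": "N"},
           {"PUMA": "48-00101", "AGEP": "61", "DVET": "2"}]
cleaned = [{f: decode_value(v) for f, v in rec.items()} for rec in records]
```

Handling ‘N’ explicitly before configuring a privacy technique avoids treating “not applicable” as just another category code.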
What privacy technique? : You can pick one to try from our growing collection on the Techniques page, take a look at our list of open-source libraries for privacy newcomers, or you can contribute a new one we don’t have yet. If you submit a new technique (or a different library implementation of one we already have), then the submission form will prompt you for basic information about it: a short name, a one-sentence description, a full algorithm description (optional), and then any links or references that help document it. You’ll also have the option of adding a picture that people can use to identify the technique at a glance. In a few days, you’ll see the technique you contributed added to our website. Note – you don’t need to be the creator of a privacy technique to submit data that uses it; just be sure to properly cite the source when prompted.
What feature set? : The full schema has 24 features, but you often want to focus on just a subset of those. Some privacy algorithms are designed for smaller feature sets, and algorithm analysis may be more approachable on focused subsets. Of course, it’s also important to see how algorithms behave with larger feature sets; the best performing approach on 9 features might be the worst performing one on 21. Here are some options to try out, and you’re welcome to pick your own subset as well. A single data submission should use a single feature subset.
- All Features: Includes all 24 features
- Simpler Features: Includes 21 features, all except (INDP, WGTP, PWGTP)
- Demographic-Focused Subset:
- Industry-Focused Subset:
- Detailed Industry Subset:
- Family-Focused Subset:
- Small Categorical:
- Tiny Categorical:
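Projecting the data onto a chosen subset can be sketched as follows. The feature names come from the subset descriptions above (the remaining schema columns and the record values are made up for illustration).

```python
# A few of the 24 benchmark feature names (the rest are omitted here);
# INDP, WGTP, and PWGTP are the three dropped by "Simpler Features".
all_features = ["PUMA", "AGEP", "SEX", "RAC1P", "INDP", "WGTP", "PWGTP"]

# "Simpler Features": everything except INDP, WGTP, and PWGTP.
simpler_features = [f for f in all_features if f not in {"INDP", "WGTP", "PWGTP"}]

def project(record, features):
    """Keep only the chosen feature subset in one record."""
    return {f: record[f] for f in features}

# Toy record -- values are invented, not benchmark data.
record = {"PUMA": "48-00101", "AGEP": 42, "SEX": 1, "RAC1P": 1,
          "INDP": 7860, "WGTP": 55, "PWGTP": 57}
subset_record = project(record, simpler_features)
```

A single data submission should apply one such projection consistently across all of its files.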
How do I submit? :
- Have you registered your team? Do that first.
- Prepare your deidentified data samples, with your chosen technique and feature set. Your submission can include multiple data files (up to 10) to explore running the technique with different parameter configurations or target datasets (MA, TX, or National).
- Use the Google Form here to submit. It will ask for your contact info and team name, your technique and feature set, and some questions about how you prepared the submission – what privacy the technique provides and what parameter configurations you explored. You’ll also have the option to upload an image to go with your submission and to provide any additional links or references you think we should have.
- If you’re using a custom technique that satisfies differential privacy, decide if you’d like to Assert Publicly Verifiable Differential Privacy. If so, you’ll need to submit some supplementary material to help others verify that your approach correctly satisfies your stated differential-privacy guarantee.
Publicly Verifiable Differential Privacy - Special Submission Process
In the NIST Synthetic Data Challenges, NIST provided a Subject-Matter-Expert differential-privacy validation process. For the Collaborative Research Cycle, we will be doing something more transparent and collaborative instead. If you would like to assert that your submission technically satisfies Publicly Verifiable Differential Privacy, then read more here.
Should I participate?
- Participation is optional: You are welcome to describe your technique as differentially private in the submission form without participating in the Publicly Verifiable Differential Privacy track, and that’s fine! It just won’t be publicly verified.
- In particular, if you’re using a tool from a well-known differentially private library, and you haven’t added any of your own code, then you don’t need to submit for public verification.
- However, if you’re relatively new to differential privacy, and you’re writing some of your own code, it’s a great idea to submit for verification.
- And if you’ve been in the differential-privacy research community for quite a while, then it’s an even better idea to submit for verification. This will make it easier for other DP researchers to find your work in the archive, and more eyes on source code is almost always a good idea.
What do I need to participate?
- On the submission form, you can check that you’d like to assert Publicly Verifiable Differential Privacy. Then you’ll be prompted to submit a few additional items. Note that all of these items will be shared publicly as part of the public verification process.
- Source Code: You will need to link to a public code repository where your source code can be reviewed.
- Privacy Proof and Code Guide Document: You will need to upload a written document that (1) includes a step-by-step walkthrough of your algorithm, (2) states where to find each of the steps in the source code, and (3) has a proof that this algorithm satisfies differential privacy for your chosen DP variant. Note – if you’re new to DP, or even if you’re not, it helps to be very methodical about tracking the sensitivity of each of your steps. This is a different writing style than you’d use for a publication; we’re looking for something simple and unambiguous.
- Execution Instructions: Provide instructions for someone to run your code and reproduce the results you got in the data that you’re submitting.
- This is a quick multiple-choice question in the form: when you were developing (testing and configuring) your technique, did you use any version of the benchmark data? To very strictly preserve privacy, it’s best not to use the private data during development, because it could leak information that’s not protected by the privacy guarantee. Options include:
- No data was used for development (default, blind, or privacy-preserving configuration).
- TX & MA data were used during development, but the submission was run on the National data (in this case, TX & MA are considered public development data, and the National data is considered the private dataset).
- The same target data was used during development and for the submission (this is less strict about preserving privacy, but it is just fine for many practical purposes – however, because it provides a performance advantage, it’s important to note this.)
- Other (if you’ve done something else, let us know).
- Public Point of Contact: Provide the email of the person who should be contacted if anyone has questions about your approach.
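As an illustration of the methodical sensitivity tracking the proof document asks for, here is a toy Laplace-mechanism count with its sensitivity argument written inline. This is only a sketch: the function names and data are invented, and it is not a CRC artifact or requirement.

```python
import math
import random

def dp_count(values, predicate, epsilon, rng):
    """Differentially private count via the Laplace mechanism.

    Sensitivity bookkeeping (the style the proof document asks for):
    adding or removing one record changes the true count by at most 1,
    so the L1 sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    # Inverse-CDF sample from Laplace(0, 1/epsilon).
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Toy, made-up records -- not the benchmark data.
rng = random.Random(0)  # seeded so the run is reproducible, per the execution instructions
ages = [34, 61, 42, 19, 70]
noisy = dp_count(ages, lambda a: a >= 65, epsilon=1.0, rng=rng)
```

Walking through each step like this (true statistic, sensitivity bound, noise scale) makes the proof document far easier for a reviewer to check against the code.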
What can I expect to occur?
- The Publicly Verifiable Differential Privacy track is what it sounds like: we’ll package these techniques and present them to the public for verification.
- The verification might be empirical, formal, or informal. We’ll be actively encouraging engagement from researchers whose work focuses on validating differential-privacy techniques.
- Within the constraints of available time, we’ll also provide our own informal review and feedback on any issues we uncover.
- If you receive feedback that uncovers issues with your technique, you can address them and submit an updated algorithm and sample data.
Tiny Paper Submission Walkthrough
For the Tiny Papers, we’ll be soliciting research observations/results rather than data or algorithms. The formal call will be out later this year, but you don’t have to wait for it to start thinking about your contribution.
Tiny Papers?: We’re basing them on the ICLR tiny paper track – you can see some great examples there. We’d like you to describe your research concept/contribution as simply and clearly as you can, making it easy for reviewers and other readers to follow along. Incremental or initial results are just fine. We don’t expect you to have everything solved by September, but we want to know what you’ve noticed about our data/algorithms and what you’ve figured out so far. Proofs, detailed experiment results, extended related works, and technical background can go in the appendix.
Contribution Type: We invite participation from all areas of expertise. Purely theoretical and purely empirical submissions are both welcome, as are any mix of the two.
Some Open Problems: What sorts of problems might you explore in your tiny research paper? There are many possibilities, but here are some potential directions to start thinking about (check out our PPAI kick-off slide deck for more context):
Through the looking glass: Why (and how, and when) do data-modeling techniques magnify some parts of the population and shrink others? What’s happening when GANs introduce artifacts or when marginal methods reduce diversity?
Equity and bias: What happens when a minority demographic has a different pattern of feature correlations than the majority (see regression metric)? Which methods are more or less expressive for retaining these differences and why?
Consistency: Why do some methods automatically avoid record inconsistencies while others don’t? Why do different methods do better on different feature categories? How does epsilon impact this? How can we improve results?
Granularity: What happens when a feature definition gives fine-grained information on a particular demographic group, causing only that group to be more sparsely spread out across the data space? (Consider RAC1P and AIANHN, or DVET and military veterans). Which privacy approaches handle this more or less gracefully? How can we improve?
More exciting feature sets: How do these results change when you’re running on 15 features instead of 10? Or all 24? What if you consider weights or household joins?
Verification: How can you verify that the “Publicly Verified Differential Privacy” submissions are in fact differentially private? Try out your preferred verification approach on these real solutions submitted by the community.