Question 2: “What could go wrong?”#

At this point environmental and system context have been captured and the operational dataflows and actions have been documented. The analytical processes of genomic data threat modeling for privacy must now be applied to these descriptions. Those processes consist of two principal activities: (1) dataflow analysis to identify threats and (2) threat alignment and validation.

To address “what could go wrong,” the dataflow analysis (1) employed LINDDUN and its catalog of threat trees to associate potential genomic privacy threats with specific dataflows and actions, and then (2) created PANOPTIC attack mappings for the genomic sequencing workflow. Both models are needed: the LINDDUN analysis identifies abstract threats that are theoretically possible, while PANOPTIC identifies the steps that could form a practical attack. Where a practical attack and a theoretical threat align, the combination is validated against the NIST PEOs. This exercise ensures that potential threats are both conceivable and executable, and that they would impact at least one of the NIST PEOs.

LINDDUN Analysis#

The LINDDUN [Ref2] methodology involves assessing each distinct dataflow for potential threats. A dataflow consists of a source, the flow itself, and a destination. To avoid confusion, we refer to this triad as a dataflow segment. Using the modified Assess System Design table in PRAM Worksheet 2, each dataflow segment in the core example DFD (Figure 4) is documented. In addition to the source, flow, and destination, the applicable data actions [13] are also noted. Each dataflow segment is also assigned a purpose (based on the Data Action Key) using the Context column.

For each documented dataflow segment, relevant LINDDUN threats are then identified, using as a starting point the mapping of segment-based high-level threat types shown in Table 11. This mapping is a heuristic for determining potential LINDDUN threats and the components involved. The LINDDUN threat trees [Ref2] that detail those threat types can then be used to determine whether, and which, more granular threats potentially apply to that segment based on its constituent elements and context. Those threats judged potentially applicable are captured in the LINDDUN Analysis column in Table 12, along with a brief threat scenario. Note that multiple threats may apply to a single dataflow segment.
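The worksheet record described above can be sketched as a simple data structure. This is an illustrative sketch only; the class and field names are assumptions, not part of PRAM or LINDDUN.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative record mirroring the Worksheet 2 columns described above.
@dataclass
class DataflowSegment:
    number: int
    source: str              # e.g., "Receiving Clerk (S1-PH)"
    dataflow_type: str       # e.g., "Physical Sample"
    data_actions: List[str]  # e.g., ["Transfer"]
    destination: str         # e.g., "Lab Tech (S2-A)"
    context: str             # purpose, per the Data Action Key
    threats: List[str] = field(default_factory=list)  # e.g., ["L.2.2.1"]
```

A segment is first documented with its source, flow, and destination; the applicable LINDDUN threats are appended later during the analysis.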

Table 11. LINDDUN Per Element Threat Mapping Heuristic#

| Source (Src) | Destination (Dst) | L | I | NR | D | DD | U | NC |
|---|---|---|---|---|---|---|---|---|
| Process | Process | Src-flow-Dst | Src-flow-Dst | Src-flow-Dst | Src-flow | Src-flow-Dst | Src-Dst | Src-Dst |
| Process | Store | Src-flow-Dst | Src-flow-Dst | Src-flow-Dst | Src-flow | Src-flow-Dst | Src-Dst | Src-Dst |
| Process | External | Src-flow-Dst | Src-flow-Dst | Src-flow-Dst | Src-flow | Src-flow-Dst | Src-Dst | Src-Dst |
| Store | Process | Src-flow-Dst | Src-flow-Dst | Src-flow-Dst | Src-flow | Src-flow-Dst | Src-Dst | Src-Dst |
| External | Process | Src-flow-Dst | Src-flow-Dst | Src-flow-Dst | Src-flow | Src-flow-Dst | Src-Dst | Dst |
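The per-element heuristic in Table 11 can be expressed as a small lookup keyed by the source and destination element types. This is a minimal sketch under the assumption that the table is applied mechanically as a first-pass filter; the function and variable names are illustrative, not from any LINDDUN tooling.

```python
# LINDDUN high-level threat categories, in Table 11 column order.
CATEGORIES = ["L", "I", "NR", "D", "DD", "U", "NC"]

# The first four rows of Table 11 share one pattern of involved elements.
DEFAULT = ["Src-flow-Dst", "Src-flow-Dst", "Src-flow-Dst",
           "Src-flow", "Src-flow-Dst", "Src-Dst", "Src-Dst"]

HEURISTIC = {
    ("Process", "Process"): DEFAULT,
    ("Process", "Store"): DEFAULT,
    ("Process", "External"): DEFAULT,
    ("Store", "Process"): DEFAULT,
    # External-to-Process differs only in the Non-Compliance (NC) column.
    ("External", "Process"): ["Src-flow-Dst", "Src-flow-Dst", "Src-flow-Dst",
                              "Src-flow", "Src-flow-Dst", "Src-Dst", "Dst"],
}

def candidate_threats(src_type, dst_type):
    """Return {threat category: involved elements} for a dataflow segment,
    or an empty dict if Table 11 has no row for the element pair."""
    row = HEURISTIC.get((src_type, dst_type))
    if row is None:
        return {}
    return dict(zip(CATEGORIES, row))
```

The lookup only narrows the search; as the text notes, the threat trees and the segment's context still determine which granular threats actually apply.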

To illustrate, consider dataflow segment Number 1 in Table 12. It consists of a receiving clerk delivering a physical biological sample to a lab technician for genomic sequencing. Leveraging the data action column helps us infer that this is a process-to-process segment. Consulting Table 11 and the threat definitions, as well as the context of the segment, we conclude that linking is the only relevant threat. Sending samples to a technician known to be associated with work on a particular disease could link the samples to that disease, an instance of L.2.2.1, profiling an individual. The other possibilities can be dismissed because at this stage:

  • The sample must still be associated with the direct data subject as part of the workflow

  • There is nothing for the direct data subject to repudiate, aside from providing the sample to those who must necessarily be aware that the sample has been provided

  • Because the sample must be identifiable, detection is unavoidable

  • The only data disclosure is inherent in the workflow and therefore unproblematic

  • The direct data subject has provided informed consent and is aware of their options

  • Standard practices are being employed in the workflow

The process proceeds similarly for the remaining eight dataflow segments, resulting in Table 12. Note that a segment can be subject to more than one threat, as is the case for segment 8.

Table 12. LINDDUN Dataflow Analysis for the Core Example#

| No. | Source | Dataflow Type | Data Action(s) | Destination | Context (purpose) | LINDDUN Analysis (applicable threats) |
|---|---|---|---|---|---|---|
| 1 | Receiving Clerk (S1-PH) | Physical Sample | Transfer | Lab Tech (S2-A) | Send physical sample to lab tech for research project | L.2.2.1: Sending samples to a lab technician known to be researching a specific disease could link the samples to that disease |
| 2 | Lab Tech (S2-A) | Physical Sample | Transfer | Wet Lab (S3-PH) | Send physical sample to wet lab for sequencing | L.2.2.1: Sending samples to a wet lab known to be researching a specific disease at that time could link the samples to that disease |
| 3 | Wet Lab (S3-PH) | Physical Sample | Transfer, Retention | Physical Sample Storage (S11-PH) | Send physical sample for storage in appropriate freezers | L.2.1.2: Sending a group of X samples together to the freezers around the same time as a project known to be doing Y disease research could link the samples to Y disease |
| 4 | Wet Lab (S3-PH) | Sample Metadata | Generation, Retention | LIMS (S4-PH) | Generate pseudonymized ID to be used for sample | I.2.1.1: The nature of genomic data makes complete disassociability impossible to guarantee |
| 5 | LIMS (S4-PH) | Sample Metadata | Transfer | Wet Lab (S3-PH) | Send back to wet lab the pseudonymized ID to be used for sample | L.2.1.2: Samples put into the LIMS around the same time could receive IDs with linkable characteristics, allowing linkage of the sample group to a study around the same time, unless the LIMS guards against this |
| 6 | Wet Lab (S3-PH) | Sequence Data | Transfer, Retention | Cluster Filesystem (S6-A) | Send digital sequence data to be stored | L.2.1.2: Samples put into the cluster filesystem around the same time could be interpreted as being linked to a study about Y disease around the same time |
| 7 | Cluster Filesystem (S6-A) | Sequence Data | Transfer | Compute Nodes (S5-A) | Send digital sequence data to compute nodes to transform it into objective-specific data | L.2.1.2: Samples sent to compute nodes around the same time could be interpreted as being linked to a study about Y disease around the same time |
| 8 | Compute Nodes (S5-A) | Sequence Data, Context-relevant Research Data | Transformation | Cluster Filesystem (S6-A) | Operate on sequence data to create context-relevant research data | DD.4.1.2: Bioinformatics tools come from a variety of developers and can change over time; corruption within this supply chain, especially if left unmonitored, could result in research subject data being disclosed. U.1.1: Data subject does not clearly understand what data actions the analysis tools along the pipeline will perform on their data |
| 9 | Cluster Filesystem (S6-A) | Context-relevant Research Data | Transfer | Data Delivery DMZ (S13-A) | Send generated context-relevant research data to the data delivery DMZ to make it available for delivery | L.2.1.2: Samples put into the data delivery DMZ around the same time could be interpreted as being linked to a study about Y disease around the same time |

The complete LINDDUN analysis can be found in Appendix E. Note that for manageability the analysis was initially divided into clinical, research, and shared use cases, the last based on the common portion of the two use cases. The results were then combined into a single system design table. This table was then sorted on the specific LINDDUN threats.
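The combine-and-sort step described above can be sketched in a few lines. This is an illustrative sketch; the record fields (`threats`, the use-case table names) are assumptions about how the worksheet data would be held, not part of the published analysis.

```python
def combine_and_sort(*tables):
    """Merge segment records from several use-case tables, then sort on the
    LINDDUN threat designator (e.g., 'L.2.1.2') of each applicable threat."""
    combined = [row for table in tables for row in table]
    # A segment may carry several threats (e.g., segment 8); emit one entry
    # per threat so the sorted table groups all segments sharing a designator.
    flattened = [
        {**row, "threat": t}
        for row in combined
        for t in row["threats"]
    ]
    return sorted(flattened, key=lambda r: r["threat"])
```

Sorting on the designator groups, for example, all L.2.1.2 timing-linkage segments together, which simplifies the later validation lookups.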

PANOPTIC Analysis#

The LINDDUN analysis identifies potential threats at the level of dataflows. However, real-world privacy attacks are not typically launched at that level, nor do they consist of a single self-contained element. They are less abstract and operate at the system level. The PANOPTIC analysis is a necessary complement to the LINDDUN analysis as it will describe potential threats from a system perspective. The LINDDUN analysis is then used to determine whether the threats identified at the dataflow level support the projected attacks as described by PANOPTIC. If not, the PANOPTIC attacks are considered non-actionable.

While the LINDDUN analysis is grounded in system specifics as captured by DFDs, the PANOPTIC analysis involves actively imagining in practical terms what might take place. Utilizing the PANOPTIC Privacy Activities mapping template, a privacy attack mapping for the core example was generated. Table 13 lists the threat actions identified for the core example based on high-level knowledge of the system and its context. The complete PANOPTIC mappings for the clinical and research use cases are provided in Appendix E.

Table 13. Threat Actions Identified by the PANOPTIC Privacy Activity Mapping for the Core Example#

| PANOPTIC Threat Action | Definition | Elaboration |
|---|---|---|
| PA02.02 Consent: Imprecise | Key data actions are not presented clearly enough to constitute informed consent | May not provide details on how research is conducted, and which parts of the pipeline are privacy-relevant |
| PA03.09 Collection: Recording | Capturing a physical or digital artifact representing an aspect or likeness of the data subject | |
| PA03.11 Collection: Biological sample | Collecting biological materials or specimens (e.g., blood, urine, tissue cells, or saliva) from the data subject | |
| PA05.01.01 Identification: Re-identification | Re-associating data with the data subject that had been treated to remove those associations | |
| PA05.02.02 Identification: Pseudo-identifier | Assigning a pseudo-identifier (e.g., randomly generated ID) | |
| PA07.01 Manageability: No individual access to information | The data subject or their proxy cannot obtain or view their collected personal data | |
| PA07.02 Manageability: No individual management of information content | The data subject or their proxy cannot transform (e.g., move, copy, edit) their collected personal data | Direct data subject cannot change their data that is used for research |
| PA07.03 Manageability: No individual deletion of information | The data subject or their proxy cannot delete their collected personal data | Once the research data is published, the direct data subject cannot remove theirs from the body of research |
| PA07.05 No individual control of information use | The data subject or their proxy cannot control how their information is used | Direct data subject cannot manage what types of research studies use their data |
| PA08.01.01 Aggregation: Single source profiling | Assembling and organizing data points about specific data subjects from a single source | The research project must determine whether or not a given direct data subject exhibits the trait being studied, implying profiling with the single source being their provided sample |
| PA08.02.01 Aggregation: Single source clustering | Assembling and organizing data points regarding groups of people from a single source | Research studies may look for commonalities across genomic samples |
| PA08.02.02 Aggregation: Multi-source clustering | Assembling and organizing data points regarding groups of people from multiple sources | Research studies may seek insights on a specific population potentially characterized along multiple dimensions, implying clustering |
| PA09.01.01 Processing: Deriving information about individuals | Determining or extracting novel information about the data subject by analyzing information | Research project must determine if the trait being studied is exhibited by the data subject |
| PA09.01.02 Processing: Deriving aggregate information | Determining or extracting novel aggregate information by analyzing information | Research project may seek insights about a given population regarding a genetic trait |
| PA09.01.03 Processing: Deriving sensitive information | Determining or extracting novel sensitive information by analyzing information | Genetic information and insights gained can be sensitive information |
| PA09.01.04 Processing: Deriving derogatory information | Determining or extracting novel derogatory information by analyzing information | Genetic diseases or susceptibility to them can be considered derogatory information |
| PA09.03 Processing: Introducing bias | Data action is adversely influenced by bias | Bias could be introduced into research projects if the demographic spread of the data pool is not balanced. (This may not be possible for some studies, such as one targeting a trait only present in a specific population.) |
| PA10.01 Sharing: Affording revelations | Making available information that enables the discovery of further information | A research project that a direct data subject joins may yield results now or in the future, including the relevance of the research topic for the data subject |
| PA11.01 Use: Implication | Establishing a particularized derogatory suspicion or accusation regarding the data subject | |
| PA12.01 Retention & destruction: Data not destroyed after use | Information has not been disposed at the conclusion of its life cycle | May be indeterminate for research data |
| PA12.02 Retention & destruction: Data improperly destroyed | Information remains at least partially recoverable despite attempts to destroy it | Flow cell insufficiently cleaned and sequencer supply chain not cleaning hard drives |

Table 14 describes five attack scenarios that are specific to the core example. Each scenario was determined by considering how specific threat actions could be used by an actor as part of an attack involving a distinct DFD segment. Because the same attack can apply to different DFD segments, the table in some cases associates multiple attack numbers with a single scenario. Appendix F provides the comprehensive analysis performed on the complete example, including all Attack Numbers and Scenario IDs; Table 14 extracts only the attack scenarios relevant to the core example, aligned with the Attack Numbers, Scenario IDs, and Privacy Threat Actions from that analysis.

Table 14. Attack Scenarios Relevant to the Core Example#

| Attack Numbers from Complete Example | Scenario ID | PANOPTIC Threat Actions Describing the Attack | Scenario Description |
|---|---|---|---|
| 1, 14, 15 | S1.1 | PA03.09, PA03.11, PA08.01.01, PA10.01, PA11.01 | Pipeline actor uses physical access to correlate study details with physical samples and associated metadata. |
| 2-5 | S1.2 | PA03.09, PA05.02.02, PA08.02.02, PA10.01, PA11.01 | Pipeline actor uses physical access to correlate study details with digital data. |
| 26 | S6 | PA05.01.01 | Pipeline actor uses digital access to correlate study details with digital data. |
| 55 | S6 | PA03.09, PA09.01.01, PA09.01.03, PA09.01.04, PA11.01 | Pipeline actor uses digital access to correlate study details with digital data. |
| 65 | S17 | PA02.02, PA07.05 | Sequencing service staff utilizes third party tools and software that may perform additional data actions unbeknownst to a direct data subject. [14] |

In the first scenario described in Table 14, attack numbers 1, 14, and 15, which constitute health status inference attacks, can be broken down as follows: The attack involves an actor with a role in the sequencing pipeline physically accessing artifacts relating to direct data subjects (PA03.09, Collection: Recording) in the form of biological samples (PA03.11) and their associated metadata (as per PC05). The actor can correlate the research studies that will use these samples with the samples and their metadata (PA08.01.01, Aggregation: Profiling: Single source profiling), which may reveal other information, such as potential susceptibility to a particular disease (PA10.01, Sharing: Affording revelations). This would enable the attacker to discern something negative about the individual’s health status (PA11.01, Use: Implication).

Threat Validation#

As previously indicated, threat validation consists of two steps: mapping PANOPTIC attacks to relevant LINDDUN threats and mapping LINDDUN-validated attacks against the NIST PEOs of predictability, manageability, and disassociability. If a PANOPTIC attack does not align with one or more LINDDUN threats or if an aligned attack does not appear to undermine at least one of the PEOs, then the threat is invalid and removed from further consideration during this modeling process iteration.

Validation of PANOPTIC attacks against LINDDUN threats amounts to assessing the relationship between the threat actions that constitute the attack and the relevant LINDDUN threats. In most cases, that relationship is many-to-many. Therefore, carrying out this assessment involves judgement informed by the surrounding context. To facilitate this determination, Appendix G includes a mapping between PANOPTIC threat actions and LINDDUN threats in both directions. Because such mappings exist in all cases, the mere existence of a potentially relevant LINDDUN threat is insufficient validation.

For attacks aligned with LINDDUN threats, validation against the PEOs serves to confirm that the attacks actually met the definition of a threat put forward in Section 1.4 by potentially undermining system predictability, manageability, and/or disassociability. Some attacks may impact more than one PEO, but a validated attack must impact at least one.
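The two-step validation rule described above reduces to a simple predicate: an attack is a valid threat only if it aligns with at least one LINDDUN threat and undermines at least one PEO. The sketch below assumes a minimal `Attack` record; the names are illustrative, not from any published tooling.

```python
from dataclasses import dataclass, field

# The three NIST privacy engineering objectives (PEOs).
PEOS = {"predictability", "manageability", "disassociability"}

@dataclass
class Attack:
    number: int
    linddun_threats: set = field(default_factory=set)  # aligned LINDDUN threats
    impacted_peos: set = field(default_factory=set)    # subset of PEOS

def is_valid_threat(attack):
    """An attack survives validation only if it aligns with at least one
    LINDDUN threat AND undermines at least one NIST PEO; otherwise it is
    removed from consideration for this modeling iteration."""
    return bool(attack.linddun_threats) and bool(attack.impacted_peos & PEOS)
```

For example, attack 14 in Table 15 aligns with L.2.2.1 and impacts both predictability and disassociability, so it passes; an attack with no aligned LINDDUN threat would fail regardless of its PEO impact.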

Table 15 lists the validation results for the five attack scenarios relevant to the core example from Table 14. These were extracted from the complete combined validation table found in Appendix G. This table documents the LINDDUN Analysis and PEOs impacted by the threat, aligned to the Attack Number, the Scenario ID, PANOPTIC Threat Action, and LINDDUN Threat.

Table 15. Core Example Attack Validations#

| Attack Number | Scenario ID | PANOPTIC Threat Action | LINDDUN Threat | LINDDUN Analysis | Impacted PEOs |
|---|---|---|---|---|---|
| 1 | S1.1 | PA03.09, PA03.11, PA08.02.01, PA10.01, PA11.01 | L.2.1.2 | Sending the group of X samples together to the freezers around the same time as a project known to be doing Y disease research could link the samples to Y disease | Predictability |
| 2 | S1.2 | PA03.09, PA05.02.02, PA08.02.02, PA10.01, PA11.01 | L.2.1.2 | Samples that are put into the LIMS around the same time could receive IDs with linkable characteristics, which then allows linkage of the sample group to a study around the same time, unless the LIMS implements measures to prevent this | Predictability |
| 3 | S1.2 | PA03.09, PA05.02.02, PA08.02.02, PA10.01, PA11.01 | L.2.1.2 | Samples that are put into the cluster filesystem around the same time could be interpreted as being linked to a study about Y disease around the same time | Predictability |
| 4 | S1.2 | PA03.09, PA05.02.02, PA08.02.02, PA10.01, PA11.01 | L.2.1.2 | Samples sent to the compute nodes around the same time could be interpreted as being linked to a study about Y disease around the same time | Predictability |
| 5 | S1.2 | PA03.09, PA05.02.02, PA08.02.02, PA10.01, PA11.01 | L.2.1.2 | Samples that are put into the data delivery DMZ around the same time could be interpreted as being linked to a study about Y disease around the same time | Predictability |
| 14 | S1.1 | PA03.09, PA03.11, PA08.01.01, PA10.01, PA11.01 | L.2.2.1 | Sending samples to the technician known to be researching a specific disease could link the samples to that disease | Predictability, Disassociability |
| 15 | S1.1 | PA03.09, PA03.11, PA08.01.01, PA10.01, PA11.01 | L.2.2.1 | Sending samples to the wet lab known to be researching a specific disease at that time could link the samples to that disease | Predictability, Disassociability |
| 26 | S6 | PA05.01.01 | I.2.1.1 | Nature of genomic data makes complete disassociability impossible to guarantee | Predictability, Disassociability |
| 55 | S6 | PA03.09, PA09.01.01, PA09.01.03, PA09.01.04, PA11.01 | DD.4.1.2 | Bioinformatics tools come from a variety of developers that can change over time; corruption within this supply chain, especially if left unmonitored, could result in research subject data being disclosed | Predictability |
| 65 | S17 | PA02.02, PA07.05 | U.1.1 | Data subject does not clearly understand what data actions that analysis tools along the pipeline will perform on their data | Predictability, Manageability |

To understand the validation process, consider attack number 14 as a specific example from Table 15. The PANOPTIC threat actions and sub-actions that make up the attack map to the LINDDUN threat types of Linking, Non-repudiation, Detecting, and Data Disclosure. (Definitions of these are provided in Appendix C.) Neither Non-repudiation nor Detecting is relevant to this scenario, so both can be dropped from consideration. By sorting the dataflow analysis table (Table 12) on the LINDDUN threat designators, it is then possible to review the dataflows related to Linking and Data Disclosure. Matching scenario components are then identified by sorting on the Dataflow Type column to group the entries involving physical samples. [15] The dataflow analysis for the core example contains multiple instances involving physical samples susceptible to threat L.2.2.1, profiling an individual. This validates attack 14 against the LINDDUN analysis. Based on both the LINDDUN threat and the PANOPTIC threat actions (profiling and revelation in particular), attack 14 clearly undermines predictability as well as disassociability, validating it against the PEOs. Therefore, we can conclude that this is a valid threat.
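The sort-and-filter step in the walkthrough above can be sketched as a query over the dataflow analysis records. The record fields mirror the Table 12 columns, but the structure and values shown here are illustrative assumptions.

```python
# A minimal slice of Table 12, assuming each row is held as a dict.
SEGMENTS = [
    {"no": 1, "dataflow_type": "Physical Sample", "threats": ["L.2.2.1"]},
    {"no": 2, "dataflow_type": "Physical Sample", "threats": ["L.2.2.1"]},
    {"no": 4, "dataflow_type": "Sample Metadata", "threats": ["I.2.1.1"]},
]

def matching_segments(segments, threat, dataflow_type):
    """Return the dataflow segments that carry the given LINDDUN threat
    and involve the given dataflow type (e.g., physical samples)."""
    return [s for s in segments
            if threat in s["threats"] and s["dataflow_type"] == dataflow_type]
```

Finding more than one matching segment, as for L.2.2.1 over physical samples here, is what validates the attack against the LINDDUN analysis.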

As Table 15 indicates, all PANOPTIC attacks were successfully validated against LINDDUN threats and the LINDDUN-supported attacks validated against the PEOs. As a result, all the threats are candidates for responses.