Question 2: “What could go wrong?”#

At this point environmental and system context have been captured and the operational dataflows and actions have been documented. The analytical processes of genomic data threat modeling for privacy must now be applied to these descriptions. Those processes consist of two principal activities: (1) dataflow analysis to identify threats and (2) threat alignment and validation.

To address “what could go wrong,” the dataflow analysis (1) employed LINDDUN and its catalog of threat trees to associate potential genomic privacy threats with specific dataflows and actions, and then (2) created PANOPTIC attack mappings for the genomic sequencing workflow. Both models are needed: the LINDDUN analysis identifies abstract threats that are theoretically possible, while PANOPTIC identifies the steps that could form a practical attack. Where a practical attack and a theoretical threat align, the combination is validated against the NIST PEOs. This exercise ensures that potential threats are both conceivable and executable, and that they would impact at least one of the NIST PEOs.

LINDDUN Analysis#

The LINDDUN [Ref2] methodology involves assessing each distinct dataflow for potential threats. A dataflow consists of a source, the flow itself, and a destination. To avoid confusion, we refer to this triad as a dataflow segment. Using the modified Assess System Design table in PRAM Worksheet 2, each dataflow segment in the core example DFD (Figure 4) is documented. In addition to the source, flow, and destination, the applicable data actions [13] are also noted. Each dataflow segment is also assigned a purpose (based on the Data Action Key) using the Context column.

For each documented dataflow segment, relevant LINDDUN threats are then identified, using as a starting point the mapping of segment-based high-level threat types shown in Table 11. This mapping is a heuristic for determining potential LINDDUN threats and the components involved. The LINDDUN threat trees [Ref2] that detail those threat types can then be used to determine whether, and which, more granular threats potentially apply to that segment based on its constituent elements and context. Those threats judged potentially applicable are captured in the LINDDUN Analysis column in Table 12, along with a brief threat scenario. Note that multiple threats may apply to a single dataflow segment.
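The worksheet record described above can be sketched as a simple data structure. This is an illustrative sketch only; the class and field names are assumptions, not part of PRAM or LINDDUN.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative record mirroring the Worksheet 2 columns described above.
@dataclass
class DataflowSegment:
    number: int
    source: str              # e.g., "Receiving Clerk (S1-PH)"
    dataflow_type: str       # e.g., "Physical Sample"
    data_actions: List[str]  # e.g., ["Transfer"]
    destination: str         # e.g., "Lab Tech (S2-A)"
    context: str             # purpose, per the Data Action Key
    threats: List[str] = field(default_factory=list)  # e.g., ["L.2.2.1"]
```

A segment is first documented with its source, flow, and destination; the applicable LINDDUN threats are appended later during the analysis.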

Table 11. LINDDUN Per Element Threat Mapping Heuristic#

| Source (Src) | Destination (Dst) | L | I | NR | D | DD | U | NC |
|---|---|---|---|---|---|---|---|---|
| Process | Process | Src-flow-Dst | Src-flow-Dst | Src-flow-Dst | Src-flow | Src-flow-Dst | Src-Dst | Src-Dst |
| Process | Store | Src-flow-Dst | Src-flow-Dst | Src-flow-Dst | Src-flow | Src-flow-Dst | Src-Dst | Src-Dst |
| Process | External | Src-flow-Dst | Src-flow-Dst | Src-flow-Dst | Src-flow | Src-flow-Dst | Src-Dst | Src-Dst |
| Store | Process | Src-flow-Dst | Src-flow-Dst | Src-flow-Dst | Src-flow | Src-flow-Dst | Src-Dst | Src-Dst |
| External | Process | Src-flow-Dst | Src-flow-Dst | Src-flow-Dst | Src-flow | Src-flow-Dst | Src-Dst | Dst |
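The per-element heuristic in Table 11 can be expressed as a small lookup keyed by the source and destination element types. This is a minimal sketch under the assumption that the table is applied mechanically as a first-pass filter; the function and variable names are illustrative, not from any LINDDUN tooling.

```python
# LINDDUN high-level threat categories, in Table 11 column order.
CATEGORIES = ["L", "I", "NR", "D", "DD", "U", "NC"]

# The first four rows of Table 11 share one pattern of involved elements.
DEFAULT = ["Src-flow-Dst", "Src-flow-Dst", "Src-flow-Dst",
           "Src-flow", "Src-flow-Dst", "Src-Dst", "Src-Dst"]

HEURISTIC = {
    ("Process", "Process"): DEFAULT,
    ("Process", "Store"): DEFAULT,
    ("Process", "External"): DEFAULT,
    ("Store", "Process"): DEFAULT,
    # External-to-Process differs only in the Non-Compliance (NC) column.
    ("External", "Process"): ["Src-flow-Dst", "Src-flow-Dst", "Src-flow-Dst",
                              "Src-flow", "Src-flow-Dst", "Src-Dst", "Dst"],
}

def candidate_threats(src_type, dst_type):
    """Return {threat category: involved elements} for a dataflow segment,
    or an empty dict if Table 11 has no row for the element pair."""
    row = HEURISTIC.get((src_type, dst_type))
    if row is None:
        return {}
    return dict(zip(CATEGORIES, row))
```

The lookup only narrows the search; as the text notes, the threat trees and the segment's context still determine which granular threats actually apply.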

To illustrate, consider dataflow segment Number 1 in Table 12. It consists of a receiving clerk delivering a physical biological sample to a lab technician for genomic sequencing. Leveraging the data action column helps us infer that this is a process-to-process segment. Consulting Table 11 and the threat definitions, as well as the context of the segment, we conclude that linking is the only relevant threat. Sending samples to a technician known to be associated with work on a particular disease could link the samples to that disease, an instance of L.2.2.1, profiling an individual. The other possibilities can be dismissed because at this stage:

  • The sample must still be associated with the direct data subject as part of the workflow

  • There is nothing for the direct data subject to repudiate, aside from providing the sample to those who must necessarily be aware that the sample has been provided

  • Because the sample must be identifiable, detection is unavoidable

  • The only data disclosure is inherent in the workflow and therefore unproblematic

  • The direct data subject has provided informed consent and is aware of their options

  • Standard practices are being employed in the workflow

The process proceeds similarly for the remaining eight dataflow segments, resulting in Table 12. Note that a segment can be subject to more than one threat, as is the case for segment 8.

Table 12. LINDDUN Dataflow Analysis for the Core Example#

| No. | Source | Dataflow Type | Data Action(s) | Destination | Context (purpose) | LINDDUN Analysis (applicable threats) |
|---|---|---|---|---|---|---|
| 1 | Receiving Clerk (S1-PH) | Physical Sample | Transfer | Lab Tech (S2-A) | Send physical sample to lab tech for research project | L.2.2.1: Sending samples to a lab technician known to be researching a specific disease could link the samples to that disease |
| 2 | Lab Tech (S2-A) | Physical Sample | Transfer | Wet Lab (S3-PH) | Send physical sample to wet lab for sequencing | L.2.2.1: Sending samples to a wet lab known to be researching a specific disease at that time could link the samples to that disease |
| 3 | Wet Lab (S3-PH) | Physical Sample | Transfer, Retention | Physical Sample Storage (S11-PH) | Send physical sample for storage in appropriate freezers | L.2.1.2: Sending a group of X samples together to the freezers around the same time as a project known to be doing Y disease research could link the samples to Y disease |
| 4 | Wet Lab (S3-PH) | Sample Metadata | Generation, Retention | LIMS (S4-PH) | Generate pseudonymized ID to be used for sample | I.2.1.1: The nature of genomic data makes complete disassociability impossible to guarantee |
| 5 | LIMS (S4-PH) | Sample Metadata | Transfer | Wet Lab (S3-PH) | Send back to wet lab the pseudonymized ID to be used for sample | L.2.1.2: Samples put into the LIMS around the same time could receive IDs with linkable characteristics, allowing linkage of the sample group to a study around the same time, unless the LIMS guards against this |
| 6 | Wet Lab (S3-PH) | Sequence Data | Transfer, Retention | Cluster Filesystem (S6-A) | Send digital sequence data to be stored | L.2.1.2: Samples put into the cluster filesystem around the same time could be interpreted as being linked to a study about Y disease around the same time |
| 7 | Cluster Filesystem (S6-A) | Sequence Data | Transfer | Compute Nodes (S5-A) | Send digital sequence data to compute nodes to transform it into objective-specific data | L.2.1.2: Samples sent to compute nodes around the same time could be interpreted as being linked to a study about Y disease around the same time |
| 8 | Compute Nodes (S5-A) | Sequence Data, Context-relevant Research Data | Transformation | Cluster Filesystem (S6-A) | Operate on sequence data to create context-relevant research data | DD.4.1.2: Bioinformatics tools come from a variety of developers and can change over time; corruption within this supply chain, especially if left unmonitored, could result in research subject data being disclosed. U.1.1: Data subject does not clearly understand what data actions the analysis tools along the pipeline will perform on their data |
| 9 | Cluster Filesystem (S6-A) | Context-relevant Research Data | Transfer | Data Delivery DMZ (S13-A) | Send generated context-relevant research data to the data delivery DMZ to make it available for delivery | L.2.1.2: Samples put into the data delivery DMZ around the same time could be interpreted as being linked to a study about Y disease around the same time |

The complete LINDDUN analysis can be found in Appendix E. Note that for manageability the analysis was initially divided into clinical, research, and shared use cases, the last based on the common portion of the two use cases. The results were then combined into a single system design table. This table was then sorted on the specific LINDDUN threats.
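The combine-and-sort step described above can be sketched in a few lines. This is an illustrative sketch; the record fields (`threats`, the use-case table names) are assumptions about how the worksheet data would be held, not part of the published analysis.

```python
def combine_and_sort(*tables):
    """Merge segment records from several use-case tables, then sort on the
    LINDDUN threat designator (e.g., 'L.2.1.2') of each applicable threat."""
    combined = [row for table in tables for row in table]
    # A segment may carry several threats (e.g., segment 8); emit one entry
    # per threat so the sorted table groups all segments sharing a designator.
    flattened = [
        {**row, "threat": t}
        for row in combined
        for t in row["threats"]
    ]
    return sorted(flattened, key=lambda r: r["threat"])
```

Sorting on the designator groups, for example, all L.2.1.2 timing-linkage segments together, which simplifies the later validation lookups.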

PANOPTIC Analysis#

The LINDDUN analysis identifies potential threats at the level of dataflows. However, real-world privacy attacks are not typically launched at that level, nor do they consist of a single self-contained element. They are less abstract and operate at the system level. The PANOPTIC analysis is a necessary complement to the LINDDUN analysis as it will describe potential threats from a system perspective. The LINDDUN analysis is then used to determine whether the threats identified at the dataflow level support the projected attacks as described by PANOPTIC. If not, the PANOPTIC attacks are considered non-actionable.

While the LINDDUN analysis is grounded in system specifics as captured by DFDs, the PANOPTIC analysis involves actively imagining in practical terms what might take place. Utilizing the PANOPTIC Privacy Activities mapping template, a privacy attack mapping for the core example was generated. Table 13 lists the threat actions identified for the core example based on high-level knowledge of the system and its context. The complete PANOPTIC mappings for the clinical and research use cases are provided in Appendix E.

Table 13. Threat Actions Identified by the PANOPTIC Privacy Activity Mapping for the Core Example#

| PANOPTIC Threat Action | Definition | Elaboration |
|---|---|---|
| PA02.02 Consent: Imprecise | Key data actions are not presented clearly enough to constitute informed consent | May not provide details on how research is conducted, and which parts of the pipeline are privacy-relevant |
| PA03.09 Collection: Recording | Capturing a physical or digital artifact representing an aspect or likeness of the data subject | |
| PA03.11 Collection: Biological sample | Collecting biological materials or specimens (e.g., blood, urine, tissue cells, or saliva) from the data subject | |
| PA05.01.01 Identification: Re-identification | Re-associating data with the data subject that had been treated to remove those associations | |
| PA05.02.02 Identification: Pseudo-identifier | Assigning a pseudo-identifier (e.g., randomly generated ID) | |
| PA07.01 Manageability: No individual access to information | The data subject or their proxy cannot obtain or view their collected personal data | |
| PA07.02 Manageability: No individual management of information content | The data subject or their proxy cannot transform (e.g., move, copy, edit) their collected personal data | Direct data subject cannot change their data that is used for research |
| PA07.03 Manageability: No individual deletion of information | The data subject or their proxy cannot delete their collected personal data | Once the research data is published, the direct data subject cannot remove theirs from the body of research |
| PA07.05 No individual control of information use | The data subject or their proxy cannot control how their information is used | Direct data subject cannot manage what types of research studies use their data |
| PA08.01.01 Aggregation: Single source profiling | Assembling and organizing data points about specific data subjects from a single source | The research project must determine whether or not a given direct data subject exhibits the trait being studied, implying profiling with the single source being their provided sample |
| PA08.02.01 Aggregation: Single source clustering | Assembling and organizing data points regarding groups of people from a single source | Research studies may look for commonalities across genomic samples |
| PA08.02.02 Aggregation: Multi-source clustering | Assembling and organizing data points regarding groups of people from multiple sources | Research studies may seek insights on a specific population potentially characterized along multiple dimensions, implying clustering |
| PA09.01.01 Processing: Deriving information about individuals | Determining or extracting novel information about the data subject by analyzing information | Research project must determine if the trait being studied is exhibited by the data subject |
| PA09.01.02 Processing: Deriving aggregate information | Determining or extracting novel aggregate information by analyzing information | Research project may seek insights about a given population regarding a genetic trait |
| PA09.01.03 Processing: Deriving sensitive information | Determining or extracting novel sensitive information by analyzing information | Genetic information and insights gained can be sensitive information |
| PA09.01.04 Processing: Deriving derogatory information | Determining or extracting novel derogatory information by analyzing information | Genetic diseases or susceptibility to them can be considered derogatory information |
| PA09.03 Processing: Introducing bias | Data action is adversely influenced by bias | Bias could be introduced into research projects if the demographic spread of the data pool is not balanced. (This may not be possible for some studies, such as one targeting a trait only present in a specific population.) |
| PA10.01 Sharing: Affording revelations | Making available information that enables the discovery of further information | A research project that a direct data subject joins may yield results now or in the future, including the relevance of the research topic for the data subject |
| PA11.01 Use: Implication | Establishing a particularized derogatory suspicion or accusation regarding the data subject | |
| PA12.01 Retention & destruction: Data not destroyed after use | Information has not been disposed at the conclusion of its life cycle | May be indeterminate for research data |
| PA12.02 Retention & destruction: Data improperly destroyed | Information remains at least partially recoverable despite attempts to destroy it | Flow cell insufficiently cleaned and sequencer supply chain not cleaning hard drives |

Table 14 describes five attack scenarios that are specific to the core example. Each scenario was determined by considering how specific threat actions could be used by an actor as part of an attack involving a distinct DFD segment. Because the same attack can apply to different DFD segments, the table in some cases associates multiple attack numbers with a single scenario. Appendix F provides the comprehensive analysis performed on the complete example, including all Attack Numbers and Scenario IDs; Table 14 extracts only the attack scenarios relevant to the core example, aligned with the Attack Numbers, Scenario IDs, and Privacy Threat Actions from that analysis.

Table 14. Attack Scenarios Relevant to the Core Example#

| Attack Numbers from Complete Example | Scenario ID | PANOPTIC Threat Actions Describing the Attack | Scenario Description |
|---|---|---|---|
| 1, 14, 15 | S1.1 | PA03.09, PA03.11, PA08.01.01, PA10.01, PA11.01 | Pipeline actor uses physical access to correlate study details with physical samples and associated metadata. |
| 2-5 | S1.2 | PA03.09, PA05.02.02, PA08.02.02, PA10.01, PA11.01 | Pipeline actor uses physical access to correlate study details with digital data. |
| 26 | S6 | PA05.01.01 | Pipeline actor uses digital access to correlate study details with digital data. |
| 55 | S6 | PA03.09, PA09.01.01, PA09.01.03, PA09.01.04, PA11.01 | Pipeline actor uses digital access to correlate study details with digital data. |
| 65 | S17 | PA02.02, PA07.05 | Sequencing service staff utilizes third party tools and software that may perform additional data actions unbeknownst to a direct data subject. [14] |

In the first scenario described in Table 14, attack numbers 1, 14, and 15, which constitute health status inference attacks, can be broken down as follows: The attack involves an actor with a role in the sequencing pipeline physically accessing artifacts relating to direct data subjects (PA03.09, Collection: Recording) in the form of biological samples (PA03.11) and their associated metadata (as per PC05). The actor can correlate the research studies that will use these samples with the samples and their metadata (PA08.01.01, Aggregation: Profiling: Single source profiling), which may reveal other information, such as potential susceptibility to a particular disease (PA10.01, Sharing: Affording revelations). This would enable the attacker to discern something negative about the individual’s health status (PA11.01, Use: Implication).

Threat Validation#

As previously indicated, threat validation consists of two steps: mapping PANOPTIC attacks to relevant LINDDUN threats and mapping LINDDUN-validated attacks against the NIST PEOs of predictability, manageability, and disassociability. If a PANOPTIC attack does not align with one or more LINDDUN threats or if an aligned attack does not appear to undermine at least one of the PEOs, then the threat is invalid and removed from further consideration during this modeling process iteration.

Validation of PANOPTIC attacks against LINDDUN threats amounts to assessing the relationship between the threat actions that constitute the attack and the relevant LINDDUN threats. In most cases, that relationship is many-to-many. Therefore, carrying out this assessment involves judgement informed by the surrounding context. To facilitate this determination, Appendix G includes a mapping between PANOPTIC threat actions and LINDDUN threats in both directions. Because such mappings exist in all cases, the mere existence of a potentially relevant LINDDUN threat is insufficient validation.

For attacks aligned with LINDDUN threats, validation against the PEOs serves to confirm that the attacks actually met the definition of a threat put forward in Section 1.4 by potentially undermining system predictability, manageability, and/or disassociability. Some attacks may impact more than one PEO, but a validated attack must impact at least one.
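The two-step validation rule described above reduces to a simple predicate: an attack is a valid threat only if it aligns with at least one LINDDUN threat and undermines at least one PEO. The sketch below assumes a minimal `Attack` record; the names are illustrative, not from any published tooling.

```python
from dataclasses import dataclass, field

# The three NIST privacy engineering objectives (PEOs).
PEOS = {"predictability", "manageability", "disassociability"}

@dataclass
class Attack:
    number: int
    linddun_threats: set = field(default_factory=set)  # aligned LINDDUN threats
    impacted_peos: set = field(default_factory=set)    # subset of PEOS

def is_valid_threat(attack):
    """An attack survives validation only if it aligns with at least one
    LINDDUN threat AND undermines at least one NIST PEO; otherwise it is
    removed from consideration for this modeling iteration."""
    return bool(attack.linddun_threats) and bool(attack.impacted_peos & PEOS)
```

For example, attack 14 in Table 15 aligns with L.2.2.1 and impacts both predictability and disassociability, so it passes; an attack with no aligned LINDDUN threat would fail regardless of its PEO impact.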

Table 15 lists the validation results for the five attack scenarios relevant to the core example from Table 14. These were extracted from the complete combined validation table found in Appendix G. This table documents the LINDDUN Analysis and PEOs impacted by the threat, aligned to the Attack Number, the Scenario ID, PANOPTIC Threat Action, and LINDDUN Threat.

Table 15. Core Example Attack Validations#

| Attack Number | Scenario ID | PANOPTIC Threat Action | LINDDUN Threat | LINDDUN Analysis | Impacted PEOs |
|---|---|---|---|---|---|
| 1 | S1.1 | PA03.09, PA03.11, PA08.02.01, PA10.01, PA11.01 | L.2.1.2 | Sending the group of X samples together to the freezers around the same time as a project known to be doing Y disease research could link the samples to Y disease | Predictability |
| 2 | S1.2 | PA03.09, PA05.02.02, PA08.02.02, PA10.01, PA11.01 | L.2.1.2 | Samples that are put into the LIMS around the same time could receive IDs with linkable characteristics, which then allows linkage of the sample group to a study around the same time, unless the LIMS implements measures to prevent this | Predictability |
| 3 | S1.2 | PA03.09, PA05.02.02, PA08.02.02, PA10.01, PA11.01 | L.2.1.2 | Samples that are put into the cluster filesystem around the same time could be interpreted as being linked to a study about Y disease around the same time | Predictability |
| 4 | S1.2 | PA03.09, PA05.02.02, PA08.02.02, PA10.01, PA11.01 | L.2.1.2 | Samples sent to the compute nodes around the same time could be interpreted as being linked to a study about Y disease around the same time | Predictability |
| 5 | S1.2 | PA03.09, PA05.02.02, PA08.02.02, PA10.01, PA11.01 | L.2.1.2 | Samples that are put into the data delivery DMZ around the same time could be interpreted as being linked to a study about Y disease around the same time | Predictability |
| 14 | S1.1 | PA03.09, PA03.11, PA08.01.01, PA10.01, PA11.01 | L.2.2.1 | Sending samples to the technician known to be researching a specific disease could link the samples to that disease | Predictability, Disassociability |
| 15 | S1.1 | PA03.09, PA03.11, PA08.01.01, PA10.01, PA11.01 | L.2.2.1 | Sending samples to the wet lab known to be researching a specific disease at that time could link the samples to that disease | Predictability, Disassociability |
| 26 | S6 | PA05.01.01 | I.2.1.1 | Nature of genomic data makes complete disassociability impossible to guarantee | Predictability, Disassociability |
| 55 | S6 | PA03.09, PA09.01.01, PA09.01.03, PA09.01.04, PA11.01 | DD.4.1.2 | Bioinformatics tools come from a variety of developers that can change over time; corruption within this supply chain, especially if left unmonitored, could result in research subject data being disclosed | Predictability |
| 65 | S17 | PA02.02, PA07.05 | U.1.1 | Data subject does not clearly understand what data actions that analysis tools along the pipeline will perform on their data | Predictability, Manageability |

To understand the validation process, consider attack number 14 as a specific example from Table 15. The PANOPTIC threat actions and sub-actions that make up the attack map to the LINDDUN threat types of Linking, Non-repudiation, Detecting, and Data Disclosure. (Definitions of these are provided in Appendix C.) Neither Non-repudiation nor Detecting is relevant to this scenario, so both can be dropped from consideration. By sorting the dataflow analysis table (Table 12) on the LINDDUN threat designators, it is then possible to review the dataflows related to Linking and Data Disclosure. Matching scenario components are then identified by sorting on the Dataflow Type column to group the entries involving physical samples. [15] The dataflow analysis for the core example contains multiple instances involving physical samples susceptible to threat L.2.2.1, profiling an individual. This validates attack 14 against the LINDDUN analysis. Based on both the LINDDUN threat and the PANOPTIC threat actions (profiling and revelation in particular), attack 14 clearly undermines predictability as well as disassociability, validating it against the PEOs. Therefore, we can conclude that this is a valid threat.
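The sort-and-filter step in the walkthrough above can be sketched as a query over the dataflow analysis records. The record fields mirror the Table 12 columns, but the structure and values shown here are illustrative assumptions.

```python
# A minimal slice of Table 12, assuming each row is held as a dict.
SEGMENTS = [
    {"no": 1, "dataflow_type": "Physical Sample", "threats": ["L.2.2.1"]},
    {"no": 2, "dataflow_type": "Physical Sample", "threats": ["L.2.2.1"]},
    {"no": 4, "dataflow_type": "Sample Metadata", "threats": ["I.2.1.1"]},
]

def matching_segments(segments, threat, dataflow_type):
    """Return the dataflow segments that carry the given LINDDUN threat
    and involve the given dataflow type (e.g., physical samples)."""
    return [s for s in segments
            if threat in s["threats"] and s["dataflow_type"] == dataflow_type]
```

Finding more than one matching segment, as for L.2.2.1 over physical samples here, is what validates the attack against the LINDDUN analysis.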

As Table 15 indicates, all PANOPTIC attacks were successfully validated against LINDDUN threats and the LINDDUN-supported attacks validated against the PEOs. As a result, all the threats are candidates for responses.