When publishing a study, you can randomize or hide specified protected health information (PHI) in the data, to make it more difficult to identify the persons enrolled in the study. You can alter published data in the following ways:
  • Replace all participant IDs with alternate, randomly generated participant IDs.
  • Apply random date shifts/offsets.
  • Exclude columns tagged at a given PHI level or higher, so that they are not copied to the published study.
  • Mask clinic names with a generic name to hide any identifying features in the original clinic name.

Publish Options

The wizard used to publish a study includes these options:

Use Alternate Participant IDs

Selecting this option replaces the participant IDs throughout the published data with alternate, randomly generated IDs. The alternate ID used for each participant is persisted in the source study and reused for each new published study. Admins can set the prefix and number of digits used in the alternate ID if desired. See Alternate Participant IDs for details.
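The behavior described above can be illustrated with a minimal sketch. This is not the product's implementation; the function name assign_alternate_ids, its parameters, and the default prefix and digit count are hypothetical and chosen only for illustration. It shows random IDs being generated with a configurable prefix and number of digits, and previously assigned IDs being reused.

```python
import random


def assign_alternate_ids(participant_ids, existing=None, prefix="P", digits=6, rng=None):
    """Map each original participant ID to a random alternate ID.

    Hypothetical helper: IDs already present in `existing` are reused,
    mirroring how the alternate ID is persisted and reused across publishes.
    """
    rng = rng or random.Random()
    mapping = dict(existing or {})
    used = set(mapping.values())

    # Only participants without a persisted alternate ID need a new one.
    new_ids = [pid for pid in dict.fromkeys(participant_ids) if pid not in mapping]
    if len(used) + len(new_ids) > 10 ** digits:
        raise ValueError("not enough digits for the number of participants")

    for pid in new_ids:
        # Draw random zero-padded numbers until an unused ID is found.
        while True:
            candidate = f"{prefix}{rng.randrange(10 ** digits):0{digits}d}"
            if candidate not in used:
                break
        used.add(candidate)
        mapping[pid] = candidate
    return mapping


# First publish: every participant gets a new alternate ID.
first = assign_alternate_ids(["PT-101", "PT-102"], prefix="X", digits=4)
# Later publish: existing participants keep their IDs, new ones are added.
second = assign_alternate_ids(["PT-101", "PT-102", "PT-103"], existing=first, prefix="X", digits=4)
```

Passing the earlier mapping back in as existing is what keeps a participant's alternate ID stable from one published study to the next.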

Shift Participant Dates

Selecting this option shifts the published dates for each participant by a random offset between 1 and 365 days. A separate offset is generated for each participant, and that offset is applied to all dates associated with that participant, except for date columns excluded as described below. This obscures the exact dates, protecting potentially identifying details, while preserving the relative differences between them. Note that the date offset used for a given participant is persisted in the source study and reused for each new published study.

To exclude individual date/time columns from being randomly shifted on publication:

  • Go to the dataset that includes the date column.
  • Edit the dataset definition.
  • In the designer, select the date column, then the Advanced tab.
  • Check the box to Exclude From Shifting.
  • Click Save.
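As a rough illustration of the date shifting described in this section, the sketch below applies one random offset per participant to every date in a row, skips excluded columns, and reuses a stored offset on later calls. The function name shift_dates, the column names, and the choice to shift dates backward are assumptions made for the example; the text above does not state the direction of the shift.

```python
import random
from datetime import date, timedelta


def shift_dates(row, participant_id, offsets, excluded=(), rng=None):
    """Return a copy of `row` with date values shifted by the participant's offset.

    Hypothetical helper: `offsets` stands in for the persisted per-participant
    offsets, and `excluded` for columns marked Exclude From Shifting.
    """
    rng = rng or random.Random()
    # Generate the offset once per participant, then reuse it.
    if participant_id not in offsets:
        offsets[participant_id] = rng.randint(1, 365)
    delta = timedelta(days=offsets[participant_id])

    shifted = {}
    for column, value in row.items():
        if isinstance(value, date) and column not in excluded:
            # Assumption: dates are moved earlier by the offset.
            shifted[column] = value - delta
        else:
            shifted[column] = value
    return shifted


offsets = {}
row = {
    "EnrollDate": date(2020, 1, 10),
    "VisitDate": date(2020, 2, 10),
    "SampleDate": date(2020, 3, 1),  # treated as excluded from shifting
    "Weight": 70,
}
published = shift_dates(row, "PT-101", offsets, excluded={"SampleDate"})
# The interval between the two shifted dates is unchanged.
interval = published["VisitDate"] - published["EnrollDate"]
```

Because the same offset is subtracted from every shifted date for a participant, intervals between those dates are preserved even though the absolute dates change.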

Exclude Columns at this PHI Level and Higher

Select Exclude Columns at this PHI Level and Higher to exclude all dataset, list, and specimen columns that are tagged at the PHI level you specify, or at any higher level. Choose the lowest PHI level that you want to exclude.

For example, if the study is published with "Full PHI" selected, any column tagged as "Full PHI" or "Restricted" is excluded from the published version of the study. The following table shows the results of each combination of column tagging and publishing options:

Exclude Columns at this PHI Level and Higher:

Column is tagged as    Not Selected    "Limited PHI"    "Full PHI"       "Restricted"
Not PHI                Published       Published        Published        Published
Limited PHI            Published       Not Published    Published        Published
Full PHI               Published       Not Published    Not Published    Published
Restricted             Published       Not Published    Not Published    Not Published
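The rule behind the table can be expressed as a simple ordered comparison. The sketch below is illustrative only: the PHI enum, the published_columns function, and the column names are hypothetical, not part of the product.

```python
from enum import IntEnum


class PHI(IntEnum):
    """PHI levels in increasing order of sensitivity (hypothetical names)."""
    NOT_PHI = 0
    LIMITED = 1
    FULL = 2
    RESTRICTED = 3


def published_columns(column_levels, exclude_at=None):
    """Return the column names that would be published.

    `exclude_at` is the lowest PHI level to exclude; None means the
    option is not selected, so every column is published.
    """
    if exclude_at is None:
        return list(column_levels)
    return [name for name, level in column_levels.items() if level < exclude_at]


columns = {
    "VisitCode": PHI.NOT_PHI,
    "ZipCode": PHI.LIMITED,
    "BirthDate": PHI.FULL,
    "GeneticMarker": PHI.RESTRICTED,
}
# Selecting "Full PHI" drops columns tagged Full PHI or Restricted.
kept = published_columns(columns, exclude_at=PHI.FULL)
```

A column is published only when its level is strictly below the selected level, which is why "Not PHI" columns are published under every option.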

To tag a column at a specific PHI level, see Field Properties Reference.

Mask Clinic Names

When this option is selected, actual clinic names will be replaced with a generic label. This helps prevent revealing neighborhood or other details that might identify individuals. For example, "South Brooklyn Youth Clinic" is masked with the generic value "Clinic".

All locations that are marked as a clinic type (including those marked with other types) will be masked in the published data. More precisely, both the Label and Labware Lab Code will be masked. Location types are specified by directly editing the labs.tsv file. For details see Manage Locations.
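The masking rule can be sketched as follows. This is only an illustration under stated assumptions: the function mask_clinics and the dictionary keys "Label", "LabwareLabCode", and "Clinic" are hypothetical stand-ins for the location fields, and the sketch assumes the same generic value is written to both masked fields.

```python
def mask_clinics(locations, generic="Clinic"):
    """Return copies of the location records with clinic names masked.

    Hypothetical helper: any location flagged as a clinic is masked,
    even if it is also flagged with other location types.
    """
    masked = []
    for location in locations:
        record = dict(location)  # copy so the source data is untouched
        if record.get("Clinic"):
            # Assumption: both fields receive the same generic value.
            record["Label"] = generic
            record["LabwareLabCode"] = generic
        masked.append(record)
    return masked


locations = [
    {"Label": "South Brooklyn Youth Clinic", "LabwareLabCode": "SBYC",
     "Clinic": True, "Repository": True},
    {"Label": "Central Repository", "LabwareLabCode": "CREP",
     "Clinic": False, "Repository": True},
]
published = mask_clinics(locations)
```

Working on copies keeps the original location records intact, consistent with masking being applied only to the published data.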
