Data exported from LabKey Server can be protected by:
- Randomizing participant IDs so that the original participant IDs are obscured.
- Shifting date values, such as clinic visits and specimen draw dates. (Note that dates are shifted per participant, leaving their relative relationships as a series intact, thereby retaining much of the scientific value of the data.)
- Holding back data that has been marked with a certain level of PHI (Protected Health Information).
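The Python sketch below makes the per-participant date shift concrete. It is not LabKey's implementation. The following are assumptions chosen for illustration:
- the row layout (dicts with hypothetical `participant` and `date` keys);
- the 1 to 365 day offset range;
- the forward direction of the shift.

It does show the property described above. Each participant draws one random offset, and every date for that participant moves by that same offset. The dates themselves are obscured, but the intervals within each participant's series are unchanged.

```python
import random
from datetime import date, timedelta


def shift_participant_dates(rows, max_shift_days=365, seed=None):
    """Shift each row's date by a random offset chosen once per participant.

    Assumption: rows are dicts with hypothetical "participant" and "date" keys.
    The offset range and forward direction are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    offsets = {}  # participant -> timedelta, reused for all of that participant's rows
    shifted = []
    for row in rows:
        pid = row["participant"]
        if pid not in offsets:
            offsets[pid] = timedelta(days=rng.randint(1, max_shift_days))
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row["date"] = row["date"] + offsets[pid]
        shifted.append(new_row)
    return shifted


# Made-up sample rows; only "110349" appears in this tutorial.
rows = [
    {"participant": "110349", "date": date(2008, 4, 1)},
    {"participant": "110349", "date": date(2008, 5, 1)},
    {"participant": "249318", "date": date(2008, 4, 15)},
]
out = shift_participant_dates(rows, seed=42)
# Both dates for 110349 moved, but they are still 30 days apart.
```

Drawing the offset once per participant, rather than once per row, is what keeps the series intact. A per-row offset would obscure the dates but destroy the intervals between visits.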
In this step, we will export data from the study, obscuring it in the ways described above.
Examine Study Data
First, look at the data to be exported.
- Navigate to the Security Tutorial > Study folder.
- Click the Clinical and Assay Data tab. This tab shows the individual datasets in the study. There are currently two datasets: "Participants" and "Physical Exam".
- Click Physical Exam. Notice that the participant IDs are 6-digit numbers, such as "110349". When we export this table, we will randomize these IDs, obscuring the identity of the study subjects.
- Return to the Clinical and Assay Data tab.
- Click Participants (the dataset, not the tab). Notice that the dates in the table are almost all from April and May 2008. When we export this table, we will randomly shift these dates to obscure when the subject data was actually collected.
- Notice the columns for Gender and Country. We will mark these as "Full PHI" and "Limited PHI" respectively, so that we can publish results without them. (Given that there is exactly one male patient from Germany in our sample, he would be easy to identify with only this information.)
Mark PHI Columns
We will mark two columns, "Gender" and "Country", as containing different levels of PHI. This lets us control whether they are included in an export. If we export for users who are not granted access to any PHI, the export will not include the contents of either column. If we export for users with access to "Limited PHI", the export will include the column marked "Limited PHI" (Country) but not the one marked "Full PHI" (Gender).
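The inclusion rule described above is a threshold on ordered levels. This minimal Python sketch illustrates that rule; it is not LabKey's code. The `PHILevel` enum, the `columns_to_export` helper, and the `ParticipantId` column name are hypothetical. A column is exported only if its PHI level is at or below the level the export allows.

```python
from enum import IntEnum


class PHILevel(IntEnum):
    # Hypothetical ordering of the levels used in this tutorial, least to most sensitive.
    NOT_PHI = 0
    LIMITED_PHI = 1
    FULL_PHI = 2


def columns_to_export(column_levels, max_level):
    """Return the names of columns whose PHI level is at or below max_level."""
    return [name for name, level in column_levels.items() if level <= max_level]


# Levels as they will be set on the Participants dataset in this step.
# "ParticipantId" is a hypothetical name for a column left untagged (Not PHI).
participants_columns = {
    "ParticipantId": PHILevel.NOT_PHI,
    "Gender": PHILevel.FULL_PHI,
    "Country": PHILevel.LIMITED_PHI,
}

not_phi_export = columns_to_export(participants_columns, PHILevel.NOT_PHI)
limited_export = columns_to_export(participants_columns, PHILevel.LIMITED_PHI)
# not_phi_export -> ["ParticipantId"]
# limited_export -> ["ParticipantId", "Country"]
```

Modeling the levels as an ordered enum means a single comparison covers every case. Allowing a level automatically allows everything less sensitive than it.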
- Click the Manage tab. Click Manage Datasets.
- Click Participants (the dataset, not the tab) and then Edit Definition.
- Under Dataset Fields, select Gender.
- Click the Advanced tab.
- For PHI Level, select "Full PHI" for this field.
- Repeat for the Country field, selecting "Limited PHI".
Set up Alternate Participant IDs
Next, we will configure how participant IDs are handled on export, so that they are replaced by randomized alternate IDs built from a given text prefix and number pattern. Once alternate IDs are specified, they are maintained internally, so different exports and publications from the same study will contain matching alternates.
- Click the Manage tab.
- Click Manage Alternate Participant IDs and Aliases.
- For Prefix, enter "ABC".
- Click Change Alternate IDs.
- Click OK to confirm: these alternate IDs will not match any previously used alternate IDs.
- Click Done.
Note that you could also specify the alternate IDs manually, by setting up a table of participant aliases that maps each participant ID to an alias from a list you provide.
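This Python sketch models the bookkeeping described above, under stated assumptions. It is not LabKey's implementation. The `AlternateIdMap` class is hypothetical, the 6-digit suffix is an assumption, and the mapping lives in memory rather than in the study. Each participant gets a prefixed random alias that is reused on every later lookup. Regenerating discards the current aliases but remembers every alias ever issued, so new ones never collide with earlier ones. That mirrors the confirmation message in the steps above.

```python
import random


class AlternateIdMap:
    """Hypothetical sketch: stable, prefixed random alternate IDs per participant."""

    def __init__(self, prefix, digits=6, seed=None):
        self.prefix = prefix
        self.digits = digits  # assumed suffix length, for illustration only
        self._rng = random.Random(seed)
        self._aliases = {}  # participant ID -> current alternate ID
        self._used = set()  # every alternate ID ever issued

    def alternate_for(self, participant_id):
        """Return the participant's alternate ID, creating one on first use."""
        if participant_id not in self._aliases:
            while True:
                number = self._rng.randrange(10 ** self.digits)
                candidate = f"{self.prefix}{number:0{self.digits}d}"
                if candidate not in self._used:
                    break
            self._aliases[participant_id] = candidate
            self._used.add(candidate)
        return self._aliases[participant_id]

    def regenerate(self):
        """Drop current aliases; future ones avoid all previously issued IDs."""
        self._aliases.clear()


ids = AlternateIdMap("ABC", seed=1)
first = ids.alternate_for("110349")   # e.g. "ABC" followed by 6 digits
again = ids.alternate_for("110349")   # same value: exports stay consistent
ids.regenerate()
second = ids.alternate_for("110349")  # a new alias, never equal to first
```

Keeping the issued aliases in a set, separate from the current mapping, is what lets regeneration guarantee fresh IDs. An alias already seen in an earlier publication is never handed out again.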
Export/Publish Anonymized Data
Now we are ready to export or publish this data, using the extra data protections in place.
The following procedure will "Publish" the study: a new child folder will be created, and the selected data from the study will be copied to it with the protections above applied.
- Return to the Manage tab.
- Scroll down and click Publish Study.
- Complete the wizard, selecting all participants, datasets, and timepoints in the study. For fields not mentioned here, enter anything you like.
- On the Publish Options panel, check the following options:
  - Use Alternate Participant IDs
  - Shift Participant Dates
  - You could also check Mask Clinic Names, which would protect any actual clinic names in the study by replacing them with the generic label "Clinic."
- Under Include PHI Columns, select "Not PHI". This means that all columns tagged "Limited PHI" or higher will be excluded.
- Click Finish.
- Wait for the publishing process to finish.
- Navigate to the new published study folder, a child folder under Study named New Study by default.
- On the Clinical and Assay Data tab, look at the published datasets Physical Exam and Participants. Notice that the participant IDs have been replaced with alternate IDs and the dates have been shifted. Notice also that the Gender and Country fields have been held back (not published).
If you had instead selected "Limited PHI" as the level to include, you would see the Country column but not the Gender column.
Security for the New Folder
How should you configure the security on this new folder?
The answer depends on your requirements.
- If you want anyone with an account on your server to see this data, you would add All Site Users to the Reader role.
- If you want only members of the study team to have access, you would add Study Group to the Reader role, or a higher role.
For details on the available roles, see Security Roles Reference.