A single participant can be known by different names in different research contexts. One lab might study a given subject using the name "LK002-234001", whereas another lab might study the very same subject knowing it only by the name "Primate 44". It is often necessary (and even desirable) to let the different providers of data use their own names, rather than try to standardize them at the various sources. LabKey Server can align aliases for a given subject and control which alias is used for which research audience.
When data is imported, a provided alias map "translates" the aliases into the common participant ID for the study. In our example, the two labs might import data for "LK002-234001" and "Primate 44", and the study researcher would see all of these rows under the participant ID "PT-101". Note that data does not have to be imported under one of the aliases you provide; it can also use the study participant IDs directly.
Overview
LabKey Server's aliasing system works by internally mapping different aliases to a central participant id, while externally preserving the aliases known to the different data sources. This allows for:
- Merging records with different ids for the same study subject
- Consolidating search results around a central id
- Retaining data as originally provided by a source lab
Provide Participant Alias Mapping Dataset
To set up alias ids, point to a dataset that contains as many aliases as needed for each participant ID. The participantID column contains the 'central/study' IDs, another column contains the aliases, and a third column identifies the source organizations that use those aliases.
- Create a dataset containing the alias and source organization information. See below for an example.
- Select (Admin) > Manage Study > Manage Alternate Participant IDs and Aliases.
- Under Participant Aliases, select your Dataset Containing Aliases.
- Select the Alias Column.
- Select the Source Column, i.e. the source organization using that alias.
- Click Save Changes.
- Click Done.
Once an alias has been defined for a given participant, an automatic name translation is performed on any imported data that contains that alias. For example, if participant "PT-101" has a defined alias "Primate 44", then any data containing a reference to "Primate 44" will be imported as "PT-101".
Once you have an alias dataset in place, you can add more records to it manually (by inserting new rows) or in bulk, either directly from the grid or by returning to the Participant Aliases page and clicking Import Aliases.
To clear all alias mappings, but leave the alias dataset itself in place, click Clear All Alias Settings.
Example Alias Map
An example alias mapping file is shown below. Because it is a study dataset, the file must always contain a date (or visit) column for internal alignment. Note that while these date/visit columns are not used for applying aliases, you must have unique rows for each participantID and date. One way to manage this is to have a unique date for each source organization, such as is shown here:
| ParticipantId | Aliases | SourceOrganization | Date |
|---|---|---|---|
| PT-101 | Primate 44 | ABC Labs | 10/10/2010 |
| PT-101 | LK002-234001 | Research Center A | 11/11/2011 |
| PT-102 | Primate 45 | ABC Labs | 10/10/2010 |
| PT-103 | Primate 46 | ABC Labs | 10/10/2010 |
Another way to manage row uniqueness is to use a third dataset key. For example, if there might be many aliases from a given source organization, you could use the aliases column itself as the third key to ensure uniqueness. If an alias value appears twice in the aliases column, you will get an error when you try to import data using that alias.
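Before uploading an alias file, it can help to check these uniqueness rules locally. The following is a minimal sketch in plain Python, not part of LabKey Server: the function name `check_alias_rows` is hypothetical, the rows are copied from the example map, and the first check assumes the default keying on participant ID and date (it does not apply if you use the alias column as a third key).

```python
from collections import Counter

# Rows copied from the example alias map in this topic.
ALIAS_ROWS = [
    {"ParticipantId": "PT-101", "Aliases": "Primate 44",
     "SourceOrganization": "ABC Labs", "Date": "10/10/2010"},
    {"ParticipantId": "PT-101", "Aliases": "LK002-234001",
     "SourceOrganization": "Research Center A", "Date": "11/11/2011"},
    {"ParticipantId": "PT-102", "Aliases": "Primate 45",
     "SourceOrganization": "ABC Labs", "Date": "10/10/2010"},
    {"ParticipantId": "PT-103", "Aliases": "Primate 46",
     "SourceOrganization": "ABC Labs", "Date": "10/10/2010"},
]


def check_alias_rows(rows):
    """Return a list of human-readable problems found in the alias rows."""
    problems = []

    # Assumed default keying: one row per (ParticipantId, Date) pair.
    key_counts = Counter((r["ParticipantId"], r["Date"]) for r in rows)
    for (pid, date), count in key_counts.items():
        if count > 1:
            problems.append(f"duplicate row key: {pid} on {date}")

    # An alias value that appears twice cannot be translated unambiguously.
    alias_counts = Counter(r["Aliases"] for r in rows)
    for alias, count in alias_counts.items():
        if count > 1:
            problems.append(f"alias used more than once: {alias}")

    return problems


print(check_alias_rows(ALIAS_ROWS))  # prints [] for the example map
```

An empty list means the example rows satisfy both checks; any reported problem should be fixed in the file before it is imported.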
Import Data
Once the map is in place, you can import data as usual, either in bulk, by single row, or using an external reloading mechanism. Your data can use the 'common' participant IDs themselves (PT-### in this example) or any of the aliases. Any data added under one of the aliases in the map will be associated with the translated 'common' participantID. This means you will only see rows for "PT-###" when importing data for the aliases shown in our example.
If any data is imported using a participant ID that is not in the alias map, a new participant with the 'untranslated' alias will be created. Note that if you later add this alias to the map, the participant ID will not be automatically updated to use the 'translated' ID. See below for details about resolving these or other naming conflicts.
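The import behavior described above can be pictured as a dictionary lookup with a pass-through default. This is an illustrative model only, not LabKey Server's implementation: `translate_rows` is a hypothetical helper, and the `Weight` values are made-up sample data.

```python
# Alias-to-participant lookup built from the example alias map.
ALIAS_TO_PARTICIPANT = {
    "Primate 44": "PT-101",
    "LK002-234001": "PT-101",
    "Primate 45": "PT-102",
    "Primate 46": "PT-103",
}


def translate_rows(rows, alias_map):
    """Return copies of rows with ParticipantId replaced by its mapped ID.

    IDs that are not in the alias map are left unchanged, which models
    both 'common' IDs and untranslated aliases.
    """
    return [
        {**row, "ParticipantId": alias_map.get(row["ParticipantId"],
                                               row["ParticipantId"])}
        for row in rows
    ]


# Hypothetical incoming rows; Weight is invented for illustration.
incoming = [
    {"ParticipantId": "Primate 44", "Weight": 7.1},
    {"ParticipantId": "LK002-234001", "Weight": 7.3},
    {"ParticipantId": "PT-102", "Weight": 6.8},      # already a common ID
    {"ParticipantId": "Primate 47", "Weight": 8.0},  # not in the map
]

imported = translate_rows(incoming, ALIAS_TO_PARTICIPANT)
print([row["ParticipantId"] for row in imported])
# ['PT-101', 'PT-101', 'PT-102', 'Primate 47']
```

The last row shows the untranslated case: "Primate 47" is not in the map, so it is kept as a participant ID of its own.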
Resolve Naming Conflicts
If incoming data contains an ID that is already used for a participant in the study AND is provided as an alias in the mapping table, the system will raise an error so that an administrator can correctly resolve the conflict.
For example, if you were using our example map shown above, and imported some data for "Primate 47", you would have the following participant IDs in your study: "PT-101, PT-102, PT-103, Primate 47". If you later added to the mapping "PT-104" with the alias "Primate 47", and then imported data for that participant "Primate 47", you would see the error:
There is a collision, the alias Primate 47 already exists as a Participant. You must remove either the alias or the participant.
The easiest way to reconcile records for participants under 'untranslated' aliases is to manually Change ParticipantID of the one with the untranslated alias to be the intended shared ID. This action will "merge" the data for the untranslated alias into the (possibly empty) new participant ID for the translated one.
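The collision condition can be modeled as a simple membership check: an incoming ID collides when it is both an existing participant and an alias in the map. The sketch below is a plain-Python illustration under that assumption, with a hypothetical helper name `find_collisions`; it does not reproduce the server's actual check.

```python
def find_collisions(incoming_ids, existing_participants, alias_map):
    """Return incoming IDs that are both an existing participant and a mapped alias."""
    return sorted(
        {
            pid
            for pid in incoming_ids
            if pid in existing_participants and pid in alias_map
        }
    )


# "Primate 47" was imported before it was added to the alias map,
# so it exists as a participant in its own right.
existing = {"PT-101", "PT-102", "PT-103", "Primate 47"}
alias_map = {"Primate 44": "PT-101", "Primate 47": "PT-104"}

print(find_collisions(["Primate 47", "Primate 44"], existing, alias_map))
# ['Primate 47']

# Model of the fix: change the participant ID "Primate 47" to "PT-104".
resolved = (existing - {"Primate 47"}) | {"PT-104"}
print(find_collisions(["Primate 47", "Primate 44"], resolved, alias_map))
# []
```

Once the untranslated participant has been renamed to the shared ID, the alias no longer matches an existing participant and the import can proceed.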
View Original Ids or Aliases Provided
Note that clearing the alias settings in a study does not revert the participant ids back to their original, non-aliased values. Alias participant ids are determined at import time and are written into the imported dataset. But you can add a column to your datasets that shows the original ParticipantIds, which can be useful for displaying the dataset to the source organization that knows the participant by the non-aliased id. To display the original ParticipantId values, add the Aliases column to the imported dataset as follows:
- Navigate to the dataset where you want to display the original, non-alias ParticipantIds.
- Select (Grid Views) > Customize Grid.
- Expand the ParticipantID node.
- Check the box for Aliases.
- Expand the Linked IDs node and you will see a checkbox for each source organization. Check one (or more) to add the alias used by that specific source to your grid.
- Save the grid view, either as the default grid view, or as a named view.
- The dataset now displays both the alias id(s) and the original id.