Ontologies help research scientists in many specialties reconcile data with common, controlled vocabularies, aligning terms, hierarchies, and the meaning of specific data columns and values for consistent analysis, reporting, and cross-organization integration. An ontology system standardizes concepts, understands their hierarchies, and aligns synonyms that describe the same entities. You can think of an ontology as a language: when all your data speaks the same language, greater meaning emerges.
The Ontology module enables reporting and using controlled vocabularies. Many such vocabularies are in active use in different research communities. Some examples:
- LOINC (Logical Observation Identifiers Names and Codes): Codes for each test, measurement, or observation that has a clinically different meaning
- SNOMED_CT (Systematized Nomenclature of Medicine - Clinical Terms): A standardized, multilingual vocabulary of clinical terminology used for the electronic exchange of health information
- MSH (Medical Subject Headings): Used for indexing, cataloging, and searching for health-related information
- NCI (NCI Thesaurus): A reference terminology and biomedical ontology
- Find more examples in the National Library of Medicine's UMLS Metathesaurus
Generate Ontology Archive
The first step is to generate an ontology archive that can be loaded into LabKey. A set of Python scripts is provided with the module on GitHub to help you do this.
The archive contains these text files:
- concepts.txt: (Required) The preferred terms and their codes for this ontology; this is the standard vocabulary.
- hierarchy.txt: (Recommended) The hierarchy among these standard terms, expressed as levels and paths to the codes. This can be used to group related items.
- synonyms.txt: A set of local or reported terms that might be used as alternatives to the preferred term, each mapped to its code so that they are treated as the same concept.
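As a rough illustration of how the three files fit into one archive, the Python sketch below writes tab-separated concepts.txt, hierarchy.txt, and synonyms.txt entries into a single zip using only the standard library. The tab-separated format, the column names (code, label, path, synonym), the slash-delimited hierarchy paths, and the sample rows are all assumptions made for this example; the scripts provided with the module define the layout LabKey actually expects.

```python
import csv
import io
import zipfile

# Hypothetical sample rows. The real column layout is defined by the
# scripts distributed with the module, not by this sketch.
CONCEPTS = [("C001", "Example term A"), ("C002", "Example term B")]
# Assumed format: each code with a slash-delimited path of ancestor codes.
HIERARCHY = [("C001", "/C001"), ("C002", "/C001/C002")]
# A local term mapped to the code of its preferred term.
SYNONYMS = [("C002", "Local name for term B")]


def tsv_bytes(header, rows):
    """Serialize a header line plus rows as UTF-8 tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def build_archive(target):
    """Write the three ontology files into a zip at a path or file-like target."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("concepts.txt", tsv_bytes(["code", "label"], CONCEPTS))
        zf.writestr("hierarchy.txt", tsv_bytes(["code", "path"], HIERARCHY))
        zf.writestr("synonyms.txt", tsv_bytes(["code", "synonym"], SYNONYMS))


if __name__ == "__main__":
    build_archive("example_ontology.zip")
```

Accepting either a path or a file-like object keeps the sketch easy to test in memory; the resulting zip is the kind of file you would select when adding a LabKey archive.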
One or more ontology vocabularies can be loaded onto your server. Ontologies are stored in the "Shared" folder where they are accessible site-wide.
Once the Ontology module has been deployed:
- Select (Admin) > Go To Module > More Modules > Ontology.
- You will see any ontologies already loaded.
- Click Add LabKey Archive (.Zip) to add a new one.
  - Name: (Required)
  - Abbreviation: (Required) A short, unique string used to identify this ontology. This value can't be changed later.
  - Description: (Optional)
- Click Create.
- On the next page, use Browse or Choose File to locate your archive.
  - Ontology zip archives include the files concepts.txt, hierarchy.txt, and synonyms.txt.
You will see the pipeline task status as the ontology is loaded. Depending on the size of the archive, loading can take considerable time; it continues in the background if you navigate away from this page.
Once the upload is complete, return to (Admin) > Go To Module > Ontology (you may need to click More Modules to find it).
- Note: If you access the Ontology module from a folder other than /Shared, you will see "defined in folder /SHARED" instead of the management links described below. Click /Shared in that message to go to the management UI.
In the Ontology module in the /Shared project, you will see all available ontologies listed.
- Click Re-Import to upload an archive for the ontology in that row.
- Click Delete to remove this ontology.
- Click Browse Concepts to see details.
- Click Add LabKey Archive (.Zip) to add another.
- Click Browse Concepts to see the concepts, codes, and synonyms loaded.
  - Click the name of the ontology archive or concept to expand it.
  - Scroll to find terms of interest and click to expand them.
  - Details about the item selected on the left are shown on the right.
To use the controlled vocabularies in your data, enable the Ontology module in each folder where you want to use them.
- Navigate to the container where you want to use the ontology.
- Select (Admin) > Folder > Management and click the Folder Type tab.
- Check the box to enable the Ontology module.
- Click Update Folder.
Ontology Lookup Data Type
Once an ontology has been loaded and the module enabled in your folder, the field editor will include an Ontology Lookup data type. Any column can be set up to use the standard vocabularies from a selected ontology source; to do so, the data structure needs three fields for that column:
- columnName_import: The reported, or locally used term that was imported with the data.
- columnName_label: The preferred or standard term retrieved from the ontology.
- columnName_code: The code retrieved from the ontology.
The columnName_code field is configured as an Ontology Lookup, linking all three columns together.
On data import:
- If only a "columnName_import" value is provided, the label and code will be populated automatically if found in the ontology as a concept or synonym.
- If the provided "columnName_import" value already matches the preferred label, it is still stored as the "columnName_import" value, and "columnName_label" is still populated automatically.
- If a "columnName_code" value is provided directly, the label will be populated from the ontology.
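The import rules above can be modeled in a few lines of Python. This is a hypothetical, in-memory sketch of the described behavior, not LabKey's implementation: the OntologyRow class, the resolve function, the sample codes and terms, and the case-insensitive matching are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical ontology content: preferred labels keyed by code, and
# alternate (synonym) terms mapped to codes.
CONCEPTS = {"C100": "Preferred term X", "C200": "Preferred term Y"}
SYNONYMS = {"Local name for X": "C100", "X (abbrev.)": "C100"}


@dataclass
class OntologyRow:
    import_value: Optional[str]  # columnName_import
    label: Optional[str] = None  # columnName_label
    code: Optional[str] = None   # columnName_code


def resolve(row: OntologyRow, concepts: Dict[str, str],
            synonyms: Dict[str, str]) -> OntologyRow:
    """Populate label and code, leaving the imported value untouched."""
    if row.code is not None:
        # A code supplied directly: fill in its preferred label.
        row.label = concepts.get(row.code)
        return row
    if row.import_value is None:
        return row
    term = row.import_value.strip().lower()
    # Assumed: case-insensitive match on preferred labels, then synonyms.
    by_label = {label.lower(): code for code, label in concepts.items()}
    by_synonym = {text.lower(): code for text, code in synonyms.items()}
    code = by_label.get(term, by_synonym.get(term))
    if code is not None:
        row.code = code
        row.label = concepts[code]
    return row
```

For example, resolving OntologyRow("local name for x") fills in code C100 and label "Preferred term X" while keeping the original import text, and a term found in neither map leaves label and code empty.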
For example, if you had a dataset containing a disease diagnosis, and wanted to map to the preferred label and code values from a loaded ontology, you would follow these steps:
- Create three fields for your "columnName", in this case "Disease", with the field suffixes and types as follows:
|Field Name|Data Type|Description|
|Disease_import|Text|The term provided for this disease; it might vary.|
|Disease_label|Text|The standard term retrieved from the ontology.|
|Disease_code|Ontology Lookup|Displays the standard code from the ontology and links the three fields together.|
- Click the expand icon for the "columnName_code" field, here "Disease_code".
- Under Ontology Lookup Options:
  - Choose an Ontology: All loaded ontologies are available here. Select the one to use for this field.
  - Choose an Import Field: Select the "columnName_import" field you defined, here "Disease_import".
  - Choose a Label Field: Select the "columnName_label" field you defined, here "Disease_label".
When data with different terms for the same disease is entered, the preferred term and code fields will let you standardize them. Control which columns are visible using the grid customizer.
In the example shown here, a COVID-19 diagnosis was entered in a variety of ways, including the preferred term for patient PT-105 and the code itself for PT-104. All rows can easily be grouped and aggregated using the label and code columns, while the provided import value is retained for reference.
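To see why grouping on the standardized column works, here is a small Python sketch that groups participants on Disease_code regardless of how the diagnosis was originally entered. Apart from PT-104 and PT-105, the participant IDs, terms, labels, and codes are placeholders invented for this example, and treating PT-104's code as supplied directly is an assumption.

```python
from collections import defaultdict

# Hypothetical imported rows. Only PT-104 and PT-105 mirror the example;
# every other ID, term, label, and code is an illustrative placeholder.
ROWS = [
    {"ParticipantId": "PT-101", "Disease_import": "covid",
     "Disease_label": "Placeholder COVID-19 label", "Disease_code": "CODE-1"},
    {"ParticipantId": "PT-102", "Disease_import": "SARS-CoV-2 infection",
     "Disease_label": "Placeholder COVID-19 label", "Disease_code": "CODE-1"},
    {"ParticipantId": "PT-103", "Disease_import": "flu",
     "Disease_label": "Placeholder influenza label", "Disease_code": "CODE-2"},
    # Assumed: the code was supplied directly, so no import value exists.
    {"ParticipantId": "PT-104", "Disease_import": None,
     "Disease_label": "Placeholder COVID-19 label", "Disease_code": "CODE-1"},
    {"ParticipantId": "PT-105", "Disease_import": "Placeholder COVID-19 label",
     "Disease_label": "Placeholder COVID-19 label", "Disease_code": "CODE-1"},
]


def participants_by_code(rows, column="Disease"):
    """Group participant IDs on the standardized code, ignoring import variation."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[f"{column}_code"]].append(row["ParticipantId"])
    return dict(groups)


if __name__ == "__main__":
    for code, ids in participants_by_code(ROWS).items():
        print(code, len(ids), ids)
```

Grouping on Disease_label would give the same result in this data, since each code maps to exactly one preferred label; the varied Disease_import values play no part in the grouping.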