guest 2025-05-28 |
Ontologies help research scientists in many specialties reconcile data with common and controlled vocabularies. Aligning terms, hierarchies, and the meaning of specific data columns and values is important for consistent analysis, reporting, and cross-organization integration. An ontology system will standardize concepts, understand their hierarchies, and align synonyms which describe the same entities. You can think of an ontology like a language; when your data is all speaking the same language, greater meaning emerges.
The ontology module enables reporting and using controlled vocabularies. Many such vocabularies are in active use in different research communities. Some examples:
The first step is to generate an ontology archive that can be loaded into LabKey. A set of Python scripts is provided on GitHub to help you accomplish this.
The Python scripts are designed to accept OWL, NCI, and GO files, but support is not universal for all files of these types. Contact your Account Manager to get started and for help with individual files. Once generated, the archive will contain individual text files:
To learn about loading ontologies onto your LabKey Server and enabling their use in folders, see this topic: Load Ontologies
Once an ontology has been loaded and the module enabled in your folder, the field editor will include new options and you can use ontology information in your SQL queries.
Concept annotations let you link columns in your data to concepts in your ontology. The field editor will include an option to associate the column with a specific concept. For example, a "medication" column might map to a "NCIT:ST1000016" concept code used in a centralized invoicing system.
Learn more in this topic: Concept Annotations
The field editor also includes an Ontology Lookup data type. This field type encompasses three related fields, helping you take an input value and look up both the preferred term and the code used in the ontology.
Learn more in this topic: Ontology Lookup
New LabKey SQL syntax allows you to incorporate ontology hierarchy and synonyms into your queries.
Learn more in this topic: Ontology SQL
This topic covers how to load ontology vocabularies onto your server and enable their use in individual folders. It assumes you have obtained or generated an ontology archive in the expected format. You also must have the ontology module deployed on your server.
One or more ontology vocabularies can be loaded onto your server. Ontologies are stored in the "Shared" folder where they are accessible site-wide. You may load ontologies from any location on the server.
You will see the pipeline task status as the ontology is loaded. Depending on the size of the archive, this could take considerable time, and will continue in the background if you navigate away from this page.
Once the upload is complete, select (Admin) > Go To Module > Ontology (you may need to click More Modules to find it).
In the Ontology module in the /Shared project, you will see a list of all available ontologies.
To use the controlled vocabularies in your data, enable the Ontology module in the folders where you want to be able to use them.
Once ontologies have been loaded and enabled in your folder, you can use Concept Annotations to link fields in your data with their concepts in the ontology vocabulary. A "concept picker" interface makes it easy for users to find desired annotations.
Reach the grid of available ontologies by selecting (Admin) > Go To Module > More Modules > Ontology.
Instead of manually scrolling and expanding the ontology hierarchy, you can type into the search box to immediately locate and jump to concepts containing that term. The search is specific to the current ontology; you will not see results from other ontologies.
As soon as you have typed a term of at least three characters, the search results will populate in a clickable dropdown. Only full word matches are included. You'll see both concepts and their codes. Click to see the details for any search result. Note that search results will disappear if you move the cursor (focus) outside the search box, but will return when you focus there again.
Search does not autocomplete suggestions as you type, and it does not match word stems. For example, searching for "foot" will not find "feet".
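The search rules above can be summarized in a small Python model. These rules are a three-character minimum, whole-word matching, no stemming, and one ontology at a time. This is a hypothetical sketch of the described behavior, not LabKey's implementation. The `Concept` type, the `search` function, and the sample codes are invented for illustration. The sketch also assumes that every word in the query must appear as a whole word in the concept label.

```python
import re
from typing import NamedTuple


class Concept(NamedTuple):
    code: str
    label: str


MIN_QUERY_LENGTH = 3  # results only appear once three characters are typed


def _words(text: str) -> set[str]:
    """Lower-cased whole words. No stemming, so 'feet' never becomes 'foot'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(concepts: list[Concept], query: str) -> list[Concept]:
    """Hypothetical model: every query word must appear as a full word in the label."""
    query = query.strip()
    if len(query) < MIN_QUERY_LENGTH:
        return []
    wanted = _words(query)
    return [c for c in concepts if wanted <= _words(c.label)]


# Hypothetical concepts from a single ontology (search never spans ontologies).
ONTOLOGY = [
    Concept("ONT:001", "Foot pain"),
    Concept("ONT:002", "Swelling of feet"),
    Concept("ONT:003", "Analgesic agent"),
]
```

With this sample data, searching for "foot" returns only ONT:001 and "feet" returns only ONT:002. The partial word "analg" returns nothing because it is not a full word in any label.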
When you click Show Path you will see the hierarchy that leads to your current selection.
Click Path Information for a more complete picture of the same concept, including any Alternate Paths that may exist to the selection.
In the data grid, hovering over a column header will now show the Concept Annotation set for this field.
To change the concept annotation for a field, reopen the field in the field editor, click Concept Annotation, make a different selection, and click Apply.
Ontology lookup columns support filtering based on concept and path within the ontology. Filtering based on whether a given concept is in an expected subtree (or not in an unexpected one) can isolate desired data using knowledge of the concept hierarchy.
Fields of type "Ontology Lookup" can be filtered using the following set of filtering expressions:
To use the ontology tree browser, click the header for a column of type "Ontology Lookup" and select Filter.... You cannot use this filtering on the "import" or "label" fields related to your lookup, only the "code" value, shown here as "Medication Code".
Select the Choose Filters tab, then select a concept using the Find <Column Name> By Tree link.
The browser is similar to the concept browser. You can scroll or type into the "Search" bar to find the concept you want. Click the concept to see it within the hierarchy, with parent and first children expanded.
When you locate the concept you want, hover to reveal a filter icon. Click it to place the concept, with its code, in the filter box. When using the subtree filter expressions, you'll see the path to the selected concept. Shown below, we'll filter for concepts in the subtree under "Analgesic Agent".
Click Close Browser when finished. If needed, you can add another filter before saving by clicking OK.
The example above shows how a subtree filter value is displayed. Notice the slashes indicating the hierarchy path to the selected concept.
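Conceptually, a subtree filter checks whether any hierarchy path to a row's concept begins with the path of the selected concept. The Python sketch below models that idea under stated assumptions. The concept codes, the slash-delimited path strings, and the `is_in_subtree` and `filter_rows` helpers are all hypothetical, and LabKey's internal representation may differ.

```python
# Hypothetical data: every root-to-concept path for each concept code,
# written slash-delimited with a trailing slash (an assumed format).
CONCEPT_PATHS: dict[str, list[str]] = {
    "ONT:10": ["/ONT:1/ONT:10/"],
    "ONT:11": ["/ONT:1/ONT:11/"],
    # ONT:20 has two parents, so it is reachable by two paths.
    "ONT:20": ["/ONT:2/ONT:20/", "/ONT:1/ONT:10/ONT:20/"],
}

# Hypothetical rows: (participant ID, value of the Ontology Lookup "code" field).
ROWS = [("PT-1", "ONT:10"), ("PT-2", "ONT:11"), ("PT-3", "ONT:20")]


def is_in_subtree(code: str, subtree_path: str) -> bool:
    """True if any path to `code` starts with `subtree_path`.

    The trailing slash makes the prefix match stop at a code boundary,
    so a code such as ONT:1 never matches ONT:10 by accident.
    """
    return any(p.startswith(subtree_path) for p in CONCEPT_PATHS.get(code, []))


def filter_rows(rows, subtree_path: str, in_subtree: bool = True) -> list[str]:
    """Mimic the "in subtree" / "not in subtree" filter expressions."""
    return [pid for pid, code in rows if is_in_subtree(code, subtree_path) == in_subtree]
```

PT-3 matches the "/ONT:1/ONT:10/" subtree through its second path, even though its first path runs through ONT:2. This is why a concept with alternate paths can appear in more than one subtree.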
When using the Equals or Does Not Equal filtering expressions, browse the tree as above and click the filter icon. The code will be shown in the box.
The filter expressions Equals One Of and Does Not Equal Any Of support multiselection of as many ontology concepts as necessary. Click the filter icons to build up a set in the box; the appropriate separator will be inserted automatically.
Once an ontology has been loaded and the module enabled in your folder, the field editor will include an Ontology Lookup data type for all column types. Any column may be defined to use the standard vocabularies available in a selected ontology source.
There can be up to three related fields in the data structure. The naming does not need to match these conventions, but it can be helpful to clarify which columns contain which elements:
On data import (or update), the following ontology lookup actions will occur:
For example, if you had a dataset containing a disease diagnosis, and wanted to map to the preferred label and code values from a loaded ontology, you could follow these steps:
Field Name | Data Type | Description |
---|---|---|
Disease_import | Text | The term imported for this disease, which might be a synonym or could be the preferred term. |
Disease_label | Text | The standard term retrieved from the ontology. |
Disease_code | Ontology Lookup | Displays the standard code from the ontology and provides the interconnection. |
When data is imported, it can include either the "*_import" field or the "*_code" field, but not both. When using the "*_import" field, if different terms for the same disease are imported, the preferred term and code fields will standardize them for you. Control which columns are visible using the grid customizer.
Shown here, a variety of methods for entering a COVID-19 diagnosis were provided, including the preferred term for patient PT-105 and the code number itself for PT-104. All rows can be easily grouped and aggregated using the label and code columns. The reported "*_import" value is retained for reference.
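The import behavior described above can be modeled roughly as follows. Either the "*_import" value or the "*_code" value is supplied, and synonyms collapse to one preferred label and code. This is a hypothetical Python sketch: the `resolve` function, the `ONT:500` code, and the synonym table are invented. Raising an error for an unrecognized term is also an assumption, not documented LabKey behavior.

```python
from typing import NamedTuple, Optional


class LookupResult(NamedTuple):
    disease_import: Optional[str]  # value as imported, retained for reference
    disease_label: str             # preferred term from the ontology
    disease_code: str              # standard ontology code


# Hypothetical ontology content: preferred label per code, plus synonyms.
PREFERRED_LABELS = {"ONT:500": "COVID-19"}
SYNONYMS = {"ONT:500": ["COVID", "Coronavirus disease 2019", "SARS-CoV-2 infection"]}

# Case-insensitive index from every preferred term and synonym to its code.
TERM_TO_CODE = {
    term.lower(): code
    for code, label in PREFERRED_LABELS.items()
    for term in [label, *SYNONYMS.get(code, [])]
}


def resolve(disease_import: Optional[str] = None,
            disease_code: Optional[str] = None) -> LookupResult:
    """Fill in label and code from exactly one of the import or code values."""
    if (disease_import is None) == (disease_code is None):
        raise ValueError("supply either *_import or *_code, but not both")
    if disease_code is None:
        key = disease_import.strip().lower()
        if key not in TERM_TO_CODE:
            # Assumption: how LabKey handles unmatched terms is not described here.
            raise LookupError(f"no ontology match for {disease_import!r}")
        disease_code = TERM_TO_CODE[key]
    return LookupResult(disease_import, PREFERRED_LABELS[disease_code], disease_code)
```

Rows imported as "Covid", as "COVID-19", or directly as the code "ONT:500" all end up with the same label and code. That shared label and code is what makes grouping and aggregation on those columns possible.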
When an Ontology Lookup field, i.e. the "*_code" field, is included in the insert or update form, you will see a "Search [Ontology Name]" placeholder. Type ahead (at least three characters, and full words for narrower lists) to quickly browse for codes that match a desired term. You'll see both the terms and codes in a dropdown menu. Click to select the desired concept from the list.
You'll see the code in the entry box and a tooltip with the preferred label on the right.
Alternately, you can use the Find [column name] By Tree link to browse the full Ontology to find a desired code. Shown below, "Medication Code" is the Ontology Lookup, so the link to click reads Find Medication Code By Tree.
Use the search bar or scroll to find the concept to insert. As you browse the ontology concepts, you can see the paths and synonyms for a selected concept to help you make the correct selection. If the field has been initialized to an expected concept, the browser will automatically scroll to it.
Click Apply to make your selection. You'll see the code in the entry box and a tooltip with the preferred label on the right as in the typeahead case above.
For the Ontology Lookup field, you choose an Ontology to reference, plus an optional Import Field and Label Field. In addition, you can initialize the lookup field with an Expected Vocabulary, making it easier for users to enter the expected value(s).
Once ontologies are loaded and enabled, you can also make direct use of them in SQL queries. The syntax described in this topic helps you access the preferred vocabularies and wealth of meaning contained in your ontologies.
The following functions and annotations are available:
Usage:
IsSubClassOf(conceptX, conceptParent)
Usage:
IsInSubtree(conceptX, ConceptPath(conceptA,conceptB,conceptParent))
Usage:
ConceptPath(conceptA,conceptB,...,conceptParent)
For performance, we store all possible paths in the "subclass" hierarchy to create a pure tree, rather than a graph of subclass relations. This makes it much easier to answer questions like "select all rows containing a 'cancer finding'". Because of this schema, internally we are querying the relationship between paths, not directly querying concepts. Therefore IsSubClassOf() is more complicated to answer than IsInSubtree().
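The Python sketch below shows why storing every path turns a graph into a tree. It expands a small subclass graph (child to parents) into all root-to-concept paths. The concept names and the `all_paths` helper are hypothetical, and the sketch assumes the hierarchy is acyclic. It illustrates the idea described above rather than LabKey's actual storage schema.

```python
# Hypothetical subclass graph: each concept maps to its direct parents.
# D has two parents, so the hierarchy is a graph rather than a tree.
PARENTS: dict[str, list[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["D"],
}


def all_paths(parents: dict[str, list[str]]) -> dict[str, list[tuple[str, ...]]]:
    """Expand an acyclic subclass graph into every root-to-concept path."""
    memo: dict[str, list[tuple[str, ...]]] = {}

    def paths_to(concept: str) -> list[tuple[str, ...]]:
        if concept not in memo:
            direct = parents.get(concept, [])
            if not direct:  # a root concept has a single one-element path
                memo[concept] = [(concept,)]
            else:  # extend every path of every parent by this concept
                memo[concept] = [p + (concept,) for parent in direct for p in paths_to(parent)]
        return memo[concept]

    return {concept: paths_to(concept) for concept in parents}


PATHS = all_paths(PARENTS)
```

Here PATHS["E"] holds two paths, ("A", "B", "D", "E") and ("A", "C", "D", "E"). A subtree question such as "is E under A/B?" only needs a prefix comparison against these tuples. A subclass question, by contrast, must consider every path that reaches E.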
The @concept annotation can be used to override the metadata of the column with a concept annotation.
Usage:
SELECT 'Something' as "Finding" @concept=C3367 FROM T WHERE ...
To find a column in a given table by concept, conceptURI, name, propertyuri, or obsolete name, use findColumn on your table:
table.findColumn([columnProperties])
For example, if you've annotated a column with the concept coded "ONT:123", use the following to return the column with that concept:
SELECT
MyTable.findColumn(@concept='ONT:123')
FROM Project.MyStudy.study.MyTable
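As a rough mental model of findColumn, the Python sketch below returns the first column whose metadata matches the supplied criterion. The `Column` class, its attribute names, and the matching order are hypothetical stand-ins for LabKey's column metadata, not its actual API. Lookup by conceptURI is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Column:
    """Hypothetical column metadata; attribute names are illustrative only."""
    name: str
    concept: Optional[str] = None
    property_uri: Optional[str] = None
    obsolete_names: list[str] = field(default_factory=list)


def find_column(columns: list[Column], *, concept: Optional[str] = None,
                name: Optional[str] = None,
                property_uri: Optional[str] = None) -> Optional[Column]:
    """Return the first column matching the given criterion, or None."""
    for col in columns:
        if concept is not None and col.concept == concept:
            return col
        if name is not None and (col.name == name or name in col.obsolete_names):
            return col
        if property_uri is not None and col.property_uri == property_uri:
            return col
    return None


# Hypothetical table: one column annotated with the concept code ONT:123.
COLUMNS = [
    Column("ParticipantId"),
    Column("Drug", concept="ONT:123", obsolete_names=["Medication"]),
]
```

In this model, find_column(COLUMNS, concept="ONT:123") returns the "Drug" column whatever it is currently named. That mirrors the findColumn(@concept='ONT:123') query above.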
In these examples, we use a fictional ontology nicknamed "ONT". A value like "ONT:123" might be one of the codes in the ontology meaning "Pharma", for example. All SQL values are simply string literals containing the concept code; the readable names in comments are included only for clarity.
Here, "ONT:123" (Pharma) appears in the hierarchy tree once as a root term; "ONT:382" (Ibuprofen, for example) appears twice in the hierarchy 'below' Pharma, and has a further child: "ONT:350" (Oral form ibuprofen). Other codes are omitted for readability:
ONT:123 (Pharma) / biologic product / Analgesic / Anti-inflammatory preparations / Non-steroidal anti-inflammatory agent / ONT:382 (Ibuprofen) / ONT:350 (Oral form ibuprofen)
ONT:123 (Pharma) / biologic product / Analgesic / Non-opioid analgesics / Non-steroidal anti-inflammatory agent / ONT:382 (Ibuprofen) / ONT:350 (Oral form ibuprofen)
The two expressions below are not semantically the same, but return the same result. The second version is preferred because it ensures that there is only one path being evaluated and might be faster.
IsSubClassOf('ONT:382' /* Ibuprofen */, 'ONT:123' /* Pharma */)
IsInSubtree('ONT:382' /* Ibuprofen */, ConceptPath('ONT:123' /* Pharma */))
The next two expressions do not return the same result. The first works as expected:
IsSubClassOf('ONT:350' /* Oral form ibuprofen */, 'ONT:382' /* Ibuprofen */)
IsInSubtree('ONT:350' /* Oral form ibuprofen */, ConceptPath('ONT:382' /* Ibuprofen */))
Since there is not a unique concept path containing 'ONT:382' (Ibuprofen), the value of ConceptPath('ONT:382' /* Ibuprofen */) is NULL in the second expression. Instead, the following expression works as expected, clarifying which "branch" of the path to use:
IsInSubtree('ONT:350' /* Oral form ibuprofen */, ConceptPath('ONT:322' /* Non-opioid analgesics */, 'ONT:164' /* Non-steroidal anti-inflammatory agent */, 'ONT:382' /* Ibuprofen */))
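The difference between these expressions can be reproduced with a small Python model of the two paths shown above. This is a hypothetical sketch: the helper names are invented, and intermediate concepts that have no code in the text are represented by their names. It assumes that ConceptPath() matches its arguments in order along a stored path ending at the last argument. It also assumes that ConceptPath() yields NULL (modeled as None) unless exactly one path matches.

```python
# The two full paths from the example; concepts without a code in the
# text are represented by their names.
FULL_PATHS = [
    ("ONT:123", "biologic product", "Analgesic", "Anti-inflammatory preparations",
     "ONT:164", "ONT:382", "ONT:350"),
    ("ONT:123", "biologic product", "Analgesic", "ONT:322",
     "ONT:164", "ONT:382", "ONT:350"),
]

# Every prefix of every full path is itself a stored path (a pure tree).
STORED_PATHS = sorted({full[:i] for full in FULL_PATHS for i in range(1, len(full) + 1)})


def paths_of(concept):
    """All stored paths that end at `concept`."""
    return [p for p in STORED_PATHS if p[-1] == concept]


def concept_path(*concepts):
    """Unique stored path ending at the last argument and containing all
    arguments in order; None (modeling SQL NULL) if zero or several match."""
    matches = []
    for p in paths_of(concepts[-1]):
        remaining = iter(p)
        if all(c in remaining for c in concepts):  # ordered subsequence test
            matches.append(p)
    return matches[0] if len(matches) == 1 else None


def is_in_subtree(concept, subtree):
    """True if some path to `concept` starts with `subtree`; None propagates."""
    if subtree is None:
        return None
    return any(p[: len(subtree)] == subtree for p in paths_of(concept))


def is_subclass_of(concept, parent):
    """True if `parent` appears above `concept` on any of its paths."""
    return any(parent in p[:-1] for p in paths_of(concept))
```

With this model, is_in_subtree("ONT:350", concept_path("ONT:382")) evaluates to None because two paths end at ONT:382. The three-argument concept_path("ONT:322", "ONT:164", "ONT:382") selects the Non-opioid analgesics branch, and the same check returns True.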