semantic queries on datasets

LabKey Support Forum (Inactive)
Anthony Corbett  2012-11-29 10:03
Status: Closed
 

I wanted to get feedback from LabKey folks or others on the problem of identifying semantically equivalent data elements across different datasets.

Problem:

When designing datasets for a clinical study, it is important to capture a clinical measure, body temperature for example, in the same dataset column across all visits. Without this foresight, the charting tools treat the same clinical measure in different datasets as separate measures and will not chart it as one contiguous measure across visits.

For example, body temperature is collected across four visits: baseline (0) and visits 1, 2, and 3. Say the datasets are set up so that visits 1, 2, and 3 all capture similar data and insert into the same dataset, 'Study Visit'. However, the baseline visit captures additional measures with different validation requirements, so it is warranted to put this visit's data into a different dataset, 'Baseline'. It is now impossible to chart body temperature as a single measure across all visits because it lives in two datasets and LabKey sees them as two separate measures.

*Note: This can be fixed by creating a custom UNION query for each measure, but that is a lot of custom work and is not easily adaptable to schema changes.
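
To make the workaround concrete, here is a minimal sketch of generating such a UNION query for one measure. The column names `BodyTemp` and `Temperature`, the output alias, and the `ParticipantId`/`SequenceNum` key columns are assumptions for illustration; only the dataset names 'Baseline' and 'Study Visit' come from the example above.

```python
def build_union_query(measure_alias, sources):
    """Build a UNION ALL query exposing one measure from several datasets.

    sources: list of (dataset_name, column_name) pairs. All names here are
    illustrative; adjust them to the actual study schema.
    """
    if not sources:
        raise ValueError("at least one (dataset, column) pair is required")
    selects = []
    for dataset, column in sources:
        # One SELECT per dataset, renaming the column to a shared alias.
        selects.append(
            'SELECT ParticipantId, SequenceNum, "{col}" AS "{alias}" '
            'FROM study."{ds}"'.format(col=column, alias=measure_alias, ds=dataset)
        )
    return "\nUNION ALL\n".join(selects)


query = build_union_query(
    "BodyTemperature",
    [("Baseline", "BodyTemp"), ("Study Visit", "Temperature")],
)
print(query)
```

Every additional measure needs its own list of sources, and every schema change means revisiting those lists, which is the maintenance burden described in the note.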


I've seen the conceptURI in PropertyDescriptor and I'm wondering if there is any effort or plan to do the following:

1. Expose that as part of the dataset designer UI so that data elements can be annotated with terms, say from SNOMED.

2. Provide a Semantic Query Service to find all data elements annotated with a certain term in any dataset.
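
As a rough sketch of what item 2 could look like, the lookup below indexes columns by their concept annotation and returns every (dataset, column) pair tagged with a given term. This is not an existing LabKey API: the metadata dictionary, the function names, and the column names are hypothetical, and the SNOMED-style URI is only an illustrative identifier for body temperature.

```python
from collections import defaultdict

# Illustrative concept URI for body temperature (assumption, not from LabKey).
BODY_TEMP = "http://snomed.info/id/386725007"

# Hypothetical metadata: dataset name -> {column name: concept URI or None}.
DATASET_COLUMNS = {
    "Baseline": {"BodyTemp": BODY_TEMP, "Weight": None},
    "Study Visit": {"Temperature": BODY_TEMP},
}


def index_by_concept(dataset_columns):
    """Map each concept URI to the (dataset, column) pairs annotated with it."""
    index = defaultdict(list)
    for dataset, columns in dataset_columns.items():
        for column, concept_uri in columns.items():
            if concept_uri:  # skip columns that carry no annotation
                index[concept_uri].append((dataset, column))
    return dict(index)


def find_columns(index, concept_uri):
    """Return all annotated columns for a term, sorted for stable output."""
    return sorted(index.get(concept_uri, []))


index = index_by_concept(DATASET_COLUMNS)
matches = find_columns(index, BODY_TEMP)
print(matches)
```

The result of such a lookup is exactly the list of sources a per-measure UNION query would need, so the query could be derived from annotations instead of being written by hand.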


This would provide great flexibility in how the datasets are defined (for clinical eCRF reasons or dataset security reasons) and how data elements are used later in analysis in the LabKey charting/export functionality.

Thanks!

-
Anthony Corbett
 
 
Matthew Bellew responded:  2012-11-29 10:42
Hi Anthony,

Great question. ConceptURI was added for exactly the purpose you describe. However, it never made its way into a fully useful feature, as most customers were more interested in 'annotation' in principle than in practice.

We don't have any hard plans to move this area forward; however, two (now three) projects have recently expressed interest. I'd be very interested in knowing more about your requirements:

* Which ontologies are you interested in supporting (SNOMED, custom, etc.)?
* Do you want to annotate column types (sounds like yes) or choose subsets of ontology values as legal column values (say, medicine codes)?
* What about UMLS?

Matt
 
Anthony Corbett responded:  2012-11-29 11:01

I would like to annotate column types for my current use cases. But I think being able to restrict valid column values to a pick list backed by ontology terms would be very useful too.

Instead of importing static versions of certain vocabularies and maintaining mappings, is it possible to integrate with something like NCBO's BioPortal? They provide a REST API to many of the common vocabularies/ontologies.
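
For illustration, a term lookup against such a REST API might start from a search URL like the one built below. The endpoint `https://data.bioontology.org/search`, the `q`, `ontologies`, and `apikey` parameter names, and the `SNOMEDCT` acronym are assumptions to verify against the current BioPortal documentation; the API key is a placeholder and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed search endpoint; confirm against the BioPortal API documentation.
BIOPORTAL_SEARCH = "https://data.bioontology.org/search"


def build_search_url(term, ontologies=None, api_key="YOUR_API_KEY"):
    """Build a term-search URL, optionally restricted to some ontologies."""
    params = {"q": term, "apikey": api_key}
    if ontologies:
        # Assumed convention: comma-separated ontology acronyms.
        params["ontologies"] = ",".join(ontologies)
    return BIOPORTAL_SEARCH + "?" + urlencode(params)


url = build_search_url("body temperature", ["SNOMEDCT"])
print(url)
```

The identifier of the chosen search result could then be stored as the conceptURI of a column, so no static copy of the vocabulary has to be imported.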
 
Anthony Corbett responded:  2012-11-29 11:02
Here are a few links:

http://bioportal.bioontology.org/ontologies

http://www.bioontology.org/wiki/index.php/Using_NCBO_Technology_In_Your_Project


Here is an example project that has incorporated the BioPortal APIs to annotate Google spreadsheets:

http://isatools.wordpress.com/2012/07/13/introducing-ontomaton-ontology-search-tagging-for-google-spreadsheets/


Would something similar be appropriate for choosing a conceptURI for dataset properties?