How is Study Data Stored in LabKey Server?: /Documentation/Archive/21.11

How is Study Data Stored in LabKey Server?

This topic answers a few questions about how study data is stored inside the server:

When you create a study dataset and import data to it, how is that data stored and managed in the system?
How is data from different studies related?
Is the data from different studies physically or logically separated?

Data Table within Schema

When you create a study dataset, LabKey creates a corresponding table in the main database to hold the data. This table is created in the studyDataset schema. For example if you create the table

"Demographics"

LabKey will create a corresponding table with a related name (container id string + Demographics), for example:

"c10d79_demographics"

Import Process

When users import data to the Demographics table two things occur:

The data is inserted into the studyDataset schema, namely into the corresponding database table "c10d79_demographics".
The data is inserted into the study schema, namely into an extensive number of bookkeeping and junction tables. These tables are the pre-existing, system tables of the study schema, designed to support the application logic and the built-in reports provided by a LabKey Study. (As such, users should treat these tables as "read-only", and not manipulate them directly using PGAdmn or some other db tool. Users should view, edit, and design datasets only via the official LabKey mechanisms such as the study web UI, domain designer, APIs, archive import, etc.) These system tables are not created by the user, but already exist as part of the study schema, for example:

Cohort - Contains one row per defined cohort.
DataSetColumns - Metadata table containing one row of metadata for each column in all study datasets.
DataSets - Contains one row of metadata per study dataset.
Participant - Contains one row per subject.
ParticipantVisit - Contains one row per subject/visit combination in the study. Note that this table is populated dynamically as dataset, assay, and other aligned data is loaded, and is guaranteed to always be complete.
and so on...

Data Integration

Data imported from different studies is intermingled in these system-level study schema tables. (The data is not intermingled in studyDataset schema tables.) To get a sense of how data is intermingled in these tables, open PGAdmin, (or the equivalent tool for MS SQL Server) and drill down to:

Databases > labkey > Schemas > study > Tables.
Right-click a table such as "study" or "dataset" and select View Data > View All Rows.

Notice that data from many apparently different datasets and study containers is intermingled there. (When the server displays data, it generally separates it by container, providing a form of data partitioning.)

Below is an example system table named "participant". Notice the "containerid" column which indicates which study folder a given participant belongs to.

container	participantid
"09f8feb4-e006-1033-acf0-1396fc22d1c4"	"PT-101"
"09f8feb4-e006-1033-acf0-1396fc22d1c4"	"PT-102"
"09f8feb4-e006-1033-acf0-1396fc22d1c4"	"PT-103"
"143fdca3-7dd2-1033-816c-97696f0568c0"	"101344"
"143fdca3-7dd2-1033-816c-97696f0568c0"	"103505"
"143fdca3-7dd2-1033-816c-97696f0568c0"	"103866"

LabKey Support

LabKey Support