This tutorial shows you how to set up a new study from scratch. Using sample data, you will assemble and configure the backbone of a study and learn about study properties, datasets, cohorts, and specimens.
This tutorial is primarily about 'putting the pieces together'. Starting with Excel spreadsheets, the tutorial assembles a simple LabKey "study": a repository of integrated data that you can explore and analyze. Because the data is aligned, it gives a unified picture of the study results that is easier to analyze as a whole than as disparate parts.
All data used is fictional.
By default, the individuals being studied are called "participants", and this tutorial uses that term.
You can choose an alternate word that better matches your working environment, such as "subject" or "patient", or name the organism being studied, such as "mouse" or "mosquito".
Study research typically requires tracking how participant attributes change over time, or the sequence in which events occur. LabKey Server provides three different ways of measuring time in a study:
- Dates means that time is divided into "chunks", called timepoints, bounded by calendar dates, so the amount of time elapsed between events is significant. Depending on the size of the timepoints, the unit of measurement may be weeks, months, or years.
- Assigned Visits means that the data is divided into named "events" in a sequence, possibly but not necessarily corresponding to a person visiting a location. The actual dates aren't relevant, only the order in which the events occur. For instance, "enrollment", "first vaccination", and "second screening" might be named visits in a study. In a visit-based study, each data collection event is assigned a "sequence number", which may, but need not, be derived from date information if it is provided.
- Continuous is intended for open-ended observational studies that have no fixed end date or stopping point, and no strong concept of dividing time into fixed chunks. This style is useful for electronic health record (EHR) data.
This tutorial creates a date-based study with time broken into 28-day timepoints, roughly corresponding to months.
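To make the 28-day timepoints concrete, the following Python sketch assigns an observation date to a timepoint. It assumes that days are counted from each participant's start date (day 0) and that timepoints are consecutive, non-overlapping 28-day blocks. The function `timepoint_for` is hypothetical and is not part of LabKey Server.

```python
from datetime import date

TIMEPOINT_DAYS = 28  # timepoint length used in this tutorial


def timepoint_for(start: date, observed: date,
                  length_days: int = TIMEPOINT_DAYS) -> tuple[int, int, int]:
    """Return (index, first_day, last_day) of the timepoint containing `observed`.

    Assumption for illustration: day 0 is the participant's start date and
    timepoints are back-to-back blocks of `length_days` days. This is not
    LabKey Server's implementation.
    """
    day = (observed - start).days       # whole days elapsed since the start date
    index = day // length_days          # floor division: negative days map to negative timepoints
    first_day = index * length_days
    last_day = first_day + length_days - 1
    return index, first_day, last_day
```

For a participant who started on 2008-01-01, an observation on 2008-02-15 falls on day 45, so `timepoint_for(date(2008, 1, 1), date(2008, 2, 15))` places it in timepoint 1 (days 28 to 55). Observations recorded before the start date land in negative timepoints, because Python's floor division rounds toward negative infinity.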
Study data is the heart of the matter. Different types of data, collection methods, analyses, and integration are brought together with convenient tools that support a wide variety of research studies.
The datasets in a study repository come in three different types:
- Demographic datasets: one row per participant.
  - Demographic datasets record permanent characteristics of the participants, which are collected only once for a study. Characteristics like birth gender, birth date, and enrollment date do not change over time.
  - From a database point of view, demographic datasets have a single-column primary key: the participantId.
- Clinical datasets: one row per participant/timepoint pair.
  - Clinical datasets record participant characteristics that vary over the course of the study, such as physical exam data and simple lab test results. Typical data includes weight, blood pressure, or lymphocyte counts, collected at multiple times during the study.
  - From a database point of view, clinical datasets have a two-part primary key: the participantId and a timepoint.
- Assay/specimen datasets: multiple rows per participant/timepoint are allowed.
  - These datasets record the assay and specimen data in the study. Not only is this data typically collected repeatedly over time, but more than one measurement per timepoint is possible: for example, when multiple vials of blood are tested, or when multiple dilutions of a product are tested on one sample.
  - From a database point of view, assay/specimen datasets have at least two, and possibly three, key columns: the participantId and timepoint, plus an optional third key such as a specimenID.
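The key rules for the three dataset types can be expressed as a uniqueness check. The Python sketch below is an illustration only, not a LabKey API: the `KEY_COLUMNS` mapping, the column names (`participantId`, `timepoint`, `specimenId`), and the `duplicate_keys` helper are all hypothetical. It reports every key value shared by more than one row, which would violate the key for that dataset type.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Mapping, Tuple

Row = Mapping[str, Hashable]

# Hypothetical key columns for each dataset type; "specimenId" stands in
# for the optional third key of an assay/specimen dataset.
KEY_COLUMNS: Dict[str, Tuple[str, ...]] = {
    "demographic": ("participantId",),
    "clinical": ("participantId", "timepoint"),
    "assay_specimen": ("participantId", "timepoint", "specimenId"),
}


def duplicate_keys(rows: Iterable[Row], dataset_type: str) -> List[Tuple[Hashable, ...]]:
    """Return the key values that occur in more than one row.

    An empty list means every row is uniquely identified by the key
    columns configured for `dataset_type`.
    """
    columns = KEY_COLUMNS[dataset_type]
    # Count how many rows share each combination of key-column values.
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    # Keep only combinations that appear more than once (first-seen order).
    return [key for key, n in counts.items() if n > 1]
```

For two rows from participant P1 at timepoint 0 with specimen IDs V1 and V2, `duplicate_keys(rows, "clinical")` returns `[("P1", 0)]`, because a clinical dataset allows only one row per participant/timepoint pair, while `duplicate_keys(rows, "assay_specimen")` returns an empty list, since the third key tells the rows apart.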