This tutorial shows you how to set up a new cohort study from scratch. You will assemble and configure the backbone of a study (its properties, datasets, cohorts, and specimens) and prepare your data for integration, analysis, and presentation. For details on fleshing out and working with a study that is already up and running, see Tutorial: Cohort Studies.

Tutorial Aim

This tutorial is primarily about 'putting the pieces together', that is, data integration. It shows you how to join disconnected datasets into a single, analyzable whole, so you can ask questions that span the whole data landscape. You will combine heterogeneous datasets (datasets with different shapes and sources), compare cohort performance, and view trends over time.

Starting with Excel spreadsheets, the tutorial assembles a LabKey "study": a repository of integrated data that you can explore and analyze.

Tutorial Scenario

Imagine you are studying treatments for HIV infection. Your aim is to evaluate the effectiveness of anti-retroviral (ARV) treatments in human subjects, using viral load, blood lymphocytes, and cytokine production as measures of successful treatment. You have already collected specimens and data over two years, and are now ready to perform an analysis and evaluation of the data, to get answers to your key questions:

  • How do the ARV-treated participants compare to the untreated participants?
  • How do the ARV treatments perform compared to one another?
  • What trends emerge over time for the key measures (viral load, lymphocyte percentage, and cytokine production)?

But your data is scattered across disconnected Excel spreadsheets. How do you put all of the pieces together and get answers to your core questions? Below, we will join the separate datasets into a single, analyzable whole using LabKey Server's data integration features.

How Data Integration Works

LabKey Server aligns data using the following columns:

  • ParticipantID columns
  • Date or VisitID columns
  • SpecimenID columns

When your data includes these columns, LabKey Server links and integrates the data around their unique ID values and creates a separate profile for each participant, where you can compare their performance over time.
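
To make the idea of aligning on ID columns concrete, here is a minimal sketch in plain Python. It is not LabKey Server's implementation, only an illustration of the general technique: rows from separate datasets are merged when they share the same ParticipantID and Date, and the merged rows are then grouped into one time-ordered profile per participant. The function names (integrate, participant_profiles), the measure column names, and all sample values are hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows, invented for illustration only (not real study data).
viral_load = [
    {"ParticipantID": "P1", "Date": "2008-01-10", "ViralLoad": 5000},
    {"ParticipantID": "P1", "Date": "2008-04-10", "ViralLoad": 1200},
    {"ParticipantID": "P2", "Date": "2008-01-12", "ViralLoad": 8000},
]
lymphocytes = [
    {"ParticipantID": "P1", "Date": "2008-01-10", "LymphocytePct": 21.0},
    {"ParticipantID": "P2", "Date": "2008-01-12", "LymphocytePct": 18.5},
]


def integrate(*datasets):
    """Merge rows from all datasets that share a (ParticipantID, Date) key."""
    merged = defaultdict(dict)
    for rows in datasets:
        for row in rows:
            key = (row["ParticipantID"], row["Date"])
            merged[key].update(row)  # columns from each dataset land in one row
    return dict(merged)


def participant_profiles(merged):
    """Group merged rows by participant, ordered by date (ISO strings sort by time)."""
    profiles = defaultdict(list)
    for (participant_id, _date), row in sorted(merged.items()):
        profiles[participant_id].append(row)
    return dict(profiles)


merged = integrate(viral_load, lymphocytes)
profiles = participant_profiles(merged)

for participant_id, rows in profiles.items():
    print(participant_id, rows)
```

In this sketch, a visit that appears in only one dataset (here, the second P1 visit) still gets a row; it simply lacks the columns from the other dataset. The same pattern would apply with VisitID in place of Date.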

Tutorial Steps

