Tutorial: Extract-Transform-Load (ETL)

2024-03-29

Premium Feature — Available with all Premium Editions of LabKey Server. Learn more or contact LabKey.

This tutorial shows you how to create and use a simple ETL within a single database. You can use this example to understand the basic parts of the ETL. Extracting data, transforming it, and loading it elsewhere.

Scenario

In the tutorial scenario, imagine you are a researcher who wants to identify a group of participants for a research study. The participants must meet certain criteria to be included in the study, such as having a certain condition or diagnosis. You already have the following in place:

  • You have a running installation of LabKey Server which includes the dataintegration module.
  • You already have access to a large database of demographic information of candidate participants. This source database is continually being updated with new data and new candidates for your study.
  • You have an empty table called "Patients" on your LabKey Server into which you want to put data for the study candidates. This is the target.
How do you get the records that meet your study's criteria from the source to the target? In this tutorial, you will set up an ETL process to solve this problem. The ETL script will query the source for participants that fit your criteria. If it finds any such records, it will copy them into your target. In addition, this process will run on a schedule: every hour it will re-query the database looking for new, or updated, records that fit your criteria.

Tutorial Steps

  1. ETL Tutorial: Set Up - Set up a sample ETL workspace.
  2. ETL Tutorial: Run an ETL Process - Run an ETL process.
  3. ETL Tutorial: Create a New ETL Process - Add a new ETL process for a new query.

First Step