Premium Feature — Available with all Premium Editions of LabKey Server.

Extract-Transform-Load (ETL) mechanisms, available with the LabKey dataintegration module, let you encapsulate some of the most common database tasks, specifically:

  1. Extracting data from a source database
  2. Transforming it, and
  3. Loading it into a target database (the same or another one)
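
At its core, an ETL definition is an XML document that names a source query and a destination query. Below is a minimal sketch assuming the LabKey ETL XML schema; the schema and query names are placeholders:

    <etl xmlns="http://labkey.org/etl/xml">
      <name>Demographics Copy</name>
      <description>Copy demographics rows from a source schema into the study schema.</description>
      <transforms>
        <!-- A single step that copies rows from the source query to the destination query -->
        <transform id="copyDemographics" type="org.labkey.di.pipeline.TransformTask">
          <source schemaName="externalsource" queryName="Demographics"/>
          <destination schemaName="study" queryName="Demographics"/>
        </transform>
      </transforms>
    </etl>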

Example Scenarios

At a high level, ETLs provide a system for structuring a sequence of operations on data. A wide variety of flexible options is available to support diverse scenarios, including, but not limited to:

  • Integrating data from multiple sources into a shared data warehouse.
  • Migrating from one database schema to another, especially when the source is an external database.
  • Coalescing many tables into one, or distributing (provisioning) one table into many.
  • Normalizing data from different systems.
  • Synchronizing data in scheduled increments.
  • Logging and auditing the details of repeated migration or analysis processes.

ETL Definition Options

ETL functionality can be (1) included in a custom module or (2) defined directly in the user interface. An ETL can be standalone, or it can run a series of other ETLs or stored database procedures in sequence. Depending on your needs, you may decide to use a combination of these methods for defining ETLs.

  • Module ETLs can more easily be placed under version control and need only be defined once to be deployed on multiple systems (such as staging and production).
  • ETLs defined in the user interface are easier to start from scratch on a running server, but will be defined only on the server where they are created.
  • A combination approach, where a 'wrapper' ETL in the user interface calls a set of module-defined component ETLs, may give you the right blend of features.
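
For the module option, each ETL definition is a separate XML file placed in the module's etls resource directory, and the resulting ETL is typically referenced by a "{moduleName}/fileName" style identifier. A sketch of the layout, with placeholder module and file names:

    mymodule/
      resources/
        etls/
          demographicsCopy.xml
          labResultsLoad.xml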

ETL Development Process

When developing an ETL for your own custom scenario, it is good practice to follow a staged approach, beginning with the simplest process and adding layers of complexity only after the first steps are well understood and functioning as expected.

  1. Identify the source data. Understand its structure, location, and access requirements.
  2. Create the target table. Define the location and final requirements for the data you will be loading.
  3. Create an ETL to perform your transformations, truncating the target each time (as the simplest pathway).
  4. If your ETL should run on a schedule, add that schedule to the definition.
  5. Add additional options, such as a timestamp-based filter strategy.
  6. Switch to a target data merge if desired.
  7. Add handling for rows deleted from the source.

More details about this procedure are available in the topic ETL: Planning.
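
As a rough illustration of where steps 4 through 7 typically land in an XML definition like the sketch above (element and attribute names follow the LabKey ETL schema; the interval, column, schema, and query values are placeholders):

    <!-- Step 4: run automatically on a schedule -->
    <schedule>
      <poll interval="1h"/>
    </schedule>

    <!-- Step 5: a timestamp-based filter strategy picks up only rows modified since the last successful run -->
    <incrementalFilter className="ModifiedSinceFilterStrategy" timestampColumnName="Modified"/>

    <!-- Step 6: on the destination element, switch targetOption from "truncate" to "merge" -->
    <destination schemaName="study" queryName="Demographics" targetOption="merge"/>

    <!-- Step 7: handling of rows deleted from the source is additional filter configuration; see ETL: Planning -->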

Topics

The following topics will get you started developing ETL scripts and processes, and packaging them as modules:

  • Get Started with the Basics
  • <transforms> Topics
  • Scheduling
  • Filtering: Either on Source Data or to Run ETLs Incrementally
  • More Topics
