This topic provides guidance for planning the structure of your ETLs.
Multiple Steps in a Single ETL or Multiple ETLs?
- Do changes to the source affect multiple target datasets at once? If so, consider configuring multiple steps in one ETL definition.
- Do source changes impact a single target dataset? Consider using multiple ETL definitions, one for each dataset.
- Are the target queries interrelated? Consider multiple steps in one ETL definition.
- Do you need steps to always run in a particular order? Use multiple steps in a single ETL. Multiple ETLs may run in parallel or out of order, which is a problem when, for example, one long-running ETL must complete before another begins.
- Should the entire series of steps run in a single transaction? If so, use multiple steps in a single ETL.
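As a sketch, a single ETL definition with multiple ordered steps might look like the following. The schema, query, and step names here are hypothetical placeholders, and the exact attributes should be checked against your server's ETL XML schema:

```xml
<etl xmlns="http://labkey.org/etl/xml">
  <name>MultiStepExample</name>
  <description>Hypothetical two-step ETL; step1 always runs before step2.</description>
  <transforms>
    <!-- Step 1: copy source query into a staging dataset (names are placeholders) -->
    <transform id="step1" type="org.labkey.di.pipeline.TransformTask">
      <source schemaName="study" queryName="PatientsSource"/>
      <destination schemaName="study" queryName="PatientsStaging"/>
    </transform>
    <!-- Step 2: populate the final dataset from staging -->
    <transform id="step2" type="org.labkey.di.pipeline.TransformTask">
      <source schemaName="study" queryName="PatientsStaging"/>
      <destination schemaName="study" queryName="Patients"/>
    </transform>
  </transforms>
</etl>
```

Because both transforms live in one definition, they run in the declared order within a single ETL job, which is not guaranteed when the same work is split across two separate ETL definitions.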
ETLs Across LabKey Containers
ETLs are constructed as operations in a destination folder, pulling information from a remote or linked source location, or from the local container itself. If the source is on the same LabKey Server but in a different container, such as a sibling folder, there are two ways to accomplish this with your ETL:
- Create a linked schema for the source table in the destination folder. Your ETL, created in the destination folder, then simply names this linked schema and query as its source.
- Make your LabKey Server a remote connection to itself. Then you can access the source folder on the "remote connection" and provide the different container path there.
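A hedged sketch of what the source element might look like for each approach follows. The connection, schema, and query names are illustrative assumptions, and the `RemoteQueryTransformStep` type and `remoteSource` attribute should be verified against your LabKey version's ETL XML schema:

```xml
<!-- Option 1: source is a linked schema defined in the destination folder
     ("linkedProjectA" is a placeholder for your linked schema's name) -->
<transform id="fromLinkedSchema" type="org.labkey.di.pipeline.TransformTask">
  <source schemaName="linkedProjectA" queryName="Patients"/>
  <destination schemaName="study" queryName="Patients"/>
</transform>

<!-- Option 2: source is reached through a remote connection, even though it
     points back at the same server ("SelfRemoteConnection" is a placeholder
     for a remote connection configured with the other container's path) -->
<transform id="fromRemote" type="RemoteQueryTransformStep">
  <source remoteSource="SelfRemoteConnection" schemaName="study" queryName="Patients"/>
  <destination schemaName="study" queryName="Patients"/>
</transform>
```

The linked-schema route keeps everything on the local server and is usually simpler; the remote-connection route lets you name a different container path explicitly through the connection configuration.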
Once a command task has been registered in a pipeline task XML file, you can specify the task as an ETL step:
<transform id="ProcessingEngine" type="ExternalPipelineTask">
    <!-- "myModule:pipeline:myTask" is a placeholder; use the ref of your registered pipeline task -->
    <taskref ref="myModule:pipeline:myTask"/>
</transform>
See this example module for an ETL that calls a pipeline job: ETLPipelineTest.zip
Permission to Run
ETL processes run in the context of a folder. If run manually, an ETL runs with the permissions of the initiating user. If scheduled, it runs with the permissions of a "service user", which the folder administrator can configure.