This topic provides guidance for planning the structure of your ETLs.
Multiple Steps in a Single ETL or Multiple ETLs?
- Do changes to the source affect multiple target datasets at once? If so, consider configuring multiple steps in one ETL definition.
- Do source changes impact a single target dataset? Consider using multiple ETL definitions, one for each dataset.
- Are the target queries interrelated? Consider multiple steps in one ETL definition.
- Do you need steps to always run in a particular order? Use multiple steps in a single ETL. Multiple ETLs may run in parallel or out of order, which is a problem when, for example, one long-running ETL must complete before another begins.
- Should the entire series of steps run in a single transaction? If so, use multiple steps in a single ETL.
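As a sketch, a single ETL definition with multiple ordered steps might look like the following. The schema, query, and step names here are hypothetical placeholders, and the exact attributes should be checked against your server's ETL XML schema:

```xml
<etl xmlns="http://labkey.org/etl/xml">
  <name>MultiStepExample</name>
  <description>Hypothetical two-step ETL; step1 always runs before step2.</description>
  <transforms>
    <!-- Step 1: copy source query into a staging dataset (names are placeholders) -->
    <transform id="step1" type="org.labkey.di.pipeline.TransformTask">
      <source schemaName="study" queryName="PatientsSource"/>
      <destination schemaName="study" queryName="PatientsStaging"/>
    </transform>
    <!-- Step 2: populate the final dataset from staging -->
    <transform id="step2" type="org.labkey.di.pipeline.TransformTask">
      <source schemaName="study" queryName="PatientsStaging"/>
      <destination schemaName="study" queryName="Patients"/>
    </transform>
  </transforms>
</etl>
```

Because both transforms live in one definition, they run in the declared order within a single ETL job, which is not guaranteed when the same work is split across two separate ETL definitions.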
ETLs Across LabKey Containers
ETLs are constructed as operations in a destination folder, pulling information from a remote or linked source location, or from the local container itself. If the source is on the same LabKey Server but in a different container, such as a sibling folder, there are two ways to accomplish this with your ETL:
- Create a linked schema for the source table in the destination folder. Your ETL, created in the destination folder, then simply names this linked schema and query as its source.
- Make your LabKey Server a remote connection to itself. Then you can access the source folder on the "remote connection" and provide the different container path there.
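A hedged sketch of what the source element might look like for each approach follows. The connection, schema, and query names are illustrative assumptions, and the `RemoteQueryTransformStep` type and `remoteSource` attribute should be verified against your LabKey version's ETL XML schema:

```xml
<!-- Option 1: source is a linked schema defined in the destination folder
     ("linkedProjectA" is a placeholder for your linked schema's name) -->
<transform id="fromLinkedSchema" type="org.labkey.di.pipeline.TransformTask">
  <source schemaName="linkedProjectA" queryName="Patients"/>
  <destination schemaName="study" queryName="Patients"/>
</transform>

<!-- Option 2: source is reached through a remote connection, even though it
     points back at the same server ("SelfRemoteConnection" is a placeholder
     for a remote connection configured with the other container's path) -->
<transform id="fromRemote" type="RemoteQueryTransformStep">
  <source remoteSource="SelfRemoteConnection" schemaName="study" queryName="Patients"/>
  <destination schemaName="study" queryName="Patients"/>
</transform>
```

The linked-schema route keeps everything on the local server and is usually simpler; the remote-connection route lets you name a different container path explicitly through the connection configuration.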
Once a command task has been registered in a pipeline task XML file, you can specify the task as an ETL step:
<transform id="ProcessingEngine" type="ExternalPipelineTask">
    <!-- "myModule:pipeline:myTask" is a placeholder; use the ref of your registered pipeline task -->
    <taskref ref="myModule:pipeline:myTask"/>
</transform>
See this example module for an ETL that calls a pipeline job: ETLPipelineTest.zip
Permission to Run
ETL processes run in the context of a folder. If run manually, an ETL runs with the permissions of the initiating user. If scheduled, it runs with the permissions of a "service user", which the folder administrator can configure.