This topic is under construction for the 21.3 (March 2021) release of LabKey Server.

This topic covers the types of tasks available in the <transform> element. Each <transform> element sets its type attribute to one of the available tasks described below.

Transform Task

The basic transform task is org.labkey.di.pipeline.TransformTask. Syntax looks like:

...
<transform id="step1" type="org.labkey.di.pipeline.TransformTask">
    <description>Copy to target</description>
    <source schemaName="etltest" queryName="source" />
    <destination schemaName="etltest" queryName="target" />
</transform>
...

The <source> refers to a schemaName that appears local to the container, but may in fact be an external schema or linked schema that was previously configured.
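
For context, a transform step sits inside the <transforms> element of an ETL definition file. The following is a minimal sketch of such a file wrapped around the basic transform step. The ETL name and description text are placeholder values invented for illustration, and the namespace shown is assumed from LabKey's standard ETL schema, so check both against your own module and server version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
    <!-- The name and description below are placeholder values for illustration -->
    <name>Copy source to target</name>
    <description>Copies rows from etltest.source to etltest.target</description>
    <transforms>
        <!-- The basic transform step: read from one query, write to another -->
        <transform id="step1" type="org.labkey.di.pipeline.TransformTask">
            <description>Copy to target</description>
            <source schemaName="etltest" queryName="source" />
            <destination schemaName="etltest" queryName="target" />
        </transform>
    </transforms>
</etl>
```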

Remote Query Transform Step

In addition to supporting use of external schemas and linked schemas, ETL modules can access data through a remote connection to an alternate LabKey Server. The setup and syntax are as follows:

To set up a remote connection, see Manage Remote Connections.

The transform type is RemoteQueryTransformStep. Your <source> element must include the remoteSource attribute, naming the remote connection, in addition to the schema and query name on that remote server, as shown below:

...
<transform type="RemoteQueryTransformStep" id="step1">
    <source remoteSource="EtlTest_RemoteConnection" schemaName="study" queryName="etl source" />
    … <!-- the destination and other options for the transform are included here -->
</transform>
...

Note that using <deletedRowSource> with an <incrementalFilter> strategy does not support a remote connection.

Queue Job Task

To call one ETL from another, use a transform of type TaskRefTransformStep and include a <taskref> that refers to org.labkey.di.steps.QueueJobTask.
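
As a rough sketch, a step that queues another ETL might look like the following. The step id "QueueTail", the setting name transformId, and the {MODULE-NAME}/TARGET-ETL value format are assumptions not confirmed by this topic. The value is a placeholder to be replaced with the module name and the name of the ETL to queue.

```xml
<transform id="QueueTail" type="TaskRefTransformStep">
    <taskref ref="org.labkey.di.steps.QueueJobTask">
        <settings>
            <!-- Assumed setting: identifies the ETL to queue; the value is a placeholder -->
            <setting name="transformId" value="{MODULE-NAME}/TARGET-ETL"/>
        </settings>
    </taskref>
</transform>
```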

Learn more and see syntax examples in this topic:

Run Report Task

To run a report from an ETL, first create your report. You will need the report's reportId (a number) and the names and expected values of any parameters the report takes. The report must also set the property "runInBackground" to "true" to be runnable from an ETL. In a <ReportName>.report.xml file, the ReportDescriptor would look like:

<ReportDescriptor descriptorType="rReportDescriptor" reportName="etlReport" xmlns="http://labkey.org/query/xml">
    <Properties>
        <Prop name="runInBackground">true</Prop>
    </Properties>
    <tags/>
</ReportDescriptor>

Within your ETL, include a transform of type TaskRefTransformStep that calls the <taskref> org.labkey.di.pipeline.RunReportTask. Syntax looks like:

...
<transform id="step1" type="TaskRefTransformStep">
    <taskref ref="org.labkey.di.pipeline.RunReportTask">
        <settings>
            <setting name="reportId" value="db:307"/>
            <setting name="myparam" value="myvalue"/>
        </settings>
    </taskref>
</transform>
...

Stored Procedures

When working with a stored procedure, you use a <transform> of type StoredProcedure. Syntax looks like:

...
<transform id="ExtendedPatients" type="StoredProcedure">
    <description>Calculates date of death or last contact for a patient, and patient ages at events of interest</description>
    <procedure schemaName="patient" procedureName="PopulateExtendedPatients" useTransaction="true">
    </procedure>
</transform>
...

External Pipeline Task - Command Tasks

Once a command task has been registered in a pipeline task XML file, you can specify the task as an ETL step. In this example, "myEngineCommand.pipeline.xml" is already available. It could be incorporated into an ETL with syntax like this:

...
<transform id="ProcessingEngine" type="ExternalPipelineTask"
           externalTaskId="org.labkey.api.pipeline.cmd.CommandTask:myEngineCommand"/>

...

To see a listing of all the registered pipeline tasks on your server, including their respective taskIds:

  • Select (Admin) > Site > Admin Console.
  • Under Diagnostics, click Pipelines and Tasks.
