This topic covers the types of tasks available in the <transform> element. Each <transform> element sets its type attribute to one of the available tasks described below.

Transform Task

The basic transform task is org.labkey.di.pipeline.TransformTask. Syntax looks like:

...
<transform id="step1" type="org.labkey.di.pipeline.TransformTask">
    <description>Copy to target</description>
    <source schemaName="etltest" queryName="source" />
    <destination schemaName="etltest" queryName="target" />
</transform>
...
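For context, the "..." lines above stand for the rest of the ETL definition file. A minimal sketch of a complete file wrapping this single step might look like the following. It assumes the root element is <etl> in the http://labkey.org/etl/xml namespace, with <name>, <description>, and <transforms> children; the name and description values are hypothetical. Check the element names against the ETL schema on your server before relying on them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: root element, namespace, and child elements are assumptions -->
<etl xmlns="http://labkey.org/etl/xml">
    <!-- Hypothetical display name and description for this ETL -->
    <name>Copy Source To Target</name>
    <description>Copies rows from etltest.source to etltest.target</description>
    <transforms>
        <!-- The same TransformTask step shown above -->
        <transform id="step1" type="org.labkey.di.pipeline.TransformTask">
            <description>Copy to target</description>
            <source schemaName="etltest" queryName="source" />
            <destination schemaName="etltest" queryName="target" />
        </transform>
    </transforms>
</etl>
```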

The <source> refers to a schemaName that appears local to the container, but may in fact be an external schema or linked schema that was previously configured. For example, if you were referencing a source table in a schema named "myLinkedSchema", you would use:

...
<transform id="step1" type="org.labkey.di.pipeline.TransformTask">
    <description>Copy to target</description>
    <source schemaName="myLinkedSchema" queryName="source" />
    <destination schemaName="etltest" queryName="target" />
</transform>
...

Remote Query Transform Step

In addition to using external schemas and linked schemas, ETL modules can access data through a remote connection to another LabKey Server. The setup and syntax are as follows:

To set up a remote connection, see Manage Remote Connections.

The transform type is RemoteQueryTransformStep, and your <source> element must include the remoteSource attribute in addition to the schemaName and queryName on that remote source, as shown below:

...
<transform type="RemoteQueryTransformStep" id="step1">
    <source remoteSource="EtlTest_RemoteConnection" schemaName="study" queryName="etl source" />
    <!-- … the destination and other options for the transform are included here -->
</transform>
...
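As an illustration, a filled-in remote step might look like the sketch below. The <destination> is borrowed from the earlier TransformTask example and assumes a local query etltest.target already exists; the description text is hypothetical. Only the <source> attributes come from the syntax above.

```xml
<transform type="RemoteQueryTransformStep" id="step1">
    <description>Copy from remote server</description>
    <!-- Reads from the "study" schema on the server configured as EtlTest_RemoteConnection -->
    <source remoteSource="EtlTest_RemoteConnection" schemaName="study" queryName="etl source" />
    <!-- Assumed local target, reused from the earlier example -->
    <destination schemaName="etltest" queryName="target" />
</transform>
```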

Note that a remote connection is not supported when using <deletedRowSource> with an <incrementalFilter> strategy.

Queue Job Task

To call one ETL from another, use a transform of type TaskRefTransformStep that includes a <taskref> referring to org.labkey.di.steps.QueueJobTask.

Learn more and see syntax examples in this topic:
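Following the same TaskRefTransformStep pattern as the RunReportTask example in the next section, a queueing step might be sketched as below. The setting name "transformId" and the value format "{myModule}/myOtherEtl" are assumptions, and the module and ETL names are hypothetical; confirm the exact setting name in the Queue Job Task topic.

```xml
<transform id="queueNext" type="TaskRefTransformStep">
    <taskref ref="org.labkey.di.steps.QueueJobTask">
        <settings>
            <!-- Assumed setting name; identifies the ETL to queue (hypothetical module/ETL names) -->
            <setting name="transformId" value="{myModule}/myOtherEtl"/>
        </settings>
    </taskref>
</transform>
```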

Run Report Task

To run a report from an ETL, first create your report. You will need the report ID (a number) and the names and expected values of any parameters the report takes. In addition, the report must set the property "runInBackground" to "true" to be runnable from an ETL. In a <ReportName>.report.xml file, the ReportDescriptor would look like:

<ReportDescriptor descriptorType="rReportDescriptor" reportName="etlReport" xmlns="http://labkey.org/query/xml">
    <Properties>
        <Prop name="runInBackground">true</Prop>
    </Properties>
    <tags/>
</ReportDescriptor>

Within your ETL, include a transform of type TaskRefTransformStep that calls the <taskref> org.labkey.di.pipeline.RunReportTask. Syntax looks like:

...
<transform id="step1" type="TaskRefTransformStep">
    <taskref ref="org.labkey.di.pipeline.RunReportTask">
        <settings>
            <setting name="reportId" value="db:307"/>
            <setting name="myparam" value="myvalue"/>
        </settings>
    </taskref>
</transform>
...

Note: Currently, only R reports are supported by this feature.

Add New TaskRefTask

The job-queueing and report-running tasks above are implemented using TaskRefTask. This flexible mechanism lets you supply any Java code to be run by the task on the pipeline thread. The task does not receive input data from, or produce output data for, the ETL pipeline; it is simply a Java thread, with access to the Java APIs, that runs synchronously in the pipeline queue.

To add a new TaskRefTask, write your Java code in a module and reference it using syntax similar to that shown above for RunReportTask. If the module includes "MyTask.java", the syntax for calling it from an ETL would look like:

...
<transform id="step1" type="TaskRefTransformStep">
    <taskref ref="[Module path].MyTask">
        ...
    </taskref>
</transform>
...
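If your task needs configuration values, one option is to pass them the way RunReportTask receives reportId, through a <settings> block. The sketch below assumes MyTask reads its settings in the same manner; the package name org.example.mymodule and the setting name batchSize are hypothetical placeholders.

```xml
<transform id="step1" type="TaskRefTransformStep">
    <!-- Hypothetical fully qualified class name for MyTask -->
    <taskref ref="org.example.mymodule.MyTask">
        <settings>
            <!-- Hypothetical setting; assumes MyTask reads it like RunReportTask reads reportId -->
            <setting name="batchSize" value="100"/>
        </settings>
    </taskref>
</transform>
```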

Stored Procedures

When working with a stored procedure, you use a <transform> of type StoredProcedure. Syntax looks like:

...
<transform id="ExtendedPatients" type="StoredProcedure">
    <description>Calculates date of death or last contact for a patient, and patient ages at events of interest</description>
    <procedure schemaName="patient" procedureName="PopulateExtendedPatients" useTransaction="true">
    </procedure>
</transform>
...
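The <procedure> element above is empty. If the procedure takes arguments, a sketch might supply them as child elements, assuming a <parameter> element with name and value attributes is accepted there. The parameter name @myParam and its value are hypothetical; check the stored procedure documentation for the supported attributes.

```xml
<transform id="ExtendedPatients" type="StoredProcedure">
    <procedure schemaName="patient" procedureName="PopulateExtendedPatients" useTransaction="true">
        <!-- Assumed element; hypothetical parameter name and value -->
        <parameter name="@myParam" value="1"/>
    </procedure>
</transform>
```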

External Pipeline Task - Command Tasks

Once a command task has been registered in a pipeline task XML file, you can specify the task as an ETL step. In this example, "myEngineCommand.pipeline.xml" is already available. It could be incorporated into an ETL with syntax like this:

...
<transform id="ProcessingEngine" type="ExternalPipelineTask"
           externalTaskId="org.labkey.api.pipeline.cmd.CommandTask:myEngineCommand"/>
...
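Different task types can be combined in one ETL. The sketch below, assuming steps inside <transforms> run in the order they are listed, copies data with a TransformTask and then runs the registered command task. Both steps are taken from examples in this topic; the pairing itself is illustrative.

```xml
<transforms>
    <!-- Step 1: copy rows into the target query -->
    <transform id="step1" type="org.labkey.di.pipeline.TransformTask">
        <description>Copy to target</description>
        <source schemaName="etltest" queryName="source" />
        <destination schemaName="etltest" queryName="target" />
    </transform>
    <!-- Step 2 (assumed to run after step 1): the registered command task -->
    <transform id="ProcessingEngine" type="ExternalPipelineTask"
               externalTaskId="org.labkey.api.pipeline.cmd.CommandTask:myEngineCommand"/>
</transforms>
```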

To see a listing of all the registered pipeline tasks on your server, including their respective taskIds:

  • Select (Admin) > Site > Admin Console.
  • Under Diagnostics, click Pipelines and Tasks.
