This topic covers the types of tasks available in the <transform> element. Each <transform> element sets its
type attribute to one of the available tasks:
Transform Task
The basic transform task is org.labkey.di.pipeline.TransformTask. Syntax looks like:
...
<transform id="step1" type="org.labkey.di.pipeline.TransformTask">
    <description>Copy to target</description>
    <source schemaName="etltest" queryName="source" />
    <destination schemaName="etltest" queryName="target" />
</transform>
...
The <source> refers to a schemaName that appears local to the container, but may in fact be an external schema or linked schema that was previously configured. For example, to reference a source table in a schema named "myLinkedSchema", you would use:
...
<transform id="step1" type="org.labkey.di.pipeline.TransformTask">
    <description>Copy to target</description>
    <source schemaName="myLinkedSchema" queryName="source" />
    <destination schemaName="etltest" queryName="target" />
</transform>
...
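The snippets in this topic omit the surrounding ETL definition (the "..." lines). As a minimal sketch of where a <transform> sits in a complete file, assuming the standard http://labkey.org/etl/xml namespace and using a made-up name and description, the basic task above could be wrapped like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
    <!-- Hypothetical name and description, for illustration only -->
    <name>Copy source to target</name>
    <description>Copies rows from etltest.source to etltest.target</description>
    <transforms>
        <!-- The transform element shown earlier in this topic, unchanged -->
        <transform id="step1" type="org.labkey.di.pipeline.TransformTask">
            <description>Copy to target</description>
            <source schemaName="etltest" queryName="source" />
            <destination schemaName="etltest" queryName="target" />
        </transform>
    </transforms>
</etl>
```

Additional <transform> elements, of any of the types described below, would be added inside the same <transforms> element.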
Remote Query Transform Step
In addition to supporting use of external schemas and linked schemas, ETL modules can access data through a remote connection to an alternate LabKey Server. The setup and syntax are as follows:
- To set up a remote connection, see Manage Remote Connections.
- The transform type is RemoteQueryTransformStep and your <source> element must include the remoteSource in addition to the schema and query name on that remoteSource as shown below:
...
<transform type="RemoteQueryTransformStep" id="step1">
    <source remoteSource="EtlTest_RemoteConnection" schemaName="study" queryName="etl source" />
    … <!-- the destination and other options for the transform are included here -->
</transform>
...
Note that using <deletedRowSource> with an <incrementalFilter> strategy does not support a remote connection.
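For illustration, here is one way the remote step might look with a destination filled in. This is only a sketch: the description text is made up, and the destination simply reuses the local etltest.target table from the basic transform example as an assumption about where the remote rows should land.

```xml
<transform type="RemoteQueryTransformStep" id="step1">
    <!-- Hypothetical description, for illustration only -->
    <description>Copy from remote server to local target</description>
    <!-- remoteSource names a remote connection that was configured beforehand -->
    <source remoteSource="EtlTest_RemoteConnection" schemaName="study" queryName="etl source" />
    <!-- Assumed destination, reused from the basic transform example -->
    <destination schemaName="etltest" queryName="target" />
</transform>
```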
Queue Job Task
To call one ETL from another, use a transform of type TaskRefTransformStep and include a <taskref> that refers to org.labkey.di.steps.QueueJobTask.
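As a rough sketch, such a step follows the same <taskref> pattern used by the report task later in this topic. The setting name "transformId" and the "{MyModule}/MyOtherTransform" value below are assumptions used for illustration (a module name in braces followed by the name of the ETL to queue); confirm the exact setting against the ETL queuing documentation for your server version.

```xml
<transform id="QueueTail" type="TaskRefTransformStep">
    <taskref ref="org.labkey.di.steps.QueueJobTask">
        <settings>
            <!-- Assumed setting name and placeholder value: identifies the ETL to queue -->
            <setting name="transformId" value="{MyModule}/MyOtherTransform"/>
        </settings>
    </taskref>
</transform>
```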
Learn more and see syntax examples in this topic:
Run Report Task
To run a report from an ETL, first create your report. You will need the reportID (number) and the names and expected values of any parameters to that report. In addition, the report must set the property "runInBackground" to "true" to be runnable from an ETL. In a <ReportName>.report.xml file, the ReportDescriptor would look like:
<ReportDescriptor descriptorType="rReportDescriptor" reportName="etlReport" xmlns="http://labkey.org/query/xml">
    <Properties>
        <Prop name="runInBackground">true</Prop>
    </Properties>
    <tags/>
</ReportDescriptor>
Within your ETL, include a transform of type TaskRefTransformStep that calls the <taskref> org.labkey.di.pipeline.RunReportTask. Syntax looks like:
...
<transform id="step1" type="TaskRefTransformStep">
    <taskref ref="org.labkey.di.pipeline.RunReportTask">
        <settings>
            <setting name="reportId" value="db:307"/>
            <setting name="myparam" value="myvalue"/>
        </settings>
    </taskref>
</transform>
...
Note: Currently only R reports are supported in this feature.
Add New TaskRefTask
The queueing and report running tasks above are implemented using the TaskRefTask. This is a very flexible mechanism, allowing you to provide any Java code to be run by the task on the pipeline thread. This task does not have input or output data from the ETL pipeline; it is simply a Java thread, with access to the Java APIs, that will run synchronously in the pipeline queue.
To add a new TaskRefTask, write your Java code in a module and reference it using syntax similar to the above for the RunReportTask. If the module includes "MyTask.java", syntax for calling it from an ETL would look like:
...
<transform id="step1" type="TaskRefTransformStep">
    <taskref ref="[Module path].MyTask">
        ...
    </taskref>
</transform>
...
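To make the placeholder concrete, here is a sketch that models a custom task on the RunReportTask example. Everything specific in it is hypothetical: the package org.labkey.mymodule, the class MyTask, and the two settings are invented for illustration, and which settings are accepted depends entirely on your own Java code.

```xml
<transform id="step1" type="TaskRefTransformStep">
    <!-- Hypothetical fully qualified class name for MyTask.java in your module -->
    <taskref ref="org.labkey.mymodule.MyTask">
        <settings>
            <!-- Hypothetical settings that the custom task would read -->
            <setting name="inputDirectory" value="/data/incoming"/>
            <setting name="dryRun" value="false"/>
        </settings>
    </taskref>
</transform>
```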
Stored Procedures
When working with a stored procedure, you use a <transform> of type StoredProcedure. Syntax looks like:
...
<transform id="ExtendedPatients" type="StoredProcedure">
    <description>Calculates date of death or last contact for a patient, and patient ages at events of interest</description>
    <procedure schemaName="patient" procedureName="PopulateExtendedPatients" useTransaction="true">
    </procedure>
</transform>
...
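The empty <procedure> element above is where values passed to the procedure would go. The sketch below assumes that the ETL schema accepts <parameter> child elements with name and value attributes; the parameter name and value shown are hypothetical, so check the stored procedure documentation for your server version before relying on this form.

```xml
<transform id="ExtendedPatients" type="StoredProcedure">
    <description>Calculates date of death or last contact for a patient, and patient ages at events of interest</description>
    <procedure schemaName="patient" procedureName="PopulateExtendedPatients" useTransaction="true">
        <!-- Hypothetical parameter: name and value are invented for illustration -->
        <parameter name="minimumAge" value="18"/>
    </procedure>
</transform>
```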
External Pipeline Task - Command Tasks
Once a command task has been registered in a pipeline task xml file, you can specify the task as an ETL step. In this example, "myEngineCommand.pipeline.xml" is already available. It could be incorporated into an ETL with syntax like this:
...
<transform id="ProcessingEngine" type="ExternalPipelineTask"
    externalTaskId="org.labkey.api.pipeline.cmd.CommandTask:myEngineCommand"/>
...
To see a listing of all the registered pipeline tasks on your server, including their respective taskIds:
- Select (Admin) > Site > Admin Console.
- Under Diagnostics, click Pipelines and Tasks.