The ETL itself runs on your LabKey Server. It can also call
Stored Procedures; scripts that will run on either the source or target database. These scripts live outside the ETL framework, but are referenced and sequenced within it, generally as part of a multistep ETL. Using stored procedures may be of interest if you are already using a legacy set of business rules or other SQL scripts. It also allows you to write in the native SQL for your data source, if that is of interest.
This topic describes how to use stored procedures in your etl. Companion topics cover
special behavior for specific database types.
Stored Procedures as Source Queries
Instead of extracting data directly from a source query and loading it into a target query, an ETL process can call one or more stored procedures that themselves move data from the source to the target (or the procedures can transform the data in some other way). For example, the following ETL process runs a stored procedure to populate the Patients table. If you are familiar with the
tutorial example, notice that here, the actual action, source, and destination are not visible in the ETL XML. Those details are assumed to be in the procedure named "PopulateExtendedPatients". It is good practice to name your stored procedures in an identifying way and use <description> elements to clarify the actions they will take.
<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
<name>Populate Patient Table</name>
<description>Populate Patients table with calculated and converted values.</description>
<transforms>
<transform id="ExtendedPatients" type="StoredProcedure">
<description>Calculates date of death or last contact for a patient, and patient ages at events of interest</description>
<procedure schemaName="patient" procedureName="PopulateExtendedPatients">
</procedure>
</transform>
</transforms>
<!-- run at 3:30am every day -->
<schedule><cron expression="0 30 3 * * ?"/></schedule>
</etl>
Use Transaction for Procedure
Wrap the call of the procedure in a transaction by setting the attribute
useTransaction to true. This is also the default setting, so that if any part of the stored procedure fails, any partially-completed changes would be rolled back.
...
<transform id="ExtendedPatients" type="StoredProcedure">
<procedure schemaName="patient" procedureName="PopulateExtendedPatients" useTransaction="true">
</procedure>
</transform>
...
For debugging purposes, you can turn off the transaction wrapper setting useTransaction to "false".
Pass Parameters to Stored Procedures
You can include <parameters> to stored procedures within the <procedure> element. The value you provide could be hard coded in the ETL, as shown here,
...
<transform id="checkToRun" type="StoredProcedure">
<procedure schemaName="patient" procedureName="someTestWork" useTransaction="false">
<parameter name="MyParam" value="4"/>
<parameter name="AnotherParam" value="before"/>
</procedure>
</transform>
...
You can include a setting for
noWorkValue on a parameter to block running of the procedure when the given parameter has that value. Learn more in this topic:
ETL: Check For Work From a Stored Procedure
Behavior/Examples for Different Database Types
Related Topics