You can set up a stored procedure as a gating procedure within an ETL process by adding a 'noWorkValue' attribute to a 'parameter' element. The stored procedure is used to check if there is work for the ETL job to do. If the output value of StagingControl parameter is equal to its noWorkValue, it indicates to the system that there is no work for the ETL job to do, and any following transforms will be not be run, otherwise subsequence transforms will be run. In the following example, the transform "checkToRun" controls whether the following transform "queuedJob" will run.

<transform id="checkToRun" type="StoredProcedure">
<procedure schemaName="patient" procedureName="workcheck" useTransaction="false">
<parameter name="StagingControl" value="1" noWorkValue="-1"/>
</procedure>
</transform>
<transform id="queuedJob">
<source schemaName="patient_source" queryName="etl_source" />
<destination schemaName="patient_target" queryName="Patients" targetOption="merge"/>
</transform>

The noWorkValue can either be a hard-coded string (for example, "-1", shown above), or you can use a substitution syntax to indicate a comparison should be against the input value of a certain parameter.

For example, the following parameter indicates there is no work for the ETL job if the output batchId is the same as the output parameter persisted from the previous run.

<parameter name="batchId" noWorkValue="${batchId}"/>

Example

In the ETL transform below, the gating procedure checks if there is a new ClientStagingControlID to process. If there is, the ETL job goes into the queue. When the job starts, the procedure is run again in the normal job context; the new ClientStagingControlID is returned again. The second time around, the output value is persisted into the global space, so further procedures can use the new value. Because the gating procedure is run twice, don’t use this with stored procedures that have other data manipulation effects! There can be multiple gating procedures, and each procedure can have multiple gating params, but during the check for work, modified global output param values are not shared between procedures.

<transform id="CheckForWork" type="StoredProcedure">
<description>Check for new batch</description>
<procedure schemaName="patient" procedureName="GetNextClientStagingControlID">
<parameter name="ClientStagingControlID" value="-1" scope="global" noWorkValue="${ClientStagingControlID}"/>
<parameter name="ClientSystem" value="LabKey-nlp-01" scope="global"/>
<parameter name="StagedTable" value="PathOBRX" scope="global"/>
</procedure>
</transform>

Discussion

previousnext
 
expand all collapse all