ETL: Stored Procedures in MS SQL Server

2024-04-19

Premium Feature — Available with all Premium Editions of LabKey Server. Learn more or contact LabKey.

You can call a stored procedure as a transform step to leverage existing database resources.

Example - Normalize Data

The following ETL process uses the stored procedure normalizePatientData to modify the source data.

<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
<name>Target #1 (Normalize Gender Values - Stored Procedure)</name>
<description>Runs a stored procedure.</description>
<transforms>
<transform id="storedproc" type="StoredProcedure">
<description>Runs a stored procedure to normalize values in the Gender column.</description>
<procedure schemaName="target1" procedureName="normalizePatientData">
</procedure>
</transform>
</transforms>
</etl>

The stored procedure is shown below.

CREATE procedure [target1].[normalizePatientData] (@transformRunId integer)
as
begin
UPDATE Patients SET Gender='Female' WHERE (Gender='f' OR Gender='F');
UPDATE Patients SET Gender='Male' WHERE (Gender='m' OR Gender='M');
end
GO

Parameters

The <procedure> element can have <parameter> child elements that specify the initial seed values passed in as input/output parameters. Note that The "@" sign prefix for parameter names in the ETL xml configuration is optional.

<procedure … >
<parameter name="@param1" value="100" override="false"/>
<parameter name="@param2" value="200" override="false"/>
</procedure>

The output values of all input/output parameters are persisted in the database, and are used as input values for the next pass. These values take precedence over the initial seed values specified in the xml file. To reset and force the use of the value from the xml file, set the optional override attribute to "true".

<procedure schemaName="external" procedureName="etlTestRunBased">
<parameter name="@femaleGenderName" value="Female" override="false"/>
<parameter name="@maleGenderName" value="Male" override="false"/>
</procedure>

CREATE procedure [target1].[normalizePatientData] (@transformRunId integer,
@maleGenderName VARCHAR(25),
@femaleGenderName VARCHAR(25))
as
begin
UPDATE Patients SET Gender=@femaleGenderName WHERE (Gender='f' OR Gender='F');
UPDATE Patients SET Gender=@maleGenderName WHERE (Gender='m' OR Gender='M');
end
GO

Parameters - Special Processing

The following parameters are given special processing.

NameDirectionDatatypeNotes
@transformRunIdInputintAssigned the value of the current transform run id.
@filterRunIdInput or Input/OutputintFor RunFilterStrategy, assigned the value of the new transfer/transform to find records for. This is identical to SimpleQueryTransformStep’s processing. For any other filter strategy, this parameter is available and persisted for stored procedure to use otherwise. On first run, will be set to -1.
@filterStartTimestampInput or Input/OutputdatetimeFor ModifiedSinceFilterStrategy with a source query, this is populated with the IncrementalStartTimestamp value to use for filtering. This is the same as SimpleQueryTransformStep. For any other filter strategy, this parameter is available and persisted for stored procedure to use otherwise. On first run, will be set to NULL.
@filterEndTimestampInput or Input/OutputdatetimeFor ModifiedSinceFilterStrategy with a source query, this is populated with the IncrementalEndTimestamp value to use for filtering. This is the same as SimpleQueryTransformStep. For any other filter strategy, this parameter is available and persisted for stored procedure to use otherwise. On first run, will be set to NULL.
@containerIdInputGUID/Entity IDIf present, will always be set to the id for the container in which the job is run.
@rowsInsertedInput/OutputintShould be set within the stored procedure, and will be recorded as for SimpleQueryTransformStep. Initialized to -1. Note: The TransformRun.RecordCount is the sum of rows inserted, deleted, and modified.
@rowsDeletedInput/OutputintShould be set within the stored procedure, and will be recorded as for SimpleQueryTransformStep. Initialized to -1. Note: The TransformRun.RecordCount is the sum of rows inserted, deleted, and modified.
@rowsModifiedInput/OutputintShould be set within the stored procedure, and will be recorded as for SimpleQueryTransformStep. Initialized to -1. Note: The TransformRun.RecordCount is the sum of rows inserted, deleted, and modified.
@returnMsgInput/OutputvarcharIf output value is not empty or null, the string value will be written into the output log.
@debugInputbitConvenience to specify any special debug processing within the stored procedure. May consider setting this automatically from the Verbose flag.
Return CodespecialintAll stored procedures must return an integer value on exit. “0” indicates correct processing. Any other value will indicate an error condition and the run will be aborted.

To write to the ETL log file, use a 'print' statement inside the procedure.

Log Rows Modified

Use special parameters to log the number of rows inserted, changed, etc. as follows:

CREATE procedure [target1].[normalizePatientData] (@transformRunId integer
, @parm1 varchar(25) OUTPUT
, @gender varchar(25) OUTPUT
, @rowsInserted integer OUTPUT
, @rowCount integer OUTPUT
, @rowsDeleted integer OUTPUT
, @rowsModified integer OUTPUT
, @filterStartTimestamp datetime OUTPUT)
as
begin
SET @rowsModified = 0
UPDATE Patients SET Gender='Female' WHERE (Gender='f' OR Gender='F');
SET @rowsModified = @@ROWCOUNT
UPDATE Patients SET Gender='Male' WHERE (Gender='m' OR Gender='M');
SET @rowsModified += @@ROWCOUNT
end

Optional Source

An optional source must be used in combination with the RunFilterStrategy or ModifiedSinceFilterStrategy filter strategies.

<transforms>
<transform id="storedproc" type="StoredProcedure">
<description>
Runs a stored procedure to normalize values in the Gender column.
</description>
<!-- Optional source element -->
<!-- <source schemaName="study" queryName="PatientsWarehouse"/> -->
<procedure schemaName="target1" procedureName="normalizePatientData">
</procedure>
</transform>
</transforms>

Transactions

By default all stored procedures are wrapped as transactions, so that if any part of the procedure fails, any changes already made are rolled back. For debugging purposed, turn off the transaction wrapper setting useTransaction to "false":

<procedure schemaName="target1" procedureName="normalizePatientData" useTransaction="false">
</procedure>