ETL: Stored Procedures in PostgreSQL

2024-03-29

Premium Feature — Available with all Premium Editions of LabKey Server. Learn more or contact LabKey.

ETLs can call PostgreSQL functions as part of a transform step using a stored procedure. The general syntax for including stored procedures in ETLs is covered here: ETL: Stored Procedures. This topic covers requirements specific to Postgres functions.

ETL XML Configuration

The <procedure> element is included for a <transform> element of type StoredProcedure. In this example, the stored procedure is "PopulateExtended" in the "patient" schema and takes a parameter.

...
<transform id="callfunction" type="StoredProcedure">
<procedure schemaName="patient" procedureName="PopulateExtended">
<parameter name="inoutparam" value="before"/>
</procedure>
</transform>
...

Function and Parameter Requirements

PostgreSQL functions called by an ETL process must meet the following requirements:

  • The PostgreSQL function must be of return type record.
  • Parameter names, including the Special Processing parameters (see table below), are case-insensitive.
  • There can be an arbitrary number of custom INPUT and/or INPUT/OUTPUT parameters defined for the function.
  • There can be at most one pure OUTPUT parameter. This OUTPUT parameter must be named "return_status" and must be of type INTEGER. If present, the return_status parameter must be assigned a value of 0 for successful operation. Values > 0 are interpreted as error conditions.
  • Function overloading of differing parameter counts is not currently supported. There can be only one function (procedure) in the PostgreSQL database with the given schema & name combination.
  • Optional parameters in PostgreSQL are not currently supported. An ETL process using a given function must provide a value for every custom parameter defined in the function.
  • PostgreSQL does not have a "print" statement. Writing to the ETL log can be accomplished with a "RAISE NOTICE" statement, for example:
RAISE NOTICE '%', 'Test print statement logging';
  • The "@" sign prefix for parameter names in the ETL configuration xml is optional (for both SQL Server and PostgreSQL). When IN/OUT parameters are persisted in the dataintegration.transformConfiguration.transformState field, their names are consistent with their native dialect (an "@" prefix for SQL Server, no prefix for PostgreSQL).

Parameters - Special Processing

The following parameters are given special processing.

Note that the output values of INOUT's are persisted to be used as inputs on the next run.

NameDirectionDatatypeNotes
transformRunIdInputintAssigned the value of the current transform run id.
filterRunIdInput or Input/OutputintFor RunFilterStrategy, assigned the value of the new transfer/transform to find records for. This is identical to SimpleQueryTransformStep's processing. For any other filter strategy, this parameter is available and persisted for functions to use otherwise. On first run, will be set to -1.
filterStartTimestampInput or Input/OutputdatetimeFor ModifiedSinceFilterStrategy with a source query, this is populated with the IncrementalStartTimestamp value to use for filtering. This is the same as SimpleQueryTransformStep. For any other filter strategy, this parameter is available and persisted for functions to use otherwise. On first run, will be set to NULL.
filterEndTimestampInput or Input/OutputdatetimeFor ModifiedSinceFilterStrategy with a source query, this is populated with the IncrementalEndTimestamp value to use for filtering. This is the same as SimpleQueryTransformStep. For any other filter strategy, this parameter is available and persisted for functions to use otherwise. On first run, will be set to NULL.
containerIdInputGUID/Entity IDIf present, will always be set to the id for the container in which the job is run.
rowsInsertedInput/OutputintShould be set within the function, and will be recorded as for SimpleQueryTransformStep. Initialized to -1. Note: The TransformRun.RecordCount is the sum of rows inserted, deleted, and modified.
rowsDeletedInput/OutputintShould be set within the functions, and will be recorded as for SimpleQueryTransformStep. Initialized to -1. Note: The TransformRun.RecordCount is the sum of rows inserted, deleted, and modified.
rowsModifiedInput/OutputintShould be set within the functions, and will be recorded as for SimpleQueryTransformStep. Initialized to -1. Note: The TransformRun.RecordCount is the sum of rows inserted, deleted, and modified.
returnMsgInput/OutputvarcharIf output value is not empty or null, the string value will be written into the output log.
debugInputbitConvenience to specify any special debug processing within the stored procedure.
return_statusspecialintAll functions must return an integer value on exit. “0” indicates correct processing. Any other value will indicate an error condition and the run will be aborted.

Example PostgreSQL Function

CREATE OR REPLACE FUNCTION patient.PopulateExtended
(IN transformrunid integer
, INOUT rowsinserted integer DEFAULT 0
, INOUT rowsdeleted integer DEFAULT 0
, INOUT rowsmodified integer DEFAULT 0
, INOUT returnmsg character varying DEFAULT 'default message'::character varying
, IN filterrunid integer DEFAULT NULL::integer
, INOUT filterstarttimestamp timestamp without time zone DEFAULT NULL::timestamp without time zone
, INOUT filterendtimestamp timestamp without time zone DEFAULT NULL::timestamp without time zone
, INOUT runcount integer DEFAULT 1
, INOUT inoutparam character varying DEFAULT ''::character varying
, OUT return_status integer)
RETURNS record AS
$BODY$

BEGIN

/*
*
* Function logic here
*
*/

RETURN;

END;
$BODY$
LANGUAGE plpgsql;

Related Topics