ETL: Column Mapping: /home/

ETL: Column Mapping

Column Mapping
Constants
Creation and Modification Columns
Data Integration Columns
Transformation Java Classes

Column Mapping

If your source and target tables have different column names, you can configure a mapping between the columns, such that data from one column will be loaded into the mapped column, even if it has a different name. For example, suppose you are working with the following tables:

Source Table Columns	Target Table Columns
ParticipantId	SubjectId
StartDate	Date
Gender	Sex
TreatmentGroup	Treatment
Cohort	Group

Below we add a mapping such that data from "ParticipantId" is loaded into the column "SubjectId". Add column mappings to your ETL configuration using a <columnTransforms> element, with <column> elements to define each name mapping. For example:

<transform id="transform1">
    <source schemaName="study" queryName="Participants"/>
    <destination schemaName="study" queryName="Subjects" targetOption="merge">
        <columnTransforms>
            <column source="ParticipantId" target="SubjectId"/>
            <column source="StartDate" target="Date"/>
            <column source="Gender" target="Sex"/>
            <column source="TreatmentGroup" target="Treatment"/>
            <column source="Cohort" target="Group"/>
        </columnTransforms>
    </destination>
</transform>

Column mapping is supported for both query and file destinations. Mapping one source column onto many destination columns is not supported.

Container Columns

Container columns can be used to integrate data across different containers within LabKey Server. For example, data gathered in one project can be referenced from other locations as if it were available locally. However, ETL processes are limited to running within a single container. You cannot map a target container column to anything other than the container in which the ETL process is run.

Constants

To assign a constant value to a given target column, use a constant in your ETL configuration .xml file. For example, this sample would write "schema1.0" into the sourceVersion column of every row processed:

<constants>
   <column name="sourceVersion" type="VARCHAR" value="schema1.0"/>
</constants>

If a column named "sourceVersion" exists in the source query, the constant value specified in your ETL xml file is used instead.

Constants can be set at both:

The top level of your ETL xml: the constant is applied for every step in the ETL process.
At an individual transform step level: the constant is only applied for that step and overrides any global constant that may have been set.

<destination schemaName="vehicle" queryName="etl_target">
   <constants>
       <column name="sourceVersion" type="VARCHAR" value="myStepValue"/>
   </constants>
</destination>

Creation and Modification Columns

If the source table includes the following columns, they will be populated in the target table with the same names:

EntityId
Created
CreatedBy
Modified
ModifiedBy

If the source tables include values for these columns, they will be retained. CreatedBy and ModifiedBy are integer columns that are lookups into the core.users table. When the source table includes a username value for one of these fields, the matching user is found in the core.user table and that user ID value is used. If no matching user is found, a deactivated user will be generated on the LabKey side and the column populated with that new user ID.

If no data is provided for these columns, they will be populated with the time and user information from the running of the ETL process.

DataIntegration Columns

Adding the following data integration ('di') columns to your target table will enable integration with other related data and log information.

Column Name	PostresSQL Type	MS SQL Server Type	Notes
diTransformRunId	INT	INT
diRowVersion	TIMESTAMP	DATETIME
diModified	TIMESTAMP	DATETIME	Values here may be updated in later data mergers.
diModifiedBy	USERID	USERID	Values here may be updated in later data mergers.
diCreated	TIMESTAMP	DATETIME	Values here are set when the row is first inserted via a ETL process, and never updated afterwards
diCreatedBy	USERID	USERID	Values here are set when the row is first inserted via a ETL process, and never updated afterwards

The value written to diTransformRunId will match the value written to the TransformRunId column in the table dataintegration.transformrun, indicating which ETL run was responsible for adding which rows of data to your target table.

Transformation Java Classes

The ETL pipeline allows Java developers to add a transformation java class to a particular column. This Java class can validate, transform or perform some other action on the data values in the column. For details and an example, see ETL: Examples

Reference

ETL XML Reference

LabKey Support

LabKey Support