If your ETL source and target tables have different column names, you can configure a mapping between the columns, such that data from one column will be loaded into the mapped column, even if it has a different name.
Column Mapping Example
If your source and target tables have different column names, you can configure a mapping so that data from one column will be loaded into the mapped column. For example, suppose you are working with the following tables:
|Source Table Columns||Target Table Columns|
If you want data from "ParticipantId" to be loaded into the column "SubjectId", use a transform like the following. Add column mappings to your ETL configuration using a <columnTransforms> element, with <column> elements to define each name mapping. For example:
<source schemaName="study" queryName="Participants"/>
<destination schemaName="study" queryName="Subjects" targetOption="merge">
<column source="ParticipantId" target="SubjectId"/>
<column source="StartDate" target="Date"/>
<column source="Gender" target="Sex"/>
<column source="TreatmentGroup" target="Treatment"/>
<column source="Cohort" target="Group"/>
Column mapping is supported for both query and file destinations. Mapping one source column onto many destination columns is not supported.
Container columns can be used to integrate data across different containers within LabKey Server. For example, data gathered in one project can be referenced from other locations as if it were available locally. However, ETL processes are limited to running within a single container. You cannot map a target container column to anything other than the container in which the ETL process is run.
To assign a constant value to a given target column, use a constant in your ETL configuration .xml file. For example, this sample would write "schema1.0" into the sourceVersion column of every row processed:
<column name="sourceVersion" type="VARCHAR" value="schema1.0"/>
If a column named "sourceVersion" exists in the source query, the constant value specified in your ETL xml file is used instead.
Constants can be set at both:
- The top level of your ETL xml: the constant is applied for every step in the ETL process.
- At an individual transform step level: the constant is only applied for that step and overrides any global constant that may have been set.
<destination schemaName="vehicle" queryName="etl_target">
<column name="sourceVersion" type="VARCHAR" value="myStepValue"/>
Creation and Modification Columns
If the source table includes the following columns, they will be populated in the target table with the same names:
If the source tables include values for these columns, they will be retained. CreatedBy and ModifiedBy are integer columns that are lookups into the core.users table. When the source table includes a username value for one of these fields, the matching user is found in the core.user table and that user ID value is used. If no matching user is found, a deactivated user will be generated on the LabKey side and the column populated with that new user ID.
If no data is provided for these columns, they will be populated with the time and user information from the running of the ETL process.
Adding the following data integration ('di') columns to your target table will enable integration with other related data and log information.
|Column Name||PostresSQL Type||MS SQL Server Type||Notes|
|diModified||TIMESTAMP||DATETIME||Values here may be updated in later data mergers.|
|diModifiedBy||USERID||USERID||Values here may be updated in later data mergers.|
|diCreated||TIMESTAMP||DATETIME||Values here are set when the row is first inserted via a ETL process, and never updated afterwards|
|diCreatedBy||USERID||USERID||Values here are set when the row is first inserted via a ETL process, and never updated afterwards|
The value written to diTransformRunId will match the value written to the TransformRunId column in the table dataintegration.transformrun, indicating which ETL run was responsible for adding which rows of data to your target table.
Transformation Java Classes
The ETL pipeline allows Java developers to add a transformation java class to a particular column. This Java class can validate, transform or perform some other action on the data values in the column. For details and an example, see ETL: Examples