If your ETL source and destination tables have different column names, you can configure a mapping between the columns, such that data from one column from the source will be loaded into the mapped column in the destination, even if it has a different name.
Column Mapping Example
For example, suppose you are working with the following tables:
|Source Table Columns|Destination Table Columns|
|---|---|
|ParticipantId|SubjectId|
|StartDate|Date|
|Gender|Sex|
|TreatmentGroup|Treatment|
|Cohort|Group|
If you want data from "ParticipantId" to be loaded into the column "SubjectId", add column mappings to your ETL configuration using a <columnTransforms> element, with a <column> element defining each name mapping. For example:
<source schemaName="study" queryName="Participants"/>
<destination schemaName="study" queryName="Subjects" targetOption="merge">
  <columnTransforms>
    <column source="ParticipantId" target="SubjectId"/>
    <column source="StartDate" target="Date"/>
    <column source="Gender" target="Sex"/>
    <column source="TreatmentGroup" target="Treatment"/>
    <column source="Cohort" target="Group"/>
  </columnTransforms>
</destination>
Column mapping is supported for both query and file destinations. Mapping one source column onto many destination columns is not supported.
Container columns can be used to integrate data across different containers within LabKey Server. For example, data gathered in one project can be referenced from other locations as if it were available locally. However, ETL processes are limited to running within a single container. You cannot map a target container column to anything other than the container in which the ETL process is run.
To assign a constant value to a given target column, define a constant in your ETL configuration .xml file. For example, this sample would write "schema1.0" into the sourceVersion column of every row processed:

<constants>
  <column name="sourceVersion" type="VARCHAR" value="schema1.0"/>
</constants>
If a column named "sourceVersion" also exists in the source query, the constant value specified in your ETL XML file is used instead of the source value.
Constants can be set at two levels:
- At the top level of your ETL XML: the constant is applied to every step in the ETL process.
- At an individual transform step: the constant is applied only to that step, and overrides any global constant that may have been set.
<destination schemaName="vehicle" queryName="etl_target">
  <constants>
    <column name="sourceVersion" type="VARCHAR" value="myStepValue"/>
  </constants>
</destination>
Creation and Modification Columns
- If the ETL destination table includes Created, CreatedBy, Modified, and/or ModifiedBy columns (included by default in most built-in LabKey data structures), the ETL framework automatically populates them: the timestamp columns with the time of the ETL run, and the user columns with the user who ran the ETL or enabled the ETL schedule.
- Created/CreatedBy columns are set the first time the ETL imports the row; Modified/ModifiedBy columns are updated whenever the row is subsequently updated in the destination.
- If the source query also includes values for Created, CreatedBy, Modified, or ModifiedBy, these source values will be imported in place of the automatically populated values, meaning that the destination will reflect the creation and modification of the row in the source, not the running of the ETL.
In summary, if the source and destination both include any of the columns Created, CreatedBy, Modified, or ModifiedBy, the source values will be carried into the destination columns of the same names.
If you want to retain both original data creation/modification information and the ETL process creation/modification information, add the data integration columns described below to your destination.
CreatedBy and ModifiedBy are integer columns that are lookups into the core.users table. When the source table is on another server, these user IDs will most likely not match the user IDs on the destination server, so extra handling is required, either in Java column transforms or in trigger scripts on the destination server, to look up the correct user ID.
If Created/Modified information is imported from the source system, the destination will show the same information, i.e. the time (and user ID) at which the data in that source row was created or modified. You may also want to record the timing and user ID of the ETL action itself, i.e. when the data was "moved" to the destination table, as opposed to when the data itself was modified in the source.
Adding the following data integration ('di') columns to your target table will enable integration with other related data and log information.
|Column Name|PostgreSQL Type|MS SQL Server Type|Notes|
|---|---|---|---|
|diModified|TIMESTAMP|DATETIME|When the destination row was modified via ETL. Values here may be updated in later data merges.|
|diModifiedBy|USERID|USERID|The user ID under which the row was modified. Values here may be updated in later data merges.|
|diCreated|TIMESTAMP|DATETIME|When the destination row was created via ETL. Values here are set when the row is first inserted via an ETL, and never updated afterwards.|
|diCreatedBy|USERID|USERID|The user ID under which the ETL was run to create the row. Values here are set when the row is first inserted via an ETL, and never updated afterwards.|
The value written to diTransformRunId will match the value written to the TransformRunId column in the table dataintegration.transformrun, indicating which ETL run was responsible for adding which rows of data to your target table.
Java developers can add a Java class to handle the transformation step of an ETL process. This Java class can validate, transform or perform some other action on the data values in the column. For example, you might use one to look up the userID from an email address in order to provide a valid link to the users table from a data import.
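As a minimal sketch of that email-to-userID lookup, the class below is hypothetical and stands outside the LabKey API: a plain Map substitutes for a query against the core.users table, and the class, method, and field names are illustrative only, not part of any LabKey interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the lookup logic a Java column transform
// might perform: resolve an email address from a source row to the user ID
// valid on the destination server. In a real transform the lookup would
// query core.users; here an in-memory Map stands in for that table.
public class EmailToUserIdTransform
{
    private final Map<String, Integer> usersByEmail;

    public EmailToUserIdTransform(Map<String, Integer> usersByEmail)
    {
        this.usersByEmail = usersByEmail;
    }

    // Returns the destination user ID for the given source value, or null
    // when no matching account exists so the column can be left blank.
    public Integer doTransform(Object inputValue)
    {
        if (inputValue == null)
            return null;
        return usersByEmail.get(inputValue.toString().toLowerCase());
    }

    public static void main(String[] args)
    {
        Map<String, Integer> users = new HashMap<>();
        users.put("analyst@example.com", 1042);
        EmailToUserIdTransform t = new EmailToUserIdTransform(users);
        System.out.println(t.doTransform("Analyst@example.com")); // prints 1042
    }
}
```

Returning null for unknown addresses (rather than throwing) is one reasonable choice when source rows may reference accounts that do not exist on the destination server; a real transform might instead log or reject such rows.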
For details and an example, see ETL: Examples.