If your ETL source and destination tables have different column names, you can configure a mapping between the columns, such that data from one column from the source will be loaded into the mapped column in the destination, even if it has a different name.
If your source and target tables have different column names, you can configure a mapping so that data from one column will be loaded into the mapped column. For example, suppose you are working with the following tables:
Source Table Columns | Destination Table Columns |
---|---|
ParticipantId | SubjectId |
StartDate | Date |
Gender | Sex |
TreatmentGroup | Treatment |
Cohort | Group |
If you want data from "ParticipantId" to be loaded into the column "SubjectId", use a transform like the following. Add column mappings to your ETL configuration using a <columnTransforms> element, with <column> elements to define each name mapping. For example:
<transform id="transform1">
<source schemaName="study" queryName="Participants"/>
<destination schemaName="study" queryName="Subjects" targetOption="merge">
<columnTransforms>
<column source="ParticipantId" target="SubjectId"/>
<column source="StartDate" target="Date"/>
<column source="Gender" target="Sex"/>
<column source="TreatmentGroup" target="Treatment"/>
<column source="Cohort" target="Group"/>
</columnTransforms>
</destination>
</transform>
Column mapping is supported for both query and file destinations. Mapping one source column onto many destination columns is not supported.
Container columns can be used to integrate data across different containers within LabKey Server. For example, data gathered in one project can be referenced from other locations as if it were available locally. However, ETL processes are limited to running within a single container. You cannot map a target container column to anything other than the container in which the ETL process is run.
To assign a constant value to a given target column, use a constant in your ETL configuration .xml file. For example, this sample would write "schema1.0" into the sourceVersion column of every row processed:
<constants>
<column name="sourceVersion" type="VARCHAR" value="schema1.0"/>
</constants>
If a column named "sourceVersion" exists in the source query, the constant value specified in your ETL xml file is used instead.
Constants can be set at both:
<destination schemaName="vehicle" queryName="etl_target">
<constants>
<column name="sourceVersion" type="VARCHAR" value="myStepValue"/>
</constants>
</destination>
If importing Created/Modified information from the source system, the destination will show the same information, i.e. the time (and user Id) for the creation or modification of the data in that source row. You may also want to record information about the timing and userID for the ETL action itself, i.e. when that data was "moved" to the destination table, as opposed to when the data itself was modified in the source.
Adding the following data integration ('di') columns to your target table will enable integration with other related data and log information.
Column Name | Type | Notes |
---|---|---|
diTransformRunId | INT | |
diRowVersion | TIMESTAMP | |
diModified | TIMESTAMP | When the destination row was modified via ETL. Values here may be updated in later data merges. |
diModifiedBy | USERID | The userID under which the row was modified. Values here may be updated in later data merges. |
diCreated | TIMESTAMP | When the destination row was created via ETL. Values here are set when the row is first inserted via an ETL, and never updated afterwards. |
diCreatedBy | USERID | The userID under which that ETL was run to create the row. Values here are set when the row is first inserted via an ETL, and never updated afterwards |
The value written to diTransformRunId will match the value written to the TransformRunId column in the table dataintegration.transformrun, indicating which ETL run was responsible for adding which rows of data to your target table.
Java developers can add a Java class to handle the transformation step of an ETL process. This Java class can validate, transform or perform some other action on the data values in the column. For example, you might use one to look up the userID from an email address in order to provide a valid link to the users table from a data import.
For details and an example, see ETL: Examples