Column Mapping

If your source and target tables have different column names, you can configure a mapping between the columns, such that data from one column will be loaded into the mapped column, even if it has a different name. For example, suppose you are working with the following tables:

Source Table ColumnsTarget Table Columns
ParticipantIdSubjectId
StartDateDate
GenderSex
TreatmentGroupTreatment
CohortGroup

Below we add a mapping such that data from "ParticipantId" is loaded into the column "SubjectId". Add column mappings to your ETL configuration using a <columnTransforms> element, with <column> elements to define each name mapping. For example:

<transform id="transform1">
<source schemaName="study" queryName="Participants"/>
<destination schemaName="study" queryName="Subjects" targetOption="merge">
<columnTransforms>
<column source="ParticipantId" target="SubjectId"/>
<column source="StartDate" target="Date"/>
<column source="Gender" target="Sex"/>
<column source="TreatmentGroup" target="Treatment"/>
<column source="Cohort" target="Group"/>
</columnTransforms>
</destination>
</transform>

Column mapping is supported for both query and file destinations. Mapping one source column onto many destination columns is not supported.

Container Columns

Container columns can be used to integrate data across different containers within LabKey Server. For example, data gathered in one project can be referenced from other locations as if it were available locally. However, ETL processes are limited to running within a single container. You cannot map a target container column to anything other than the container in which the ETL process is run.

Constants

To assign a constant value to a given target column, use a constant in your ETL configuration .xml file. For example, this sample would write "schema1.0" into the sourceVersion column of every row processed:

<constants>
<column name="sourceVersion" type="VARCHAR" value="schema1.0"/>
</constants>

If a column named "sourceVersion" exists in the source query, the constant value specified in your ETL xml file is used instead.

Constants can be set at both:

  • The top level of your ETL xml: the constant is applied for every step in the ETL process.
  • At an individual transform step level: the constant is only applied for that step and overrides any global constant that may have been set.
<destination schemaName="vehicle" queryName="etl_target">
<constants>
<column name="sourceVersion" type="VARCHAR" value="myStepValue"/>
</constants>
</destination>

Creation and Modification Columns

If the source table includes the following columns, they will be populated in the target table with the same names:

  • EntityId
  • Created
  • CreatedBy
  • Modified
  • ModifiedBy
If the source tables include values for these columns, they will be retained. CreatedBy and ModifiedBy are integer columns that are lookups into the core.users table. When the source table includes a username value for one of these fields, the matching user is found in the core.user table and that user ID value is used. If no matching user is found, a deactivated user will be generated on the LabKey side and the column populated with that new user ID.

If no data is provided for these columns, they will be populated with the time and user information from the running of the ETL process.

DataIntegration Columns

Adding the following data integration ('di') columns to your target table will enable integration with other related data and log information.

Column NamePostresSQL TypeMS SQL Server TypeNotes
diTransformRunIdINTINT 
diRowVersionTIMESTAMPDATETIME 
diModifiedTIMESTAMPDATETIMEValues here may be updated in later data mergers.
diModifiedByUSERIDUSERIDValues here may be updated in later data mergers.
diCreatedTIMESTAMPDATETIMEValues here are set when the row is first inserted via a ETL process, and never updated afterwards
diCreatedByUSERIDUSERIDValues here are set when the row is first inserted via a ETL process, and never updated afterwards

The value written to diTransformRunId will match the value written to the TransformRunId column in the table dataintegration.transformrun, indicating which ETL run was responsible for adding which rows of data to your target table.

Transformation Java Classes

The ETL pipeline allows Java developers to add a transformation java class to a particular column. This Java class can validate, transform or perform some other action on the data values in the column. For details and an example, see ETL: Examples

Reference

Discussion

previousnext
 
expand all collapse all