Suppose you wanted to expand the Patients dataset to also include male participants who are "Natural Controllers" of HIV.

To do this, we add a SQL query that returns a selection of records from the Demographics table, in particular all Male participants who are Natural Controllers.

And we'll create a new ETL process from scratch, drawing on the new SQL query.

Create a New Source Query

  • Locate the source code for the ETL module. Depending on where you deployed it, go to either <LABKEY_HOME>/externalModules/etlmodule or <LABKEY_HOME>build/deploy/modules/etlmodule.
  • Within that folder, go to the subdirectory ./queries/study.
  • In that directory, create a file named "MaleNC.sql". Note: our sample includes this file with the extension ".sql-removeToUse" if you prefer to simply rename it instead of creating it anew.
  • Open the file in a text editor and copy and paste the following code into the file:
SELECT Demographics.ParticipantId,
FROM Demographics
WHERE Demographics.Gender = 'm' AND Demographics.TreatmentGroup = 'Natural Controller'
  • Save the file.
  • Restart the server.
  • Log back in as an administrator and return to the "ETL Workspace" folder.

Create a New ETL Process

ETL processes are defined by XML configuration files that specify the data source, the data target, and other properties. Here we create a new configuration that draws from the query we just created above.

  • In the etlmodule/etls directory, create a new XML file called "MaleNC.xml".
  • Copy the following into MaleNC.xml, and save. Note: Again, you can instead simply remove "-removeToUse" from the file included with our sample."
<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="">
<name>Demographics >>> Patients (Males)</name>
<description>Update data for study on male patients.</description>
<transform id="males">
<source schemaName="study" queryName="MaleNC"/>
<destination schemaName="study" queryName="Patients" targetOption="merge"/>
<poll interval="1h"/>
  • Notice that this configuration file has our query (MaleNC) as its source, and the Patients query as its target.
  • Refresh the browser. Notice that the system will add your new module to the server. Click Next if necessary to complete the update.
  • In the "ETL Workspace" folder, notice our new ETL process is now listed in the Data Transforms web part.

Run the ETL Process

  • Click Run Now next to the new process name. You will need to have signed in to see this button.
  • Refresh in the pipeline window until the job completes, then click the ETL Workspace link.
  • New records will have been copied to the Patients table, making a total of 43 records (42 if you skipped the step of changing the gender of a participant in the source data during the previous tutorial step).

Previous Step


expand all collapse all