In this step you will download and install:
- a basic workspace for working with ETL processes
- a working ETL module that can move data from the source database into the Patients table on your system.
- Download the folder archive:
- Download the server module:
Set Up ETL Workspace
In this step you will import a pre-configured workspace in which to develop ETL processes. (Note that there is nothing mandatory about the way this workspace has been put together -- your own ETL workspace may be different, depending on the needs of your project. This particular workspace has been configured especially for this tutorial as a shortcut
to avoid many set up steps, steps such as connecting to source datasets, adding an empty dataset to use as the target of ETL scripts, and adding ETL-related web parts.)
- Go a project where you have administrative privileges and log in.
- Create a subfolder of type "Study" to use as a workspace:
- Go to (Admin) > Folder > Management.
- Click Create Subfolder.
- On the Create Folder (Step 1) page, enter the Name "ETL Workspace".
- Under Folder Type, select Study.
- Click Next.
- On the Users/Permissions (Step 2) page, click Finish.
- Import ETLWorkspace.folder.zip into the folder:
- In the Study Overview panel, click Import Study.
- On the Folder Management page, confirm Local zip archive is selected and click Choose File.
- Select the folder archive that you downloaded: ETLWorkspace.folder.zip.
- Click Import Study.
- When the import is complete, click ETL Workspace to see the workspace.
You now have a workspace where you can develop ETL scripts. It includes:
- A LabKey Study with various datasets to use as data sources
- An empty dataset named Patients to use as a target destination
- The ETLs tab provides an area to manage and run your ETL processes. Notice that this tab contains three web parts:
- Data Transforms shows the available ETL processes. Currently it is empty because there are none defined.
- The Patients dataset (the target dataset for the process) is displayed, also empty because no ETL process has been run yet. When you run an ETL process in the next step the the empty Patients dataset will begin to fill with data.
- The Demographics dataset (the source dataset for this tutorial) is displayed with more than 200 records.
Add the ETL Module
ETL processes are added to LabKey Server as part of a "module". Modules are packets of functionality that are easy to distribute to other LabKey Servers. Modules can contain a wide range of functionality, not just ETL-related functionality. For example, they can include HTML pages, SQL queries, R script reports, and more. Module resources are for the most part "file-based", that is, they contain files such as .HTML, .SQL, and .R files which are deployed to the server and surfaced in various places in the user interface where users can interact with them. For deployment to the server, the module files are zipped up into a .zip archive, which is renamed as a ".module" file. In this case, the module you will deploy contains two resources:
- An ETL configuration file (called "FemaleARV.xml") which defines how the ETL process works
- A SQL query which defines the source data for the ETL process
To deploy the module:
- If you are a developer working with the LabKey Server source code directly, then:
- Copy the file etlModule.module to the directory /build/deploy/modules and restart the server.
- If you are working with an installer-based version of LabKey Server, then:
- Copy the file etlModule.module to the directory LABKEY_HOME/externalModules and restart the server. (On Windows you can restart the server using the Services panel.)
- Enable the module in your workspace folder:
- Log back into your restarted server with administrative privileges.
- In the "ETL Workspace" folder, go to (Admin) > Folder > Management.
- Click the Folder Type tab.
- In the Modules list (on the right) place a checkmark next to ETLModule.
- Click Update Folder.
- The ETL script is now ready to run. Notice it has been added to the list under Data Transforms.