In this step you will download and install a basic workspace for working with ETL processes. In our example, the data warehouse is represented by another table on the same server, rather than requiring configuration of multiple machines.
Set Up ETL Workspace
In this step you will import a pre-configured workspace in which to develop ETL processes. There is nothing mandatory about the way this tutorial workspace has been put together -- your own ETL workspace may be different, depending on the needs of your project. This particular workspace has been configured especially for this tutorial as a shortcut
to avoid many set up steps, steps such as connecting to source datasets, adding an empty dataset to use as the target of ETL scripts, and adding ETL-related web parts.
- Download the folder archive:
- Log in to your server and navigate to your "Tutorials" project. Create it if necessary.
- If you don't already have a server to work on where you can create projects, start here.
- If you don't know how to create projects and folders, review this topic.
- Create a new subfolder named "ETL Workspace". Choose the folder type "Study" and accept other defaults.
- Import ETLWorkspace.folder.zip into the folder:
- In the Study Overview panel, click Import Study.
- On the Folder Management page, confirm Local zip archive is selected and click Choose File.
- Select the folder archive that you downloaded: ETLWorkspace.folder.zip.
- Click Import Study.
- When the import is complete, click ETL Workspace to see the workspace. (You may need to refresh your browser to see the complete status.)
You now have a workspace where you can develop ETL scripts. It includes:
- A LabKey Study with various datasets to use as data sources
- An empty dataset named Patients to use as a target destination
- The ETL Workspace tab provides an area to manage and run your ETL processes. Notice that this tab contains three web parts:
- Data Transforms shows the available ETL processes. Currently it is empty because there are none defined.
- The Patients dataset (the target dataset for the process) is displayed, also empty because no ETL process has been run yet. When you run an ETL process in the next step the the empty Patients dataset will begin to fill with data.
- The Demographics dataset (the source dataset for this tutorial) is displayed with more than 200 records.
Create an ETL