The data processing pipeline performs long-running, complex processing jobs in the background. Applications include:
  • Automating data upload
  • Performing bulk import of large data files
  • Performing sequential transformations on data during import to the system
Users can configure their own pipeline tasks, such as a custom R script pipeline, or use one of the predefined pipelines, which include study import, MS2 processing, and flow cytometry analysis.

The pipeline handles queuing and workflow of jobs when multiple users are processing large runs. It can be configured to provide notifications of progress, allowing the user or administrator to respond quickly to problems.

For example, an installation of LabKey Server at the Fred Hutch Cancer Research Center uses the data processing pipeline for daily automated upload and synchronization of datasets, including case report forms and specimen information stored at the lab level around the world. Its pipeline is also used for export/import of complete studies to transfer them between servers, such as staging and production servers.

View Data Pipeline Grid

The Data Pipeline grid displays information about current and past pipeline jobs. You can add a Data Pipeline web part to a page, or view the site-wide pipeline grid:

  • Select (Admin) > Site > Admin Console.
  • Click Admin Console Links.
  • Click Pipeline (in the Management section).

The pipeline grid shows a line for each current and past pipeline job. Options:

  • Click Process and Import Data to initiate a new job.
  • Use Setup to change file permissions, set up a pipeline override, and control email notifications.
  • The (Grid Views), (Charts and Reports), and (Export) options are available, as on other grids. You may need to pause the refresh to use these options.
  • Select the checkbox for a row to enable Retry, Delete, Cancel, and Complete options for that job.
  • (Pause) will pause the refreshing of status on this grid.
  • Click (Print) to generate a printout of the status grid.

Pause Grid Refresh

The pipeline status grid refreshes every 15 seconds. If you wish to use the grid customizer or create charts of current status, you can pause this refresh by clicking the (Pause) icon. Note that the pipeline jobs themselves are not paused; only the refresh of the grid displaying current status is.

After making your changes using (Grid Views) > Customize Grid, resume the pipeline status refresh by clicking (Play).
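
The difference between pausing the grid and pausing the jobs can be sketched in a few lines of Python. This is an illustrative model only, not LabKey code: the `StatusGrid` class, its fields, and the job names are hypothetical, and the 15-second interval is the only detail taken from the text above.

```python
from dataclasses import dataclass, field

REFRESH_INTERVAL_SECONDS = 15  # refresh interval stated in the text above


@dataclass
class StatusGrid:
    """Hypothetical model of a grid that displays a snapshot of job statuses."""
    paused: bool = False
    displayed: dict = field(default_factory=dict)

    def refresh(self, live_statuses: dict) -> None:
        # While paused, keep the last snapshot; the jobs themselves are unaffected.
        if not self.paused:
            self.displayed = dict(live_statuses)


live = {"job-1": "RUNNING"}
grid = StatusGrid()
grid.refresh(live)            # grid shows the current status

grid.paused = True            # user clicks (Pause)
live["job-1"] = "COMPLETE"    # the job keeps running and finishes
grid.refresh(live)            # ignored: the grid keeps its old snapshot

grid.paused = False           # user clicks (Play)
grid.refresh(live)            # grid catches up with the live status
```

In this model the paused grid can show stale statuses, which is why resuming the refresh after customizing the grid matters.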

Initiate a Pipeline Job

  • From the pipeline status grid, click Process and Import Data. You will see the current contents of the pipeline root. Drag and drop additional files to upload them.
  • Navigate to and select the intended file or folder. If you navigate into a subdirectory tree to find the intended files, the pipeline file browser will remember that location when you return to import other files later.
  • Click Import.

Delete a Pipeline Job

To delete a pipeline job, click the checkbox for the row on the data pipeline grid, and click (Delete). You will be asked to confirm the deletion.

If the job generated associated experiment runs, you will have the option to delete them at the same time via checkboxes. In addition, if no files in the pipeline analysis directory are in use when the pipeline job is deleted (that is, none are attached to runs as inputs or outputs), the analysis directory is removed from the pipeline root. The files are not actually deleted; they are moved to a ".deleted" directory that is hidden from the file browser.
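
The "move instead of delete" behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's implementation: the function name `soft_delete_analysis_dir`, the `files_in_use` set, and the directory layout are hypothetical, and only the ".deleted" directory name comes from the text.

```python
import shutil
import tempfile
from pathlib import Path


def soft_delete_analysis_dir(pipeline_root: Path, analysis_dir: Path, files_in_use: set) -> bool:
    """Move analysis_dir under <pipeline_root>/.deleted unless one of its files is in use.

    Returns True if the directory was moved, False if it was left in place.
    """
    contents = {p for p in analysis_dir.rglob("*") if p.is_file()}
    if contents & files_in_use:
        return False  # some file is still attached to a run as an input or output
    trash = pipeline_root / ".deleted"
    trash.mkdir(exist_ok=True)
    # Simplification: assumes no directory of the same name is already in .deleted.
    shutil.move(str(analysis_dir), str(trash / analysis_dir.name))
    return True


# Usage with a throwaway directory standing in for the pipeline root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    analysis = root / "analysis1"
    analysis.mkdir()
    (analysis / "result.tsv").write_text("a\tb\n")

    moved = soft_delete_analysis_dir(root, analysis, files_in_use=set())
    print(moved, (root / ".deleted" / "analysis1" / "result.tsv").exists())
```

Because the files are moved rather than removed, they remain on disk under ".deleted" even though they no longer appear in the file browser.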

Cancel a Pipeline Job

To cancel a pipeline job, select the checkbox for the intended row and click Cancel. The job status will be set to "CANCELLING/CANCELLED" and execution will be halted.

Use Pipeline Override to Mount a File Directory

You can configure a pipeline override to identify a specific location where files are stored for use by the pipeline.

Set Up Email Notifications (Optional)

If you or others wish to be notified when a pipeline job succeeds or fails, you can configure email notifications at the site, project, or folder level. Email notification settings are inherited by default, but this inheritance may be overridden in child folders.

  • Open the Email Notifications panel at the desired level:
    • At the site level, select Admin > Site > Admin Console. Click Admin Console Links. Under Management, click Pipeline Email Notification.
    • At the project or folder level, select Admin > Go To Module > Pipeline, then click Setup.
  • Check the appropriate box(es) to configure notification emails to be sent when a pipeline job succeeds and/or fails.
  • Check the "Send to owner" box to automatically notify the user initiating the job.
  • Add additional email addresses and select the frequency and timing of notifications.
  • In the case of pipeline failure, there is a second option to define a list of escalation users. If configured, these users can be notified from the pipeline job details view directly using the Escalate Job Failure button.
  • Click Update.
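
The inheritance of notification settings across the site, project, and folder levels can be pictured with a small lookup sketch. Everything here is hypothetical and for illustration only: the `configured` dictionary, the setting names, and the container paths are invented, and the sketch assumes that a level with its own settings overrides the inherited ones as a whole rather than key by key.

```python
from typing import Optional

# Hypothetical settings store: container path -> explicitly configured settings.
# A container with no entry inherits from its parent.
configured = {
    "/": {"notify_on_failure": True, "notify_on_success": False},  # site level
    "/ProjectA/Assays": {"notify_on_failure": True, "notify_on_success": True},
}


def parent_of(path: str) -> Optional[str]:
    """Return the parent container path, or None for the site root."""
    if path == "/":
        return None
    head = path.rsplit("/", 1)[0]
    return head or "/"


def effective_settings(path: str) -> dict:
    """Walk up from the folder until a level with explicit settings is found."""
    current: Optional[str] = path
    while current is not None:
        if current in configured:
            return configured[current]
        current = parent_of(current)
    return {}


# "/ProjectA" has no settings of its own, so it inherits the site-level values,
# while "/ProjectA/Assays" overrides them.
print(effective_settings("/ProjectA"))
print(effective_settings("/ProjectA/Assays"))
```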
