Data Processing Pipeline

The data processing pipeline performs long-running, complex processing jobs in the background. Applications include:
  • Automating data upload
  • Performing bulk import of large data files
  • Performing sequential transformations on data during import to the system
Users can configure their own pipeline tasks, such as a custom R script pipeline, or use one of the predefined pipelines, which include study import, MS2 processing, and flow cytometry analysis.
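
The idea of sequential transformations during import can be illustrated with a minimal Python sketch. It is not LabKey's task API: the row format (a dict of column name to string value) and the task functions `strip_whitespace`, `drop_empty_rows`, and `uppercase_ids` are hypothetical. Each task receives the output of the previous one, so the order of the task list defines the order of processing.

```python
from typing import Callable, Dict, List

# A row is modeled as a dict of column name -> string value (an assumption).
Row = Dict[str, str]
Task = Callable[[List[Row]], List[Row]]


def strip_whitespace(rows: List[Row]) -> List[Row]:
    """Hypothetical task: trim whitespace around every value."""
    return [{k: v.strip() for k, v in row.items()} for row in rows]


def drop_empty_rows(rows: List[Row]) -> List[Row]:
    """Hypothetical task: discard rows in which every value is empty."""
    return [row for row in rows if any(row.values())]


def uppercase_ids(rows: List[Row]) -> List[Row]:
    """Hypothetical task: normalize the participant_id column to upper case."""
    return [{**row, "participant_id": row["participant_id"].upper()} for row in rows]


def run_pipeline(rows: List[Row], tasks: List[Task]) -> List[Row]:
    """Apply each task in order, feeding each one the previous task's output."""
    for task in tasks:
        rows = task(rows)
    return rows


raw = [
    {"participant_id": " pt-01 ", "visit": " 1"},
    {"participant_id": " ", "visit": ""},
]
clean = run_pipeline(raw, [strip_whitespace, drop_empty_rows, uppercase_ids])
print(clean)  # [{'participant_id': 'PT-01', 'visit': '1'}]
```

Because each task is a plain function over the whole batch, a step can be added, removed, or reordered without touching the others.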

The pipeline handles queuing and workflow of jobs when multiple users are processing large runs. It can be configured to provide notifications of progress, allowing the user or administrator to respond quickly to problems.
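
To illustrate queuing and progress notification, the following is a minimal sketch of a first-in, first-out job queue shared by several users. It assumes a single worker, and the `Job` and `JobQueue` classes and the status names are hypothetical; LabKey's actual scheduler is more involved. The `on_status_change` callback stands in for a progress notification.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Job:
    job_id: int
    owner: str
    description: str
    status: str = "WAITING"  # illustrative status names


@dataclass
class JobQueue:
    """Hypothetical first-in, first-out queue shared by all users."""

    on_status_change: Callable[[Job], None]  # stands in for progress notifications
    pending: Deque[Job] = field(default_factory=deque)
    next_id: int = 1

    def submit(self, owner: str, description: str) -> Job:
        job = Job(self.next_id, owner, description)
        self.next_id += 1
        self.pending.append(job)
        self.on_status_change(job)
        return job

    def run_next(self, work: Callable[[Job], None]) -> None:
        """Run the oldest waiting job and report each status change."""
        if not self.pending:
            return
        job = self.pending.popleft()
        job.status = "RUNNING"
        self.on_status_change(job)
        try:
            work(job)
            job.status = "COMPLETE"
        except Exception:
            job.status = "ERROR"  # a failure in one job does not stop the queue
        self.on_status_change(job)


events: List[str] = []
job_queue = JobQueue(on_status_change=lambda j: events.append(f"{j.job_id}:{j.owner}:{j.status}"))
job_queue.submit("alice", "MS2 search")
job_queue.submit("bob", "study import")


def process(job: Job) -> None:
    if job.owner == "bob":
        raise RuntimeError("malformed input file")  # simulate a failing job


job_queue.run_next(process)
job_queue.run_next(process)
print(events)
```

Jobs run in submission order regardless of who submitted them, and every status change produces an event that an administrator could act on.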

For example, an installation of LabKey Server at the Fred Hutchinson Cancer Research Center uses the data processing pipeline for daily automated upload and synchronization of datasets, including case report forms and specimen information stored by labs around the world. Its pipeline is also used to export and import complete studies in order to transfer them between servers, such as staging and production servers.

View Data Pipeline Grid

The Data Pipeline grid displays information about current and past pipeline jobs. You can add a Data Pipeline web part to a page, or view the site-wide pipeline grid:

  • Select Admin > Site > Admin Console.
  • Click Pipeline.
  • Select the checkbox for a row to enable Retry, Delete, and Cancel options for that job.
  • Click Process and Import Data to initiate a new job.
  • Navigate to and select the intended file or folder. If you navigate into a subdirectory tree to find the intended files, the pipeline file browser will remember that location when you return to import other files later.
  • Click Import.

Delete a Pipeline Job

To delete a pipeline job, select the checkbox for the row on the data pipeline grid and click Delete. You will be asked to confirm the deletion.

If the job generated experiment runs, you have the option to delete them at the same time. In addition, if none of the files in the pipeline analysis directory are in use when the job is deleted (i.e., attached to runs as inputs or outputs), the analysis directory is deleted from the pipeline root. The files are not permanently deleted; they are moved to a ".deleted" directory that is hidden from the file browser.
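
The cleanup rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not LabKey's implementation: the function `delete_analysis_dir` is hypothetical, the set of files attached to experiment runs is passed in directly rather than looked up from the server, and the `.deleted` directory is assumed to sit directly under the pipeline root.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Set


def delete_analysis_dir(pipeline_root: Path, analysis_dir: Path, files_used_by_runs: Set[Path]) -> bool:
    """Move analysis_dir under <pipeline_root>/.deleted unless a run still uses one of its files.

    Returns True if the directory was moved, False if it was left in place.
    """
    used = {p.resolve() for p in files_used_by_runs}
    if any(f.resolve() in used for f in analysis_dir.rglob("*") if f.is_file()):
        return False  # still referenced as a run input or output: keep it
    trash = pipeline_root / ".deleted"  # leading dot: conventionally treated as hidden
    trash.mkdir(exist_ok=True)
    shutil.move(str(analysis_dir), str(trash / analysis_dir.name))
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    kept_dir = root / "analysis" / "run1"
    freed_dir = root / "analysis" / "run2"
    for d in (kept_dir, freed_dir):
        d.mkdir(parents=True)
        (d / "results.txt").write_text("example output")
    used_files = {kept_dir / "results.txt"}  # run1's output is still attached to a run
    kept = delete_analysis_dir(root, kept_dir, used_files)
    moved = delete_analysis_dir(root, freed_dir, used_files)
    in_trash = (root / ".deleted" / "run2" / "results.txt").exists()
    print(kept, moved, in_trash)  # False True True
```

Moving rather than removing the directory keeps the operation recoverable, which matches the behavior described above.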

Cancel a Pipeline Job

To cancel a pipeline job, select the checkbox for the intended row and click Cancel. The job status is set to "CANCELLED" and execution is halted.
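
One common way to halt a running job is cooperative cancellation: the cancel request records the new status and sets a flag that the worker checks between steps. The sketch below assumes that model; the `CancellableJob` class, its step list, and the `do_step` callback are hypothetical and do not describe how LabKey stops its jobs internally.

```python
import threading
from typing import Callable, List


class CancellableJob:
    """Hypothetical job that checks a cancel flag between processing steps."""

    def __init__(self, steps: List[str]) -> None:
        self.steps = steps
        self.completed: List[str] = []
        self.status = "WAITING"
        self._cancelled = threading.Event()
        self._lock = threading.Lock()  # guards status changes across threads

    def cancel(self) -> None:
        """Set the status to CANCELLED and signal the worker to halt."""
        with self._lock:
            if self.status == "COMPLETE":
                return  # in this sketch, a finished job keeps its final status
            self.status = "CANCELLED"
            self._cancelled.set()

    def run(self, do_step: Callable[[str], None]) -> None:
        with self._lock:
            if self._cancelled.is_set():
                return  # cancelled before it started
            self.status = "RUNNING"
        for step in self.steps:
            if self._cancelled.is_set():
                return  # halt before starting the next step
            do_step(step)
            self.completed.append(step)
        with self._lock:
            if not self._cancelled.is_set():
                self.status = "COMPLETE"


job = CancellableJob(["load", "transform", "import"])


def do_step(step: str) -> None:
    if step == "transform":
        job.cancel()  # simulate a user clicking Cancel while this step runs


job.run(do_step)
print(job.status, job.completed)  # CANCELLED ['load', 'transform']
```

With this design a step that is already running finishes, but no further steps start after the cancel request.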

Use Pipeline Override to Mount a File Directory

You can configure a pipeline override to point the pipeline at a specific directory where the files it uses are stored.
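
As a minimal sketch of what an override changes, the hypothetical function `resolve_pipeline_root` below picks the directory the pipeline uses. It assumes that, without an override, the pipeline uses a default file root, and that an override must name an existing directory; both are assumptions made for illustration, and the default path shown is invented.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_pipeline_root(default_root: Path, override: Optional[Path]) -> Path:
    """Return the directory the pipeline should use for its files.

    Assumptions for this sketch: an override replaces the default root
    entirely, and it must name a directory that already exists.
    """
    if override is None:
        return default_root
    if not override.is_dir():
        raise FileNotFoundError(f"pipeline override is not a directory: {override}")
    return override


default = Path("/labkey/files/MyProject")  # hypothetical default file root
without_override = resolve_pipeline_root(default, None)
with tempfile.TemporaryDirectory() as mounted_share:  # stands in for a mounted share
    with_override = resolve_pipeline_root(default, Path(mounted_share))
print(without_override, with_override)
```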

Set Up Email Notifications (Optional)

If you or others wish to be notified when a pipeline job succeeds or fails, you can configure email notifications at the site, project, or folder level. Email notification settings are inherited by default, but this inheritance may be overridden in child folders.
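
Inheritance with per-folder overrides amounts to walking up the folder hierarchy until an explicit setting is found. The sketch below assumes folder paths like `/Project/Folder`, hypothetical setting names (`on_success`, `on_failure`), and whole-setting replacement: a child's explicit settings replace the parent's rather than merging key by key.

```python
from typing import Dict

Settings = Dict[str, bool]


def effective_settings(folder: str, explicit: Dict[str, Settings], site_default: Settings) -> Settings:
    """Return the nearest explicit settings, walking from the folder up to the site level."""
    parts = [p for p in folder.strip("/").split("/") if p]
    for depth in range(len(parts), 0, -1):
        candidate = "/" + "/".join(parts[:depth])
        if candidate in explicit:
            return explicit[candidate]  # a child's override replaces inherited values
    return site_default


site = {"on_success": False, "on_failure": True}
explicit = {
    "/Lab": {"on_success": True, "on_failure": True},  # project-level settings
    "/Lab/Quiet": {"on_success": False, "on_failure": False},  # child folder opts out
}
print(effective_settings("/Lab/Assays", explicit, site))  # inherits from /Lab
print(effective_settings("/Lab/Quiet/Run1", explicit, site))  # inherits from /Lab/Quiet
print(effective_settings("/Other", explicit, site))  # falls back to the site level
```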

  • Open the Email Notifications panel at the desired level:
    • At the site level, select Admin > Site > Admin Console, then click Pipeline Email Notification.
    • At the project or folder level, select Admin > Go To Module > Pipeline, then click Setup.
  • Check the appropriate box(es) to configure notification emails to be sent when a pipeline job succeeds and/or fails.
  • Check the "Send to owner" box to automatically notify the user initiating the job.
  • Add additional email addresses and select the frequency and timing of notifications.
  • In the case of pipeline failure, there is a second option to define a list of escalation users. If configured, these users can be notified from the pipeline job details view directly using the Escalate Job Failure button.
  • Click Update.
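
The options above can be read as a small decision rule. The sketch below is a hypothetical model, not LabKey's settings schema: the `NotificationSettings` class and both functions are invented for illustration, the model uses one list of additional addresses for both outcomes, ignores frequency and timing, and treats escalation users as a separate list that is used only on demand for failed jobs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NotificationSettings:
    """Hypothetical model of the Email Notifications panel."""

    notify_on_success: bool = False
    notify_on_failure: bool = False
    send_to_owner: bool = False
    additional_addresses: List[str] = field(default_factory=list)
    escalation_users: List[str] = field(default_factory=list)


def recipients(settings: NotificationSettings, succeeded: bool, owner_email: str) -> List[str]:
    """Addresses that receive the automatic email when a job finishes."""
    wanted = settings.notify_on_success if succeeded else settings.notify_on_failure
    if not wanted:
        return []
    result = [owner_email] if settings.send_to_owner else []
    for address in settings.additional_addresses:
        if address not in result:  # avoid emailing the same address twice
            result.append(address)
    return result


def escalation_recipients(settings: NotificationSettings, succeeded: bool) -> List[str]:
    """Users contacted only when someone escalates a failed job."""
    return [] if succeeded else list(settings.escalation_users)


settings = NotificationSettings(
    notify_on_failure=True,
    send_to_owner=True,
    additional_addresses=["lab-admin@example.org", "owner@example.org"],
    escalation_users=["oncall@example.org"],
)
print(recipients(settings, succeeded=False, owner_email="owner@example.org"))
print(recipients(settings, succeeded=True, owner_email="owner@example.org"))
print(escalation_recipients(settings, succeeded=False))
```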

