Data Processing Pipeline

2024-03-29

The data processing pipeline performs long-running, complex processing jobs in the background. Applications include:
  • Automating data upload
  • Performing bulk import of large data files
  • Performing sequential transformations on data during import to the system
Users can define their own pipeline tasks, such as a custom R script pipeline, or use one of the predefined pipelines, which include study import, MS2 processing, and flow cytometry analysis.

The pipeline handles queuing and workflow of jobs when multiple users are processing large runs. It can be configured to provide notifications of progress, allowing the user or administrator to respond quickly to problems.

For example, an installation of LabKey Server might use the data processing pipeline for daily automated upload and synchronization of datasets, case report forms, and sample information stored at the lab level around the world. The pipeline is also used for export/import of complete studies when transferring them between staging and production servers.

View Data Pipeline Grid

The Data Pipeline grid displays information about current and past pipeline jobs. You can add a Data Pipeline web part to a page, or view the site-wide pipeline grid:

  • Select (Admin) > Site > Admin Console.
  • Under Management, click Pipeline.

The pipeline grid shows one row for each current and past pipeline job. The following options are available:

  • Click Process and Import Data to initiate a new job.
  • Use Setup to change file permissions, set up a pipeline override, and control email notifications.
  • The (Grid Views), (Charts and Reports), and (Export) options work as they do on other grids.
  • Select the checkbox for a row to enable Retry, Delete, Cancel, and Complete options for that job.
  • Click (Print) to generate a printout of the status grid.

Initiate a Pipeline Job

  • From the pipeline status grid, click Process and Import Data. You will see the current contents of the pipeline root. Drag and drop additional files to upload them.
  • Navigate to and select the intended file or folder. If you navigate into a subdirectory tree to find the intended files, the pipeline file browser will remember that location when you return to import other files later.
  • Click Import.

Delete a Pipeline Job

To delete a pipeline job, select the checkbox for its row in the data pipeline grid and click (Delete). You will be asked to confirm the deletion.

If the job generated associated experiment runs, you can delete them at the same time by selecting the corresponding checkboxes. In addition, if no files in the job's pipeline analysis directory are in use when the job is deleted (that is, no files are attached to runs as inputs or outputs), the analysis directory is removed from the pipeline root. The files are not actually deleted; they are moved to a ".deleted" directory that is hidden from the file browser.
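
The move-instead-of-delete behavior described above can be illustrated with a minimal Python sketch. This is not the server's implementation: the function name soft_delete_analysis_dir, the has_file_usages flag, and the numeric suffix used to avoid name collisions are all assumptions made for illustration. Only the ".deleted" directory name and the "skip when files are in use" rule come from the description above.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def soft_delete_analysis_dir(
    pipeline_root: Path, analysis_dir: Path, has_file_usages: bool
) -> Optional[Path]:
    """Move analysis_dir into <pipeline_root>/.deleted instead of removing it.

    Returns the new location, or None when the directory was left in place
    because its files are still attached to runs (hypothetical flag).
    """
    if has_file_usages:
        return None

    trash = pipeline_root / ".deleted"
    trash.mkdir(exist_ok=True)

    # Assumption: add a numeric suffix rather than overwrite an earlier
    # soft-deleted directory with the same name.
    target = trash / analysis_dir.name
    suffix = 1
    while target.exists():
        target = trash / f"{analysis_dir.name}.{suffix}"
        suffix += 1

    shutil.move(str(analysis_dir), str(target))
    return target


# Usage: build a throwaway pipeline root and soft-delete one analysis directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    run_dir = root / "analysis_run1"
    run_dir.mkdir()
    (run_dir / "results.txt").write_text("example output")

    moved_to = soft_delete_analysis_dir(root, run_dir, has_file_usages=False)
    print(moved_to.relative_to(root))
```

Because the files are moved rather than removed, they can still be recovered from the file system by someone with direct access to the pipeline root.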

Cancel a Pipeline Job

To cancel a pipeline job, select the checkbox for the intended row and click Cancel. The job status will be set to "CANCELLING/CANCELLED" and execution will be halted.

Use Pipeline Override to Mount a File Directory

You can configure a pipeline override to designate a specific directory for storing the files used by the pipeline.

Set Up Email Notifications (Optional)

If you or others wish to be notified when a pipeline job succeeds or fails, you can configure email notifications at the site, project, or folder level. Email notification settings are inherited by default, but this inheritance may be overridden in child folders.

  • In the project or folder of interest, select (Admin) > Go To Module > Pipeline, then click Setup.
  • Check the appropriate box(es) to configure notification emails to be sent when a pipeline job succeeds and/or fails.
  • Check the "Send to owner" box to automatically notify the user initiating the job.
  • Add additional email addresses and select the frequency and timing of notifications.
  • For pipeline failures, you can additionally define a list of Escalation Users.
  • Click Update.
  • Site and application administrators can also subscribe to notifications for the entire site.
    • At the site level, select (Admin) > Site > Admin Console.
    • Under Management, click Pipeline Email Notification.
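
The inheritance rule for notification settings (site, then project, then folder, with child folders able to override) can be pictured as a lookup that walks up the container hierarchy. The sketch below is illustrative only: the Container class, the effective_setting function, and the setting keys are hypothetical names, not part of the server's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Container:
    """Hypothetical stand-in for a site, project, or folder."""
    name: str
    parent: Optional["Container"] = None
    # Only the settings explicitly set at this level; all others are inherited.
    overrides: Dict[str, Any] = field(default_factory=dict)


def effective_setting(container: Container, key: str, default: Any = None) -> Any:
    """Return the nearest explicitly set value, searching from child to root."""
    node: Optional[Container] = container
    while node is not None:
        if key in node.overrides:
            return node.overrides[key]
        node = node.parent
    return default


# Usage: the folder overrides one setting and inherits the other from the site.
site = Container("site", overrides={"notify_on_failure": True, "notify_on_success": False})
project = Container("project", parent=site)
folder = Container("folder", parent=project, overrides={"notify_on_success": True})

print(effective_setting(project, "notify_on_success"))  # False (inherited from site)
print(effective_setting(folder, "notify_on_success"))   # True (overridden in folder)
print(effective_setting(folder, "notify_on_failure"))   # True (inherited from site)
```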

Customize Notification Email

You can customize the email notifications that will be sent to users, with different templates for failed and successful pipeline jobs.

In addition to the standard substitutions available, custom parameters available for pipeline job emails are:

Parameter Name    Type    Format  Description
dataURL           String  Plain   Link to the job details for this pipeline job
jobDescription    String  Plain   The job description
setupURL          String  Plain   URL to configure the pipeline, including email notifications
status            String  Plain   The job status
timeCreated       Date    Plain   The date and time this job was created
userDisplayName   String  Plain   Display name of the user who originated the action
userEmail         String  Plain   Email address of the user who originated the action
userFirstName     String  Plain   First name of the user who originated the action
userLastName      String  Plain   Last name of the user who originated the action
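
To show how these parameters might appear in a template, here is a small Python sketch that fills placeholders with job values. It assumes caret-delimited placeholders such as ^status^; confirm the exact substitution syntax in your server's email template editor. The render function, the sample template text, and the sample values are hypothetical and exist only to illustrate the idea.

```python
import re

# Assumption: placeholders are written as ^parameterName^.
PLACEHOLDER = re.compile(r"\^(\w+)\^")


def render(template: str, values: dict) -> str:
    """Replace each ^name^ placeholder with its value from `values`."""
    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        # Leave unknown placeholders untouched so typos stay visible.
        return str(values.get(name, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template for a failed job, using parameters from the table above.
template = (
    "Hello ^userFirstName^,\n"
    "The job '^jobDescription^' finished with status ^status^.\n"
    "Details: ^dataURL^\n"
)

# Placeholder values for illustration only.
values = {
    "userFirstName": "Ada",
    "jobDescription": "Study import",
    "status": "ERROR",
    "dataURL": "https://example.org/job/1",
}

print(render(template, values))
```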

Escalate Job Failure

Once Escalation Users have been configured, they can be notified directly from the pipeline job details view using the Escalate Job Failure button. Click the ERROR status from the pipeline job log, then click Escalate Job Failure.
