Troubleshoot the Enterprise Pipeline

2024-03-28

This topic is under construction for the 24.3 (March 2024) release of LabKey Server with embedded Tomcat 10.

Using the data pipeline with ActiveMQ is not a typical configuration and may require significant customization. If you are interested in using this feature, please contact LabKey to inquire about support options.

This topic covers some general information about monitoring, maintaining, and troubleshooting the Enterprise Pipeline.

Determine Which Jobs and Tasks Are Actively Running

Each job in the pipeline is composed of one or more tasks. Each task is assigned to run at a particular location. Locations might include the web server and one or more remote servers for other tools. Each location may have one or more worker threads that run the tasks.

When a job is submitted, its first task is added to the queue in a WAITING state. As soon as a worker thread is available, it takes the task from the queue and changes the state to RUNNING. When the task is done, the worker puts it back on the queue in the COMPLETE state. The web server should then immediately advance the job to its next task and put it back in the queue in the WAITING state.

If jobs remain in an intermediate COMPLETE state for more than a few seconds, there is something wrong and the pipeline is not properly advancing the jobs.

Similarly, if there are jobs in the WAITING state for any of the locations, and no jobs in the RUNNING state for those locations, something is wrong and the pipeline is not properly running the jobs.
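The two checks above can be sketched in code. This is only an illustration: the Task record, the location names, and the 10-second threshold standing in for "a few seconds" are all assumptions, and the snapshot of tasks is presumed to have been collected by hand from the pipeline job list rather than through any particular LabKey API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical snapshot of one queued task (not a LabKey type)."""
    job_id: str
    location: str
    state: str               # "WAITING", "RUNNING", or "COMPLETE"
    seconds_in_state: float


def find_stalled_complete(tasks, max_seconds=10.0):
    """Jobs sitting in COMPLETE longer than max_seconds (assumed threshold)."""
    return [t.job_id for t in tasks
            if t.state == "COMPLETE" and t.seconds_in_state > max_seconds]


def find_idle_locations(tasks):
    """Locations that have WAITING tasks but no RUNNING tasks."""
    waiting = {t.location for t in tasks if t.state == "WAITING"}
    running = {t.location for t in tasks if t.state == "RUNNING"}
    return sorted(waiting - running)
```

A location returned by find_idle_locations is one where work is queued but nothing is picking it up, which is the symptom described above. A job returned by find_stalled_complete suggests the web server is not advancing jobs to their next task.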

Troubleshooting Stuck Jobs

Waiting for Other Jobs to Complete

If jobs are sitting in a WAITING state, other jobs may be running, perhaps in other folders. To check whether any other jobs are running, follow the Pipeline link on the Admin Console and filter the list of jobs. If other jobs are running, your job may simply be waiting in the queue.

ActiveMQ/JMS Connection Lost

If no jobs are actively running, the server may have lost connectivity with the rest of the system. Check the labkey.log file for errors.

Ensure that ActiveMQ is still running. While LabKey Server will automatically try to reestablish the connection, in some cases you may need to shut down LabKey Server, restart ActiveMQ, and then start LabKey Server again to completely restore the connectivity.
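One quick way to see whether anything is listening on the broker's port is a plain TCP connection attempt. The sketch below assumes ActiveMQ's default OpenWire port, 61616; your installation may use a different port, and the host name is whatever your configuration points to. A successful connection only shows that the port is open, not that the broker is healthy.

```python
import socket


def broker_reachable(host, port=61616, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    61616 is ActiveMQ's default OpenWire port; this is an assumption
    about your setup, so pass the port from your own configuration.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

If this returns False from the LabKey Server machine, restarting ActiveMQ as described above is a reasonable next step.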

Remote Pipeline Server Connectivity Lost

If the primary LabKey Server is having no problems and you are using a Remote Pipeline Server, the remote server may have lost its ActiveMQ connection or encountered other errors. Check its labkey.log file for details.

Try restarting using the following sequence:

  1. Delete or cancel the waiting jobs through the Admin Console
  2. Shut down the remote server
  3. Shut down ActiveMQ
  4. Restart ActiveMQ
  5. Restart the remote server
  6. Submit a new job

Resetting ActiveMQ's Storage

If the steps above do not resolve the issue, repeat the sequence and reset ActiveMQ's state while ActiveMQ is shut down (after step 3 and before step 4):
  • Go to the directory where ActiveMQ is installed
  • Rename the .\data directory to .\data_backup
If this solves the problem, you can safely delete the data_backup directory afterwards.
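The rename step can also be scripted. The following is a minimal sketch, assuming ActiveMQ is already shut down and that install_dir is the ActiveMQ installation directory; the function name is hypothetical. It refuses to overwrite an existing data_backup directory so that an earlier backup is not lost.

```python
from pathlib import Path


def reset_activemq_storage(install_dir):
    """Rename <install_dir>/data to <install_dir>/data_backup.

    Run this only while ActiveMQ is shut down. Returns the backup path.
    """
    data = Path(install_dir) / "data"
    backup = Path(install_dir) / "data_backup"
    if not data.is_dir():
        raise FileNotFoundError(f"No data directory at {data}")
    if backup.exists():
        # Do not clobber a backup left over from an earlier reset.
        raise FileExistsError(f"Backup already exists at {backup}")
    data.rename(backup)
    return backup
```

Renaming rather than deleting keeps the old state available in case the reset does not help and you want to restore it.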

Reload User Undefined

If you see an error message similar to the following:

26 Jun 2018 02:00:00,071 ERROR: The specified reload user is invalid

Consider whether any user accounts responsible for reload pipeline jobs may have been deleted, instead of deactivated.

To resolve errors of this kind, locate any automated reload that may have been defined and deactivate it. Reactivate the reload with a new, valid user account.
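To see when and how often this error has occurred, the log can be scanned for it. The sketch below assumes log lines have exactly the layout of the sample message shown above (timestamp, then "ERROR:", then the message); lines in your labkey.log may be laid out differently, in which case the pattern needs adjusting. The function name is hypothetical.

```python
import re
from datetime import datetime

# Pattern modeled on the sample line above; an assumption about the log layout.
LINE_RE = re.compile(
    r"^(\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3}) ERROR: (.*)$"
)


def find_reload_user_errors(lines):
    """Return (timestamp, message) pairs for invalid reload user errors."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "reload user is invalid" in m.group(2):
            # %b parses English month abbreviations in the default C locale.
            ts = datetime.strptime(m.group(1), "%d %b %Y %H:%M:%S,%f")
            hits.append((ts, m.group(2)))
    return hits
```

For example, passing an open log file (which iterates line by line) returns the matching entries. A regular timestamp pattern, such as the same time every night, points to a scheduled reload that still references the deleted account.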