This topic covers general information about monitoring, maintaining, and troubleshooting the Enterprise Pipeline.

Determine Which Jobs and Tasks Are Actively Running

Each job in the pipeline is composed of one or more tasks, and each task is assigned to run at a particular location. Locations might include the web server and one or more remote servers for RAW to mzXML conversion and other tools. Each location may have one or more worker threads that run the tasks. A typical installation might have the following locations that run the specified tasks:

Location                   # of threads  Tasks
Web Server                 1             CHECK FASTA, IMPORT RESULTS
Web Server, high priority  1             MOVE RUNS
Conversion server          1+            MZXML CONVERSION
Other remote server        1+            SEARCH, ANALYSIS

When jobs are submitted, the first task in the pipeline will be added to the queue in the WAITING (SEARCH WAITING, for example) state. As soon as there is a worker thread available, it will take the job from the queue and change the state to RUNNING. When it is done, it will put the task back on the queue in the COMPLETE state. The web server should immediately advance the job to the next task and put it back in the queue in the WAITING state.
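The lifecycle described above can be sketched as a small state machine. This is only an illustration of the WAITING, RUNNING, COMPLETE cycle, not the pipeline's actual implementation: the `Job` class, the function names, and the two-task list in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical model of a pipeline job: an ordered task list plus a state."""
    tasks: list
    index: int = 0          # position of the current task in `tasks`
    state: str = "WAITING"  # WAITING, RUNNING, or COMPLETE

    @property
    def status(self):
        # Combined status such as "SEARCH WAITING"
        return f"{self.tasks[self.index]} {self.state}"


def worker_take(job):
    """A free worker thread takes a WAITING task and marks it RUNNING."""
    if job.state != "WAITING":
        raise ValueError(f"cannot take a task in state {job.state}")
    job.state = "RUNNING"


def worker_finish(job):
    """The worker puts the finished task back on the queue as COMPLETE."""
    if job.state != "RUNNING":
        raise ValueError(f"cannot finish a task in state {job.state}")
    job.state = "COMPLETE"


def web_server_advance(job):
    """The web server moves a COMPLETE job to its next task in WAITING.

    Returns False when the completed task was the last one in the job.
    """
    if job.state != "COMPLETE":
        raise ValueError(f"cannot advance a task in state {job.state}")
    if job.index + 1 >= len(job.tasks):
        return False
    job.index += 1
    job.state = "WAITING"
    return True
```

For example, a job created as `Job(["SEARCH", "ANALYSIS"])` starts as `SEARCH WAITING`, and after `worker_take`, `worker_finish`, and `web_server_advance` it reads `ANALYSIS WAITING`.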

If jobs remain in an intermediate COMPLETE state for more than a few seconds, there is something wrong and the pipeline is not properly advancing the jobs.

Similarly, if there are jobs in the WAITING state for any of the locations, and no jobs in the RUNNING state for those locations, something is wrong and the pipeline is not properly running the jobs.
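The two warning signs above can be expressed as simple checks over a list of job records. The sketch below assumes you have already collected the job list yourself (for example, by reading it off the Admin Console); the record keys (`id`, `location`, `state`, `since`) and the 30-second grace period are assumptions made for illustration, not values defined by the pipeline.

```python
from datetime import datetime, timedelta


def find_stuck(jobs, now, complete_grace=timedelta(seconds=30)):
    """Return (job id, reason) pairs for jobs that look stuck.

    `jobs` is a list of dicts with the hypothetical keys
    'id', 'location', 'state', and 'since' (when the state was entered).
    """
    problems = []

    # Rule 1: a job left in COMPLETE for longer than the grace period
    # has not been advanced to its next task.
    for job in jobs:
        if job["state"] == "COMPLETE" and now - job["since"] > complete_grace:
            problems.append((job["id"], "COMPLETE but not advanced"))

    # Rule 2: a job WAITING at a location where nothing is RUNNING
    # is not being picked up by any worker thread.
    running_locations = {j["location"] for j in jobs if j["state"] == "RUNNING"}
    for job in jobs:
        if job["state"] == "WAITING" and job["location"] not in running_locations:
            problems.append((job["id"], "WAITING with nothing RUNNING at location"))

    return problems
```

A job that is WAITING at a location where another job is RUNNING is not flagged, since it may simply be queued behind that job.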

Troubleshooting Stuck Jobs

Waiting for Other Jobs to Complete

If jobs are sitting in a WAITING state, other jobs may be running, perhaps in other folders. To check, open the Pipeline link on the Admin Console and filter the list of jobs. If other jobs are running, your job may simply be waiting in the queue.

ActiveMQ/JMS Connection Lost

If no jobs are actively running, the server may have lost connectivity with the rest of the system. Check the labkey.log file for errors. A message like this (abbreviated for readability):

org.mule.providers.FatalConnectException: ReconnectStrategy "org.mule.providers.SimpleRetryConnectionStrategy" failed to reconnect receiver on endpoint "ActiveMqJmsConnector{this=4973ffee, started=false, initialised=false, name='jmsConnectorFastaCheckWork', disposed=true, numberOfConcurrentTransactedReceivers=4, createMultipleTransactedReceivers=true, connected=false, supportedProtocols=[jms], serviceOverrides=null}"
at org.mule.providers.SimpleRetryConnectionStrategy.doConnect(SimpleRetryConnectionStrategy.java:130)
...
Caused by: org.mule.providers.ConnectException: Initialisation Failure: Could not connect to broker URL: tcp://ActiveMQServer:61616. Reason: java.net.ConnectException: Connection refused: connect
at org.mule.providers.jms.JmsConnector.doConnect(JmsConnector.java:381)
...
Caused by: javax.jms.JMSException: Could not connect to broker URL: tcp://ActiveMQServer:61616. Reason: java.net.ConnectException: Connection refused: connect
...
Caused by: java.net.ConnectException: Connection refused: connect
at java.base/java.net.PlainSocketImpl.waitForConnect(Native Method)
...

indicates that the server has lost its connection to ActiveMQ. Ensure that ActiveMQ is still running. While LabKey Server will automatically try to reestablish the connection, in some cases you may need to shut down LabKey Server, restart ActiveMQ, and then start LabKey Server again to completely restore the connectivity.
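As a quick way to confirm the diagnosis, the following sketch pulls the broker host and port out of a "Could not connect to broker URL" log line and then tests whether a plain TCP connection to that address succeeds. It is a generic reachability check written for illustration, not a LabKey or ActiveMQ tool; the function names and the 3-second timeout are assumptions. A successful connection shows only that something is listening on the port, not that the broker is healthy.

```python
import re
import socket

# Matches the broker address in lines such as:
#   Could not connect to broker URL: tcp://ActiveMQServer:61616. Reason: ...
BROKER_URL_RE = re.compile(r"Could not connect to broker URL: tcp://([^:\s]+):(\d+)")


def broker_from_log(text):
    """Return (host, port) from the first broker connection error, or None."""
    match = BROKER_URL_RE.search(text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For example, pass the contents of labkey.log to `broker_from_log`, and if it returns an address, pass that host and port to `can_connect`.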

Remote Pipeline Server Connectivity Lost

If the primary LabKey Server is having no problems and you are using a Remote Pipeline Server, the remote server may have lost its ActiveMQ connection or encountered other errors. Check its labkey.log file for error messages.

Try restarting using the following sequence:

  1. Delete or cancel the waiting jobs through the Admin Console
  2. Shut down the remote server
  3. Shut down Tomcat
  4. Shut down ActiveMQ
  5. Restart ActiveMQ
  6. Restart Tomcat
  7. Restart the remote server
  8. Submit a new job

Resetting ActiveMQ's Storage

If the steps above do not resolve the issue, try resetting ActiveMQ's state. Follow the steps above, but between steps 4 and 5 (after shutting down ActiveMQ and before restarting it) add these steps:
  • Go to the directory where ActiveMQ is installed
  • Rename the .\data directory to .\data_backup
Then continue with the rest of the steps to restart the services, and try submitting a job. If this solves the problem, you can safely delete the data_backup directory afterwards.
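The rename step can also be scripted. The sketch below is a minimal example that assumes ActiveMQ is already shut down and that you supply the installation directory yourself; the function name is hypothetical. It refuses to overwrite an existing data_backup directory so that an earlier backup is not lost.

```python
from pathlib import Path


def back_up_activemq_data(install_dir):
    """Rename <install_dir>/data to <install_dir>/data_backup.

    Run this only while ActiveMQ is stopped. Returns the backup path.
    """
    install = Path(install_dir)
    data = install / "data"
    backup = install / "data_backup"

    if not data.is_dir():
        raise FileNotFoundError(f"no data directory found at {data}")
    if backup.exists():
        # Do not clobber a backup left over from an earlier reset.
        raise FileExistsError(f"{backup} already exists; remove or rename it first")

    data.rename(backup)
    return backup
```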

Reload User Undefined

If you see an error message similar to the following:

26 Jun 2018 02:00:00,071 ERROR: The specified reload user is invalid

consider whether any user account responsible for reload pipeline jobs may have been deleted rather than deactivated.

To resolve errors of this kind, locate any automated reload that may have been defined and deactivate it. Reactivate the reload with a new, valid user account.
