Hello,
We have been running a 12.3 server with ActiveMQ and a remote pipeline server. This pipeline is quite active: it runs a number of sequence pipeline jobs per week, plus a nightly pipeline job that calculates some genetics numbers.
Starting this weekend, our nightly pipeline job stopped running. On the app server, the pipeline job starts successfully and the first task (a local one) runs. The second task in this pipeline runs on the remote server. When the job reaches this step, its status is set to 'waiting', but the task never appears to actually start.
On the remote server, labkey.log shows lines like this on startup:
INFO MuleManager 2013-07-23 06:40:34,185 main : Starting agents...
INFO MuleManager 2013-07-23 06:40:34,185 main : Agents Successfully Started
INFO MuleManager 2013-07-23 06:40:34,323 main :
**********************************************************************
* Not Set *
* Version: Not Set Build: Not Set *
* Not Set *
* Not Set *
* *
* Server started: 7/23/13 6:40 AM *
* Server ID: LabKey_Pipeline *
* JDK: 1.7.0_07 (mixed mode) *
* OS: Linux (2.6.32-358.11.1.el6.x86_64, amd64) *
* Host: <removed> *
* *
* Agents Running: None *
**********************************************************************
INFO Job 2013-07-23 06:41:11,299 PipelineTaskRunnerUMO.2 : Starting to run task 'org.labkey.ehr.pipeline.GeneticCalculationsRTask' for job '(NOT SUBMITTED) studyDefinition/kinship.txt (EHR Kinship Calculation)' with log file /lkfiles/studyDefinition/kinship/EHR Kinship Calculation/kinship.txt.log
INFO Job
This looks pretty normal to me, and it seems to be reading the job from ActiveMQ. Given that it's reading the queued job, I would have expected it to start running that task, but it doesn't. I have taken these troubleshooting steps:
1) Restarted the pipeline Java process
2) Cancelled and restarted the pipeline job
3) Restarted Tomcat and ActiveMQ on the production server
4) Purged the ActiveMQ queue and retried
This seems to impact all types of remote pipeline jobs on this server. I have not consciously changed any pipeline configs in some time.
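One way to confirm which tasks the remote server actually picked up is to filter labkey.log for the 'Starting to run task' lines like the one in the excerpt above. The following is only a minimal sketch: it assumes every log line follows the same "LEVEL Logger date time thread : message" layout as the excerpt, and the function name find_task_starts is made up for illustration, not part of LabKey.

```python
import re

# Layout assumed from the excerpt above:
# LEVEL Logger YYYY-MM-DD HH:MM:SS,mmm thread : message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<thread>\S+)\s*:\s*(?P<message>.*)$"
)

# Message wording copied from the task-start line in the excerpt.
TASK_START = re.compile(
    r"Starting to run task '(?P<task>[^']+)' for job '(?P<job>[^']+)'"
)


def find_task_starts(lines):
    """Return (timestamp, task, job) for each task-start log line.

    Hypothetical helper: lines that do not match the assumed layout
    (for example the startup banner) are skipped silently.
    """
    starts = []
    for line in lines:
        parsed = LOG_LINE.match(line.strip())
        if parsed is None:
            continue
        started = TASK_START.search(parsed.group("message"))
        if started:
            starts.append(
                (parsed.group("timestamp"),
                 started.group("task"),
                 started.group("job"))
            )
    return starts
```

Passing it the lines of labkey.log (for example from open("labkey.log")) gives one tuple per task the remote server reported starting, which can then be compared against the jobs shown as 'waiting' on the app server.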
Can you suggest any troubleshooting steps to take, or information that could help?
Thanks.