job stuck waiting status

General Server Forum (Inactive)
toan nguyen  2017-12-01 14:33
Status: Closed
 
Hi,

   A user submitted a job, but it is stuck in waiting status in the ActiveMQ log after it finished the first FASTA check.

01 Dec 2017 11:40:26,278 INFO : X! Tandem search for all
01 Dec 2017 11:40:26,281 INFO : =======================================
01 Dec 2017 11:40:26,283 INFO : abac.mzXML
01 Dec 2017 11:40:26,285 INFO : casfdsa.mzXML
01 Dec 2017 11:40:26,287 INFO : cafdsa.mzXML
01 Dec 2017 11:40:26,288 INFO :dsafsa.mzXML
01 Dec 2017 11:40:26,845 INFO : Starting to run task 'org.labkey.ms2.pipeline.FastaCheckTask' at location 'webserver-fasta-check'
01 Dec 2017 11:40:26,854 INFO : Check FASTA validity
01 Dec 2017 11:40:26,856 INFO : =======================================
01 Dec 2017 11:40:26,912 INFO : Checking sequence file validity of xxxxxxxxxxxx.fasta
01 Dec 2017 11:40:29,534 INFO :
01 Dec 2017 11:40:29,537 INFO : Successfully completed task 'org.labkey.ms2.pipeline.FastaCheckTask'



  The labkey.log shows:

INFO Job 2017-12-01 11:40:29,539 JobRunnerFastaCheckUMO.1 : Successfully completed task 'org.labkey.ms2.pipeline.FastaCheckTask' for job '(NOT SUBMITTED) Phospho (Trypsin_phosphorylation_no_fractions)' with log file xxxxxx.xxxxxx/all.log

 The ActiveMQ job.queue shows the job in waiting status. Can someone help? I don't see anything in the log that says what it is waiting for. Is it waiting to be submitted?

<__submitted>false</__submitted>


Thanks
Toan.

ACTIVEMQ_HOME: /usr/local/apache-activemq-5.1.0
ACTIVEMQ_BASE: /usr/local/apache-activemq-5.1.0
JMS_CUSTOM_FIELD:LABKEY_TASKSTATUS = WAITING
JMS_CUSTOM_FIELD:MULE_SESSION = SUQ9N2FkNjJiYjQtZDZjZi0xMWU3LWFlM2UtOWZhNmIyZWQyMGRkO0lEPTdhZDYyYmI0LWQ2Y2YtMTFlNy1hZTNlLTlmYTZiMmVkMjBkZA==
JMS_CUSTOM_FIELD:MULE_ORIGINATING_ENDPOINT = endpoint.jms.job.queue
JMS_CUSTOM_FIELD:LABKEY_JOBID = b56c9169-b85d-1035-a396-e7f1a88b84ba
JMS_BODY_FIELD:JMSText = <org.labkey.ms2.pipeline.tandem.XTandemPipelineJob>
  <__dirSequenceRoot>file:xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/</__dirSequenceRoot>
  <__fractions>false</__fractions>
  <__protocolName>Trypsin_phosphorylation_no_fractions</__protocolName>
  <__joinedBaseName>all</__joinedBaseName>
  <__baseName>xxxxxxxxxxxx</__baseName>
  <__dirData>file:/xxxxxxxxxxxxxx</__dirData>
  <__dirAnalysis>file:/sxxxxxxxxxxxxxxxxxxxxxxxxxxxx/</__dirAnalysis>
  <__fileParameters>file:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/tandem.xml</__fileParameters>
  <__filesInput class="java.util.Collections$SingletonList">
    <element class="file">file:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.mzXML</element>
  </__filesInput>
  <__inputTypes>
    <org.labkey.api.util.massSpecDataFileType>
      <__suffixes>
        <string>.msprefix.mzXML</string>
        <string>.mzXML</string>
      </__suffixes>
      <__antiTypes/>
      <__defaultSuffix>.mzXML</__defaultSuffix>
      <__contentTypes class="java.util.Collections$EmptyList"/>
      <__dir>false</__dir>
      <__preferGZ>false</__preferGZ>
      <__supportGZ>true</__supportGZ>
      <__caseSensitiveOnCaseSensitiveFileSystems>true</__caseSensitiveOnCaseSensitiveFileSystems>
      <__extensionsMutuallyExclusive>true</__extensionsMutuallyExclusive>
    </org.labkey.api.util.massSpecDataFileType>
  </__inputTypes>
  <__splittable>true</__splittable>
  <__parametersDefaults>
    <entry>
.......
      <string>pipeline, database</string>
      <string>ipi_human_plus.fasta</string>
 <__info>
    <__containerId>b1448d16-ee1c-1033-bfa5-0434ad51e74d</__containerId>
    <__urlString>xxxxxxxxxxxxxxxxxxxx/searchXTandem.view?</__urlString>
    <__userEmail>xxxxxx</__userEmail>
    <__userId>1064</__userId>
  </__info>
  <__jobGUID>b56c9169-b85d-1035-a396-e7f1a88b84ba</__jobGUID>
  <__parentGUID>b56c9110-b85d-1035-a396-e7f1a88b84ba</__parentGUID>
  <__activeTaskId>org.labkey.ms2.pipeline.tandem.XTandemSearchTask</__activeTaskId>
  <__activeTaskStatus class="org.labkey.api.pipeline.PipelineJob$TaskStatus">waiting</__activeTaskStatus>
  <__activeTaskRetries>0</__activeTaskRetries>
  <__pipeRoot class="org.labkey.pipeline.api.PipeRootImpl">
    <__containerId>4fb97543-cf7a-1029-9fe6-f528764d84fc</__containerId>
    <__uris>
      <java.net.URI>file:xxxxxxxxxx</java.net.URI>
      <java.net.URI>file:xxxxxxxxxxxx</java.net.URI>
    </__uris>
    <__entityId>4fb97544-cf7a-1029-9fe6-f528764d84fc</__entityId>
    <__searchable>false</__searchable>
    <__isDefaultRoot>false</__isDefaultRoot>
  </__pipeRoot>
 <__logFile>file:xxxxxxx.log</__logFile>
  <__interrupted>false</__interrupted>
  <__submitted>false</__submitted>
  <__errors>0</__errors>
  <__actionSet>
 
 
toan nguyen responded:  2017-12-01 15:23
How do I find out what it is stuck on or waiting for?

JMS_CUSTOM_FIELD:LABKEY_TASKSTATUS = WAITING
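
One way to see where the job stands is to pull the status fields out of a message dump like the one in the first post. The sketch below is only an illustration: the function name summarize_job_message is made up for this example and is not part of LabKey or ActiveMQ, and it uses regular expressions rather than an XML parser because the pasted dump is truncated. It reports what the message says and does not explain why the task is not being picked up.

```python
import re


def summarize_job_message(dump: str) -> dict:
    """Collect status-related fields from a pasted ActiveMQ message dump.

    Hypothetical helper for illustration only; it reads text, not a live queue.
    """
    summary = {}

    # Header-style lines of the form "JMS_CUSTOM_FIELD:NAME = value".
    header = re.compile(r"^JMS_CUSTOM_FIELD:(\w+)[ \t]*=[ \t]*(.*)$", re.MULTILINE)
    for name, value in header.findall(dump):
        summary[name] = value.strip()

    # Elements of the serialized job that describe its current state.
    for tag in ("__activeTaskId", "__activeTaskStatus", "__submitted", "__errors"):
        match = re.search(rf"<{tag}(?:\s[^>]*)?>([^<]*)</{tag}>", dump)
        if match:
            summary[tag] = match.group(1).strip()

    return summary
```

Run against the dump in the first post, this should show that the active task is org.labkey.ms2.pipeline.tandem.XTandemSearchTask with status "waiting", that is, the job has finished FastaCheckTask and is queued for the search task.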
 
jeckels responded:  2017-12-04 16:48
Hi Toan,

Please try the following to get your jobs moving again.

Shut down Tomcat, and then shut down ActiveMQ. Ensure that both have fully exited (via a "ps" command on Linux or Task Manager on Windows), making sure there are no "java" processes running. Also shut down any LabKey remote pipeline servers that might be connected to ActiveMQ on other servers.

Restart ActiveMQ.

Restart Tomcat.

Restart the LabKey remote pipeline servers (if applicable).

The server should automatically requeue jobs that were in flight prior to the problems. If they don't move, try submitting a new job and see if it makes progress.

If things still aren't working correctly, please post the labkey.log file from the web server's TOMCAT/logs directory.

Thanks,
Josh
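
The "no java processes running" check on Linux can be scripted. This is a minimal sketch, assuming a ps that accepts "-eo pid,args"; the function names are invented for this example. It lists every process whose executable is named java, so on a machine that runs other Java software the output still needs to be read by a person.

```python
import subprocess


def find_java_processes(ps_output: str) -> list:
    """Return (pid, command line) pairs for processes whose executable is 'java'."""
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header row and any line without a numeric PID and a command.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = int(parts[0]), parts[1]
        executable = args.split()[0]
        if executable.rsplit("/", 1)[-1] == "java":
            found.append((pid, args))
    return found


def running_java_processes() -> list:
    """Run ps and return the java processes currently alive (Linux/Unix only)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return find_java_processes(result.stdout)
```

After shutting down Tomcat and ActiveMQ, running_java_processes() returning an empty list suggests both have exited; any remaining entries show the PID and command line to investigate.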
 
toan nguyen responded:  2017-12-04 17:27
Hi Josh,

  Thank you for checking.

I have restarted the processes many times. Here are the fresh ones.
I see the labkey log file shows the jobs being requeued, but that is it.
The ActiveMQ queue shows the job in WAITING status, and the submitted flag is "NOT SUBMITTED".

I am not sure what is missing. I have probably missed some configuration. The thing is, I don't know what it is waiting for.

root 6483 6482 40 16:57 pts/11 00:00:14 /usr/local/java/bin/java -Xmx512M -Dorg.apache.activemq.UseDedicatedTaskRunner=true -Dcom.sun.management.jmxremote -Djavax.net.ssl.keyStorePassword=password -Djavax.net.ssl.trustStorePassword=password -Djavax.net.ssl.keyStore=/usr/local/apache-activemq-5.1.0/conf/broker.ks -Djavax.net.ssl.trustStore=/usr/local/apache-activemq-5.1.0/conf/broker.ts -Dactivemq.classpath=/usr/local/apache-activemq-5.1.0/conf; -Dactivemq.home=/usr/local/apache-activemq-5.1.0 -Dactivemq.base=/usr/local/apache-activemq-5.1.0 -jar /usr/local/apache-activemq-5.1.0/bin/run.jar start xbean:/usr/local/apache-activemq-5.1.0/conf/activemq.xml

labkey 6566 1 99 16:58 pts/11 00:00:08 /usr/local/java/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms128m -Xmx2048m -XX:-HeapDumpOnOutOfMemoryError -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Djava.endorsed.dirs=/usr/local/tomcat/endorsed -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start

Thanks
Toan

INFO RequeueLostJobsRequest 2017-12-04 16:59:32,623 Thread-20 : Requeueing jobs for location webserver-fasta-check
INFO RequeueLostJobsRequest 2017-12-04 16:59:33,005 Thread-20 : Requeueing jobs for location webserver-high-priority
INFO RequeueLostJobsRequest 2017-12-04 16:59:33,135 Thread-20 : Requeueing jobs for location webserver
INFO RequeueLostJobsRequest 2017-12-04 16:59:35,861 Thread-20 : Requeueing jobs for location msdapl
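
To check which locations the web server requeued jobs for after a restart, lines like the ones above can be extracted from labkey.log. The following is a rough sketch that assumes the log line layout shown in this post; requeued_locations is a made-up name, not a LabKey function.

```python
import re

# Matches lines such as:
# INFO RequeueLostJobsRequest 2017-12-04 16:59:32,623 Thread-20 : Requeueing jobs for location webserver
REQUEUE_LINE = re.compile(
    r"^INFO\s+RequeueLostJobsRequest\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+\S+\s*:\s*"
    r"Requeueing jobs for location (\S+)[ \t]*$",
    re.MULTILINE,
)


def requeued_locations(log_text: str) -> list:
    """Return (timestamp, location) pairs for each requeue line, in log order."""
    return REQUEUE_LINE.findall(log_text)
```

Comparing the returned locations with the location of the task the job is waiting on can help narrow down whether a requeue was attempted for that task at all.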
 
Jon (LabKey DevOps) responded:  2017-12-12 15:03
Reviewed logs, but not finding any kind of smoking gun here. I can see the ActiveMQ jobs kicking back on, but nothing to indicate why they're not processing to completion.
 
toan nguyen responded:  2017-12-12 15:28
Thanks, I figured it out.