Hi,
When trying to import a Mascot .dat file, I got the following error:
ERROR: XMLStreamException in hasNext()
com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </search_hit>; expected </gamma>.
at [row,col {unknown-source}]: [14607,18]
at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:605)
at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461)
at com.ctc.wstx.sr.BasicStreamReader.reportWrongEndElem(BasicStreamReader.java:3256)
at com.ctc.wstx.sr.BasicStreamReader.readEndElem(BasicStreamReader.java:3198)
at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2830)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
at org.labkey.common.tools.XMLStreamReaderWrapper.next(XMLStreamReaderWrapper.java:246)
at org.labkey.common.tools.SimpleXMLStreamReader.skipTo(SimpleXMLStreamReader.java:78)
at org.labkey.common.tools.SimpleXMLStreamReader.skipToStart(SimpleXMLStreamReader.java:64)
at org.labkey.common.tools.PepXmlLoader$FractionIterator.hasNext(PepXmlLoader.java:101)
at org.labkey.ms2.PepXmlImporter.importRun(PepXmlImporter.java:88)
at org.labkey.ms2.MS2Importer.upload(MS2Importer.java:181)
at org.labkey.ms2.MS2Manager.importRun(MS2Manager.java:426)
at org.labkey.ms2.pipeline.MS2ImportPipelineJob.run(MS2ImportPipelineJob.java:85)
at org.labkey.ms2.pipeline.mascot.MascotImportPipelineJob.run(MascotImportPipelineJob.java:159)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:65)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:168)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
This would indicate a problem with the XML file, so here is a snippet of the pepXML file:
<spectrum_query spectrum="Wed_Sep__3_10-16-48_2008.spectrum1640.0000.0000.2" start_scan="0000" end_scan="0000" precursor_neutral_mass="910.4860" assumed_charge="2" index="1319">
<search_result>
<search_hit hit_rank="1" peptide="EPWAPSPQ" peptide_prev_aa="R" peptide_next_aa="-" protein="A0N5T0" num_tot_proteins="1" num_matched_ions="6" tot_num_ions="14" calc_neutral_pep_mass="910.4185" massdiff="+0.0675" num_tol_term="2" num_missed_cleavages="0" is_rejected="0" protein_descr="V<gamma>1 protein (Fragment) OS=Homo sapiens GN=V<gamma>1 PE=4 SV=1">
<search_score name="ionscore" value="4.17"/>
<search_score name="identityscore" value="33.17"/>
<search_score name="star" value="1"/>
<search_score name="homologyscore" value="17.02"/>
<search_score name="expect" value="39.76"/>
</search_hit>
</search_result>
</spectrum_query>
It seems the pepXML file is well-formed XML, but the pepXML parser in Labkey-CPAS uses an overly naive parsing algorithm that isn't XML-compliant: the "<gamma>" tags sit between quotation marks and thus are not part of the XML markup.
The stack trace indicates that a custom parser is used instead of a proper XML parser, which supports my conclusion.
The job itself is flagged as COMPLETE, even though this is clearly a fatal error.
-Aschwin

aschwin.vanderwoude responded:
2008-09-03 01:13
Hmm, xmllint also complained about the XML file, so apparently the "<>" characters aren't allowed within attribute values. Or perhaps some parsers are stricter than others, although in my opinion an XML parser should allow such characters in places where they have no meaning.
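For reference, the XML 1.0 grammar does forbid a literal "<" inside an attribute value (it has to be written as &lt;), while ">" is allowed there. A rough way to locate the offending lines in a converted file is a grep along the following lines. This is only a heuristic sketch: the function name find_raw_lt_in_attrs is made up for illustration, it only looks at double-quoted attributes, and it assumes each attribute value sits on a single line, as in the snippet above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): list the lines of file "$1"
# where a literal '<' appears inside a double-quoted attribute value.
# This is a heuristic, not a real XML parser.
find_raw_lt_in_attrs() {
    # ="            start of a double-quoted attribute value
    # [^"]*<[^"]*"  a '<' somewhere before the closing quote
    grep -n '="[^"]*<[^"]*"' "$1"
}

# Usage: find_raw_lt_in_attrs results.xml
```

Each reported line is prefixed with its line number, which can be compared against the row reported in the parser error.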
Anyway, until a proper fix exists, the problem can be worked around by using the following wrapper script for Mascot2XML when running Labkey on Linux.
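#!/bin/sh
# Wrapper for Mascot2XML: run the real converter, then strip the
# unescaped <gamma>/<kappa> tokens from the generated pepXML file.
Mascot2XML.bin "$@"
# Derive the output name by swapping the trailing .dat for .xml
FILE=`echo "$1" | sed -e 's/\.dat$/.xml/'`
OLDFILE="$FILE.old"
mv "$FILE" "$OLDFILE"
sed -r -e 's/<gamma>|<kappa>/ /g' "$OLDFILE" > "$FILE"
rm "$OLDFILE"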
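Replacing the tokens with a space loses part of the protein description. A variant of the filter step that escapes the angle brackets instead might look like the sketch below. The function name escape_greek_tokens is hypothetical, and the token list (gamma, kappa) is simply taken from the script above; any other unescaped tokens in protein descriptions would need their own expressions.

```shell
#!/bin/sh
# Hypothetical filter: read pepXML on stdin and write it to stdout with
# the known bare tokens rewritten as XML character entities.
escape_greek_tokens() {
    # In a sed replacement a bare '&' means "the matched text",
    # so the entity ampersands are written as '\&'.
    sed -e 's/<gamma>/\&lt;gamma\&gt;/g' \
        -e 's/<kappa>/\&lt;kappa\&gt;/g'
}

# Usage inside the wrapper, in place of the sed line:
#   escape_greek_tokens < "$OLDFILE" > "$FILE"
```

With this variant the description text survives the import, since an XML parser should decode the entities back into the original characters.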

aschwin.vanderwoude responded:
2008-09-03 01:59
Although the run completed and the log doesn't show any sort of trouble, the data doesn't show up in the "MS2 run" list on the MS2 dashboard.
If I go to "Process and Import data" and run "Import peptides" on one of the .dat files, I am presented with all the MS2 runs that imported .dat files. Each of them seems to show the Mascot data, but I cannot compare them, and ProteinProphet data is unavailable as well.
Is Mascot .dat file import non-functional in general at the moment?
-Aschwin

jeckels responded:
2008-09-05 09:35
Hi Aschwin,
I believe that the CPAS XML parser is correctly enforcing the XML spec by requiring that '<' be encoded in an attribute value, though it does seem like that might not be truly necessary.
If you're directly importing MS2 search results, you will unfortunately need to go to your folder's main portal page and add the 'MS2 Runs' web part. The default list of MS2 runs is actually the 'MS2 Runs (Enhanced)' which adds functionality but only shows runs that include experimental metadata, which the CPAS pipeline automatically creates. I know this is confusing and it's been a known issue for some time but it hasn't bubbled to the top of any of our clients' priority lists.
Thanks,
Josh