Error importing protein prophet prot.xml files

CPAS Forum (Inactive)
Error importing protein prophet prot.xml files rob_ewing50  2011-09-19 11:46
Status: Closed
 
Hello:
I get the following errors from time to time when importing ProteinProphet prot.xml files:
It seems to be something to do with the xml itself - but I cannot find the place in the file it refers to. Anyone seen this or know how to find the referenced line in the xml file?
I have attached the guilty prot.xml file.
I am using LabKey version 11 and TPP v4.3 JETSTREAM rev 1, Build 201101131447 to run PeptideProphet and ProteinProphet.

thanks.


18 Feb 2011 21:19:48,550 INFO : Importing MS/MS results is 93% complete
18 Feb 2011 21:19:48,860 INFO : Importing MS/MS results is 94% complete
18 Feb 2011 21:19:48,953 INFO : Importing MS/MS results is 95% complete
18 Feb 2011 21:19:51,080 INFO : Importing MS/MS results is 96% complete
18 Feb 2011 21:19:51,829 INFO : Importing MS/MS results is 97% complete
18 Feb 2011 21:19:51,937 INFO : Importing MS/MS results is 98% complete
18 Feb 2011 21:19:52,048 INFO : Importing MS/MS results is 99% complete
18 Feb 2011 21:19:53,324 INFO : Importing MS/MS results is 100% complete
18 Feb 2011 21:19:55,803 INFO : 25.02 seconds to import spectra from /mnt/rewing/cpas/labkey10/IP_CTNNB1/./bCat_IP_APCKI_HCT116_20Apr2010.tgz
18 Feb 2011 21:19:55,854 INFO : Starting to update SeqId column
18 Feb 2011 21:20:07,242 INFO : 11.39 seconds to update SeqId column
18 Feb 2011 21:20:07,299 INFO : Starting to update SequencePosition column
18 Feb 2011 21:20:10,759 INFO : 3.46 seconds to update SequencePosition column
18 Feb 2011 21:20:10,815 INFO : Starting to update peptide and spectrum counts
18 Feb 2011 21:20:10,934 INFO : 0.12 seconds to update peptide and spectrum counts
18 Feb 2011 21:20:10,981 INFO : ========================================
18 Feb 2011 21:20:11,023 INFO : Summary of all timed tasks:
18 Feb 2011 21:20:11,073 INFO :
18 Feb 2011 21:20:11,115 INFO : 0.20 seconds to clear out any previously imported data
18 Feb 2011 21:20:11,156 INFO : 5.23 seconds to import FASTA file
18 Feb 2011 21:20:11,199 INFO : 14.40 seconds to import peptide search results
18 Feb 2011 21:20:11,265 INFO : 25.02 seconds to import spectra
18 Feb 2011 21:20:11,316 INFO : 11.39 seconds to update SeqId column
18 Feb 2011 21:20:11,366 INFO : 3.46 seconds to update SequencePosition column
18 Feb 2011 21:20:11,416 INFO : 0.12 seconds to update peptide and spectrum counts
18 Feb 2011 21:20:11,466 INFO :
18 Feb 2011 21:20:11,516 INFO : 59.82 seconds to import "interact-bCat_IP_APCKI_HCT116_20Apr2010.pep.xml" and import spectra
18 Feb 2011 21:20:11,566 INFO : ========================================
18 Feb 2011 21:20:23,990 INFO : Starting to move data into ms2.PeptidesMemberships
18 Feb 2011 21:20:26,599 INFO : Finished with moving data into ms2.PeptidesMemberships after 2609 ms
18 Feb 2011 21:20:26,652 INFO : Starting to move data into ms2.ProteinGroupMemberships
18 Feb 2011 21:20:32,075 INFO : Finished with moving data into ms2.ProteinGroupMemberships after 5422 ms
18 Feb 2011 21:20:32,449 INFO : ProteinProphet file import finished successfully, 5110 protein groups loaded
18 Feb 2011 21:20:32,506 INFO : ProteinProphet import took 81 seconds.
23 May 2011 19:23:46,352 INFO : Starting to load ProteinProphet file /mnt/rewing/cpas/labkey10/IP_CTNNB1/./interact-bCat_IP_APCKI_HCT116_20Apr2010.prot.xml
23 May 2011 19:23:46,636 INFO : Resolved referenced PepXML file to /mnt/rewing/cpas/labkey10/IP_CTNNB1/./interact-bCat_IP_APCKI_HCT116_20Apr2010.pep.xml
23 May 2011 19:23:46,687 INFO : Starting import from interact-bCat_IP_APCKI_HCT116_20Apr2010.pep.xml
23 May 2011 19:23:46,825 INFO : Starting to clear out any previously imported data for interact-bCat_IP_APCKI_HCT116_20Apr2010.pep.xml
23 May 2011 19:23:47,142 INFO : 0.32 seconds to clear out any previously imported data for interact-bCat_IP_APCKI_HCT116_20Apr2010.pep.xml
23 May 2011 19:23:47,194 INFO : Starting to import FASTA file /data/public/sequence/ipi.HUMAN.v3.49.mod.fasta
23 May 2011 19:23:52,627 INFO : FASTA file "/data/public/sequence/ipi.HUMAN.v3.49.mod.fasta" has already been imported
23 May 2011 19:23:52,774 INFO : 5.58 seconds to import FASTA file /data/public/sequence/ipi.HUMAN.v3.49.mod.fasta
23 May 2011 19:23:52,826 INFO : Starting to import peptide search results for fraction 1, analysis of file null
23 May 2011 19:23:52,876 INFO : Importing MS/MS results is 0% complete
23 May 2011 19:23:53,158 INFO : Importing MS/MS results is 1% complete
23 May 2011 19:23:54,051 INFO : Importing MS/MS results is 2% complete
23 May 2011 19:23:54,436 INFO : Importing MS/MS results is 3% complete
23 May 2011 19:23:54,711 INFO : Importing MS/MS results is 4% complete
23 May 2011 19:23:55,430 INFO : Importing MS/MS results is 5% complete
23 May 2011 19:23:55,840 INFO : Importing MS/MS results is 6% complete
23 May 2011 19:23:56,232 INFO : Importing MS/MS results is 7% complete
23 May 2011 19:23:56,621 INFO : Importing MS/MS results is 8% complete
23 May 2011 19:23:56,837 INFO : Importing MS/MS results is 9% complete
23 May 2011 19:23:57,150 INFO : Importing MS/MS results is 10% complete
23 May 2011 19:23:57,446 INFO : Importing MS/MS results is 11% complete
23 May 2011 19:23:58,059 INFO : Importing MS/MS results is 12% complete
23 May 2011 19:23:58,638 INFO : Importing MS/MS results is 13% complete
23 May 2011 19:23:59,157 INFO : Importing MS/MS results is 14% complete
23 May 2011 19:23:59,988 INFO : Importing MS/MS results is 15% complete
23 May 2011 19:24:00,511 INFO : Importing MS/MS results is 16% complete
23 May 2011 19:24:01,020 INFO : Importing MS/MS results is 17% complete
23 May 2011 19:24:01,504 INFO : Importing MS/MS results is 18% complete
23 May 2011 19:24:02,129 INFO : Importing MS/MS results is 19% complete
23 May 2011 19:24:02,790 INFO : Importing MS/MS results is 20% complete
23 May 2011 19:24:03,612 INFO : Importing MS/MS results is 21% complete
23 May 2011 19:24:04,149 INFO : Importing MS/MS results is 22% complete
23 May 2011 19:24:04,761 INFO : Importing MS/MS results is 23% complete
23 May 2011 19:24:11,239 INFO : Importing MS/MS results is 24% complete
23 May 2011 19:24:12,084 INFO : Importing MS/MS results is 25% complete
23 May 2011 19:24:12,831 INFO : Importing MS/MS results is 26% complete
23 May 2011 19:24:13,363 INFO : Importing MS/MS results is 27% complete
23 May 2011 19:24:14,282 INFO : Importing MS/MS results is 28% complete
23 May 2011 19:24:14,860 INFO : Importing MS/MS results is 29% complete
23 May 2011 19:24:15,342 INFO : Importing MS/MS results is 30% complete
23 May 2011 19:24:21,546 INFO : Importing MS/MS results is 31% complete
23 May 2011 19:24:30,748 INFO : Importing MS/MS results is 32% complete
23 May 2011 19:24:31,149 INFO : Importing MS/MS results is 33% complete
23 May 2011 19:24:32,181 INFO : Importing MS/MS results is 34% complete
23 May 2011 19:24:34,221 INFO : Importing MS/MS results is 35% complete
23 May 2011 19:24:34,653 INFO : Importing MS/MS results is 36% complete
23 May 2011 19:24:35,257 INFO : 42.43 seconds to import peptide search results for fraction 1, analysis of file null
23 May 2011 19:24:35,298 INFO : Starting to import spectra from /mnt/rewing/cpas/labkey10/IP_CTNNB1/./bCat_IP_APCKI_HCT116_20Apr2010.tgz
23 May 2011 19:24:35,349 INFO : Importing MS/MS results is 37% complete
23 May 2011 19:24:35,570 INFO : Importing MS/MS results is 38% complete
23 May 2011 19:24:36,762 INFO : Importing MS/MS results is 39% complete
23 May 2011 19:24:37,084 INFO : Importing MS/MS results is 40% complete
23 May 2011 19:24:37,458 INFO : Importing MS/MS results is 41% complete
23 May 2011 19:24:38,080 INFO : Importing MS/MS results is 42% complete
23 May 2011 19:24:38,240 INFO : Importing MS/MS results is 43% complete
23 May 2011 19:24:38,609 INFO : Importing MS/MS results is 44% complete
23 May 2011 19:24:38,891 INFO : Importing MS/MS results is 45% complete
23 May 2011 19:24:41,572 INFO : Importing MS/MS results is 46% complete
23 May 2011 19:24:41,908 INFO : Importing MS/MS results is 47% complete
23 May 2011 19:24:42,726 INFO : Importing MS/MS results is 48% complete
23 May 2011 19:24:43,193 INFO : Importing MS/MS results is 49% complete
23 May 2011 19:24:43,738 INFO : Importing MS/MS results is 50% complete
23 May 2011 19:24:47,157 INFO : Importing MS/MS results is 51% complete
23 May 2011 19:24:48,118 INFO : Importing MS/MS results is 52% complete
23 May 2011 19:24:48,700 INFO : Importing MS/MS results is 53% complete
23 May 2011 19:24:49,137 INFO : Importing MS/MS results is 54% complete
23 May 2011 19:24:50,454 INFO : Importing MS/MS results is 55% complete
23 May 2011 19:24:52,374 INFO : Importing MS/MS results is 56% complete
23 May 2011 19:24:52,736 INFO : Importing MS/MS results is 57% complete
23 May 2011 19:24:52,959 INFO : Importing MS/MS results is 58% complete
23 May 2011 19:24:53,321 INFO : Importing MS/MS results is 59% complete
23 May 2011 19:24:53,547 INFO : Importing MS/MS results is 60% complete
23 May 2011 19:24:53,737 INFO : Importing MS/MS results is 61% complete
23 May 2011 19:24:54,088 INFO : Importing MS/MS results is 62% complete
23 May 2011 19:24:54,319 INFO : Importing MS/MS results is 63% complete
23 May 2011 19:24:54,522 INFO : Importing MS/MS results is 64% complete
23 May 2011 19:24:54,802 INFO : Importing MS/MS results is 65% complete
23 May 2011 19:24:56,054 INFO : Importing MS/MS results is 66% complete
23 May 2011 19:24:58,991 INFO : Importing MS/MS results is 67% complete
23 May 2011 19:25:08,471 INFO : Importing MS/MS results is 68% complete
23 May 2011 19:25:10,722 INFO : Importing MS/MS results is 69% complete
23 May 2011 19:25:11,014 INFO : Importing MS/MS results is 70% complete
23 May 2011 19:25:11,232 INFO : Importing MS/MS results is 71% complete
23 May 2011 19:25:11,410 INFO : Importing MS/MS results is 72% complete
23 May 2011 19:25:11,601 INFO : Importing MS/MS results is 73% complete
23 May 2011 19:25:13,334 ERROR: XMLStreamException in hasNext()
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[123448,141]
Message: Element type "analysis_result" must be followed by either attribute specifications, ">" or "/>".
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:588)
    at org.labkey.api.reader.XMLStreamReaderWrapper.next(XMLStreamReaderWrapper.java:246)
    at org.labkey.api.reader.SimpleXMLStreamReader.skipTo(SimpleXMLStreamReader.java:81)
    at org.labkey.api.reader.SimpleXMLStreamReader.skipToStart(SimpleXMLStreamReader.java:67)
    at org.labkey.ms2.reader.PepXmlLoader$FractionIterator.hasNext(PepXmlLoader.java:104)
    at org.labkey.ms2.PepXmlImporter.importRun(PepXmlImporter.java:86)
    at org.labkey.ms2.MS2Importer.upload(MS2Importer.java:199)
    at org.labkey.ms2.MS2Manager.importRun(MS2Manager.java:585)
    at org.labkey.ms2.MS2Manager.addRun(MS2Manager.java:570)
    at org.labkey.ms2.ProteinProphetImporter.importRun(ProteinProphetImporter.java:349)
    at org.labkey.ms2.ProteinProphetImporter.importFile(ProteinProphetImporter.java:78)
    at org.labkey.ms2.pipeline.ProteinProphetPipelineJob.run(ProteinProphetPipelineJob.java:69)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
23 May 2011 19:25:13,401 ERROR: MS2 import failed
java.lang.RuntimeException: XMLStreamException in hasNext()
    at org.labkey.ms2.reader.PepXmlLoader$FractionIterator.hasNext(PepXmlLoader.java:109)
    at org.labkey.ms2.PepXmlImporter.importRun(PepXmlImporter.java:86)
    at org.labkey.ms2.MS2Importer.upload(MS2Importer.java:199)
    at org.labkey.ms2.MS2Manager.importRun(MS2Manager.java:585)
    at org.labkey.ms2.MS2Manager.addRun(MS2Manager.java:570)
    at org.labkey.ms2.ProteinProphetImporter.importRun(ProteinProphetImporter.java:349)
    at org.labkey.ms2.ProteinProphetImporter.importFile(ProteinProphetImporter.java:78)
    at org.labkey.ms2.pipeline.ProteinProphetPipelineJob.run(ProteinProphetPipelineJob.java:69)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[123448,141]
Message: Element type "analysis_result" must be followed by either attribute specifications, ">" or "/>".
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:588)
    at org.labkey.api.reader.XMLStreamReaderWrapper.next(XMLStreamReaderWrapper.java:246)
    at org.labkey.api.reader.SimpleXMLStreamReader.skipTo(SimpleXMLStreamReader.java:81)
    at org.labkey.api.reader.SimpleXMLStreamReader.skipToStart(SimpleXMLStreamReader.java:67)
    at org.labkey.ms2.reader.PepXmlLoader$FractionIterator.hasNext(PepXmlLoader.java:104)
    ... 15 more
23 May 2011 19:25:13,484 ERROR: ProteinProphet load failed
java.lang.RuntimeException: XMLStreamException in hasNext()
    at org.labkey.ms2.reader.PepXmlLoader$FractionIterator.hasNext(PepXmlLoader.java:109)
    at org.labkey.ms2.PepXmlImporter.importRun(PepXmlImporter.java:86)
    at org.labkey.ms2.MS2Importer.upload(MS2Importer.java:199)
    at org.labkey.ms2.MS2Manager.importRun(MS2Manager.java:585)
    at org.labkey.ms2.MS2Manager.addRun(MS2Manager.java:570)
    at org.labkey.ms2.ProteinProphetImporter.importRun(ProteinProphetImporter.java:349)
    at org.labkey.ms2.ProteinProphetImporter.importFile(ProteinProphetImporter.java:78)
    at org.labkey.ms2.pipeline.ProteinProphetPipelineJob.run(ProteinProphetPipelineJob.java:69)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[123448,141]
Message: Element type "analysis_result" must be followed by either attribute specifications, ">" or "/>".
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:588)
    at org.labkey.api.reader.XMLStreamReaderWrapper.next(XMLStreamReaderWrapper.java:246)
    at org.labkey.api.reader.SimpleXMLStreamReader.skipTo(SimpleXMLStreamReader.java:81)
    at org.labkey.api.reader.SimpleXMLStreamReader.skipToStart(SimpleXMLStreamReader.java:67)
    at org.labkey.ms2.reader.PepXmlLoader$FractionIterator.hasNext(PepXmlLoader.java:104)
    ... 15 more
 
 
jeckels responded:  2011-09-19 12:21
Hello,

The error here looks like it's in the pepXML file:

/mnt/rewing/cpas/labkey10/IP_CTNNB1/./interact-bCat_IP_APCKI_HCT116_20Apr2010.pep.xml

The line number within the file should be 123448. I don't think I've seen this exact error before, but I have seen similar ones that turned out to be TPP tools failing to XML encode protein names that contain special characters like ", &, <, or >.
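
One way to locate an error like this without running a full import is to pass the file through any XML parser and read off the position it reports. The sketch below uses Python's standard xml.parsers.expat module. The function name first_parse_error is made up for this illustration and is not part of LabKey or TPP, and expat's message wording (and possibly the exact position) may differ from what the Java parser in the stack trace reports.

```python
import xml.parsers.expat


def first_parse_error(path):
    """Return (line, column, message) for the first well-formedness
    error in the XML file at `path`, or None if the file parses cleanly.

    `first_parse_error` is a hypothetical helper name for this sketch.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # Binary mode lets expat honour the encoding declared in the file.
        with open(path, "rb") as handle:
            parser.ParseFile(handle)
    except xml.parsers.expat.ExpatError as err:
        # err.lineno is 1-based; err.offset is the 0-based column.
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None


# Example call (the path is a placeholder):
# print(first_parse_error("interact-example.pep.xml"))
```

The parser streams the file, so this should stay usable on pepXML files with hundreds of thousands of lines.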

I'd be curious to know what the 5-10 lines before and after line 123448 look like. Hopefully it's easy to manually fix up, but it would be great to see if we can prevent the generation of invalid files in the first place.
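
For a file this large, opening it in an editor can be slow. A minimal sketch for printing the lines around a given line number, assuming Python is available on the machine holding the file; the function name lines_around and the default of 10 context lines are choices made for this illustration:

```python
from itertools import islice


def lines_around(path, line_no, context=10):
    """Yield (number, text) pairs for the lines within `context` lines
    of the 1-based `line_no`. Hypothetical helper for this sketch."""
    first = max(1, line_no - context)
    last = line_no + context
    # errors="replace" keeps the scan going even if the file contains
    # bytes that are not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as handle:
        window = islice(handle, first - 1, last)
        for number, text in enumerate(window, start=first):
            yield number, text.rstrip("\n")


# Example call (the path is a placeholder):
# for number, text in lines_around("interact-example.pep.xml", 123448):
#     marker = ">>" if number == 123448 else "  "
#     print(f"{marker} {number}: {text}")
```

Only the requested window is kept in memory, so the size of the file should not matter much.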

Thanks,
Josh
 
rob_ewing50 responded:  2011-09-19 12:39
hi Josh:
thanks for the quick response.
I was looking in the prot.xml rather than the pep.xml, which is why I could not find the offending lines.
There is indeed a protein description with "<" at line 123448 - here is the section of the file.
Will manually correct and retry.

thx


<spectrum_query spectrum="S-7_Apr_20_2010.5979.5979.2" start_scan="5979" end_scan="5979" precursor_neutral_mass="1811.9972" assumed_charge="2" index="5018">
<search_result>
<search_hit hit_rank="1" peptide="SLLEQYHLGLDQKLR" peptide_prev_aa="K" peptide_next_aa="K" protein="IPI:IPI00749506.1 Gene_Symbol=LOC91316 F<lambda>8 protein (Fragment)" num_tot_proteins="1" num_matched_ions="4" tot_num_ions="28" calc_neutral_pep_mass="1811.9893" massdiff="+0.0079" num_tol_term="2" num_missed_cleavages="1" is_rejected="0" protein_descr="Gene_Symbol=LOC91316 F<lambda>8 protein (Fragment)">
<lambda>
<lambda>
<lambda>
<lambda>
<search_score name="ionscore" value="0.98"/>
<search_score name="identityscore" value="24.15"/>
<search_score name="star" value="0"/>
<search_score name="homologyscore" value="13.28"/>
<search_score name="expect" value="10.37"/>
<analysis_result analysis="peptideprophet">
<peptideprophet_result probability="0.0033" all_ntt_prob="(0.0000,0.0007,0.0033)">
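
The unescaped "<" and ">" in the protein and protein_descr attributes above are what a manual fix has to replace with &lt; and &gt;. As a hedged sketch of doing that in bulk, the Python below escapes those characters (and any bare "&") inside attribute values. It is not something TPP or LabKey provides, the names escape_attr_values and repair_file are made up for this illustration, and it assumes every attribute value is double-quoted and contains no stray double quote, which would need a different approach.

```python
import re

# Matches ="..." attribute values. Assumption: values are double-quoted
# and contain no unescaped double quote of their own.
ATTR_VALUE = re.compile(r'="([^"]*)"')

# An ampersand that does not already begin an entity such as &lt; or &#60;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_attr_values(line):
    """Escape <, > and bare & inside double-quoted attribute values,
    leaving the tag delimiters themselves untouched."""
    def fix(match):
        value = BARE_AMP.sub("&amp;", match.group(1))
        value = value.replace("<", "&lt;").replace(">", "&gt;")
        return '="' + value + '"'
    return ATTR_VALUE.sub(fix, line)


def repair_file(src, dst):
    """Write a repaired copy of `src` to `dst`, one line at a time."""
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(escape_attr_values(line))
```

Writing to a separate output file keeps the original intact, and the repaired copy is worth re-checking with an XML parser before retrying the import.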