|
brendanx responded: |
2006-06-12 11:00 |
I have a fix for this, graciously provided by Wong Chee Hong of Singapore. I will try to get it checked into the Trans-Proteomic Pipeline today, and attach a new Windows version of Mascot2XML to this support message soon. You may also be interested to know that Chee Hong and colleagues are working on a full Mascot pipeline (both web-server-local and cluster compatible) for CPAS, which they have agreed to contribute to the project. |
|
brendanx responded: |
2006-06-12 12:16 |
I have committed that change to Sashimi CVS. If you are running on Windows, you can find a new Mascot2XML.exe attached to this message. Let me know how it works. (NOTE: On 7/18 I updated the Mascot2XML.exe with a more complete fix. -brendan) |
|
|
msun responded: |
2006-06-12 16:57 |
Thanks for the quick response and the insight into the upcoming Mascot pipeline. It looks interesting indeed; I can't wait to try it out once it's done. I am using Linux here, so I checked the Sashimi CVS and saw the update to MascotConverter.cxx. I pulled the tpp package down and compiled; however, I am still getting the same error when uploading the resulting pepXML file from the new Mascot2XML. Thanks,
Mark |
|
brendanx responded: |
2006-06-12 20:35 |
Are you positive the newly built version is running? I'd suggest adding a printf to the code just above where the next_aa and prev_aa attributes get written:
printf("peptide=\"%s\" prev_aa=\"%c\" next_aa=\"%c\"\n", peptides_k, prev_AAs_k, next_AAs_k);
Then rebuild and rerun. If the version you are building gets run, you will see extra output about each peptide in the log, telling you what the values were. Since you are not seeing the attributes written to your pepXML, we should expect to see a whole lot of lines like:
peptide="AAAAAAAA" prev_aa="?" next_aa="?"
in the log. If that is indeed the case, then we'll have to debug further from there to understand why the values are not being set. |
|
wongch responded: |
2006-06-12 20:38 |
Hi Mark, Is the error message "ERROR: null value in column "prevaa" violates not-null constraint" or some other? What is the command that you used? Can I trouble you to check whether you can find the text "peptide_prev_aa" or "peptide_next_aa" in your Mascot2XML output?
1. If yes, and you are still getting this error message, you might want to check that the database you specified is the same one uploaded to the Mascot server. (i.e. some peptides have these attributes but others do not 'cos the protein cannot be found in the sequence database)
2. If no, you might want to use the Win32 binary provided by Brendan to check if the problem is reproducible. (Just need to check for the presence of "peptide_prev_aa" or "peptide_next_aa".) Cheers,
Chee-Hong. |
|
msun responded: |
2006-06-13 10:18 |
Hi Brendan and Chee-Hong, Chee-Hong: The error I am getting is indeed ..."prevaa" violates not-null constraint... I placed the printf statement, modified to:
printf("peptide=\"%s\" prev_aa=\"%c\" next_aa=\"%c\"\n", peptides_k, prev_AAs_k, foll_AAs_k);
above the if(prev_AAs_k != '?') conditional (line 2444 before modification). STDOUT gives prev_aa="?" next_aa="?" for all peptides, so not surprisingly there are no "peptide_prev_aa" or "peptide_next_aa" attributes in the Mascot2XML output file. Taking a peek into the generated XML file shows:
<search_hit hit_rank="1" peptide="EWEYSYK" protein="Q5C036_SCHJA" num_tot_proteins="1" num_matched_ions=" 4" tot_num_ions=" 12" calc_neutral_pep_mass="1003.4286" massdiff="+0.2" num_tol_term="2" num_missed_cleavages="0" is_rejected="0">
So it appears we are missing only the peptide_prev/next_aa values that you identified earlier. The command I used for Mascot2XML was:
Mascot2XML F004268.dat -D/opt/mascot/sequence/MSDB/current/MSDB_20050929.fasta
I am not sure if this will help with the debugging, but I should mention that a large number of the scans are being set to 0000. Some example error output:
active: query22
Warning: could not find scan numbers of spectrum File:blank005Jun52006.wiff,Sample:blank005(samplenumber1),Elution:35.96min,Period:1,Cycle(s):2052(Experiment2)(Chargenotautodetermined)
Set to 0000
I will try out Brendan's Win32 binary once I can get Cygwin up and running. Once again, thanks for all your help,
Mark |
|
brendanx responded: |
2006-06-13 12:50 |
That is a native Windows binary. No Cygwin required. I would next search your MascotConverter.cxx for 'prev_aa'. You can see the two locations where it should get set. Use printf debugging around there to figure out why the code is never called. For instance, you'll see the conditional: if(match != NULL) { If that is true, it appears to me that you can't possibly end up with the '?' character. Do you ever enter there? Could you possibly have the wrong FASTA file? Also, attach your copy of MascotConverter.cxx, just to sanity check that it is the head CVS revision… |
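To make the reasoning about the match != NULL branch concrete, here is a minimal sketch of a flanking-residue lookup. It is not the code in MascotConverter.cxx: the Flanks struct, the find_flanks name and the use of '-' for a protein terminus are all assumptions made for illustration. It only shows why '?' can survive solely when the peptide is not found in the sequence, which is what a wrong FASTA file would cause.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper for illustration; not taken from MascotConverter.cxx.
struct Flanks {
    char prev_aa;
    char next_aa;
};

// Look up `peptide` inside `protein` and report the neighbouring residues.
// '?' means the peptide was not found; '-' marks a protein terminus
// (an assumed convention for this sketch).
Flanks find_flanks(const char* protein, const char* peptide) {
    Flanks f = {'?', '?'};
    const char* match = std::strstr(protein, peptide);
    if (match != NULL) {
        // Once this branch is entered, neither value can remain '?'.
        f.prev_aa = (match == protein) ? '-' : *(match - 1);
        const char* after = match + std::strlen(peptide);
        f.next_aa = (*after == '\0') ? '-' : *after;
    } else {
        // Trace the miss so that a mismatched FASTA file shows up in the log.
        std::fprintf(stderr, "no match for peptide=\"%s\"\n", peptide);
    }
    return f;
}
```

If every peptide lands in the else branch, the sequence database being searched is the first thing to check.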
|
brendanx responded: |
2006-06-13 12:51 |
I said no Cygwin required, but it does require some other DLLs included with the Windows installation of CPAS. You probably want to install CPAS on the machine before trying to use the binary I sent. |
|
msun responded: |
2006-06-14 16:54 |
Hi Brendan and Chee-Hong, Thanks for all your input. It has greatly helped with the debugging (errors I've made that prevented prev_aa assignment). I've made some interesting discoveries.
The new Mascot2XML works to a large extent, in that a large proportion of the peptide_prev/next_aa attributes are getting assigned; however, there are occurrences where proteins are not getting their peptide_prev/next_aa values assigned despite their existence in the MSDB fasta file. I've attached the source dat file and the resulting pepXML file from my compiled Mascot2XML (the files contain very few spectra, but demonstrate the point nonetheless). One such example is the protein Q7SZW4_BRARE. The Win32 Mascot2XML binary posted by Brendan also produced the same problem. The MSDB file I'm matching against was from the automatic download, which has the name: MSDB_20050929x.fasta
We are currently using Mascot Daemon 2.1, which has the qXXX_pX_terms=prevaa1,nextaa1;prevaa2,nextaa2;etc. line in the dat file, so it seems rather wasteful to run through the entire fasta file again given that the values are already present in the dat file. Consequently, I thought it might be beneficial to have a flag of some sort that lets users of Mascot Daemon 2.1 skip the final search and just parse the dat file for the prev/next_aa information, saving a significant amount of time.
I've tried placing the generated pepXML file from the new Mascot2XML into CPAS again. Since the first protein in the file has the peptide_prev/next_aa attributes set, it appears to upload the 1GB MSDB fasta file but then dies, producing:
14 Jun 2006 08:59:31,076 INFO : Clearing out existing MS2 data for F004268-tt.pep.xml
14 Jun 2006 08:59:32,039 INFO : Finished clearing out existing MS2 data for F004268-tt.pep.xml
14 Jun 2006 08:59:32,061 INFO : Loading FASTA file
14 Jun 2006 12:16:19,808 ERROR: Uncaught exception in PiplineJob: 22(DONE) test01/F004268-tt (F004268)
java.lang.OutOfMemoryError: Java heap space
Further attempts to upload files resulted in the same error even when I disabled the upload FASTA file option. This is on a computer with 1.5GB of RAM and 2GB of swap, which makes this error all the more strange. If you need me to do anything, I'll be more than happy to help out. Thanks,
Mark |
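The idea of reading the flanking residues straight from the dat file could be sketched roughly as follows. This is an assumption-laden illustration, not an existing Mascot2XML feature: the parse_terms_line name is hypothetical, and the entry format (pairs of "prev,next" separated by ';') is taken only from the description in the post above, so a real dat file should be checked before relying on it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: parse the value part of a "qN_pM_terms=" line into
// (prev_aa, next_aa) pairs. The "X,Y;X,Y" layout is assumed from the post.
// Returns an empty vector if the line does not contain "_terms=".
std::vector<std::pair<char, char> > parse_terms_line(const std::string& line) {
    std::vector<std::pair<char, char> > flanks;
    const std::string key = "_terms=";
    std::string::size_type pos = line.find(key);
    if (pos == std::string::npos) return flanks;
    pos += key.size();
    while (pos < line.size()) {
        std::string::size_type end = line.find(';', pos);
        if (end == std::string::npos) end = line.size();
        // Each entry is expected to be exactly "X,Y"; anything else is skipped.
        if (end - pos == 3 && line[pos + 1] == ',') {
            flanks.push_back(std::make_pair(line[pos], line[pos + 2]));
        }
        pos = end + 1;
    }
    return flanks;
}
```

Each returned pair would correspond to one protein match for that peptide, so a converter using this approach would still need to pair the entries with the protein list for the same query.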
|
|
msun responded: |
2006-06-14 17:02 |
I should clarify one thing. After I attempted to upload the pepXML file and got the OutOfMemoryError, I tried to upload another pepXML file, but with the upload FASTA file option turned off. This also resulted in an error, but stated:
org.postgresql.util.PSQLException: No results were returned by the query.
Attached is the full log file. Thanks,
Mark |
|
|
Matthew Bellew responded: |
2006-06-14 17:33 |
Java has a default max memory usage, regardless of the amount of memory on your machine. To change it use the -Xmx option. Find whatever script launches tomcat, and try adding something like "-Xmx1024M" right after "java". |
|
vensel responded: |
2006-06-19 10:26 |
Hi, I am trying to run the MASCOT2XML.exe program on Windows. I keep getting the error: The application has failed to start because zlib1.dll was not found. I have tried this on two different Windows machines, one with version 1.4 of CPAS installed and the other with Cygwin. The application will not run on either machine under DOS or Cygwin. Thanks, Bill |
|
brendanx responded: |
2006-06-19 10:31 |
The MASCOT2XML.exe that comes with CPAS is a native Windows EXE, so cygwin is not required. If you have a default install of CPAS, zlib1.dll should be sitting in the same directory as MASCOT2XML.exe. Is it? Is this directory on your path? |
|
msun responded: |
2006-06-19 10:40 |
Hi Matthew, I used the -Xmx1024M option and let a FASTA file upload over the weekend, and it seems to have worked, as I have yet to receive an OutOfMemoryError. Thanks! Bill: You can also do a search for zlib1.dll and copy it to the directory in which Mascot2XML.exe resides if you don't want to set paths. Mark |
|
vensel responded: |
2006-06-19 12:54 |
Thanks for the suggestions. I set the path and then tried to run MASCOT2XML.exe as follows:
C:MASCOT_SHAREmascot2xml_TEST>mascot2xml.exe *.dat -DC:INETPUBMASCOTsequenceBrachypodiumRScurrentBrachypodium_Emboss_Rs_060613.fasta -E trypsin
filepath: C:MASCOT_SHAREmascot2xml_TEST, extn: F004572
active: parameters
mod: Oxidation (M) with mass: 0 variable index: 0
mod: M with mass: 147.035 variable index: 1
So it seemed as if it was running okay, but then Trypsin....
aa: C mass: 160.031 index: 0 static
aa: M mass: 147.035 index: 1 variable
searching c:INETPUBMASCOTsequenceBrachypodiumRScurrentBrachypodium_Emboss_Rs_060613.fasta.....5%.....10%.....15%..
...20%.....30%.....35%.....40%.....45%.....50%.....60%.....65%.....70%.....75%.....80%.....90%.....95%...done
65. opening Mon_Jun_19_12-47-59_2006.spectrum65.0598.0598.4.out
directory: C:MASCOT_SHAREmascot2xml_TESTconverter
ls > C:MASCOT_SHAREmascot2xml_TESTlist.txt
bsdtar czf C:MASCOT_SHAREmascot2xml_TESTF004572.tgz --files-from=C:MASCOT_SHAREmascot2xml_TESTlist.txt
C:MASCOT_SHAREmascot2xml_TESTF004572.tgz
current: C:MASCOT_SHAREmascot2xml_TEST
rm -f C:MASCOT_SHAREmascot2xml_TESTlist.txt
rm -f -r C:MASCOT_SHAREmascot2xml_TESTconverter
warning: cannot open "F004572.mzXML" for reading MS instrument info.
Any suggestions welcome. Thanks, Bill |
|
brendanx responded: |
2006-08-10 07:59 |
This is just a warning that the converter cannot find an mzXML file from which to read information about your mass spectrometer. Specifically, this warning means the resulting pepXML file will be missing the following attributes from the <msms_run_summary> tag:
msManufacturer
msModel
msIonization
msMassAnalyzer
msDetector
The converter is looking for an mzXML with the same base name as the input file. Note that I say _the_ input file. It does not look to me like Mascot2XML is set up to handle multiple input files, so unless "*.dat" above resolves to a single file, you may be using the program incorrectly. So, if you are converting basename.dat, the converter tries to find basename.mzXML, and issues the warning above if the file does not exist. All this said, really the place to look for help with using tools from the Sashimi Trans Proteomic Pipeline is the Sashimi project itself. Here is a link to their SourceForge.net forums: http://sourceforge.net/forum/?group_id=69281 |
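As a rough illustration of the base-name rule described above (basename.dat leads to a lookup of basename.mzXML), the name derivation might look like the sketch below. The expected_mzxml_name function is hypothetical and is not the converter's actual code; in particular, what the real converter does with an input that has no extension is an assumption here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: derive the mzXML file name that would be looked for
// next to a given input file, by swapping the extension for ".mzXML".
std::string expected_mzxml_name(const std::string& input_path) {
    // Locate the last path separator so dots in directory names are ignored.
    std::string::size_type slash = input_path.find_last_of("/\\");
    std::string::size_type dot = input_path.find_last_of('.');
    if (dot == std::string::npos ||
        (slash != std::string::npos && dot < slash)) {
        // No extension on the file name itself: just append (assumed behaviour).
        return input_path + ".mzXML";
    }
    return input_path.substr(0, dot) + ".mzXML";
}
```

Under this sketch, converting F004572.dat without an F004572.mzXML alongside it would be the situation that triggers the warning.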
|
adam responded: |
2007-01-03 16:02 |
|
|
|
|