Occasionally CPAS files are flagged as Errors, apparently because the data fail to meet PeptideProphet and/or ProteinProphet score requirements. However, examination of the Mascot reports for the offending datasets reveals at least some valid, high-scoring peptide assignments, not null or obviously flawed data. The fact that the models may not have sufficient information for quality scoring should not remove these runs from the processing stream. The ability to post-process and review ALL valid result runs is required. |
|
Brian Connolly responded: |
2010-12-28 15:51 |
Sanford,
By default, a search run via CPAS performs the following steps:
1) The search engine will be run (MASCOT in this case)
2) PeptideProphet will be run on the search engine results
3) ProteinProphet will be run using the PeptideProphet results
(NOTE: there are other tasks/programs that can be added to the pipeline processing if desired, but that is for another time)
If all 3 of these tasks finish successfully, then CPAS will import the results into the database. If any of them fails, CPAS will mark the search as ERROR and will not import the search results into the database.
The CPAS search pipeline assumes that all tasks must execute successfully for the search to be considered COMPLETE; an error in any of them is not acceptable.
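To illustrate the all-or-nothing behavior described above, here is a minimal Python sketch. It is not CPAS code: `run_pipeline` and the task callables are hypothetical names invented for this example, and the only thing it models is that one failing task marks the whole search as ERROR.

```python
from typing import Callable, List, Tuple


def run_pipeline(tasks: List[Tuple[str, Callable[[], bool]]]) -> str:
    """Run tasks in order; return "COMPLETE" only if every task succeeds.

    Hypothetical model of the behavior described in this thread,
    not the actual CPAS implementation.
    """
    for name, task in tasks:
        try:
            ok = task()
        except Exception:
            # A task that raises is treated the same as one that reports failure.
            ok = False
        if not ok:
            # One failure stops the pipeline; nothing would be imported.
            return "ERROR"
    return "COMPLETE"


# Example: the search engine succeeds but PeptideProphet fails,
# so ProteinProphet never runs and the search is marked ERROR.
status = run_pipeline([
    ("MASCOT", lambda: True),
    ("PeptideProphet", lambda: False),
    ("ProteinProphet", lambda: True),
])
```

This is why a run with usable search engine output can still end up as ERROR: the status reflects the last task that failed, not the quality of the earlier results.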
In your case, it seems that you are able to get satisfactory results via the search engine, but PeptideProphet and ProteinProphet are erroring out. The workaround to import the search engine results is to:
1) click on "Process and Import Data"
2) Go to the directory which holds the file you just searched.
3) Open the mascot sub-directory and then the sub-directory with the same name as your protocol (in this case "remainingS2_filesDec21_Copy6")
4) Select the pepXML file in this directory. It should be named something like "09MAR16_OT_03.pep.xml"
5) Click the Import Data button
This will import the search engine results into the CPAS database.
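If you have file-system access to the pipeline directory, the location described in steps 2 to 4 can be checked with a short Python sketch like the one below. The helper name `find_pepxml_files` is made up for this example, and the layout `<data directory>/mascot/<protocol name>/*.pep.xml` is taken from the steps above rather than from any CPAS API.

```python
from pathlib import Path
from typing import List


def find_pepxml_files(data_dir: str, protocol: str, engine: str = "mascot") -> List[Path]:
    """List *.pep.xml files under <data_dir>/<engine>/<protocol>/.

    Hypothetical helper; the directory layout is assumed from the
    workaround steps in this thread.
    """
    protocol_dir = Path(data_dir) / engine / protocol
    if not protocol_dir.is_dir():
        # No such protocol directory: nothing to import.
        return []
    return sorted(protocol_dir.glob("*.pep.xml"))
```

For example, `find_pepxml_files("/path/to/data", "remainingS2_filesDec21_Copy6")` would list the candidate pepXML files for that protocol, where `/path/to/data` stands in for your own data directory.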
There is currently no way to perform a MASCOT or TANDEM search without also executing the TPP tools (i.e., ProteinProphet and PeptideProphet). Is that something you would be interested in, or is this an isolated situation?
Thank you,
Brian |
|
markeys responded: |
2011-01-03 08:29 |
Thanks for suggesting the workaround that enables viewing of the pepXML.raw files using the viewing tools in CPAS. We do think there is reason to offer, as an option, a MASCOT or TANDEM search that does not also execute the TPP tools. In the examples from our lab, 3 of 109 runs failed the PeptideProphet tests. Two of these contain some valid, manually verified peptide identifications.
We use CPAS to process several hundred runs in each experiment, and then export the resulting data (pepXML, TANDEM, or Mascot files) to another software tool (MassSieve) to analyze and compare/contrast data from the collective runs. MassSieve allows user-set scoring filters, so the user can determine thresholds for the group comparisons (expectation score, number of times a peptide is observed, number of peptides per protein). MassSieve would be a useful addition to CPAS. |
|
markeys responded: |
2011-01-06 07:22 |
Update - adding PeptideProphet parameter modifications (decoy tag and accurate mass) to the Mascot processing protocol reduced the problem; a more complete solution would be to process raw_pepXMLs from multiple runs through PeptideProphet as a single file. That seems to be an option when the Globus cluster server is used with CPAS. |
|
jeckels responded: |
2011-01-06 09:31 |
Originally, you needed to be using the cluster version of the pipeline to combine the results of multiple input mzXMLs into a single resulting pepXML or protXML, but that's no longer the case.
If you add:
<note label="pipeline, data type" type="input">fractions</note>
to your search protocol, you'll get the behavior you're describing.
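For anyone who wants to add this note to many protocol files at once, here is a minimal Python sketch. It assumes the protocol is stored as an XML document whose root element directly contains the `<note>` elements (the `<bioml>` root in the example is an assumption), and `add_fractions_note` is a hypothetical helper, not part of CPAS.

```python
import xml.etree.ElementTree as ET

LABEL = "pipeline, data type"


def add_fractions_note(protocol_xml: str) -> str:
    """Return the protocol XML with the 'pipeline, data type' note set to 'fractions'.

    Hypothetical helper; assumes <note> elements are direct children of the root.
    """
    root = ET.fromstring(protocol_xml)
    for note in root.findall("note"):
        if note.get("label") == LABEL:
            # The note already exists: update it instead of adding a duplicate.
            note.set("type", "input")
            note.text = "fractions"
            break
    else:
        note = ET.SubElement(root, "note", {"label": LABEL, "type": "input"})
        note.text = "fractions"
    return ET.tostring(root, encoding="unicode")


# Example with a made-up, minimal protocol document.
updated = add_fractions_note("<bioml></bioml>")
```

Running the helper twice on the same protocol leaves a single note in place, so it is safe to re-apply.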
If you've seen documentation that still refers to this only being available in the cluster pipeline, please send us a link and we'll revise it.
Thanks,
Josh |
|
markeys responded: |
2011-01-06 12:33 |
Thanks, that is very useful guidance. We tested with 6 runs and the resulting pepXML is fine. The pepXML contains the original run file information, but that does not seem to be accessible for viewing in CPAS (the scan number is no longer unique when multiple runs are combined).
On the Configure Common Parameters page, we overlooked the "pipeline, data type" instructions and skipped to the "Globus and Cluster Configuration Parameters" section, which accounts for the misunderstanding. |
|
jeckels responded: |
2011-01-06 13:34 |
You can view the source mzXML file for each peptide. Depending on the view that you're using, either do Views->Customize View and add the Fraction->File Name column, or click the Pick Peptide Columns button and add the FractionName column.
Thanks,
Josh |
|