Mascot results DO NOT LINK correctly in CPAS 1.7

CPAS Forum (Inactive)
Mascot results DO NOT LINK correctly in CPAS 1.7 ioannis.moutsatsos  2007-04-11 15:00
Status: Closed
 

Greetings to all.
I have downloaded and installed CPAS v1.7 on a workstation running WinXP-Pro with 2 GB of memory and Firefox 2.0.

I'm now testing CPAS with Mascot running on a Linux server. The Mascot server is version 2.1.03.
I have configured the Mascot server and search parameters according to the online instructions. A number of steps appear to be working, as shown in the attached log file Test16.zip (such as creation and submission of the search task, search result retrieval, and creation of the pep.xml file).

MS2 results can be viewed when Grouping is set to none. When Grouping is set to Protein Collapsed or Protein Expanded, the proteins are shown but the peptides are missing. I have traced this issue to null values in the ms2.peptidesdata.seqid column for these peptides. When I manually update the seqid column with the right value, the peptides become visible.
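
The check described above can be sketched roughly as follows. This is only an illustration: it uses an in-memory SQLite database as a stand-in for the real CPAS database server, the `peptide` column and the demo rows are made up, and only the `ms2.peptidesdata` table name and its `seqid` and `protein` columns come from the observation above. The helper names are hypothetical.

```python
import sqlite3


def count_unlinked_peptides(conn):
    """Return {protein: row count} for peptide rows whose seqid is NULL."""
    rows = conn.execute(
        "SELECT protein, COUNT(*) FROM ms2.peptidesdata "
        "WHERE seqid IS NULL GROUP BY protein ORDER BY protein"
    ).fetchall()
    return dict(rows)


def make_demo_db():
    """Build a tiny stand-in database (assumption: NOT the real CPAS schema)."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the table can be addressed
    # as ms2.peptidesdata, mirroring the schema-qualified name.
    conn.execute("ATTACH DATABASE ':memory:' AS ms2")
    conn.execute(
        "CREATE TABLE ms2.peptidesdata (peptide TEXT, protein TEXT, seqid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ms2.peptidesdata VALUES (?, ?, ?)",
        [
            # Mascot-style short IDs that were never matched to a sequence.
            ("PEPTIDEA", "IPI00296337", None),
            ("PEPTIDEB", "IPI00296337", None),
            # A row that was linked successfully (seqid value is made up).
            ("PEPTIDEC", "IPI:IPI00296337.2|SWISS-PROT:P78527-1", 101),
        ],
    )
    return conn


if __name__ == "__main__":
    print(count_unlinked_peptides(make_demo_db()))
```

Grouping the NULL rows by the stored protein value makes it easy to see whether every unlinked peptide carries the short Mascot-style identifier.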

I have also noticed that protein IDs stored in the ms2.peptidesdata.protein column are different for Mascot and X!Tandem.
For example, the two searches store the following values in the ms2.peptidesdata.protein column for the same protein:

  • Mascot: IPI00296337
  • X!Tandem: IPI:IPI00296337.2|SWISS-PROT:P78527-1|ENSE......

Finally, when I try to view Grouping: Protein Prophet Collapsed or Protein Prophet Expanded, neither proteins nor peptides are returned. X!Tandem peptide/protein results can be viewed under all circumstances.

    In every case, I'm using the same fasta database for CPAS as the one that the Mascot server is using (IPI_human v3.25).


    Could you please advise as to what might be happening here? Any help would be greatly appreciated.
     
     
    wongch responded:  2007-04-12 10:02
    Assigned To: wongch
    In brief, this comes down to how seqid is looked up, which is complicated by the many possible ways to extract an identifier from the concatenated protein identifiers.

    1) The way that we work around this locally is to process the IPI_human databases to generate a fasta database in the form of:
    >IPI00296337 IPI:IPI00296337.2|SWISS-PROT:P78527-1|ENSE...
    ...

       This database is used by both X! Tandem and Mascot so that the ID matches.

    2) Alternatively, you can configure your Mascot server to take every character before the space as the protein ID. This requires increasing the maximum protein ID length in your Mascot configuration file.

    Approach 1) accommodates Mascot while approach 2) accommodates X! Tandem.

    This is not optimal. We should file an issue and find a more convenient way for the user.
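
For approach 1), the header rewrite could look roughly like the sketch below. It is not the script used locally, only a minimal illustration: it assumes every IPI header contains an `IPI:IPI<digits>` token, and the function names and the demo record are hypothetical.

```python
import re

# Matches the unversioned accession inside a token such as "IPI:IPI00296337.2".
_IPI_RE = re.compile(r"IPI:(IPI\d+)")


def short_ipi_id(header):
    """Return the unversioned IPI accession found in a header, or None."""
    match = _IPI_RE.search(header)
    return match.group(1) if match else None


def rewrite_fasta(lines):
    """Yield FASTA lines with the short IPI ID prepended to each header.

    Sequence lines, headers without an IPI token, and headers that already
    start with the short ID are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            original = line[1:]
            short = short_ipi_id(original)
            if short is not None and not original.startswith(short + " "):
                line = ">" + short + " " + original
        yield line


if __name__ == "__main__":
    # Demo record; the description and sequence are made up.
    demo = [">IPI:IPI00296337.2|SWISS-PROT:P78527-1 demo entry", "MSGSAAA"]
    for out in rewrite_fasta(demo):
        print(out)
```

Running the rewritten database through both X! Tandem and Mascot means both engines see the same leading token, which is what the ID match relies on.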
     
    jbdamask responded:  2007-05-21 14:55
    Hi
    Would a third option be to remove the hard-coded "-shortid" switch from the MascotImportPipelinejob class (line 109)?
    When I run my .dat files through TPP w/out this switch I see the full protein name which, presumably, would match the name in the local database.
    Is this right?

    regards,
    john
     
    wongch responded:  2007-05-21 17:55
    Hi John,

    Yes, this option is used in the upcoming v2.1. If you do so, please get the newer code for Mascot2XML from Sashimi CVS, as I have fixed and enhanced a number of things related to the name lookup.

    *Chee-Hong
     
    jbdamask responded:  2007-05-22 05:37
    Thanks Chee-Hong
    I see v2.0 was released April 5 and the dev cycle is 12 weeks. Will v2.1 be released in early July, or will a patched version be available sooner?

    John
     
    wongch responded:  2007-05-22 06:10
    Hi John,

    LabKey has included a calendar link containing specific dates for the current release in the Wiki page "/Home/Developer/documentation/2.0/Development Cycle". (At the end of the page.)

    link:https://www.labkey.org/Wiki/home/Developer/documentation/2.0/page.view?name=devCycle

    It is likely to be in early June if all goes smoothly.

    *Chee-Hong