Best Protein Name vs. Name from FASTA mixup

CPAS Forum (Inactive)
Best Protein Name vs. Name from FASTA mixup tvaisar  2013-05-07 06:57
Status: Closed
 
I ran into an issue when I started using the UniProt database for my database searches. Although I was searching against a UniProt FASTA file, I got back a mix of different IDs (UniProt IDs and IPI IDs) when I uploaded the data and looked at it in LabKey Server. I changed the "Best name" to "Name from FASTA file" and things got better, but I was still getting a lot of UniProt IDs that I could not find online in the UniProt database, either because they had been updated or because they had been removed. I thought this was because my database was not the current one, so I uploaded a current version of the UniProt database and ran my search against it. Now that the results are uploaded to LabKey Server, I get the correct UniProt IDs when I look at individual samples, but when I use "Compare" for multiple samples it reverts to reporting the "outdated" protein IDs.
When I look up one of these proteins, I can see that the header contains the ID from the FASTA file, followed in parentheses by the "old" incorrect ID.
I tried to use the Sequences schema in a query, but I do not see a field that corresponds to "Name from FASTA". I can only see "First Name" and "Best Name".

Any suggestions?

Tomas Vaisar
 
 
jeckels responded:  2013-05-08 10:05
Hi Tomas,

I'd recommend that you set the "Best Name" for the proteins to the name from the updated FASTA file. You can do this by going to Admin->Admin Console->Protein Databases. Find the most current version of the FASTA file in the "FASTA Files" list. Check the box in front of it. From the button bar, choose Set Protein Best Name->To Name From FASTA.

You should then be able to use the "Best Name" field and have it populated with the name you want to be using.

Thanks,
Josh
 
tvaisar responded:  2013-05-08 10:43
Hi Josh,

Thanks for the response. This is indeed what I did. However, I find it works only if you are looking at one set at a time. Once you use the Compare function (and it does not matter whether it is Search Engine or ProteinProphet (query)), you start getting a mix of IDs: mostly ones that do correspond to the FASTA file, but also some that do not.
When I make a custom query in MS2 > Sequences I run into the same problem: some IDs in the Best Name column are the correct ones and some are clearly old.
I have attached screenshots showing this. IDs like tr|XXXXX|XXXXX_MOUSE are the ones from the current database; those like XXX_MOUSE are from previous versions (I am not sure from where).

Thanks,

Tomas
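
For anyone trying to measure how mixed the names are in an exported Compare grid or custom query, the identifier shapes described in this thread can be told apart with a few regular expressions. The following is only a rough sketch, not part of LabKey Server: the function names, the category labels, and the sample IDs are made up for illustration, and the patterns assume names look like tr|ACCESSION|ENTRY_MOUSE (current UniProt FASTA), ENTRY_MOUSE (bare entry name), or IPI followed by digits.

```python
import re
from collections import Counter

# Assumed shapes of the names discussed in this thread (not taken from LabKey itself):
#   "tr|XXXXX|XXXXX_MOUSE" or "sp|XXXXX|XXXXX_MOUSE" -> name as written in a UniProt FASTA file
#   "XXX_MOUSE"                                      -> bare entry name, likely from an older annotation
#   "IPI" followed by digits                         -> IPI identifier
UNIPROT_FASTA = re.compile(r"^(sp|tr)\|[A-Z0-9]+\|[A-Z0-9]+_[A-Z0-9]+$")
IPI_ID = re.compile(r"^IPI\d+(\.\d+)?$")
BARE_ENTRY = re.compile(r"^[A-Z0-9]+_[A-Z0-9]+$")


def classify_protein_name(name):
    """Return a hypothetical category label for one protein name string."""
    name = name.strip()
    if UNIPROT_FASTA.match(name):
        return "uniprot_fasta"
    if IPI_ID.match(name):
        return "ipi"
    if BARE_ENTRY.match(name):
        return "bare_entry_name"
    return "other"


def summarize_names(names):
    """Count how many names fall into each category."""
    return Counter(classify_protein_name(n) for n in names)


if __name__ == "__main__":
    # Placeholder values only; paste the Best Name column from an export here.
    best_names = ["tr|Q00000|Q00000_MOUSE", "ABCD_MOUSE", "IPI00000001"]
    for category, count in summarize_names(best_names).items():
        print(category, count)
```

Running this over the Best Name column from a single run and again from a Compare export would show whether the bare entry names only appear in the comparison view, which is the behavior described above.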