Error uploading human fasta file

CPAS Forum (Inactive)
Error uploading human fasta file msun  2006-06-19 12:02
Status: Closed
 
Hi,

It appears that CPAS only accepts particular characters in fasta files. I've tried uploading human_e.fasta (ftp://ftp.thegpm.org/fasta/ensembl/) and ipi.HUMAN.fasta (ftp://ftp.ebi.ac.uk/pub/databases/IPI/current/) resulting in the following error:
org.postgresql.util.PSQLException: ERROR: duplicate key violates unique constraint "uq_proteinsequences_databaseid_lookupstring"

Attached is the full error log.

This occurs even when the CPAS installation is fresh and the database is empty.

Mark

 
 
msun responded:  2006-06-20 13:36
I found that if you use the "Update SeqIDs" button as opposed to the "Reload" button, all of the FASTA information makes it through. Hope this saves somebody some time.

Mark

 
adam responded:  2006-07-19 16:13
Note that this error occurs any time you attempt to load a FASTA file that has duplicate sequences with the same "name." We use a complicated rule to determine the exact name, but basically it's the sequence description up to the first space. Many search engines use this name to refer to the putative protein in results.

We are failing on purpose here. If your FASTA contains multiple sequences with the same name then any reference to those protein names will be ambiguous… we won't be able to link the search engine results with the proper sequence.
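
If you want to check a FASTA file for this problem before uploading, something like the following Python sketch may help. It is not CPAS code and does not reproduce the exact naming rule CPAS applies; it assumes the simplified rule described above (the name is the header text after ">" up to the first whitespace). The function name `find_duplicate_names` and the sample identifiers are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_names(lines):
    """Return {name: [header line numbers]} for names that occur more than once.

    Assumption: the name is the header text after '>' up to the first
    whitespace. The real CPAS rule is more complicated than this.
    """
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            name = fields[0] if fields else ""
            seen[name].append(lineno)
    return {name: nums for name, nums in seen.items() if len(nums) > 1}


# Made-up sample records; two headers share the same name.
example = [
    ">IPI00000001.2 first description",
    "MKV",
    ">IPI00000002.1 other protein",
    "MAA",
    ">IPI00000001.2 second description",
    "MKV",
]
duplicates = find_duplicate_names(example)
print(duplicates)
```

To run it against a real file, pass an open file handle instead of the list, for example `with open("ipi.HUMAN.fasta") as fh: print(find_duplicate_names(fh))`. Any names it reports are likely candidates for the duplicate key error, though CPAS may compute the name differently.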

We should provide a better error message. We may also be able to change the way we handle this name and perhaps start using a longer version of "name" that would (hopefully) disambiguate the sequences. This would require that all modern search engines give us a long version of the name, but that may be the case now.

 
adam responded:  2007-01-04 09:04