This topic covers how to register a new nucleotide sequence using the graphical user interface. To register using the API, or to bulk import sequences from an Excel spreadsheet, see Use the Registry API.
Nucleotide Sequence Validation
For nucleotide sequences, we allow DNA and RNA bases (ACTGU) as well as the IUPAC notation degenerate bases (WSMKRYBDHVNZ). On import, whitespace will be removed from a nucleotide sequence. If the sequence contains other letters or symbols, an error will be raised.
For protein sequences, we only allow standard amino acid letters and zero or more trailing stop codons ('*'). On import, whitespace will be removed from a protein sequence. If the sequence contains a stop codon in the middle of the sequence, or any other letters or symbols, an error will be raised.
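The validation rules above can be sketched in a few lines of Python. This is only an illustration, not the registry's implementation: the function names are hypothetical, the set of 20 standard amino acid letters is an assumption about what "standard" means here, and because case handling is not described above, the sketch accepts uppercase letters only.

```python
# Hypothetical sketch of the validation rules; not the registry's actual code.

# Letters listed in this topic for nucleotide sequences.
NUCLEOTIDE_LETTERS = set("ACTGU") | set("WSMKRYBDHVNZ")

# Assumption: "standard amino acid letters" means the 20 common residues.
AMINO_ACID_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_nucleotide(raw: str) -> str:
    """Strip whitespace, then reject any letter or symbol not listed above."""
    seq = "".join(raw.split())
    bad = sorted(set(seq) - NUCLEOTIDE_LETTERS)
    if bad:
        raise ValueError(f"Invalid nucleotide characters: {bad}")
    return seq


def clean_protein(raw: str) -> str:
    """Strip whitespace, allow trailing '*' only, reject everything else."""
    seq = "".join(raw.split())
    body = seq.rstrip("*")  # zero or more trailing stop codons are allowed
    bad = sorted(set(body) - AMINO_ACID_LETTERS)  # a mid-sequence '*' lands here
    if bad:
        raise ValueError(f"Invalid protein characters: {bad}")
    return seq
```

For example, `clean_nucleotide("ACG TNN")` returns the sequence with the space removed, while `clean_protein("MK*T")` raises an error because the stop codon is not at the end.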
When translating a nucleotide codon to a protein sequence, where the codon contains one or more of the degenerate bases, the system attempts to find a single amino acid that every possible nucleotide combination for that codon maps to. If a single amino acid is found, it will be used in the translated protein. If not, the codon will be translated as an 'X'.
For example, the nucleotide sequence 'AAW' is ambiguous since it could map to either 'AAA' or 'AAT' (representing Lysine and Asparagine respectively), so 'AAW' will be translated as an 'X'. However, 'AAR' maps to either 'AAA' or 'AAG', which are both translated to Lysine, so it will be translated as a 'K'.
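The degenerate-codon rule can be illustrated with a short Python sketch. It is a minimal approximation under stated assumptions, not the system's actual code: the names `translate_codon` and `translate` are hypothetical, the standard genetic code is assumed, 'U' is treated as 'T', any trailing partial codon is ignored, and because the meaning of 'Z' is not described above, a codon containing it is translated as 'X'.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes expanded to the concrete bases they can stand for.
# Assumption: 'U' is treated as 'T'; 'Z' is deliberately left unmapped.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Return the one amino acid shared by every expansion, else 'X'."""
    options = [IUPAC.get(base) for base in codon]
    if len(options) != 3 or None in options:
        return "X"  # unmapped letter (such as 'Z') or malformed codon
    amino_acids = {CODON_TABLE["".join(c)] for c in product(*options)}
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


def translate(seq: str) -> str:
    """Translate complete codons left to right; a trailing partial codon is dropped."""
    usable = len(seq) - len(seq) % 3
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, usable, 3))


print(translate_codon("AAW"))  # X  (AAA is Lysine, AAT is Asparagine)
print(translate_codon("AAR"))  # K  (AAA and AAG are both Lysine)
```

Expanding each codon into every concrete combination keeps the logic close to the description above; a codon of three fully ambiguous bases expands to only 64 combinations, so the cost is negligible.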
Create a Nucleotide Sequence
To add a new nucleotide sequence to the registry:
The wizard has two tabs:
On the Register a new Nucleotide Sequence page, in the Details panel, populate the fields:
- Name: Provide a name, or one will be generated for you. Hover to see the naming pattern.
- Description: (Optional) A text description of the sequence.
- Alias: (Optional) Alternative names for the sequence. Type a name and press Enter when complete. Continue to add more as needed.
- Nucleotide Sequence Parents: (Optional) Parent components: related sequences that the new sequence is derived from, for example, by mutation. You can select more than one parent. Start typing to narrow the pulldown menu of options.
- Sequence: (Required) The nucleotide sequence.
- Annotations: (Optional) A comma-separated list of annotation information:
  - Name - a freeform name
  - Category - region or feature
  - Type - for example, Leader, Variable, Tag, etc.
  - Start and End Positions are 1-based offsets within the sequence.
Review the details on the Confirm tab.
Options to complete registration:
- Finish: Register this nucleotide sequence and exit.
- Finish and translate protein: Both register this nucleotide sequence and register the corresponding protein. This option will take you to the registry wizard for a new protein, prepopulating it with the protein sequence based on the nucleotide sequence you just defined.