serve to group together Molecules (e.g., antibodies or proteins) around common portions of protein sequences, once signal or leader peptides have been cleaved. Molecule Sets serve as the "common name" for a set of molecules. Many Molecules may be grouped together in a single Molecule Set.
Molecular Species
serve as alternate forms for a given molecule. For example, a given antibody may give rise to multiple Molecular Species: one Species corresponding to the leaderless, or "mature", portion of the original antibody and another Molecular Species corresponding to its "mature, desK" (cleaved of signal peptides and heavy chain terminal lysine) form.
Both Molecular Species and Sets are calculated and created by the registry itself. Their creation is triggered when the user adds a Molecule to the registry. Users can also manually register Molecular Species, but generally do NOT register their own Molecule Sets. Detailed triggering and creation rules are described below.
When a molecule is added to the registry, additional entities are calculated and added, depending on the nature of the molecule. These can include:
- Molecular Species
- Molecule Sets
- Other Protein Sequences
Molecules can be created (= registered) based on one or more of the following:
- a protein sequence
- a nucleotide sequence
- other molecules
Rules for Entity Calculation/Creation
When the molecule contains only protein sequences
- A mature molecular species is created, consisting of the leaderless segments of the protein sequences, provided a leader portion is identifiable. To be identified as a leader, an annotation must:
  - Start at residue #1
  - Have the annotation Type "Leader"
  - Have the annotation Category "Region"
- If no leader portion is identifiable, the species will be identical to the Molecule that has just been created.
- Additionally, a mature desK molecular species is created (provided that there are terminal lysines on heavy chains).
- New protein sequences are created corresponding to any species created: mature, mature desK, or both. The uniqueness constraints imposed by the registry remain in effect, so already registered proteins are re-used, not duplicated.
- A molecule set is created, provided that the mature molecular species is new, i.e., is not the same (components and sequences) as the mature molecular species of any other molecule. If an identical mature species already exists in the registry, the new molecule is instead associated with the existing molecule set for that species.
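The leader and desK rules above can be illustrated with a minimal Python sketch. It is not the registry's implementation: the `Annotation` and `ProteinSequence` classes, the `is_heavy_chain` flag, and all function names are hypothetical, annotation positions are assumed to be 1-based and inclusive, and the "terminal lysine" is assumed to be a C-terminal `K` on the mature heavy chain.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    start: int      # assumed 1-based position of the first annotated residue
    end: int        # assumed 1-based position of the last annotated residue (inclusive)
    type: str
    category: str


@dataclass
class ProteinSequence:
    sequence: str
    is_heavy_chain: bool = False          # hypothetical flag, not a documented field
    annotations: List[Annotation] = field(default_factory=list)


def find_leader(protein: ProteinSequence) -> Optional[Annotation]:
    """Return the annotation that satisfies all three leader conditions, if any."""
    for ann in protein.annotations:
        if ann.start == 1 and ann.type == "Leader" and ann.category == "Region":
            return ann
    return None


def mature_sequence(protein: ProteinSequence) -> str:
    """Strip the leader; with no identifiable leader the sequence is unchanged."""
    leader = find_leader(protein)
    if leader is None:
        return protein.sequence
    return protein.sequence[leader.end:]


def mature_desk_sequences(proteins: List[ProteinSequence]) -> Optional[List[str]]:
    """Return mature desK sequences, or None when no heavy chain ends in lysine."""
    matures = [mature_sequence(p) for p in proteins]
    clipped = [p.is_heavy_chain and m.endswith("K") for p, m in zip(proteins, matures)]
    if not any(clipped):
        return None  # no terminal lysine on a heavy chain: no desK species
    return [m[:-1] if c else m for m, c in zip(matures, clipped)]


heavy = ProteinSequence("MKWVEVQLPGK", True, [Annotation(1, 4, "Leader", "Region")])
light = ProteinSequence("DIQMTK")
print(mature_sequence(heavy))                 # leaderless heavy chain
print(mature_desk_sequences([heavy, light]))  # lysine removed from the heavy chain only
```

In this sketch the light chain keeps its final `K`, because the desK rule is applied to heavy chains only.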
When the molecule contains anything in addition to protein sequences
- Physical properties are not calculated.
- Molecular species are not created.
- A molecule set is created, which has only this molecule within it.
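The molecule set rules for both cases can be sketched as a small lookup keyed on the mature species. This is an illustration only: the `SetRegistry` class, the `MS-n` identifier format, and the use of sorted mature sequences as the comparison key are assumptions, not the registry's actual behavior.

```python
from typing import Dict, List, Optional, Tuple


class SetRegistry:
    """Hypothetical in-memory stand-in for molecule set assignment."""

    def __init__(self) -> None:
        self._by_mature: Dict[Tuple[str, ...], str] = {}  # mature species key -> set id
        self.members: Dict[str, List[str]] = {}           # set id -> molecule ids
        self._next_id = 1

    def _new_set(self) -> str:
        set_id = f"MS-{self._next_id}"  # made-up id format for illustration
        self._next_id += 1
        self.members[set_id] = []
        return set_id

    def assign(self, molecule_id: str, mature_sequences: Optional[List[str]]) -> str:
        """Pass mature sequences for protein-only molecules, or None otherwise."""
        if mature_sequences is None:
            # Non-protein components: no species, a set containing only this molecule.
            set_id = self._new_set()
        else:
            key = tuple(sorted(mature_sequences))
            set_id = self._by_mature.get(key)
            if set_id is None:
                # New mature species: create a new set for it.
                set_id = self._new_set()
                self._by_mature[key] = set_id
        self.members[set_id].append(molecule_id)
        return set_id


registry = SetRegistry()
first = registry.assign("M-1", ["EVQLPGK", "DIQMTK"])
second = registry.assign("M-2", ["DIQMTK", "EVQLPGK"])  # same mature species
mixed = registry.assign("M-3", None)                    # contains non-protein parts
print(first == second, registry.members[mixed])
```

Two molecules that differ only in their leader segments would share a key here, which is how they end up grouped in one set.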
Aliases and Descriptions
When creating a Molecule, the auto-generated Molecule Set (if a new one is created) will have the same alias as the Molecule. If the new Molecule is tied to an already existing Molecule Set, alias information from the new Molecule is appended to that Set's alias.
When creating a molecule, the auto-generated molecular species (both mature and mature desK) should have the same alias as the molecule. Similarly, when manually creating a molecular species from a molecule, the alias field is pre-populated with the alias from the molecule.
When an auto-generated molecular species creates one or more new protein sequences as its components, those protein sequences are given a Description of the form:
- Mature of “PS-15”
- Mature, desK of “PS-16”
When an auto-generated molecular species uses one or more already existing protein sequences as its components, a Description indicating where each one came from is appended to its existing Description:
- Mature of “PS-17”
- Mature, desK of “PS-18”
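The description rules above can be sketched as a small helper. The function name, the `form` keys, and in particular the `"; "` separator used when appending to an existing Description are assumptions made for illustration; the source does not specify how appended text is delimited.

```python
from typing import Optional

# Labels taken from the examples above; the dictionary keys are hypothetical.
FORM_LABELS = {"mature": "Mature", "mature_desk": "Mature, desK"}


def species_description(
    source_id: str,
    form: str,
    existing: Optional[str] = None,
    separator: str = "; ",  # assumed delimiter, not documented
) -> str:
    """Build the Description for a protein sequence used by an auto-generated species.

    source_id is the identifier of the protein sequence the component was
    derived from (e.g. PS-15). When `existing` is given, the new text is
    appended to it, as for a re-used protein sequence.
    """
    text = f"{FORM_LABELS[form]} of “{source_id}”"
    if existing:
        return f"{existing}{separator}{text}"
    return text


print(species_description("PS-15", "mature"))
print(species_description("PS-18", "mature_desk", existing="Reference heavy chain"))
```

Keeping the separator as a parameter makes the unverified part of this sketch easy to change.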