This topic covers the process of assigning chain and structure formats in Biologics LIMS.
Overview
After the classification engine has determined the pattern of regions within a sequence, the configured chain and structure format information is used in two steps: first the regions are matched to a chain format, then the assigned chain formats are matched to a unique structure format.
If antibody regions are present in a ProteinSeq but the sequence does not match any chain format, the ProteinSeq is assigned a chain format of "Unrecognized Antibody Chain" and the Molecule is assigned a structure format of "Unrecognized Antibody Format".
If no antibody regions are present, the ProteinSeq is assigned a chain format of "Non-Antibody Chain" and the Molecule structure format is set to "Non-Antibody".
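The fallback rules above can be sketched in a few lines of Python. This is only an illustration of the decision logic described here, not the product's implementation: the function name `fallback_formats` and its parameters are hypothetical, and the sketch assumes the caller already knows whether antibody regions were found and whether a chain format matched.

```python
from typing import Optional, Tuple

# Fallback names taken from the text above.
UNRECOGNIZED_CHAIN = "Unrecognized Antibody Chain"
UNRECOGNIZED_STRUCTURE = "Unrecognized Antibody Format"
NON_ANTIBODY_CHAIN = "Non-Antibody Chain"
NON_ANTIBODY_STRUCTURE = "Non-Antibody"


def fallback_formats(
    has_antibody_regions: bool,
    matched_chain_format: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return (chain format, structure format) for one ProteinSeq.

    Hypothetical helper: the structure format is None when a chain format
    matched, because it is then resolved separately from the chain
    combinations rather than by a fallback rule.
    """
    if matched_chain_format is not None:
        return matched_chain_format, None
    if has_antibody_regions:
        return UNRECOGNIZED_CHAIN, UNRECOGNIZED_STRUCTURE
    return NON_ANTIBODY_CHAIN, NON_ANTIBODY_STRUCTURE
```

For instance, `fallback_formats(True, None)` yields the two "Unrecognized" names, while `fallback_formats(False, None)` yields the two "Non-Antibody" names.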
For changes to the chain and structure formats to take effect, an administrator must clear caches from the Administration Console.
Note that by default, the classification engine is configured to detect antibody regions but not TCR regions.
Chain Formats
Chain formats are stored in the ChainFormats List.
A chain format is specified at the region level using region abbreviations. Recognized regions are listed in the following table.
Region | Abbreviation |
---|---|
Leader | Ldr |
Light Variable | KV/LmdV |
Light Constant | KCnst/LmdCnst |
Kappa Leader | KLdr |
Kappa Variable | KV |
Kappa Constant Ig Domain | KCnst-Ig |
Post Kappa Constant Ig (present in some species as a short C-terminal tail) | KCnst-Po |
Lambda Leader | LmdLdr |
Lambda Variable | LmdV |
Lambda Constant Ig Domain | LmdCnst-Ig |
Post Lambda Constant Ig (present in some species as a short C-terminal tail) | LmdCnst-Po |
Heavy Leader | HLdr |
Heavy Variable | HV |
Heavy Constant Ig Domain | HCnst-Ig |
Heavy Constant Hinge | Hinge |
Heavy Constant Fc N-terminal Domain | Fc-N |
Heavy Constant Fc C-terminal Domain | Fc-C |
Post Heavy Constant Ig | HCnst-Po |
Linker | Lnk |
Tag | Tag |
Protease Cleavage Site | Cut |
Unrecognized | Unk |
TCR-alpha Leader | TRALdr |
TCR-alpha Variable | TRAV |
TCR-alpha Constant Ig Domain | TRACnst-Ig |
TCR-alpha Constant Connector | TRACnst-Connector |
TCR-alpha Constant Transmembrane Domain | TRACnst-TM |
TCR-beta Leader | TRBLdr |
TCR-beta Variable | TRBV |
TCR-beta Constant Ig Domain | TRBCnst-Ig |
TCR-beta Constant Connector | TRBCnst-Connector |
TCR-beta Constant Transmembrane Domain | TRBCnst-TM |
TCR-delta Leader | TRDLdr |
TCR-delta Variable | TRDV |
Chain Format Syntax
1. Regions are separated by a semicolon. Optional regions are surrounded by braces {}.
Example 1: A kappa light chain is specified as:
{Ldr} ; KV ; KCnst-Ig ; {KCnst-Po}
...where only the variable and constant regions are required to be present.
2. OR choices are separated by a '|' and enclosed in parentheses, as in (A | B), or in braces, as in {A | B}, when the whole choice is optional.
Example 2: an scFv is specified as:
{Ldr | Unk<M>} ; (HV ; Lnk ; KV/LmdV | KV/LmdV ; Lnk ; HV)
where an optional leader or methionine can be present at the N-terminus followed by a heavy and light variable region connected via a linker in either orientation.
3. A colon-separated prefix to a region abbreviation indicates a particular germline gene.
Example 3: an IgG2 heavy chain is specified as:
{Ldr} ; HV ; IgG2:HCnst-Ig ; IgG2:Hinge ; IgG2:Fc-N ; IgG2:Fc-C ; IgG2:HCnst-Po
4. A sequence-level specification can be specified after the region abbreviation in '<>'.
Example 4: an IgG1 Heavy Chain Fab is specified as:
{Ldr} ; HV ; IgG1:HCnst-Ig ; IgG1:Hinge<!113-123>
which indicates that ASN positions 113 to 123 should not be present.
5. A sequence-level specification can also require particular residues at given positions.
Example 5: an IgG1 HC Knob-into-Hole + phage + disulfide (Knob) is specified as:
{Ldr} ; HV ; IgG1:HCnst-Ig ; IgG1:Hinge ; IgG1:Fc-N ; IgG1:Fc-C<C15,W30> ; IgG1:HCnst-Po
where ASN position 15 of the Fc-C domain must be a cysteine and position 30 must be a tryptophan.
6. Square brackets are used to indicate which Fv a variable region is a part of.
Example 6: an IgG1 CrossMab CH1-CL Fab Heavy Chain is specified as:
{Ldr} ; HV[3] ; IgG1:HCnst-Ig ; Unk<EPKSCD> ; Lnk ; HV[2] ; Unk<A> ; KCnst/LmdCnst ; IgG1:Hinge<!1-107> ; IgG1:Fc-N ; IgG1:Fc-C<C15,W30> ; IgG1:HCnst-Po
7. Finally, the special character '⊃' is used to specify a particular region subtype. Currently, this is only used to indicate a VHH as a subtype of VH.
Example 7: an IgG1 HCab Chain is specified as:
{Ldr} ; HV⊃VHH ; IgG1:Hinge ; IgG1:Fc-N ; IgG1:Fc-C ; IgG1:HCnst-Po
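To make the syntax rules above concrete, the following Python sketch splits a specification into its top-level elements and breaks a simple element into its parts (optional flag, germline prefix, region, subtype, Fv number, sequence-level specification). It is an illustrative reading of the syntax, not the parser used by the classification engine. The names `split_top_level`, `parse_element`, and `Element` are hypothetical, OR groups are returned unsplit rather than expanded, and the order in which a '⊃' subtype and a '[n]' Fv number may be combined is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


def split_top_level(spec: str, sep: str = ";") -> List[str]:
    """Split on `sep`, ignoring separators nested inside (), {}, <> or []."""
    parts, current, depth = [], [], 0
    for ch in spec:
        if ch in "({<[":
            depth += 1
        elif ch in ")}>]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]  # drop empties, e.g. after a trailing ';'


# germline:region, then optional subtype, Fv number and <...> specification.
# Assumption: a subtype, if present, comes before the Fv number.
ELEMENT = re.compile(
    r"^(?:(?P<germline>[^:<>\[\]⊃]+):)?"
    r"(?P<region>[A-Za-z][A-Za-z/\-]*)"
    r"(?:⊃(?P<subtype>\w+))?"
    r"(?:\[(?P<fv>\d+)\])?"
    r"(?:<(?P<constraint>[^>]*)>)?$"
)


@dataclass
class Element:
    region: str
    optional: bool = False
    germline: Optional[str] = None
    subtype: Optional[str] = None
    fv_num: Optional[int] = None
    constraint: Optional[str] = None


def parse_element(text: str) -> Element:
    """Parse one simple element; OR groups raise ValueError."""
    text = text.strip()
    optional = text.startswith("{") and text.endswith("}")
    if optional:
        text = text[1:-1].strip()
    m = ELEMENT.match(text)
    if m is None:
        raise ValueError(f"not a simple element: {text!r}")
    fv = m.group("fv")
    return Element(
        region=m.group("region"),
        optional=optional,
        germline=m.group("germline"),
        subtype=m.group("subtype"),
        fv_num=int(fv) if fv else None,
        constraint=m.group("constraint"),
    )
```

Applied to Example 5, `[parse_element(p) for p in split_top_level(spec)]` produces seven elements, the first being an optional `Ldr` and the sixth being `Fc-C` with germline `IgG1` and the specification `C15,W30`. An OR group such as `(HV ; Lnk ; KV/LmdV | KV/LmdV ; Lnk ; HV)` comes back from `split_top_level` as a single piece, and expanding it would need an additional step that is not shown here.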
Structure Formats
Structure formats are stored in the StructureFormats List, where the only significant information is the name and abbreviation. The more important table is ChainStructureJunction, which describes the chain combinations that map to each structure format.
For example, there are two chain combinations for an IgG1: 2 copies of a Kappa Light Chain + 2 copies of an IgG1 Heavy Chain, or 2 copies of a Lambda Light Chain + 2 copies of an IgG1 Heavy Chain. In the ChainStructureJunction table, this is represented as follows:
Structure Format | Chain Format | Combination | Stoichiometry | Num Distinct* | Fv Num Overrides** |
---|---|---|---|---|---|
IgG1 | Kappa Light Chain | 1 | 2 | 1 | |
IgG1 | IgG1 Heavy Chain | 1 | 2 | 1 | |
IgG1 | Lambda Light Chain | 2 | 2 | 1 | |
IgG1 | IgG1 Heavy Chain | 2 | 2 | 1 | |
*The Num Distinct column indicates the number of sequence-distinct copies of that chain type. This is normally 1, but there are a few unusual formats, such as a Trioma or Quadroma bi-specific IgG1, where the Num Distinct value might be 2.
**The Fv Num Overrides column is for the rare situations where, because of how the chains are combined, the default Fv Num values in the chain format spec need to be overridden.
For example, in a CrossMab CH1-CL Fab, the Fv Num Overrides value of '1#1/1#3' indicates that for one copy of the Kappa chain the 1st V-region is part of Fv#1 and for the other copy the 1st V-region is part of Fv#3.
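As an illustration of how the junction rows can be read, the Python sketch below groups rows by structure format and combination number, then looks for a combination whose chain formats and stoichiometries exactly equal the observed chain counts. It is a simplified model under stated assumptions, not the product's matching code: the name `match_structure_format` and the tuple layout are hypothetical, and the Num Distinct and Fv Num Overrides columns are ignored.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (structure format, chain format, combination, stoichiometry),
# copied from the IgG1 rows in the table above.
JUNCTION_ROWS: List[Tuple[str, str, int, int]] = [
    ("IgG1", "Kappa Light Chain", 1, 2),
    ("IgG1", "IgG1 Heavy Chain", 1, 2),
    ("IgG1", "Lambda Light Chain", 2, 2),
    ("IgG1", "IgG1 Heavy Chain", 2, 2),
]


def match_structure_format(
    chain_counts: Dict[str, int],
    rows: List[Tuple[str, str, int, int]] = JUNCTION_ROWS,
) -> Optional[str]:
    """Return the single structure format whose combination matches, else None."""
    # Collect the required chain counts for each (structure, combination) pair.
    combos: Dict[Tuple[str, int], Dict[str, int]] = defaultdict(dict)
    for structure, chain, combination, stoichiometry in rows:
        combos[(structure, combination)][chain] = stoichiometry

    # A combination matches only if chain formats and counts are identical.
    matches = {
        structure
        for (structure, _), required in combos.items()
        if required == dict(chain_counts)
    }
    return matches.pop() if len(matches) == 1 else None
```

With these rows, two Kappa Light Chains plus two IgG1 Heavy Chains resolve to "IgG1", as do two Lambda Light Chains plus two IgG1 Heavy Chains. One kappa plus one lambda light chain with two heavy chains matches neither combination, so the sketch returns `None`.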