Biologics: Chain and Structure Formats

2024-04-20

Premium Feature: Available with LabKey Biologics LIMS.

This topic covers the process of assigning chain and structure formats in Biologics LIMS.

Overview

After the classification engine has determined the pattern of regions within a sequence, it uses the configured chain and structure format information in two steps: first it matches the regions to a chain format pattern, and then it matches the assigned chain formats to a unique structure format.

If antibody regions are present in a ProteinSeq but it does not match any chain format, the ProteinSeq is assigned a chain format of "Unrecognized Antibody Chain" and the Molecule is assigned a structure format of "Unrecognized Antibody Format".

If no antibody regions are present, the ProteinSeq is assigned a chain format of "Non-Antibody Chain" and the Molecule structure format is set to "Non-Antibody".
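
The fallback rules above can be summarized in a short Python sketch. The function name and its arguments are hypothetical and are not part of any Biologics LIMS API; only the format names are taken from this topic.

```python
from typing import Optional, Tuple


def assign_fallback_formats(
    has_antibody_regions: bool, matched_chain_format: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Return (chain format, structure format) for one ProteinSeq.

    Hypothetical helper: the structure format is None when it still has to
    be resolved from the combination of chain formats in the Molecule.
    """
    if not has_antibody_regions:
        return "Non-Antibody Chain", "Non-Antibody"
    if matched_chain_format is None:
        return "Unrecognized Antibody Chain", "Unrecognized Antibody Format"
    return matched_chain_format, None
```

When a chain format does match, the structure format is left unresolved here because it depends on the other chains in the Molecule.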

For changes to the chain and structure formats to take effect, an administrator must clear caches from the Administration Console.

Note that by default, the classification engine is configured to detect antibody regions, not TCR regions.

Chain Formats

Chain formats are stored in the ChainFormats List.

A chain format is specified at the region level using region abbreviations. Recognized regions are listed in the following table.

Region                                    Abbreviation
Leader                                    Ldr
Light Variable                            KV/LmdV
Light Constant                            KCnst/LmdCnst
Kappa Leader                              KLdr
Kappa Variable                            KV
Kappa Constant Ig Domain                  KCnst-Ig
Post Kappa Constant Ig                    KCnst-Po
  (present in some species as a short C-terminal tail)
Lambda Leader                             LmdLdr
Lambda Variable                           LmdV
Lambda Constant Ig Domain                 LmdCnst-Ig
Post Lambda Constant Ig                   LmdCnst-Po
  (present in some species as a short C-terminal tail)
Heavy Leader                              HLdr
Heavy Variable                            HV
Heavy Constant Ig Domain                  HCnst-Ig
Heavy Constant Hinge                      Hinge
Heavy Constant Fc N-terminal Domain       Fc-N
Heavy Constant Fc C-terminal Domain       Fc-C
Post Heavy Constant Ig                    HCnst-Po
Linker                                    Lnk
Tag                                       Tag
Protease Cleavage Site                    Cut
Unrecognized                              Unk
TCR-alpha Leader                          TRALdr
TCR-alpha Variable                        TRAV
TCR-alpha Constant Ig Domain              TRACnst-Ig
TCR-alpha Constant Connector              TRACnst-Connector
TCR-alpha Constant Transmembrane Domain   TRACnst-TM
TCR-beta Leader                           TRBLdr
TCR-beta Variable                         TRBV
TCR-beta Constant Ig Domain               TRBCnst-Ig
TCR-beta Constant Connector               TRBCnst-Connector
TCR-beta Constant Transmembrane Domain    TRBCnst-TM
TCR-delta Leader                          TRDLdr
TCR-delta Variable                        TRDV

Chain Format Syntax

1. Regions are separated by a semicolon. Optional regions are surrounded by braces {}.

Example 1: A kappa light chain is specified as:

{Ldr} ; KV ; KCnst-Ig ; {KCnst-Po}
...where only the variable and constant regions are required to be present.
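
As an illustration of rule 1, the following Python sketch parses a specification that uses only semicolons and optional braces, then checks an observed list of region abbreviations against it. The function names are hypothetical, and comparing abbreviations literally is a simplifying assumption; the actual engine may match a generic abbreviation such as Ldr against more specific leaders.

```python
from typing import List, Tuple


def parse_simple_spec(spec: str) -> List[Tuple[str, bool]]:
    """Split a rule-1-only spec into (abbreviation, is_optional) pairs."""
    parts = []
    for token in spec.split(";"):
        token = token.strip()
        optional = token.startswith("{") and token.endswith("}")
        name = token[1:-1].strip() if optional else token
        parts.append((name, optional))
    return parts


def matches(parts: List[Tuple[str, bool]], regions: List[str]) -> bool:
    """Return True if the observed regions satisfy the parsed spec."""
    if not parts:
        return not regions
    (name, optional), rest = parts[0], parts[1:]
    # Try to consume the next observed region with this spec element.
    if regions and regions[0] == name and matches(rest, regions[1:]):
        return True
    # Otherwise the element may be skipped only if it is optional.
    return optional and matches(rest, regions)


kappa = parse_simple_spec("{Ldr} ; KV ; KCnst-Ig ; {KCnst-Po}")
print(matches(kappa, ["KV", "KCnst-Ig"]))
print(matches(kappa, ["KV"]))
```

The first call matches because both optional regions may be absent; the second fails because the required constant region is missing.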

2. OR choices are separated by '|' and enclosed either in parentheses, like (A | B), when one of the choices is required, or in braces, like {A | B}, when the choice is optional.

Example 2: an scFv is specified as:

{Ldr | Unk<M>} ; (HV ; Lnk ; KV/LmdV | KV/LmdV ; Lnk ; HV)
...where an optional leader or methionine can be present at the N-terminus, followed by heavy and light variable regions connected by a linker in either orientation.
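
Because semicolons and '|' can appear inside groups, a parser has to split only at the top nesting level. The sketch below shows one way to do that in Python; split_top_level is a hypothetical helper, not part of the product, and it assumes brackets in a specification are always balanced.

```python
from typing import List


def split_top_level(text: str, sep: str) -> List[str]:
    """Split text on sep, ignoring separators nested inside (), {}, <> or []."""
    pieces, current, depth = [], [], 0
    for ch in text:
        if ch in "({<[":
            depth += 1
        elif ch in ")}>]":
            depth -= 1
        if ch == sep and depth == 0:
            pieces.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current).strip())
    return pieces


scfv = "{Ldr | Unk<M>} ; (HV ; Lnk ; KV/LmdV | KV/LmdV ; Lnk ; HV)"
elements = split_top_level(scfv, ";")
# Strip the surrounding parentheses, then split the group into its choices.
choices = split_top_level(elements[1][1:-1], "|")
print(elements)
print(choices)
```

Here the specification splits into two top-level elements, and the parenthesized element splits into the two orientations of the scFv.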

3. A colon-separated prefix to a region abbreviation indicates a particular germline gene.

Example 3: an IgG2 heavy chain is specified as:

{Ldr} ; HV ; IgG2:HCnst-Ig ; IgG2:Hinge ; IgG2:Fc-N ; IgG2:Fc-C ; IgG2:HCnst-Po

4. A sequence-level specification can be given in '<>' after the region abbreviation.

Example 4: an IgG1 Heavy Chain Fab is specified as:

{Ldr} ; HV ; IgG1:HCnst-Ig ; IgG1:Hinge<!113-123>

which indicates that ASN positions 113 to 123 should not be present.

5. A sequence-level specification can also require specific residues at given positions.

Example 5: an IgG1 HC Knob-into-Hole + phage + disulfide (Knob) is specified as:

{Ldr} ; HV ; IgG1:HCnst-Ig ; IgG1:Hinge ; IgG1:Fc-N ; IgG1:Fc-C<C15,W30> ; IgG1:HCnst-Po

where ASN position 15 of the Fc-C domain must be a cysteine and position 30 must be a tryptophan.
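
The sequence-level forms shown so far can be told apart mechanically. The following Python sketch classifies the text between '<' and '>' into three kinds, based only on Examples 2, 4, and 5. The function name and the "kind" labels are assumptions made for illustration, and the real syntax may allow forms not covered here.

```python
import re
from typing import Any, Dict


def parse_sequence_spec(body: str) -> Dict[str, Any]:
    """Classify the text found between '<' and '>' in a chain format."""
    body = body.strip()
    # Form 1: '!113-123' means the positions in that range must be absent.
    range_match = re.fullmatch(r"!(\d+)-(\d+)", body)
    if range_match:
        start, end = int(range_match.group(1)), int(range_match.group(2))
        return {"kind": "absent_range", "start": start, "end": end}
    # Form 2: 'C15,W30' means position 15 must be C and position 30 must be W.
    items = [item.strip() for item in body.split(",")]
    if all(re.fullmatch(r"[A-Z]\d+", item) for item in items):
        residues = {int(item[1:]): item[0] for item in items}
        return {"kind": "required_residues", "residues": residues}
    # Form 3: 'M' is a literal residue sequence, as in Unk<M> for methionine.
    if re.fullmatch(r"[A-Z]+", body):
        return {"kind": "literal_sequence", "sequence": body}
    raise ValueError(f"Unrecognized sequence-level specification: {body!r}")


print(parse_sequence_spec("!113-123"))
print(parse_sequence_spec("C15,W30"))
print(parse_sequence_spec("M"))
```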

6. Square brackets are used to indicate which Fv a variable region is a part of.

Example 6: an IgG1 CrossMab CH1-CL Fab Heavy Chain is specified as:

{Ldr} ; HV[3] ; IgG1:HCnst-Ig ; Unk<EPKSCD> ; Lnk ; HV[2] ; Unk<A> ; KCnst/LmdCnst ;
IgG1:Hinge<!1-107> ; IgG1:Fc-N ; IgG1:Fc-C<C15,W30> ; IgG1:HCnst-Po

7. Finally, the special character '⊃' is used to specify a particular region subtype. Currently, this is only used to indicate a VHH as a subtype of VH.

Example 7: an IgG1 HCab Chain is specified as:

{Ldr} ; HV⊃VHH ; IgG1:Hinge ; IgG1:Fc-N ; IgG1:Fc-C ; IgG1:HCnst-Po
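
Taken together, rules 3 through 7 mean a single region element can carry a germline prefix, a subtype, an Fv number, and a sequence-level specification. The Python sketch below splits one element into those parts with a regular expression. The function name, field names, and the assumed order of the suffixes (subtype, then Fv number, then '<>') are illustrative assumptions; the examples in this topic never combine all of them in one element.

```python
import re
from typing import Any, Dict

# Optional 'Germline:' prefix, region abbreviation, then optional suffixes.
TOKEN_RE = re.compile(
    r"(?:(?P<germline>[^:<\[⊃]+):)?"
    r"(?P<region>[^<\[⊃]+?)"
    r"(?:⊃(?P<subtype>[^<\[]+?))?"
    r"(?:\[(?P<fv_num>\d+)\])?"
    r"(?:<(?P<sequence_spec>[^>]*)>)?"
)


def parse_region_element(element: str) -> Dict[str, Any]:
    """Split one region element of a chain format into its parts."""
    match = TOKEN_RE.fullmatch(element.strip())
    if match is None:
        raise ValueError(f"Cannot parse region element: {element!r}")
    parts = match.groupdict()
    if parts["fv_num"] is not None:
        parts["fv_num"] = int(parts["fv_num"])
    return parts


for element in ["IgG1:Fc-C<C15,W30>", "HV[3]", "HV⊃VHH", "KV/LmdV"]:
    print(parse_region_element(element))
```

Fields that are not present in an element come back as None, so callers can treat every element uniformly.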

Structure Formats

Structure formats are stored in the StructureFormats List, where the only significant fields are the name and abbreviation. The more important table is ChainStructureJunction, which describes the chain combinations that map to each structure format.

For example, there are two chain combinations for an IgG1: 2 copies of a Kappa Light Chain + 2 copies of an IgG1 Heavy Chain, or 2 copies of a Lambda Light Chain + 2 copies of an IgG1 Heavy Chain. In the ChainStructureJunction table, this is represented as follows:

Structure Format  Chain Format        Combination  Stoichiometry  Num Distinct*  Fv Num Overrides**
IgG1              Kappa Light Chain   1            2              1
IgG1              IgG1 Heavy Chain    1            2              1
IgG1              Lambda Light Chain  2            2              1
IgG1              IgG1 Heavy Chain    2            2              1

*The Num Distinct column indicates the number of sequence-distinct copies of that chain type. This is normally 1, but there are a few unusual formats, such as a Trioma or Quadroma bi-specific IgG1, where the Num Distinct value might be 2.

**The Fv Num Override column is for the rare situations where, because of how the chains are combined, the default Fv Num values in the chain format spec need to be overridden.
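
To make the role of the junction table concrete, here is a minimal Python sketch that looks up a structure format from a set of chain formats and their copy counts. The rows are the IgG1 rows shown above; the function names and the exact-match rule are assumptions for illustration only, and the Num Distinct and Fv Num Override columns are ignored.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (structure format, chain format, combination, stoichiometry)
JUNCTION_ROWS: List[Tuple[str, str, int, int]] = [
    ("IgG1", "Kappa Light Chain", 1, 2),
    ("IgG1", "IgG1 Heavy Chain", 1, 2),
    ("IgG1", "Lambda Light Chain", 2, 2),
    ("IgG1", "IgG1 Heavy Chain", 2, 2),
]


def group_combinations(rows):
    """Group rows into {(structure, combination): {chain format: copies}}."""
    combos = defaultdict(dict)
    for structure, chain, combination, stoichiometry in rows:
        combos[(structure, combination)][chain] = stoichiometry
    return combos


def find_structure_format(
    chain_counts: Dict[str, int], rows=JUNCTION_ROWS
) -> Optional[str]:
    """Return the structure format whose combination equals chain_counts."""
    for (structure, _combination), required in group_combinations(rows).items():
        if required == chain_counts:
            return structure
    return None


print(find_structure_format({"Lambda Light Chain": 2, "IgG1 Heavy Chain": 2}))
print(find_structure_format({"Kappa Light Chain": 1, "IgG1 Heavy Chain": 2}))
```

The first lookup matches combination 2 of IgG1, while the second matches nothing because the light chain copy count differs from the table.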

For example, in a CrossMab CH1-CL Fab, the Fv Num Override value of '1#1/1#3' indicates that for one copy of the Kappa chain the 1st V-region is part of Fv#1, and for the other copy the 1st V-region is part of Fv#3.
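
Reading that example as one 'V-region index # Fv number' entry per chain copy, separated by '/', an override value could be parsed as in the Python sketch below. This interpretation is inferred from the single example above, and the function and field names are hypothetical; values for chains with more than one V-region may use a syntax not handled here.

```python
from typing import Dict, List


def parse_fv_num_override(value: str) -> List[Dict[str, int]]:
    """Parse an override such as '1#1/1#3' into one entry per chain copy."""
    copies = []
    for entry in value.split("/"):
        v_region, fv_num = entry.strip().split("#")
        copies.append({"v_region": int(v_region), "fv_num": int(fv_num)})
    return copies


print(parse_fv_num_override("1#1/1#3"))
```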
