Biologics: Chain and Structure Formats

2024-04-20

Premium Feature: Available with LabKey Biologics LIMS.

This topic covers the process of assigning chain and structure formats in Biologics LIMS.

Overview
Chain Formats

Chain Format Syntax

Structure Formats

Overview

After the classification engine has determined the pattern of regions within a sequence, the configured chain and structure formats are applied in two steps: first, the regions are matched to a chain format; then, the chain formats assigned to a Molecule's chains are matched to a unique structure format.

If a ProteinSeq contains antibody regions but does not match any chain format, it is assigned a chain format of "Unrecognized Antibody Chain", and its Molecule is assigned a structure format of "Unrecognized Antibody Format".

If no antibody regions are present, the ProteinSeq is assigned a chain format of "Non-Antibody Chain" and the Molecule structure format is set to "Non-Antibody".
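
The fallback rules above can be summarized in a short Python sketch. The function names and inputs (a flag for whether antibody regions were found, plus the matched format or None) are hypothetical and not part of LabKey's API; only the fallback format names come from this page, and cases the page does not describe are simply passed through.

```python
from typing import Optional

# Fallback names taken from this page; everything else here is hypothetical.
UNRECOGNIZED_CHAIN = "Unrecognized Antibody Chain"
NON_ANTIBODY_CHAIN = "Non-Antibody Chain"


def assign_chain_format(has_antibody_regions: bool, matched: Optional[str]) -> str:
    """Pick a ProteinSeq's chain format from the engine's (hypothetical) results."""
    if not has_antibody_regions:
        return NON_ANTIBODY_CHAIN
    return matched if matched is not None else UNRECOGNIZED_CHAIN


def assign_structure_format(chain_formats: list[str], matched: Optional[str]) -> Optional[str]:
    """Pick a Molecule's structure format from its chains' assigned formats."""
    if UNRECOGNIZED_CHAIN in chain_formats:
        return "Unrecognized Antibody Format"
    if all(cf == NON_ANTIBODY_CHAIN for cf in chain_formats):
        return "Non-Antibody"
    # Cases this page does not describe are passed through unchanged.
    return matched
```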

Changes to the chain and structure formats take effect only after an administrator clears the caches from the Administration Console.

Note that by default, the classification engine is configured to detect antibody regions, not TCR regions.

Chain Formats

Chain formats are stored in the ChainFormats List.

A chain format is specified at the region level using region abbreviations. The recognized regions are listed in the following table.

Region	Abbreviation
Leader	Ldr
Light Variable	KV/LmdV
Light Constant	KCnst/LmdCnst
Kappa Leader	KLdr
Kappa Variable	KV
Kappa Constant Ig Domain	KCnst-Ig
Post Kappa Constant Ig (present in some species as a short C-terminal tail)	KCnst-Po
Lambda Leader	LmdLdr
Lambda Variable	LmdV
Lambda Constant Ig Domain	LmdCnst-Ig
Post Lambda Constant Ig (present in some species as a short C-terminal tail)	LmdCnst-Po
Heavy Leader	HLdr
Heavy Variable	HV
Heavy Constant Ig Domain	HCnst-Ig
Heavy Constant Hinge	Hinge
Heavy Constant Fc N-terminal Domain	Fc-N
Heavy Constant Fc C-terminal Domain	Fc-C
Post Heavy Constant Ig	HCnst-Po
Linker	Lnk
Tag	Tag
Protease Cleavage Site	Cut
Unrecognized	Unk
TCR-alpha Leader	TRALdr
TCR-alpha Variable	TRAV
TCR-alpha Constant Ig Domain	TRACnst-Ig
TCR-alpha Constant Connector	TRACnst-Connector
TCR-alpha Constant Transmembrane Domain	TRACnst-TM
TCR-beta Leader	TRBLdr
TCR-beta Variable	TRBV
TCR-beta Constant Ig Domain	TRBCnst-Ig
TCR-beta Constant Connector	TRBCnst-Connector
TCR-beta Constant Transmembrane Domain	TRBCnst-TM
TCR-delta Leader	TRDLdr
TCR-delta Variable	TRDV

Chain Format Syntax

1. Regions are separated by a semicolon. Optional regions are surrounded by braces {}.

Example 1: A kappa light chain is specified as:

{Ldr} ; KV ; KCnst-Ig ; {KCnst-Po}

where only the variable (KV) and constant Ig (KCnst-Ig) regions are required.

2. OR choices are separated by '|' and enclosed either in parentheses, (A | B), when one of the choices is required, or in braces, {A | B}, when the choice itself is optional.

Example 2: an scFv is specified as:

{Ldr | Unk<M>} ; (HV ; Lnk ; KV/LmdV | KV/LmdV ; Lnk ; HV)

where an optional leader or methionine (Unk<M>) can be present at the N-terminus, followed by a heavy and a light variable region joined by a linker in either orientation.
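
One way to see how semicolons, braces, and '|' combine is to translate a specification into a regular expression over region names: ';' behaves like concatenation, and '|' binds more loosely than ';', just as alternation does in a regex. The sketch below is illustrative, not how LabKey implements matching. It ignores '<...>' sequence constraints, treats other decorations such as '[n]' literally, and assumes that a slash term like KV/LmdV accepts either KV or LmdV.

```python
import re

# Punctuation tokens, or a region term such as "KV", "Unk<M>" or "IgG1:Hinge".
TOKEN = re.compile(r"[{}()|;]|[^\s{}()|;]+")


def spec_to_regex(spec: str) -> re.Pattern:
    """Translate a chain format spec into a regex over space-terminated region names."""
    parts = []
    for tok in TOKEN.findall(spec):
        if tok in ("{", "("):
            parts.append("(?:")                # open a group
        elif tok == "}":
            parts.append(")?")                 # braces make the group optional
        elif tok == ")":
            parts.append(")")
        elif tok == "|":
            parts.append("|")                  # OR binds more loosely than ';'
        elif tok == ";":
            continue                           # ';' is plain concatenation
        else:
            name = tok.split("<", 1)[0]        # assumption: ignore <...> constraints
            options = "|".join(re.escape(n) for n in name.split("/"))
            parts.append(f"(?:{options}) ")    # assumption: 'KV/LmdV' = KV or LmdV
    return re.compile("".join(parts))


def matches(spec: str, regions: list[str]) -> bool:
    """True if the ordered region names satisfy the spec."""
    text = "".join(f"{r} " for r in regions)
    return spec_to_regex(spec).fullmatch(text) is not None
```

For example, matches("{Ldr} ; KV ; KCnst-Ig ; {KCnst-Po}", ["KV", "KCnst-Ig"]) is True, while ["KV"] alone is rejected because the constant region is required.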

3. A colon-separated prefix to a region abbreviation indicates a particular germline gene.

Example 3: an IgG2 heavy chain is specified as:

{Ldr} ; HV ; IgG2:HCnst-Ig ; IgG2:Hinge ; IgG2:Fc-N ; IgG2:Fc-C ; IgG2:HCnst-Po
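
A gene prefix can be separated from its region by splitting on the colon. This hypothetical helper lists (gene, region) pairs for the prefixed terms in a specification, dropping any '<...>' constraint first; it is a sketch of the notation only.

```python
import re

# A region term is any run of characters that is not whitespace or spec punctuation.
TERM = re.compile(r"[^\s{}()|;]+")


def germline_prefixes(spec: str) -> list[tuple[str, str]]:
    """Return (gene, region) for each gene-prefixed term, e.g. 'IgG2:Hinge'."""
    pairs = []
    for term in TERM.findall(spec):
        bare = term.split("<", 1)[0]          # drop any sequence-level constraint
        gene, sep, region = bare.partition(":")
        if sep:                               # keep only terms that carry a prefix
            pairs.append((gene, region))
    return pairs
```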

4. A sequence-level specification can be given after the region abbreviation in angle brackets, '<>'.

Example 4: an IgG1 Heavy Chain Fab is specified as:

{Ldr} ; HV ; IgG1:HCnst-Ig ; IgG1:Hinge<!113-123>

which indicates that ASN positions 113 to 123 must not be present.
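
A negated range can be checked once a region's residues are keyed by ASN position. This page does not describe the numbering itself, so the sketch assumes a hypothetical dict mapping ASN position to one-letter amino acid, and reads "113 to 123" as inclusive of both ends.

```python
def parse_absent_range(constraint: str) -> range:
    """Parse '!113-123' into the ASN positions that must be absent (inclusive)."""
    if not constraint.startswith("!"):
        raise ValueError(f"not an absence constraint: {constraint!r}")
    start, end = (int(p) for p in constraint[1:].split("-"))
    return range(start, end + 1)


def satisfies_absence(residues: dict[int, str], constraint: str) -> bool:
    """residues: hypothetical map of ASN position -> one-letter amino acid."""
    return not any(pos in residues for pos in parse_absent_range(constraint))
```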

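5. A sequence-level specification can also require particular residues at given positions, written as a one-letter amino acid code followed by the ASN position, with multiple requirements separated by commas.

Example 5: an IgG1 HC Knob-into-Hole + phage + disulfide (Knob) is specified as: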

{Ldr} ; HV ; IgG1:HCnst-Ig ; IgG1:Hinge ; IgG1:Fc-N ; IgG1:Fc-C<C15,W30> ; IgG1:HCnst-Po

where ASN position 15 of the Fc-C domain must be a cysteine and position 30 must be a tryptophan.
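
Residue requirements can be checked against the same kind of hypothetical position-to-residue mapping. The sketch accepts only the comma-separated 'C15'-style form shown in this example; any other form raises an error rather than being guessed at.

```python
import re

# One requirement: a one-letter amino acid code followed by an ASN position.
RESIDUE_AT = re.compile(r"([A-Z])(\d+)")


def satisfies_residues(residues: dict[int, str], constraint: str) -> bool:
    """Check 'C15,W30'-style requirements against an ASN position -> residue map."""
    for item in constraint.split(","):
        m = RESIDUE_AT.fullmatch(item.strip())
        if m is None:
            raise ValueError(f"unsupported constraint: {item!r}")
        aa, pos = m.group(1), int(m.group(2))
        if residues.get(pos) != aa:       # a missing position also fails
            return False
    return True
```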

6. Square brackets indicate which Fv a variable region belongs to.

Example 6: an IgG1 CrossMab CH1-CL Fab Heavy Chain is specified as:

{Ldr} ; HV[3] ; IgG1:HCnst-Ig ; Unk<EPKSCD> ; Lnk ; HV[2] ; Unk<A> ; KCnst/LmdCnst ; IgG1:Hinge<!1-107> ; IgG1:Fc-N ; IgG1:Fc-C<C15,W30> ; IgG1:HCnst-Po
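
Because the Fv number sits in square brackets directly after the variable-region abbreviation, it can be extracted with a regular expression. This hypothetical helper returns (region, Fv number) pairs in specification order; untagged variable regions are skipped, since this page does not state their default Fv number.

```python
import re

# A region abbreviation immediately followed by "[n]".
FV_TAG = re.compile(r"([^\s{}()|;\[\]]+)\[(\d+)\]")


def fv_assignments(spec: str) -> list[tuple[str, int]]:
    """Return (region, Fv number) for every '[n]'-tagged region, in spec order."""
    return [(m.group(1), int(m.group(2))) for m in FV_TAG.finditer(spec)]
```

Applied to the CrossMab heavy chain specification in this example, it returns [("HV", 3), ("HV", 2)].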

7. Finally, the special character '⊃' is used to specify a particular region subtype. Currently, this is used only to indicate a VHH as a subtype of VH.

Example 7: an IgG1 HCab Chain is specified as:

{Ldr} ; HV⊃VHH ; IgG1:Hinge ; IgG1:Fc-N ; IgG1:Fc-C ; IgG1:HCnst-Po
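
A subtype term can be split on the '⊃' character (U+2283). The matching rule in this sketch is an assumption, not documented behavior: a term that names a subtype accepts only that subtype, while a plain term such as HV accepts any subtype of its region. Both function names are hypothetical.

```python
from typing import Optional

SUBTYPE_SEP = "\u2283"  # the '⊃' character


def parse_subtype(term: str) -> tuple[str, Optional[str]]:
    """Split 'HV⊃VHH' into ('HV', 'VHH'); a plain 'HV' gives ('HV', None)."""
    region, sep, subtype = term.partition(SUBTYPE_SEP)
    return region, (subtype if sep else None)


def term_accepts(term: str, region: str, subtype: Optional[str]) -> bool:
    """Hypothetical rule for whether a spec term accepts an observed region."""
    want_region, want_subtype = parse_subtype(term)
    if want_region != region:
        return False
    # Assumption: a term without '⊃' accepts any subtype of its region.
    return want_subtype is None or want_subtype == subtype
```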

Structure Formats

Structure formats are stored in the StructureFormats List, where the only significant fields are the name and abbreviation. The more important table is ChainStructureJunction, which describes the chain combinations that map to each structure format.

For example, an IgG1 has two chain combinations: 2 copies of a Kappa Light Chain plus 2 copies of an IgG1 Heavy Chain, or 2 copies of a Lambda Light Chain plus 2 copies of an IgG1 Heavy Chain. In ChainStructureJunction, this is represented as follows:

Structure Format	Chain Format	Combination	Stoichiometry	Num Distinct*
IgG1	Kappa Light Chain	1	2	1
IgG1	IgG1 Heavy Chain	1	2	1
IgG1	Lambda Light Chain	2	2	1
IgG1	IgG1 Heavy Chain	2	2	1

*The Num Distinct column indicates the number of sequence-distinct copies of that chain format. It is normally 1, but a few unusual formats, such as a Trioma or Quadroma bispecific IgG1, may have a Num Distinct value of 2.
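
To make the table concrete, here is a sketch of matching a Molecule's chains against ChainStructureJunction rows. Within one combination, each chain format must appear with exactly Stoichiometry copies and Num Distinct distinct sequences, and (as an assumption) no other chain formats may be present; only a unique structure match is returned. The row type, the (chain format, sequence) input, and the placeholder sequences are hypothetical; the rows mirror the IgG1 example above.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class JunctionRow(NamedTuple):
    structure: str
    chain_format: str
    combination: int
    stoichiometry: int
    num_distinct: int


IGG1_ROWS = [
    JunctionRow("IgG1", "Kappa Light Chain", 1, 2, 1),
    JunctionRow("IgG1", "IgG1 Heavy Chain", 1, 2, 1),
    JunctionRow("IgG1", "Lambda Light Chain", 2, 2, 1),
    JunctionRow("IgG1", "IgG1 Heavy Chain", 2, 2, 1),
]


def match_structure(rows: list[JunctionRow], chains: list[tuple[str, str]]) -> Optional[str]:
    """chains: one (chain format, sequence) entry per chain copy in the Molecule."""
    copies = defaultdict(list)
    for chain_format, sequence in chains:
        copies[chain_format].append(sequence)

    combos = defaultdict(list)              # group rows by (structure, combination)
    for row in rows:
        combos[(row.structure, row.combination)].append(row)

    matched = set()
    for (structure, _), combo in combos.items():
        if {r.chain_format for r in combo} != set(copies):
            continue                        # assumption: chain formats must match exactly
        if all(len(copies[r.chain_format]) == r.stoichiometry
               and len(set(copies[r.chain_format])) == r.num_distinct
               for r in combo):
            matched.add(structure)
    return matched.pop() if len(matched) == 1 else None
```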

The ChainStructureJunction table also has an Fv Num Override column for the rare situations where, because of how the chains are combined, the default Fv numbers in the chain format specification need to be overridden.

For example, in a CrossMab CH1-CL Fab, an Fv Num Override value of '1#1/1#3' indicates that for one copy of the Kappa chain the 1st V-region is part of Fv#1, and for the other copy the 1st V-region is part of Fv#3.
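
The override value can be parsed per chain copy. The grammar here is inferred from the single example above and may not cover every case: '/' separates copies, and 'v#f' assigns the v-th V-region of that copy to Fv #f. The function name is hypothetical.

```python
def parse_fv_override(value: str) -> list[dict[int, int]]:
    """Parse e.g. '1#1/1#3' into one {V-region index: Fv number} dict per chain copy.

    Only the single 'v#f' pair per copy seen in the documented example is handled;
    anything else raises ValueError.
    """
    per_copy = []
    for part in value.split("/"):
        v_index, fv = part.split("#")
        per_copy.append({int(v_index): int(fv)})
    return per_copy
```

For '1#1/1#3' this yields [{1: 1}, {1: 3}]: the first V-region of one Kappa chain copy belongs to Fv#1, and that of the other copy to Fv#3.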