Data Classes

2024-03-28

Data Classes represent virtual entity definitions that administrators can define and customize. Members of a Data Class can be connected to physical samples using LabKey's existing experiment framework. In Sample Manager they are represented as "Sources" and in Biologics as "Registry Sources"; both support lineage relationships with samples.

For example, an administrator might define Data Classes to represent entities such as Protein Sequences, Nucleotide Sequences, Cell Lines, and Expression Systems. A Sample Type may be derived from an Expression System Data Class, which in turn is derived from Cell Line and Vector Data Classes. See LabKey Data Structures.

Data Class Lineage & Derivation

Data Classes support the concept of parentage/lineage. When importing data into a Sample Type, indicate a Data Class parent by providing a column named "DataInputs/<NameOfDataClass>", where <NameOfDataClass> is the name of a Data Class. Values entered in this column identify the parents the sample is derived from. You can enter multiple parent values separated by commas. For example, to indicate that sample-1 has three parents, two in DataClassA and one in DataClassB, import the following:

Name        DataInputs/DataClassA        DataInputs/DataClassB
sample-1    data-parent1,data-parent2    data-parent3
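The same import can be sketched as row data for the JavaScript client API. This is a minimal sketch rather than a definitive recipe: the helper `buildSampleRow` is hypothetical, and the `samples` schema and `MySampleType` query names in the commented-out call are assumptions that would need to match your own Sample Type.

```javascript
// Hypothetical helper: builds one import row for a Sample Type.
// `parentsByDataClass` maps a Data Class name to an array of parent names.
function buildSampleRow(name, parentsByDataClass) {
    const row = { Name: name };
    for (const [dataClass, parents] of Object.entries(parentsByDataClass)) {
        // Multiple parents in the same Data Class are joined with commas.
        row["DataInputs/" + dataClass] = parents.join(",");
    }
    return row;
}

const rows = [
    buildSampleRow("sample-1", {
        DataClassA: ["data-parent1", "data-parent2"],
        DataClassB: ["data-parent3"]
    })
];

// On a LabKey page, where the LABKEY global is available, the rows could then
// be submitted with a call like this (schema and query names are assumptions):
// LABKEY.Query.insertRows({ schemaName: "samples", queryName: "MySampleType", rows: rows });
```

Keeping the row construction separate from the API call makes it possible to inspect the column names and comma-separated values before anything is sent to the server.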

Data Classes can be linked to one another by parentage using the same syntax. For example, a parent protein may produce many child proteins through some bio-engineering process. Use Data Classes to capture both the parent protein and its child proteins.

Name         DataInputs/DataClassA        DataInputs/DataClassB
protein-1    data-parent1,data-parent2    data-parent3
protein-2    protein-1
protein-3    protein-1
protein-4    protein-1
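To make the resulting relationships concrete, the sketch below derives a parent-to-children lookup from rows shaped like the protein example. It is an illustration only: `childrenByParent` is a hypothetical helper, the rows assume that protein-2 through protein-4 list protein-1 in the DataInputs/DataClassA column, and the lookup ignores which Data Class a parent belongs to.

```javascript
// Rows shaped like the protein example (column placement is an assumption).
const proteinRows = [
    {
        Name: "protein-1",
        "DataInputs/DataClassA": "data-parent1,data-parent2",
        "DataInputs/DataClassB": "data-parent3"
    },
    { Name: "protein-2", "DataInputs/DataClassA": "protein-1" },
    { Name: "protein-3", "DataInputs/DataClassA": "protein-1" },
    { Name: "protein-4", "DataInputs/DataClassA": "protein-1" }
];

// Hypothetical helper: maps each parent name to the names of its direct children.
function childrenByParent(rows) {
    const result = new Map();
    for (const row of rows) {
        for (const [column, value] of Object.entries(row)) {
            // Only DataInputs/<NameOfDataClass> columns carry parent names.
            if (!column.startsWith("DataInputs/") || !value) continue;
            const parents = value.split(",").map((s) => s.trim()).filter(Boolean);
            for (const parent of parents) {
                if (!result.has(parent)) result.set(parent, []);
                result.get(parent).push(row.Name);
            }
        }
    }
    return result;
}

const children = childrenByParent(proteinRows);
console.log(children.get("protein-1").join(", ")); // protein-2, protein-3, protein-4
```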

Learn more about Sample Type and Data Class parentage relationships in this topic:

Learn more about specifying lineage during bulk registration of entities in LabKey Biologics in this topic:

Data Class Member Naming

Choose one of these ways to assign names to the members of your Data Class:

  1. Include a Name column in your uploaded data to provide a unique name for each row.
  2. Provide a Naming Pattern when a Data Class is created. This will generate names for you.

Naming Patterns

The naming pattern can be concatenated from (1) fixed strings, (2) an auto-incrementing integer indicated by ${genid}, and (3) values from other columns in the Data Class. The following example is concatenated from three parts: "FOO" (a fixed string value), "${genid}" (an auto-incrementing integer), and "${barcode}" (the value from the barcode column), joined by hyphens.

FOO-${genid}-${barcode}

Use naming patterns to generate a descriptive ID for each row that is guaranteed to be unique.
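The substitution described above can be illustrated with a small stand-alone function. This is a sketch of the idea, not LabKey's implementation: `expandNamingPattern` is a hypothetical helper, the barcode values are made up, and the server, not client code, decides the actual ${genid} values.

```javascript
// Hypothetical helper: replaces ${genid} with a counter value and any other
// ${column} token with the value of that column in the row.
function expandNamingPattern(pattern, row, genId) {
    return pattern.replace(/\$\{(\w+)\}/g, (token, key) => {
        if (key === "genid") return String(genId);
        if (!(key in row)) throw new Error("No value for column: " + key);
        return String(row[key]);
    });
}

const pattern = "FOO-${genid}-${barcode}";
const inputRows = [{ barcode: "A123" }, { barcode: "B456" }]; // example values
const names = inputRows.map((row, i) => expandNamingPattern(pattern, row, i + 1));
console.log(names.join(", ")); // FOO-1-A123, FOO-2-B456
```

Because the counter differs for every row, the generated names stay distinct even when two rows share the same barcode.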

Learn more about naming patterns in this topic: Sample Naming Patterns

Aliases

You can specify alias names for members of a Data Class. On import, you can select one or more alias names. These aliases are intended to be used as "friendly" names, or tags; they aren't intended to be an alternative set of unique names. You can import a set of available aliases only via the client API. No graphical UI is currently available.

LABKEY.Query.insertRows({
    schemaName: "exp.data",
    queryName: "myQuery",
    rows: [{
        barcode: "barcodenum",
        // Aliases are friendly tags for this member, not unique names.
        alias: ["a", "b", "c", "d"]
    }]
});

Deletion Prevention

Data Class members (such as bioregistry entities) that have data dependencies or references cannot be deleted. This protects the integrity of your data by retaining the origins of derived data for future reference. Data Class members cannot be deleted if they:

  • Are 'parents' of derived samples or other entities
  • Have assay runs associated with them
  • Are included in workflow jobs
  • Are referenced by Electronic Lab Notebooks

Related Topics