Table of Contents

     Modules: Assay Types
       Tutorial: Define an Assay Type in a Module
       Assay Custom Domains
       Assay Custom Views
       Loading Custom Views
       Example Assay JavaScript Objects
       Assay Query Metadata
       Customize Batch Save Behavior
       SQL Scripts for Module-Based Assays
       Transformation Scripts
         Example Workflow: Develop a Transformation Script (perl)
         Example Transformation Scripts (perl)
         Transformation Scripts in R
         Transformation Scripts in Java
         Transformation Scripts for Module-based Assays
         Run Properties Reference
         Transformation Script Substitution Syntax
         Warnings in Transformation Scripts

Modules: Assay Types


Module-based assays allow a developer to create a new assay type with a custom schema and custom views without becoming a Java developer. A module-based assay type consists of an assay config file, a set of domain descriptions, and view html files. The assay is added to a module by placing it in an assay directory at the top-level of the module. When the module is enabled in a folder, assay designs can be created based on the type defined in the module. For information on the applicable API, see: LABKEY.Experiment#saveBatch.
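As a quick orientation to that API, here is a minimal sketch (not taken from a shipping module) of saving one run through LABKEY.Experiment.saveBatch. The assay id is an illustrative placeholder; the SampleId, TimePoint, and DoubleData result fields match the example domain used throughout this topic:

LABKEY.Experiment.saveBatch({
    assayId: 4,    // id of an assay design based on the module-defined type (illustrative)
    batch: {
        properties: {},    // batch-level properties defined by the batch domain
        runs: [{
            name: 'Run 1',
            properties: { DoubleRun: null },    // run-level properties
            dataRows: [
                { SampleId: 'Monkey 1', TimePoint: '1 Nov 2008 11:22:33 -0700', DoubleData: 3.2 }
            ]
        }]
    },
    success: function (batch) { console.log('Saved batch ' + batch.id); },
    failure: function (error) { console.error(error.exception); }
});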

Examples: Module-Based Assays

There are a handful of module-based assays in the LabKey SVN tree. You can find the modules in <LABKEY_HOME>/server/customModules. Examples include:

  • <LABKEY_HOME>/server/customModules/exampleassay/resources/assay
  • <LABKEY_HOME>/server/customModules/iaviElisa/elisa/assay/elisa
  • <LABKEY_HOME>/server/customModules/idri/resources/assay/particleSize

File Structure

The assay consists of an assay config file, a set of domain descriptions, and view html files. The assay is added to a module by placing it in an assay directory at the top-level of the module. The assay has the following file structure:

<module-name>/
    assay/
        ASSAY_NAME/
            config.xml                  (example)
            domains/                    (example)
                batch.xml
                run.xml
                result.xml
            views/                      (example)
                begin.html
                upload.html
                batches.html
                batch.html
                runs.html
                run.html
                results.html
                result.html
            queries/                    (example)
                Batches.query.xml
                Run.query.xml
                Data.query.xml
                CUSTOM_ASSAY_QUERY.query.xml
                CUSTOM_ASSAY_QUERY.sql  (a query that shows up in the schema for all assay designs of this provider type)
                CUSTOM_ASSAY_QUERY/
                    CUSTOM_VIEW.qview.xml
            scripts/
                script1.R
                script2.pl

The only required part of the assay is the <assay-name> directory. The config.xml, domain files, and view files are all optional.

[Diagram omitted: the relationship between the assay view pages.] The details link will only appear if the corresponding details html view is available.

How to Specify an Assay "Begin" Page

Module-based assays can be designed to jump to a "begin" page instead of a "runs" page. If an assay has a begin.html in the assay/<name>/views/ directory, users are directed to this page instead of the runs page when they click on the name of the assay in the assay list.
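As a minimal sketch (assuming only the documented LABKEY.page.assay object described in Example Assay JavaScript Objects), a begin.html view could be as simple as:

<div id="assayOverview_div"></div>

<script type="text/javascript">
    // LABKEY.page.assay is injected into every html view for the assay.
    if (LABKEY.page.assay)
    {
        var el = document.getElementById('assayOverview_div');
        el.innerHTML = LABKEY.Utils.encodeHtml('Assay: ' + LABKEY.page.assay.name);
    }
</script>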




Tutorial: Define an Assay Type in a Module


Module-based assays provide a quick and easy method for defining new assay types beyond the types of assays already built into LabKey Server.

To create a module-based assay, you create a set of files that define the new assay design, describe the data import process, and define various types of assay views. The new assay type is incorporated into your server when you package these files as a module and restart the server. The new type of assay is then available on your server as the basis for new assay designs, in the same way that built-in assay types (e.g., Luminex) are available.

This tutorial explains how to incorporate a ready-made, module-based assay into your LabKey Server and make use of the new type of assay. It does not cover creation of the files that compose a module-based assay; see the related topics that follow this tutorial (such as Assay Custom Domains and Assay Custom Views) for instructions on how to create such files.

Download

First download a pre-packaged .module file and deploy it to LabKey Server.

  • Download exampleassay.module. (This is a renamed .zip archive that contains the source files for the assay module.)

Add the Module to your LabKey Server Installation

  • On a local build of LabKey Server, copy exampleassay.module to a module deployment directory, such as <LABKEY_HOME>\build\deploy\modules\
    • Or
  • On a local install of LabKey Server, copy exampleassay.module to this location: <LABKEY_HOME>\externalModules\
  • Restart your server. The server will extract (explode) the .module file into a directory of the same name.
  • Examine the files in the exploded directory. You will see the following structure:
exampleassay
└───assay
    └───example
        │   config.xml
        │
        ├───domains
        │       batch.xml
        │       result.xml
        │       run.xml
        │
        └───views
                upload.html
  • upload.html contains the UI that the user will see when importing data to this type of assay.
  • batch.xml, result.xml, and run.xml provide the assay's design, i.e., the names of the fields, their data types, whether they are required fields, etc.

Enable the Module in a Folder

The assay module is now available through the UI. Here we enable the module in a folder.

  • Create or select a folder to enable the module in, for example, a subfolder in the Home project.
  • Select Admin > Folder > Management and then click the Folder Type tab.
  • Place a checkmark next to the exampleassay module (under the "Modules" column on the right).
  • Click the Update Folder button.

Use the Module's Assay Design

Next we create a new assay design based on the type defined in the module.

  • Select Admin > Manage Assays.
  • On the Assay List page, click New Assay Design.
  • Select LabKey Example and click Next.
  • Name this assay "FileBasedAssay"
  • Leave all other fields at default values and click Save and Close.

Import Data to the Assay Design

  • Download the two sample assay data files: GenericAssay_Run1.xls and GenericAssay_Run2.xls.
  • Click on the new FileBasedAssay in the Assay List.
  • Click the Import Data button.
  • Enter a value for Batch Name, for example, "Batch 1"
  • Click Add Excel File and select GenericAssay_Run1.xls. (Wait a few seconds for the file to upload.)
  • Notice that the Created and Modified fields are filled in automatically, as specified in the module-based assay's upload.html file.
  • Click Import Data and repeat the import process for GenericAssay_Run2.xls.
  • Click Done.

Review Imported Data

  • Click on the first run (GenericAssay_Run1.xls) to see the data it contains.
  • You can now integrate this data into any available target studies.





Assay Custom Domains


A domain is a collection of fields under a data type. Each data type (e.g., Assays, Lists, Datasets, etc.) provides specialized handling for the domains it defines. Assays define multiple domains (batch, run, etc.), while Lists and Datasets define only one domain each.

An assay module can define a custom domain to replace LabKey's built-in default assay domains, by adding a schema definition in the domains/ directory. For example:

assay/<assay-name>/domains/<domain-name>.xml

The name of the assay is taken from the <assay-name> directory. The <domain-name>.xml file contains the domain definition and conforms to the <domain> element from assayProvider.xsd, which is in turn a DomainDescriptorType from the expTypes.xsd XML schema. There are three built-in domains for assays: "batch", "run", and "result". The following result domain replaces the built-in result domain for assays:

result.xml

<ap:domain xmlns:exp="http://cpas.fhcrc.org/exp/xml"
           xmlns:ap="http://labkey.org/study/assay/xml">
    <exp:Description>This is my data domain.</exp:Description>
    <exp:PropertyDescriptor>
        <exp:Name>SampleId</exp:Name>
        <exp:Description>The Sample Id</exp:Description>
        <exp:Required>true</exp:Required>
        <exp:RangeURI>http://www.w3.org/2001/XMLSchema#string</exp:RangeURI>
        <exp:Label>Sample Id</exp:Label>
    </exp:PropertyDescriptor>
    <exp:PropertyDescriptor>
        <exp:Name>TimePoint</exp:Name>
        <exp:Required>true</exp:Required>
        <exp:RangeURI>http://www.w3.org/2001/XMLSchema#dateTime</exp:RangeURI>
    </exp:PropertyDescriptor>
    <exp:PropertyDescriptor>
        <exp:Name>DoubleData</exp:Name>
        <exp:RangeURI>http://www.w3.org/2001/XMLSchema#double</exp:RangeURI>
    </exp:PropertyDescriptor>
</ap:domain>

To deploy the module, the assay directory is zipped up as a <module-name>.module file and copied to the LabKey server's modules directory.

When you create a new assay design for that assay type, it will use the fields defined in the XML domain as a template for the corresponding domain. Changes to the domains in the XML files will not affect existing assay designs that have already been created.




Assay Custom Views


Add a Custom Details View

Suppose you want to add a [details] link to each row of an assay run table, that takes you to a custom details view for that row. You can add new views to the module-based assay by adding html files in the views/ directory, for example:

assay/<assay-name>/views/<view-name>.html

The overall page template will include JavaScript objects as context so that they're available within the view, avoiding an extra client API request to fetch them from the server. For example, the result.html page can access the assay definition and result data as LABKEY.page.assay and LABKEY.page.result, respectively. Here is an example custom details view named result.html:

1 <table>
2 <tr>
3 <td class='labkey-form-label'>Sample Id</td>
4 <td><div id='SampleId_div'>???</div></td>
5 </tr>
6 <tr>
7 <td class='labkey-form-label'>Time Point</td>
8 <td><div id='TimePoint_div'>???</div></td>
9 </tr>
10 <tr>
11 <td class='labkey-form-label'>Double Data</td>
12 <td><div id='DoubleData_div'>???</div></td>
13 </tr>
14 </table>
15
16 <script type="text/javascript">
17 function setValue(row, property)
18 {
19 var div = Ext.get(property + "_div");
20 var value = row[property];
21 if (!value)
22 value = "<none>";
23 div.dom.innerHTML = value;
24 }
25
26 if (LABKEY.page.result)
27 {
28 var row = LABKEY.page.result;
29 setValue(row, "SampleId");
30 setValue(row, "TimePoint");
31 setValue(row, "DoubleData");
32 }
33 </script>

Note on line 28 the details view is accessing the result data from LABKEY.page.result. See Example Assay JavaScript Objects for a description of the LABKEY.page.assay and LABKEY.page.result objects.

Add a custom view for a run

This works the same as the custom details page for row data, except that the view file is named run.html and the run data is available as the LABKEY.page.run variable. See Example Assay JavaScript Objects for a description of the LABKEY.page.run object.
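For example, a minimal run.html might summarize the run using only documented properties of the LABKEY.page.run object (a sketch, not part of the example module):

<div id="runSummary_div"></div>

<script type="text/javascript">
    // Summarize this run from the injected LABKEY.page.run object.
    if (LABKEY.page.run)
    {
        var run = LABKEY.page.run;
        document.getElementById('runSummary_div').innerHTML =
            LABKEY.Utils.encodeHtml(run.name + ': ' + run.dataRows.length + ' result rows');
    }
</script>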

Add a custom view for a batch

This works the same as the custom details page for row data, except that the view file is named batch.html and the batch data is available as the LABKEY.page.batch variable. See Example Assay JavaScript Objects for a description of the LABKEY.page.batch object.





Loading Custom Views


Module-based custom views should be loaded based on their association with the target query name. In past releases, association by table title was also supported (the table title is found in the query's metadata.xml file). Using the table title to bind a custom view to a query is obsolete, and searching for these table titles has a significant negative effect on performance when rendering a grid or dataview web part. Support for this legacy technique will be removed in LabKey Server version 17.3.

Disable Loading by Table Title

In LabKey Server v17.2, an administrator can find and fix these table title references by proactively disabling the "alwaysUseTitlesForLoadingCustomViews" flag using the experimental feature described below. If you want to improve performance, removing reliance on this feature will help.

  • Select (Admin) > Site > Admin Console.
  • Click the Admin Console Links tab.
  • Under Configuration, click Experimental Features
  • Click Enable for Remove support for loading of Custom Views by Table Title.

Custom views loading by table title will now generate a warning enabling you to find and fix them.

Loading Custom Views

The correct way to attach a custom view to a table is to bind via the query name. For instance, if you have a query in the elispot module called QueryName (whose metadata defines the table title TableTitle), and your custom view is called MyView, you would place the xml file here:

./resources/assay/elispot/queries/QueryName/MyView.qview.xml

Fixing Legacy Views

With the "alwaysUseTitlesForLoadingCustomViews" flag set, you would also have been able to load the above example view by binding it to the table name, i.e.:

./resources/assay/elispot/queries/TableTitle/MyView.qview.xml

In version 17.3, this flag will be removed. To fix legacy views and remove reliance on the flag, use the experimental feature described above to disable it in version 17.2, then modify any module-based custom views to reference the query name directly.




Example Assay JavaScript Objects


These JavaScript objects are automatically injected into the rendered page (example page: result.html), saving developers from needing to make a separate client API request to fetch them from the server.

LABKEY.page.assay:

The assay definition is available as LABKEY.page.assay for all of the html views. It is a JavaScript object, which is of type LABKEY.Assay.AssayDesign:

LABKEY.page.assay = {
    "id": 4,
    "projectLevel": true,
    "description": null,
    "name": <assay name>,
    // domain objects: one each for batch, run, and result.
    "domains": {
        // array of domain property objects for the batch domain
        "<assay name> Batch Fields": [
            {
                "typeName": "String",
                "formatString": null,
                "description": null,
                "name": "ParticipantVisitResolver",
                "label": "Participant Visit Resolver",
                "required": true,
                "typeURI": "http://www.w3.org/2001/XMLSchema#string"
            },
            {
                "typeName": "String",
                "formatString": null,
                "lookupQuery": "Study",
                "lookupContainer": null,
                "description": null,
                "name": "TargetStudy",
                "label": "Target Study",
                "required": false,
                "lookupSchema": "study",
                "typeURI": "http://www.w3.org/2001/XMLSchema#string"
            }
        ],
        // array of domain property objects for the run domain
        "<assay name> Run Fields": [{
            "typeName": "Double",
            "formatString": null,
            "description": null,
            "name": "DoubleRun",
            "label": null,
            "required": false,
            "typeURI": "http://www.w3.org/2001/XMLSchema#double"
        }],
        // array of domain property objects for the result domain
        "<assay name> Result Fields": [
            {
                "typeName": "String",
                "formatString": null,
                "description": "The Sample Id",
                "name": "SampleId",
                "label": "Sample Id",
                "required": true,
                "typeURI": "http://www.w3.org/2001/XMLSchema#string"
            },
            {
                "typeName": "DateTime",
                "formatString": null,
                "description": null,
                "name": "TimePoint",
                "label": null,
                "required": true,
                "typeURI": "http://www.w3.org/2001/XMLSchema#dateTime"
            },
            {
                "typeName": "Double",
                "formatString": null,
                "description": null,
                "name": "DoubleData",
                "label": null,
                "required": false,
                "typeURI": "http://www.w3.org/2001/XMLSchema#double"
            }
        ]
    },
    "type": "Simple"
};

LABKEY.page.batch:

The batch object is available as LABKEY.page.batch on the upload.html and batch.html pages. The JavaScript object is an instance of LABKEY.Exp.RunGroup and is shaped like:

LABKEY.page.batch = new LABKEY.Exp.RunGroup({
    "id": 8,
    "createdBy": <user name>,
    "created": "8 Apr 2009 12:53:46 -0700",
    "modifiedBy": <user name>,
    "name": <name of the batch object>,
    "runs": [
        // array of LABKEY.Exp.Run objects in the batch. See next section.
    ],
    // map of batch properties
    "properties": {
        "ParticipantVisitResolver": null,
        "TargetStudy": null
    },
    "comment": null,
    "modified": "8 Apr 2009 12:53:46 -0700",
    "lsid": "urn:lsid:labkey.com:Experiment.Folder-5:2009-04-08+batch+2"
});

LABKEY.page.run:

The run detail object is available as LABKEY.page.run on the run.html pages. The JavaScript object is an instance of LABKEY.Exp.Run and is shaped like:

LABKEY.page.run = new LABKEY.Exp.Run({
    "id": 4,
    // array of LABKEY.Exp.Data objects added to the run
    "dataInputs": [{
        "id": 4,
        "created": "8 Apr 2009 12:53:46 -0700",
        "name": "run01.tsv",
        "dataFileURL": "file:/C:/Temp/assaydata/run01.tsv",
        "modified": null,
        "lsid": <filled in by the server>
    }],
    // array of objects, one for each row in the result domain
    "dataRows": [
        {
            "DoubleData": 3.2,
            "SampleId": "Monkey 1",
            "TimePoint": "1 Nov 2008 11:22:33 -0700"
        },
        {
            "DoubleData": 2.2,
            "SampleId": "Monkey 2",
            "TimePoint": "1 Nov 2008 14:00:01 -0700"
        },
        {
            "DoubleData": 1.2,
            "SampleId": "Monkey 3",
            "TimePoint": "1 Nov 2008 14:00:01 -0700"
        },
        {
            "DoubleData": 1.2,
            "SampleId": "Monkey 4",
            "TimePoint": "1 Nov 2008 00:00:00 -0700"
        }
    ],
    "createdBy": <user name>,
    "created": "8 Apr 2009 12:53:47 -0700",
    "modifiedBy": <user name>,
    "name": <name of the run>,
    // map of run properties
    "properties": {"DoubleRun": null},
    "comment": null,
    "modified": "8 Apr 2009 12:53:47 -0700",
    "lsid": "urn:lsid:labkey.com:SimpleRun.Folder-5:cf1fea1d-06a3-102c-8680-2dc22b3b435f"
});

LABKEY.page.result:

The result detail object is available as LABKEY.page.result on the result.html page. The JavaScript object is a map for a single row and is shaped like:

LABKEY.page.result = {
    "DoubleData": 3.2,
    "SampleId": "Monkey 1",
    "TimePoint": "1 Nov 2008 11:22:33 -0700"
};



Assay Query Metadata


Query Metadata for Assay Tables

You can associate query metadata with an individual assay design, or all assay designs that are based on the same type of assay (e.g., "NAb" or "Viability").

Example. Assay table names are based upon the name of the assay design. For example, consider an assay design named "Example" that is based on the "Viability" assay type. This design would be associated with three tables in the schema explorer: "Example Batches", "Example Runs", and "Example Data."

Associate metadata with a single assay design. To attach query metadata to the "Example Data" table, you would normally create a /queries/assay/Example Data.query.xml metadata file. This would work well for the "Example Data" table itself. However, this method would not allow you to re-use this metadata file for a new assay design that is also based on the same assay type ("Viability" in this case).

Associate metadata with all assay designs based on a particular assay type. To permit re-use of the metadata, you need to create a query metadata file whose name is based upon the assay type and table name. To continue our example, you would create a query metadata file called /assay/Viability/queries/Data.query.xml to attach query metadata to all data tables based on the Viability-type assay.

As with other query metadata in module files, the module must be activated (in other words, the appropriate checkbox must be checked) in the folder's settings.

See Modules: Queries, Views and Reports and Modules: Query Metadata for more information on query metadata.




Customize Batch Save Behavior


You can enable file-based assays to customize their own Experiment.saveBatch behavior by writing Java code that implements the AssaySaveHandler interface. This allows you to customize saving your batch without having to convert your existing file-based assay UI code, queries, views, etc. into a Java-based assay.

The AssaySaveHandler interface enables file-based assays to extend the functionality of the SaveAssayBatch action with Java code. A file-based assay can provide an implementation of this interface by creating a Java-based module and then putting the class under the module's src directory. This class can then be referenced by name in the <saveHandler/> element in the assay's config file. For example, an entry might look like:

<saveHandler>org.labkey.icemr.assay.tracking.TrackingSaveHandler</saveHandler>

To implement this functionality:

  • Create the skeleton framework for a Java module. This consists of a controller class, manager, etc. See Creating a New Java Module for details on autogenerating the boilerplate Java code.
  • Add an assay directory underneath the Java src directory that corresponds to the file-based assay you want to extend. For example: myModule/src/org.labkey.mymodule/assay/tracking
  • Implement the AssaySaveHandler interface. You can either implement the interface from scratch or extend the default behavior by inheriting from the DefaultAssaySaveHandler class. If you want complete control over the JSON format of the experiment data you save, implement the AssaySaveHandler interface entirely. If you want to follow the pre-defined LabKey experiment JSON format, inherit from DefaultAssaySaveHandler and override only the specific piece you want to customize; for example, you may want custom code to run when a specific property is saved. (See below for more implementation details.)
  • Reference your class in the assay's config.xml file. For example, notice the <ap:saveHandler/> entry below. If a non-fully-qualified name is used (as below) then LabKey Server will attempt to find this class under org.labkey.[module name].assay.[assay name].[save handler name].
<ap:provider xmlns:ap="http://labkey.org/study/assay/xml">
    <ap:name>Flask Tracking</ap:name>
    <ap:description>
        Enables entry of a set of initial samples and then tracks
        their progress over time via a series of daily measurements.
    </ap:description>
    <ap:saveHandler>TrackingSaveHandler</ap:saveHandler>
    <ap:fieldKeys>
        <ap:participantId>Run/PatientId</ap:participantId>
        <ap:date>MeasurementDate</ap:date>
    </ap:fieldKeys>
</ap:provider>
  • The interface methods are invoked when the user imports data into the assay or otherwise calls the SaveAssayBatch action, usually via the Experiment.saveBatch JavaScript API (see the sketch below). On the server, the file-based assay provider will look for an AssaySaveHandler specified in the config.xml and invoke its functions. If no AssaySaveHandler is specified, the DefaultAssaySaveHandler implementation is used.
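For orientation, here is a sketch of the kind of client-side call that would trigger SaveAssayBatch, and with it the TrackingSaveHandler configured above. The assay id, run name, and field values are illustrative placeholders, not part of the actual icemr module:

LABKEY.Experiment.saveBatch({
    assayId: 12,    // id of an assay design based on this provider (illustrative)
    batch: {
        properties: {},
        runs: [{
            name: 'Flask run 1',
            properties: { PatientId: 'PT-100' },    // hypothetical run property
            dataRows: [{ MeasurementDate: '2013-06-01' }]
        }]
    },
    success: function (batch) { console.log('Batch saved with id ' + batch.id); }
});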

SaveAssayBatch Details

The SaveAssayBatch function creates a new instance of the SaveHandler for each request. SaveAssayBatch will dispatch to the methods of this interface according to the format of the JSON Experiment Batch (or run group) sent to it by the client. If a client chooses to implement this interface directly then the order of method calls will be:

  • beforeSave
  • handleBatch
  • afterSave

A client can also inherit from the DefaultAssaySaveHandler class to get a default implementation. In this case, the default handler does a deep walk through all the runs in a batch, including their inputs, outputs, materials, and properties. The sequence of calls for DefaultAssaySaveHandler is:

  • beforeSave
  • handleBatch
  • handleProperties (for the batch)
  • handleRun (for each run)
  • handleProperties (for the run)
  • handleProtocolApplications
  • handleData (for each data output)
  • handleProperties (for the data)
  • handleMaterial (for each input material)
  • handleProperties (for the material)
  • handleMaterial (for each output material)
  • handleProperties (for the material)
  • afterSave

Because LabKey Server creates a new instance of the specified SaveHandler for each request, your implementation can preserve instance state across interface method calls within a single request, but not across requests.





SQL Scripts for Module-Based Assays


How do you add supporting tables to your assay type? For example, suppose you want to add a table of Reagents, which your assay domain refers to via a lookup/foreign key?

Some options:

1) Manually import a list archive into the target folder.

2) Add the tables via SQL scripts included in the module. To insert data, use SQL DML scripts or create an initialize.html view that populates the table using LABKEY.Query.insertRows() (sketched below).
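A minimal sketch of such an initialize.html call, assuming the "myreagents" schema and Reagents table created by the SQL script later in this topic:

LABKEY.Query.insertRows({
    schemaName: 'myreagents',
    queryName: 'Reagents',
    rows: [
        { ReagentName: 'Acetic Acid' },
        { ReagentName: 'Baeyers Reagent' }
    ],
    success: function (data) { console.log('Inserted ' + data.rowsAffected + ' rows'); },
    failure: function (error) { console.error(error.exception); }
});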

To add the supporting table using SQL scripts, add a schemas directory, as a sibling to the assay directory, as shown below.

exampleassay
├───assay
│   └───example
│       │   config.xml
│       │
│       ├───domains
│       │       batch.xml
│       │       result.xml
│       │       run.xml
│       │
│       └───views
│               upload.html
│
└───schemas
    │   SCHEMA_NAME.xml
    │
    └───dbscripts
        ├───postgresql
        │       SCHEMA_NAME-X.XX-Y.YY.sql
        └───sqlserver
                SCHEMA_NAME-X.XX-Y.YY.sql

To support only one database, include a script only for that database, and configure your module properties accordingly -- see "SupportedDatabases" in Module Properties Reference.

LabKey Server does not currently support adding assay types or lists via SQL scripts, but you can create a new schema to hold the table. For example, the following script creates a new schema called "myreagents" (on PostgreSQL):

DROP SCHEMA IF EXISTS myreagents CASCADE;

CREATE SCHEMA myreagents;

CREATE TABLE myreagents.Reagents
(
    RowId SERIAL NOT NULL,
    ReagentName VARCHAR(30) NOT NULL
);

ALTER TABLE ONLY myreagents.Reagents
ADD CONSTRAINT Reagents_pkey PRIMARY KEY (RowId);

INSERT INTO myreagents.Reagents (ReagentName) VALUES ('Acetic Acid');
INSERT INTO myreagents.Reagents (ReagentName) VALUES ('Baeyers Reagent');
INSERT INTO myreagents.Reagents (ReagentName) VALUES ('Carbon Disulfide');

Update the assay domain, adding a lookup/foreign key property to the Reagents table:

<exp:PropertyDescriptor>
    <exp:Name>Reagent</exp:Name>
    <exp:Required>false</exp:Required>
    <exp:RangeURI>http://www.w3.org/2001/XMLSchema#int</exp:RangeURI>
    <exp:Label>Reagent</exp:Label>
    <exp:FK>
        <exp:Schema>myreagents</exp:Schema>
        <exp:Query>Reagents</exp:Query>
    </exp:FK>
</exp:PropertyDescriptor>

If you'd like to allow admins to add/remove fields from the table, you can add an LSID column to your table and make it a foreign key to the exp.Object.ObjectUri column in the schema.xml file. This will allow you to define a domain for the table much like a list. The domain is per-folder so different containers may have different sets of fields.

For example, see customModules/reagent/resources/schemas/reagent.xml, which wires up the LSID lookup to the exp.Object.ObjectUri column:

<ns:column columnName="Lsid">
    <ns:datatype>lsidtype</ns:datatype>
    <ns:isReadOnly>true</ns:isReadOnly>
    <ns:isHidden>true</ns:isHidden>
    <ns:isUserEditable>false</ns:isUserEditable>
    <ns:isUnselectable>true</ns:isUnselectable>
    <ns:fk>
        <ns:fkColumnName>ObjectUri</ns:fkColumnName>
        <ns:fkTable>Object</ns:fkTable>
        <ns:fkDbSchema>exp</ns:fkDbSchema>
    </ns:fk>
</ns:column>

...and adds an "Edit Fields" button that opens the domain editor.

function editDomain(queryName)
{
    var url = LABKEY.ActionURL.buildURL("property", "editDomain", null, {
        domainKind: "ExtensibleTable",
        createOrEdit: true,
        schemaName: "myreagents",
        queryName: queryName
    });
    window.location = url;
}
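A view in the same module might expose this function with a simple button (a sketch; the "Reagents" query name matches the table created above):

<button onclick="editDomain('Reagents')">Edit Fields</button>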



Transformation Scripts


As part of validating and cleaning assay data, transformation scripts can be run at the time of assay data upload.

Any scripting language that can be invoked via the command line and has the ability to read/write files is supported, including:

  • Perl
  • Python
  • R
  • Java

Transform scripts can inspect an uploaded file and change the data or populate empty columns in the uploaded data. For example, you can calculate the contents of one column from data contained in other columns. A transformation script can also modify run- and batch-level properties. If validation only needs to be done for particular single field values, the simpler mechanism is to use a validator within the field properties for the column.

Transformation scripts (which are always attached to assay designs) are different from trigger scripts, which are attached to a dataset (database table or query).

Use Transformation Scripts

Each assay design can be associated with one or more validation or transformation scripts which are run in the order specified. The script file extension (.r, .pl, etc) identifies the script engine that will be used to run the transform script. For example: a script named test.pl will be run with the Perl scripting engine. Before you can run validation or transformation scripts, you must configure the necessary Scripting Engines.

This section describes the process of using a transformation script that has already been developed for your assay type. An example workflow for how to create an assay transformation script in perl can be found in Example Workflow: Develop a Transformation Script (perl).

To specify a transform script in an assay design, you enter the full path including the file extension.

  • Open the assay designer for a new assay, or edit an existing assay design.
  • Click Add Script.
  • Enter the full path to the script in the Transform Scripts field.
  • You may enter multiple scripts by clicking Add Script again.
  • Confirm that other Properties required by your assay type are correctly specified.
  • Click Save and Close.

When you import (or re-import) run data using this assay design, the script will be executed. When you are developing or debugging transform scripts, you can use the Save Script Data option to store the files generated by the server that are passed to the script. Once your script is working properly, uncheck this box to avoid unnecessarily cluttering your disk.

A few notes on usage:

  • Client API calls are not supported in transform scripts.
  • Columns populated by transform scripts must already exist in the assay definition.
  • Executed scripts show up in the experimental graph, providing a record that transformations and/or quality control scripts were run.
  • Transform scripts are run before field-level validators.
  • The script is invoked once per run upload.
  • Multiple scripts are invoked in the order they are listed in the assay design.

Note that non-programmatic quality control remains available -- assay designs can be configured to perform basic checks for data types, required values, regular expressions, and ranges in uploaded data. See the Validators section of the Field Properties topic and Manage Dataset QC States.

The general purpose assay tutorial includes another example use of a transformation script in Set up a Data Transformation Script.

How Transformation Scripts Work

Script Execution Sequence

Transformation and validation scripts are invoked in the following sequence:

  1. A user uploads assay data.
  2. The server creates a runProperties.tsv file and rewrites the uploaded data in TSV format. Assay-specific properties and files from both the run and batch levels are added. See Run Properties Reference for full lists of properties.
  3. The server invokes the transform script by passing it the information created in step 2 (the runProperties.tsv file).
  4. After script completion, the server checks whether any errors have been written by the transform script and whether any data has been transformed.
  5. If transformed data is available, the server uses it for subsequent steps; otherwise, the original data is used.
  6. If multiple transform scripts are specified, the server invokes the other scripts in the order in which they are defined.
  7. Field-level validator/quality-control checks (including range and regular expression validation) are performed. (These field-level checks are defined in the assay definition.)
  8. If no errors have occurred, the run is loaded into the database.

Passing Run Properties to Transformation Scripts

Information on run properties can be passed to a transform script in two ways. You can put a substitution token into your script to identify the run properties file, or you can configure your scripting engine to pass the file path as a command line argument. See Transformation Script Substitution Syntax for a list of available substitution tokens.

For example, using perl:

Option #1: Put a substitution token (${runInfo}) into your script and the server will replace it with the path to the run properties file. Here's a snippet of a perl script that uses this method:

# Open the run properties file. Run or upload set properties are not used by
# this script. We are only interested in the file paths for the run data and
# the error file.

open my $reportProps, '${runInfo}';

Option #2: Configure your scripting engine definition so that the file path is passed as a command line argument:

  • Go to Admin > Site > Admin Console.
  • Select Views and Scripting.
  • Select and edit the perl engine.
  • Add ${runInfo} to the Program Command field.



Example Workflow: Develop a Transformation Script (perl)


This example workflow describes the process for developing a perl transformation script. There are two potential use cases:
  • transform run data
  • transform run properties

This page will walk through the process of creating an assay transformation script for run data, and give an example of a run properties transformation at the end.

Script Engine Setup

Before you can develop or run validation or transform scripts, configure the necessary Scripting Engines. You only need to set up a scripting engine once per type of script. You will need a copy of Perl running on your machine to set up the engine.

  • Select Admin > Site > Admin Console.
  • Click Views and Scripting.
  • Click Add > New Perl Engine.
  • Fill in the fields, specifying the "pl" extension and the full path to the perl executable.
  • Click Submit.

Add a Script to the Assay Design

Create a new empty .pl file in the development location of your choice and include it in your assay design.

  • Navigate to the Assay Tutorial.
  • Click GenericAssay in the Assay List web part.
  • Select Manage Assay Design > copy assay design.
  • Click Copy to Current Folder.
  • Enter a new name, such as "TransformedAssay".
  • Click Add Script and type the full path to the new script file you are creating.
  • Check the box for Save Script Data.
  • Confirm that the batch, run, and data fields are correct.
  • Click Save and Close.

Obtain Test Data

To assist in writing your transform script, you will next obtain sample "runData.tsv" and "runProperties.tsv" files showing the state of your data import 'before' the transform script would be applied. To generate useful test data, you need to import a data run using the new assay design.

  • Open and select the following file (if you have already imported this file during the tutorial, you will first need to delete that run):
LabKeyDemoFiles/Assays/Generic/GenericAssay_Run4.xls
  • Click Import Data.
  • Select the TransformedAssay design you just defined, then click Import.
  • Click Next, then Save and Finish.
  • When the import completes, select Manage Assay Design > edit assay design.
  • You will now see a Download Test Data button that was not present during initial assay design.
  • Click it and unzip the downloaded "sampleQCData" package to see the .tsv files.
  • Open the "runData.tsv" file to view the current fields.
Date        VisitID  ParticipantID  M3    M2    M1    SpecimenID
12/17/2013  1234     demo value     1234  1234  1234  demo value
12/17/2013  1234     demo value     1234  1234  1234  demo value
12/17/2013  1234     demo value     1234  1234  1234  demo value
12/17/2013  1234     demo value     1234  1234  1234  demo value
12/17/2013  1234     demo value     1234  1234  1234  demo value

Save Script Data

Typically transform and validation script data files are deleted on script completion. For debug purposes, it can be helpful to be able to view the files generated by the server that are passed to the script. When the Save Script Data checkbox is checked, files will be saved to a subfolder named: "TransformAndValidationFiles", in the same folder as the original script. Beneath that folder are subfolders for the AssayId, and below that a numbered directory for each run. In that nested subdirectory you will find a new "runDataFile.tsv" that will contain values from the run file plugged into the current fields.

participantid  Date              M1   M2    M3
249318596      2008-06-07 00:00  435  1111  15.0
249320107      2008-06-06 00:00  456  2222  13.0
249320107      2008-03-16 00:00  342  3333  15.0
249320489      2008-06-30 00:00  222  4444  14.0
249320897      2008-05-04 00:00  543  5555  32.0
249325717      2008-05-27 00:00  676  6666  12.0

Define the Desired Transformation

The runData.tsv file gives you the basic fields layout. Decide how you need to modify the default data. For example, perhaps for our project we need an adjusted version of the value in the M1 field - we want the doubled value available as an integer.

Add Required Fields to the Assay Design

  • Select Manage Assay Design > edit assay design.
  • Scroll down to the Data Fields section and click Add Field.
  • Enter "AdjustM1", "Adjusted M1", and select type "Integer".
  • Click Save and Close.

Write a Script to Transform Run Data

Now you have the information you need to write and refine your transformation script. Open the empty script file and paste the contents of the Modify Run Data box from this page: Example Transformation Scripts (perl).

Iterate over the Sample Run

Re-import the same run using the transform script you have defined.

  • From the run list, select the run and click Re-import Run.
  • Click Next.
  • Under Run Data, click Use the data file(s) already uploaded to the server.
  • Click Save and Finish.

The results now show the new field populated with the Adjusted M1 value.

Until the results are as desired, edit the script and use Re-import Run to retry.

Once your transformation script is working properly, re-edit the assay design one more time to uncheck the Save Script Data box - otherwise your script will continue to generate artifacts with every run and could eventually fill your disk.

Debugging Transformation Scripts

If your script has errors that prevent import of the run, you will see red text in the Run Properties window; for example, if you fail to select the correct data file.

If you have a type mismatch error between your script results and the defined destination field, you will also see an error reported there.

Errors File

If the validation script needs to report an error that is displayed by the server, it adds error records to an error file. The location of the error file is specified as a property entry in the run properties file. The error file is in a tab-delimited format with three columns:

  1. type: error, warning, info, etc.
  2. property: (optional) the name of the property that the error occurred on.
  3. message: the text message that is displayed by the server.
Sample errors file:

type    property     message
error   runDataFile  A duplicate PTID was found : 669345900
error   assayId      The assay ID is in an invalid format



Example Transformation Scripts (perl)


There are two use cases for writing transformation scripts:
  • Modify Run Data
  • Modify Run Properties

This page shows an example of each type of script using perl.

Modify Run Data

This script is used in the Example Workflow: Develop a Transformation Script (perl) and populates a new field with data derived from an existing field in the run.

#!/usr/local/bin/perl
use strict;
use warnings;

# Open the run properties file. Run or upload set properties are not used by
# this script. We are only interested in the file paths for the run data and
# the error file.

open my $reportProps, '${runInfo}';

my $dataFileName = "unknown";

my %transformFiles;

# Parse the data file properties from reportProps and save the transformed data location
# in a map, keyed by the source data file path. It's possible for an assay to have more
# than one transform data file, although most will only have a single one.

while (my $line=<$reportProps>)
{
    chomp($line);
    my @row = split(/\t/, $line);

    if ($row[0] eq 'runDataFile')
    {
        $dataFileName = $row[1];

        # transformed data location is stored in column 4
        $transformFiles{$dataFileName} = $row[3];
    }
}

my $key;
my $value;
my $adjustM1 = 0;

# Read each line from the uploaded data file and insert new data (double the value in the M1 field)
# into an additional column named 'Adjusted M1'. The additional column must already exist in the assay
# definition and be of the correct type.

while (($key, $value) = each(%transformFiles)) {

    open my $dataFile, $key or die "Can't open '$key': $!";
    open my $transformFile, '>', $value or die "Can't open '$value': $!";

    # copy the header row, appending the new column name
    my $line=<$dataFile>;
    chomp($line);
    $line =~ s/\r*//g;
    print $transformFile $line, "\t", "Adjusted M1", "\n";

    while (my $line=<$dataFile>)
    {
        $adjustM1 = substr($line, 27, 3) * 2;
        chomp($line);
        $line =~ s/\r*//g;
        print $transformFile $line, "\t", $adjustM1, "\n";
    }

    close $dataFile;
    close $transformFile;
}

Modify Run Properties

You can also define a transform script that modifies the run properties, as shown in this example, which parses the short filename out of the full path:

#!/usr/local/bin/perl
use strict;
use warnings;

# Open the run properties file. Run or upload set properties are not used by
# this script; we are only interested in the file paths for the run data and
# the error file. Here the path is passed as a command-line argument.

open my $reportProps, $ARGV[0];

my $transformFileName = "unknown";
my $uploadedFile = "unknown";

while (my $line=<$reportProps>)
{
    chomp($line);
    my @row = split(/\t/, $line);

    if ($row[0] eq 'transformedRunPropertiesFile')
    {
        $transformFileName = $row[1];
    }
    if ($row[0] eq 'runDataUploadedFile')
    {
        $uploadedFile = $row[1];
    }
}

if ($transformFileName eq 'unknown')
{
    die "Unable to find the transformed run properties data file";
}

open my $transformFile, '>', $transformFileName or die "Can't open '$transformFileName': $!";

# parse out just the filename portion (between the last backslash and the .xls extension)
my $i = rindex($uploadedFile, "\\") + 1;
my $j = index($uploadedFile, ".xls");

# add a value for FileID
print $transformFile "FileID", "\t", substr($uploadedFile, $i, $j-$i), "\n";
close $transformFile;



Transformation Scripts in R


Overview

Users importing instrument-generated tabular datasets into LabKey Server may run into the following difficulties:

  • Instrument-generated files often contain header lines before the main dataset, denoted by a leading # or ! or other symbol. These lines usually contain useful metadata about the protocol or reagents or samples tested, and in any case need to be skipped over to find the main data set.
  • The file format is optimized for display, not for efficient storage and retrieval. For example, columns that correspond to individual samples are difficult to work with in a database.
  • The data to be imported contains the display values from a lookup column, which need to be mapped to the foreign key values for storage.

All of these problems can be solved with a transform script. Transform scripts were originally designed to fill in additional columns such as quality control values in an imported assay data set. The assay framework, however, allows for transform scripts to solve a much wider range of challenges. And R is a good choice of language for writing transform scripts, because R contains a lot of built-in functionality for manipulating tabular data sets.

First we review the way to hookup a transform script to an assay and the communications mechanisms between the assay framework and a transform script in R.

Identifying the Path to the Script File

Transform scripts are designated as part of an assay by providing a fully qualified path to the script file in the field named at the top of the assay instance definition. A convenient location for the script file is a File web part defined in the same folder as the assay definition. The fully qualified path to the script file is then the concatenation of the file root for the folder (for example, "C:\lktrunk\build\deploy\files\MyAssayFolderName\@files\", as determined by the Files page in the Admin console) and the file path to the script file as seen in the File web part (for example, "scripts\LoadData.R"). For the file path, LabKey Server accepts either backslashes (the default Windows format) or forward slashes.

When working on your own developer workstation, you can put the script file wherever you like, but putting it within the scope of the File manager will make it easier to deploy to a server. It also makes iterative development against a remote server easier, since you can use a Web-DAV enabled file editor to directly edit the same file that the server is calling.

If your transform script calls other script files to do its work, the normal way to pull in the source code is using the source statement, for example

source("C:\lktrunk\build\deploy\files\MyAssayFolderName\@files\Utils.R")

To keep the scripts easily movable to other servers, however, it is better to keep the script files together and use the built-in substitution token "${srcDirectory}", which the server automatically fills in with the directory where the script file is located, for example:

source("${srcDirectory}/Utils.R");

Accessing and Using the Run Properties File

The primary mechanism for communication between the LabKey Assay framework and the Transform script is the Run Properties file. Again a substitution token tells the script code where to find this file. The script file should contain a line like

rpPath<- "${runInfo}"

When the script is invoked by the assay framework, the rpPath variable will contain the fully qualified path to the run properties file.

The run properties file contains three categories of properties:

1. Batch and run properties as defined by the user when creating an assay instance. These properties are of the format: <property name> <property value> <java data type>

for example,

gDarkStdDev 1.98223 java.lang.Double

When the transform script is called, these properties will contain any values that the user has typed into the corresponding text boxes under the "Batch Properties" or "Run Properties" sections of the upload form. The transform script can assign or modify these properties based on calculations or by reading them from the raw data file from the instrument. The script must then write the modified properties file to the location specified by the transformedRunPropertiesFile property (see #3 below).

2. Context properties of the assay such as assayName, runComments, and containerPath. These are recorded in the same format as the user-defined batch and run properties, but they cannot be overwritten by the script.

3. Paths to input and output files. These are fully qualified paths that the script reads from or writes to. They are in a <property name> <property value> format without property types. The paths currently used are:

  • a. runDataUploadedFile: the raw data file that was selected by the user and uploaded to the server as part of an import process. This can be an Excel file, a tab-separated text file, or a comma-separated text file.
  • b. runDataFile: the imported data file after the assay framework has attempted to convert the file to .tsv format and match its columns to the assay data result set definition. The path will point to a subfolder below the script file directory, with a value similar to the example below; the AssayId_22\42 part of the directory path serves to separate the temporary files from multiple executions by multiple scripts in the same folder.
C:\lktrunk\build\deploy\files\transforms\@files\scripts\TransformAndValidationFiles\AssayId_22\42\runDataFile.tsv
  • c. AssayRunTSVData: This file path is where the result of the transform script will be written. It will point to a unique file name in an “assaydata” directory that the framework creates at the root of the files tree. NOTE: this property is written on the same line as the runDataFile property.
  • d. errorsFile: This path is where a transform or validation script can write out error messages for use in troubleshooting. Not normally needed by an R script because the script usually writes errors to stdout, which are written by the framework to a file named “<scriptname>.Rout”.
  • e. transformedRunPropertiesFile: This path is where the script writes out the updated values of batch- and run-level properties that are listed in the runProperties file.
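Putting these together, the contents of a runProperties.tsv might look roughly like the sketch below. All paths and values are illustrative placeholders, and the <intermediate value> column stands in for data this sketch does not depend on; note that the AssayRunTSVData output path appears on the same line as runDataFile:

assayName	MyAssay	java.lang.String
containerPath	/MyProject/MyAssayFolder	java.lang.String
runDataUploadedFile	C:\files\@files\upload\run01.xls
runDataFile	C:\files\@files\scripts\TransformAndValidationFiles\AssayId_22\42\runDataFile.tsv	<intermediate value>	C:\files\@files\assaydata\AssayRunTSVData1.tsv
errorsFile	C:\files\@files\scripts\TransformAndValidationFiles\AssayId_22\42\errors.tsv
transformedRunPropertiesFile	C:\files\@files\scripts\TransformAndValidationFiles\AssayId_22\42\transformedRunProperties.tsv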

Choosing the Input File for Transform Script Processing

The transform script developer can choose to use either the runDataFile or the runDataUploadedFile as its input. The runDataFile would be the right choice for an Excel-format raw file and a script that fills in additional columns of the data set. By using the runDataFile, the assay framework does the Excel-to-TSV conversion and the script doesn’t need to know how to parse Excel files. The runDataUploadedFile would be the right choice for a raw file in TSV format that the script is going to reformat by turning columns into rows. In either case, the script writes its output to the AssayRunTSVData file.

Transform Script Options

There are two useful options presented as checkboxes in the Assay designer.

  • Save Script Data tells the framework to not delete the intermediate files such as the runProperties file after a successful run. This option is important during script development. It can be turned off to avoid cluttering the file space under the TransformAndValidationFiles directory that the framework automatically creates under the script file directory.
  • Upload In Background tells the framework to create a pipeline job as part of the import process, rather than tying up the browser session. It is useful for importing large data sets.

Connecting Back to the Server from a Transform Script

Sometimes a transform script needs to connect back to the server to do its job. One example is translating lookup display values into key values. The Rlabkey library available on CRAN has the functions needed to connect to, query, and insert or update data in the local LabKey Server where it is running. To give the connection the right security context (the current user's), the assay framework provides the substitution token ${rLabkeySessionId}. Including this token on a line by itself near the beginning of the transform script eliminates the need to use a config file to hold a username and password for this loopback connection. It will be replaced with two lines that look like:

labkey.sessionCookieName = "JSESSIONID"
labkey.sessionCookieContents = "TOMCAT_SESSION_ID"

where TOMCAT_SESSION_ID is the actual ID of the user's HTTP session.

Debugging an R Transform Script

You can load an R transform script into the R console/debugger and run the script with debug(<functionname>) commands active. Since the substitution tokens described above (${srcDirectory}, ${runInfo}, and ${rLabkeySessionId}) are necessary to the correct operation of the script, the framework conveniently writes out a version of the script with these substitutions made, into the same subdirectory where the runProperties.tsv file is found. Load this modified version of the script into the R console.

Example Script

Setup

  • Create a new project, type Assay
  • Add the following Web parts:
    • Files
    • Lists
    • Data Pipeline
    • Sample Sets (narrow)
  • Copy the scripts folder from the data folder to the root of the Files web part tree
  • Create a sample set called ExampleSamples
    • Click on header of Sample Sets web part
    • Select Import Sample Set
    • Open the file samples.txt in a text editor or Excel.
    • Copy and paste the contents into the import window, and select sampleId as the key field.
  • Create a list called probesources by importing ProbeSourcesListArchive.zip.
  • Create a GPAT assay with transform script
  • Run the assay
    • Click on assay name
    • Import data button on toolbar
    • Select probe source from list, leave property Prefix, press Next
    • Column Names Row: 65
    • Sample Set: ExampleSamples
    • Run Data: Upload a data file. Choose the file GSE11199_series_matrix_200.txt.
    • Save and finish

A Look at the Code

This transform script example handles the data output from an Affymetrics microarray reader. The data file contains 64 lines of metadata before the chip-level intensity data. The metadata describes the platform, the experiment, and the samples used. The spot-level data is organized with one column per sample, which may be efficient for storage in a spreadsheet but isn’t good for querying in a database.

The transform script does the following tasks:

  1. Reads in the runProperties file
  2. Gets additional import processing parameters from a lookup list, such as the prefix that designates a comment line containing a property-value pair
  3. Fills in run properties that are read from the data file header (marked by the prefix). Writes the transformed run properties to the designated file location so they get stored with the assay.
  4. Converts sample identifiers to sample set key values so that a lookup from result data to sample set properties works.
  5. Skips over a specified number of rows to the beginning of the spot data.
  6. Reshapes the input data so that the result set is easier to query by sample
The areas of the code that do these things are marked with the corresponding number.

TransformScriptExample.R

options(stringsAsFactors = FALSE)
source("${srcDirectory}/ExampleUtils.R")
baseUrl<-"http://localhost:8080/labkey"

${rLabkeySessionId}
rpPath<- "${runInfo}"

## read the file paths etc out of the runProperties.tsv file
params <- getRunPropsList(rpPath, baseUrl)

## read the input data frame just to get the column headers.
inputDF<-read.table(file=params$inputPathUploadedFile, header = TRUE,
    sep = "\t", quote = "\"",
    fill=TRUE, stringsAsFactors = FALSE, check.names=FALSE,
    row.names=NULL, skip=(params$loaderColNamesRow -1), nrows=1)
cols<-colnames(inputDF)

## create a Name to RowId map for samples
keywords <- as.vector(colnames(inputDF)[-1])
queryName=params$sampleSetName

keywordMap<- getLookupMap(keywords, baseUrl=baseUrl, folderPath=params$containerPath,
    schemaName="Samples", queryName=queryName, keyField="rowId",
    displayField="SampleId")

doRunLoad(params=params, inputColNames=cols, outputColNames=c("ID_REF", "sample", "val"),
    lookupMap=keywordMap)

ExampleUtils.R, function getRunPropsList()

getRunPropsList<- function(rpPath, baseUrl) 
{
rpIn<- read.table(rpPath, col.names=c("name", "val1", "val2", "val3"), #########
header=FALSE, check.names=FALSE, ## 1 ##
stringsAsFactors=FALSE, sep="\t", quote="", fill=TRUE, na.strings=""); #########

## pull out the run properties

params<- list(inputPathUploadedFile = rpIn$val1[rpIn$name=="runDataUploadedFile"],
inputPathValidated = rpIn$val1[rpIn$name=="runDataFile"],

##a little strange. AssayRunTSVData is the one we need to output to
outputPath = rpIn$val3[rpIn$name=="runDataFile"],

containerPath = rpIn$val1[rpIn$name=="containerPath"],
runPropsOutputPath = rpIn$val1[rpIn$name=="transformedRunPropertiesFile"],
sampleSetId = as.integer(rpIn$val1[rpIn$name=="sampleSet"]),
probeSourceId = as.integer(rpIn$val1[rpIn$name=="probeSource"]),
errorsFile = rpIn$val1[rpIn$name=="errorsFile"])

## lookup the name of the sample set based on its number
if (length(params$sampleSetId)>0)
{
df<-labkey.selectRows(baseUrl=baseUrl,
folderPath=params$containerPath, schemaName="exp", queryName="SampleSets",
colFilter=makeFilter(c("rowid", "EQUALS", params$sampleSetId)))
params<- c(params, list(sampleSetName=df$Name))
}

## This script reformats the rows in batches of 1000 in order to reduce
## the memory requirements of the R calculations
params<-c(params, list(loaderBatchSize=as.integer(1000)))

## From the probesource lookup table, get the prefix characters that
## identify property value comment lines in the data file, and the starting
## line number of the spot data table within the data file
dfProbeSource=labkey.selectRows(baseUrl=baseUrl, folderPath=params$containerPath, #########
schemaName="lists", queryName="probesources", ## 2 ##
colFilter=makeFilter(c("probesourceid", "EQUALS", params$probeSourceId))) #########

params<-c(params, list(propertyPrefix=dfProbeSource$propertyPrefix,
loaderColNamesRow=dfProbeSource$loaderColNamesRow))

if (is.null(params$loaderColNamesRow) | is.na(params$loaderColNamesRow))
{
params$loaderColNamesRow <- 1
}

## now apply the run property values reported in the header
## of the data tsv file to the corresponding run properties
conInput = file(params$inputPathUploadedFile, "r")

line<-""
pfx <- as.integer(0)
fHasProps <- as.logical(FALSE)

if (!is.na(params$propertyPrefix))
{ #########
pfx<-nchar(params$propertyPrefix) ## 3 ##
} #########

while(pfx>0)
{
line<-readLines(conInput, 1)
if (nchar(line)<=pfx) {break}
if (substring(line, 1, pfx) != params$propertyPrefix) {break}
strArray=strsplit(substring(line, pfx+1, nchar(line)) ,"\t", fixed=TRUE)
prop<- strArray[[1]][1]
val<- strArray[[1]][2]
if (length(rpIn$name[rpIn$name==prop]) > 0 )
{
## Dealing with dates is sometimes tricky. You want the value pushed to rpIn
## to be a string representing a date, but in the default date format. This data
## file uses a non-default date format that we explicitly convert to a Date using
## as.Date and a format string,
## then convert back to character using the default format.

if (rpIn$val2[rpIn$name==prop]=="java.util.Date")
{
val<-as.character(as.Date(val, "%b%d%y"))
}
rpIn$val1[rpIn$name==prop]<-val
fHasProps <- TRUE
}
}

if (fHasProps)
{
## write out the transformed run properties to the file that
## the assay framework will read in
write.table(rpIn, file=params$runPropsOutputPath, sep="\t", quote=FALSE,
    na="", row.names=FALSE, col.names=FALSE, append=FALSE)
}
return (params)

}

getLookupMap()

getLookupMap<- function(uniqueLookupValues, baseUrl, folderPath, schemaName, 
queryName, keyField, displayField, otherColName=NULL, otherColValue=NULL)
{
inClauseVals = paste(uniqueLookupValues, collapse=";") #########
colfilt<-makeFilter(c(displayField, "EQUALS_ONE_OF", inClauseVals)) ## 4 ##
if (!is.null(otherColName)) #########
{
otherFilter=makeFilter(c(otherColName, "EQUALS", otherColValue))
colfilt = c(colfilt, otherFilter)
}
colsel<- paste(keyField, displayField, sep=",")

lookupMap <-labkey.selectRows(baseUrl=baseUrl, folderPath=folderPath,
schemaName=schemaName, queryName=queryName,
colSelect=colsel, colFilter=colfilt, showHidden=TRUE)

newLookups<- uniqueLookupValues[!(uniqueLookupValues %in% lookupMap[,2])]

if (length(newLookups)>0 && !is.na(newLookups[1]) )
{
## insert the lookup values that we haven't already seen before
newLookupsToInsert<- data.frame(lookupValue=newLookups, stringsAsFactors=FALSE)
colnames(newLookupsToInsert)<- displayField
if (!is.null(otherColName))
{
newLookupsToInsert<-cbind(newLookupsToInsert, otherColValue)
colnames(newLookupsToInsert)<- c(displayField, otherColName)
}

result<- labkey.insertRows(baseUrl=baseUrl, folderPath=folderPath,
schemaName=schemaName, queryName=queryName, toInsert= newLookupsToInsert)

lookupMap <-labkey.selectRows(baseUrl=baseUrl, folderPath=folderPath,
schemaName=schemaName, queryName=queryName,
colSelect=colsel, colFilter=colfilt, showHidden=TRUE)
}
colnames(lookupMap)<- c("RowId", "Name")

return(lookupMap)
}

doRunLoad()

doRunLoad<-function(params, inputColNames, outputColNames, lookupMap)
{
folder=params$containerPath
unlink(params$outputPath)

cIn <- file(params$inputPathUploadedFile, "r")
cOut<- file(params$outputPath , "w")

## write the column headers to the output file
headerDF<-data.frame(matrix(NA, nrow=0, ncol=length(outputColNames)))
colnames(headerDF)<- outputColNames

write.table(headerDF, file=cOut, sep="\t", quote=FALSE, row.names=FALSE, na="",
col.names=TRUE, append=FALSE)

## the first read from the input file skips rows up to and including the header
skipCnt<-params$loaderColNamesRow

## read in chunks of batchSize, which are then transposed and written to the output file. #########
## blkStart is the 1-based index of the starting row of a chunk ## 5 ##
#########
blkStart <- skipCnt + 1
rowsToRead <- params$loaderBatchSize

while(rowsToRead > 0)
{
inputDF <- read.table(file=cIn, header = FALSE, sep = "\t", quote = "\"",
    na.strings = "---", fill=TRUE, row.names=NULL,
    stringsAsFactors = FALSE, check.names=FALSE,
    col.names=inputColNames, skip=skipCnt, nrows=rowsToRead)

cols<-colnames(inputDF)

if(NROW(inputDF) >0)
{
idVarName<-inputColNames[1]
df1 <- reshape(inputDF, direction="long", idvar=idVarName,
    v.names="Val", timevar="Name",
    times=cols[-1], varying=list(cols[-1]))            #########
                                                       ## 6 ##
df2<- merge(df1, lookupMap)                            #########
reshapedRows<- data.frame(cbind(df2[,idVarName], df2[,"RowId"],
    df2[,"Val"], params$probeSourceId), stringsAsFactors=FALSE)

reshapedRows[,2] <- as.integer(reshapedRows[,2])
reshapedRows[,4] <- as.integer(reshapedRows[,4])

nonEmptyRows<- !is.na(reshapedRows[,3])
reshapedRows<-reshapedRows[nonEmptyRows ,]

reshapedRows<- reshapedRows[ do.call(order, reshapedRows[1:2]), ]
colnames(reshapedRows)<- outputColNames

## need to double up the single quotes in the data
reshapedRows[,3]<-gsub("'", "''", reshapedRows[,3], fixed=TRUE)

write.table(reshapedRows, file=cOut, sep="\t", quote=TRUE, na="",
    row.names=FALSE, col.names=FALSE, append=TRUE)

df1<-NULL
df2<-NULL
reshapedRows<-NULL
recordsToInsert<-NULL

}

if (NROW(inputDF)< rowsToRead)
{
##we've hit the end of the file, no more to read
rowsToRead <- 0
}
else
{
## now look where the next block will start, and read up to the end row
blkStart <- blkStart + rowsToRead
}
## skip rows only on the first read
skipCnt<-0
}
inputDF<-NULL
close(cIn)
close(cOut)
}



Transformation Scripts in Java


Overview

LabKey Server supports transformation scripts for assay data at upload time. This feature is primarily targeted for Perl or R scripts; however, the framework is general enough that any application that can be externally invoked can be run as well, including a Java program.

Java appeals to programmers who desire a stronger-typed language than most script-based languages. Most important, using a Java-based validator allows a developer to leverage the remote client API and take advantage of the classes available for assays, queries, and security.

This page outlines the steps required to configure and create a Java-based transform script. The ProgrammaticQCTest script, available in the BVT test, provides an example of a script that uses the remote client API.

Configure the Script Engine

In order to use a Java-based validation script, you will need to configure an external script engine to bind a file with the .jar extension to an engine implementation.

To do this:

  • Go to the Admin Console for your site.
  • Select the [views and scripting configuration] option.
  • Create a new external script engine.
  • Set up the script engine by filling in its required fields:
    • File extension: jar
    • Program path: (the absolute path to java.exe)
    • Program command: -jar "${scriptFile}" "${runInfo}"
      • scriptFile - The full path to the (processed and rewritten) transform script. This is usually in a temporary location the server manages.
      • runInfo - The full path to the run properties file the server creates. For further info on this file, see the "Run Properties File" section of the Transformation Scripts documentation.
      • srcDirectory - The original directory of the transform script (usually specified in the assay definition).

The program command configured above will invoke the java.exe application against a .jar file passing in the run properties file location as an argument to the java program. The run properties file contains information about the assay properties including the uploaded data and the location of the error file used to convey errors back to the server. Specific details about this file are contained in the data exchange specification for Programmatic QC.

Implement a Java Validator

The implementation of your java validator class must contain an entry point matching the following function signature:

public static void main(String[] args)

The location of the run properties file will be passed from the script engine configuration (described above) into your program as the first element of the args array.

The following code provides an example of a simple class that implements the entry point and handles any arguments passed in:

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AssayValidator
{
    private String _email;
    private String _password;
    private File _errorFile;
    private Map<String, String> _runProperties;
    private List<String> _errors = new ArrayList<String>();

    private static final String HOST_NAME = "http://localhost:8080/labkey";
    private static final String HOST = "localhost:8080";

    public static void main(String[] args)
    {
        if (args.length != 1)
            throw new IllegalArgumentException("Input data file not passed in");

        File runProperties = new File(args[0]);
        if (runProperties.exists())
        {
            AssayValidator qc = new AssayValidator();
            qc.runQC(runProperties);
        }
        else
            throw new IllegalArgumentException("Input data file does not exist");
    }

    // runQC() and the other instance methods (such as setCredentials, shown below) are omitted here
}

Create a Jar File

Next, compile and jar your class files, including any dependencies your program may have. This will save you from having to add a classpath parameter in your engine command. Make sure that a ‘Main-Class’ attribute is added to your jar file manifest. This attribute points to the class that implements your program entry point.
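For example, a minimal manifest for the AssayValidator class above might look like the following, and the jar can then be built with the standard jar tool (the file names here are illustrative):

Manifest-Version: 1.0
Main-Class: AssayValidator

jar cvfm AssayValidator.jar MANIFEST.MF *.class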

Set Up Authentication for Remote APIs

Most of the remote APIs require login information in order to establish a connection to the server. Credentials can be hard-coded into your validation script or passed in on the command line. Alternatively, a .netrc file can be used to hold the credentials necessary to login to the server. For further information, see: Create a .netrc or _netrc file.
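For reference, a .netrc entry for a server running on localhost might look like the following (the login and password values are placeholders):

machine localhost
login user@example.com
password mypassword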

The following sample code can be used to extract credentials from a .netrc file:

private void setCredentials(String host) throws IOException
{
    NetrcFileParser parser = new NetrcFileParser();
    NetrcFileParser.NetrcEntry entry = parser.getEntry(host);

    if (null != entry)
    {
        _email = entry.getLogin();
        _password = entry.getPassword();
    }
}

Associate the Validator with an Assay Instance

Finally, the QC validator must be attached to an assay. To do this, edit the assay design and specify the absolute location of the .jar file you have created. The engine created earlier will bind the .jar extension to the java.exe command you have configured.




Transformation Scripts for Module-based Assays


A transformation script can be included in a module-based assay by including a directory called 'scripts' in the assay directory. In this case, the exploded module structure looks something like:

<assay>
|_domains
|_views
|_scripts
|_config.xml

The scripts directory contains one or more script files; e.g., "validation.pl".

The order of script invocation can be specified in the config.xml file. See the <transformScripts> element. If scripts are not listed in the config.xml file, they will be executed in alphabetical order based on file name.
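As a sketch, a config.xml that lists two scripts in their intended execution order might look something like the following; the script names come from the module file structure shown earlier, but the element and namespace details are a best-effort sketch, so consult the assay provider XML schema for the exact form:

<ap:provider xmlns:ap="http://labkey.org/study/assay/xml">
    <ap:transformScripts>
        <ap:transformScript>script1.R</ap:transformScript>
        <ap:transformScript>script2.pl</ap:transformScript>
    </ap:transformScripts>
</ap:provider>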

A script engine must be defined for the appropriate type of script (for the example script named above, this would be a Perl engine). The rules for defining a script engine for module-based assays are the same as they are for Java-based assays.
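For instance, a Perl engine configuration typically parallels the Java engine setup described earlier; the exact values depend on your installation:

  • File extension: pl
  • Program path: (the absolute path to the perl executable)
  • Program command: "${scriptFile}" "${runInfo}"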

When a new assay instance is created, you will notice that the script appears in the assay designer, but it is read-only (the path cannot be changed or removed). Just as for Java-defined assays, you will still see an additional text box where you can specify one or more additional scripts.




Run Properties Reference


Run properties are defined as part of assay design and values are specified at run upload. The server creates a runProperties.tsv file and rewrites the uploaded data in TSV format. Assay-specific properties from both the run and batch levels are included.

There are standard default assay properties which apply to most assay types, as well as additional properties specific to the assay type. For example, NAb, Luminex, and ELISpot assays can include specimen, analyte, and antigen properties which correspond to locations on a plate associated with the assay instance.

The runProperties.tsv file also contains additional context that the validation script might need, such as the username, container path, assay instance name, and assay ID. Since the uploaded assay data will be written out to a file in TSV format, the runProperties.tsv file also specifies the destination file's location.

Run Properties Format

The runProperties file has three (or four) tab-delimited columns in the following order:

  1. property name
  2. property value
  3. data type – The Java class name of the property value (e.g., java.lang.String). This column may have a different meaning for properties like the run data, transformed data, or errors file; more information can be found in the property descriptions below.
  4. transformed data location – The full path to the location where the transformed data are rewritten in order for the server to load them into the database.
The file does not contain a column header row because the column order is fixed.
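For illustration, a few lines of a runProperties.tsv file might look like the following (the paths and type names are hypothetical; note the fourth column on the runDataFile line, which is the location where the transformed data should be written):

assayName	ExampleAssay	java.lang.String
containerPath	/home/AssayTutorial	java.lang.String
runDataUploadedFile	/temp/assaydata/results.xls	java.io.File
runDataFile	/temp/assaydata/results.tsv	java.io.File	/temp/assaydata/results.transformed.tsv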

Generic Assay Run Properties

Property Name | Data Type | Property Description
assayId | String | The value entered in the Assay Id field of the run properties section.
assayName | String | The name of the assay design given when the new assay design was created.
assayType | String | The type of this assay design (GenericAssay, Luminex, Microarray, etc.).
baseUrl | URL String | For example, http://localhost:8080/labkey
containerPath | String | The container location of the assay (for example, /home/AssayTutorial).
errorsFile | Full Path | The full path to a .tsv file where any validation errors are written. See details below.
originalFileLocation | Full Path | The full path to the original location of the file being imported as an assay.
protocolDescription | String | The description of the assay definition when the new assay design was created.
protocolId | String | The ID of this assay definition.
protocolLsid | String | The assay definition LSID.
runComments | String | The value entered into the Comments field of the run properties section.
runDataUploadedFile | Full Path | The original data file that was selected by the user and uploaded to the server as part of an import process. This can be an Excel file, a tab-separated text file, or a comma-separated text file.
runDataFile | Full Path | The imported data file after the assay framework has attempted to convert the file to .tsv format and match its columns to the assay data result set definition.
transformedRunPropertiesFile | Full Path | File where the script writes out the updated values of batch- and run-level properties that are listed in the runProperties file.
userName | String | The user who created the assay design.
workingDir | String | The temp location that this script is executed in (e.g., C:\AssayId_209\39\).

errorsFile

Validation errors can be written to a TSV file as specified by full path with the errorsFile property. This output file is formatted with three columns:

  • Type - "error" or "warn"
  • Property - the name of the property raising the validation error
  • Message - the actual error message
For additional information about handling errors and warnings in transformation scripts, see: Warnings in Transformation Scripts.
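As a minimal sketch, an R transformation script could report a problem by appending a row in this three-column format to the errorsFile path read from runProperties.tsv (the helper name writeValidationError is illustrative; params$errorsFile follows the getRunPropsList() example in the R section above):

## append one "error" row to the errorsFile named in runProperties.tsv
writeValidationError <- function(errorsPath, property, message)
{
    write(paste("error", property, message, sep="\t"), file=errorsPath, append=TRUE)
}

writeValidationError(params$errorsFile, "assayId", "Assay Id is required")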

Additional Assay Specific Run Properties

ELISpot

Property Name | Data Type | Property Description
sampleData | String | The path to a file that contains sample data written in a tab-delimited format. The file will contain all of the columns from the sample group section of the assay design. A wellgroup column will be written that corresponds to the well group name in the plate template associated with this assay instance. A row of data will be written for each well position in the plate template.
antigenData | String | The path to a file that contains antigen data written in a tab-delimited format. The file contains all of the columns from the antigen group section of the assay design. A wellgroup column corresponds to the well group name in the plate template associated with this assay instance. A row of data is written for each well position in the plate template.

Luminex

Property Name | Data Type | Property Description
Derivative | String |
Additive | String |
SpecimenType | String |
DateModified | Date |
ReplacesPreviousFile | Boolean |
TestDate | Date |
Conjugate | String |
Isotype | String |

NAb (TZM-bl Neutralizing Antibody) Assay

Property Name | Data Type | Property Description
sampleData | String | The path to a file that contains sample data written in a tab-delimited format. The file contains all of the columns from the sample group section of the assay design. A wellgroup column corresponds to the well group name in the plate template associated with this assay instance. A row of data is written for each well position in the plate template.

General Purpose Assay Type (GPAT)

Property Name | Data Type | Property Description
severityLevel (reserved) | String | This is a property name used internally for error and warning handling. Do not define your own property with the same name in a GPAT assay.
maximumSeverity (reserved) | String | This is a property name reserved for use in error and warning handling. Do not define your own property with the same name in a GPAT assay. See Warnings in Transformation Scripts for details.



Transformation Script Substitution Syntax


LabKey Server supports a number of substitutions that can be used with transformation scripts. These substitutions work both on the command-line being used to invoke the script (configured in the Views and Scripting section of the Admin Console), and in the text of transformation scripts themselves. See Transformation Scripts for a description of how to use this syntax.

Script Syntax | Description | Substitution Value
${runInfo} | File containing metadata about the run | Full path to the file on the local file system
${srcDirectory} | Directory in which the script file is located | Full path to the parent directory of the script
${rLabkeySessionId} | Information about the current user's HTTP session | Two lines: labkey.sessionCookieName = "COOKIE_NAME" and labkey.sessionCookieContents = "USER_SESSION_ID". Note that this substitution is multi-line. The cookie name is typically JSESSIONID, but not in all cases.
${httpSessionId} | The current user's HTTP session ID | The string value of the session identifier, which can be used for authentication when calling back to the server for additional information
${sessionCookieName} | The name of the session cookie | The string value of the cookie name, which can be used for authentication when calling back to the server for additional information
${baseServerURL} | The server's base URL and context path | The string of the base URL and context path (e.g., "http://localhost:8080/labkey")
${containerPath} | The current container path | The string of the current container path (e.g., "/ProjectA/SubfolderB")
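For example, an R transformation script might combine several of these substitutions; the server replaces them before the script runs (the list query shown is the probesources list from the R examples above):

library(Rlabkey)

baseUrl <- "${baseServerURL}"
folder <- "${containerPath}"

## sets labkey.sessionCookieName and labkey.sessionCookieContents for Rlabkey
${rLabkeySessionId}

## call back to the server as the current user
df <- labkey.selectRows(baseUrl=baseUrl, folderPath=folder,
    schemaName="lists", queryName="probesources")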



Warnings in Transformation Scripts


In General Purpose Assay (GPAT) designs, you can enable reporting of warnings in a transformation script. Ordinarily, errors will stop the execution of a script and the assay import, but if warnings are configured, you can have the import pause on warnings and allow an operator to examine transformed results and elect to proceed or cancel the upload. Note that this feature applies only to the General Purpose Assay Type (GPAT) and is not a generic assay feature. Warning reporting is optional, and invisible unless you explicitly enable it. If your script does not update maximumSeverity, then no warnings will be triggered and no user interaction will be required.

Enable Support for Warnings in a Transformation Script

To raise a warning from within your transformation script, set maximumSeverity to WARN within the transformedRunProperties file. To report an error, set maximumSeverity to ERROR. To display a specific message with either a warning or error, write the message to errors.html in the current directory. For example, this snippet from an R transformation script defines a warning and error handler:

# Writes the maximumSeverity level to the transformedRunProperties file and the
# error/warning message to the errors.html file. The server reads these files after
# execution to determine whether an error or warning occurred and handles it appropriately.
handleErrorsAndWarnings <- function()
{
    if (run.error.level > 0)
    {
        fileConn <- file(trans.output.file);
        if (run.error.level == 1)
        {
            writeLines(c(paste("maximumSeverity", "WARN", sep="\t")), fileConn);
        }
        else
        {
            writeLines(c(paste("maximumSeverity", "ERROR", sep="\t")), fileConn);
        }
        close(fileConn);

        # This file is read and displayed directly as warnings or errors,
        # depending on the maximumSeverity level.
        if (!is.null(run.error.msg))
        {
            fileConn <- file("errors.html");
            writeLines(run.error.msg, fileConn);
            close(fileConn);
        }

        quit();
    }
}
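A script using this handler sets the globals the function expects before calling it; for instance (trans.output.file, the transformedRunPropertiesFile path, is assumed to have been read from the run properties file earlier in the script):

## flag a warning (1 = WARN, 2 = ERROR) with a message, then invoke the handler
run.error.level <- 1
run.error.msg <- "Unrecognized sample identifiers were found; review before proceeding."
handleErrorsAndWarnings()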

Click here to download a sample transformation script including this handler and other configuration required for warning reporting.

Workflow for Warnings from Transformation Scripts

When a warning is triggered during assay import, the user is given the option to Proceed or Cancel the import after examining the output files.

After examining the output and transformed data files, if the user clicks Proceed, the transform script is rerun, and no warnings will be raised on the second pass. Quieting warnings on the approved import is handled using the value of an internal property called severityLevel in the run properties file. Errors will still be raised if necessary. A sketch of how a script might take this into account appears below.
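This sketch re-reads the run properties and skips re-raising warnings once severityLevel has been set; the exact value the server writes to this property is not documented here, so treating any non-empty value as prior approval is an assumption:

## skip re-raising warnings once the user has approved the import
rp <- read.table("${runInfo}", sep="\t", header=FALSE, fill=TRUE, quote="",
    col.names=c("name", "val1", "val2", "val3"), stringsAsFactors=FALSE)
alreadyApproved <- any(rp$name == "severityLevel" & !is.na(rp$val1))
if (!alreadyApproved)
{
    run.error.level <- 1
    run.error.msg <- "Review the transformed output before proceeding."
    handleErrorsAndWarnings()
}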

Priority of Errors and Warnings:

  1. Script error (syntax, runtime, etc.) <- Error
  2. Script returns a non-zero value <- Error
  3. Script writes ERROR to maximumSeverity in the transformedRunProperties file <- Error
    • If the script also writes a message to errors.html, it will be displayed; otherwise a server-generated message will be shown.
  4. Script writes WARN to maximumSeverity in the transformedRunProperties file <- Warning
    • If the script also writes a message to errors.html, it will be displayed; otherwise a server-generated message will be shown.
    • The Proceed and Cancel buttons are shown, requiring a user selection to continue.
  5. Script does not write a value to maximumSeverity in transformedRunProperties but does write a message to errors.html. This will be interpreted as an error.