There are a handful of example module-based assays in the LabKey SVN tree; you can find these modules in <LABKEY_HOME>/server/customModules.
The assay consists of an assay config file, a set of domain descriptions, and view html files. The assay is added to a module by placing it in an assay directory at the top-level of the module. The assay has the following file structure:
The only required part of the assay is the <assay-name> directory. The config.xml, domain files, and view files are all optional.
This diagram shows the relationship between the pages. The details link will only appear if the corresponding details html view is available.
Module-based assays can be designed to jump to a "begin" page instead of a "runs" page. If an assay has a begin.html in the assay/<name>/views/ directory, users are directed to this page instead of the runs page when they click on the name of the assay in the assay list.
To create a module-based assay, you create a set of files that define the new assay design, describe the data import process, and define various types of assay views. The new assay is incorporated into your server when you package these files as a module and restart the server. The new type of assay is then available on your server as the basis for new assay designs, in the same way that built-in assay types (e.g., Luminex) are available.
This tutorial explains how to incorporate a ready-made, module-based assay into your LabKey Server and make use of the new type of assay. It does not cover creation of the files that compose a module-based assay. Please refer to the "Related Topics" section below for instructions on how to create such files.
First, download a pre-packaged .module file and deploy it to LabKey Server.
exampleassay
└───assay
└───example
│ config.xml
│
├───domains
│ batch.xml
│ result.xml
│ run.xml
│
└───views
upload.html
The assay module is now available through the UI. Here we enable the module in a folder.
Next, we create a new assay design based on the module.
An assay module can define a custom domain to replace LabKey's built-in default assay domains, by adding a schema definition in the domains/ directory. For example:
assay/<assay-name>/domains/<domain-name>.xml
The name of the assay is taken from the <assay-name> directory. The <domain-name>.xml file contains the domain definition and conforms to the <domain> element from assayProvider.xsd, which is in turn a DomainDescriptorType from the expTypes.xsd XML schema. There are three built-in domains for assays: "batch", "run", and "result". The following result domain replaces the built-in result domain for assays:
result.xml
<ap:domain xmlns:exp="http://cpas.fhcrc.org/exp/xml"
           xmlns:ap="http://labkey.org/study/assay/xml">
  <exp:Description>This is my data domain.</exp:Description>
  <exp:PropertyDescriptor>
    <exp:Name>SampleId</exp:Name>
    <exp:Description>The Sample Id</exp:Description>
    <exp:Required>true</exp:Required>
    <exp:RangeURI>http://www.w3.org/2001/XMLSchema#string</exp:RangeURI>
    <exp:Label>Sample Id</exp:Label>
  </exp:PropertyDescriptor>
  <exp:PropertyDescriptor>
    <exp:Name>TimePoint</exp:Name>
    <exp:Required>true</exp:Required>
    <exp:RangeURI>http://www.w3.org/2001/XMLSchema#dateTime</exp:RangeURI>
  </exp:PropertyDescriptor>
  <exp:PropertyDescriptor>
    <exp:Name>DoubleData</exp:Name>
    <exp:RangeURI>http://www.w3.org/2001/XMLSchema#double</exp:RangeURI>
  </exp:PropertyDescriptor>
</ap:domain>
To deploy the module, the assay directory is zipped up as a <module-name>.module file and copied to the LabKey server's modules directory.
When you create a new assay design for that assay type, it will use the fields defined in the XML domain as a template for the corresponding domain. Changes to the domains in the XML files will not affect existing assay designs that have already been created.
Suppose you want to add a [details] link to each row of an assay run table, that takes you to a custom details view for that row. You can add new views to the module-based assay by adding html files in the views/ directory, for example:
assay/<assay-name>/views/<view-name>.html
The overall page template includes JavaScript objects as context so that they're available within the view, avoiding extra client API requests to fetch them from the server. For example, the result.html page can access the assay definition and result data as LABKEY.page.assay and LABKEY.page.result, respectively. Here is an example custom details view named result.html:
1 <table>
2 <tr>
3 <td class='labkey-form-label'>Sample Id</td>
4 <td><div id='SampleId_div'>???</div></td>
5 </tr>
6 <tr>
7 <td class='labkey-form-label'>Time Point</td>
8 <td><div id='TimePoint_div'>???</div></td>
9 </tr>
10 <tr>
11 <td class='labkey-form-label'>Double Data</td>
12 <td><div id='DoubleData_div'>???</div></td>
13 </tr>
14 </table>
15
16 <script type="text/javascript">
17 function setValue(row, property)
18 {
19 var div = Ext.get(property + "_div");
20 var value = row[property];
21 if (!value)
22 value = "<none>";
23 div.dom.innerHTML = value;
24 }
25
26 if (LABKEY.page.result)
27 {
28 var row = LABKEY.page.result;
29 setValue(row, "SampleId");
30 setValue(row, "TimePoint");
31 setValue(row, "DoubleData");
32 }
33 </script>
Note that on line 28 the details view accesses the result data from LABKEY.page.result. See Example Assay JavaScript Objects for a description of the LABKEY.page.assay and LABKEY.page.result objects.
The custom run details view works the same way as the custom result details view, except that the view file is named run.html and the run data is available as the LABKEY.page.run variable. See Example Assay JavaScript Objects for a description of the LABKEY.page.run object.
The custom batch details view likewise uses a view file named batch.html, with the batch data available as the LABKEY.page.batch variable. See Example Assay JavaScript Objects for a description of the LABKEY.page.batch object.
In LabKey Server v17.2, an administrator can proactively disable the "alwaysUseTitlesForLoadingCustomViews" flag using this experimental feature in order to find and fix custom views that reference tables by title. If you want to improve performance, removing reliance on this feature will help.
Once the flag is disabled, custom views that load by table title will generate a warning, enabling you to find and fix them.
The correct way to attach a custom view to a table is to bind it via the query name. For instance, if you have a query in the elispot module called QueryName, whose table title is defined as TableTitle, and your custom view is called MyView, you would place the xml file here:
./resources/assay/elispot/queries/QueryName/MyView.qview.xml
With the "alwaysUseTitlesForLoadingCustomViews" flag set, you would also have been able to load the above example view by binding it to the table name, i.e.:
./resources/assay/elispot/queries/TableTitle/MyView.qview.xml
In version 17.3, this flag will be removed. To fix legacy views and remove reliance on the flag, use the experimental feature described above to disable it in version 17.2, then modify any module-based custom views to reference the query name directly.
LABKEY.page.assay:
The assay definition is available as LABKEY.page.assay for all of the html views. It is a JavaScript object, which is of type LABKEY.Assay.AssayDesign:
LABKEY.page.assay = {
    "id": 4,
    "projectLevel": true,
    "description": null,
    "name": <assay name>,
    // domain objects: one each for batch, run, and result.
    "domains": {
        // array of domain property objects for the batch domain
        "<assay name> Batch Fields": [
            {
                "typeName": "String",
                "formatString": null,
                "description": null,
                "name": "ParticipantVisitResolver",
                "label": "Participant Visit Resolver",
                "required": true,
                "typeURI": "http://www.w3.org/2001/XMLSchema#string"
            },
            {
                "typeName": "String",
                "formatString": null,
                "lookupQuery": "Study",
                "lookupContainer": null,
                "description": null,
                "name": "TargetStudy",
                "label": "Target Study",
                "required": false,
                "lookupSchema": "study",
                "typeURI": "http://www.w3.org/2001/XMLSchema#string"
            }
        ],
        // array of domain property objects for the run domain
        "<assay name> Run Fields": [{
            "typeName": "Double",
            "formatString": null,
            "description": null,
            "name": "DoubleRun",
            "label": null,
            "required": false,
            "typeURI": "http://www.w3.org/2001/XMLSchema#double"
        }],
        // array of domain property objects for the result domain
        "<assay name> Result Fields": [
            {
                "typeName": "String",
                "formatString": null,
                "description": "The Sample Id",
                "name": "SampleId",
                "label": "Sample Id",
                "required": true,
                "typeURI": "http://www.w3.org/2001/XMLSchema#string"
            },
            {
                "typeName": "DateTime",
                "formatString": null,
                "description": null,
                "name": "TimePoint",
                "label": null,
                "required": true,
                "typeURI": "http://www.w3.org/2001/XMLSchema#dateTime"
            },
            {
                "typeName": "Double",
                "formatString": null,
                "description": null,
                "name": "DoubleData",
                "label": null,
                "required": false,
                "typeURI": "http://www.w3.org/2001/XMLSchema#double"
            }
        ]
    },
    "type": "Simple"
};
LABKEY.page.batch:
The batch object is available as LABKEY.page.batch on the upload.html and batch.html pages. The JavaScript object is an instance of LABKEY.Exp.RunGroup and is shaped like:
LABKEY.page.batch = new LABKEY.Exp.RunGroup({
    "id": 8,
    "createdBy": <user name>,
    "created": "8 Apr 2009 12:53:46 -0700",
    "modifiedBy": <user name>,
    "name": <name of the batch object>,
    "runs": [
        // array of LABKEY.Exp.Run objects in the batch. See next section.
    ],
    // map of batch properties
    "properties": {
        "ParticipantVisitResolver": null,
        "TargetStudy": null
    },
    "comment": null,
    "modified": "8 Apr 2009 12:53:46 -0700",
    "lsid": "urn:lsid:labkey.com:Experiment.Folder-5:2009-04-08+batch+2"
});
LABKEY.page.run:
The run detail object is available as LABKEY.page.run on the run.html pages. The JavaScript object is an instance of LABKEY.Exp.Run and is shaped like:
LABKEY.page.run = new LABKEY.Exp.Run({
    "id": 4,
    // array of LABKEY.Exp.Data objects added to the run
    "dataInputs": [{
        "id": 4,
        "created": "8 Apr 2009 12:53:46 -0700",
        "name": "run01.tsv",
        "dataFileURL": "file:/C:/Temp/assaydata/run01.tsv",
        "modified": null,
        "lsid": <filled in by the server>
    }],
    // array of objects, one for each row in the result domain
    "dataRows": [
        {
            "DoubleData": 3.2,
            "SampleId": "Monkey 1",
            "TimePoint": "1 Nov 2008 11:22:33 -0700"
        },
        {
            "DoubleData": 2.2,
            "SampleId": "Monkey 2",
            "TimePoint": "1 Nov 2008 14:00:01 -0700"
        },
        {
            "DoubleData": 1.2,
            "SampleId": "Monkey 3",
            "TimePoint": "1 Nov 2008 14:00:01 -0700"
        },
        {
            "DoubleData": 1.2,
            "SampleId": "Monkey 4",
            "TimePoint": "1 Nov 2008 00:00:00 -0700"
        }
    ],
    "createdBy": <user name>,
    "created": "8 Apr 2009 12:53:47 -0700",
    "modifiedBy": <user name>,
    "name": <name of the run>,
    // map of run properties
    "properties": {"DoubleRun": null},
    "comment": null,
    "modified": "8 Apr 2009 12:53:47 -0700",
    "lsid": "urn:lsid:labkey.com:SimpleRun.Folder-5:cf1fea1d-06a3-102c-8680-2dc22b3b435f"
});
LABKEY.page.result:
The result detail object is available as LABKEY.page.result on the result.html page. The JavaScript object is a map for a single row and is shaped like:
LABKEY.page.result = {
    "DoubleData": 3.2,
    "SampleId": "Monkey 1",
    "TimePoint": "1 Nov 2008 11:22:33 -0700"
};
You can associate query metadata with an individual assay design, or all assay designs that are based on the same type of assay (e.g., "NAb" or "Viability").
Example. Assay table names are based upon the name of the assay design. For example, consider an assay design named "Example" that is based on the "Viability" assay type. This design would be associated with three tables in the schema explorer: "Example Batches", "Example Runs", and "Example Data."
Associate metadata with a single assay design. To attach query metadata to the "Example Data" table, you would normally create a /queries/assay/Example Data.query.xml metadata file. This would work well for the "Example Data" table itself. However, this method would not allow you to re-use this metadata file for a new assay design that is also based on the same assay type ("Viability" in this case).
Associate metadata with all assay designs based on a particular assay type. To permit re-use of the metadata, you need to create a query metadata file whose name is based upon the assay type and table name. To continue our example, you would create a query metadata file called /assay/Viability/queries/Data.query.xml to attach query metadata to all data tables based on the Viability-type assay.
As with other query metadata in module files, the module must be activated (in other words, the appropriate checkbox must be checked) in the folder's settings.
See Modules: Queries, Views and Reports and Modules: Query Metadata for more information on query metadata.
The AssaySaveHandler interface enables file-based assays to extend the functionality of the SaveAssayBatch action with Java code. A file-based assay can provide an implementation of this interface by creating a Java-based module and then putting the class under the module's src directory. This class can then be referenced by name in the <saveHandler/> element in the assay's config file. For example, an entry might look like:
<saveHandler>org.labkey.icemr.assay.tracking.TrackingSaveHandler</saveHandler>.
To implement this functionality, reference your handler class in the assay's provider configuration, for example:
<ap:provider xmlns:ap="http://labkey.org/study/assay/xml">
  <ap:name>Flask Tracking</ap:name>
  <ap:description>
    Enables entry of a set of initial samples and then tracks
    their progress over time via a series of daily measurements.
  </ap:description>
  <ap:saveHandler>TrackingSaveHandler</ap:saveHandler>
  <ap:fieldKeys>
    <ap:participantId>Run/PatientId</ap:participantId>
    <ap:date>MeasurementDate</ap:date>
  </ap:fieldKeys>
</ap:provider>
The SaveAssayBatch function creates a new instance of the SaveHandler for each request. SaveAssayBatch will dispatch to the methods of this interface according to the format of the JSON Experiment Batch (or run group) sent to it by the client. If a client chooses to implement this interface directly then the order of method calls will be:
Some options:
1) Manually import a list archive into the target folder.
2) Add the tables via SQL scripts included in the module. To insert data, use SQL DML scripts or create an initialize.html view that populates the table using LABKEY.Query.insertRows(). (An Rlabkey-based sketch of the insert step is shown below.)
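For illustration, here is a minimal Rlabkey sketch of seeding a supporting table from a script. It assumes the "myreagents" schema and Reagents table created by the SQL example later in this topic, plus a hypothetical server URL and folder path:

library(Rlabkey)

# Hypothetical deployment details -- adjust for your server.
baseUrl <- "http://localhost:8080/labkey"
folderPath <- "/MyProject/AssayFolder"

# Rows to seed; schema and table names assume the "myreagents" example below.
toInsert <- data.frame(ReagentName = c("Acetic Acid",
                                       "Baeyers Reagent",
                                       "Carbon Disulfide"),
                       stringsAsFactors = FALSE)

labkey.insertRows(baseUrl = baseUrl, folderPath = folderPath,
                  schemaName = "myreagents", queryName = "Reagents",
                  toInsert = toInsert)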
To add the supporting table using SQL scripts, add a schemas directory, as a sibling to the assay directory, as shown below.
exampleassay
├───assay
│ └───example
│ │ config.xml
│ │
│ ├───domains
│ │ batch.xml
│ │ result.xml
│ │ run.xml
│ │
│ └───views
│ upload.html
│
└───schemas
│ SCHEMA_NAME.xml
│
└───dbscripts
├───postgresql
│ SCHEMA_NAME-X.XX-Y.YY.sql
└───sqlserver
SCHEMA_NAME-X.XX-Y.YY.sql
To support only one database, include a script only for that database, and configure your module properties accordingly -- see "SupportedDatabases" in Module Properties Reference.
LabKey Server does not currently support adding assay types or lists via SQL scripts, but you can create a new schema to hold the table. For example, the following script creates a new schema called "myreagents" (on PostgreSQL):
DROP SCHEMA IF EXISTS myreagents CASCADE;
CREATE SCHEMA myreagents;
CREATE TABLE myreagents.Reagents
(
RowId SERIAL NOT NULL,
ReagentName VARCHAR(30) NOT NULL
);
ALTER TABLE ONLY myreagents.Reagents
ADD CONSTRAINT Reagents_pkey PRIMARY KEY (RowId);
INSERT INTO myreagents.Reagents (ReagentName) VALUES ('Acetic Acid');
INSERT INTO myreagents.Reagents (ReagentName) VALUES ('Baeyers Reagent');
INSERT INTO myreagents.Reagents (ReagentName) VALUES ('Carbon Disulfide');
Update the assay domain, adding a lookup/foreign key property to the Reagents table:
<exp:PropertyDescriptor>
  <exp:Name>Reagent</exp:Name>
  <exp:Required>false</exp:Required>
  <exp:RangeURI>http://www.w3.org/2001/XMLSchema#int</exp:RangeURI>
  <exp:Label>Reagent</exp:Label>
  <exp:FK>
    <exp:Schema>myreagents</exp:Schema>
    <exp:Query>Reagents</exp:Query>
  </exp:FK>
</exp:PropertyDescriptor>
If you'd like to allow admins to add/remove fields from the table, you can add an LSID column to your table and make it a foreign key to the exp.Object.ObjectUri column in the schema.xml file. This will allow you to define a domain for the table much like a list. The domain is per-folder so different containers may have different sets of fields.
For example, see customModules/reagent/resources/schemas/reagent.xml, which wires up the LSID lookup to the exp.Object.ObjectUri column:
<ns:column columnName="Lsid">
  <ns:datatype>lsidtype</ns:datatype>
  <ns:isReadOnly>true</ns:isReadOnly>
  <ns:isHidden>true</ns:isHidden>
  <ns:isUserEditable>false</ns:isUserEditable>
  <ns:isUnselectable>true</ns:isUnselectable>
  <ns:fk>
    <ns:fkColumnName>ObjectUri</ns:fkColumnName>
    <ns:fkTable>Object</ns:fkTable>
    <ns:fkDbSchema>exp</ns:fkDbSchema>
  </ns:fk>
</ns:column>
...and adds an "Edit Fields" button that opens the domain editor.
function editDomain(queryName)
{
    var url = LABKEY.ActionURL.buildURL("property", "editDomain", null, {
        domainKind: "ExtensibleTable",
        createOrEdit: true,
        schemaName: "myreagents",
        queryName: queryName
    });
    window.location = url;
}
Any scripting language that can be invoked via the command line and has the ability to read/write files is supported, including:
Each assay design can be associated with one or more validation or transformation scripts, which are run in the order specified. The script file extension (.r, .pl, etc.) identifies the script engine that will be used to run the transform script. For example, a script named test.pl will be run with the Perl scripting engine. Before you can run validation or transformation scripts, you must configure the necessary Scripting Engines.
To specify a transform script in an assay design, you enter the full path including the file extension.
When you import (or re-import) run data using this assay design, the script will be executed. When you are developing or debugging transform scripts, you can use the Save Script Data option to store the files generated by the server that are passed to the script. Once your script is working properly, uncheck this box to avoid unnecessarily cluttering your disk.
A few notes on usage:
The general purpose assay tutorial includes another example use of a transformation script in Set up a Data Transformation Script.
Transformation and validation scripts are invoked in the following sequence:
Information on run properties can be passed to a transform script in two ways. You can put a substitution token into your script to identify the run properties file, or you can configure your scripting engine to pass the file path as a command line argument. See Transformation Script Substitution Syntax for a list of available substitution tokens.
For example, using perl:
Option #1: Put a substitution token (${runInfo}) into your script and the server will replace it with the path to the run properties file. Here's a snippet of a perl script that uses this method:
# Open the run properties file. Run or upload set properties are not used by
# this script. We are only interested in the file paths for the run data and
# the error file.
open my $reportProps, '${runInfo}';
Option #2: Configure your scripting engine definition so that the file path is passed as a command line argument:
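For comparison, here is a minimal R sketch of this second approach. The column names mirror the run properties parsing examples later in this topic:

# The script engine passes the run properties file path as the first
# command-line argument.
args <- commandArgs(trailingOnly = TRUE)
rpPath <- args[1]

# Each line is tab-delimited: property name, value(s), and Java data type.
runProps <- read.table(rpPath, col.names = c("name", "val1", "val2", "val3"),
                       header = FALSE, sep = "\t", quote = "", fill = TRUE,
                       stringsAsFactors = FALSE, na.strings = "")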
Before you can develop or run validation or transform scripts, configure the necessary Scripting Engines. You only need to set up a scripting engine once per type of script. You will need a copy of Perl installed on your machine to set up the engine.
Create a new empty .pl file in the development location of your choice and include it in your assay design.
To assist in writing your transform script, you will next obtain sample "runData.tsv" and "runProperties.tsv" files showing the state of your data import 'before' the transform script would be applied. To generate useful test data, you need to import a data run using the new assay design.
LabKeyDemoFiles/Assays/Generic/GenericAssay_Run4.xls
Date VisitID ParticipantID M3 M2 M1 SpecimenID
12/17/2013 1234 demo value 1234 1234 1234 demo value
12/17/2013 1234 demo value 1234 1234 1234 demo value
12/17/2013 1234 demo value 1234 1234 1234 demo value
12/17/2013 1234 demo value 1234 1234 1234 demo value
12/17/2013 1234 demo value 1234 1234 1234 demo value
Typically transform and validation script data files are deleted on script completion. For debug purposes, it can be helpful to be able to view the files generated by the server that are passed to the script. When the Save Script Data checkbox is checked, files will be saved to a subfolder named: "TransformAndValidationFiles", in the same folder as the original script. Beneath that folder are subfolders for the AssayId, and below that a numbered directory for each run. In that nested subdirectory you will find a new "runDataFile.tsv" that will contain values from the run file plugged into the current fields.
participantid Date M1 M2 M3
249318596 2008-06-07 00:00 435 1111 15.0
249320107 2008-06-06 00:00 456 2222 13.0
249320107 2008-03-16 00:00 342 3333 15.0
249320489 2008-06-30 00:00 222 4444 14.0
249320897 2008-05-04 00:00 543 5555 32.0
249325717 2008-05-27 00:00 676 6666 12.0
The runData.tsv file gives you the basic fields layout. Decide how you need to modify the default data. For example, perhaps for our project we need an adjusted version of the value in the M1 field - we want the doubled value available as an integer.
Now you have the information you need to write and refine your transformation script. Open the empty script file and paste the contents of the Modify Run Data box from this page: Example Transformation Scripts (perl).
Re-import the same run using the transform script you have defined.
The results now show the new field populated with the Adjusted M1 value.
Until the results are as desired, you will edit the script and use Reimport Run to retry.
Once your transformation script is working properly, re-edit the assay design one more time to uncheck the Save Script Data box - otherwise your script will continue to generate artifacts with every run and could eventually fill your disk.
If your script has errors that prevent import of the run, you will see red text in the Run Properties window. If you fail to select the correct data file, for example:
If you have a type mismatch error between your script results and the defined destination field, you will see an error like:
If the validation script needs to report an error that is displayed by the server, it adds error records to an error file. The location of the error file is specified as a property entry in the run properties file. The error file is in a tab-delimited format with three columns:
type | property | message |
---|---|---|
error | runDataFile | A duplicate PTID was found : 669345900 |
error | assayId | The assay ID is in an invalid format |
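As an illustration, a transform script can append such a record itself. Here is an R sketch; it assumes runProps is a data frame parsed from the run properties file as in the R sketch shown earlier:

# Locate the error file from the parsed run properties.
errorsFile <- runProps$val1[runProps$name == "errorsFile"]

# Append one tab-delimited record: type, property, message.
cat("error", "runDataFile", "A duplicate PTID was found : 669345900",
    file = errorsFile, sep = "\t", append = TRUE)
cat("\n", file = errorsFile, append = TRUE)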
This script is used in the Example Workflow: Develop a Transformation Script (perl) and populates a new field with data derived from an existing field in the run.
#!/usr/local/bin/perl
use strict;
use warnings;

# Open the run properties file. Run or upload set properties are not used by
# this script. We are only interested in the file paths for the run data and
# the error file.

open my $reportProps, '${runInfo}';

my $transformFileName = "unknown";
my $dataFileName = "unknown";
my %transformFiles;

# Parse the data file properties from reportProps and save the transformed data location
# in a map. It's possible for an assay to have more than one transform data file, although
# most will only have a single one.
while (my $line=<$reportProps>)
{
    chomp($line);
    my @row = split(/\t/, $line);

    if ($row[0] eq 'runDataFile')
    {
        $dataFileName = $row[1];

        # The transformed data location is stored in column 4;
        # map the source data file to its transformed output location.
        $transformFiles{$row[1]} = $row[3];
    }
}

my $key;
my $value;
my $adjustM1 = 0;

# Read each line from the uploaded data file and insert new data (double the value in the M1 field)
# into an additional column named 'Adjusted M1'. The additional column must already exist in the assay
# definition and be of the correct type.
while (($key, $value) = each(%transformFiles))
{
    open my $dataFile, $key or die "Can't open '$key': $!";
    open my $transformFile, '>', $value or die "Can't open '$value': $!";

    my $line=<$dataFile>;
    chomp($line);
    $line =~ s/\r*//g;
    print $transformFile $line, "\t", "Adjusted M1", "\n";

    while (my $line=<$dataFile>)
    {
        $adjustM1 = substr($line, 27, 3) * 2;
        chomp($line);
        $line =~ s/\r*//g;
        print $transformFile $line, "\t", $adjustM1, "\n";
    }
    close $dataFile;
    close $transformFile;
}
You can also define a transform script that modifies the run properties, as shown in this example, which parses the short filename out of the full path:
#!/usr/local/bin/perl
use strict;
use warnings;

# Open the run properties file. Run or upload set properties are not used by
# this script; we are only interested in the file paths for the run data and
# the error file.

open my $reportProps, $ARGV[0];

my $transformFileName = "unknown";
my $uploadedFile = "unknown";

while (my $line=<$reportProps>)
{
    chomp($line);
    my @row = split(/\t/, $line);

    if ($row[0] eq 'transformedRunPropertiesFile')
    {
        $transformFileName = $row[1];
    }
    if ($row[0] eq 'runDataUploadedFile')
    {
        $uploadedFile = $row[1];
    }
}
if ($transformFileName eq 'unknown')
{
    die "Unable to find the transformed run properties data file";
}

open my $transformFile, '>', $transformFileName or die "Can't open '$transformFileName': $!";

# Parse out just the filename portion.
my $i = rindex($uploadedFile, "\\") + 1;
my $j = index($uploadedFile, ".xls");

# Add a value for FileID.
print $transformFile "FileID", "\t", substr($uploadedFile, $i, $j-$i), "\n";
close $transformFile;
Users importing instrument-generated tabular datasets into LabKey Server may run into the following difficulties:
First, we review how to hook up a transform script to an assay, and the communication mechanisms between the assay framework and a transform script in R.
Transform scripts are designated as part of an assay by providing a fully qualified path to the script file in the field named at the top of the assay instance definition. A convenient location for the script file is a File web part defined in the same folder as the assay definition. The fully qualified path to the script file is then the concatenation of the file root for the folder (for example, "C:\lktrunk\build\deploy\files\MyAssayFolderName\@files\", as determined by the Files page in the Admin console) and the file path to the script file as seen in the File web part (for example, "scripts\LoadData.R"). For the file path, LabKey Server accepts either backslashes (the default Windows format) or forward slashes.
When working on your own developer workstation, you can put the script file wherever you like, but putting it within the scope of the File manager will make it easier to deploy to a server. It also makes iterative development against a remote server easier, since you can use a Web-DAV enabled file editor to directly edit the same file that the server is calling.
If your transform script calls other script files to do its work, the normal way to pull in the source code is the source statement, for example:
source("C:\lktrunk\build\deploy\files\MyAssayFolderName\@files\Utils.R")
But to keep the scripts easily movable to other servers, it is better to keep the script files together and use the built-in substitution token "${srcDirectory}", which the server automatically fills in with the directory where the called script file is located, for example:
source("${srcDirectory}/Utils.R");
The primary mechanism for communication between the LabKey assay framework and the transform script is the run properties file. Again, a substitution token tells the script code where to find this file. The script file should contain a line like:
rpPath<- "${runInfo}"
When the script is invoked by the assay framework, the rpPath variable will contain the fully qualified path to the run properties file.
The run properties file contains three categories of properties:
1. Batch and run properties as defined by the user when creating an assay instance. These properties are of the format: <property name> <property value> <java data type>
For example:
gDarkStdDev 1.98223 java.lang.Double
When the transform script is called, these properties will contain any values that the user has typed into the corresponding text boxes under the "Batch Properties" or "Run Properties" sections of the upload form. The transform script can assign or modify these properties based on calculations or by reading them from the raw data file from the instrument. The script must then write the modified properties file to the location specified by the transformedRunPropertiesFile property (see #3 below).
2. Context properties of the assay such as assayName, runComments, and containerPath. These are recorded in the same format as the user-defined batch and run properties, but they cannot be overwritten by the script.
3. Paths to input and output files. These are fully qualified paths that the script reads from or writes to. They are in a <property name> <property value> format without property types. The paths currently used are:
C:\lktrunk\build\deploy\files\transforms\@files\scripts\TransformAndValidationFiles\AssayId_22\42\runDataFile.tsv
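As a concrete sketch of categories 1 and 3 together, the following R fragment parses the run properties file, recalculates a user-defined property, and writes the result back to the transformedRunPropertiesFile location. The property names follow the examples above; the recalculated value is illustrative only:

# The ${runInfo} token is substituted by the server with the file path.
rpPath <- "${runInfo}"
rpIn <- read.table(rpPath, col.names = c("name", "val1", "val2", "val3"),
                   header = FALSE, sep = "\t", quote = "", fill = TRUE,
                   stringsAsFactors = FALSE, na.strings = "")

# Recalculate a user-defined run property (gDarkStdDev is the example
# property shown above).
rpIn$val1[rpIn$name == "gDarkStdDev"] <- "2.01342"

# Write the modified properties to the location the server reads back.
outPath <- rpIn$val1[rpIn$name == "transformedRunPropertiesFile"]
write.table(rpIn, file = outPath, sep = "\t", quote = FALSE,
            na = "", row.names = FALSE, col.names = FALSE)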
The transform script developer can choose to use either the runDataFile or the runDataUploadedFile as its input. The runDataFile would be the right choice for an Excel-format raw file and a script that fills in additional columns of the data set. By using the runDataFile, the assay framework does the Excel-to-TSV conversion and the script doesn’t need to know how to parse Excel files. The runDataUploadedFile would be the right choice for a raw file in TSV format that the script is going to reformat by turning columns into rows. In either case, the script writes its output to the AssayRunTSVData file.
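For example, here is a minimal R sketch of the fill-in-columns pattern, assuming rpIn has been parsed as in the sketch above. The "Adjusted M1" column is hypothetical and must already exist in the assay design:

# Column 2 of the runDataFile row holds the TSV the framework produced from
# the uploaded file; column 4 holds the path where the transformed output
# (the AssayRunTSVData file) must be written.
inPath <- rpIn$val1[rpIn$name == "runDataFile"]
outPath <- rpIn$val3[rpIn$name == "runDataFile"]

runData <- read.table(inPath, header = TRUE, sep = "\t", quote = "\"",
                      stringsAsFactors = FALSE, check.names = FALSE)

# Add the illustrative computed column.
runData[["Adjusted M1"]] <- runData[["M1"]] * 2

write.table(runData, file = outPath, sep = "\t", quote = FALSE,
            na = "", row.names = FALSE)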
There are two useful options presented as checkboxes in the Assay designer.
Sometimes a transform script needs to connect back to the server to do its job. One example is translating lookup display values into key values. The Rlabkey library available on CRAN has the functions needed to connect to, query, and insert or update data in the local LabKey Server where it is running. To give the connection the right security context (the current user's), the assay framework provides the substitution token ${rLabkeySessionId}. Including this token on a line by itself near the beginning of the transform script eliminates the need to use a config file to hold a username and password for this loopback connection. It will be replaced with two lines that look like:
labkey.sessionCookieName = "JSESSIONID"
labkey.sessionCookieContents = "TOMCAT_SESSION_ID"
where TOMCAT_SESSION_ID is the actual ID of the user's HTTP session.
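Here is a sketch of the lookup-translation use case mentioned above, using Rlabkey; the server URL, folder path, schema, query, and column names are all hypothetical:

library(Rlabkey)

# Replaced by the server with the two labkey.sessionCookie* assignment lines,
# giving the connection the current user's security context.
${rLabkeySessionId}

# Translate lookup display values into key values by querying the server.
lookupMap <- labkey.selectRows(baseUrl = "http://localhost:8080/labkey",
                               folderPath = "/MyProject/AssayFolder",
                               schemaName = "lists",
                               queryName = "Reagents",
                               colSelect = "RowId,Name",
                               showHidden = TRUE)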
You can load an R transform script into the R console/debugger and run the script with debug(<functionname>) commands active. Since the substitution tokens described above (${srcDirectory}, ${runInfo}, and ${rLabkeySessionId}) are necessary to the correct operation of the script, the framework conveniently writes out a version of the script with these substitutions made into the same subdirectory where the runProperties.tsv file is found. Load this modified version of the script into the R console.
Setup
This transform script example handles the data output from an Affymetrics microarray reader. The data file contains 64 lines of metadata before the chip-level intensity data. The metadata describes the platform, the experiment, and the samples used. The spot-level data is organized with one column per sample, which may be efficient for storage in a spreadsheet but isn’t good for querying in a database.
The transform script does the following tasks:
TransformScriptExample.R
options(stringsAsFactors = FALSE)
source("${srcDirectory}/ExampleUtils.R")
baseUrl<-"http://localhost:8080/labkey"
${rLabkeySessionId}
rpPath<- "${runInfo}"
## read the file paths etc out of the runProperties.tsv file
params <- getRunPropsList(rpPath, baseUrl)
## read the input data frame just to get the column headers.
inputDF<-read.table(file=params$inputPathUploadedFile, header = TRUE,
sep = "\t", quote = "\"",
fill=TRUE, stringsAsFactors = FALSE, check.names=FALSE,
row.names=NULL, skip=(params$loaderColNamesRow -1), nrows=1)
cols<-colnames(inputDF)
## create a Name to RowId map for samples
keywords <- as.vector(colnames(inputDF)[-1])
queryName=params$sampleSetName
keywordMap<- getLookupMap( keywords, baseUrl=baseUrl, folderPath=params$containerPath,
schemaName="Samples", queryName=queryName, keyField="rowId",
displayField="SampleId")
doRunLoad(params=params, inputColNames=cols, outputColNames=c( "ID_REF", "sample", "val"),
lookupMap=keywordMap)
ExampleUtils.R, function getRunPropsList()
getRunPropsList<- function(rpPath, baseUrl)
{
rpIn<- read.table(rpPath, col.names=c("name", "val1", "val2", "val3"), #########
header=FALSE, check.names=FALSE, ## 1 ##
stringsAsFactors=FALSE, sep="\t", quote="", fill=TRUE, na.strings=""); #########
## pull out the run properties
params<- list(inputPathUploadedFile = rpIn$val1[rpIn$name=="runDataUploadedFile"],
inputPathValidated = rpIn$val1[rpIn$name=="runDataFile"],
##a little strange. AssayRunTSVData is the one we need to output to
outputPath = rpIn$val3[rpIn$name=="runDataFile"],
containerPath = rpIn$val1[rpIn$name=="containerPath"],
runPropsOutputPath = rpIn$val1[rpIn$name=="transformedRunPropertiesFile"],
sampleSetId = as.integer(rpIn$val1[rpIn$name=="sampleSet"]),
probeSourceId = as.integer(rpIn$val1[rpIn$name=="probeSource"]),
errorsFile = rpIn$val1[rpIn$name=="errorsFile"])
## lookup the name of the sample set based on its number
if (length(params$sampleSetId)>0)
{
df<-labkey.selectRows(baseUrl=baseUrl,
folderPath=params$containerPath, schemaName="exp", queryName="SampleSets",
colFilter=makeFilter(c("rowid", "EQUALS", params$sampleSetId)))
params<- c(params, list(sampleSetName=df$Name))
}
## This script reformats the rows in batches of 1000 in order to reduce
## the memory requirements of the R calculations
params<-c(params, list(loaderBatchSize=as.integer(1000)))
## From the probesource lookup table, get the prefix characters that
## identify property value comment lines in the data file, and the starting
## line number of the spot data table within the data file
dfProbeSource=labkey.selectRows(baseUrl=baseUrl, folderPath=params$containerPath, #########
schemaName="lists", queryName="probesources", ## 2 ##
colFilter=makeFilter(c("probesourceid", "EQUALS", params$probeSourceId))) #########
params<-c(params, list(propertyPrefix=dfProbeSource$propertyPrefix,
loaderColNamesRow=dfProbeSource$loaderColNamesRow))
if (is.null(params$loaderColNamesRow) | is.na(params$loaderColNamesRow))
{
params$loaderColNamesRow <- 1
}
## now apply the run property values reported in the header
## of the data tsv file to the corresponding run properties
conInput = file(params$inputPathUploadedFile, "r")
line<-""
pfx <- as.integer(0)
fHasProps <- as.logical(FALSE)
if (!is.na(params$propertyPrefix))
{ #########
pfx<-nchar(params$propertyPrefix) ## 3 ##
} #########
while(pfx>0)
{
line<-readLines(conInput, 1)
if (nchar(line)<=pfx) {break}
if (substring(line, 1, pfx) != params$propertyPrefix) {break}
strArray=strsplit(substring(line, pfx+1, nchar(line)) ,"\t", fixed=TRUE)
prop<- strArray[[1]][1]
val<- strArray[[1]][2]
if (length(rpIn$name[rpIn$name==prop]) > 0 )
{
## Dealing with dates is sometimes tricky. You want the value pushed to rpIn
## to be a string representing a date, but in the default date format. This data
## file uses a non-default date format that we explicitly convert to a date using
## as.Date and a format string.
## Then convert it back to character using the default format.
if (rpIn$val2[rpIn$name==prop]=="java.util.Date")
{
val<-as.character(as.Date(val, "%b%d%y"))
}
rpIn$val1[rpIn$name==prop]<-val
fHasProps <- TRUE
}
}
if (fHasProps)
{
## write out the transformed run properties to the file that
## the assay framework will read in
write.table(rpIn, file=params$runPropsOutputPath, sep="\t", quote=FALSE
, na="" , row.names=FALSE, col.names=FALSE, append=FALSE)
}
return (params)
}
getLookupMap()
getLookupMap<- function(uniqueLookupValues, baseUrl, folderPath, schemaName,
queryName, keyField, displayField, otherColName=NULL, otherColValue=NULL)
{
inClauseVals = paste(uniqueLookupValues, collapse=";") #########
colfilt<-makeFilter(c(displayField, "EQUALS_ONE_OF", inClauseVals)) ## 4 ##
if (!is.null(otherColName)) #########
{
otherFilter=makeFilter(c(otherColName, "EQUALS", otherColValue))
colfilt = c(colfilt, otherFilter)
}
colsel<- paste(keyField, displayField, sep=",")
lookupMap <-labkey.selectRows(baseUrl=baseUrl, folderPath=folderPath,
schemaName=schemaName, queryName=queryName,
colSelect=colsel, colFilter=colfilt, showHidden=TRUE)
newLookups<- uniqueLookupValues[!(uniqueLookupValues %in% lookupMap[,2])]
if (length(newLookups)>0 && !is.na(newLookups[1]) )
{
## insert the lookup values that we haven't already seen before
newLookupsToInsert<- data.frame(lookupValue=newLookups, stringsAsFactors=FALSE)
colnames(newLookupsToInsert)<- displayField
if (!is.null(otherColName))
{
newLookupsToInsert<-cbind(newLookupsToInsert, otherColValue)
colnames(newLookupsToInsert)<- c(displayField, otherColName)
}
result<- labkey.insertRows(baseUrl=baseUrl, folderPath=folderPath,
schemaName=schemaName, queryName=queryName, toInsert= newLookupsToInsert)
lookupMap <-labkey.selectRows(baseUrl=baseUrl, folderPath=folderPath,
schemaName=schemaName, queryName=queryName,
colSelect=colsel, colFilter=colfilt, showHidden=TRUE)
}
colnames(lookupMap)<- c("RowId", "Name")
return(lookupMap)
}
doRunLoad()
doRunLoad<-function(params, inputColNames, outputColNames, lookupMap)
{
folder=params$containerPath
unlink(params$outputPath)
cIn <- file(params$inputPathUploadedFile, "r")
cOut<- file(params$outputPath , "w")
## write the column headers to the output file
headerDF<-data.frame(matrix(NA, nrow=0, ncol=length(outputColNames)))
colnames(headerDF)<- outputColNames
write.table(headerDF, file=cOut, sep="\t", quote=FALSE, row.names=FALSE, na="",
col.names=TRUE, append=FALSE)
# the first read from the input file skips rows up to and including the header
skipCnt<-params$loaderColNamesRow
## read in chunks of batchSize, which are then transposed and written to the output file. #########
## blkStart is the 1-based index of the starting row of a chunk ## 5 ##
#########
blkStart <- skipCnt + 1
rowsToRead <- params$loaderBatchSize
while(rowsToRead > 0)
{
inputDF <- read.table(file=cIn, header = FALSE, sep = "\t", quote = "\"",
na.strings = "---", fill=TRUE, row.names=NULL,
stringsAsFactors = FALSE, check.names=FALSE,
col.names=inputColNames ,skip=skipCnt, nrows=rowsToRead)
cols<-colnames(inputDF)
if(NROW(inputDF) >0)
{
idVarName<-inputColNames[1]
df1 <- reshape(inputDF, direction="long", idvar=idVarName,
v.names="Val", timevar="Name",
times=cols[-1], varying=list(cols[-1])) #########
## 6 ##
df2<- merge(df1, lookupMap) #########
reshapedRows<- data.frame(cbind(df2[,idVarName], df2[,"RowId"],
df2[,"Val"], params$probeSourceId ), stringsAsFactors=FALSE)
reshapedRows[,2] <- as.integer(reshapedRows[,2])
reshapedRows[,4] <- as.integer(reshapedRows[,4])
nonEmptyRows<- !is.na(reshapedRows[,3])
reshapedRows<-reshapedRows[nonEmptyRows ,]
reshapedRows<- reshapedRows[ do.call(order, reshapedRows[1:2]), ]
colnames(reshapedRows)<- outputColNames
## need to double up the single quotes in the data
reshapedRows[,3]<-gsub("'", "''", reshapedRows[,3],fixed=TRUE)
write.table(reshapedRows, file=cOut, sep="\t", quote=TRUE, na="" ,
row.names=FALSE, col.names=FALSE, append=TRUE)
df1<-NULL
df2<-NULL
reshapedRows<-NULL
recordsToInsert<-NULL
}
if (NROW(inputDF)< rowsToRead)
{
##we've hit the end of the file, no more to read
rowsToRead <- 0
}
else
{
## now look where the next block will start, and read up to the end row
blkStart <- blkStart + rowsToRead
}
## skip rows only on the first read
skipCnt<-0
}
inputDF<-NULL
close(cIn)
close(cOut)
}
LabKey Server supports transformation scripts for assay data at upload time. This feature is primarily targeted for Perl or R scripts; however, the framework is general enough that any application that can be externally invoked can be run as well, including a Java program.
Java appeals to programmers who desire a stronger-typed language than most script-based languages. Most important, using a Java-based validator allows a developer to leverage the remote client API and take advantage of the classes available for assays, queries, and security.
This page outlines the steps required to configure and create a Java-based transform script. The ProgrammaticQCTest script, available in the BVT test, provides an example of a script that uses the remote client API.
In order to use a Java-based validation script, you will need to configure an external script engine to bind a file with the .jar extension to an engine implementation.
To do this:
The program command configured above will invoke the java.exe application against a .jar file passing in the run properties file location as an argument to the java program. The run properties file contains information about the assay properties including the uploaded data and the location of the error file used to convey errors back to the server. Specific details about this file are contained in the data exchange specification for Programmatic QC.
The implementation of your java validator class must contain an entry point matching the following function signature:
public static void main(String[] args)
The location of the run properties file will be passed from the script engine configuration (described above) into your program as the first element of the args array.
The following code provides an example of a simple class that implements the entry point and handles any arguments passed in:
public class AssayValidator
{
    private String _email;
    private String _password;
    private File _errorFile;
    private Map<String, String> _runProperties;
    private List<String> _errors = new ArrayList<String>();

    private static final String HOST_NAME = "http://localhost:8080/labkey";
    private static final String HOST = "localhost:8080";

    public static void main(String[] args)
    {
        if (args.length != 1)
            throw new IllegalArgumentException("Input data file not passed in");

        File runProperties = new File(args[0]);
        if (runProperties.exists())
        {
            AssayValidator qc = new AssayValidator();
            qc.runQC(runProperties);
        }
        else
            throw new IllegalArgumentException("Input data file does not exist");
    }
}
Next, compile and jar your class files, including any dependencies your program may have. This will save you from having to add a classpath parameter in your engine command. Make sure that a ‘Main-Class’ attribute is added to your jar file manifest. This attribute points to the class that implements your program entry point.
Most of the remote APIs require login information in order to establish a connection to the server. Credentials can be hard-coded into your validation script or passed in on the command line. Alternatively, a .netrc file can be used to hold the credentials necessary to login to the server. For further information, see: Create a .netrc or _netrc file.
The following sample code can be used to extract credentials from a .netrc file:
private void setCredentials(String host) throws IOException
{
    NetrcFileParser parser = new NetrcFileParser();
    NetrcFileParser.NetrcEntry entry = parser.getEntry(host);

    if (null != entry)
    {
        _email = entry.getLogin();
        _password = entry.getPassword();
    }
}
Finally, the QC validator must be attached to an assay. To do this, edit the assay design and specify the absolute location of the .jar file you have created. The engine created earlier will bind the .jar extension to the java.exe command you have configured.
<assay>
|_domains
|_views
|_scripts
|_config.xml
The scripts directory contains one or more script files; e.g., "validation.pl".
The order of script invocation can be specified in the config.xml file. See the <transformScripts> element. If scripts are not listed in the config.xml file, they will be executed in alphabetical order based on file name.
A script engine must be defined for the appropriate type of script (for the example script named above, this would be a Perl engine). The rules for defining a script engine for module-based assays are the same as they are for Java-based assays.
When a new assay instance is created, you will notice that the script appears in the assay designer, but it is read-only (the path cannot be changed or removed). Just as for Java-defined assays, you will still see an additional text box where you can specify one or more additional scripts.
There are standard default assay properties which apply to most assay types, as well as additional properties specific to the assay type. For example, NAb, Luminex, and ELISpot assays can include specimen, analyte, and antigen properties which correspond to locations on a plate associated with the assay instance.
The runProperties.tsv file also contains additional context information that the validation script might need, such as username, container path, assay instance name, assay id. Since the uploaded assay data will be written out to a file in TSV format, the runProperties.tsv also specifies the destination file's location.
The runProperties file has three (or four) tab-delimited columns in the following order: the property name, the property value, the property's Java data type, and, for some file-path properties, a fourth column holding the location where transformed output should be written. The properties include:
Property Name | Data Type | Property Description |
---|---|---|
assayId | String | The value entered in the Assay Id field of the run properties section. |
assayName | String | The name of the assay design given when the new assay design was created. |
assayType | String | The type of this assay design. (GenericAssay, Luminex, Microarray, etc.) |
baseUrl | URL String | For example, http://localhost:8080/labkey |
containerPath | String | The container location of the assay. (for example, /home/AssayTutorial) |
errorsFile | Full Path | The full path to a .tsv file where any validation errors are written. See details below. |
originalFileLocation | Full Path | The full path to the original location of the file being imported as an assay. |
protocolDescription | String | The description of the assay definition when the new assay design was created. |
protocolId | String | The ID of this assay definition. |
protocolLsid | String | The assay definition LSID. |
runComments | String | The value entered into the Comments field of the run properties section. |
runDataUploadedFile | Full Path | The original data file that was selected by the user and uploaded to the server as part of an import process. This can be an Excel file, a tab-separated text file, or a comma-separated text file. |
runDataFile | Full Path | The imported data file after the assay framework has attempted to convert the file to .tsv format and match its columns to the assay data result set definition. |
transformedRunPropertiesFile | Full Path | File where the script writes out the updated values of batch- and run-level properties that are listed in the runProperties file. |
userName | String | The user who created the assay design. |
workingDir | String | The temp location that this script is executed in. (e.g. C:\AssayId_209\39\) |
Validation errors can be written to a TSV file as specified by full path with the errorsFile property. This output file is formatted with three tab-delimited columns: the error type (e.g., "error"), the property name, and the error message, as in the example shown earlier.
Property Name | Data Type | Property Description |
---|---|---|
sampleData | String | The path to a file that contains sample data written in a tab-delimited format. The file will contain all of the columns from the sample group section of the assay design. A wellgroup column will be written that corresponds to the well group name in the plate template associated with this assay instance. A row of data will be written for each well position in the plate template. |
antigenData | String | The path to a file that contains antigen data written in a tab-delimited format. The file contains all of the columns from the antigen group section of the assay design. A wellgroup column corresponds to the well group name in the plate template associated with this assay instance. A row of data is written for each well position in the plate template. |
Property Name | Data Type | Property Description |
---|---|---|
Derivative | String | |
Additive | String | |
SpecimenType | String | |
DateModified | Date | |
ReplacesPreviousFile | Boolean | |
TestDate | Date | |
Conjugate | String | |
Isotype | String |
Property Name | Data Type | Property Description |
---|---|---|
sampleData | String | The path to a file that contains sample data written in a tab-delimited format. The file contains all of the columns from the sample group section of the assay design. A wellgroup column corresponds to the well group name in the plate template associated with this assay instance. A row of data is written for each well position in the plate template. |
Property Name | Data Type | Property Description |
---|---|---|
severityLevel (reserved) | String | This is a property name used internally for error and warning handling. Do not define your own property with the same name in a GPAT assay. |
maximumSeverity (reserved) | String | This is a property name reserved for use in error and warning handling. Do not define your own property with the same name in a GPAT assay. See Warnings in Transformation Scripts for details. |
Script Syntax | Description | Substitution Value |
---|---|---|
${runInfo} | File containing metadata about the run | Full path to the file on the local file system |
${srcDirectory} | Directory in which the script file is located | Full path to parent directory of the script |
${rLabkeySessionId} | Information about the current user's HTTP session | labkey.sessionCookieName = "COOKIE_NAME" labkey.sessionCookieContents = "USER_SESSION_ID" Note that this is multi-line. The cookie name is typically JSESSIONID, but is not in all cases. |
${httpSessionId} | The current user's HTTP session ID | The string value of the session identifier, which can be used for authentication when calling back to the server for additional information |
${sessionCookieName} | The name of the session cookie | The string value of the cookie name, which can be used for authentication when calling back to the server for additional information. |
${baseServerURL} | The server's base URL and context path | The string of the base URL and context path. (ex. "http://localhost:8080/labkey") |
${containerPath} | The current container path | The string of the current container path. (ex. "/ProjectA/SubfolderB") |
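Putting several of these tokens together, the top of an R transform script often looks like the following, mirroring the R examples elsewhere in this topic:

# Pull in a companion script from the same directory as this script.
source("${srcDirectory}/Utils.R")

# Replaced with the two labkey.sessionCookie* assignment lines.
${rLabkeySessionId}

# Full path to the run properties file on the local file system.
rpPath <- "${runInfo}"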
To raise a warning from within your transformation script, set maximumSeverity to WARN within the transformedRunProperties file. To report an error, set maximumSeverity to ERROR. To display a specific message with either a warning or error, write the message to errors.html in the current directory. For example, this snippet from an R transformation script defines a warning and error handler:
# Writes the maximumSeverity level to the transformedRunProperties file and the
# error/warning message to the errors.html file. LabKey Server reads these files
# after execution to determine whether an error or warning occurred and handles
# it appropriately.
handleErrorsAndWarnings <- function()
{
    if (run.error.level > 0)
    {
        fileConn <- file(trans.output.file);
        if (run.error.level == 1)
        {
            writeLines(c(paste("maximumSeverity", "WARN", sep="\t")), fileConn);
        }
        else
        {
            writeLines(c(paste("maximumSeverity", "ERROR", sep="\t")), fileConn);
        }
        close(fileConn);

        # This file gets read and displayed directly as warnings or errors,
        # depending on maximumSeverity level.
        if (!is.null(run.error.msg))
        {
            fileConn <- file("errors.html");
            writeLines(run.error.msg, fileConn);
            close(fileConn);
        }
        quit();
    }
}
A sample transformation script including this handler and other configuration required for warning reporting is available for download.
When a warning is triggered during assay import, the user will see a screen offering the option to Proceed or Cancel the import after examining the output files.
After examining the output and transformed data files, if the user clicks Proceed, the transform script will be rerun, and no warnings will be raised on the second pass. Quieting warnings on the approved import is handled using the value of an internal property called severityLevel in the run properties file. Errors will still be raised if necessary.