Overview

Transformation scripts are attached to assay designs and used to clean and validate assay data, solving a wide variety of challenges. For example:

  • Instrument-generated files often contain header lines before the main data table, denoted by a leading #, !, or other symbol. These lines may contain useful metadata about the protocol, reagents, or samples tested which should either be incorporated into the data import or skipped over to find the main data table.
  • File or data formats might be optimized for display, not for efficient storage and retrieval. Transformation scripts can clean, validate, and reformat imported data.
  • During import, display values from a lookup column may need to be mapped to foreign key values for storage.
  • You may need to fill in additional quality control values with imported assay data, or calculate contents of a new column from columns in the imported data.
Transformation scripts can inspect an uploaded file and change the data or populate empty columns in the uploaded data. They can also modify run- and batch-level properties. If validation only needs to be done for particular single field values, the simpler mechanism is to use a validator within the field properties for the column.

Any scripting language that can be invoked via the command line and has the ability to read/write files is supported for transformation scripts, including:

  • Perl
  • Python
  • R
  • Java
Transformation scripts (which are always attached to assay designs) are different from trigger scripts, which are attached to a dataset (database table or query).

Use Transformation Scripts

Each assay design can be associated with one or more validation or transformation scripts which are run in the order specified.

Note that transformation scripts cannot be run on hosted servers, including hosted trial servers. The interface to add scripts will not be shown for assays on hosted servers.

The script file extension (.r, .pl, etc) identifies the script engine that will be used to run the transform script. For example: a script named test.pl will be run with the Perl scripting engine. Before you can run validation or transformation scripts, you must configure the necessary Scripting Engines.

This section describes the process of using a transformation script that has already been developed for your assay type. An example workflow for how to create an assay transformation script in perl can be found in Example Workflow: Develop a Transformation Script (perl).

Identifying the Path to the Script File

It is convenient to upload the script file to the File Repository in the same folder as the assay design. The absolute path to the script file can be determined by concatenating the file root for the folder (available at (Admin) > Folder > Management > Files tab) plus the path to the script file in the Files web part (for example, "scripts\LoadData.R"). In the file path, LabKey Server accepts either backslashes (the default Windows format) or forward slashes.

When working on your own developer workstation, you can put the script file wherever you like, but putting it within the File Repository will make it easier to deploy to a production server. It also makes iterative development against a remote server easier, since you can use a Web-DAV enabled file editor to directly edit the same script file that the server is calling.

When you decide where to locate your transformation script file, consider that it is convenient to keep script files in the same location as the assay. You will enter the full path when you configure your assay design to run this script. Use the built-in substitution token "${srcDirectory}" which the server automatically fills in to be the directory where the called script file (the one identified in the Transform Scripts field) is located.

Accessing and Using the Run Properties File

The primary mechanism for communication between the LabKey Assay framework and the Transform script is the Run Properties file. Again a substitution token ${runInfo} tells the script code where to find this file. The script file should contain a line like

run.props = labkey.transform.readRunPropertiesFile("${runInfo}");

The run properties file contains three categories of properties:

1. Batch and run properties as defined by the user when creating an assay instance. These properties are of the format: <property name> <property value> <java data type>

for example,

gDarkStdDev 1.98223 java.lang.Double

When the transform script is called these properties will contain any values that the user has typed into the “Batch Properties” and “Run Properties” sections of the upload form. The transform script can assign or modify these properties based on calculations or by reading them from the raw data file from the instrument. The script must then write the modified properties file to the location specified by the transformedRunPropertiesFile property.

2. Context properties of the assay such as assayName, runComments, and containerPath. These are recorded in the same format as the user-defined batch and run properties, but they cannot be overwritten by the script.

3. Paths to input and output files. These are absolute paths that the script reads from or writes to. They are in a <property name> <property value> format without property types. The paths currently used are:

  • a. runDataUploadedFile: the raw data file that was selected by the user and uploaded to the server as part of an import process. This can be an Excel file, a tab-separated text file, or a comma-separated text file.
  • b. runDataFile: the imported data file after the assay framework has attempted to convert the file to .tsv format and match its columns to the assay data result set definition. The path will point to a subfolder below the script file directory, with a path value similar to <property value> <java property type>. The AssayId_22\42 part of the directory path serves to separate the temporary files from multiple executions by multiple scripts in the same folder.
C:\labkey\files\transforms\@files\scripts\TransformAndValidationFiles\AssayId_22\42\runDataFile.tsv
  • c. AssayRunTSVData: This file path is where the result of the transform script will be written. It will point to a unique file name in an “assaydata” directory that the framework creates at the root of the files tree. NOTE: this property is written on the same line as the runDataFile property.
  • d. errorsFile: This path is where a transform or validation script can write out error messages for use in troubleshooting. Not normally needed by an R script because the script usually writes errors to stdout, which are written by the framework to a file named “<scriptname>.Rout”.
  • e. transformedRunPropertiesFile: This path is where the script writes out the updated values of batch- and run-level properties that are listed in the runProperties file.

Choosing the Input File for Transform Script Processing

The transform script developer can choose to use either the runDataFile or the runDataUploadedFile as its input. The runDataFile would be the right choice for an Excel-format raw file and a script that fills in additional columns of the data set. By using the runDataFile, the assay framework does the Excel-to-TSV conversion and the script doesn’t need to know how to parse Excel files. The runDataUploadedFile would be the right choice for a raw file in TSV format that the script is going to reformat by turning columns into rows. In either case, the script writes its output to the AssayRunTSVData file.

Associate the Script with an Assay

To specify a transform script in an assay design, you enter the full path including the file extension in the Transform Script field.

  • Open the assay designer for a new assay, or edit an existing assay design.
  • Click Add Script.
  • Enter the full path to the script in the Transform Scripts field.
  • You may enter multiple scripts by clicking Add Script again. Delete scripts by clicking the X to the right of the row.
  • Confirm that other properties and fields required by your assay are correctly specified.
  • Scroll down and click Save.

When you import (or re-import) run data using this assay design, the script will be executed.

There are two useful options presented as checkboxes in the Assay designer.

  • Save Script Data for Debugging tells the framework to not delete the intermediate files such as the runProperties file after a successful run. This option is important during script development. It can be turned off to avoid cluttering the file space under the TransformAndValidationFiles directory that the framework automatically creates under the script file directory.
  • Import In Background tells the framework to create a pipeline job as part of the import process, rather than tying up the browser session. It is useful for importing large data sets.
A few notes on usage:
  • Client API calls are not supported in transform scripts.
  • Columns populated by transform scripts must already exist in the assay definition.
  • Executed scripts show up in the experimental graph, providing a record that transformations and/or quality control scripts were run.
  • Transform scripts are run before field-level validators.
  • The script is invoked once per run upload.
  • Multiple scripts are invoked in the order they are listed in the assay design.
Note that non-programmatic quality control remains available -- assay designs can be configured to perform basic checks for data types, required values, regular expressions, and ranges in uploaded data. See the Validators section of the Field Properties topic and Dataset QC States - Admin Guide.

The general purpose assay tutorial includes another example use of a transformation script in Set up a Data Transformation Script.

How Transformation Scripts Work

Script Execution Sequence

Transformation and validation scripts are invoked in the following sequence:

  1. A user uploads assay data.
  2. The server creates a runProperties.tsv file and rewrites the uploaded data in TSV format. Assay-specific properties and files from both the run and batch levels are added. See Run Properties Reference for full lists of properties.
  3. The server invokes the transform script by passing it the information created in step 2 (the runProperties.tsv file).
  4. After script completion, the server checks whether any errors have been written by the transform script and whether any data has been transformed.
  5. If transformed data is available, the server uses it for subsequent steps; otherwise, the original data is used.
  6. If multiple transform scripts are specified, the server invokes the other scripts in the order in which they are defined.
  7. Field-level validator/quality-control checks (including range and regular expression validation) are performed. (These field-level checks are defined in the assay definition.)
  8. If no errors have occurred, the run is loaded into the database.

Passing Run Properties to Transformation Scripts

Information on run properties can be passed to a transform script in two ways. You can put a substitution token into your script to identify the run properties file, or you can configure your scripting engine to pass the file path as a command line argument. See Transformation Script Substitution Syntax for a list of available substitution tokens.

For example, using perl:

Option #1: Put a substitution token (${runInfo}) into your script and the server will replace it with the path to the run properties file. Here's a snippet of a perl script that uses this method:

# Open the run properties file. Run or upload set properties are not used by
# this script. We are only interested in the file paths for the run data and
# the error file.

open my $reportProps, '${runInfo}';

Option #2: Configure your scripting engine definition so that the file path is passed as a command line argument:

  • Go to (Admin) > Site > Admin Console.
  • Click Settings.
  • Under Configuration, click Views and Scripting.
  • Select and edit the perl engine.
  • Add ${runInfo} to the Program Command field.

Related Topics


Premium Resources Available

Subscribers to premium editions of LabKey Server can learn more with the example code in these topics:


Learn more about premium editions

Discussion

Was this content helpful?

Log in or register an account to provide feedback


previousnext
 
expand all collapse all