Transformation Scripts: /home/

Transformation Scripts

As part of validating and cleaning assay data, transformation scripts (written in any language, Perl, R, Java, etc.) can be run at the time of assay data upload. They can inspect an uploaded data file and change the data or populate empty columns in the uploaded data. For example, you can calculate the contents of one column from data contained in other columns. A transformation script can also modify run- and batch-level properties. If validation only needs to be done for particular single field values, the simpler mechanism is to use a validator within the field properties for the column.

Transformation scripts (which are always attached to assay designs) are different from trigger scripts, which are attached to a dataset (database table or query).

Topics

Use Transformation Scripts

Each assay design can be associated with one or more validation or transformation scripts which are run in the order specified. The script file extension (.r, .pl, etc) identifies the script engine that will be used to run the transform script. For example: a script named test.pl will be run with the Perl scripting engine. Before you can run validation or transformation scripts, you must configure the necessary Scripting Engines.

This section describes the process of using a transformation script that has already been developed for your assay type. An example workflow for how to create an assay transformation script in perl can be found in Example Workflow: Develop a Transformation Script (perl).

To specify a transform script in an assay design, you enter the full path including the file extension.

Open the assay designer for a new assay, or edit an existing assay design.
Click Add Script.
Enter the full path to the script in the Transform Scripts field.
You may enter multiple scripts by clicking Add Script again.

Confirm that other Properties required by your assay type are correctly specified.
Click Save and Close.

When you import (or re-import) run data using this assay design, the script will be executed. When you are developing or debugging transform scripts, you can use the Save Script Data option to store the files generated by the server that are passed to the script. Once your script is working properly, uncheck this box to avoid unnecessarily cluttering your disk.

A few notes on usage:

Client API calls are not supported in transform scripts.
Columns populated by transform scripts must already exist in the assay definition.
Executed scripts show up in the experimental graph, providing a record that transformations and/or quality control scripts were run.
Transform scripts are run before field-level validators.
The script is invoked once per run upload.
Multiple scripts are invoked in the order they are listed in the assay design.

Note that non-programmatic quality control remains available -- assay designs can be configured to perform basic checks for data types, required values, regular expressions, and ranges in uploaded data. See the Validators section of the Field Properties topic and Manage Dataset QC States.

The general purpose assay tutorial includes another example use of a transformation script in Set up a Data Transformation Script.

How Transformation Scripts Work

Script Execution Sequence

Transformation and validation scripts are invoked in the following sequence:

A user uploads assay data.
The server creates a runProperties.tsv file and rewrites the uploaded data in TSV format. Assay-specific properties and files from both the run and batch levels are added. See Run Properties Reference for full lists of properties.
The server invokes the transform script by passing it the information created in step 2 (the runProperties.tsv file).
After script completion, the server checks whether any errors have been written by the transform script and whether any data has been transformed.
If transformed data is available, the server uses it for subsequent steps; otherwise, the original data is used.
If multiple transform scripts are specified, the server invokes the other scripts in the order in which they are defined.
Field-level validator/quality-control checks (including range and regular expression validation) are performed. (These field-level checks are defined in the assay definition.)
If no errors have occurred, the run is loaded into the database.

Passing Run Properties to Transformation Scripts

Information on run properties can be passed to a transform script in two ways. You can put a substitution token into your script to identify the run properties file, or you can configure your scripting engine to pass the file path as a command line argument. See Transformation Script Substitution Syntax for a list of available substitution tokens.

For example, using perl:

Option #1: Put a substitution token (${runInfo}) into your script and the server will replace it with the path to the run properties file. Here's a snippet of a perl script that uses this method:

# Open the run properties file. Run or upload set properties are not used by
# this script. We are only interested in the file paths for the run data and
# the error file.

open my $reportProps, '${runInfo}';

Option #2: Configure your scripting engine definition so that the file path is passed as a command line argument:

Go to Admin > Site > Admin Console.
Select the Views and Scripting.
Select and edit the perl engine.
Add ${runInfo} to the Program Command field.

LabKey Support

LabKey Support