Example Workflow: Develop a Transformation Script

2024-03-28

This topic is under construction for the 24.3 (March 2024) release of LabKey Server. For current documentation of this feature, refer to the documentation for the current release.

This topic describes the process for developing a transformation script to transform assay data. You can use transformation scripts to change both run data and run properties as needed. This example uses Perl, but the general steps are the same for any scripting language.

Script Engine Setup

Before you can develop or run validation or transform scripts, configure the necessary scripting engines. You only need to set up a scripting engine once per type of script. For this example, Perl must be installed on your machine before you can set up the engine.

  • Select (Admin) > Site > Admin Console.
  • Under Configuration, click Views and Scripting.
  • Click Add > New Perl Engine.
  • Fill in the engine settings, specifying "pl" as the extension and the full path to the Perl executable.
  • Click Submit.
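If you are unsure of the full path to the interpreter, or have typed the extension with a leading dot, a quick local check can help. This Python sketch relies only on the standard library's `shutil.which`, which searches the PATH of the machine it runs on (so run it on the machine hosting LabKey Server); `engine_settings` is a hypothetical helper, not part of LabKey.

```python
import shutil
from typing import Optional, Tuple


def engine_settings(executable: str, extension: str) -> Tuple[Optional[str], str]:
    """Return (full path to the interpreter, file extension without a leading dot).

    The path is None when the interpreter is not found on this machine's PATH.
    """
    # shutil.which resolves a command name to a full path, as a shell would.
    full_path = shutil.which(executable)
    # The engine form expects "pl" or "py", not ".pl" or ".py".
    normalized_ext = extension.lstrip(".")
    return full_path, normalized_ext


if __name__ == "__main__":
    path, ext = engine_settings("perl", ".pl")
    print("Program path:", path or "not found on PATH")
    print("File extension:", ext)
```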

Add Script to Assay Design

Create a new empty .pl file in the development location of your choice and include it in your assay design. This topic uses the folder and simple assay design you would have created while completing the Assay Tutorial.

  • Navigate to the Assay Tutorial folder.
  • Click CellCulture in the Assay List web part.
  • Select Manage Assay Design > Copy assay design.
  • Click Copy to Current Folder.
  • Enter a new name, such as "CellCulture_Transformed".
  • Click Add Script, then select (or drag and drop) the new script file you are creating.
  • Check the box for Save Script Data for Debugging.
  • Confirm that the batch, run, and data fields are correct.
  • Scroll down and click Save.

Download Template Files

To help you write your transform script, you will next obtain example "runData.tsv" and "runProperties.tsv" files showing the state of your data import before the transform script is applied. To generate useful example data, import a data run using the new assay design with the "Save Script Data for Debugging" box checked.

  • In the Files web part, open and select the following file (if you already imported this file during the tutorial, first delete that run):
/CellCulture_001.xls
  • Click Import Data.
  • Select Use CellCulture_Transformed (the design you just defined), then click Import.
  • Click Next, then Save and Finish.
  • When the import completes, select Manage Assay Design > Edit assay design.
  • Click Download Template Files next to the Save Script Data for Debugging checkbox.
  • Unzip the downloaded "sampleQCData" package to see the .tsv files.
  • Open the "runData.tsv" file to view the current fields.
ParticipantID	TotalCellCount	CultureDay	Media	VisitID	Date	SpecimenID
demo value	1234	1234	demo value	1234	05/18/2021	demo value
demo value	1234	1234	demo value	1234	05/18/2021	demo value
demo value	1234	1234	demo value	1234	05/18/2021	demo value
demo value	1234	1234	demo value	1234	05/18/2021	demo value
demo value	1234	1234	demo value	1234	05/18/2021	demo value
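To list these fields programmatically (for example, to compare them with the columns your script writes), you can read the header row of runData.tsv. A minimal Python sketch using the standard `csv` module; `read_field_names` is a hypothetical helper name.

```python
import csv
import sys
from typing import List


def read_field_names(tsv_path: str) -> List[str]:
    """Return the column names from the header row of a tab-separated file."""
    # utf-8-sig also strips a byte-order mark if the file has one.
    with open(tsv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f, delimiter="\t")
        return next(reader, [])  # empty list for an empty file


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_fields.py runData.tsv
    for name in read_field_names(sys.argv[1]):
        print(name)
```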

Save Script Data for Debugging

Typically, transform and validation script data files are deleted when the script completes. For debugging, it can help to view the files that the server generates and passes to the script. When the Save Script Data for Debugging checkbox is checked, these files are saved to a subfolder named "TransformAndValidationFiles" in the same folder as the original script. Beneath that folder is a subfolder for each AssayId, and below that a numbered directory for each run. In that nested directory you will find a new "runDataFile.tsv" containing the values from the run file mapped into the current fields.

SpecimenID	ParticipantID	Date	CultureDay	TotalCellCount	Media
S-001	PT-101	2019-05-17 00:00	1	127	Media A
S-002	PT-101	2019-05-18 00:00	2	258	Media A
S-003	PT-101	2019-05-19 00:00	3	428	Media A
S-004	PT-101	2019-05-20 00:00	4	638	Media A
S-005	PT-101	2019-05-21 00:00	5	885	Media A
S-006	PT-101	2019-05-22 00:00	6	1279	Media A
S-007	PT-101	2019-05-23 00:00	7	2004	Media A
S-008	PT-101	2019-05-24 00:00	8	3158	Media A
S-009	PT-101	2019-05-25 00:00	9	4202	Media A
S-010	PT-101	2019-05-26 00:00	10	4663	Media A
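After several debug imports, it helps to find the most recently saved runDataFile.tsv. This Python sketch assumes the layout described above (TransformAndValidationFiles, then one subfolder per AssayId, then one directory per run) under the folder you pass in; `latest_debug_file` is a hypothetical helper, and "most recent" here simply means the newest modification time.

```python
import sys
from pathlib import Path
from typing import Optional


def latest_debug_file(script_dir: str, name: str = "runDataFile.tsv") -> Optional[Path]:
    """Return the most recently modified saved debug file, or None if none exist.

    Assumes <script_dir>/TransformAndValidationFiles/<AssayId folder>/<run folder>/<name>.
    """
    base = Path(script_dir) / "TransformAndValidationFiles"
    # glob yields nothing (rather than raising) if the base folder does not exist.
    candidates = list(base.glob(f"*/*/{name}"))
    if not candidates:
        return None
    # The file written last normally belongs to the latest run.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(latest_debug_file(sys.argv[1]))
```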

Define the Desired Transformation

The runData.tsv file shows the basic field layout. Decide how you need to modify the default data. For example, suppose your project needs the day portion of the Date extracted into a new integer field.
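The core of this transformation is turning a Date value into its day of the month. A minimal Python sketch, assuming the date formats that appear in the example files in this topic ("2019-05-17 00:00" and "05/18/2021"); `month_day` and `DATE_FORMATS` are hypothetical names, and you would extend the format list to match your own data.

```python
from datetime import datetime

# Formats seen in the example files in this topic; extend as needed.
DATE_FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d", "%m/%d/%Y")


def month_day(date_value: str) -> int:
    """Return the day of the month (1-31) from a date string such as '2019-05-17 00:00'."""
    text = date_value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).day
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognized date format: {date_value!r}")


print(month_day("2019-05-17 00:00"))  # 17
```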

Add Required Fields to the Assay Design

  • Select Manage Assay Design > Edit assay design.
  • Scroll down to the CellCulture_Transformed Results Fields section and click Add Field.
  • Enter the name "MonthDay" and select the data type "Integer".
  • Scroll down and click Save.

Write a Script to Transform Run Data

Now you have the information you need to write and refine your transformation script.

A walkthrough of a simple example script is available in this topic: Set up a Data Transform Script
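As a starting point, here is a minimal Python version of the MonthDay transform. It is a sketch, not LabKey's published example, and it rests on assumptions you should check: it needs a Python scripting engine configured with the "py" extension (rather than the Perl engine above); it takes the path to runProperties.tsv as its first command-line argument (LabKey's own examples reference this path through a `${runInfo}` substitution token instead); and it assumes that in the runDataFile row of runProperties.tsv, the second column is the input file and the fourth column is where the transformed file must be written. Compare those columns with the runProperties.tsv you downloaded.

```python
"""Sketch of an assay transform script that appends a MonthDay column.

Assumptions: the runProperties.tsv path is the first argument, and in its
'runDataFile' row column 2 is the input path and column 4 the output path.
"""
import csv
import sys
from datetime import datetime
from typing import Dict, List


def read_run_properties(path: str) -> Dict[str, List[str]]:
    """Map each property name (first tab-separated column) to its remaining columns."""
    props: Dict[str, List[str]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            row = line.rstrip("\r\n").split("\t")
            if row[0]:
                props[row[0]] = row[1:]
    return props


def month_day(date_value: str) -> int:
    """Day of the month from a value like '2019-05-17 00:00' (adjust the format to your data)."""
    return datetime.strptime(date_value.strip(), "%Y-%m-%d %H:%M").day


def transform(file_in: str, file_out: str) -> None:
    """Copy every row of file_in to file_out, adding a MonthDay value derived from Date."""
    with open(file_in, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = list(reader.fieldnames or []) + ["MonthDay"]
        rows = list(reader)
    with open(file_out, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        for row in rows:
            row["MonthDay"] = month_day(row["Date"])
            writer.writerow(row)


def main(run_properties_path: str) -> None:
    data_file = read_run_properties(run_properties_path)["runDataFile"]
    # data_file[0] is column 2 (input); data_file[2] is column 4 (output).
    transform(data_file[0], data_file[2])


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The script writes a complete new file rather than editing the input in place, so every original column is preserved and MonthDay is appended as the last column.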

Iterate over the Test Run to Complete Script

Re-import the same run using the transform script you have defined.

  • From the run list, select the run and click Re-import Run.
  • Click Next.
  • Under Run Data, click Use the data file(s) already uploaded to the server.
  • Click Save and Finish.

The results now show the new MonthDay field populated with the day of the month.

Until the results are as desired, edit the script and use Re-import Run to retry.

Once your transformation script is working properly, edit the assay design one more time to uncheck the Save Script Data for Debugging box. Otherwise, debugging files will continue to be saved with every run and could eventually fill your disk. Click Save.

Troubleshooting

When developing transformation scripts, remember to check the box in the assay design to "Save Script Data for Debugging". This will retain generated files that would otherwise be deleted upon successful import. You can iterate on details of your transformation script outside the assay framework using a set of these generated files.
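One way to iterate outside the assay framework is to rerun your script by hand against a saved runProperties.tsv. The Python sketch below does this with the standard `subprocess` module. `run_script_locally` is a hypothetical helper, and it assumes your script finds the run properties path either through a literal `${runInfo}` token (which it replaces in a temporary copy of the script, mimicking the server's substitution) or through its first command-line argument (which it also passes). Note that the saved file's paths may direct output into the saved debug folders.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script_locally(script_path: str, run_properties_path: str,
                       interpreter: str = sys.executable) -> subprocess.CompletedProcess:
    """Run a transform script against a saved runProperties.tsv, outside LabKey.

    A literal ${runInfo} token in the script is replaced with the saved path, and
    the path is also passed as the first argument for scripts that read sys.argv.
    """
    source = Path(script_path).read_text(encoding="utf-8")
    # Forward slashes avoid backslash escapes if the token sits inside a string literal.
    source = source.replace("${runInfo}", run_properties_path.replace("\\", "/"))
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / Path(script_path).name
        copy.write_text(source, encoding="utf-8")
        return subprocess.run([interpreter, str(copy), run_properties_path],
                              capture_output=True, text=True)
```

Check `returncode` and `stderr` on the returned object to see the same traceback the server would report.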

Depending on the step in the process, there are a number of ways errors can be reported, including but not limited to the errors file described below.

runDataFile.tsv Not Found

If you import data using a transformation script and see an error saying that "runDataFile.tsv" was not found, the likely cause is that the assay design did not recognize the expected data before your transformations ran. This can happen if the data file has a header section or if the pre-transformation column headers are not recognizable. In such cases, runProperties.tsv still includes the path where that file would have been inferred, but no file exists at that location.

You would see an error message similar to the following:

An error occurred when running the script 'my_transformation_script.py', exit code: 1).
Traceback (most recent call last):
  File "my_transformation_script.py", line 15, in <module>
    fileIn = open(filePathIn, "r")
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\dev\\labkey\\labkeyHome\\build\\deploy\\files\\Tutorials\\PythonDemo\\@files\\TransformAndValidationFiles\\AssayId_432\\work-6\\runDataFile.tsv'

Instead of reading runDataFile, have your script read "runDataUploadedFile", which is the uploaded file in its original format. Its path is also included in runProperties.tsv.
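A script can also make this choice at run time: use runDataFile when it exists and fall back to the uploaded file otherwise. This Python sketch assumes each path is in the second column of its row in runProperties.tsv; `choose_input_file` is a hypothetical helper, and it only picks the path. Because the uploaded file keeps its original format (for example .xls), your script still needs to parse that format itself.

```python
import os
from typing import Dict, List


def read_run_properties(path: str) -> Dict[str, List[str]]:
    """Map each property name (first tab-separated column) to its remaining columns."""
    props: Dict[str, List[str]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            row = line.rstrip("\r\n").split("\t")
            if row[0]:
                props[row[0]] = row[1:]
    return props


def choose_input_file(props: Dict[str, List[str]]) -> str:
    """Prefer the inferred runDataFile; fall back to the original uploaded file."""
    inferred = props.get("runDataFile", [])
    if inferred and os.path.exists(inferred[0]):
        return inferred[0]
    uploaded = props.get("runDataUploadedFile", [])
    if uploaded:
        return uploaded[0]
    raise FileNotFoundError("Neither runDataFile nor runDataUploadedFile is usable")
```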

General Script Errors

If your script has errors that prevent import of the run, you will often see red text in the Run Properties window, for example if you fail to select the correct data file.

Type Mismatches

If there is a type mismatch between your script results and the defined destination field, you will see an error when the run is imported.
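One way to catch these mismatches before the server does is to check, inside your script, that every value written to an Integer field actually parses as an integer. A minimal Python sketch; `find_non_integers` is a hypothetical helper, and how you report the problems (for example, through the errors file) is up to you.

```python
from typing import Iterable, List, Mapping, Optional, Tuple


def find_non_integers(rows: Iterable[Mapping[str, Optional[str]]],
                      field: str) -> List[Tuple[int, str]]:
    """Return (row number, value) for each non-blank value of `field` that is not an integer.

    Row numbers start at 1 for the first data row.
    """
    problems = []
    for i, row in enumerate(rows, start=1):
        value = (row.get(field) or "").strip()
        if not value:
            continue  # blank values are skipped: treated as missing, not as type errors
        try:
            int(value)
        except ValueError:
            problems.append((i, value))
    return problems


rows = [{"MonthDay": "17"}, {"MonthDay": "17.5"}, {"MonthDay": ""}]
print(find_non_integers(rows, "MonthDay"))  # [(2, '17.5')]
```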

Scripting Engine Errors

Before you can run scripts, you must configure the necessary scripting engine on your server. If you are missing the necessary engine, or the desired engine does not have the script file extension you are using, you'll get an error message similar to:

A script engine implementation was not found for the specified QC script (my_transformation_script.py). Check configurations in the Admin Console.

Note that file extensions in the scripting engine configuration should not include the '.' (for example, for Python scripts enter "py" in the field, not ".py").

Errors File

If the validation script needs to report an error that is displayed by the server, it adds error records to an error file. The location of the error file is specified as a property entry in the run properties file. The error file is in a tab-delimited format with three columns:

  1. type: error, warning, info, etc.
  2. property: (optional) the name of the property that the error occurred on.
  3. message: the text message that is displayed by the server.

Sample errors file:

type	property	message
error	runDataFile	A duplicate PTID was found : 669345900
error	assayId	The assay ID is in an invalid format
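A script can produce this file with a few lines of code. This Python sketch writes the same three tab-delimited columns, including the header row shown in the sample; `write_errors` is a hypothetical helper, and the output path is assumed to come from the errors file entry in your run properties file, so check that entry's name in your own runProperties.tsv.

```python
from typing import Iterable, Optional, Tuple


def _clean(text: str) -> str:
    """Replace tabs and line breaks, which would break the three-column layout."""
    return text.replace("\t", " ").replace("\r", " ").replace("\n", " ")


def write_errors(path: str, errors: Iterable[Tuple[str, Optional[str], str]]) -> None:
    """Write (type, property, message) records as a tab-delimited errors file.

    Use None for the property when an error is not tied to a specific property.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("type\tproperty\tmessage\n")
        for err_type, prop, message in errors:
            f.write("\t".join([_clean(err_type), _clean(prop or ""), _clean(message)]) + "\n")
```

For example, passing `[("error", "runDataFile", "A duplicate PTID was found : 669345900")]` as the records reproduces the first data row of the sample above.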
