Accurate and consistent data entry is important for assay data, especially when operators enter key metadata by hand. For example, if an operator fails to enter a needed instrument setting and someone later wants to recreate an interesting result, it can be impossible to determine how that result was actually obtained. If an instrument's brand name is entered where a serial number is expected, results from different machines can be erroneously grouped as if they came from a single machine. If one machine is later found to be faulty, you may be forced to throw out all of your data because you did not accurately track where each run was done.
This topic demonstrates a few of the options available for data validation during upload:
- Required fields: prevent operators from skipping critical entries
- Regular expressions: validate entered text against a pattern
- Range validators: catch import of runs containing obviously out-of-range data
Set Up Validation
Here we add some validation to our GenericAssay design by modifying it. Remember that the assay design is like a map describing how to import and store data. When we change the map, any run data imported using the old design may no longer pass validation.
Open the design for editing:
- Click the Assay Dashboard tab.
- In the Assay List section, click the GenericAssay link.
- Select Manage Assay Design > edit assay design.
- Note that if you didn't specify the current subfolder when you defined this tutorial assay, you will get a pop-up dialog: "This assay is defined in the /home folder. Would you still like to edit it?" Click OK to continue to the Assay Designer if you are the only user of this assay in the /home folder.
By default, any new field you enter is optional. If you wish, you can make one or more fields required, so that if an operator skips an entry, the upload fails.
- Scroll to the GenericAssay Run Fields section.
- Select the InstrumentSetting field (in the "Run Fields" section).
- Click the Validators tab and then click the Required checkbox.
- Click Save and Close.
- If you get the message The required property cannot be changed when rows already exist, this means assay data has already been imported using this design without the instrument setting. You will need to delete the offending assay runs before you can set the field as required.
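The effect of a required field can be pictured with a minimal Python sketch. This is not LabKey's implementation; the function name `missing_required` and the dictionary layout are hypothetical, chosen only to illustrate that a blank or absent entry causes the check to fail.

```python
def missing_required(fields, required):
    """Return the names of required fields that are absent or blank.

    `fields` maps field names to the values an operator entered.
    Hypothetical helper for illustration; not part of any LabKey API.
    """
    missing = []
    for name in required:
        value = fields.get(name)
        # Treat None and whitespace-only entries as "not entered".
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


# An operator left InstrumentSetting blank, so the check reports it.
run_fields = {"InstrumentSetting": ""}
print(missing_required(run_fields, ["InstrumentSetting"]))
```

An empty list from this helper would mean every required entry was supplied.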
Using a regular expression to check entered text is a flexible form of validation. You could compare text to an expected pattern or, as in this example, check that special characters like angle brackets are not included in an email address (as could happen when cutting and pasting from a contact list).
- Reopen Manage Assay Design > edit assay design.
- Select the OperatorEmail field in the "Batch Fields" section. The extended property editor will appear to the right.
- Click the Validators tab and then click Add Regex Validator.
- Enter the following parameters:
- Name: BracketCheck
- Description: Ensure no angle brackets.
- Regular Expression: .*[<>].*
- Error Message: An email address cannot contain the "<" or ">" characters
- Check the box for Fail when pattern matches. Otherwise, you would be requiring that emails contain the offending characters.
- Click OK.
For more information on regular expressions, see the Java documentation for Class Pattern.
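To see how the BracketCheck pattern and the Fail when pattern matches option interact, here is a small Python sketch. It assumes the pattern is tested against the entire entered value, which is why `.*` appears on both sides of `[<>]`; the function `bracket_check` and its `fail_when_matches` parameter are hypothetical names for illustration. The server evaluates the pattern with its own regular expression engine, but for a pattern this simple Python's `re` module behaves the same way.

```python
import re

# Same pattern as entered in the validator: any text containing "<" or ">".
BRACKET_CHECK = re.compile(r".*[<>].*")

ERROR_MESSAGE = 'An email address cannot contain the "<" or ">" characters'


def bracket_check(value, fail_when_matches=True):
    """Return the error message if validation fails, otherwise None.

    Assumption: the pattern must match the whole value (fullmatch).
    With fail_when_matches=True, a match means the value is rejected;
    with False, a value is rejected when it does NOT match.
    """
    matched = BRACKET_CHECK.fullmatch(value) is not None
    failed = matched if fail_when_matches else not matched
    return ERROR_MESSAGE if failed else None


# Pasted from a contact list: contains angle brackets, so it is rejected.
print(bracket_check("John Doe <firstname.lastname@example.org>"))

# A plain address passes and yields no error.
print(bracket_check("firstname.lastname@example.org"))
```

Calling `bracket_check("firstname.lastname@example.org", fail_when_matches=False)` returns the error message, which shows why leaving the box unchecked would reject every well-formed address.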
By checking that a given numeric value falls within a given range, you can catch some bad runs at the very beginning of the import process.
- Select the M3 field in the "Data Fields" section. The extended property editor will appear to the right.
- Click the Validators tab and then click the Add Range Validator button (which only appears for numeric fields).
- Enter the following parameters:
- Name: M3ValidRange
- First Condition: Select greater than or equals: 5
- Second Condition: Select less than or equals: 100
- Error Message: Valid M3 values are between 5 and 100.
- Click OK.
- Click Save & Close to save the edited GenericAssay design.
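The M3ValidRange validator combines two inclusive conditions. A minimal Python sketch of that logic follows; the function name `m3_valid_range` is hypothetical, and the sketch only illustrates the comparison, not how the server applies it to each imported row.

```python
def m3_valid_range(value, low=5, high=100):
    """Return an error message if value is outside [low, high], else None.

    Both bounds are inclusive, mirroring "greater than or equals: 5"
    and "less than or equals: 100". Hypothetical helper for illustration.
    """
    if low <= value <= high:
        return None
    return "Valid M3 values are between 5 and 100."


# Check a few sample values against the range.
for m3 in (4.8, 5, 62.5, 100, 100.1):
    result = m3_valid_range(m3)
    print(m3, "ok" if result is None else result)
```

Because both conditions are inclusive, the boundary values 5 and 100 pass, while 4.8 and 100.1 are rejected.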
Observe Validation in Action
To see how data validation would screen for these issues, we'll intentionally upload some "bad" data which will fail the validators we just defined.
- On the Assay Dashboard tab, in the Files web part, select the file [LabKeyDemoFiles]/Assays/Generic/GenericAssay_BadData.xls.
- Click Import Data.
- Select Use GenericAssay and click Import.
- Paste in "John Doe <firstname.lastname@example.org>" as the OperatorEmail. Leave other entries at their defaults, saved from our prior imports.
- Click Next.
- Observe the red error message: "Value 'John Doe <firstname.lastname@example.org>' for field 'OperatorEmail' is invalid. An email address cannot contain the "<" or ">" characters."
- Correct the email address entry to read only "firstname.lastname@example.org" as before.
- Click Next again and you will no longer see the error.
- Enter an Assay ID for the run, such as "BadRun" and delete the InstrumentSetting value which was autofilled based on your prior upload.
- Click Save and Finish.
The sequence in which validators are run does not necessarily match their order in the design.
- Observe the red error text: "Instrument Setting is required and must be of type Integer."
- Enter a value and click Save and Finish again.
- Observe error message: "Value '4.8' for field 'M3' is invalid. Valid M3 values are between 5 and 100." The invalid M3 value is included in the spreadsheet being imported, so the only way to clear this particular error would be to edit/save/reimport the spreadsheet.
There is no need to actually import the bad data now that we have seen how validation works, so cancel the import or simply click the Assay Dashboard tab to return to the home page.