Accurate and consistent data entry is important for assay data, especially when key metadata is entered manually. For example, if an operator fails to enter a needed instrument setting and someone else later wants to recreate an interesting result, it can be impossible to determine how that result was actually obtained. If an instrument's brand name is entered where a serial number is expected, results from different machines can be erroneously grouped as if they came from a single machine. If one machine is found to be faulty, you may be forced to throw out all data if you haven't accurately tracked where each run was done.
This topic demonstrates a few of the options available for data validation during upload: required fields, regular expression checks, and range checks.
Set Up Validation
Here we add some validation to our GenericAssay design by modifying it. Remember that the assay design is like a map describing how to import and store data. When we change the map, any run data imported using the old design may no longer pass validation.
Open the design for editing:
- Navigate to the Assay Tutorial folder.
- In the Assay List web part, click GenericAssay.
- Select Manage Assay Design > Edit assay design.
Note that if you didn't specify the current (tutorial) subfolder when you defined this assay, you will see a pop-up dialog: "This assay is defined in the <PROJECT_NAME> folder. Would you still like to edit it?" If you are the only user of this assay in this project, click OK to continue to the Assay Designer. Otherwise, copy the assay design to the current tutorial folder and edit your copy instead.
By default, any new field you add to an assay design is optional. If you wish, you can make one or more fields required, so that if an operator skips an entry, the upload fails.
- Scroll to the GenericAssay Run Fields section.
- Select the InstrumentSetting field (in the "Run Fields" section).
- Click the Validators tab and then click the Required checkbox.
- Click Save and Close.
- If you get the message "The required property cannot be changed when rows already exist", assay data has already been imported using this design without an instrument setting. You will need to delete the offending assay runs before you can set the field as required.
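The effect of a required field can be sketched outside the server. The snippet below is only an illustration in Python, not LabKey's implementation: the function name `missing_required` and the dictionary layout of the run fields are assumptions made for this example.

```python
def missing_required(run_fields, required):
    """Return the names of required fields that are absent or blank.

    run_fields: dict of field name -> entered value (hypothetical layout).
    required:   list of field names marked as required in the design.
    """
    missing = []
    for name in required:
        value = run_fields.get(name)
        # Treat a missing key, None, or whitespace-only text as "not entered".
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


# An operator who clears InstrumentSetting would be stopped before upload.
run = {"AssayId": "BadRun", "InstrumentSetting": ""}
print(missing_required(run, ["InstrumentSetting"]))
```

An empty result means every required field was filled in; any returned name corresponds to an entry the operator skipped.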
Using a regular expression (RegEx) to check entered text is a flexible form of validation. You could compare text to an expected pattern, or in this example, we can check that special characters like angle brackets are not included in an email address (as could happen in a cut and paste of an email address from a contact list).
- Reopen Manage Assay Design > Edit assay design.
- Select the OperatorEmail field in the "Batch Fields" section. The extended property editor will appear to the right.
- Click the Validators tab and then click Add Regex Validator.
- Enter the following parameters:
- Name: BracketCheck
- Description: Ensure no angle brackets.
- Regular Expression: .*[<>].*
- Error Message: An email address cannot contain the "<" or ">" characters
- Check the box for Fail when pattern matches. Otherwise, you would be requiring that emails contain the offending characters.
- Click OK.
For more information on regular expressions, see the Java documentation for Class Pattern (java.util.regex.Pattern).
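To see what the BracketCheck pattern accepts and rejects before relying on it, you can try it locally. The following is a minimal sketch in Python, whose `re` module differs in some details from the regular expression engine used by the server. The helper name `bracket_check` is hypothetical, and matching against the whole value with `fullmatch` is an assumption made for this illustration.

```python
import re

# Same pattern as entered in the validator: any text containing "<" or ">".
BRACKET_CHECK = re.compile(r".*[<>].*")


def bracket_check(value, fail_when_matches=True):
    """Return True if the value passes the validator, False if it fails.

    With fail_when_matches=True, a value that matches the pattern is rejected.
    With fail_when_matches=False, a value must match the pattern to pass.
    """
    matched = BRACKET_CHECK.fullmatch(value) is not None
    return not matched if fail_when_matches else matched


print(bracket_check("John Doe <firstname.lastname@example.org>"))  # fails the check
print(bracket_check("firstname.lastname@example.org"))             # passes the check
```

Calling `bracket_check` with `fail_when_matches=False` shows why the checkbox matters: without it, only addresses that contain angle brackets would pass.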
By checking that a given numeric value falls within a given range, you can catch some bad runs at the very beginning of the import process.
- Select the M3 field in the "Data Fields" section. The extended property editor will appear to the right.
- Click the Validators tab and then click the Add Range Validator button (which only appears for numeric fields).
- Enter the following parameters:
- Name: M3ValidRange
- First Condition: Select greater than or equals: 5
- Second Condition: Select less than or equals: 100
- Error Message: Valid M3 values are between 5 and 100.
- Click OK.
- Click Save & Close to save the edited GenericAssay design.
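Because an out-of-range value inside the spreadsheet can only be fixed by editing the file, it can help to pre-check the data rows before upload. The sketch below applies the same bounds as the M3ValidRange validator to rows that are assumed to have already been read into a list of dictionaries. The function name `m3_range_errors`, the row layout, and the row numbering in the messages are all assumptions for illustration, not the server's behavior.

```python
def m3_range_errors(rows, low=5, high=100):
    """Return one message per row whose M3 value is outside [low, high].

    rows: list of dicts, each with a numeric "M3" entry (hypothetical layout).
    Both bounds are inclusive, matching "greater than or equals" and
    "less than or equals" in the validator.
    """
    errors = []
    for number, row in enumerate(rows, start=1):
        value = row["M3"]
        if not (low <= value <= high):
            errors.append(
                f"Row {number}: Value '{value}' for field 'M3' is invalid. "
                f"Valid M3 values are between {low} and {high}."
            )
    return errors


rows = [{"M3": 4.8}, {"M3": 5}, {"M3": 100}, {"M3": 100.5}]
for message in m3_range_errors(rows):
    print(message)
```

With these sample rows, only the first and last rows are reported, since 5 and 100 sit exactly on the inclusive bounds.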
Observe Validation in Action
To see how data validation would screen for these issues, we'll intentionally upload some "bad" data which will fail the validation steps we just added.
- Click Assay Tutorial to return to the main folder page.
- In the Files web part, select the file /Assays/Generic/GenericAssay_BadData.xls.
- Click Import Data.
- Select Use GenericAssay and click Import.
- Paste in "John Doe <firstname.lastname@example.org>" as the OperatorEmail. Leave other entries at their defaults, saved from our prior imports.
- Click Next.
- Observe the red error message: "Value 'John Doe <firstname.lastname@example.org>' for field 'OperatorEmail' is invalid. An email address cannot contain the "<" or ">" characters."
- Correct the email address entry to read only "firstname.lastname@example.org" as before.
- Click Next again and you will proceed, no longer seeing the error.
- Enter an Assay ID for the run, such as "BadRun".
- Delete the InstrumentSetting value which was autofilled based on your prior upload.
- Click Save and Finish.
The sequence in which validators are run does not necessarily match their order in the design.
- Observe the red error text: "Instrument Setting is required and must be of type Integer."
- Enter a value and click Save and Finish again.
- Observe the error message: "Value '4.8' for field 'M3' is invalid. Valid M3 values are between 5 and 100." The invalid M3 value is in the spreadsheet being imported, so the only way to clear this particular error would be to edit, save, and reimport the spreadsheet.
There is no actual need to import bad data now that we have seen how validation works, so cancel the import or simply click the Assay Tutorial link to return to the home page.