File Watchers let administrators set up the monitoring of directories on the file system and perform specific actions when desired files appear. This topic describes the various pipeline tasks available. The configuration options for file watcher triggers and the types of files eligible vary by file watcher type, and the tasks available vary by folder type.
File Watcher Tasks
Reload Folder Archive
This option is available in any folder type and reloads an unzipped folder archive. It will not accept compressed (.zip) folder archives.
The reloader expects a folder.xml in the base directory of the archive. To create an unzipped folder archive, export the folder to your browser as a .zip file and then unzip it.
To reload a study, use this file watcher, providing an unzipped folder archive containing the study objects as well as the folder.xml file and any other folder objects needed.
Import Samples From Data File
This option supports importing Sample data into a specified Sample Type. A few key options on the Configuration panel are described here.
Action: Merge or Append
Import behavior into Sample Types has two options, Merge or Append.
- Merge: When an incoming field contains a value, the corresponding value in the Sample Type will be updated. When a field in the imported data has no value (an empty cell), the corresponding value in the Sample Type will be deleted.
- Append: The incoming data file will be inserted as new rows in the Sample Type. The operation will fail if there are existing sample ids that match those being imported.
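The difference between the two actions can be sketched in a few lines. This is only a model of the behavior described above, not the server's implementation: rows are plain dictionaries keyed by sample id, the function names are hypothetical, and the handling of a merged row whose id is not yet present (it is inserted) is an assumption the text does not confirm.

```python
def append_rows(existing, incoming):
    """Append: insert incoming rows as new rows; fail on any matching sample id."""
    clashes = sorted(set(existing) & set(incoming))
    if clashes:
        raise ValueError(f"sample ids already exist: {clashes}")
    result = {sid: dict(row) for sid, row in existing.items()}
    result.update({sid: dict(row) for sid, row in incoming.items()})
    return result


def merge_rows(existing, incoming):
    """Merge: incoming values overwrite; empty cells delete the stored value."""
    result = {sid: dict(row) for sid, row in existing.items()}
    for sid, row in incoming.items():
        # Assumption: a row with an unknown id is inserted as a new row.
        target = result.setdefault(sid, {})
        for field, value in row.items():
            if value in ("", None):
                target.pop(field, None)  # empty cell: stored value is removed
            else:
                target[field] = value    # non-empty cell: stored value is updated
    return result


# Hypothetical sample data, for illustration only.
stored = {"S-1": {"Volume": "5", "Site": "A"}}
print(merge_rows(stored, {"S-1": {"Volume": "7", "Site": ""}}))
print(append_rows(stored, {"S-2": {"Volume": "3"}}))
```

Note that an empty cell in a merge is destructive: leaving a column blank in the incoming file clears the stored value rather than leaving it unchanged.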
File Pattern
You can tell the trigger which Sample Type the imported data belongs to by using one of these file name capture methods:
- <name>: The text name of the Sample Type, for example, "BloodVials".
- <id>: The integer system id of the Sample Type, for example, "330". To find the system id, go to the Sample Types web part and click a Sample Type. The URL will show the id as a parameter named 'RowId'.
For example, a File Pattern using the name might look like:
Sample_(?<name>.+)_.(xlsx|tsv|xls)
...which would recognize the following file name as targeting a Sample Type named "BloodVials":
If the target Sample Type does not exist, the file watcher import will fail.
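A pattern like the one above can be tried out locally before configuring a trigger. The sketch below is a rough approximation in Python, under several assumptions: the file names are hypothetical, the helper names are invented for illustration, and the pattern is assumed to be matched against the whole file name. Python's `re` module spells named groups `(?P<name>...)`, so the helper rewrites the `(?<name>...)` form used in this topic before compiling.

```python
import re

# Pattern copied from the text above.
FILE_PATTERN = r"Sample_(?<name>.+)_.(xlsx|tsv|xls)"


def to_python_regex(pattern):
    """Rewrite (?<group>...) as (?P<group>...), the spelling Python's re expects."""
    return re.sub(r"\(\?<([A-Za-z][A-Za-z0-9]*)>", r"(?P<\1>", pattern)


def captured_name(file_name, pattern=FILE_PATTERN):
    """Return the 'name' capture for file_name, or None if it does not match."""
    # Assumption: the pattern must match the entire file name.
    match = re.fullmatch(to_python_regex(pattern), file_name)
    return match.group("name") if match else None


# Hypothetical file names, for illustration only.
print(captured_name("Sample_BloodVials_.xlsx"))  # BloodVials
print(captured_name("BloodVials.xlsx"))          # None
```

The `.` before the extension list is unescaped, so as a regular expression it matches any single character, not only a literal dot.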
Import Assay Data from a File
Currently only Standard assay designs are supported, under the General assay provider. Multi-run files and run re-imports are not supported by the file watcher.
The following file formats are supported. Note that .txt files are not supported:
- .xls, .xlsx, .csv, .tsv, .zip
The following assay data and metadata are supported by the file watcher:
- result data
- batch properties/metadata
- run properties/metadata
- plate properties/metadata
If only result data is being imported, you can use a single tabular file.
If additional run metadata is being imported, you can use either a zip file format or an Excel multi-sheet format. In a zip file format, the system determines the data type (result, run metadata, etc.) using the names of the files. In the multi-sheet format, the system matches based on the sheet names. The sheets don't need to be in any particular order. The following matching criteria are used:
data type | for zipped files, use file name... | for multi-sheet Excel, use sheet name... |
---|---|---|
batch properties | batchProperties.(tsv, csv, xlsx, xls) | batchProperties |
run properties | runProperties.(tsv, csv, xlsx, xls) | runProperties |
results data | results.(tsv, csv, xlsx, xls) | results |
plate metadata | plateMetadata.json | not supported |
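To make the zip naming convention concrete, here is a minimal sketch that builds such an archive in memory and classifies each member by name. It is an illustration only: the function names, column names, and file contents are hypothetical placeholders, the internal layout of each properties file is not specified here, and exact, case-sensitive name matching is an assumption.

```python
import io
import zipfile

# File-name stems from the table above, mapped to the data type they signal.
STEMS = {
    "batchProperties": "batch properties",
    "runProperties": "run properties",
    "results": "results data",
}
TABULAR_EXTENSIONS = {"tsv", "csv", "xlsx", "xls"}


def data_type_for(member_name):
    """Return the data type a zip member name maps to, or None if unrecognized."""
    if member_name == "plateMetadata.json":
        return "plate metadata"
    stem, _, extension = member_name.rpartition(".")
    # Assumption: names must match exactly, including case.
    if stem in STEMS and extension in TABULAR_EXTENSIONS:
        return STEMS[stem]
    return None


def build_archive(members):
    """Write {member name: text} into an in-memory zip and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in members.items():
            archive.writestr(name, text)
    return buffer.getvalue()


# Placeholder contents; real files would hold your assay's own fields.
payload = build_archive({
    "results.tsv": "ParticipantId\tValue\nP-1\t4.2\n",
    "runProperties.tsv": "Operator\nexample-user\n",
})
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    for name in archive.namelist():
        print(name, "->", data_type_for(name))
```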
The following multi-sheet Excel file shows how to format results data and run properties fields on different sheets:
The assay provider (currently only General is supported) and protocol can be specified in the file watcher configuration. This is easier to configure than binding to the protocol using a regular expression named capture group.
When setting the File Pattern, regular expression 'name capture' can be used, as with other file watcher types, to match names or IDs from the source file name.
Two capture groups can be used:
- name: the assay protocol name (for example, MyAssay)
- id: the system id of the target assay (an integer)
For example, this file name pattern:
assay_(?<name>.+)_.(xlsx|tsv|xls|zip)
will interpret the following file name as targeting an assay named "MyAssay":
If the target assay does not exist, the file watcher import will fail.
If there is no name capture group in the file pattern and there is a single assay protocol in the container, the system attempts to import into that single assay.
The following example file pattern uses the protocol ID instead of the assay name:
assayProtocol_(?<id>.+)_.(xlsx|tsv|xls|zip)
which will interpret this file as targeting the assay with protocol ID 308:
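Both assay patterns can be exercised locally with a short script. The following is a sketch under stated assumptions, not a description of how the server evaluates patterns: the file names are hypothetical, the `capture` helper is invented for illustration, and whole-name matching is assumed. Because Python's `re` module writes named groups as `(?P<name>...)`, the helper converts the `(?<name>...)` form first; that simple text replacement is safe only because these patterns contain no lookbehind assertions.

```python
import re

# Patterns copied from the text above.
BY_NAME = r"assay_(?<name>.+)_.(xlsx|tsv|xls|zip)"
BY_ID = r"assayProtocol_(?<id>.+)_.(xlsx|tsv|xls|zip)"


def capture(pattern, group, file_name):
    """Return the named capture from file_name, or None when there is no match."""
    python_pattern = pattern.replace("(?<", "(?P<")  # Python's named-group spelling
    # Assumption: the pattern must match the entire file name.
    match = re.fullmatch(python_pattern, file_name)
    return match.group(group) if match else None


# Hypothetical file names, for illustration only.
print(capture(BY_NAME, "name", "assay_MyAssay_.xlsx"))   # MyAssay
print(capture(BY_ID, "id", "assayProtocol_308_.zip"))    # 308
print(capture(BY_NAME, "name", "assayProtocol_308_.zip"))  # None
```

Note that the `id` group uses `.+`, so it captures whatever text appears in that position; the pattern itself does not enforce that the captured value is an integer.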
Reload Lists Using Data File
This option is available in any folder type, provided the list module has been enabled. It imports data to existing lists from source files in either Excel (.xls/.xlsx) or TSV (.tsv) formats. It can also infer non-key column changes. Note that this task cannot create a new list definition: the list definition must already exist on the server.
You can reload lists from files in S3 storage by enabling an SQS Queue and configuring cloud storage to use in your local folder. Learn more in this topic:
Move Files Across the Server
This option is available in any folder type. It moves and/or copies files around the server without analyzing the contents of those files.
Import/Reload Study Datasets Using Data File
This option is available in a study folder. It loads data into existing study datasets and it infers/creates datasets if they don't already exist. Source data can be in TSV, Excel, or text files.
Import Specimen Data Using Data File
This option is only available in study folders with the specimen module enabled.
This file watcher type accepts specimen data in both .zip and .tsv file formats:
- .zip: The specimen archive zip file has a .specimens file extension.
- .tsv: An individual specimens.tsv file which will typically be the simple specimen format and contain only vial information. This file will have a # specimens comment at the top.
By default, specimen data imported using a file watcher will be set to replace existing data. To merge instead, set the custom property "mergeSpecimen" to true.
Specimen module docs: Specimen Tracking (Legacy)
Import a Directory of FCS Files
Import flow files to the flow module. This type of file watcher is only available in Flow folders. It supports a process where FCS flow data is deposited in a common location by a number of users, with each data export placed into a subdirectory of the watched folder, perhaps in a separate subdirectory per user.
When the File Watcher finds these files, they are placed into a new location under the folder pipeline root based on the current user and date. Example: @pipeline/${username}/${date('YYYY-MM')}
LabKey then imports the FCS data to that container. All FCS files within a single directory are imported as a single experiment run in the flow module.
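As a rough illustration of how the example destination expands, the sketch below substitutes a user name and a date into the template by plain string replacement. The function name and the sample user are hypothetical, and reading 'YYYY-MM' as a four-digit year followed by a two-digit month is an assumption; the server's own template evaluation may differ.

```python
from datetime import date

# Template copied from the example above.
TEMPLATE = "@pipeline/${username}/${date('YYYY-MM')}"


def expand_destination(username, on_date):
    """Fill in the template's two placeholders for one user and one date."""
    # Assumption: 'YYYY-MM' means four-digit year, dash, two-digit month.
    year_month = f"{on_date.year:04d}-{on_date.month:02d}"
    return (TEMPLATE
            .replace("${username}", username)
            .replace("${date('YYYY-MM')}", year_month))


# Hypothetical user and date, for illustration only.
print(expand_destination("jsmith", date(2024, 3, 15)))  # @pipeline/jsmith/2024-03
```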
Custom File Watcher Tasks
Custom Parameters
Add custom parameters on the Configuration panel, first expanding the Show Advanced Settings section. Click Add Custom Parameter to add each one. Click the delete icon to remove a parameter.
allowDomainUpdates
This parameter, used in earlier versions, has been replaced with the checkbox option to Allow Domain Updates on the Configuration panel for the tasks 'Reload Lists Using Data File' and 'Import/Reload Study Datasets Using Data File'.
When updating lists and datasets, by default, the columns in the incoming data will overwrite the columns in the existing list or dataset. This means that any new columns in the incoming data will be added to the list and any columns missing from the incoming data will be dropped (and their data deleted).
To override this behavior, uncheck the Allow Domain Updates box to retain the column set of the existing list or dataset.
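The effect of the checkbox on the column set can be modeled as follows. This is a simplified sketch with hypothetical names and column lists; it covers only which columns remain after a reload, and says nothing about how incoming values in extra columns are treated when the box is unchecked, which this topic does not specify.

```python
def resulting_columns(existing, incoming, allow_domain_updates=True):
    """Return (columns after reload, columns added, columns dropped)."""
    if not allow_domain_updates:
        # Box unchecked: the existing column set is retained as-is.
        return list(existing), [], []
    added = [c for c in incoming if c not in existing]
    dropped = [c for c in existing if c not in incoming]  # their data is deleted
    return list(incoming), added, dropped


# Hypothetical column lists, for illustration only.
current = ["Key", "Name", "Age"]
arriving = ["Key", "Name", "Weight"]
print(resulting_columns(current, arriving))
print(resulting_columns(current, arriving, allow_domain_updates=False))
```

In the first call "Weight" is added and "Age" is dropped; in the second the original three columns are kept.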
default.action
The "default.action" parameter accepts a text value of either replace or append; the default is replace. This parameter can be used to control the default Action for the trigger, which may also be more conveniently set using the Action options on the Configuration panel.
mergeData
This parameter can be included to merge data, with the value set to either true or false. The default is false (replace); existing configurations that do not provide the parameter are interpreted as false (replace).
Where supported, merging can be more conveniently set using the Action options on the Configuration panel.
mergeSpecimen
By default, specimen data imported using the 'Import Specimen Data Using Data File' file watcher will be set to replace existing data. To merge instead, set the property "mergeSpecimen" to true.
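The defaults described for "default.action", "mergeData", and "mergeSpecimen" can be summarized in a small sketch that overlays supplied values on those defaults. The function and constant names are hypothetical, treating values as case-insensitive is an assumption, and representing the mergeSpecimen default as "false" is inferred from the statement that replacing is the default behavior.

```python
# Defaults as described in the text above (mergeSpecimen "false" is inferred).
DEFAULTS = {
    "default.action": "replace",
    "mergeData": "false",
    "mergeSpecimen": "false",
}
ALLOWED = {
    "default.action": {"replace", "append"},
    "mergeData": {"true", "false"},
    "mergeSpecimen": {"true", "false"},
}


def effective_parameters(custom):
    """Overlay custom parameters on the defaults, rejecting unsupported values."""
    settings = dict(DEFAULTS)
    for name, value in custom.items():
        if name not in ALLOWED:
            continue  # other parameters are outside the scope of this sketch
        normalized = str(value).strip().lower()  # assumption: case-insensitive values
        if normalized not in ALLOWED[name]:
            raise ValueError(f"{name} does not accept {value!r}")
        settings[name] = normalized
    return settings


print(effective_parameters({}))
print(effective_parameters({"default.action": "append", "mergeSpecimen": "true"}))
```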
skipQueryValidation
'Reload Study' and 'Reload Folder Archive' can be configured to skip query validation by adding a custom parameter to the file watcher named 'skipQueryValidation' and setting it to 'TRUE'. This may be helpful if your file watcher reloads are failing due to unrelated query issues.