Premium Feature: Available with all Premium Editions of LabKey Server. Learn more or contact LabKey.

File Watchers let administrators monitor directories on the file system and perform specific actions when matching files appear. This topic describes the available pipeline tasks. The trigger configuration options and the eligible file types depend on the type of file watcher, and the available tasks depend on the folder type.

File Watcher Tasks

Reload Folder Archive

This option is available in any folder type and reloads an unzipped folder archive. It will not accept compressed (.zip) folder archives.

The reloader expects a folder.xml in the base directory of the archive. To create an unzipped folder archive, export the folder to your browser as a .zip file and then unzip it.

To reload a study, use this file watcher with an unzipped folder archive that contains the study objects, the folder.xml file, and any other folder objects needed.
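Because the reloader expects an unzipped archive with folder.xml in its base directory, it can help to check an export before dropping it into the watched location. The following is a hypothetical pre-flight check written only from the requirements stated above; the class and method names are illustrative and are not part of LabKey Server.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Optional;

// Hypothetical pre-flight check based on the documented requirements of the
// "Reload Folder Archive" task; it is not part of LabKey Server.
final class FolderArchiveCheck {

    /** Returns a reason the path cannot be reloaded, or empty if it looks usable. */
    static Optional<String> problemWith(Path archive) {
        Path name = archive.getFileName();
        if (name != null && name.toString().toLowerCase(Locale.ROOT).endsWith(".zip")) {
            return Optional.of("compressed (.zip) archives are not accepted; unzip it first");
        }
        if (!Files.isDirectory(archive)) {
            return Optional.of("not a directory: " + archive);
        }
        if (!Files.isRegularFile(archive.resolve("folder.xml"))) {
            return Optional.of("no folder.xml in the base directory");
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("archive");
        System.out.println(problemWith(dir));   // reports the missing folder.xml
        Files.createFile(dir.resolve("folder.xml"));
        System.out.println(problemWith(dir));   // Optional.empty
    }
}
```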

Import Samples From Data File

This option supports importing Sample data into a specified Sample Type. A few key options on the Configuration panel are described here.

File Pattern

You can tell the trigger which Sample Type the imported data belongs to by using one of these file name capture methods:

  • <name>: the text name of the Sample Type, for example "BloodVials".
  • <id>: the integer system id of the Sample Type, for example "330". To find the system id, go to the Sample Types web part and click a Sample Type; the URL shows the id as the parameter rowId. For example:
    https://SERVER_NAME/FolderPath/experiment-showSampleType.view?rowId=330

A File Pattern that captures the name might look like:
Sample_(?<name>.+)_.(xlsx|tsv|xls)

...which would recognize the following file name as targeting a Sample Type named "BloodVials":

Sample_BloodVials_.xls
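To see what a File Pattern captures before configuring the trigger, you can try it with java.util.regex, which accepts the same (?<name>...) named-group syntax. This sketch assumes the whole file name must match (Matcher.matches()); BY_ID is a hypothetical id-based pattern, not one from this topic. Note that the unescaped . before the extension in the documented pattern matches any character, so a stricter pattern would use \. there, as BY_ID does.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extracting the Sample Type name or id from a file name.
// Assumption: the entire file name must match the pattern.
final class SampleTypePattern {
    // Pattern from the text; the '.' before the extension matches any character.
    static final Pattern BY_NAME = Pattern.compile("Sample_(?<name>.+)_.(xlsx|tsv|xls)");
    // Hypothetical id-based variant with a literal dot before the extension.
    static final Pattern BY_ID = Pattern.compile("Sample_(?<id>\\d+)_\\.(xlsx|tsv|xls)");

    static Optional<String> capture(Pattern p, String group, String fileName) {
        Matcher m = p.matcher(fileName);
        return m.matches() ? Optional.of(m.group(group)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(capture(BY_NAME, "name", "Sample_BloodVials_.xls")); // Optional[BloodVials]
        System.out.println(capture(BY_ID, "id", "Sample_330_.tsv"));             // Optional[330]
        System.out.println(capture(BY_NAME, "name", "Sample_BloodVials.csv"));   // Optional.empty
    }
}
```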

If the target Sample Type does not exist, the filewatcher import will fail.

Action: Merge or Append

Import behavior into Sample Types has two options, Merge or Append.

  • Merge: When an incoming field contains a value, the corresponding value in the Sample Type will be updated. When a field in the imported data has no value (an empty cell), the corresponding value in the Sample Type will be deleted.
  • Append: The incoming data file will be inserted as new rows in the Sample Type. The operation will fail if there are existing sample ids that match those being imported.
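The difference between Merge and Append can be modeled in memory. This is a simplified sketch, not LabKey's implementation: a Sample Type is a map from sample id to field values, and an empty string stands for an empty cell. Two behaviors the text does not spell out are assumptions here: Merge inserts rows whose id is not yet present, and a failed Append inserts nothing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the Merge and Append actions (illustrative only).
// table and incoming map sampleId -> (field -> value); "" is an empty cell.
final class SampleImport {

    static void merge(Map<String, Map<String, String>> table,
                      Map<String, Map<String, String>> incoming) {
        incoming.forEach((id, fields) -> {
            // Assumption: a new id becomes a new row.
            Map<String, String> row = table.computeIfAbsent(id, k -> new LinkedHashMap<>());
            fields.forEach((field, value) -> {
                if (value == null || value.isEmpty()) {
                    row.remove(field);       // empty cell deletes the stored value
                } else {
                    row.put(field, value);   // non-empty value updates it
                }
            });
        });
    }

    static void append(Map<String, Map<String, String>> table,
                       Map<String, Map<String, String>> incoming) {
        // Assumption: all-or-nothing, so check every id before inserting any row.
        for (String id : incoming.keySet()) {
            if (table.containsKey(id)) {
                throw new IllegalStateException("Sample id already exists: " + id);
            }
        }
        incoming.forEach((id, fields) -> table.put(id, new LinkedHashMap<>(fields)));
    }
}
```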

Import Lookups by Alternate Key

Some sample fields, including the built-in Status field, are structured as lookups. When a file watcher encounters a value other than the lookup's primary key, the value resolves only if you check the Import Lookups by Alternate Key box.

For example, if you see an error about the inability to convert the "Available (String)" value, you can either:

  • Edit your spreadsheet to provide the rowID for each "Status" value, OR
  • Edit your filewatcher to check the Import Lookups by Alternate Key box.
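The two options above correspond to two ways a lookup value can resolve. The sketch below assumes the lookup target is keyed by an integer rowId and has a unique label column acting as the alternate key; LookupResolver and its members are hypothetical names, and exact (case-sensitive) label matching is also an assumption.

```java
import java.util.Map;

// Sketch of lookup resolution for a field such as Status (illustrative only).
// Assumption: labelsByRowId maps the primary key (rowId) to a unique label.
final class LookupResolver {
    private final Map<Integer, String> labelsByRowId;
    private final boolean importByAlternateKey;

    LookupResolver(Map<Integer, String> labelsByRowId, boolean importByAlternateKey) {
        this.labelsByRowId = labelsByRowId;
        this.importByAlternateKey = importByAlternateKey;
    }

    int resolve(String cell) {
        String value = cell.trim();
        try {
            int rowId = Integer.parseInt(value);
            if (labelsByRowId.containsKey(rowId)) {
                return rowId;                  // already the primary key
            }
        } catch (NumberFormatException ignored) {
            // not an integer; may still match the alternate key below
        }
        if (importByAlternateKey) {
            for (Map.Entry<Integer, String> e : labelsByRowId.entrySet()) {
                if (e.getValue().equals(value)) {
                    return e.getKey();         // matched on the alternate key
                }
            }
        }
        throw new IllegalArgumentException("Could not convert value '" + cell + "' for lookup");
    }
}
```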

Import Assay Data from a File

Currently only Standard assay designs are supported, under the General assay provider. Multi-run files and run re-imports are not supported by the file watcher.

The following file formats are supported (note that .txt files are not supported):

  • .xls, .xlsx, .csv, .tsv, .zip

The following assay data and metadata are supported by the file watcher:

  • result data
  • batch properties/metadata
  • run properties/metadata
  • plate properties/metadata

If only result data is being imported, you can use a single tabular file.

If you are also importing run metadata, you can use either a .zip file or a multi-sheet Excel file. In a .zip file, the system determines the data type (results, run metadata, etc.) from the file names; in a multi-sheet Excel file, it matches on the sheet names, which can appear in any order. The following matching criteria are used:

Data type        | For zipped files, use file name...      | For multi-sheet Excel, use sheet name...
batch properties | batchProperties.(tsv, csv, xlsx, xls)   | batchProperties
run properties   | runProperties.(tsv, csv, xlsx, xls)     | runProperties
results data     | results.(tsv, csv, xlsx, xls)           | results
plate metadata   | plateMetadata.json                      | not supported
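The matching rules in the table can be expressed as a small classifier: zip entries are matched by base file name plus an allowed extension, and Excel sheets by exact sheet name. This is a sketch of the table only; case-sensitive matching and the names used here are assumptions, since the text does not say how LabKey compares names.

```java
import java.util.Optional;
import java.util.Set;

// Sketch of the file-name and sheet-name matching shown in the table.
// Assumption: names and extensions are compared case-sensitively.
final class AssayPartClassifier {
    enum Part { BATCH_PROPERTIES, RUN_PROPERTIES, RESULTS, PLATE_METADATA }

    private static final Set<String> TABULAR = Set.of("tsv", "csv", "xlsx", "xls");

    static Optional<Part> forZipEntry(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) return Optional.empty();
        String base = fileName.substring(0, dot);
        String ext = fileName.substring(dot + 1);
        if (base.equals("plateMetadata")) {
            // Plate metadata is only accepted as JSON inside a zip.
            return ext.equals("json") ? Optional.of(Part.PLATE_METADATA) : Optional.empty();
        }
        if (!TABULAR.contains(ext)) return Optional.empty();
        return forSheet(base);
    }

    static Optional<Part> forSheet(String sheetName) {
        switch (sheetName) {
            case "batchProperties": return Optional.of(Part.BATCH_PROPERTIES);
            case "runProperties":   return Optional.of(Part.RUN_PROPERTIES);
            case "results":         return Optional.of(Part.RESULTS);
            default:                return Optional.empty(); // incl. plate metadata in Excel
        }
    }
}
```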

The following multi-sheet Excel file shows how to format results data and run properties fields on different sheets:

Configure Target Assay Design

The assay provider (currently only General is supported) and protocol (assay design name) can be specified in the file watcher configuration. This is easier to configure than binding to the protocol using a regular expression named capture group.

If there is no name capture group in the file pattern and there is a single assay protocol in the container, the system attempts to import into that single assay. If the target assay does not exist or cannot be determined, the filewatcher import will fail.
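The rules above for finding the target assay can be sketched as a resolution function. The text does not state which wins when both a configured protocol and a capture group are present, so the order below (configured protocol, then captured id, then captured name, then the single-assay fallback) is an assumption, as are all names. The sketch uses a Java 16+ record.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical order for choosing the target assay (illustrative only).
final class AssayTargetResolver {
    record Assay(int id, String name) {}

    static Assay resolve(List<Assay> assaysInContainer,
                         Optional<String> configuredProtocol,
                         Optional<String> capturedName,
                         Optional<Integer> capturedId) {
        if (configuredProtocol.isPresent()) {
            return byName(assaysInContainer, configuredProtocol.get());
        }
        if (capturedId.isPresent()) {
            int id = capturedId.get();
            return assaysInContainer.stream().filter(a -> a.id() == id).findFirst()
                    .orElseThrow(() -> new IllegalStateException("No assay with id " + id));
        }
        if (capturedName.isPresent()) {
            return byName(assaysInContainer, capturedName.get());
        }
        if (assaysInContainer.size() == 1) {
            return assaysInContainer.get(0);   // no capture group, single assay
        }
        // Import fails when the target cannot be determined.
        throw new IllegalStateException("Target assay cannot be determined");
    }

    private static Assay byName(List<Assay> assays, String name) {
        return assays.stream().filter(a -> a.name().equals(name)).findFirst()
                .orElseThrow(() -> new IllegalStateException("No assay named " + name));
    }
}
```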

Use Name Capture to Target Any Assay Design

When setting the File Pattern, you can use regular expression named capture groups, as with other file watcher types, to match assay names or IDs in the source file name.

Two capture groups can be used:

  • name: the assay protocol name (for example, MyAssay)
  • id: the system id of the target assay (an integer)

For example, this file name pattern:
assay_(?<name>.+)_.(xlsx|tsv|xls|zip)

will interpret the following file name as targeting an assay named "MyAssay":

assay_MyAssay_.xls

The following example file pattern uses the protocol ID instead of the assay name:

assayProtocol_(?<id>.+)_.(xlsx|tsv|xls|zip)

which will interpret this file as targeting the assay with protocol ID 308:

assayProtocol_308_.tsv
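Both patterns can be tried with java.util.regex. This sketch assumes the whole file name must match; note that (?<id>.+) accepts any text, so the integer check is done after matching, and that the greedy .+ keeps underscores inside the captured name (assay_My_Assay_.xls captures "My_Assay").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: applying the two documented assay file patterns.
// Assumption: the entire file name must match.
final class AssayFilePatterns {
    static final Pattern BY_NAME = Pattern.compile("assay_(?<name>.+)_.(xlsx|tsv|xls|zip)");
    static final Pattern BY_ID = Pattern.compile("assayProtocol_(?<id>.+)_.(xlsx|tsv|xls|zip)");

    /** Captured assay name, or null if the file does not match. */
    static String assayName(String fileName) {
        Matcher m = BY_NAME.matcher(fileName);
        return m.matches() ? m.group("name") : null;
    }

    /** Captured protocol id, or null if there is no match or it is not an integer. */
    static Integer protocolId(String fileName) {
        Matcher m = BY_ID.matcher(fileName);
        if (!m.matches()) return null;
        try {
            return Integer.valueOf(m.group("id"));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```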

Reload Lists Using Data File

This option is available in any folder type, provided the list module has been enabled. It imports data to existing lists from source files in either Excel (.xls/.xlsx) or TSV (.tsv) formats. It can also infer non-key column changes. Note that this task cannot create a new list definition: the list definition must already exist on the server.

You can reload lists from files in S3 storage by enabling an SQS queue and configuring cloud storage for use in your local folder.

Move Files Across the Server

This option is available in any folder type. It moves and/or copies files around the server without analyzing the contents of those files.

Import/Reload Study Datasets Using Data File

This option is available in a study folder. It loads data into existing study datasets, and it infers and creates datasets that don't already exist. Source data can be in TSV, Excel, or text files. You can configure the file watcher either to append the new data or to replace any existing data with it.

You can use a name capture group to identify the target dataset from a portion of the file name. You can also use a compound name capture group so that one file watcher can target multiple studies from the same location. Examples are available in this topic: File Watcher: File Name Patterns

If you don't use a name capture group, the system uses the entire file name stem as the dataset name. For example, dropping the following files into the watched location will load three datasets with these names:

Dropped File         | Dataset Loaded
Demographics.xls     | Demographics
LabResults.xls       | LabResults
New_LabResults.xls   | New_LabResults

To have a file like "New_LabResults.xls" reload new data into the LabResults dataset, you would need a name capture group that parsed out the <name> between the underscore and dot.
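The two naming behaviors can be compared side by side. In this sketch, the stem is taken as everything before the last dot (an assumption for names containing several dots), and NEW_PREFIX is a hypothetical pattern written for this example, not a LabKey default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: dataset name from the file name stem vs. from a name capture group.
final class DatasetNaming {
    // Hypothetical pattern: captures the text between "New_" and the extension.
    static final Pattern NEW_PREFIX = Pattern.compile("New_(?<name>.+)\\.(xls|xlsx|tsv)");

    static String stem(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    /** Pass null for "no capture group" to fall back to the stem. */
    static String datasetName(String fileName, Pattern patternOrNull) {
        if (patternOrNull == null) {
            return stem(fileName);
        }
        Matcher m = patternOrNull.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("File does not match the pattern: " + fileName);
        }
        return m.group("name");
    }
}
```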

Import Specimen Data Using Data File

This option is only available in study folders with the specimen module enabled.

This file watcher type accepts specimen data in both .zip and .tsv file formats:

  • .zip: The specimen archive zip file has a .specimens file extension.
  • .tsv: An individual specimens.tsv file which will typically be the simple specimen format and contain only vial information. This file will have a # specimens comment at the top.

By default, specimen data imported using a file watcher replaces existing data. To merge instead, set the custom property "mergeSpecimen" to true.
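The two accepted inputs can be told apart by file name and, for a .tsv, by the "# specimens" comment at the top. This sketch assumes the comment is the first non-blank line and compares it after trimming; the exact check LabKey performs is not described here, and the names are illustrative.

```java
import java.util.Optional;

// Sketch: distinguishing the two specimen inputs described above.
// Assumption: a simple-format .tsv starts with a "# specimens" line.
final class SpecimenInput {
    enum Kind { SPECIMEN_ARCHIVE, SIMPLE_TSV, UNSUPPORTED }

    static Kind classify(String fileName, String tsvContent) {
        if (fileName.endsWith(".specimens")) {
            return Kind.SPECIMEN_ARCHIVE;          // zipped specimen archive
        }
        if (fileName.endsWith(".tsv") && tsvContent != null) {
            Optional<String> first = tsvContent.lines().filter(l -> !l.isBlank()).findFirst();
            if (first.isPresent() && first.get().trim().equals("# specimens")) {
                return Kind.SIMPLE_TSV;
            }
        }
        return Kind.UNSUPPORTED;
    }
}
```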

Specimen module docs: Specimen Tracking (Legacy)

Import a Directory of FCS Files

Import flow files to the flow module. This type of file watcher is only available in Flow folders. It supports a process in which a number of users deposit FCS flow data in a common location. Each data export must be placed in a new, separate subdirectory of the watched folder. Once a subdirectory has been processed, adding new files to it will not trigger a flow import.

When the File Watcher finds a new subdirectory of FCS files, they can be placed into a new location under the folder pipeline root based on the current user and date. Example: @pipeline/${username}/${date('YYYY-MM')}. LabKey then imports the FCS data to that container. All FCS files within a single directory are imported as a single experiment run in the flow module.
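To illustrate how a destination such as @pipeline/${username}/${date('YYYY-MM')} turns into a concrete path, here is a sketch of the substitution. The token handling is an assumption, not LabKey's implementation. One caution applies to this sketch only: in java.time, the pattern letter Y means week-based year, which can differ from the calendar year around New Year, so the sketch maps YYYY to yyyy.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: expanding a destination template for one import (illustrative only).
final class FlowDestination {
    private static final Pattern DATE_TOKEN = Pattern.compile("\\$\\{date\\('([^']*)'\\)\\}");

    static String expand(String template, String pipelineRoot, String username, LocalDate date) {
        String s = template.replace("@pipeline", pipelineRoot)
                           .replace("${username}", username);
        Matcher m = DATE_TOKEN.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // java.time treats "YYYY" as week-based year, so use calendar "yyyy".
            String fmt = m.group(1).replace("YYYY", "yyyy");
            String formatted = date.format(DateTimeFormatter.ofPattern(fmt));
            m.appendReplacement(out, Matcher.quoteReplacement(formatted));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, expand("@pipeline/${username}/${date('YYYY-MM')}", "/data/pipeline", "jdoe", LocalDate.of(2024, 3, 15)) yields /data/pipeline/jdoe/2024-03 (the root and user name are made-up values).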

When defining a flow file watcher, be sure to set a long enough Quiet Period. When a new subdirectory is first created, the file watcher waits for the specified quiet period before processing files. This interval must be long enough for all of the files to be uploaded; otherwise, the file watcher imports only the files that exist at the end of the quiet period. For example, if you set a 1-minute quiet period but upload a folder of 18 FCS files (as in the tutorial example), only 14 files might have arrived by the end of the minute, and only those 14 will be imported into the run. If uploads take considerable time, you may decide to keep using a manual upload and import process to avoid the possibility of incomplete runs.
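The quiet-period arithmetic can be modeled directly: count the files that have arrived by the end of the window, and compare the window with the last arrival time. This follows the simplified description above (a fixed wait from when the subdirectory appears); the arrival timings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: the watcher waits the quiet period after the subdirectory
// appears, then imports whatever files exist. Times are seconds after creation.
final class QuietPeriodModel {

    static long filesImported(List<Integer> arrivalSeconds, int quietPeriodSeconds) {
        return arrivalSeconds.stream().filter(t -> t <= quietPeriodSeconds).count();
    }

    /** Smallest quiet period (seconds) under which every file would be included. */
    static int minimumQuietPeriod(List<Integer> arrivalSeconds) {
        return arrivalSeconds.stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        // 18 files, one every 4 seconds starting at 5 s (hypothetical timings).
        List<Integer> arrivals = new ArrayList<>();
        for (int i = 0; i < 18; i++) arrivals.add(5 + 4 * i);
        System.out.println(filesImported(arrivals, 60));   // 14
        System.out.println(minimumQuietPeriod(arrivals));  // 73
    }
}
```

In practice you would add a generous margin above the slowest expected upload rather than the bare minimum.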

Custom File Watcher Tasks

Custom Parameters

To add custom parameters, expand the Show Advanced Settings section of the Configuration panel, then click Add Custom Parameter for each one. To remove a parameter, click its delete icon.

allowDomainUpdates

This parameter, used in earlier versions, has been replaced by the Allow Domain Updates checkbox on the Configuration panel for the 'Reload Lists Using Data File' and 'Import/Reload Study Datasets Using Data File' tasks.

When updating lists and datasets, by default, the columns in the incoming data will overwrite the columns in the existing list or dataset. This means that any new columns in the incoming data will be added to the list and any columns missing from the incoming data will be dropped (and their data deleted).

To override this behavior, uncheck the Allow Domain Updates box to retain the column set of the existing list or dataset.
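The effect of the Allow Domain Updates checkbox on the column set can be sketched as follows. When checked, the incoming columns replace the existing ones, and existing columns missing from the file are dropped along with their data. When unchecked, the existing column set is kept; that extra incoming columns are then ignored is an assumption. The column names in the usage are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of how Allow Domain Updates affects the column set (illustrative only).
final class DomainUpdate {

    static List<String> resultingColumns(List<String> existing, List<String> incoming,
                                         boolean allowDomainUpdates) {
        return allowDomainUpdates ? new ArrayList<>(incoming) : new ArrayList<>(existing);
    }

    /** Existing columns that would be dropped, together with their data. */
    static List<String> droppedColumns(List<String> existing, List<String> incoming,
                                       boolean allowDomainUpdates) {
        List<String> dropped = new ArrayList<>();
        if (allowDomainUpdates) {
            for (String column : existing) {
                if (!incoming.contains(column)) dropped.add(column);
            }
        }
        return dropped;
    }
}
```

With existing columns ParticipantId, Weight, Notes and incoming columns ParticipantId, Weight, Height, a checked box drops Notes and adds Height; an unchecked box keeps the original three.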

default.action

The "default.action" parameter accepts the text value replace or append; the default is replace. It controls the default Action for the trigger, which can be set more conveniently using the Action options on the Configuration panel.

mergeData

Include this parameter with the value true or false to control whether data is merged. The default is false (replace); existing configurations that do not provide the parameter are interpreted as false (replace).

Where supported, merging can be more conveniently set using the Action options on the Configuration panel.

mergeSpecimen

By default, specimen data imported using the 'Import Specimen Data Using Data File' file watcher replaces existing data. To merge instead, set the property "mergeSpecimen" to true.

skipQueryValidation

'Reload Study' and 'Reload Folder Archive' can be configured to skip query validation by adding a custom parameter to the file watcher named 'skipQueryValidation' and setting it to 'TRUE'. This may be helpful if your file watcher reloads are failing due to unrelated query issues.
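The custom parameters above arrive as name/value pairs. The sketch below reads them from a map and applies the documented defaults (replace for default.action, false when mergeData or mergeSpecimen is absent, and no skipping unless skipQueryValidation is set). The case-insensitive parsing and the error for unknown actions are assumptions, and how LabKey combines these values with the Configuration panel settings is not modeled.

```java
import java.util.Locale;
import java.util.Map;

// Sketch: reading documented custom parameters with their stated defaults.
final class CustomParams {
    enum Action { REPLACE, APPEND }

    static Action defaultAction(Map<String, String> params) {
        String v = params.getOrDefault("default.action", "replace").trim().toLowerCase(Locale.ROOT);
        switch (v) {
            case "replace": return Action.REPLACE;
            case "append":  return Action.APPEND;
            default: throw new IllegalArgumentException("default.action must be replace or append: " + v);
        }
    }

    // For mergeData, mergeSpecimen, skipQueryValidation: a missing parameter
    // reads as false; Boolean.parseBoolean accepts "TRUE" and "true" alike.
    static boolean flag(Map<String, String> params, String name) {
        return Boolean.parseBoolean(params.getOrDefault(name, "false").trim());
    }
}
```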
