Premium Feature — This feature is available in the Professional, Professional Plus, and Enterprise Editions.

File Watchers let administrators set up the monitoring of directories on the file system and perform specific actions when desired files appear. Multiple file watchers can be set up to monitor a single directory for file changes, or each file watcher could watch a different location. When new or updated files appear in a monitored directory, a specified pipeline task (or set of tasks) will be triggered.

Each File Watcher can be configured to trigger only when specific file name patterns are detected, such as watching for '.xlsx' files. Use caution when defining multiple file watchers to monitor the same location: if their file name patterns are not sufficiently distinct, you may encounter conflicts among file watchers acting on the same files.

When files are detected, by default they are moved (not copied) to the LabKey folder's pipeline root where they are picked up for processing. (You can change this default behavior and specify that the files be moved to a different location.)

Create a File Watcher Listener

  • Navigate to the folder where you want the files to be imported, i.e. the destination in LabKey.
  • Open the File Watcher management UI:
    • Select (Admin) > Folder > Management. Click the Import tab and scroll down.
    • In a study folder, you can instead click the Manage tab, then click Manage File Watchers.
  • Depending on your project's enabled module set, options for trigger creation may vary. Click the desired link below Create a trigger to....
  • Manage File Watcher Triggers: Click to see the table of all currently configured file watchers.

Configure the Trigger

The two panels of the Create Pipeline Trigger wizard define a file watcher. Configuration options and on-screen guidance vary somewhat for each task type.

Details

  • Name: A unique name for the trigger.
  • Description: A description for the trigger.
  • Type: Currently only one value is supported: 'pipeline-filewatcher'.
  • Pipeline Task: The type of file watcher task you want to create. By default, the option you clicked to open this wizard is selected, but you can change this selection using the dropdown menu.
  • Run as username: The file watcher will run as this user in the pipeline. It is strongly recommended that this user have elevated permissions to perform updates, deletes, etc.
  • Assay Provider: Use this provider for running assay import runs.
  • Enabled: Turns on detection and triggering.
  • Click Next to move to the next panel.

Configuration

  • Location: File location to watch for uploadable files. This can be an absolute path on the server’s file system or a relative path under the container’s pipeline root.
  • Include child folders: A boolean indicating whether to seek uploadable files in subdirectories (currently to a max depth of 3).
  • File pattern: A Java regular expression that captures file names of interest and can extract and use information from the file name to set other properties. We recommend testing your pattern's behavior with a regex interpreter, such as https://regex101.com/, or with the local test sketch under File Pattern Options below. Options are described below.
  • Quiet period: Number of seconds to wait after file activity before executing a job (minimum is 1). If you encounter conflicts, particularly when running multiple filewatchers monitoring the same location, try increasing the quiet period.
  • Move to container: Move the file to this container before analysis. This must be a relative or absolute container path. If this field is blank and the watched file is already under a pipeline root, it will not be moved. If this field is blank but the watched file is elsewhere, it will be moved to the pipeline root of the current container. You must have at least Folder Administrator permissions in the folder to which files are moved.
  • Move to subdirectory: Move the file to this directory under the destination container's pipeline root. Leaving this blank defaults to the pipeline root itself. You must have at least Folder Administrator permissions in the folder to which files are moved.
  • Copy file to: Where the file should be copied before analysis. This can be absolute or relative to the current project/folder. You must have at least Folder Administrator permissions in the folder to which files are copied.
  • Parameter Function: Include a JavaScript function to be executed during the move. (See details below.)
  • Add custom parameter: These parameters will be passed to the chosen pipeline task for consumption in addition to the standard configuration.
  • Click Save when finished.

File Watcher Tasks

The configuration options for file watcher triggers, and the specific file types eligible, vary based on the type of file watcher; the tasks available vary by folder type.

Any Folder:
  • Reload folder archive: Reload an unzipped folder archive.
  • Reload lists using data file: Import data to existing lists from source files.
  • Move files across the server: Move and/or copy files around the server without analyzing their contents.
Study Folder:
  • Import/reload study datasets using data file: Create or reload study datasets from source files (either .tsv, Excel, or .txt files). This task can create dataset definitions if they do not already exist in the study.
  • Reload study: Reload or populate an entire study from a study archive. Imports datasets, lists, and study properties.
Flow Folder:
  • Import a directory of FCS files: Import flow files to the flow module.

Reload Folder Archive

This option is available in any folder type and reloads an unzipped folder archive. It will not accept compressed (.zip) folder archives.

The reloader expects a folder.xml in the base directory of the archive. To create an unzipped folder archive, export the folder to your browser as a .zip file and then unzip it.

Reload Lists Using Data File

This option is available in any folder type and imports data to existing lists from source files. It can also infer non-key column changes. Note that this task cannot create a new list definition: the list definition must already exist on the server. The list module must also be enabled in the folder.

By default, this task replaces list data using the contents of the source files. You may merge data by including the custom parameter: "mergeData": "true".

Move Files Across the Server

This option is available in any folder type. It moves and/or copies files around the server without analyzing the contents of those files.

Import/Reload Study Datasets Using Data File

This option is available in a study folder and creates and/or loads data into study datasets from source files. Source data can be in TSV, Excel, or text files. This task can also create dataset definitions if they do not already exist in the study.

Note that merging dataset data is not supported; only truncate-and-replace is available. Upon reload, the entire dataset design and data are replaced. Dataset columns are added or removed depending on the columns found in the Excel/TSV/text source file.

Reload Study

This option reloads or populates a study from an unzipped study archive. The task is triggered by a .txt file (i.e., studyload.txt) and reloads the study's datasets as well as study configurations such as cohort assignments. To create a study archive, see Export a Study.

Import a Directory of FCS Files

Import flow files to the flow module.

File Pattern Options

File patterns define which files are selected by the reloader based on their names. Use file patterns in conjunction with the target container path to load a subset of files to a particular folder on the server.

No file pattern / Default file pattern

If no file pattern is supplied, the default pattern is used:

(^\D*).(?:tsv|txt|xls|xlsx)

This pattern matches only file names that contain no digits (for example: Dataset_A.tsv). File names that include digits (for example: Dataset_1.tsv) are not matched, and their data will not be loaded.

If you want to target datasets that have digits in their names, use a "name capture group" as the file pattern. See below for details.

Under the default file pattern, the following reloading behavior will occur:

File Name            File Watcher Behavior
DemographicsA.tsv    File matched, data loaded into dataset DemographicsA.
DemographicsB.tsv    File matched, data loaded into dataset DemographicsB.
Demographics1.tsv    No file match, data will not be loaded.
Demographics2.tsv    No file match, data will not be loaded.
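
As a quick local check of this behavior, the sketch below runs the default pattern against the file names above using java.util.regex. The class name is illustrative, and it assumes the watcher requires the pattern to match the entire file name; swap in your own candidate pattern and names to test other cases.

import java.util.regex.Pattern;

public class DefaultPatternCheck {
    public static void main(String[] args) {
        // The default file pattern from above; backslashes are doubled in the Java string literal.
        Pattern pattern = Pattern.compile("(^\\D*).(?:tsv|txt|xls|xlsx)");
        String[] fileNames = {
            "DemographicsA.tsv", "DemographicsB.tsv",
            "Demographics1.tsv", "Demographics2.tsv"
        };
        for (String fileName : fileNames) {
            boolean matched = pattern.matcher(fileName).matches();
            System.out.println(fileName + " -> " + (matched ? "matched" : "no match"));
        }
    }
}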

User-defined pattern

You can use any regex pattern to select source files for reloading. For example, suppose you have the following three source files:

FooStudy_Demographics.tsv
FooStudy_LabResults.tsv
BarStudy_Demographics.tsv

The regex file pattern...

FooStudy_(.+).(tsv)

will result in the following behavior...

File Name                    File Watcher Behavior
FooStudy_Demographics.tsv    File matched, data loaded into dataset FooStudy_Demographics.
FooStudy_LabResults.tsv      File matched, data loaded into dataset FooStudy_LabResults.
BarStudy_Demographics.tsv    No file match, data will not be loaded.

"Name Capture Group" pattern

This type of file pattern extracts names or IDs from the source file name and targets an existing dataset with the same name or id. For example, suppose you have a source file with the following name:

dataset_Demographics_.xls

The following file pattern extracts the value <name> from the file name, in this case the string "Demographics" that occurs between the underscore characters, and loads data into an existing dataset with the same name "Demographics".

dataset_(?<name>.+)_.(xlsx|tsv|xls)

Note that you can use the technique above to target datasets that include numbers in their names. For example, using the pattern above, the following behavior will result.

File Name                  File Watcher Behavior
dataset_Demographics_.tsv  File matched, data loaded into dataset Demographics.
datasetDemographics.tsv    No file match, data will not be loaded.
dataset_LabResults1_.tsv   File matched, data loaded into dataset LabResults1.
dataset_LabResults2_.tsv   File matched, data loaded into dataset LabResults2.

To target a dataset by its dataset id rather than its name, use the following regex, where <id> refers to the dataset id. You can determine a dataset's id by navigating to your study's Manage tab and clicking Manage Datasets. The table of existing datasets shows the id for each dataset in the first column.

dataset_(?<id>.+)_.(xlsx|tsv|xls)
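
To see the extraction itself, the sketch below applies the <name> pattern from above to the example file names using java.util.regex. The class name is illustrative, and a full-name match is assumed as before.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameCaptureCheck {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("dataset_(?<name>.+)_.(xlsx|tsv|xls)");
        String[] fileNames = {
            "dataset_Demographics_.tsv", "dataset_LabResults1_.tsv", "datasetDemographics.tsv"
        };
        for (String fileName : fileNames) {
            Matcher m = pattern.matcher(fileName);
            if (m.matches()) {
                // The named group identifies the existing dataset to load into.
                System.out.println(fileName + " -> dataset " + m.group("name"));
            } else {
                System.out.println(fileName + " -> no match");
            }
        }
    }
}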

Examples

Example: Dataset Creation and Import

Suppose you want to create a set of datasets based on Excel and TSV files, and load data into those datasets. To set this up, do the following:

  • Prepare your Excel/TSV files to match the expectations of your study, especially the timepoint style (date or visit), the ParticipantId column name, and the time column name.
  • Copy the Excel/TSV files to a location available to the File Watcher. You can do this by either (1) copying the file to the server's machine or (2) uploading the file into the study's File Repository.
  • Create a trigger to Import/reload study datasets using data file.
  • Location: point the trigger at your directory of files.
  • When the trigger is enabled, datasets will be created and loaded in your study.

Example: FCS Files

Consider a process where FCS flow data is deposited in a common location by a number of users, with each data export placed into a subdirectory of the watched folder, perhaps in a separate subdirectory per user.

When the File Watcher finds these files, they are placed into a new location under the folder pipeline root based on the current user and date. Example: @pipeline/${username}/${date('YYYY-MM')}

LabKey then imports the FCS data to that container. All FCS files within a single directory are imported as a single experiment run in the flow module.

Example: File Name Pattern Matching

Consider a set of data with original filenames matching a format like this: "sample_<timestamp>_<study_id>.xml", for example:

sample_2017-09-06_study20.xml

An example file pattern regular expression that would capture such file names is:

sample_(.+)_(?<study>.+).xml

If the specified pattern matches a file placed in the watched location, then the specified move and/or execute steps will be performed on that file. Nothing will happen to files in the watched location which do not match the pattern.

If the regular expression contains named capturing groups, such as the "(?<study>.+)" portion in the example above, then the corresponding value (in this example, "study20") can be substituted into other property expressions. For instance, a move setting of:

/studies/${study}/@pipeline/import/${now:date}

would resolve into:

/studies/study20/@pipeline/import/2017-11-07 (or similar)
This substitution allows the administrator to configure the file watcher to automatically determine the destination folder based on the name, ensuring that the data is uploaded to the correct location.
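
The following sketch reproduces this match-and-substitute flow with java.util.regex. The substitution code is illustrative only; on the server, LabKey resolves the tokens itself, including ones the regex does not supply, such as ${now:date}.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MoveSubstitutionCheck {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("sample_(.+)_(?<study>.+).xml");
        Matcher m = pattern.matcher("sample_2017-09-06_study20.xml");
        if (m.matches()) {
            // Substitute the captured group into the move setting,
            // leaving ${now:date} for the server to resolve.
            String moveSetting = "/studies/${study}/@pipeline/import/${now:date}";
            String resolved = moveSetting.replace("${study}", m.group("study"));
            System.out.println(resolved); // /studies/study20/@pipeline/import/${now:date}
        }
    }
}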

Example: Using the Parameter Function

The Parameter Function is a JavaScript function which is executed during the move. In the example below, the username is selected programmatically:

// sourcePath is the java.nio.file.Path of the file that triggered the watcher.
// If the path has at least one element, use its first element as the username.
var userName = sourcePath.getNameCount() > 0 ? sourcePath.getName(0) : null;
// Return a map of parameter values for the pipeline job.
var ret = {'pipeline, username': userName};
ret;
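
In this example, the first element of the watched file's path (for instance, a per-user subdirectory under the watched location) is returned as the 'pipeline, username' parameter, which the pipeline job can then consume.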
