This topic includes some examples of using File Watchers.
Suppose you want to create a set of datasets based on Excel and TSV files, and load data into those datasets. To set this up, configure a File Watcher with the following settings:
Field | Value |
---|---|
Name | Load MyStudy |
Description | Imports datasets to MyStudy |
Type | Pipeline File Watcher |
Pipeline Task | Import/reload study datasets using data file. |
Location to Watch | watched |
File Pattern | |
Move to container | |
Move to subdirectory | |
Consider a set of data files whose original names follow a format like this:
sample_2017-09-06_study20.tsv
sample_2017-09-06_study21.tsv
sample_2017-09-06_study22.tsv
An example File Pattern regular expression that matches such filenames is:
sample_(.+)_(?<study>.+)\.tsv
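The following JavaScript sketch applies this pattern to the sample filenames and prints the captured values. It assumes that a filename must match the pattern in full, which the sketch enforces by adding ^ and $ anchors; the extra filename notes.txt is hypothetical and only shows a non-match.

```javascript
// File Pattern from the example above, anchored with ^ and $ so that the
// whole filename must match (an assumption of this sketch).
const filePattern = /^sample_(.+)_(?<study>.+)\.tsv$/;

const filenames = [
  "sample_2017-09-06_study20.tsv",
  "sample_2017-09-06_study21.tsv",
  "sample_2017-09-06_study22.tsv",
  "notes.txt", // hypothetical non-matching file
];

for (const filename of filenames) {
  const match = filePattern.exec(filename);
  if (match === null) {
    console.log(`${filename}: no match, ignored`);
  } else {
    // match[1] is the unnamed date group; match.groups.study is the named group
    console.log(`${filename}: date=${match[1]}, study=${match.groups.study}`);
  }
}
// sample_2017-09-06_study20.tsv: date=2017-09-06, study=study20
// sample_2017-09-06_study21.tsv: date=2017-09-06, study=study21
// sample_2017-09-06_study22.tsv: date=2017-09-06, study=study22
// notes.txt: no match, ignored
```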
Files that match the pattern are acted upon, for example moved and/or imported into tables on the server. Nothing happens to files that do not match the pattern.
If the regular expression contains named capturing groups, such as the "(?<study>.+)" portion in the example above, then the corresponding value (in this example "study20") can be substituted into other property expressions. For instance, a Move to container setting of:
/studies/${study}/@pipeline/import/${now:date}
would resolve, for the file sample_2017-09-06_study20.tsv processed on November 7, 2017, to:
/studies/study20/@pipeline/import/2017-11-07
This substitution lets the administrator determine the destination folder from the filename, ensuring that the data is uploaded to the correct location. The following File Watcher uses this pattern and Move to container setting:
Field | Value |
---|---|
Name | Load StudyA |
Description | Moves and imports datasets to StudyA |
Type | Pipeline File Watcher |
Pipeline Task | Import/reload study datasets using data file. |
Location | . |
File Pattern | sample_(.+)_(?<study>.+)\.tsv |
Move to container | /studies/${study}/@pipeline/import/${now:date} |
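To illustrate the substitution, here is a minimal JavaScript sketch. The helper resolveTemplate is hypothetical, not a LabKey API: it replaces each ${name} token with the value of the matching named group, and ${now:date} with a date in yyyy-MM-dd form, a format assumed from the example output above. A fixed date is passed in so the result is reproducible.

```javascript
// Hypothetical helper: resolves ${name} tokens from named capture groups and
// ${now:date} as the given date in yyyy-MM-dd form (assumed format).
function resolveTemplate(template, groups, now) {
  const pad = (n) => String(n).padStart(2, "0");
  const today = `${now.getFullYear()}-${pad(now.getMonth() + 1)}-${pad(now.getDate())}`;
  return template.replace(/\$\{([^}]+)\}/g, (token, key) => {
    if (key === "now:date") return today;
    if (groups && key in groups) return groups[key];
    return token; // leave unknown tokens untouched
  });
}

const filePattern = /^sample_(.+)_(?<study>.+)\.tsv$/;
const match = filePattern.exec("sample_2017-09-06_study20.tsv");
const target = resolveTemplate(
  "/studies/${study}/@pipeline/import/${now:date}",
  match.groups,
  new Date(2017, 10, 7) // months are zero-based: 10 = November
);
console.log(target); // /studies/study20/@pipeline/import/2017-11-07
```

Leaving unknown tokens in place, rather than dropping them, makes a misspelled group name easy to spot in the resolved path.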
The following File Watcher matches .tsv and .xls files whose names begin with "StudyA_", such as "StudyA_LabResults.tsv". Matching files are moved to the StudyA folder and their data is imported there. The <name> capture group determines the dataset name, so "StudyA_LabResults.tsv" becomes the dataset "LabResults".
Field | Value |
---|---|
Name | Load StudyA |
Description | Moves and imports datasets to StudyA |
Type | Pipeline File Watcher |
Pipeline Task | Import/reload study datasets using data file. |
Location | . |
File Pattern | StudyA_(?<name>.+)\.(?:tsv|xls) |
Move to container | StudyA |
Move to subdirectory | imported |
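A quick way to check which files this pattern picks up, and which dataset names they produce, is a sketch like the following. As before, the ^ and $ anchors reflect an assumption that the whole filename must match, and datasetNameFor is a hypothetical helper.

```javascript
// File Pattern from the "Load StudyA" example, anchored so the whole
// filename must match (an assumption of this sketch).
const studyAPattern = /^StudyA_(?<name>.+)\.(?:tsv|xls)$/;

// Returns the dataset name a file would map to, or null if it is ignored.
function datasetNameFor(filename) {
  const match = studyAPattern.exec(filename);
  return match ? match.groups.name : null;
}

console.log(datasetNameFor("StudyA_LabResults.tsv"));   // LabResults
console.log(datasetNameFor("StudyA_Demographics.xls")); // Demographics
console.log(datasetNameFor("StudyA_Notes.txt"));        // null (extension not listed)
console.log(datasetNameFor("StudyB_LabResults.tsv"));   // null (wrong prefix)
```

The (?:tsv|xls) group is non-capturing, so it restricts the extension without adding an entry to match.groups.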
Suppose you want to distribute files like the following to different study folders, based on the study name that prefixes each filename:
StudyA_Demographics.tsv
StudyB_Demographics.tsv
StudyA_LabResults.tsv
StudyB_LabResults.tsv
Field | Value |
---|---|
Name | Load datasets |
Location | watched |
File Pattern | (?<study>.+)_(?<name>.+)\.tsv |
Move to container | ${study} |
Move to subdirectory | imported |
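The sketch below applies this pattern to the four example files and groups the resulting dataset names by the study folder that ${study} resolves to. It is plain JavaScript for illustration and, like the earlier sketches, assumes whole-filename matching.

```javascript
// File Pattern from the "Load datasets" example, anchored so the whole
// filename must match (an assumption of this sketch).
const routePattern = /^(?<study>.+)_(?<name>.+)\.tsv$/;

const files = [
  "StudyA_Demographics.tsv",
  "StudyB_Demographics.tsv",
  "StudyA_LabResults.tsv",
  "StudyB_LabResults.tsv",
];

// Group dataset names by the container each file would be moved to
// ("Move to container" is ${study}).
const byContainer = new Map();
for (const file of files) {
  const match = routePattern.exec(file);
  if (match === null) continue; // non-matching files are left alone
  const { study, name } = match.groups;
  if (!byContainer.has(study)) byContainer.set(study, []);
  byContainer.get(study).push(name);
}

for (const [study, datasets] of byContainer) {
  console.log(`${study}: ${datasets.join(", ")}`);
}
// StudyA: Demographics, LabResults
// StudyB: Demographics, LabResults
```

Because both groups use the greedy .+, a filename with more than one underscore splits at the last one: StudyA_Lab_Results.tsv would yield study StudyA_Lab and dataset Results. A narrower group such as (?<study>[^_]+) would limit the study to the text before the first underscore.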
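The Parameter Function is a JavaScript function that is executed during the move. In the example below, the username is selected programmatically from the first element of the file's source path:
// Use the first element of sourcePath, if there is one, as the user name
var userName = sourcePath.getNameCount() > 0 ? sourcePath.getName(0) : null;
// Build the parameter object; the final expression (ret) is the result
var ret = {'pipeline, username': userName }; ret;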
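Because sourcePath is supplied by the server, one way to try out the function's logic locally is to substitute a stand-in object. In this sketch, mockPath and parameterFunction are hypothetical names, and the mock implements only getNameCount and getName, returning plain strings where the server's object may return path objects. It also assumes sourcePath is relative to the watched location, so that a file dropped into a per-user subdirectory (here the hypothetical jsmith) yields that user's name.

```javascript
// Hypothetical stand-in for the server-provided sourcePath object; it mimics
// only the two methods the parameter function calls.
function mockPath(...segments) {
  return {
    getNameCount: () => segments.length,
    getName: (index) => segments[index],
  };
}

// Same logic as the parameter function above, wrapped in a function so it
// can be run outside the server.
function parameterFunction(sourcePath) {
  var userName = sourcePath.getNameCount() > 0 ? sourcePath.getName(0) : null;
  return { 'pipeline, username': userName };
}

const params = parameterFunction(mockPath("jsmith", "StudyA_LabResults.tsv"));
console.log(params['pipeline, username']); // jsmith
console.log(parameterFunction(mockPath())['pipeline, username']); // null
```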