Scripting pipelines with more than one input file

LabKey Support Forum (Inactive)
kevink  2016-10-04 10:23
Status: Closed
 
Unfortunately, for historical reasons, a pipeline task's input files are identified by file extension rather than by the actual input file name. If input2.csv is changed to input2.tsv, I believe your example will work, but the task will then expect a single csv file and a single tsv file as inputs.

By default, when there are multiple files for a single input name, we launch the task multiple times, once for each input file. In your case, however, you'd like to have multiple input files sent to a single task. To do this, add splitFiles="true" to the "input.csv" input in the task XML. Ideally we would expand the ${input.csv} token in the R script into a list of all the selected input files; however, we haven't implemented that yet. As a workaround, you will need to read the taskInfo.tsv file to get the list of input files for the input.csv token. Here is some example code you can use:

jobInfo <- read.table("${pipeline, taskInfo}",
                      col.names=c("name", "value"),
                      header=FALSE, check.names=FALSE,
                      stringsAsFactors=FALSE, sep="\t", quote="",
                      fill=TRUE, na.strings="")

# collect all input files
inputFiles <- jobInfo$value[ grep("input\\.csv", jobInfo$name) ]
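
Once you have inputFiles, you will probably want to load each file and combine them. Below is a rough, self-contained sketch you can run outside the pipeline to try the idea. It is not taken from an actual pipeline: the mock taskInfo.tsv contents, the file names a.csv and b.csv, the id and score columns, and the sourceFile column are all made up for illustration, and I am assuming the real file has one name/value row per selected input. Inside a real script you would read "${pipeline, taskInfo}" instead of the temporary file.

```r
# Hypothetical setup: fake the files the pipeline would normally provide.
workDir <- tempfile("taskinfo-demo")
dir.create(workDir)

# Two small CSV inputs (made-up data) standing in for the selected files.
write.csv(data.frame(id = 1:2, score = c(0.5, 0.7)),
          file.path(workDir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(0.9, 0.1)),
          file.path(workDir, "b.csv"), row.names = FALSE)

# Mock taskInfo.tsv: tab-separated name/value rows (assumed layout).
taskInfoPath <- file.path(workDir, "taskInfo.tsv")
writeLines(c(
  paste("input.csv", file.path(workDir, "a.csv"), sep = "\t"),
  paste("input.csv", file.path(workDir, "b.csv"), sep = "\t"),
  paste("output.tsv", file.path(workDir, "out.tsv"), sep = "\t")
), taskInfoPath)

# Same parsing as above, pointed at the mock file instead of the token.
jobInfo <- read.table(taskInfoPath,
                      col.names = c("name", "value"),
                      header = FALSE, check.names = FALSE,
                      stringsAsFactors = FALSE, sep = "\t", quote = "",
                      fill = TRUE, na.strings = "")

# collect all input files registered under the input.csv name
inputFiles <- jobInfo$value[grep("input\\.csv", jobInfo$name)]

# Stop early if nothing matched or a listed file is missing.
stopifnot(length(inputFiles) > 0, all(file.exists(inputFiles)))

# Read each file, remember where each row came from, and stack the rows.
combined <- do.call(rbind, lapply(inputFiles, function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  df$sourceFile <- basename(path)
  df
}))
```

Note that rbind only works if every input file has the same columns. If yours differ, you would need to merge or align them some other way.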


The HIPC group at Fred Hutch has a pipeline script on GitHub that uses multiple input files:

https://github.com/RGLab/LabKeyModules/blob/master/HIPCMatrix/pipeline/tasks/create-matrix.r
https://github.com/RGLab/LabKeyModules/blob/master/HIPCMatrix/pipeline/tasks/create-matrix.task.xml