Scripting pipelines with more than one input file | kevink | 2016-10-04 10:23
Status: Closed
Unfortunately, for historical reasons, the pipeline task's input files are identified by file extension rather than by the actual input file name. If input2.csv is changed to input2.tsv, I believe your example will work, but the task will then expect a single .csv file and a single .tsv file as inputs.

By default, when there are multiple files for a single input name, we launch the task multiple times -- once for each input file. In your case, however, you'd like multiple input files sent to a single task. To do this, add splitFiles="true" to the "input.csv" input in the task xml (a rough sketch of that change is at the end of this reply).

Ideally we would expand the ${input.csv} token in the R script into a list of all the selected input files; however, we haven't implemented that yet. As a workaround, you will need to read the taskInfo.tsv file to get the list of input files for the input.csv token. Here is some example code you can use:

    jobInfo <- read.table("${pipeline, taskInfo}",
                          col.names=c("name", "value"),
                          header=FALSE, check.names=FALSE,
                          stringsAsFactors=FALSE,
                          sep="\t", quote="", fill=TRUE, na.strings="")

    # collect all input files
    inputFiles <- jobInfo$value[ grep("input\\.csv", jobInfo$name) ]

The HIPC group at the Fred Hutch has a pipeline script on GitHub that uses multiple input files:

https://github.com/RGLab/LabKeyModules/blob/master/HIPCMatrix/pipeline/tasks/create-matrix.r
https://github.com/RGLab/LabKeyModules/blob/master/HIPCMatrix/pipeline/tasks/create-matrix.task.xml
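To illustrate the splitFiles change, here is a minimal sketch of what the input declaration might look like. The element and attribute names other than splitFiles are placeholders -- copy the actual structure from your existing task xml (the create-matrix.task.xml linked above is a working reference):

    <!-- illustrative only: mirror the structure of your existing task xml;
         the key change is splitFiles="true" on the input.csv file input -->
    <inputs>
        <file name="input.csv" required="true" splitFiles="true"/>
    </inputs>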
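Once the R script has the inputFiles vector from the snippet above, it can iterate over the paths however it needs to. As a rough sketch only -- assuming the selected files are plain CSVs that all share the same columns, which may not match your data -- you could read and stack them like this:

    # read each selected input file and combine the rows into one data frame
    # (assumes identical column layouts across files)
    allData <- do.call(rbind, lapply(inputFiles, read.csv, stringsAsFactors=FALSE))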