pipeline configuration when trying to accept multiple file extensions | Ben Bimber | 2011-04-26 08:13 |
Status: Closed | ||
We have a pipeline defined in the sequenceAnalysis module. I'd like to enable this pipeline to execute on 4 different input file extensions. At this point I'm open to ugly workarounds, like creating 4 distinct pipeline definitions. Here is what I have unsuccessfully tried thus far:

1. I tried using this XML, patterned off microarray:

<!--<constructor-arg>-->
<!--<list>-->
<!--<value>.sff</value>-->
<!--<value>.fastq</value>-->
<!--<value>.fasta</value>-->
<!--<value>.fa</value>-->
<!--<value>.fq</value>-->
<!--</list>-->
<!--</constructor-arg>-->
<!--<constructor-arg value=".sff"/>-->

If I use this, the pipeline will recognize all those extensions (i.e. LabKey will allow me to initiate a pipeline job). The problem is that this pipeline executes on a different server. The first step of such a pipeline involves copying the input file from the LabKey server to the client server. If I use the above XML, LabKey always appends '.sff' to the input file's name. It therefore fails because 'input.fastq.sff' does not exist (assuming the input file originally had a .fastq extension). Example error log here: https://xnight.primate.wisc.edu:8443/labkey/pipeline-status/WNPRC/WNPRC_Units/Research_Services/Research_Computing/SequenceAnalysis/showFile.view?rowId=356&filename=flu.fastq.log

2. I tried duplicating the "org.labkey.api.pipeline.TaskPipelineRegistrar" class once per file extension. I gave each instance a distinct initialInputExt property. When I did this, the last instance of TaskPipelineRegistrar seemed to stomp on the earlier ones. The pipeline worked for that last file extension, but not for the earlier ones.

3. I tried duplicating the org.labkey.api.pipeline.file.FileAnalysisTaskPipelineSettings block within a single TaskPipelineRegistrar. Same outcome as above.

I looked through Subversion and found a few examples of other pipelines (microarray and ms2) that do similar things, and I have a few questions:

1. microarrayContext.xml uses multiple file extensions. It's the only example I saw where a pipeline did this. It has:

<bean id="tiffFileType" class="org.labkey.api.util.FileType">
<constructor-arg>
<list>
<value>.tiff</value>
<value>.tif</value>
</list>
</constructor-arg>
<constructor-arg value=".tiff"/>
</bean>

When I try something similar for sequencing, I get an error. I hit that error only because I am running this pipeline split between 2 servers. Do existing microarray pipelines always execute on the same server as LabKey? If none of these use 2 servers, then plausibly this was just never an issue before. If not, I'm confused why I get an error specifically for this pipeline.

2. In the pipeline config you originally sent, the pipeline property of the TaskPipelineRegistrar used the class "org.labkey.api.pipeline.file.FileAnalysisTaskPipelineSettings" instead of TaskPipelineSettings. The latter is used by several other pipelines. In fact, ms2context.xml actually has multiple beans for org.labkey.api.pipeline.TaskPipelineSettings in the same file, which is what Josh originally suggested. I tried that, but only the last bean was recognized. What's the difference between TaskPipelineSettings and FileAnalysisTaskPipelineSettings?

Are there other things I should be trying? Are there other ways to just duplicate the whole pipeline config and create one pipeline config per file extension?

Thanks for any help. |
||
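For readability, the first attempt described in the post can be laid out as a complete bean definition, following the shape of the tiffFileType bean quoted from microarrayContext.xml. This is a sketch only: the bean id "sequenceFileType" is a hypothetical placeholder, and wrapping the constructor arguments in an org.labkey.api.util.FileType bean is an assumption based on the microarray example, not a confirmed working configuration.

```xml
<!-- Sketch only. The id "sequenceFileType" is hypothetical; the class and
     constructor arguments mirror the microarray tiffFileType example. -->
<bean id="sequenceFileType" class="org.labkey.api.util.FileType">
    <!-- First argument: every suffix the pipeline should accept as input -->
    <constructor-arg>
        <list>
            <value>.sff</value>
            <value>.fastq</value>
            <value>.fasta</value>
            <value>.fa</value>
            <value>.fq</value>
        </list>
    </constructor-arg>
    <!-- Second argument: the single default suffix. Per the post, this is the
         suffix that ends up appended to the input file name when the file is
         copied to the remote server (e.g. 'input.fastq.sff'). -->
    <constructor-arg value=".sff"/>
</bean>
```

As reported in the post, a definition of this shape lets the job be initiated for any of the listed extensions, but the remote copy step then looks for a file carrying the default suffix, which is the failure being asked about.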