Pipeline configuration when trying to accept multiple file extensions

LabKey Support Forum (Inactive)
Ben Bimber  2011-04-26 08:13
Status: Closed
 
We have a pipeline defined in the sequenceAnalysis module, and I'd like it to run on four different input file extensions. At this point I'm open to ugly workarounds, such as creating four distinct pipeline definitions. Here is what I have tried so far, without success:

1. I tried this XML, patterned on the microarray module's configuration:

        <!-- constructor arguments for a multi-extension org.labkey.api.util.FileType bean -->
        <constructor-arg>
            <!-- every extension the pipeline should accept -->
            <list>
                <value>.sff</value>
                <value>.fastq</value>
                <value>.fasta</value>
                <value>.fa</value>
                <value>.fq</value>
            </list>
        </constructor-arg>
        <!-- default extension -->
        <constructor-arg value=".sff"/>

With this configuration, the pipeline recognizes all of those extensions (i.e., LabKey lets me start a pipeline job on any of them). The problem is that this pipeline executes on a different server, and its first step copies the input file from the LabKey server to that remote server. With the above XML, LabKey always appends '.sff' to the input file's name, so for an input named input.fastq the copy fails because 'input.fastq.sff' does not exist. Example error log:

https://xnight.primate.wisc.edu:8443/labkey/pipeline-status/WNPRC/WNPRC_Units/Research_Services/Research_Computing/SequenceAnalysis/showFile.view?rowId=356&filename=flu.fastq.log

2. I tried duplicating the org.labkey.api.pipeline.TaskPipelineRegistrar bean once per file extension, giving each copy a distinct initialInputExt property. The last TaskPipelineRegistrar instance seemed to overwrite the earlier ones: the pipeline worked for that instance's file extension, but not for the others.

3. I tried duplicating the org.labkey.api.pipeline.file.FileAnalysisTaskPipelineSettings block within a single TaskPipelineRegistrar, with the same outcome.
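In the first attempt, a single FileType lists all five extensions but declares `.sff` as its default, which matches the `.sff` that ends up appended to the file name. A minimal sketch of an alternative defines one FileType bean per extension, so that each pipeline definition can reference a type whose only suffix is its own. It assumes org.labkey.api.util.FileType accepts a single extension as its sole constructor argument; the bean ids are hypothetical.

```xml
<!-- Sketch only: these beans go inside the module context's <beans> element.
     Bean ids are hypothetical; assumes FileType has a single-suffix constructor. -->
<bean id="sffFileType" class="org.labkey.api.util.FileType">
    <constructor-arg value=".sff"/>
</bean>
<bean id="fastqFileType" class="org.labkey.api.util.FileType">
    <constructor-arg value=".fastq"/>
</bean>
<bean id="fqFileType" class="org.labkey.api.util.FileType">
    <constructor-arg value=".fq"/>
</bean>
<bean id="fastaFileType" class="org.labkey.api.util.FileType">
    <constructor-arg value=".fasta"/>
</bean>
<bean id="faFileType" class="org.labkey.api.util.FileType">
    <constructor-arg value=".fa"/>
</bean>
```

Keeping `.fa` and `.fasta` (and `.fq` and `.fastq`) in separate beans avoids reintroducing a shared default suffix for either pair.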



I looked through Subversion and found a few other pipelines (microarray and ms2) that do similar things, and I have a few questions:

1. microarrayContext.xml uses multiple file extensions; it's the only example I found of a pipeline doing this. It has:

    <bean id="tiffFileType" class="org.labkey.api.util.FileType">
        <constructor-arg>
            <list>
                <value>.tiff</value>
                <value>.tif</value>
            </list>
        </constructor-arg>
        <constructor-arg value=".tiff"/>
    </bean>

When I try something similar for sequencing, I get an error, and I hit it only because this pipeline is split between two servers. Do the existing microarray pipelines always execute on the same server as LabKey? If none of them use two servers, then plausibly this was simply never an issue before. If some do, I'm confused about why I get the error only for this pipeline.

2. In the pipeline config you originally sent, the pipeline property of the TaskPipelineRegistrar used the class org.labkey.api.pipeline.file.FileAnalysisTaskPipelineSettings rather than TaskPipelineSettings, which several other pipelines use. In fact, ms2context.xml has multiple org.labkey.api.pipeline.TaskPipelineSettings beans in the same file, which is what Josh originally suggested. I tried that, but only the last bean was recognized. What's the difference between TaskPipelineSettings and FileAnalysisTaskPipelineSettings?

Is there anything else I should try? Alternatively, is there a way to duplicate the whole pipeline configuration, creating one configuration per file extension?

Thanks for any help.
 
 
jeckels responded:  2011-04-26 10:14
Hi Ben,

I just tried modifying the sequenceanalysisContext.xml that's checked into trunk. By using multiple FileAnalysisTaskPipelineSettings beans within a single TaskPipelineRegistrar, I was able to get actions wired up for the different extensions, although I'm not currently set up to actually run it.

Can you try the attached version of the file?

Thanks,
Josh
 
Ben Bimber responded:  2011-04-26 10:48
Thanks, Josh. I had to modify the analyzeURL property to reflect the new names, but beyond that it seems to be working.
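The arrangement Josh describes, combined with the analyzeURL change, might look roughly like the sketch below: one TaskPipelineRegistrar containing one FileAnalysisTaskPipelineSettings bean per extension, each with its own initialInputExt and analyzeURL. Treat it as an outline under assumptions, not the contents of the attached file. The registrar's `pipelines` property name, the use of the constructor argument as the pipeline name, initialInputExt taking a FileType bean reference, the pipeline names, the analyzeURL values, and the referenced FileType bean ids (assumed to be defined elsewhere in the context) are all hypothetical, and the remaining settings from the existing sequenceanalysisContext.xml are omitted.

```xml
<!-- Sketch only: the "pipelines" property name and the constructor-arg-as-name
     are assumptions; every name, id, and URL below is hypothetical. -->
<bean class="org.labkey.api.pipeline.TaskPipelineRegistrar">
    <property name="pipelines">
        <list>
            <!-- one settings bean per input extension, all in the same registrar -->
            <bean class="org.labkey.api.pipeline.file.FileAnalysisTaskPipelineSettings">
                <!-- assumed: a distinct pipeline name for this copy -->
                <constructor-arg value="sequenceFastqPipeline"/>
                <!-- assumed: reference to a single-suffix FileType bean -->
                <property name="initialInputExt" ref="fastqFileType"/>
                <!-- hypothetical URL, updated to match this copy's name -->
                <property name="analyzeURL" value="/sequenceanalysis/analyzeFastq.view"/>
                <!-- remaining settings copied from the existing definition -->
            </bean>
            <bean class="org.labkey.api.pipeline.file.FileAnalysisTaskPipelineSettings">
                <constructor-arg value="sequenceSffPipeline"/>
                <property name="initialInputExt" ref="sffFileType"/>
                <property name="analyzeURL" value="/sequenceanalysis/analyzeSff.view"/>
            </bean>
        </list>
    </property>
</bean>
```

Each further extension would be another settings bean in the same list, rather than another TaskPipelineRegistrar, which is the arrangement that did not work in the second attempt.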