Custom DataTypes for file based assays

LabKey Support Forum (Inactive)
Custom DataTypes for file based assays Anthony Corbett  2013-04-09 09:17
Status: Closed
 
I'm currently writing a lot of file-based assays to quickly allow experimental data to be entered into LabKey with custom upload UIs. However, I do not want to constrain myself to a file-based module assay solution only; over time I may want to turn some of these into Java-based assays if the need for complexity arises. I'm worried that data migration might be an issue here, especially with regard to LSIDs and namespace prefixes for ExpData (the DataType namespace prefix).

I have found that the file-based assay ExpData objects can have their LSIDs created with two different namespace prefixes:

1. AssayController's FileUploadAction defaults the namespace prefix (datatype) of new ExpData objects to ModuleAssayProvider.RAW_DATA_TYPE ("RawAssayData"). The form bean for this action does not contain the assayId (protocol), so there is no way to determine the assay provider and resolve a customized dataType. If the assayId could be specified in the form, a custom dataType could be resolved, and RAW_DATA_TYPE could remain the default for backwards compatibility.

2. AssayController's SaveAssayBatchAction (for inputs and outputs) creates new ExpData objects with the namespace prefix (datatype) of AbstractAssayProvider.RELATED_FILE_DATA_TYPE ("RelatedFiles"). This happens without regard to the assay design (protocol) for which the ExpData is being created during the save.
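
To make the migration concern concrete, here is a minimal sketch of reading the namespace prefix back out of an LSID string. It assumes the layout urn:lsid:<authority>:<namespacePrefix>.<namespaceSuffix>:<objectId>; the class name, method name, and example values are made up for illustration and are not LabKey API.

```java
import java.util.Optional;

/** Illustrative helper, not part of LabKey: pulls the namespace prefix out of an LSID string. */
final class LsidPrefixSketch
{
    // Assumed layout: urn:lsid:<authority>:<namespacePrefix>.<namespaceSuffix>:<objectId>
    static Optional<String> namespacePrefix(String lsid)
    {
        // Limit of 5 keeps any further colons inside the object id intact.
        String[] parts = lsid.split(":", 5);
        if (parts.length < 5 || !parts[0].equalsIgnoreCase("urn") || !parts[1].equalsIgnoreCase("lsid"))
            return Optional.empty();

        String namespace = parts[3];
        int dot = namespace.indexOf('.');
        // With no suffix present, the whole namespace is treated as the prefix.
        return Optional.of(dot < 0 ? namespace : namespace.substring(0, dot));
    }
}
```

Under that assumption, an ExpData created by the upload action would report "RawAssayData" here, while one created during the batch save would report "RelatedFiles", so any code that dispatches on the prefix would see two different types for what is conceptually the same assay.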

To help ease possible data migration between file-based assays and Java-based assays, would it be possible to add an appropriate element for [Assay]DataType to the ProviderType complex type defined in assayProvider.xsd? This way file-based modules could declare their own DataTypes (and mappings to file types) per assay provider in the config.xml file:

<ap:provider xmlns:ap="http://labkey.org/study/assay/xml">
    <ap:name>Gel Card</ap:name>
    <ap:description>Gel Card measures titers for anti-A and anti-B antibodies in blood.</ap:description>
    <ap:dataType>
       <ap:namespacePrefix>GelCardImage</ap:namespacePrefix>
       <ap:role>Gel Card Image</ap:role>
       <ap:fileType>
          <ap:suffixes>
             <ap:suffix>.jpg</ap:suffix>
             <ap:suffix>.jpeg</ap:suffix>
             <ap:suffix>.tiff</ap:suffix>
          </ap:suffixes>
          <ap:defaultSuffix>.jpg</ap:defaultSuffix>
       </ap:fileType>
   </ap:dataType>
</ap:provider>

This new dataType xml bean can be converted to a DataType object and then can be used in the ModuleAssayProvider's constructor when calling super:

public ModuleAssayProvider(String name, Module module, Resource basePath, ProviderType providerConfig)
{
        super(name + "Protocol", name + "Run", DataType.fromXMLBean(providerConfig.getDataType()));
        this.name = name;
        this.module = module;
        this.basePath = basePath;

        init(providerConfig);
}

For backwards compatibility, a null value from this configuration could be handled in the TSVAssayProvider, with TsvDataHandler.DATA_TYPE used as the default, as it is currently.


Regards,

Anthony Corbett
 
 
jeckels responded:  2013-12-02 16:37
Hi Anthony,

Sorry for the very late reply. I've implemented your suggestion, albeit with a slightly different XML schema:

<?xml version="1.0"?>
<provider xmlns="http://labkey.org/study/assay/xml">
  <name>Simple Test</name>
  <description>Simple file-based assay</description>
    <inputDataFile>
        <namespacePrefix>testPrefix</namespacePrefix>
        <role>testRole</role>
        <fileSuffix>.tsv</fileSuffix>
        <fileSuffix default="true">.txt</fileSuffix>
    </inputDataFile>
</provider>

As far as #2 goes (the related files' namespace prefix), can you give a little more detail around your scenario? In many other usages, we don't want to simply reuse the primary data file's namespace prefix (since it's a totally different kind of file). In your scenario, would simply setting a single alternative namespace prefix be sufficient? It would be fine for it to be the same as the primary data file's prefix, as long as your ExperimentDataHandler and other code would be able to make sense of everything.

Thanks,
Josh
 
Anthony Corbett responded:  2013-12-03 14:05

Josh,

Thanks for your reply. Your implementation of the XML schema looks okay. Using your XML schema, my example assay provider XML would be:

<provider xmlns="http://labkey.org/study/assay/xml">
    <name>Gel Card</name>
    <description>Gel Card measures titers for anti-A and anti-B antibodies in blood.</description>
    <inputDataFile>
        <namespacePrefix>GelCardImage</namespacePrefix>
        <role>Gel Card Image</role>
        <fileSuffix default="true">.jpg</fileSuffix>
        <fileSuffix>.jpeg</fileSuffix>
        <fileSuffix>.tiff</fileSuffix>
    </inputDataFile>
</provider>

This example is an actual use case for #1 above. I have a file-based module using AssayController.FileUploadAction to upload an image file from the user's desktop to LabKey. In this case the LSID namespace of the ExpData that is created is "RawAssayData". By implementing this feature I am looking for the following behavior based on the "primary" inputDataFile definition:

1. When uploading a file using this FileUploadAction for a specific assay provider, the file extension would be checked against the defined suffixes, and the upload would fail with an error if it doesn't match. This creates a layer of validation for uploading files as inputs to a file-based assay module. Though this could be implemented client side on the file input's value before making the HTTP request to the upload action, doing it on the server would prevent business logic from being implemented in two places (once in the XML and again in JavaScript).

2. When a file is uploaded for a file-based module assay provider, the ExpData object that is created would carry the namespace defined for the "primary" inputDataFile, in this case "GelCardImage". After the ExpData is created, its Run FK (the run that created it) would be null until the file is added as an output to a run of that assay provider and saved using SaveAssayBatchAction, at which point the Run FK would be set on that ExpData record.

3. As an aside, adding this feature for the FileUploadAction would even allow custom DataHandlers/Parsers to be registered to handle some parsing/server-side processing of these uploaded files (maybe through a Spring configuration?). This idea just came to me, so it might need more fleshing out.
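
The server-side suffix check from item 1 could look roughly like the sketch below. The class and method names are hypothetical, the suffix list stands in for whatever is parsed from the provider XML, and case-insensitive matching is my assumption rather than anything stated above.

```java
import java.util.List;
import java.util.Locale;

/** Illustrative only: validates an uploaded file name against the suffixes declared for a provider. */
final class UploadSuffixCheckSketch
{
    /** Returns true when fileName ends with one of the declared suffixes (case-insensitive by assumption). */
    static boolean isAllowed(String fileName, List<String> declaredSuffixes)
    {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String suffix : declaredSuffixes)
        {
            if (lower.endsWith(suffix.toLowerCase(Locale.ROOT)))
                return true;
        }
        return false;
    }

    /** Rejects the upload when no suffix matches; the error text is a placeholder. */
    static void requireAllowed(String fileName, List<String> declaredSuffixes)
    {
        if (!isAllowed(fileName, declaredSuffixes))
            throw new IllegalArgumentException(
                "File '" + fileName + "' does not end with one of " + declaredSuffixes);
    }
}
```

With the Gel Card suffixes (.jpg, .jpeg, .tiff), a file named card01.JPG would pass and notes.pdf would be rejected before any ExpData is created.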


As for #2, the use case could be either a single related data file or multiple related data files, each with a different role. I think a more generic solution would be better: something like a "relatedDataFile" XML type/element that also has a namespacePrefix, a role, and a list of fileSuffix elements. For all output files attached to the run during a file-based assay module's call to SaveAssayBatchAction, the code would loop through these dataFile definitions and apply the correct namespace. Maybe a little tweak to your XML schema would work:

<provider xmlns="http://labkey.org/study/assay/xml">
    <name>Bioanalyzer</name>
    <dataFile type="output" primary="true">
        <namespacePrefix>BioanalyzerFile</namespacePrefix>
        <role>Results</role>
        <fileSuffix default="true">.xml</fileSuffix>
        <fileSuffix>.bioA.xml</fileSuffix>
    </dataFile>
    <dataFile type="output">
        <namespacePrefix>BioanalyzerReport</namespacePrefix>
        <role>Report</role>
        <fileSuffix default="true">.pdf</fileSuffix>
        <fileSuffix>.bioA.pdf</fileSuffix>
    </dataFile>
</provider>

In this use case the primary data file for the run is the XML file, and the related data file is a PDF report.

How the files make it onto the server, whether picked from the FileContent WebDAV widget or uploaded through the FileUploadAction, hopefully won't matter: the definition of these dataFile XML elements would allow adding these files to a run and saving the run while still getting the "correct" behavior of matching the corresponding file suffix and applying the correct namespacePrefix to the LSID.
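
The loop over dataFile definitions could be sketched as follows. FileTypeDef is a hypothetical stand-in for one parsed dataFile element, and preferring the longest matching suffix is my own choice for resolving overlaps such as ".xml" versus ".bioA.xml"; it is not something the schema above specifies.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Illustrative only: picks the data file definition that should supply the LSID namespace prefix. */
final class DataFileTypeMatchSketch
{
    /** Hypothetical stand-in for one dataFile definition from the provider XML. */
    static final class FileTypeDef
    {
        final String namespacePrefix;
        final String role;
        final List<String> suffixes;

        FileTypeDef(String namespacePrefix, String role, List<String> suffixes)
        {
            this.namespacePrefix = namespacePrefix;
            this.role = role;
            this.suffixes = suffixes;
        }
    }

    /** Returns the definition whose matching suffix is longest, or empty when nothing matches. */
    static Optional<FileTypeDef> match(String fileName, List<FileTypeDef> defs)
    {
        String lower = fileName.toLowerCase(Locale.ROOT);
        FileTypeDef best = null;
        int bestLength = -1;
        for (FileTypeDef def : defs)
        {
            for (String suffix : def.suffixes)
            {
                if (lower.endsWith(suffix.toLowerCase(Locale.ROOT)) && suffix.length() > bestLength)
                {
                    best = def;
                    bestLength = suffix.length();
                }
            }
        }
        return Optional.ofNullable(best);
    }
}
```

For the Bioanalyzer example, run7.bioA.pdf would resolve to the "BioanalyzerReport" definition. A file that matches nothing returns empty, and the caller could then fall back to the existing "RelatedFiles" prefix so current behavior is preserved.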

I hope this gives a more detailed account of what I am looking for; I'm not sure what would be possible.


Regards,

Anthony Corbett
 
jeckels responded:  2013-12-04 15:32
Hi Anthony,

Thanks for the additional detail. I believe I have this working, and plan to check it in to the trunk (14.1), hopefully tomorrow.

I went with a similar but slightly different schema:

<?xml version="1.0"?>
<provider xmlns="http://labkey.org/study/assay/xml">
    <name>Simple Type</name>
    <description>Simple file-based assay</description>
    <primaryDataFileType>
        <namespacePrefix>testPrefix</namespacePrefix>
        <role>testRole</role>
        <fileSuffix>.tsv</fileSuffix>
        <fileSuffix default="true">.txt</fileSuffix>
    </primaryDataFileType>
    <relatedDataFileType>
        <fileSuffix>.jpg</fileSuffix>
    </relatedDataFileType>
    <relatedDataFileType>
        <namespacePrefix>XMLPrefix</namespacePrefix>
        <role>BeingXMLy</role>
        <fileSuffix>.xml</fileSuffix>
    </relatedDataFileType>
</provider>

Note that in order to get the new behavior on assayFileUpload, you'll need to give the rowId of the assay design. Otherwise, we have no way of knowing which assay provider's data type to use. I've added a note about this in the JavaScript API docs that show sample usage of the upload action.
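
As a rough illustration of why the rowId matters, the sketch below resolves the namespace prefix from an optional assay design rowId and falls back to "RawAssayData" when none is given. The class, method, and map are hypothetical; the map stands in for "look up the assay design, then its provider's primary data type", and treating an unknown rowId as the default is a guess (real code might reject it instead).

```java
import java.util.Map;

/** Illustrative only: chooses the namespace prefix for an uploaded file. */
final class UploadDataTypeResolverSketch
{
    // Default named earlier in the thread for uploads that cannot be tied to an assay design.
    static final String RAW_DATA_PREFIX = "RawAssayData";

    /**
     * protocolRowId may be null for callers that do not pass the assay design.
     * prefixByProtocolRowId is a hypothetical lookup, not a LabKey structure.
     */
    static String resolvePrefix(Integer protocolRowId, Map<Integer, String> prefixByProtocolRowId)
    {
        if (protocolRowId == null)
            return RAW_DATA_PREFIX;
        return prefixByProtocolRowId.getOrDefault(protocolRowId, RAW_DATA_PREFIX);
    }
}
```

Existing callers that omit the rowId keep getting the old prefix, which is the backwards-compatible behavior discussed at the start of the thread.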

Thanks,
Josh