Suggestions for Extensible Assays

Installation Forum (Inactive)
Ben Bimber  2009-08-29 13:48
Status: Closed
 
We are starting a project that will probably involve creating upwards of 30 custom assays for our organization. I spent this morning playing around making assays within simple modules. I'm very excited about this new functionality and believe this is exactly what labkey has been missing and should be a great way to handle this project. I realize that text-assays are very new and probably still evolving; however, there are some tweaks that could make some pretty dramatic improvements to it. My apologies if any of this already exists and I missed it.

Each assay tends to have little quirks associated with it. The existing extensible assay framework does a good job of accommodating this. The ability to define custom HTML files to be used instead of the default wizard or grids is very useful. However, if for each assay we end up replacing all the defaults with custom HTML views, we lose much of the benefit of having an assay manager. I have a few suggestions that might help further improve the extensibility of this framework. Clearly it is not possible to make an infinitely customizable import/viewing framework. I’ve tried to come up with tweaks to the behavior of the default upload/batch/run/results pages that permit these pages to be used as much as possible, without needing to replace them with custom HTML. I’m obviously biased toward the sort of data we plan to store, but I think a lot of these suggestions could be very universal. It is also quite possible that better solutions exist to address the obstacles I identify than the solutions I propose.

Import / Validation:

1. Rather than having a single upload wizard that is either used or replaced, you might want to break this into component steps. This way the user could create custom HTML to replace just the sample import or validation step while keeping the pre-defined behavior for each other step. Maybe there is the option of inserting a custom step between two pre-defined ones.

2. On import, it would be useful if we could create an HTML view that is simply appended above or below the existing views for a given import step, rather than replacing the wizard. This view might present the user with additional information, based on their imported records, that helps guide decisions. Take elispot: it might be useful to alert users if there are existing records with the same SubjectID/PeptideNumber as one of the samples being imported. That same paradigm applies to a lot of assays. In some cases this appended view might be as simple as adding custom HTML text with extra instructions unique to that assay.

3. It would be most useful if the validation/processing scripts could be defined field by field, rather than one per assay (maybe in combination with one per assay). For example, multiple assays might contain a field that holds a DNA sequence. This field gets the same validation/processing steps for each assay that imports DNA. We could reference the same script each time, rather than continually remaking it.

4. In the XML that defines an assay, we should be able to specify that a field gets hidden at import (applies to batch, run and result fields). This means that when the user downloads the import Excel sheet, columns for these field(s) do not exist. This field may or may not have a default value. These sorts of fields could include status flags or hold other information that is only added or used after a record is in the system.

5. Text-based assay / controlling allowable values for a field. There are many cases where we want to restrict allowable values for a field. If we’re using text-based assays, we’d want some method to define the list of allowable values within the text-based assay design (or at least define a list and establish the foreign key). This way, when we export the assay, this information is included. It would be best if these allowable values were created as a list so they could be edited in the future. However, if they are only editable by changing text or XML files within the assay design, that’s ok too.

6. There are cases when we need to retain the ability to define the value of a field uniquely for each sample, but the vast majority of the time this field is the same for all samples within that run. It would be nice if there was an option (that could be enabled/disabled for a field) allowing that field to be filled out alongside the run fields. If the user enters a value, this value would be assigned to each sample of that run. In this case, there would not be a column for this field when the user downloads the Excel import sheet. When creating a run, the user should have the option of selecting something like ‘I need to define this individually for each sample’, in which case the Excel sheet contains that column. Alternatively, maybe we have the ability to define a default value for a sample field when creating a run. If the column in the Excel sheet is left blank, the default value is used. If the user enters a value for that sample, then this value is used.

7. Single sample import. There can be instances when a user will want to import a single sample into an assay. Rather than forcing the user through the traditional upload wizard, ideally they could hit an ‘import single record’ button, which gives them a web form similar to ‘insert new’ for any other list. In this view, the batch, run and results fields are all shown, completed, then imported. They would end up creating a run with one sample without actually knowing it.
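To illustrate the per-field idea in item 3, a shared DNA-sequence check could be written once and referenced by every assay that has such a field. This is only a minimal JavaScript sketch: the function name, the per-assay registry and the allowed letters (A, C, G, T, plus N for ambiguous bases) are assumptions, not an existing LabKey feature.

```javascript
// Hypothetical shared validator for any field that holds a DNA sequence.
// Returns null when the value is valid, otherwise an error message.
function validateDnaSequence(value) {
  if (typeof value !== "string" || value.trim() === "") {
    return "Sequence is required";
  }
  // Normalize: drop whitespace and upper-case before checking.
  const seq = value.replace(/\s+/g, "").toUpperCase();
  // Allowed letters are an assumption: A, C, G, T, plus N for ambiguous bases.
  const bad = seq.match(/[^ACGTN]/);
  return bad ? `Invalid base '${bad[0]}' in sequence` : null;
}

// Hypothetical registry: each assay points its DNA fields at the same function.
const fieldValidators = {
  PrimerAssay: { Sequence: validateDnaSequence },
  CloneAssay: { Insert: validateDnaSequence },
};

console.log(fieldValidators.PrimerAssay.Sequence("acgt nnac")); // null
console.log(fieldValidators.CloneAssay.Insert("ACGU")); // Invalid base 'U' in sequence
```

Fixing a bug in the shared function then fixes it for every assay that references it.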

Batch, Run and Results Views:

1. ‘Replace default view with a query’. Once assay data is in the system, you can view grids of assay data by batch, run or results. You can define the default view for each of these grids, which is important because the view people want to see is often not the same as the underlying table. People will want to hide some fields, use foreign keys to pull in other values, etc. An important piece that is currently missing is the ability to add calculated fields. For example, we might want to write an SQL expression that calculates a new value based on fields in that sample (the raw data may not reflect what people want to interact with). Or we might want to write an SQL expression that returns a human-friendly value like ‘positive’ or ‘negative’ based on calculations using fields in that record. A simple solution might be to allow the default view for any table to be replaced by an SQL query. Technically we can already create a query from a table; however, this gets kinda clunky. Rather than writing a new ‘batches.html’, ‘run.html’ and ‘results.html’ for each assay, it would be simplest if we could just define which query is used for that page.

2. ‘Actions’. There are any number of things that people might want to do to or do with assay data once in the system, specific to that particular assay. We handle this in our existing system with an idea I took from Geospiza’s Finch. When looking at a grid of assay data, users can check one or more records, then pick from a pull-down menu which gives a list of actions (see attached screenshot ‘actions.jpg’). Each of these actions takes the user to a custom HTML page that does something to the checked records or provides some sort of customized visualization based on them. To give examples: for sequence data, I could write an HTML/JavaScript page that exports checked records as a FASTA file, instead of CSV. I might have a page that allows batch editing of a specific field. In the attached screenshot, I have an action called ‘mark removed’. ‘Removed’ is a field of that assay with a default value of 0. The response page simply changes this value on selected records to 1. I might have an assay in which I pick some records, then select the action ‘compare results’. The corresponding page performs some calculations on these records, then presents the user with a custom output. This sort of framework creates a really flexible interface for working with assay data (or any list really). Perhaps within a simple module there would be an assay/actions/ folder (or the actions could be defined using XML). Any HTML page within the folder would appear as an action for that assay. As above, I think this addition could save a whole lot of cases where users otherwise need to create unique ‘batches.html’, ‘run.html’ and ‘results.html’ files.

3. ‘Display as URL’. In pretty much any case where a field is a foreign key, that field should be a link to the corresponding table. This sort of behavior is a huge benefit when an organization has multiple assays or types of data in labkey. For example, in the ELISPOT grid, the peptide field should automatically link to the peptides table, providing the user with quick access to details on that record. If this behavior is not going to be the default for batch/run/results grids, then we should have the option of enabling it during the assay design. Likewise, we might want to have a given field displayed as a link to some other URL. For example, if a field is a GenBank accession number, we might want it to link directly to GenBank. As above, it would be great if we could specify this in the XML that defines that assay.
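One way a URL property like this could work is a template whose tokens are filled from the row. The following JavaScript sketch is hypothetical: the `${FieldName}` token syntax and the `renderLink` name are assumptions for illustration, and the NCBI nucleotide URL pattern should be checked before relying on it.

```javascript
// Hypothetical: render a field as a link using a URL template in which
// ${FieldName} is replaced by the URL-encoded value from the same row.
function renderLink(template, row, displayField) {
  const url = template.replace(/\$\{(\w+)\}/g, (_, name) =>
    encodeURIComponent(row[name] ?? "")
  );
  // Escape the display text so it is safe to place inside HTML.
  const text = String(row[displayField] ?? "")
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return `<a href="${url}">${text}</a>`;
}

// Plain (non-template) string, so ${Accession} is left for renderLink to fill.
const genbankTemplate = "https://www.ncbi.nlm.nih.gov/nuccore/${Accession}";
console.log(renderLink(genbankTemplate, { Accession: "AY123456" }, "Accession"));
// <a href="https://www.ncbi.nlm.nih.gov/nuccore/AY123456">AY123456</a>
```

The same function would cover the peptide lookup case by pointing the template at whatever page shows a single peptide record.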

General:

1. When defining an assay, we create XML files for batch, run and results fields. Each field has something like this:
<exp:PropertyDescriptor>
    <exp:Name>TimePoint</exp:Name>
    <exp:Required>true</exp:Required>
    <exp:RangeURI>http://www.w3.org/2001/XMLSchema#dateTime</exp:RangeURI>
</exp:PropertyDescriptor>

It would be great if a host of other options could be specified here (or in some other XML file). The extra properties could dictate behavior ranging from validation to display behavior in a gridview. Most of the things I suggest above could be properties defined in something like this. I realize it is not possible to have infinite customization, but the advantage of building as many options as possible at this level is that we take advantage of professionally developed code and need to create/maintain as little redundant code as possible.

I don’t know how complicated this is, but for things like display behavior, it would be even better if any query based on this assay could inherit the attributes of the fields it uses. For example, if we define that an assay field should be displayed as a URL in a grid, any query that uses this field will also display it as a URL unless this property is changed within that query’s XML. If this becomes complicated, as long as those same properties can be defined using XML, it should be ok.

General Suggestions For Assays/Lists:

1. For a view, it would be nice if we could specify the default page size.

2. We should be able to define whether a field is editable or read-only. Currently we can only specify whether the entire assay’s data can be edited or not. There are lots of cases where we’d want to permit some values to be changed (assuming a user has permissions), but other fields should never be edited.

3. After CSV import, if a record fails validation labkey currently does not provide a lot of information. If I import a file with 50 rows and only 1 of these has an invalid date, it still says ‘date must be of type DateTime’. Especially as validation becomes more complicated, it would be far more helpful if labkey displayed a grid with the contents of the problem row(s), indicating the cell(s) that failed and why. If the validation triggered warnings only (meaning the user can choose to import anyway) perhaps it makes sense to display the grid of problem rows, indicating the warnings, but to have a button allowing import to proceed.
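The grid of problem rows in item 3 amounts to collecting errors per cell instead of stopping at the first failure. A minimal JavaScript sketch, assuming each column has a hypothetical validator function that returns an error message or null; the names and the date check are illustrative only.

```javascript
// Hypothetical per-cell validation report for an uploaded table.
// `validators` maps a column name to a function returning an error string or null.
function validateRows(rows, validators) {
  const problems = [];
  rows.forEach((row, index) => {
    const errors = {};
    for (const [column, check] of Object.entries(validators)) {
      const message = check(row[column]);
      if (message) errors[column] = message;
    }
    // Keep the whole row so it can be shown in context, with failing cells marked.
    if (Object.keys(errors).length > 0) {
      problems.push({ rowNumber: index + 1, row, errors });
    }
  });
  return problems;
}

// Example check: accepts only YYYY-MM-DD strings that Date.parse can read.
const isDate = (v) =>
  /^\d{4}-\d{2}-\d{2}$/.test(String(v)) && !Number.isNaN(Date.parse(v))
    ? null
    : `'${v}' is not a valid date`;

const report = validateRows(
  [{ TimePoint: "2009-08-01" }, { TimePoint: "unknown" }, { TimePoint: "2009-08-03" }],
  { TimePoint: isDate }
);
console.log(JSON.stringify(report));
// [{"rowNumber":2,"row":{"TimePoint":"unknown"},"errors":{"TimePoint":"'unknown' is not a valid date"}}]
```

Warnings could use the same shape with a severity flag, so the page can offer an "import anyway" button only when no hard errors remain.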
 
 
kevink responded:  2009-08-30 15:20
> We are starting a project that will probably involve creating upwards of 30 custom assays for our organization. I spent this morning playing around making assays within simple modules. I'm very excited about this new functionality and believe this is exactly what labkey has been missing and should be a great way to handle this project. I realize that text-assays are very new and probably still evolving; however, there are some tweaks that could make some pretty dramatic improvements to it. My apologies if any of this already exists and I missed it.
>

Excellent! I'm glad that you're finding the file-based assays useful. We're also excited about the file-based assays and the ability to quickly create custom views of the data. Your input directly affects how we prioritize our features, so thanks for writing up your thoughts! I'll do my best to answer all of your questions below.

>
> Import / Validation:
>
> 1. Rather than having a single upload wizard that is either used or replaced, you might want to break this into component steps. This way the user could create custom HTML to replace just the sample import or validation step while keeping the pre-defined behavior for each other step. Maybe there is the option of inserting a custom step between two pre-defined ones.
>
> 2. On import, it would be useful if we could create an HTML view that is simply appended above or below the existing views for a given import step, rather than replacing the wizard. This view might present the user with additional information, based on their imported records, that helps guide decisions. Take elispot: it might be useful to alert users if there are existing records with the same SubjectID/PeptideNumber as one of the samples being imported. That same paradigm applies to a lot of assays. In some cases this appended view might be as simple as adding custom HTML text with extra instructions unique to that assay.
>

These are good ideas. Plugging in a custom page for the individual steps would be useful and keep consistency with the current assays. We considered having each part separately replaceable, but for simplicity just went for the all-or-nothing approach in the first release.
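Until a per-step view exists, the duplicate alert from item 2 could be computed in a custom upload page. A minimal JavaScript sketch, assuming the existing records have already been fetched (for example with LABKEY.Query) into plain objects with SubjectID and PeptideNumber fields; the function name is hypothetical.

```javascript
// Hypothetical helper: find incoming rows whose SubjectID/PeptideNumber pair
// already exists among records previously fetched from the server.
// Values are compared exactly, so 12 and "12" are treated as different.
function findExistingMatches(incomingRows, existingRows) {
  // JSON-encode the pair so values containing separators cannot collide.
  const key = (row) => JSON.stringify([row.SubjectID, row.PeptideNumber]);
  const existingKeys = new Set(existingRows.map(key));
  return incomingRows.filter((row) => existingKeys.has(key(row)));
}

const existing = [{ SubjectID: "r01", PeptideNumber: 12 }];
const incoming = [
  { SubjectID: "r01", PeptideNumber: 12 },
  { SubjectID: "r02", PeptideNumber: 12 },
];
const matches = findExistingMatches(incoming, existing);
console.log(matches.length); // 1
```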

>
> 3. It would be most useful if the validation/processing scripts could be defined field by field, rather than one per assay (maybe in combination with one per assay). For example, multiple assays might contain a field that holds a DNA sequence. This field gets the same validation/processing steps for each assay that imports DNA. We could reference the same script each time, rather than continually remaking it.
>

Yes, currently the scripts are only allowed at the assay level rather than the field level. Karl Lum may be able to add more details about the validation script feature. To add non-script field validators, add a 'PropertyValidator' element to the 'PropertyDescriptor' element in the domain xml file. For example, to add a range validator on a double type:

  <exp:PropertyDescriptor>
    <exp:Name>DoubleData</exp:Name>
    <exp:RangeURI>http://www.w3.org/2001/XMLSchema#double</exp:RangeURI>
    <exp:DefaultType>EditableDefault</exp:DefaultType>
    <exp:DefaultValue>2.0</exp:DefaultValue>
    <exp:PropertyValidator>
      <exp:Name>in range</exp:Name>
      <exp:Description>Check value is greater than 0 and less than 10</exp:Description>
      <exp:TypeURI>urn:lsid:labkey.com:PropertyValidator:range</exp:TypeURI>
      <exp:Expression>~gt=0&amp;~lt=10</exp:Expression>
      <exp:ErrorMessage>Uhoh! Value is out of range!</exp:ErrorMessage>
    </exp:PropertyValidator>
  </exp:PropertyDescriptor>

Or add a regex validator to a text type:

  <exp:PropertyDescriptor>
    <exp:Name>FieldWithA</exp:Name>
    <exp:RangeURI>http://www.w3.org/2001/XMLSchema#string</exp:RangeURI>
    <exp:PropertyValidator>
      <exp:Name>Check A</exp:Name>
      <exp:Description>Check value starts with the letter 'a'</exp:Description>
      <exp:TypeURI>urn:lsid:labkey.com:PropertyValidator:regex</exp:TypeURI>
      <exp:Expression>^a</exp:Expression>
      <exp:ErrorMessage>Uhoh! Does not start with 'a'.</exp:ErrorMessage>
    </exp:PropertyValidator>
  </exp:PropertyDescriptor>
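To catch the same problems in a custom upload.html before data reach the server, the two validators could be mirrored in JavaScript. This sketch assumes the range expression is a list of '~op=value' clauses joined by '&' (the '&amp;' in the XML is just escaping) that must all hold, and that the regex validator passes when the pattern matches anywhere in the value; both are readings of the examples above, not documented server behavior.

```javascript
// Hypothetical client-side mirror of the two validators above.
const comparators = {
  eq: (a, b) => a === b,
  neq: (a, b) => a !== b,
  gt: (a, b) => a > b,
  gte: (a, b) => a >= b,
  lt: (a, b) => a < b,
  lte: (a, b) => a <= b,
};

// Assumes the expression is "~op=value" clauses joined by '&', all of which must hold.
function checkRange(expression, value) {
  return expression.split("&").every((clause) => {
    const match = /^~(\w+)=(.*)$/.exec(clause);
    if (!match || !Object.prototype.hasOwnProperty.call(comparators, match[1])) {
      throw new Error(`Unsupported clause: ${clause}`);
    }
    return comparators[match[1]](Number(value), Number(match[2]));
  });
}

// Assumes the regex validator passes when the pattern matches anywhere in the value.
function checkRegex(pattern, value) {
  return new RegExp(pattern).test(String(value));
}

console.log(checkRange("~gt=0&~lt=10", 2.0)); // true
console.log(checkRange("~gt=0&~lt=10", 12)); // false
console.log(checkRegex("^a", "apple")); // true
```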


>
> 4. In the XML that defines an assay, we should be able to specify that a field gets hidden at import (applies to batch, run and result fields). This means that when the user downloads the import Excel sheet, columns for these field(s) do not exist. This field may or may not have a default value. These sorts of fields could include status flags or hold other information that is only added or used after a record is in the system.
>

There is a hidden bit you can set on a PropertyDescriptor. However, this just marks the column as "not in the default grid view" rather than hiding it from the input Excel sheet. As an alternative, you can set properties on the batch and run objects in the JSON. For example, in your custom upload.html page, you can set a property on the batch object:

    LABKEY.page.batch.properties.UploadInProgress = true;

Or on the run objects:

    LABKEY.page.batch.runs[0].properties.DilutionFactor = 20;

>
> 5. Text-based assay / controlling allowable values for a field. There are many cases where we want to restrict allowable values for a field. If we’re using text-based assays, we’d want some method to define the list of allowable values within the text-based assay design (or at least define a list and establish the foreign key). This way, when we export the assay, this information is included. It would be best if these allowable values were created as a list so they could be edited in the future. However, if they are only editable by changing text or XML files within the assay design, that’s ok too.
>

Yes, unfortunately we don't have an enumeration field validator type. Nor do we expose lookups from domain PropertyDescriptors when using the file-based assays. Adding lookups from file-based assay domains is one of the features we're planning for 9.3. In the meantime, you can use the regular expression validator to restrict a text value to a set of allowed values, or manually perform the lookup against a list using LABKEY.Query.
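For a short, fixed set of allowed values, the regular expression validator can act as an enumeration. Below is a hypothetical JavaScript helper that builds such a pattern, escaping regex metacharacters and anchoring it so only exact values match. It assumes the server-side regex engine accepts non-capturing groups; any '&' or '<' in a value would still need XML escaping before going into the Expression element.

```javascript
// Hypothetical helper: turn a list of allowed values into an anchored regular
// expression suitable for a regex PropertyValidator's Expression element.
function allowedValuesPattern(values) {
  // Escape regex metacharacters so values like "CD4+" are matched literally.
  const escaped = values.map((v) => v.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return `^(?:${escaped.join("|")})$`;
}

const pattern = allowedValuesPattern(["positive", "negative", "CD4+"]);
console.log(pattern); // ^(?:positive|negative|CD4\+)$
console.log(new RegExp(pattern).test("CD4+")); // true
console.log(new RegExp(pattern).test("pos")); // false
```

Editing the allowed set then means regenerating one Expression string, which is clumsier than a list but keeps everything inside the assay's XML.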

>
> 6. There are cases when we need to retain the ability to define the value of a field uniquely for each sample, but the vast majority of the time this field is the same for all samples within that run. It would be nice if there was an option (that could be enabled/disabled for a field) allowing that field to be filled out alongside the run fields. If the user enters a value, this value would be assigned to each sample of that run. In this case, there would not be a column for this field when the user downloads the Excel import sheet. When creating a run, the user should have the option of selecting something like ‘I need to define this individually for each sample’, in which case the Excel sheet contains that column. Alternatively, maybe we have the ability to define a default value for a sample field when creating a run. If the column in the Excel sheet is left blank, the default value is used. If the user enters a value for that sample, then this value is used.
>

This is an interesting idea. You may be able to simulate this by creating a run property and using javascript in the upload.html to ensure that the result field either has a value or takes the value from the run property.
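A minimal sketch of that fallback, assuming the run object exposes its result rows as a `dataRows` array next to the `properties` object used in the earlier example; the helper name and the DilutionFactor field are only illustrations.

```javascript
// Hypothetical upload.html helper: copy a run-level value into result rows that
// left the field blank. Returns 1-based numbers of rows still missing a value.
function applyRunDefault(run, fieldName) {
  const isBlank = (v) => v === undefined || v === null || v === "";
  const fallback = run.properties[fieldName];
  const missing = [];
  run.dataRows.forEach((row, i) => {
    if (!isBlank(row[fieldName])) return; // an explicit per-row value wins
    if (isBlank(fallback)) {
      missing.push(i + 1); // neither the row nor the run supplied a value
    } else {
      row[fieldName] = fallback;
    }
  });
  return missing;
}

const run = {
  properties: { DilutionFactor: 20 },
  dataRows: [{ SampleId: "s1" }, { SampleId: "s2", DilutionFactor: 40 }],
};
const stillMissing = applyRunDefault(run, "DilutionFactor");
console.log(run.dataRows.map((r) => r.DilutionFactor)); // [ 20, 40 ]
console.log(stillMissing); // []
```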

>
> 7. Single sample import. There can be instances when a user will want to import a single sample into an assay. Rather than forcing the user through the traditional upload wizard, ideally they could hit an ‘import single record’ button, which gives them a web form similar to ‘insert new’ for any other list. In this view, the batch, run and results fields are all shown, completed, then imported. They would end up creating a run with one sample without actually knowing it.
>

That sounds like a nice and easy way to add simple data. For 9.3, I will be working on auto-generating Ext html forms from a domain. This would be a good use case.

>
> Batch, Run and Results Views:
>
> 1. ‘Replace default view with a query’. Once assay data is in the system, you can view grids of assay data by batch, run or results. You can define the default view for each of these grids, which is important because the view people want to see is often not the same as the underlying table. People will want to hide some fields, use foreign keys to pull in other values, etc. An important piece that is currently missing is the ability to add calculated fields. For example, we might want to write an SQL expression that calculates a new value based on fields in that sample (the raw data may not reflect what people want to interact with). Or we might want to write an SQL expression that returns a human-friendly value like ‘positive’ or ‘negative’ based on calculations using fields in that record. A simple solution might be to allow the default view for any table to be replaced by an SQL query. Technically we can already create a query from a table; however, this gets kinda clunky. Rather than writing a new ‘batches.html’, ‘run.html’ and ‘results.html’ for each assay, it would be simplest if we could just define which query is used for that page.
>

In previous releases, you could name a query the same as a table name and it would replace the default grid. However, this became confusing to use in practice since you were never sure if you were referencing the custom query or the table. Perhaps, now that we have a use case we can reconsider that decision.

You can add some metadata to an existing table. From the query start page, click on a schema name, then click the '[customize display]' link. By wrapping an existing column you can create a lookup to a list or to another table.

>
> 2. ‘Actions’. There are any number of things that people might want to do to or do with assay data once in the system, specific to that particular assay. We handle this in our existing system with an idea I took from Geospiza’s Finch. When looking at a grid of assay data, users can check one or more records, then pick from a pull-down menu which gives a list of actions (see attached screenshot ‘actions.jpg’). Each of these actions takes the user to a custom HTML page that does something to the checked records or provides some sort of customized visualization based on them. To give examples: for sequence data, I could write an HTML/JavaScript page that exports checked records as a FASTA file, instead of CSV. I might have a page that allows batch editing of a specific field. In the attached screenshot, I have an action called ‘mark removed’. ‘Removed’ is a field of that assay with a default value of 0. The response page simply changes this value on selected records to 1. I might have an assay in which I pick some records, then select the action ‘compare results’. The corresponding page performs some calculations on these records, then presents the user with a custom output. This sort of framework creates a really flexible interface for working with assay data (or any list really). Perhaps within a simple module there would be an assay/actions/ folder (or the actions could be defined using XML). Any HTML page within the folder would appear as an action for that assay. As above, I think this addition could save a whole lot of cases where users otherwise need to create unique ‘batches.html’, ‘run.html’ and ‘results.html’ files.
>

That's a good idea -- I've often wanted something similar myself when writing file-based assays. It is possible to write javascript to inject a new button menu into the button bar for a grid but it takes some work to do.
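As one example of such an action, the FASTA export only needs the checked rows. A minimal JavaScript sketch, assuming each row has hypothetical Name and Sequence fields and that the selected rows have already been retrieved; it wraps sequence lines at 60 characters by default.

```javascript
// Hypothetical "export as FASTA" action: turn checked rows into FASTA text.
// Field names Name and Sequence are assumptions about the assay design.
function toFasta(rows, lineWidth = 60) {
  return rows
    .map((row) => {
      const seq = String(row.Sequence).replace(/\s+/g, "");
      const lines = [];
      // Wrap the sequence at a fixed width, as most FASTA writers do.
      for (let i = 0; i < seq.length; i += lineWidth) {
        lines.push(seq.slice(i, i + lineWidth));
      }
      return `>${row.Name}\n${lines.join("\n")}`;
    })
    .join("\n") + "\n";
}

const checked = [
  { Name: "clone1", Sequence: "ACGTACGTAC" },
  { Name: "clone2", Sequence: "GGGCCC" },
];
console.log(toFasta(checked, 4));
```

With a width of 4 this prints clone1's sequence as ACGT / ACGT / AC and clone2's as GGGC / CC, each under its own header line.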

>
> 3. ‘Display as URL’. In pretty much any case where a field is a foreign key, that field should be a link to the corresponding table. This sort of behavior is a huge benefit when an organization has multiple assays or types of data in labkey. For example, in the ELISPOT grid, the peptide field should automatically link to the peptides table, providing the user with quick access to details on that record. If this behavior is not going to be the default for batch/run/results grids, then we should have the option of enabling it during the assay design. Likewise, we might want to have a given field displayed as a link to some other URL. For example, if a field is a GenBank accession number, we might want it to link directly to GenBank. As above, it would be great if we could specify this in the XML that defines that assay.
>

I agree that lookups should be displayed as links.

>
> General:
>
> 1. When defining an assay, we create XML files for batch, run and results fields. Each field has something like this:
> <exp:PropertyDescriptor>
>     <exp:Name>TimePoint</exp:Name>
>     <exp:Required>true</exp:Required>
>     <exp:RangeURI>http://www.w3.org/2001/XMLSchema#dateTime</exp:RangeURI>
> </exp:PropertyDescriptor>
>
> It would be great if a host of other options could be specified here (or in some other XML file). The extra properties could dictate behavior ranging from validation to display behavior in a gridview. Most of the things I suggest above could be properties defined in something like this. I realize it is not possible to have infinite customization, but the advantage of building as many options as possible at this level is that we take advantage of professionally developed code and need to create/maintain as little redundant code as possible.
>
> I don’t know how complicated this is, but for things like display behavior, it would be even better if any query based on this assay could inherit the attributes of the fields it uses. For example, if we define that an assay field should be displayed as a URL in a grid, any query that uses this field will also display it as a URL unless this property is changed within that query’s XML. If this becomes complicated, as long as those same properties can be defined using XML, it should be ok.
>

Yes, one of the challenges that we face is deciding the important set of properties that belong in the xml file. In addition, there is another xml format for the table based schemas (tableInfo.xsd) that has a different but similar set of properties (including a URL property.) We will be merging these two formats eventually so only one format is necessary.

>
> General Suggestions For Assays/Lists:
>
> 1. For a view, it would be nice if we could specify the default page size.
>

That's an interesting idea. The customize view page could let you specify that option.

>
> 2. We should be able to define whether a field is editable or read-only. Currently we can only specify whether the entire assay’s data can be edited or not. There are lots of cases where we’d want to permit some values to be changed (assuming a user has permissions), but other fields should never be edited.
>

Yes, editable and readonly are two properties from tableInfo.xsd column that we'd like to see merged with the expTypes.xsd PropertyDescriptor.

> 3. After CSV import, if a record fails validation labkey currently does not provide a lot of information. If I import a file with 50 rows and only 1 of these has an invalid date, it still says ‘date must be of type DateTime’. Especially as validation becomes more complicated, it would be far more helpful if labkey displayed a grid with the contents of the problem row(s), indicating the cell(s) that failed and why. If the validation triggered warnings only (meaning the user can choose to import anyway) perhaps it makes sense to display the grid of problem rows, indicating the warnings, but to have a button allowing import to proceed.
>

I've been frustrated by this as well.

I like your suggestions and I'll file bugs for the various suggestions in this posting. We very much want to make file-based assays something anyone can create and customize easily. Please keep sending us suggestions like this.
 
Ben Bimber responded:  2009-08-30 16:20
Thanks for the reply. I'm happy to give a few more:

Handling of foreign keys on import/update:

When an organization has multiple types of data managed in LabKey, one of the big benefits to having everything in one system is validating the data sources against each other. Scientists waste a lot of time massaging data, doing things like keeping track of allele names, peptide numbers, etc. If a record in an assay refers to a peptide, an allele or a DNA oligo (or whatever is appropriate for your assay), your lab should probably be maintaining a master list for these resources along with additional information on them. LabKey does this in some assays like ELISPOT, which is tightly linked to peptides. The reason these master lists rarely exist in labs is that it's a pain to maintain them and no one wants to do it. Plus, doing it in Excel really isn't that feasible.

If import is designed well, LabKey could help solve this. Designing a system that maintains referential integrity without irritating users is important. For example, ELISPOT data references a peptides table. We want to make sure that every peptide number in the ELISPOT table actually has a corresponding record in the peptide table (plus the additional information on that peptide). People will tend to get lazy and not complete this unless given a push. However, simply rejecting ELISPOT records with a nonexistent peptide_id isn't terribly user-friendly. Many instances when a user tries to import ELISPOT data with a nonexistent peptide will be typos. In this situation, a warning should be thrown, giving the user a chance to correct the problem. Ideally, they'd be shown a pull-down with allowable values for that table (though the list would be long).

However, there will be cases when a user is importing a novel peptide that should get a record. Ideally, if this error is thrown on CSV import, the system would give the user an option like 'add peptide record', which would produce a pop-up allowing the user to create a new peptide record. This is friendly to users and encourages them to keep a complete record in the system. Basically, I see several situations that should be allowed, each appropriate in different circumstances:

1. Reject rows where the lookup does not have a corresponding value.
2. Warn the user, but allow them to import anyway if they choose.
3. Warn the user and give the option of adding a new record to the lookup table.
3B. This record is added but is 'provisional'. I don't think this is terribly important for us, but I've been told some clients want it. A simple method of handling this situation might be to create some sort of flag in the lookup table called 'approved' with a default value of 0. Anything imported in this manner gets flagged as unapproved.
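The three behaviors above could be expressed as a per-lookup policy. A minimal JavaScript sketch with hypothetical names throughout; it assumes the known lookup values have already been loaded and that uploaded values have been converted to the same type (the comparison uses a Set, so 101 and "101" do not match).

```javascript
// Hypothetical policies for an upload row whose lookup value (e.g. a peptide
// number) has no matching record in the lookup table.
const LookupPolicy = Object.freeze({
  REJECT: "reject", // 1. refuse the row
  WARN: "warn", // 2. import anyway, but tell the user
  OFFER_ADD: "offerAdd", // 3/3B. hold the row and propose a provisional record
});

function checkLookups(rows, column, knownValues, policy) {
  const known = new Set(knownValues);
  const result = { accepted: [], rejected: [], pending: [], warnings: [], proposed: [] };
  const proposedValues = new Set();
  rows.forEach((row, i) => {
    const value = row[column];
    if (known.has(value)) {
      result.accepted.push(row);
      return;
    }
    const message = `Row ${i + 1}: ${column} '${value}' not found`;
    if (policy === LookupPolicy.REJECT) {
      result.rejected.push(row);
    } else if (policy === LookupPolicy.WARN) {
      result.warnings.push(message);
      result.accepted.push(row);
    } else if (policy === LookupPolicy.OFFER_ADD) {
      result.warnings.push(message);
      result.pending.push(row);
      // Propose each unknown value once, flagged as unapproved (3B).
      if (!proposedValues.has(value)) {
        proposedValues.add(value);
        result.proposed.push({ [column]: value, Approved: 0 });
      }
    } else {
      throw new Error(`Unknown policy: ${policy}`);
    }
  });
  return result;
}

const rows = [{ PeptideNumber: 101 }, { PeptideNumber: 999 }];
const outcome = checkLookups(rows, "PeptideNumber", [101, 102], LookupPolicy.OFFER_ADD);
console.log(outcome.proposed); // [ { PeptideNumber: 999, Approved: 0 } ]
```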
 
Ben Bimber responded:  2009-08-31 09:20
Kevin,

I attached a PowerPoint going into more detail on suggestions for the import steps. I think there are some general features you could add to the upload process (like what I call instruments) that greatly add flexibility while allowing users to keep the default upload code more often. I'm happy to elaborate more on anything if it helps.

I also put in some suggestions for the experiment/assay UI. I don't know where it's most appropriate to post that.

-Ben