possible cross tab bugs

LabKey Support Forum (Inactive)
possible cross tab bugs Ben Bimber  2010-06-23 11:47
Status: Closed
 
I have been experimenting with cross tabs and assay data. I made a cross tab here:
https://xnight.primate.wisc.edu:8443/labkey/query/WNPRC/WNPRC_Units/Research_Services/MHC_SSP/Private/MHC_DB/executeQuery.view?schemaName=assay&query.queryName=SSP_Summary&query.Id~startswith=r950&query.PrimerPair~startswith=a

There's a couple points worth noting:

1. creating the crosstab is simple enough. however, once you define your fields and hit 'submit', labkey loads that crosstab on your entire assay domain. the concept of giving a preview is good, but the assay has 105,000 records. it eventually completed, but it would be nice if labkey remembered whatever filters were applied when you went to the 'create crosstab' page so it didnt try to load the whole thing.

2. i created the crosstab. i went to my original grid, filtered it, then tried to load the crosstab (on ~20 records). the crosstab loaded, but it did not remember the grid filters, so i got 105,000 records again. I can appreciate why filters were not applied in point 1, but it seems like they should be remembered in this situation. example here:

https://xnight.primate.wisc.edu:8443/labkey/query/WNPRC/WNPRC_Units/Research_Services/MHC_SSP/Private/MHC_DB/executeQuery.view?query.Id~startswith=r950&query.PrimerPair~startswith=a&query.queryName=SSP_Summary&schemaName=assay&query.reportId=db%3A19

3.crosstabs have an 'export to excel' button, which is a good thing. but it throws a java error when i try to export this cross tab. might have something to do with it being too large. the above URL will repro that.

Looking to the future:

Aggregating data and creating aggregate views is quite important across the board in the EHR. Crosstabs fulfill the majority of scenarios, but they are a little rough. It's a basic HTML output: no sorting, no filtering, etc. Instead of using crosstabs I'm considering writing the code to aggregate data in R, then trying to output the results into some sort of grid (either labkey or ext). While it requires that I write the code, it gives us a lot more control. That includes control over the type of aggregate function (ie. could be some oddball custom function instead of just sum, min, max), allows custom formatting (like coloring cells fitting criteria), etc.

I have spoken with some of you about possible ways to approach this in the past, but perhaps some of you could comment a little here. Assuming I create a dataframe in R with my new aggregated data, what is the best approach to try to display that in a grid? The approach some of us discussed before was along the lines of converting R->json, then posting to a custom JS page that would interpret this and display the grid. a labkey grid is probably preferable, but I believe an ext grid would be needed since those accept array stores and labkey does not.

I could also write R code that directly outputs HTML/javascript, but that seems like it would be kind of a pain unless there's a really good R package to handle that (which i have not yet found). Although I might be able to write a reusable function that accepts the dataframe and does all the work, so i at least only have to write that once.

Is there anything already in labkey that would assist this? Any other comments or ideas on how to approach that?
 
 
Peter responded:  2010-06-24 20:06
Ben,

I like the idea of connecting an output R data frame to a labkey grid or an extJs grid. right now the only data-frame oriented function that pushes a data frame from R back into labKey is the Rlabkey cover of assay.saveBatch(). so in order to use this you would need to pre-define the grid column shapes that you want to display as assay result sets. then you would save a batch into the assay results table and then select it back out into a grid. This might be ok for proof of concept but I imagine it would get pretty cumbersome to have to pre-define the grid shapes, and I also suspect the performance would be pretty slow.

if you want to look into hooking an extJS grid into the output data frame of an R function, look into the code of the Rlkabkey labkey.saveBatch.R function in the remoteapi folder. The code takes a data frame and turns it into a set of linked list objects in R, then calls the toJSON function in the rcurl library to turn it into Json. I think the format of our output json stream for a result set was modeled on the extJs grid needs, but I am not sure how close the input format is to the output format.

I'd be interested in learning more about how you'd like this to work.

Peter
 
Ben Bimber responded:  2010-06-24 20:43
hi peter,

i'm not quite sure at what level of detail you're asking, so here's a little more info from several levels:

from the end user perspective: The labkey R interface currently allows you to toggle between the source data and the R view. This is quite powerful. We'd want some mechanism for your R script to output data in a filterable/sortable grid. In most cases this data would be an aggregate view along the lines of a crosstab. The user can toggle to the source data tab, add filters, then reload the R view - this interactive part is pretty important. A one-time redirect from R to a separate javascript page that displays the report does not quite capture this same interaction.

Exactly how that works in R probably depends on what's easiest. Currently labkey R reports support multiple sections (pdf, html, console). At least from the perspective of how the interaction with R reports works right now, allowing a 'grid' section of an R report might be useful. Within R you might build your dataframe, then pass it to some function that handles creation of the grid. I guess if the R report exports HTML, then you might just need some function that accepts a dataframe and creates the javscript/HTML output.

My R knowledge isnt all that deep, so I'm probably not the best to ask on how this might actually get implemented in R. I'd be very keen in learning what it might take and trying to help with developing something.


For what it's worth, a slightly unrelated extension would be to make the whole R report a little more ajax-y. If each tab stopped triggering a reload, it would be easier to embed an R report in a page. It would also be useful if there were a better mechanism to view the source data side-by-side with the R report. For example, you might have the source data and a bar graph (or a filterable grid). As you filter your source data, you want to see how the graph changes and possibly keep trying new permutations. This is easier if you dont need to navigate and reload every time you switch between the two tabs. That's another project though.
 
Peter responded:  2010-06-25 18:22
Hi Ben,

I think I understand. But I'm unclear about one thing-- if the output of R goes to a filterable/sortable grid, you'll have two grids you are working with-- the source data and the R output. Can you filter/sort on either? Does it mean something different? can a change in the output grid affect the source grid?

I'll try to incorporate your scenario in a spec I am working on for the next phase of R - LabKey integration.
 
Ben Bimber responded:  2010-06-28 05:15
Peter,

I was assuming it pretty much had to be uni-directional with changes in source affecting the aggregate grid view (probably requiring some action that forces a re-running of the R script), but not the reverse. When some change triggers a re-running of the R script, I was also assuming that any filters or sorts on the aggregate view would be lost. That seems ok to me, but tell me if I'm setting my sights too low.

Let me try to clarify a little on something I was trying to say: I emphasized the power associated with having the source data tab plus the R view tab. The importance of this is exactly the same if your R report is creating a graph as when your R report is creating a filterable grid. The R code serves to distill the source information and create some presentation of it. Being able to quickly see the source data makes it less opaque (particularly if R performs complex calculations). It also gives the user more control - they can add new filters to the source then re-calculate the R view. The current R report already does this reasonably well. It would be a little nicer if in certain cases they could be viewed side-by-side and if the R report used for ajax, but in general it works. The reason I re-iterated this above is that it is important not to lose the ability to toggle between R view and source when thinking about how to create an R->grid api. Therefore solutions involving a redirect from the R report to some new JS page might not be ideal. hope that makes sense.