selectRows() against a large assay

LabKey Support Forum (Inactive)
Ben Bimber  2011-08-02 07:42
Status: Closed
 
I have an assay with about 100K rows. I have a JS class that creates a search panel for any query; this class is backed by a LABKEY.ext.Store. The rationale for using the store instead of one of the non-Ext query APIs is that it handles metadata and the creation of the necessary Ext inputs better than the non-Ext APIs do. There are a number of Ext-based containers that need to do similar things throughout the EHR UI, and it's just easier to have a common code path for processing metadata.

The store is set to use maxRows=0, because we only want metadata, not all the rows. The actual request is using this URL:

https://ehr.primate.wisc.edu/query/WNPRC/WNPRC_Units/Research_Services/MHC_SSP/Private/MHC_DB/selectRows.view?_dc=1312250642279&schemaName=assay&query.queryName=MHC_SSP%20Data&apiVersion=9.1&query.maxRows=0
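A minimal sketch of the metadata-only store setup described above (the schema and query names come from the URL; the config fields follow the documented LABKEY.ext.Store API, but the exact class in use here is not shown in the thread):

```javascript
// Hypothetical sketch: build the config for a metadata-only store.
// maxRows: 0 asks the server for no data rows, so only field metadata
// comes back. Creating the store itself requires Ext and a LabKey page.
function metadataOnlyStoreConfig(schemaName, queryName) {
    return {
        schemaName: schemaName,
        queryName: queryName,
        maxRows: 0,      // fetch no data rows; we only want field metadata
        autoLoad: true   // issue the request immediately on creation
    };
}

// On a LabKey page:
// var store = new LABKEY.ext.Store(metadataOnlyStoreConfig('assay', 'MHC_SSP Data'));
```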

However, despite using maxRows=0, this can take 20 secs to return. I didn't understand why until I looked at the response and saw that it includes something like 'total rows = 100K'. This implies it must be running a query to figure out the total size. That may not be the true reason, but it's the closest explanation I could come up with.

Should a call to selectRows() using maxRows=0 actually be the same speed as one without it? Are there other troubleshooting steps to figure out exactly where the bottleneck is?

Thanks.
 
 
adam responded:  2011-08-02 08:00
I would generally expect maxRows=0 to be faster than maxRows=-1, though it's up to the database to optimize this case... it's possible PostgreSQL is getting confused. It's also possible that LabKey is no longer translating that 0 correctly (for historical reasons, one layer in the server treats 0 as "all rows"). Your theory about row count is possible too. What are the timings with maxRows=1 and maxRows=-1?
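The timings Adam asks about could be gathered with a small helper like this (a hypothetical sketch, not code from the thread; it assumes it runs on a page where LABKEY.Query is loaded, and the schema/query names are taken from the request URL above):

```javascript
// Hypothetical helper: time a selectRows call for a given maxRows value
// and report the elapsed milliseconds through a callback.
function timeSelectRows(maxRows, done) {
    var start = Date.now();
    LABKEY.Query.selectRows({
        schemaName: 'assay',
        queryName: 'MHC_SSP Data',
        maxRows: maxRows,
        success: function () {
            done(maxRows, Date.now() - start);  // elapsed ms for this request
        }
    });
}

// Usage on a LabKey page:
// [0, 1, -1].forEach(function (n) {
//     timeSelectRows(n, function (maxRows, ms) {
//         console.log('maxRows=' + maxRows + ': ' + ms + ' ms');
//     });
// });
```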
 
Ben Bimber responded:  2011-08-02 09:36
Using maxRows=-1 is considerably slower than maxRows=0; however, maxRows=0 still takes 20 secs.

When datasets were switched to hard tables, there was a dramatic difference from before. The visible difference when assays were converted to hard tables was less pronounced. Is there some difference in the guts of how that hard-table transition works? Do assay result domains still require pulling some info out of the ontology manager?

I'm just confused because, while this assay is large, comparable datasets load more quickly than this.

Thanks.
 
adam responded:  2011-09-08 15:09
Count queries on PostgreSQL can be very slow (much slower than the query that returns the rows... esp. when you ask for 0 rows). For the generic DataRegion we're looking at doing these count queries in the background and AJAXing the results to the page when complete. You are probably seeing the same issue in the client API; see #13019.

Adam
 
marki responded:  2011-09-08 15:26
Note that you can use the includeTotalCount config parameter on the existing selectRows API to turn off the computation of totals.

https://www.labkey.org/download/clientapi_docs/javascript-api/symbols/LABKEY.Query.html#.selectRows
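In code, marki's suggestion looks roughly like this (a sketch: includeTotalCount is documented at the link above, and the schema/query names are the ones from earlier in the thread):

```javascript
// Sketch of a metadata-only request that also skips the total-count query.
var metadataRequest = {
    schemaName: 'assay',
    queryName: 'MHC_SSP Data',
    maxRows: 0,                // return no data rows
    includeTotalCount: false,  // skip the expensive COUNT query
    success: function (data) {
        // data.metaData.fields describes the columns of the query
        console.log(data.metaData.fields.length + ' fields');
    }
};

// On a LabKey page: LABKEY.Query.selectRows(metadataRequest);
```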
 
Ben Bimber responded:  2011-09-09 06:31
This one actually had to do with the EHR exceeding the cache for the total number of containers, I think. Josh sorted it out and could say more. It was improved in 11.2.
 
jeckels responded:  2011-09-09 09:45
For the record, we were exceeding the SecurityPolicy cache based on the number of containers when we were building up the list of copied-to-study columns for the dataset. In 11.2 we raised the cache limit, which helped in other usage scenarios as well. In 11.3 I changed the code that builds the candidate list of studies to reflect the fact that it's normal to have many more containers (and therefore SecurityPolicies) than you have studies.

Thanks,
Josh