Labkey pivot query limit

LabKey Support Forum (Inactive)
Labkey pivot query limit klutz  2016-06-03 13:32
Status: Closed
 
We are working with two fairly large data sets, one at 750 proteins and another at ~1400 genes, we are storing this data in long format and can successfully use SQL to pivot each table and create a report individually.

When I try to JOIN the two data sets I get the following error:
ExecutingSelector; SQL []; ERROR: target lists can have at most 1664 entries; nested exception is org.postgresql.util.PSQLException: ERROR: target lists can have at most 1664 entries

We want to be able to share full and integrated data sets, which would be slightly over 2000 columns, more if we want to include more. We are using Labkey version 15.3 on CentOS.

I am wondering...

1. Is this a hard limit of Postgres SQL and Labkey or is this be user error?
2. What would be the common solution/practice for data sets with more than 2000 columns in labkey?

Thank you,
Kristen Lutz

-Indoc Research
 
 
Jon (LabKey DevOps) responded:  2016-06-03 14:35
Hi Kristen,

It's a little bit of Postgres, but it is also somewhat caused by the user.

Postgres has a hard limit of 1664 columns, so if you try to go past that in anyway, you will see the same error appear.

Generally, the best way to work this is to try and reduce the number of columns as you need.

Regarding your data, do you need to have all 2000 columns listed?

Regards,

Jon
 
klutz responded:  2016-06-15 11:07
Hi Jon,

Yes, it is very important to have all 2000 columns listed in order to share the data with our researchers, and it would be very nice to have the ability to have more than 2000 actually.

-Kristen
 
Jon (LabKey DevOps) responded:  2016-06-16 00:51
Hi Kristen,

Unfortunately, this is something that is outside of our control since this is a database limit with Postgres. You could potentially have the data split between two tables to where one table has some columns and another table has the rest, then use a join to get the two tables together in a single query, but even that query would be subjected to the 1664 column limit as well since it is a postgresql limitation. The only other possibility is to switch databases and use a Microsoft SQL database instead. According to Microsoft's website, a non-wide table would only allow 1024 columns, but a wide table would be able to take 30,000 and their select statements can pull in 4069 columns.

https://msdn.microsoft.com/en-us/library/ms143432.aspx

Regards,

Jon
 
jeckels responded:  2016-06-17 09:54
Hi Kristen,

Can you describe how you're hoping to present this wide data set? Typically a 2000+ column table isn't very pleasant to scroll around inside of a web browser. Perhaps there are some other strategies that could be used to generate the SQL for smaller sets of columns, while still allowing the user to have access to all of the data.

For example, it might be possible to construct the pivot such that it shows 200 columns at a time, and allow the user to "page" horizontally across the different sets of columns.

Thanks,
Josh
 
klutz responded:  2016-06-22 09:27
Hi Josh,

We plan to create a wide data set from integrated long tables for sharing.

So all that would really be required is the download function, and the ability to add filters.

I am not sure how you can construct a table to page though, but that sounds like a good potential solution.

Best,
Kristen
 
Jon (LabKey DevOps) responded:  2016-06-24 10:44
Hi Kristen,

It sounds like a custom module would be the best option here since you would be able to code it as you need.

https://www.labkey.org/home/Documentation/wiki-page.view?name=simpleModules

To stay under the SQL column limits, you will most likely need to setup multiple tables to only hold a certain number of columns, but have each table utilize the same primary key value so a JOIN will tie the tables together.

Then with Josh's idea, your pagination would consist of querying one or more tables respectively.

A custom module will also allow you to develop your own download function as well. As for filters, the pagination query idea from above would have to incorporate the filtering as well.

Regards,

Jon