R API and executeSQL Ben Bimber  2010-03-10 09:59
Status: Closed
 
I am trying to query lk from an R script and am receiving an error from the results. below is the R code:

------

library(Rlabkey)

pedigree <- labkey.executeSql(
baseUrl="http://localhost:8080/labkey",
folderPath="/WNPRC/EHR",
schemaName="study",
sql = "select p.sex, p.id FROM study.pedigree p ",
#showHidden = FALSE
)
#pedigree.sire,
pedigree;

----

when i run this query and try to select both columns in my SQL statement, I get this error:

> pedigree <- labkey.executeSql(
+ baseUrl="http://localhost:8080/labkey",
+ folderPath="/WNPRC/EHR",
+ schemaName="study",
+ sql = "select p.sex, p.id FROM study.pedigree p ",
+ #showHidden = FALSE
+ )
Error in names(hold.dat) <- names(decode$rows[[1]]) :
  'names' attribute [3] must be the same length as the vector [1]
Calls: labkey.executeSql -> makeDF
Execution halted


When I try to run this query to return only the ID column, it works and returns this:

> #pedigree.sire,
> pedigree;
         Id
1 rhaa35
2 rhae81
3 rham30
4 rhao45
5 rhau10
6 rhau34
7 rhan92
8 rh-z19
.....




The output is normal. However, when I try to return p.sex (which is a lookup), I get a list for each record:



+ sql = "select p.sex FROM study.pedigree p ",
+ #showHidden = FALSE
+ )
> #pedigree.sire,
> pedigree;
                                                                 Sex
1 list("NA")
2 list("NA")
3 list("NA")
4 list("NA")
5 list("f", "/labkey/list/WNPRC/EHR/details.view?listId=24&pk=f")
6 list("f", "/labkey/list/WNPRC/EHR/details.view?listId=24&pk=f")
7 list("f", "/labkey/list/WNPRC/EHR/details.view?listId=24&pk=f")
8 list("f", "/labkey/list/WNPRC/EHR/details.view?listId=24&pk=f")
9 list("f", "/labkey/list/WNPRC/EHR/details.view?listId=24&pk=f")
10 list("m", "/labkey/list/WNPRC/EHR/details.view?listId=24&pk=m")
11 list("f", "/labkey/list/WNPRC/EHR/details.view?listId=24&pk=f")
12 list("m", "/labkey/list/WNPRC/EHR/details.view?listId=24&pk=m")
13 list("m", "/labkey/list/WNPRC/EHR/details.view?listId=24&pk=m")
14 list("m", "/labkey/list/WNPRC/EHR/details.view?listId=24&pk=m")


I'm assuming this has something to do with the fact that this field is a lookup. i should point out that study.pedigree is a query, not a dataset. Here is the SQL:

SELECT

d.id,
d.dam,
d.sire,

--TODO: handle sex better
CASE d.sex
  WHEN 'm' THEN 1
  WHEN 'f' THEN 2
  WHEN 'e' THEN 1
  WHEN 'c' THEN 2
  WHEN 'v' THEN 1
END AS sex,

FROM study.demographics d

Does anyone have a thought why the first executeSQL() would not be able to produce a dataframe? Do I need to edit the metadata of my query? Thanks in advance.
 
 
Ben Bimber responded:  2010-03-10 10:21
one more odd behavior:

the query study.pedigree has 4 columns: id, dam, sire and sex (see SQL statement above). if i run this R code (note, it only tries to query the p.sire):

library(Rlabkey)

pedigree <- labkey.executeSql(
baseUrl="http://localhost:8080/labkey",
folderPath="/WNPRC/EHR",
schemaName="study",
sql = "select p.sire FROM study.pedigree p ",
#showHidden = FALSE
)
pedigree;

----

I get an error saying 'p.sire does not exist'. To see whether this was related to using a query instead of a dataset, I tried this, which uses the demographics dataset, not the pedigree query:

library(Rlabkey)

pedigree <- labkey.executeSql(
baseUrl="http://localhost:8080/labkey",
folderPath="/WNPRC/EHR",
schemaName="study",
sql = "select p.id, p.dam, p.sire FROM study.demographics p ",
#showHidden = FALSE
)
pedigree;

----

this does return the expected dataframe,including sire. however, if i include the field 'sex', it throws the same error as above.
 
Peter responded:  2010-03-10 12:29
Ben,

I've been working on the Rlabkey package for the past couple months and have fixed many bugs related to null handling and also the extra link info that comes from the server on a lookup field. We haven't released this new version to CRAN yet (we are waiting on a change to rjson that will give us a bid speed boost) but I have attached the current package for you to install as a zip file into R.

There are also a set of new apis that make it easier for a R user to work with the data in labkey; see getSession and related apis.
 
Ben Bimber responded:  2010-03-10 16:19
Peter,

thanks. the new R package fixed that issue of additional data being returned with lookup columns. however, the issue i'm having with the sire field not being return (see post above) still exists. it switched from executeSQL() to labkey.selectRows() and it seems to work ok now though.
 
Peter responded:  2010-03-10 18:13
did you try setting showHidden=TRUE ?

One difference between executeSql and selectRows is that the latter uses the default view for the query as the source of its select.

If neither of these suggestions helps, I'd be happy to look into it if there is a way you can export a study containing your demographic data set and send it to me.
 
Ben Bimber responded:  2010-03-11 05:14
Hi Peter,

Thanks for the offer. Shortly our test server should be available and I will send you the data and a link to see the report. In the meantime, I was able to complete the report using labkey.selectRows(). Besides the odd behavior of that one query, everything else seems to be working fine.
 
Ben Bimber responded:  2010-03-11 11:25
peter,

i think i see what is happening - a dumb mistake on my part. there's a conflict between the name of a dataset and a query. on had the field I wanted, and one did not. rename the query and it's fixed. i'll post more if i find any other problems, but there's nothing i see right now.