LabKey/Rserve Setup Guide (Documentation Archive 22.3)

Premium Feature: Available with all Premium Editions of LabKey Server.

This topic provides instructions for enabling LabKey Server to execute R reports against a remote Rserve instance. Running R scripts on a remote Rserve server has a number of advantages over running them on your local machine:

A remote Rserve server frees up resources on your local machine, which otherwise might be clogged with expensive R processes.
It provides faster overall results, because there is no need to create a new R session for each process.
There is no need to wait for one process to end to begin another, because LabKey Server can handle multiple connections to Rserve at one time.

This topic assumes a working knowledge of R and also covers required changes to client code to take advantage of the Rserve integration features.

Topics

Set Up

Rserve Machine (RS-MAC): Install Rserve
Run Rserve Securely
LabKey Machine (LK-PC): Setup Report and Data Shares
Rserve Machine (RS-MAC): Connect to Report and Data Shares
LabKey Machine (LK-PC): Enabling Scripting Using Rserve
Rserve Machine (RS-MAC): Start Your Rserve Instance

Client Code Changes
Running Rserve and LabKey on the Same Machine
Setting Default R Engines, Local or Remote
Approved Functions List
Direct Execution of R Functions - LABKEY.Report.executeFunction
Troubleshooting

Set Up

The "Rserve machine" refers to the machine running the Rserve instance; the "LabKey machine" refers to the LabKey web server.

For illustration purposes, a concrete example is used throughout this setup guide. In particular, we assume a configuration where LabKey Server is running on a Windows PC (called "LK-PC") and the Rserve instance is running on OSX (called "RS-MAC"). Note that setup instructions will vary depending upon the operating systems of the machines used. Values that are used in multiple places, such as share names, user names, and passwords, must be consistent between the two machines.

Rserve Machine (RS-MAC): Install Rserve

The first step is to install R if you haven’t already. Rserve uses your R installation, so any packages, environments, and libraries you have already installed as part of your R installation are used by Rserve.

Information about Rserve, including installation instructions, can be found here: http://www.rforge.net/Rserve/. The site has extensive documentation, so it’s worth reading through the FAQs. Note that running Rserve on a Windows machine is not advised. From the download page, pick the binary that matches your OS, or install from within R:

install.packages('Rserve',,'http://www.rforge.net/')

After installing Rserve, you may not be able to run Rserve from the shell. If you get an error about Rserve not being found, you can either put the executable on your path or copy the executable (rserve or rserve.dbg) to your $(R_HOME)/bin directory. You can find the R home directory by launching R and typing R.home() at the prompt. In a typical installation, rserve and rserve.dbg are in /library/frameworks/R.framework/resources/library/rserve/libs/x86_64, and the value of R_HOME is /library/frameworks/R.framework/resources.

Run Rserve Securely

We recommend running Rserve under a user account with restricted privileges (not as an administrator or root user). This will help limit the damage a malicious R script can do to the machine.

Second, we recommend that the Rserve configuration specify “auth required” and “plaintext disable”. This will prevent unauthorized users from connecting to the Rserve box in the first place. Note that the login required for Rserve may or may not be the same user account under which Rserve is run.

The Rserve configuration is loaded from the /etc/Rserv.conf file. By default, Rserve won’t accept connections from a different machine, so you must create or edit this configuration file.

Example Rserv.conf file (tab-separated):

remote		enable
auth		required
encoding	utf8
plaintext	disable
pwdfile		/users/shared/rserve/logins

The logins file referenced above contains a single name/value pair and is located at /users/shared/rserve/logins, as indicated by the pwdfile value. The contents are:

rserve_usr	rserve_pwd

Remember the values for rserve_usr and rserve_pwd.

This is the user and password that the LabKey machine will use to make a connection to Rserve. If you do not provide a user and password then you will allow any machine to make an anonymous connection to your Rserve machine. This is supported by LabKey but not recommended.
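As a rough illustration of the recommendations above, the following JavaScript (Node.js) sketch checks the text of a configuration file for the settings this guide relies on. The function names parseRservConf and checkRservConf are hypothetical and are not part of Rserve or LabKey. The parser assumes the simple whitespace-separated name/value layout shown in the example file.

```javascript
// Hypothetical helper: turn "name<whitespace>value" lines into an object.
// Assumes the simple layout shown in the example configuration above.
function parseRservConf(text) {
  const settings = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '') continue; // skip blank lines
    const parts = line.split(/\s+/);
    settings[parts[0]] = parts.slice(1).join(' ');
  }
  return settings;
}

// Hypothetical helper: list deviations from the settings recommended here.
function checkRservConf(text) {
  const settings = parseRservConf(text);
  const recommended = { remote: 'enable', auth: 'required', plaintext: 'disable' };
  const problems = [];
  for (const name of Object.keys(recommended)) {
    if (settings[name] !== recommended[name]) {
      const actual = settings[name] === undefined ? '(not set)' : settings[name];
      problems.push(name + ' should be "' + recommended[name] + '" but is "' + actual + '"');
    }
  }
  if (!settings.pwdfile) {
    problems.push('pwdfile is not set, so no logins file is configured');
  }
  return problems;
}
```

An empty array means the text matches the example configuration. Each returned string describes one setting to review by hand.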

LabKey Machine (LK-PC): Setup Report and Data Shares

In the context of running R reports, LabKey Server needs access to two roots:

A reports root under which temporary files are created when an R report request gets serviced
A pipeline data root where any external data is read (if required)

The Rserve machine must have access to the reports root at a minimum. If the R script being executed on the Rserve machine also accesses the data files then it may need access to the pipeline data share as well. Note that the pipeline data share may be on a separate machine entirely from either the LabKey or Rserve machines.

First, create a guest user account on the LabKey machine. This is the user to whom you will grant access to these data shares. For this example, create a user RserveShare with a password RserveShare_pwd. This is the account that the remote Rserve machine will use when connecting to these shares.

The reports root on the LabKey install is <CATALINA_HOME>/temp/reports_temp. <CATALINA_HOME> refers to the installation of Tomcat, such as C:\labkey\apps\apache\apache-tomcat-#.#.##. Create a share called reports_temp and give read/write access to the RserveShare account (which must be created on the LabKey web server machine). Write access is required because the Rserve machine will ultimately copy an output plot file to this temp directory.

If you need your R scripts to both read and create data files on this share, repeat the same steps and settings for the pipeline root directory and create a data share with read/write access. Also grant RserveShare access to this share.

There are many OS-specific ways to set up and secure data shares. The bottom line is that the machine running R must have access to the files in the LabKey machine’s report temp directory and, if applicable, to your pipeline data.

Rserve Machine (RS-MAC): Connect to Report and Data Shares

Connect to the file share you created above. You need to create one “drive” for the reports_temp directory and, if your R script references pipeline data, then one for the pipeline data directory.

For the concrete example, create a volume that references LK-PC using SMB. In the Finder menu, connect to smb://LK-PC. Note that you can also use the IP address of LK-PC. Be sure to connect to the reports_temp and, if applicable, data shares using the RserveShare account and password created on LK-PC. From RS-MAC’s point of view, these shares are mounted as volumes, accessed as /volumes/reports_temp and /volumes/data respectively.

LabKey Machine (LK-PC): Enabling Scripting Using Rserve

Ensure your LabKey web server is up and running. You’ll need admin access to your server to set up the scripting engine to use Rserve.

Go to (Admin) > Site > Admin Console.
Under Configuration, click Views and Scripting.
If there is already an "R Scripting Engine" configuration, select and delete it.
Add a New Remote R Engine configuration.

Click the Remote column in the Path Mapping section to open a box for entering the appropriate location mapping.

The table below shows properties and sample values for the running example.

Setting	Sample value	Description
Name	Remote R Scripting Engine
Language	R
Language Version	3.6.1	Optional, but can be helpful to record
File Extensions	R,r
Machine Name	RS-MAC	Machine name or IP address of running Rserve instance
Port	6311	Port that the Rserve instance is listening on
Path Mapping Section	Add/Remove as needed	Provide Local and Remote paths to map
	Rserve data volume root /volumes/data	The name of an optional pipeline data share as referenced by the Rserve machine. This is where data files are read in from the pipeline root, for example: /volumes/data/
	Rserve report volume root /volumes/reports_temp	The name of the required reports share as referenced by the Rserve machine. This is where report output files get written: for example, /volumes/reports_temp
Change Password	Check to activate boxes below
Remote user	rserve_usr	Name of the user allowed to connect to the Rserve instance. This user is managed by the admin of the Rserve machine.
Remote password	rserve_pwd	Password for the Rserve user
Program Command	tools:::.try_quietly(capture.output(source("%s")))
Output File Name	${scriptName}.Rout
Site Default	Check box to make this the site default
Sandboxed	Check if this Rserve instance is considered sandboxed
Use pandoc & rmarkdown	Check to use pandoc and rmarkdown
Enabled	Check to enable this configuration

Rserve Machine (RS-MAC): Start Your Rserve Instance

You need to start the server to accept incoming connections. You can start Rserve from your shell by typing:

rserve --no-restore --no-save --slave

Refer to the Rserve documentation for command line options. Options that begin with --RS- are handled by Rserve itself; all other command line options are passed on to R. In the above example, the parameters tell R not to restore any previously saved session, not to save the environment on exit, and to suppress prompts and startup text.

Running the debug version of Rserve (rserve.dbg) will help you troubleshoot any connection or script problems you have. At this point, you are ready to execute R views, run R scripts, etc. All scripts will be run on your Rserve machine.

Client Code Changes

R Script Changes

For the most part, an R script executing locally will execute just fine when running remotely. However, there are a few things to keep in mind:

1. There is no implicit printing or plotting. To guarantee that output is written to the graphics device, you must wrap these statements with print(). This is because LabKey uses R’s source command, and commands evaluated through source do not automatically print. For example, instead of xyplot(..), use print(xyplot(..)).

2. If you are accessing data shares from within your R script, you cannot access them as if you were running on the LabKey machine. For parameter substitutions like ${imgout:graph.png}, LabKey will replace this parameter with a file reference relative to the /volumes/reports_temp directory you set up above. However, for referencing data pipeline files, you need to do your own file mapping. To assist with this, the prolog of your script file will contain two new values:

labkey.pipeline.root: the root directory as accessed by the LabKey machine (LK-PC).
labkey.remote.pipeline.root: the root as accessed by the Rserve machine (RS-MAC).

You can use a helper function from the Rlabkey R package to create the correct remote path using these values and a fully-qualified file path. For example, if you passed in the full path to your file as a URL parameter to the reports web part, you could use the following line in your script:

rootPath <- labkey.makeRemotePath(labkey.pipeline.root, labkey.remote.pipeline.root, labkey.url.params$path);

3. If you are using R session sharing (more on that below), you should write your scripts to take advantage of any work done in previous requests by either the same or other R scripts. For example, you could set a variable in the environment once libraries have been loaded and then check for the existence of that variable:

if (exists("flowGraph.session")) {...}
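The path translation described in item 2 can be pictured as a prefix swap. The JavaScript sketch below is not the Rlabkey implementation of labkey.makeRemotePath; it only illustrates the idea, under the assumption that the mapping replaces the LabKey-side root with the Rserve-side root. The function name toRemotePath and the sample paths are hypothetical.

```javascript
// Hypothetical illustration of a root-prefix swap between two machines.
// Assumption: Windows-style paths on the LabKey side are compared
// case-insensitively, and backslashes are converted to forward slashes.
function toRemotePath(localRoot, remoteRoot, fullPath) {
  // Use forward slashes throughout and drop any trailing slash.
  const normalize = function (p) {
    return p.replace(/\\/g, '/').replace(/\/+$/, '');
  };
  const local = normalize(localRoot);
  const remote = normalize(remoteRoot);
  const full = normalize(fullPath);

  const fullLower = full.toLowerCase();
  const localLower = local.toLowerCase();
  const underRoot = fullLower === localLower || fullLower.startsWith(localLower + '/');
  if (!underRoot) {
    throw new Error('Path is not under the local pipeline root: ' + fullPath);
  }
  // Keep everything after the local root and attach it to the remote root.
  return remote + full.slice(local.length);
}

// Example with made-up paths:
// toRemotePath('C:\\labkey\\data', '/volumes/data', 'C:\\labkey\\data\\run1\\results.tsv')
// returns '/volumes/data/run1/results.tsv'
```

In an R report, the equivalent step is the labkey.makeRemotePath call shown in item 2. This sketch is only meant to make the mapping concrete.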

JavaScript Changes

If you want to take advantage of R session sharing, you’ll need to acquire a reportSessionId and pass it into the reports web part config. Very briefly, your JavaScript needs to create a session using the LABKEY.Report.createSession() API. On success, this function returns a data object containing a unique report session identifier that can be used in subsequent report web part invocations:

reportWebPartConfig.reportSessionId = data.reportSessionId;

All R reports run using this report session will share the same environment. When the client is done with the session then a call to LABKEY.Report.deleteSession(reportSessionId) will clean up the resources associated with the underlying R connection. Otherwise, report session ids are destroyed when the client’s session ends either by a globally configured Tomcat timeout option or when the client logs out of LabKey.
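The session handling described above might be wrapped as follows. This is a minimal sketch, not LabKey’s own code: withReportSession is a hypothetical helper, and the reportApi argument is assumed to behave like LABKEY.Report (it is passed in so the sketch can also be exercised with a stub outside a LabKey page). The arguments passed to the failure callback are not specified in this topic, so they are forwarded unchanged.

```javascript
// Hypothetical helper: create a report session, then hand the caller a copy
// of the web part config with the session id attached.
// `reportApi` is assumed to expose createSession like LABKEY.Report does.
function withReportSession(reportApi, baseWebPartConfig, onReady, onError) {
  reportApi.createSession({
    success: function (data) {
      // Copy rather than mutate, so the base config can be reused.
      const config = Object.assign({}, baseWebPartConfig, {
        reportSessionId: data.reportSessionId
      });
      onReady(config);
    },
    failure: onError
  });
}

// On a LabKey page this might be called as:
//   withReportSession(LABKEY.Report, reportWebPartConfig, renderReport, showError);
// where renderReport and showError are functions you supply.
```

Every web part rendered with the returned config shares the same R environment until the session is deleted or the client’s session ends.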

Running Rserve and LabKey on the Same Machine

You can run Rserve on the same machine as LabKey. This puts more burden on your LabKey web server but in some cases it can provide very quick response times as data does not need to be moved between machines. Following the concrete example, let’s assume we want to run everything on RS-MAC. To do this:

Install Rserve. You don’t need to enable remote in your Rserv.conf file.
Enable the Rserve reporting feature in LabKey as before.
You don’t need to set up any data shares, but you do need to ensure that the account you are running Rserve under has access to the data.
You don’t need to translate any data pipeline paths in your R script.
Your Remote R scripting engine configuration values would look like the following:

Setting	Sample value	Description
machine name	localhost	Machine name or IP address of running Rserve instance
port	6311	Port that Rserve instance is listening on
Rserve data volume root		The name of an optional pipeline data share as referenced by the Rserve machine. This is where data files are read in from the pipeline root, for example: /volumes/data
Rserve report volume root		The name of the required reports share as referenced by the Rserve machine. This is where report output files get written: for example, /volumes/reports_temp
Rserve user	rserve_usr	Name of the user allowed to connect to the Rserve instance. This user is managed by the admin of the Rserve machine.
Rserve password	rserve_pwd	Password for the Rserve user

Setting Default R Engines, Local or Remote

You can register both remote and local R engines, using one or the other as desired. If two engines are registered, and a report job does not specify which to use, LabKey Server will try the local server by default. You can configure LabKey to try the remote server by default by providing a metadata XML file for the report in question. The XML file should follow this naming pattern: <R-Report-Name>.report.xml. The XML file for the script/report should include a <scriptEngine> element, as follows:

<?xml version="1.0" encoding="UTF-8"?>
<ReportDescriptor>
     <description>setup the R session</description>
     <reportType>
         <R>
             <scriptEngine remote="true"/>
             <functions>
                <function name="getStats"/>
             </functions>
         </R>
     </reportType>
</ReportDescriptor>

Approved Functions List

The <functions> list above is a list of allowed or approved functions, to ensure that arbitrary R code cannot be invoked. If your function name is not found in the list, a ScriptException is thrown.

Direct Execution of R Functions - LABKEY.Report.executeFunction

You can use the LABKEY.Report.executeFunction API to "directly" invoke a function without the need for a backing report to execute. This is convenient in many cases and can save time, especially if you need to call the function multiple times within a session, because the report does not need to be loaded every time you call the function.

executeFunction takes a config object with the following properties:

containerPath: The container in which to make the request, defaults to the current container.
scope: The scope to use when calling the callbacks (defaults to this).
functionName: The name of the function to execute.
reportSessionId: A valid report session returned by Report.createSession.
inputParams: An optional object with properties for input parameters.
success: A function to call if the operation is successful. The callback will receive an object with the following properties:

console: A string[] of information written by the script to the console.
errors: An array of errors returned by the script or LabKey.
outputParams: An array of length 1 that contains a single JSON output parameter value.

failure: A function to call if an error preventing script execution occurs. This function will receive one parameter which is the exception message.

Currently, executeFunction only supports a single JSON return value (although the JSON object can be arbitrarily complex).

Functions called via executeFunction must be explicitly listed, to ensure that arbitrary R code cannot be executed. For details, see Approved Functions List above.

Example executeFunction Workflow

For example, suppose you have a report (setup.R) that performs time-consuming work once (to set up libraries, load data, etc.). You also want to call the getStats function defined by this report multiple times over the course of your application, but it would be expensive to reload the report every time just to call this function. This is a good opportunity to use the Report.executeFunction API. A typical workflow might look like:

Call LABKEY.Report.createSession to create a report session.
Call LABKEY.Report.execute with this session to run your setup.R module report. This will load the report and run it, putting all its work in the session passed in.
Call the function "getStats" via LABKEY.Report.executeFunction using the same report session. This will execute the function in the session without needing to load any reports.

In this example, the setup.R module report must declare that the getStats function is callable by the executeFunction API. The report author does this by also adding a setup.report.xml metadata file (the file name is the name of the R report + '.report.xml') and specifying the function in the <functions> element list.
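The three workflow steps above could be chained as in the following sketch. It is an illustration under stated assumptions rather than a reference implementation: runGetStats is a hypothetical helper, reportApi is assumed to behave like LABKEY.Report, and the reportId property passed to execute is an assumption whose value depends on how setup.R is registered on your server.

```javascript
// Hypothetical helper chaining createSession -> execute -> executeFunction.
// `reportApi` is assumed to behave like LABKEY.Report.
function runGetStats(reportApi, setupReportId, inputParams, onResult, onError) {
  // Step 1: create a report session.
  reportApi.createSession({
    success: function (session) {
      const reportSessionId = session.reportSessionId;

      // Step 2: run the setup report once inside that session.
      // Assumption: the report is identified through a reportId property.
      reportApi.execute({
        reportId: setupReportId,
        reportSessionId: reportSessionId,
        success: function () {
          // Step 3: call the approved function in the same session.
          reportApi.executeFunction({
            functionName: 'getStats',
            reportSessionId: reportSessionId,
            inputParams: inputParams,
            success: function (result) {
              if (result.errors && result.errors.length > 0) {
                onError(result.errors);
                return;
              }
              // outputParams holds a single JSON output parameter value.
              onResult(result.outputParams[0], reportSessionId);
            },
            failure: onError
          });
        },
        failure: onError
      });
    },
    failure: onError
  });
}
```

The session id is handed back to the caller so that later getStats calls can go straight to executeFunction without repeating steps 1 and 2.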

Troubleshooting

java.lang.RuntimeException: Could not connect to: rs-mac:6311

Can you ping “rs-mac”? Is the name resolved?
Is rs-mac the correct machine running Rserve? If not, you’ll need to change your R scripting engine configuration setting.
Is the Rserve instance running on rs-mac?
Is Rserve listening on the port 6311?

java.lang.RuntimeException: eval failed, request status: error code: 127 Error in file(filename, "r", encoding = encoding) : cannot open the connection

Have you set up the data share and mounted a volume on the Rserve machine? Are the reports_temp share and /volumes/reports_temp set up correctly?
Did you connect to the shares with the correct account (RserveShare)?

java.lang.RuntimeException: could not login to Rserve with user: foo_bar

Verify that your R scripting engine configuration settings have the correct user name and password.

java.lang.RuntimeException: eval failed, request status: error code: 127 …

This usually means a script evaluation failed. This could be a syntax error in your R script (try running it in R to see if there is an issue with your script).
You can also run rserve.dbg on the server side (RS-MAC) to see more detailed error information.

javax.script.ScriptException: The report session is invalid

The reportSessionId you passed in is no longer valid. Did you get the reportSessionId from a call to LABKEY.Report.createSession()?
The web session expired out from underneath you. This could happen because the session timeout expired (the default timeout is 30 minutes in Tomcat) or you signed out. You’ll need to refresh the page hosting the reports web part and call LABKEY.Report.createSession() to get a new session.
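For code that calls executeFunction directly, one possible way to recover from an invalid session without a page refresh is to create a new session and retry once. The sketch below is an assumption-laden illustration: executeWithSessionRetry is a hypothetical helper, reportApi is assumed to behave like LABKEY.Report, and the check relies on the failure callback receiving the message text shown above (or an object carrying it in an exception property, which is also an assumption).

```javascript
// Hypothetical helper: retry executeFunction once with a fresh session when
// the failure message indicates that the report session is invalid.
function executeWithSessionRetry(reportApi, request, onResult, onError) {
  function attempt(reportSessionId, mayRetry) {
    reportApi.executeFunction({
      functionName: request.functionName,
      inputParams: request.inputParams,
      reportSessionId: reportSessionId,
      success: function (result) {
        onResult(result, reportSessionId);
      },
      failure: function (error) {
        // Assumption: `error` is the exception message, or an object with
        // an `exception` property holding that message.
        const message = String(error && error.exception ? error.exception : error);
        const sessionInvalid = message.indexOf('report session is invalid') !== -1;
        if (!sessionInvalid || !mayRetry) {
          onError(message);
          return;
        }
        // Get a new session and try exactly one more time.
        reportApi.createSession({
          success: function (data) {
            attempt(data.reportSessionId, false);
          },
          failure: onError
        });
      }
    });
  }
  attempt(request.reportSessionId, true);
}
```

A new session starts with an empty R environment, so any setup report that the function depends on would need to be run again in the new session before the retry can succeed. That step is omitted here.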
