Running R scripts on a remote Rserve server has a number of advantages over running them on your local machine:
  • A remote Rserve server frees up resources on your local machine, which otherwise might be clogged with expensive R processes.
  • It provides faster overall results, because there is no need to create a new R session for each process.
  • There is no need to wait for one process to end to begin another, because LabKey Server can handle multiple connections to Rserve at one time.

Set Up

This document provides instructions for enabling LabKey Server to execute R reports against a remote Rserve instance. Below, the "Rserve machine" refers to the machine running the Rserve instance; "LabKey machine" will refer to the LabKey web server. This document assumes a working knowledge of R. This document also covers required changes to client code to take advantage of the Rserve integration features. Note that Rserve integration is currently only available as an experimental feature that must be enabled on the LabKey machine. See Experimental Features.

For illustration purposes, a concrete example is used throughout this setup guide. In particular, we assume a configuration where LabKey Server is running on a Windows PC (called "LK-PC") and the Rserve instance is running on a Mac (called "RS-MAC"). Note that setup instructions will vary depending upon the operating systems of the machines used. Values that are used in multiple places, such as share names, user names, and passwords, must be consistent between the two machines.

Rserve Machine (RS-MAC): Install Rserve

The first step is to install R if you haven’t already. Rserve uses your R installation, so any packages, environments, and libraries you have already installed as part of your R installation are used by Rserve.

Rserve information and installation instructions can be found here: http://www.rforge.net/Rserve/. That site has a lot of good information about Rserve, so it’s worth reading through the FAQs and documentation. Note that running Rserve on a Windows machine is not advised. From the download page, pick the binary that matches your OS, or install from within R:

install.packages('Rserve',,'http://www.rforge.net/')

After installing Rserve, you may not be able to run Rserve from the shell. If you get an error about Rserve not being found, you can either put the executable on your path or copy the executable (rserve or rserve.dbg) to your $(R_HOME)/bin directory. You can find the R home directory by launching R and typing R.home() at the prompt. In a typical installation, rserve and rserve.dbg are in /library/frameworks/R.framework/resources/library/rserve/libs/x86_64, and the value of R_HOME is /library/frameworks/R.framework/resources.

Run Rserve Securely

We recommend running Rserve under a user account with restricted privileges (i.e., not an administrator or root user). This will help limit the damage a malicious R script can do to the machine.

Second, we recommend that the Rserve configuration specify “auth required” and “plaintext disable”. This will prevent unauthorized users from connecting to the Rserve box in the first place. Note that the login required for Rserve may or may not be the same user account under which Rserve is run.

The Rserve configuration is loaded from the /etc/Rserv.conf file. By default, Rserve won’t accept connections from a different machine, so you must create or edit this configuration file.

Example Rserv.conf file:

remote enable
auth required
encoding utf8
plaintext disable
pwdfile /users/shared/rserve/logins

The logins file referenced above contains a single name/value pair (user name followed by password) and is located at /users/shared/rserve/logins, as indicated by the pwdfile value. The contents are:

rserve_usr rserve_pwd

Remember the values for rserve_usr and rserve_pwd. This is the user and password that the LabKey machine will use to make a connection to Rserve. If you do not provide a user and password, then any machine will be able to make an anonymous connection to your Rserve machine. This is supported by LabKey but not recommended.

LabKey Machine (LK-PC): Set Up Report and Data Shares

In the context of running R reports, LabKey Server needs access to two roots:

  1. A reports root under which temporary files are created when an R report request gets serviced
  2. A pipeline data root where any external data is read (if required)

The Rserve machine must have access to the reports root at a minimum. If the R script being executed on the Rserve machine also accesses the data files then it may need access to the pipeline data share as well. Note that the pipeline data share may be on a separate machine entirely from either the LabKey or Rserve machines.

First, create a guest user account on the LabKey machine. This is the user to whom you will grant access to these data shares. For this example, create a user RserveShare with a password RserveShare_pwd. This is the account that the remote Rserve machine will use when connecting to these shares.

The reports root on the LabKey install is $(CATALINA_HOME)\temp\reports_temp. In this example, $(CATALINA_HOME) refers to the c:\tomcat directory. Create a share called reports_temp and give read/write access to the RserveShare account (which must be created on the LabKey web server machine). Write access is required because the Rserve machine will ultimately copy an output plot file to this temp directory.

If you need your R scripts to both read and create data files on this share, repeat the same steps and settings for the pipeline root directory and create a data share with read/write access. Also grant RserveShare access to this share.

There are many OS-specific ways to set up and secure data shares. The bottom line is that the machine running R must have access to the files in the LabKey machine’s report temp directory and, if applicable, to your pipeline data.

Rserve Machine (RS-MAC): Connect to Report and Data Shares

Connect to the file share you created above. You need to create one “drive” for the reports_temp directory and, if your R script references pipeline data, then one for the pipeline data directory.

For the concrete example, create a volume that references LK-PC using SMB. In the Finder menu, connect to smb://LK-PC. Note that you can use the IP address of LK-PC instead of its name. Be sure to connect to the reports_temp share and, if applicable, the data share using the RserveShare account and password created on LK-PC. From RS-MAC’s point of view, these shares are mounted as volumes, accessed as /volumes/reports_temp and /volumes/data respectively.

LabKey Machine (LK-PC): Enabling Scripting Using Rserve

Ensure your LabKey web server is up and running. You’ll need admin access to your server to set up the scripting engine to use Rserve. This feature is still experimental, so you need to turn it on first.

  1. Sign in as an admin.
  2. Go to Admin > Site > Admin Console.
  3. Click Experimental Features.
  4. Under Rserve Reports, click Enable.

Now you need to add a scripting configuration:

  1. Go to Admin > Site > Admin Console.
  2. Click Views and Scripting.
  3. If there is already an ‘R Scripting Engine’ configuration, select and delete it.
  4. Add a new R Scripting Engine configuration. The table below shows properties and sample values for the running example.

Setting | Sample value | Description
machine name | RS-MAC | Machine name or IP address of the running Rserve instance
port | 6311 | Port that the Rserve instance is listening on
Rserve data volume root | /volumes/data | The name of an optional pipeline data share as referenced by the Rserve machine. This is where data files are read in from the pipeline root, for example, /volumes/data/
Rserve report volume root | /volumes/reports_temp | The name of the required reports share as referenced by the Rserve machine. This is where report output files get written, for example, /volumes/reports_temp
Rserve user | rserve_usr | Name of the user allowed to connect to the Rserve instance. This user is managed by the admin of the Rserve machine.
Rserve password | rserve_pwd | Password for the Rserve user

Note that LabKey Server does not currently support having both local and remote R scripting engines. If you have the Rserve Reports experimental feature turned on then all your reports will be run against Rserve.

Rserve Machine (RS-MAC): Start Your Rserve Instance

You need to start the server to accept incoming connections. You can start Rserve from your shell by typing:

rserve --no-restore --no-save --slave

Refer to the Rserve documentation for command line options. Options that begin with --RS- are handled by Rserve itself; all other command line options are passed on to R. In the above example, the parameters tell R not to restore any previously saved session, not to save the environment on exit, and to suppress prompts and startup text.

Running the debug version of Rserve (rserve.dbg) will help you troubleshoot any connection or script problems you have. At this point, you are ready to execute R views, run R scripts, etc. All scripts will run on your Rserve machine.

Client Code Changes

R Script Changes

For the most part, an R script that executes locally will execute just fine when running remotely. However, there are a few things to keep in mind:

1. There is no implicit printing or plotting. To guarantee that output is written to the graphics device, wrap plotting statements in print(). This is necessary because LabKey uses R’s source command, and commands evaluated through source do not print automatically. So, instead of xyplot(...), for example, use print(xyplot(...)).

2. If you are accessing data shares from within your R script, you cannot access them as if you were running on the LabKey machine. For parameter substitutions like ${imgout:graph.png}, LabKey replaces the parameter with a file reference relative to the /volumes/reports_temp directory you set up above. However, to reference data pipeline files, you need to do your own file mapping. To assist with this, the prolog of your script file will contain two new values:

  • labkey.pipeline.root: the root directory as accessed by the LabKey machine (LK-PC)
  • labkey.remote.pipeline.root: the root directory as accessed by the Rserve machine (RS-MAC)

You can use a helper function from the Rlabkey R package to create the correct remote path using these values and a fully-qualified file path. For example, if you passed in the full path to your file as a URL parameter to the reports web part, you could use the following line in your script:

rootPath <- labkey.makeRemotePath(labkey.pipeline.root, labkey.remote.pipeline.root, labkey.url.params$path);

3. If you are using R session sharing (more on that below), write your scripts to take advantage of any work done in previous requests by either the same or other R scripts. For example, you could record that libraries have already been loaded by setting a variable in the environment and then checking for the existence of that variable:

if (exists("flowGraph.session")) {...}

JavaScript Changes

If you want to take advantage of R session sharing, you’ll need to acquire a reportSessionId and pass it into the reports web part config. Very briefly, your JavaScript needs to create a session using the LABKEY.Report.createSession() API. On success, this function returns a data object containing a unique report session identifier that can be used in subsequent report web part invocations:

reportWebPartConfig.reportSessionId = data.reportSessionId;

All R reports run using this report session will share the same environment. When the client is done with the session then a call to LABKEY.Report.deleteSession(reportSessionId) will clean up the resources associated with the underlying R connection. Otherwise, report session ids are destroyed when the client’s session ends either by a globally configured Tomcat timeout option or when the client logs out of LabKey.
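
The session lifecycle described above might be wired up roughly as follows. This is a minimal sketch, not LabKey's own code: renderWithSharedSession, releaseSession, and the render callback are hypothetical names, the labkey argument stands for the global LABKEY object (passed in so the functions can be tried with a stub), and passing a config object to deleteSession is an assumption modeled on createSession.

```javascript
// Sketch only. `labkey` stands for the global LABKEY object on a LabKey page.
// `render` is a hypothetical callback that draws the reports web part.
function renderWithSharedSession(labkey, reportWebPartConfig, render) {
    labkey.Report.createSession({
        success: function (data) {
            // Every web part that receives this id shares one R environment.
            reportWebPartConfig.reportSessionId = data.reportSessionId;
            render(reportWebPartConfig);
        },
        failure: function (error) {
            console.error('Could not create a report session', error);
        }
    });
}

// Release the underlying R connection once the client is done with it.
function releaseSession(labkey, reportSessionId) {
    // Assumption: deleteSession takes a config object, like createSession.
    labkey.Report.deleteSession({ reportSessionId: reportSessionId });
}
```

On a LabKey page you would pass the global object, for example renderWithSharedSession(LABKEY, reportWebPartConfig, render), and later call releaseSession(LABKEY, reportWebPartConfig.reportSessionId) when the shared environment is no longer needed.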

Running Rserve and LabKey on the Same Machine

You can run Rserve on the same machine as LabKey. This puts more burden on your LabKey web server but in some cases it can provide very quick response times as data does not need to be moved between machines. Following the concrete example, let’s assume we want to run everything on RS-MAC. To do this:

  1. Install Rserve. You don’t need to enable remote connections in your Rserv.conf file.
  2. Enable the Rserve Reporting feature in LabKey, as before.
  3. You don’t need to set up any data shares, but you do need to ensure that the account you are running Rserve under has access to the data.
  4. You don’t need to translate any data pipeline paths in your R script.
  5. Your R scripting engine configuration values would look like the following:

Setting | Sample value | Description
machine name | localhost | Machine name or IP address of the running Rserve instance
port | 6311 | Port that the Rserve instance is listening on
Rserve data volume root | (none) | The name of an optional pipeline data share as referenced by the Rserve machine. This is where data files are read in from the pipeline root, for example, /volumes/data
Rserve report volume root | (none) | The name of the required reports share as referenced by the Rserve machine. This is where report output files get written, for example, /volumes/reports_temp
Rserve user | rserve_usr | Name of the user allowed to connect to the Rserve instance. This user is managed by the admin of the Rserve machine.
Rserve password | rserve_pwd | Password for the Rserve user

Setting Default R Engines, Local or Remote

You can register both remote and local R engines, using one or the other as desired. If two engines are registered, and a report job does not specify which to use, LabKey Server will try the local server by default. You can configure LabKey to try the remote server by default by providing a metadata XML file for the report in question. The XML file should follow this naming pattern: <R-Report-Name>.report.xml. The XML file for the script/report should include a <scriptEngine> element, as follows:

<?xml version="1.0" encoding="UTF-8"?>
<ReportDescriptor>
    <description>setup the R session</description>
    <reportType>
        <R>
            <scriptEngine remote="true"/>
            <functions>
                <function name="getStats"/>
            </functions>
        </R>
    </reportType>
</ReportDescriptor>

White Listing Functions

The <functions> list above is a "white list" of allowed functions, that is, an approved list of functions, which ensures that arbitrary R code cannot be invoked. If your function name is not found in the list, a ScriptException is thrown.

Direct Execution of R Functions - LABKEY.Report.executeFunction

You can use the LABKEY.Report.executeFunction API to invoke a function directly, without the need for a backing report to execute. This is convenient in many cases and can save time, especially if you need to call the function multiple times within a session, because the report does not need to be loaded every time you call the function.

executeFunction takes a config object with the following properties:

  • containerPath: The container in which to make the request, defaults to the current container.
  • scope: The scope to use when calling the callbacks (defaults to this).
  • functionName: The name of the function to execute.
  • reportSessionId: A valid report session returned by Report.createSession.
  • inputParams: An optional object with properties for input parameters.
  • success: A function to call if the operation is successful. The callback will receive an object with the following properties:
    • console: A string[] of information written by the script to the console.
    • errors: An array of errors returned by the script or LabKey.
    • outputParams: An array of length 1 that contains a single JSON output parameter value.
  • failure: A function to call if an error preventing script execution occurs. This function will receive one parameter which is the exception message.

Currently, executeFunction only supports a single JSON return value (although the JSON object can be arbitrarily complex).

Functions called via executeFunction must be white listed, to ensure that arbitrary R code cannot be executed. For details, see White Listing Functions above.
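
A call built from the config properties listed above could look roughly like this. It is a hedged sketch rather than a reference implementation: callGetStats and the onStats callback are hypothetical names, the labkey argument stands for the global LABKEY object so the function can be exercised with a stub, and the errors entries are assumed to be printable values.

```javascript
// Sketch only. `labkey` stands for the global LABKEY object on a LabKey page.
// `onStats(error, value)` is a hypothetical Node-style callback.
function callGetStats(labkey, reportSessionId, inputParams, onStats) {
    labkey.Report.executeFunction({
        functionName: 'getStats',          // must be in the <functions> white list
        reportSessionId: reportSessionId,  // obtained from LABKEY.Report.createSession
        inputParams: inputParams,          // optional object of input parameters
        success: function (result) {
            // Surface errors reported by the script or by LabKey first.
            if (result.errors && result.errors.length > 0) {
                onStats(new Error(result.errors.join('; ')), null);
                return;
            }
            // outputParams is described as an array of length 1.
            onStats(null, result.outputParams[0]);
        },
        failure: function (message) {
            // Called when an error prevents the script from running at all.
            onStats(new Error(String(message)), null);
        }
    });
}
```

Checking errors before reading outputParams keeps a script-level problem from being mistaken for an empty result; the failure callback covers the separate case where execution never started.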

Example executeFunction Workflow

For example, suppose you have a report (setup.R) that performs time-consuming work once (to set up libraries, load data, etc.). You also want to call the getStats function defined in this report multiple times over the course of your application, but it would be expensive to reload the report every time just to call this function. This is a good opportunity to use the Report.executeFunction API. A typical workflow might look like:

  • Make sure the Rserve experimental feature is enabled.
  • Call LABKEY.Report.createSession to create a report session.
  • Call LABKEY.Report.execute with this session and call your setup.R module report. This will load the report and run it, putting all its work in the session passed in.
  • Call the function "getStats" via LABKEY.Report.executeFunction using the same report session. This will execute the function in the session without needing to load any reports.

In this example, the setup.R module report must declare that the getStats function is callable by the executeFunction API. The report author does this by adding a setup.report.xml metadata file (the file name is the name of the R report, without the .R extension, followed by .report.xml) and listing the function in the <functions> element white list.
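
The workflow above can be sketched as a chain of three calls. This is illustrative only: runStatsWorkflow and the done callback are hypothetical names, the labkey argument stands for the global LABKEY object, and setupReportId is a placeholder for whatever identifier your server uses for the setup.R module report (passing it as reportId to LABKEY.Report.execute is an assumption, since the text above does not spell out that config).

```javascript
// Sketch only. `labkey` stands for the global LABKEY object on a LabKey page.
// `done(error, value)` is a hypothetical Node-style callback.
function runStatsWorkflow(labkey, setupReportId, done) {
    function fail(error) {
        done(error, null);
    }

    // 1. Create a report session so later calls share one R environment.
    labkey.Report.createSession({
        failure: fail,
        success: function (session) {
            var sessionId = session.reportSessionId;

            // 2. Run setup.R once inside that session (expensive, done once).
            labkey.Report.execute({
                reportId: setupReportId,       // assumption: placeholder identifier
                reportSessionId: sessionId,
                failure: fail,
                success: function () {
                    // 3. Call the white-listed function without reloading the report.
                    labkey.Report.executeFunction({
                        functionName: 'getStats',
                        reportSessionId: sessionId,
                        failure: fail,
                        success: function (result) {
                            done(null, { reportSessionId: sessionId, result: result });
                        }
                    });
                }
            });
        }
    });
}
```

Returning the reportSessionId alongside the result lets the caller repeat step 3 as often as needed and delete the session when finished.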

Troubleshooting

java.lang.RuntimeException: Could not connect to: rs-mac:6311

  • Can you ping “rs-mac”? That is, does the name resolve?
  • Is rs-mac the correct machine running Rserve? If not, you’ll need to change your R scripting engine configuration settings.
  • Is the Rserve instance running on rs-mac?
  • Is Rserve listening on the port 6311?

java.lang.RuntimeException: eval failed, request status: error code: 127 Error in file(filename, "r", encoding = encoding) : cannot open the connection

  • Have you set up the data share and mounted a volume on the Rserve machine? That is, are the reports_temp share and the /volumes/reports_temp volume set up correctly?
  • Did you connect to the shares with the correct account (RserveShare)?

java.lang.RuntimeException: could not login to Rserve with user: foo_bar

  • Verify your R script engine configuration settings have the correct user name and password

java.lang.RuntimeException: eval failed, request status: error code: 127 …

  • This usually means a script evaluation failed. The cause could be a syntax error in your R script (try running it in R to see if there is an issue with your script).
  • You can also run rserve.dbg on the server side (RS-MAC) to see better error information.

javax.script.ScriptException: The report session is invalid

  • The reportSessionId you passed in is no longer valid. Did you get the reportSessionId from a call to LABKEY.Report.createSession()?
  • The web session expired. This could happen because the session timeout elapsed (the default timeout is 30 minutes in Tomcat) or because you signed out. You’ll need to refresh the page hosting the reports web part and call LABKEY.Report.createSession() to get a new session.

This feature requires the “Rserve Reporting” experimental feature be turned on

An attempt to call the LABKEY.Report.createSession or LABKEY.Report.deleteSession API was made against a server that does not have Rserve Reporting enabled.
