External Schemas and Data Sources

An externally-defined schema can provide access to tables that are managed on any PostgreSQL database, or with Premium Editions of LabKey Server, any Microsoft SQL Server, Oracle, MySQL, Amazon Redshift, or SAS/SHARE database server available at your institution.

Site Administrators can make externally-defined schemas accessible within the LabKey interface, limiting access to authorized users and, if desired, a subset of tables within each schema. Once a schema is accessible, externally-defined tables become visible as tables within LabKey and LabKey applications can be built using these tables.

This document explains how to configure external data sources and load schemas from those data sources.


Externally defined schemas give administrators access to edit external tables within the LabKey interface, if the schema has been marked editable and the table has a primary key. XML meta data can also be added to specify formatting or lookups. Folder-level security is enforced for the display and editing of data contained in external schemas.

You can also pull data from an existing LabKey schema in a different folder by creating a "linked schema". You can choose to expose some or all of the tables from the original schema. The linked tables and queries may be filtered such that only a subset of the rows are shown. For details see Linked Schemas and Tables.

Note: you cannot create joins across data sources, including joins between external and internal schema on LabKey Server. As a work around, use an ETL to copy the data from the external data source(s) into the main internal data source. Once all of the data is in the main data source, you can create joins on the data.

Usage Scenarios

  • Display, analyze, and report on any data stored on any database server within your institution.
  • Build LabKey applications using external data without relocating the data.
  • Create custom queries that join data from standard LabKey tables with user-defined tables in the same database.
  • Publish SAS data sets to LabKey Server, allowing secure, dynamic access to data sets residing in a SAS repository.
Changes to the data are reflected automatically in both directions. Data rows that are added, deleted, or updated from either the LabKey Server interface or through external routes (for example, external tools, scripts, or processes) are automatically reflected in both places. Changes to the table schema are not immediately reflected, see below.)

Avoid: LabKey strongly recommends that you avoid defining the core LabKey Server schemas as external schemas. There should be no reason to use a LabKey schema as an external schema and doing so invites problems during upgrades and can be a source of security issues.

Data Source Configuration

Before you define an external schema in LabKey server, you must first configure a new data source resource in LabKey Server. Typically this is done by editing the labkey.xml configuration file, and in some cases, other steps. See the following topics for the preliminary configuration steps, depending on the type of external data source you are using:

Load an External Schema/Data Source

When an externally defined schema (i.e., a set of data tables) is added to LabKey Server, the tables are surfaced in the Query Schema Browser.

To load an externally-defined schema into LabKey Server:

  • Click on the folder/project where you would like to place the schema.
  • Select (Admin) > Developer Links > Schema Browser.
  • Click Schema Administration.
  • Click New External Schema.

Fill out the following fields:

  • Schema Name – Required.
    • Name of the schema as you want to see it within LabKey Server.
    • When you select the Database Schema Name two items below this, this field will default to match that name, but you can specify an alternate name here if needed.
  • Data Source - JNDI name of the DataSource associated with this schema.
    • This name is typically included as the first parameter of the <Resource> element with a "DataSource" suffix that is not shown in the dropdown (i.e. "someExternalDataSource" will appear as "someExternal").
    • Data source names must be unique.
    • All external data sources identified in the labkey.xml/ROOT.xml file are listed as options in this drop-down.
  • Database Schema Name: Required. Name of the physical schema within the external database.
    • This dropdown will offer all the schemas accessible to the user account under which the external schema was defined. If you don't see a schema you expect here, check the external database directly using the same credentials as the external data source connection uses.
    • Show System Schemas: Check the box to show system schemas which are filtered out of this dropdown.
  • Editable: Check to allow insert/update/delete operations on the external schema. This option currently only works on MSSQL and Postgres databases, and only for tables with a single primary key.
  • Index Schema Meta Data: Determines whether the schema should be indexed for full-text search.
  • Fast Cache Refresh: Whether or not this external schema is set to refresh its cache often. Intended for use during development.
  • Tables: Allows you to expose or hide selected tables within the schema.
    • Click the to expand the table list.
    • Checked tables are shown in the Query Schema Browser; unchecked tables are hidden.
    • If you don't see a table you expect here, check the external database directly using the same credentials as the external data source connection uses.
  • Meta Data: You can use a specialized XML format to specify how columns are displayed in LabKey.
    • For example you can specify data formats, column titles, and URL links. This field accepts instance documents of the TableInfo XML Schema XML schema.
When you are finished, click the Create button at the bottom of the form.

Manage External Schemas

  • Select (Admin) > Developer Links > Schema Browser.
  • Click Schema Administration.
The Schema Administration page displays the external and linked schemas that have been defined in the current folder. Click links for each to:
  • View Schema
  • Edit
  • Reload
  • Delete

Reload an External Schema

External schema metadata is not automatically reloaded. It is cached within LabKey Server for an hour, meaning changes, such as to the number of tables or columns, are not immediately reflected. If you make changes to external schema metadata, you may explicitly reload your external schema immediately using the reload link on the Schema Administration page.

Locate External Schemas Site-Wide

Find a comprehensive list of all external datasources and external schemas on the admin console:

  • Select (Admin) > Site > Admin Console.
  • Under Diagnostics, click Data Sources.
You'll see links to all containers (beginning with /) followed by all the external schemas defined therein. Click to directly access each external schema definition.

Test External Data Sources

On the Data Source Administration page, for each data source, you'll see a Test button. Click to open the test preparation page.

This test will enumerate all schemas and tables within the data source, attempting to retrieve the row count and 100 rows of data from each table, skipping the first 10 rows. This operation could take several minutes, depending on the number of schemas and tables. The server log will show progress as each table is tested.

This is not an exhaustive test but it does exercise paging, sorting, data type conversion, etc. on many different tables. Since some schemas and tables can be problematic (e.g., performance issues or unsupported data types), you can exclude testing of certain schemas and tables via name or prefix before executing the test by adding names to the corresponding text boxes on the test preparation page. Appending * to the end of a name acts as a prefix filter. These exclusions will be saved associated with the data source name.

Configure for Connection Validation

If there is a network failure or if a database server is restarted, the connection to the data source is broken and must be reestablished. Tomcat can be configured to test each connection and attempt reconnection by specifying a simple validation query. If a broken connection is found, Tomcat will attempt to create a new one. The validation query is specified in your DataSource resource in labkey.xml.

For a PostgreSQL or Microsoft SQL Server datasource, add this parameter:

validationQuery="SELECT 1"

For a SAS data source, add this parameter:

validationQuery="SELECT 1 FROM sashelp.table"

For a MySQL data source, add this parameter:

validationQuery="/* ping */"

For an Oracle data source, add this parameter:

validationQuery="SELECT 1 FROM dual"

Retry Failed Data Source Connections

Connections to external Data Sources are attempted during server startup. If an error occurs, the Data Source will not be available and will not appear on the dropdown list. When you view external schemas, you'll see the reason the connection isn't available, such as "no data source configuration exists" or "connection to [someExternalDataSource] failed".

To retry failed connections without restarting the server:

  • Select (Admin) > Developer Links > Schema Browser.
  • Click Schema Administration.
  • Click Retry Failed Data Source Connections.

Troubleshoot External Schemas

If you encounter problems defining and using external schema, start by checking the following:

  • Does the <Resource> tag for the external schema in your labkey.xml conform to the database-specific templates we provide. Over time both expected parameters and URL formats will change.
    • Check that the version of the jdbc driver for your external database is of a compatible version, and compare <Resource> tag parameters to driver expectations. Learn more in the database-specific pages
    • Check both the current documentation and the {archive for the specific version of LabKey you are using|/Documentation/Archive/project-begin.view?}.
    • Compare the resource tag carefully to the template provided, including the syntax of a connection validation query if present.
  • Is the name in the <Resource> tag unique, and does it exactly match what you are entering when you define the external schema?
  • Is your external database of a supported version? The database specific pages provide additional information.
  • If you don't see the expected tables when defining a new external schema:
    • Try directly accessing the external database with the credentials provided in the <Resource> tag to confirm access to the "Database Schema Name" you specified to see if you see the expected tables.
    • Check whether a schema of that name already exists on your server. Check the box to show hidden schemas to confirm.
    • If you can directly access the LabKey database, check whether there is already an entry in the query.externalschema table. This table stores these definitions but is not surfaced within the LabKey schema browser.
    • If there is an unexpected duplicate external schema, you will not be able to add a new one. One possible cause could be a "broken view" on that schema. Learn more in this topic.
  • Check the raw schema metadata for the external schema by going to the container where it is defined and accessing the URL of this pattern (substitute your SCHEMA_NAME):
  • Try enabling additional logging. Follow the instructions in this topic to temporarily change the logging level from "INFO" to "DEBUG", refresh the cache, and then retry the definition of the external schema to obtain debugging information.
    • Loggers
    • Loggers of interest:
      • org.labkey.api.data.SchemaTableInfoCache - for loading of tables from an external schema
      • org.labkey.api.data.ConnectionWrapper - for JDBC metadata, including attempts to query external databases for tables
      • org.labkey.audit.event.ContainerAuditEvent - at the INFO level, will report creation of external schema

Related Topics


Was this content helpful?

Log in or register an account to provide feedback

expand all collapse all