External Schemas and Data Sources

Documentation
This document explains how to configure external data sources and load schemas from those data sources.

Topics:

Overview

An externally-defined schema can provide access to tables that are managed on any PostgreSQL, Microsoft SQL Server, SAS, Oracle, or MySQL database server in your institution. Site Administrators can make externally-defined schemas accessible within the LabKey interface, limiting access to authorized users and, if desired, a subset of tables within each schema. Once a schema is accessible, externally-defined tables become visible as tables within LabKey and LabKey applications can be built using these tables.

Furthermore, the external tables are editable within the LabKey interface, if the schema has been marked editable and the table has a primary key. XML meta data can also be added to specify formatting or lookups. Folder-level security is enforced for the display and editing of data contained in external schemas.

You can also pull data from an existing LabKey schema in a different folder by creating a "linked schema". You can choose to expose some or all of the tables from the original schema. The linked tables and queries may be filtered such that only a subset of the rows are shown. For details see Linked Schemas and Tables.

Note that you cannot create joins across data sources, including joins between external and internal schema on LabKey Server. As a work around, use an ETL to copy the data from the external data source(s) into the main internal data source. Once all of the data is in the main data source, you can create joins on the data.

Usage Scenarios

  • Display, analyze, and report on any data stored on any database server within your institution.
  • Build LabKey applications using external data without relocating the data.
  • Create custom queries that join data from standard LabKey tables with user-defined tables in the same database.
  • Publish SAS data sets to LabKey Server, allowing secure, dynamic access to data sets residing in a SAS repository.
Changes to the data are reflected automatically in both directions. Data rows that are added, deleted, or updated from either the LabKey Server interface or through external routes (for example, external tools, scripts, or processes) are automatically reflected in both places. Changes to the table schema are not immediately reflected, see below.)

Please Avoid: LabKey strongly recommends that you avoid defining the core LabKey Server schemas as external schemas. There should be no reason to use a LabKey schema as an external schema and doing so invites problems during upgrades and can be a source of security issues.

Data Source Configuration

Before you define an external schema in LabKey server, you must first configure a new data source resource in LabKey Server. Typically this is done by editing the labkey.xml configuration file, and in some cases, other steps. See the following topics for the preliminary configuration steps, depending on the type of external data source you are using:

Load an External Schema/Data Source

You can use schemas you have created in external tools (e.g., PGAdmin, SQL or SAS) within your LabKey Server. You will need to tell your LabKey Server about the external schema in order to access it.

To load an externally-defined schema:

  • Click on the folder/project where you would like to place the schema.
  • Select Admin > Developer Links > Schema Browser.
  • On the Query Schema Browser page, click Schema Administration.
  • Click New External Schema.
  • Fill out the following fields:
    • Schema Name – Required. Name of the schema within LabKey Server.
    • Data Source - JNDI name of the DataSource associated with this schema.
    • Database Schema Name – Required. Name of the physical schema within the underlying database. All external data sources identified in the labkey.xml file are listed as options in this drop-down.
    • Show System Schemas - Check the box to show system schemas (such as information_schema in PostgreSQL); by default they are filtered out of this dropdown.
    • Editable - Check to allow insert/update/delete operations on the external schema. This option currently only works on MSSQL and Postgres databases, and only for tables with a single primary key.
    • Index Schema Meta Data - Determines whether the schema should be indexed for full-text search.
    • Tables - Allows you to expose or hide selected tables within the schema. Checked tables are shown in the Query Schema Browser; unchecked tables are hidden.
    • Meta Data – You can use a specialized XML format to specify how columns are displayed in LabKey. For example you can specify data formats, column titles, and URL links. This field accepts instance documents of the TableInfo XML schema. In the following example, the AddressLine1 column on the Address table is displayed with the column title "Street Address".
<tables xmlns="http://labkey.org/data/xml">
<table tableName="Address" tableDbType="TABLE">
<columns>
<column columnName="AddressLine1">
<columnTitle>Street Address</columnTitle>
</column>
</columns>
</table>
</tables>

When you are finished, click the Create button at the bottom of the form. The screen shot below shows the Define External Schema form:

Edit a Previously Defined External Schema

The Schema Administration page displays all schemas that have been defined in the folder and allows you to view, edit, reload, or delete them.

Reload an External Schema

External schema meta data is not automatically reloaded. It is cached within LabKey Server for an hour, meaning changes, such as to the number of tables or columns, are not immediately reflected. If you make changes to external schema metadata, you may explicitly reload your external schema immediately using the reload link on the Schema Administration page.

Configure for Connection Validation

If there is a network failure or if a database server is restarted, the connection to the data source is broken and must be reestablished. Tomcat can be configured to test each connection and attempt reconnection by specifying a simple validation query. If a broken connection is found, Tomcat will attempt to create a new one. The validation query is specified in your DataSource resource in labkey.xml.

For a Microsoft SQL Server or PostgreSQL, data source, add this parameter:

validationQuery="SELECT 1"

For a SAS data source, add this parameter:

validationQuery="SELECT 1 FROM sashelp.table"

For a MySQL data source, add this parameter:

validationQuery="/* ping */"

Related Topics

Discussion

previousnext
 
expand all collapse all