LabKey includes a database schema management system that module writers use to automatically install and upgrade schemas on the servers that deploy their modules, providing convenience and reliability to the server admins. Module writers should author their SQL scripts carefully, test them on multiple databases, and follow some simple rules to ensure compatibility with the script runner. Unlike most code bugs, a SQL script bug has the potential to destroy data and permanently take down a server. We suggest reading this page completely before attempting to write module SQL scripts. If you have any questions, please contact the LabKey team.
If your module is checked in to LabKey's subversion repository, or your module has the potential to be installed on additional servers (including by other developers), you should be especially conscious of updates to SQL scripts. Once a script has been checked in to LabKey's repository or run by another instance of LabKey, it is a good guideline to consider it immutable. If a table needs to be altered, no matter how trivial the change, a new upgrade script should normally be used. This is because if another server installs or upgrades using this script, it will not be re-run. If the script is then edited, this can cause the other machine to have an incomplete schema, which can easily result in errors downstream or on subsequent updates. For the case of scripts checked in to LabKey's subversion repository, be aware that other developers and LabKey's testing servers routinely run all checked-in scripts and it is very easy for problems to arise from inappropriately changed scripts. See the Hints and Advanced Topics
section below for ways to make this process easier.
Note that module-based SQL scripts for assay types are not supported.
SQL Script Manager
You must name your SQL scripts correctly and update your module versions appropriately, otherwise your scripts might not run at all, scripts might get skipped, or scripts might run in the wrong order. The LabKey SQL Script Manager gets called when a new version of a module gets installed. Specifically, a module gets updated at startup time if (and only if) the version number listed for the module in the database is less than the current version in the code. The module version in the database is stored in core.Modules; the module version in code is returned by the getVersion() method in each Module class (Java module) or listed in version.properties (file-based module).Rule #1: The module version must be bumped to get any scripts to run.
When a module is upgraded, the SQL Script Manager automatically runs the appropriate scripts to upgrade to the new schema version. It determines which scripts to run based on the version information encoded in the script name. The scripts are named using the following convention: <dBschemaName>-<fromVersion #.00>-<toVersion #.00>.sqlRule #2: Use the correct format when naming your scripts; anything else will get ignored.
Use dashes, not underscores. Use two (or three, if required) decimal places for version numbers (0.61, 1.00, 12.10). We support three decimal places for very active modules, those that need more than 10 incremental scripts per point release. But most modules should use two decimal places.
- foo-0.00-1.00.sql: Upgrades foo schema from version 0.00 to 1.00
- foo-1.00-1.10.sql: Upgrades foo schema from version 1.00 to 1.10
- foo-1.10-1.20.sql: Upgrades foo schema from version 1.10 to 1.20
- foo-0.00-1.20.sql: Upgrades foo schema from version 0.00 to 1.20
(Note that the schema produced by running the first three scripts above should be the same as the schema produced by running the fourth script alone.)
The script directories can have many incremental & full scripts to address a variety of upgrade scenarios. The SQL Script Manager follows a specific algorithm when determining which script(s) to run for an upgrade. This is what it does:
- Determine installed module version number ("old") and new module version number ("new").
- Find all scripts in the directory that start at or above "old" and end at or below "new". Eliminate any scripts that have already been run on this database (see the core.SqlScripts table).
- Of these scripts, find the script(s) with the lowest "from" version. If there's just a single script with this "from" version, pick it. If there are more than one, pick the script with the highest "to" version.
- Run that script. Now the schema has been updated to the "to" version indicated in the script just run.
- Determine if more scripts need to be run. To do this, treat the "to" version of the script just run as the currently installed version.
- Repeat all the steps above (create list of scripts in the new range, eliminate previously run scripts, choose the script with the lowest starting point having the greatest range, and run it) until there are no more scripts left.
A few scenarios based on the "foo" example above may help clarify the process:
Rule #3: Name your script as starting at the current module version in code.
|Installed Module Version||New Module Version||Script(s) Run|
|0.00 (not installed)||1.10||foo-0.00-1.00.sql, foo-1.00-1.10.sql|
|0.00 (not installed)||1.20||foo-0.00-1.20.sql|
|1.11||1.20||None of these scripts|
This rule is important, but easily forgotten. If the most recent script in a directory is "foo-0.90-1.00.sql" and the new module version will be 2.00, it may be tempting to name the new script "foo-1.00-2.00.sql". This is almost certainly a mistake. What matters is the module version in code, not the ending version of the last script. The module number in code gets bumped for a variety of reasons (e.g., for a major release, for other schemas, or to force after-schema-update code to run), so a script that starts where the last script left off will probably never run. You must look at the current module version in code instead. There will be "gaps" in the progression; this is expected and normal.
If you're creating a new incremental script, here is a (nearly) foolproof set of steps that will produce a correct script name for module "Foo" that uses schema "foo":
- Finalize and test your script contents.
- Do an svn update to get current on all files. This ensures that no one else has bumped the version or checked in an incremental script with the same name.
- Find the current version number returned by the FooModule getVersion() method. Let's say it's 1.02.
- Name your script "foo-1.02-1.03.sql". (Incrementing by 0.01 gives you room to get multiple schema changes propagated and tested during the development period between major releases.)
- Bump the version number returned by FooModule.getVersion() to 1.03.
- Build, test, and commit your changes.
Everyone who syncs to your repository (e.g., all the developers on your team, your continuous integration server) will update, build, start their servers, and automatically run your upgrade script, resulting in Foo module version 1.03 successfully installed (unless you make a mistake… in which case you get to fix their database). After your commit there's no going back; you can't change scripts once they've been run. Instead, you must check in a new incremental that produces the appropriate changes (or rolls back your changes, etc.).Rule #4: After a release, the next incremental script is still a point version of the release
Just before releasing a new version of LabKey Server, the LabKey team "rolls up" each module's incremental scripts into a single script for that release (e.g., foo-1.00-1.01.sql, foo-1.01-1.02.sql, and foo-1.02-1.03.sql get concatenated into foo-1.00-1.10.sql). This cleans things up a bit and reduces the number of script files, but it isn't required at all. The critical step is to get the incremental script right; you only get one chance for that.
The LabKey team will also bump all module versions to match the release. foo would now be version 1.10. The next script, intended for the 1.20 release, will be foo-1.10-foo-1.11.sql. Never bump the module version past the in-progress LabKey release.
(e.g., if you get up to foo-1.18-1.19.sql before the 1.20 release, and still need another script, it would be foo-1.19-1.191.sql)
If you're testing an extensive schema change you may want to check in a script but not have it run on other developers' machines yet. This is simple; check in the script but don't bump the version number in code. When you're done testing, bump the version and everyone will upgrade.
The above guidelines eliminate most, but not all, problems with script naming. In particular, if multiple developers are working on the same module they must coordinate with each other to ensure scripts don't conflict with each other.
Remember that all scripts adhere to the module version number progression. If a single module manages multiple database schemas you must be extra careful about rule #3 and plan to see many gaps between each schema's script files.
Hints and Advanced Topics
- Modules are upgraded in dependency order, which allows schemas to safely depend on each other.
- Modules can (optionally) include two special scripts for each schema: <schema>-create.sql and <schema>-drop.sql. The drop script is run before all module upgrades and the create script is run after that schema's scripts are run. The primary purpose is to create and drop SQL views in the schema. The special scripts are needed because some databases don't allow modifying tables that are used in views. So LabKey drops all views, modifies the schema, and re-creates all views on every upgrade.
- Java upgrade code. Some schema upgrades require code. One option is to implement and register a class in your module that implements UpgradeCode and invoke its methods from inside a script via the core.executeJavaUpgradeCode stored procedure. This works well for self-contained code that assumes a particular schema structure; the code is run once at exactly the right point in your upgrade sequence.
- After schema update. Another option for running Java code is to call it from the Module afterUpdate() method. This can be useful if the upgrade code needs to call library methods that change based on the current schema. Be very careful here; the schema could be in a completely unknown state (if the server hasn't upgraded in a while then your code could execute after two years of future upgrade scripts have run).
- bootstrap. On a developer machine: shut down your server, run "gradlew bootstrap", and restart your server to initiate a full bootstrap on your currently selected database server. This is a great way to test SQL scripts on a clean install. Use "gradlew pickPg" and "gradlew pickMSSQL" to test against the other database server.
- The Admin Console provides other helpful tools. The "Sql Scripts" link shows all scripts that have run and those that have not run on the current server. From there, you can choose to "Consolidate Scripts" (e.g., rolling up incremental scripts into version upgrade scripts or creating bootstrap scripts, <schema>-0.00-#.00.sql). While viewing a script you have the option to "Reorder" the script, which attempts to parse and reorder all the statements to group all modifications to each table together. This can help streamline a script (making redundant or unnecessary statements more obvious), but is recommended only for advanced users.
- In addition to these scripts, you will need to create a schema XML file. This file is located in the /scripts folder of your module. There is one XML file per schema. This file can be auto-generated for an existing schema. To get an updated XML file for an existing schema, go to the Admin Console then pick 'Check Database'. There will be a menu to choose the schema and download the XML. If you would like to download an XML file for a schema not yet visible to labkey, you can use a URL along these lines directly: http://localhost:8080/labkey/admin/getSchemaXmlDoc.view?dbSchema=<yourSchemaName>. Simply replace the domain name & port with the correct values for your server. Also put the name of your schema after 'dbSchema='. Note: Both the schema XML file name and 'dbSchema=' value are case-sensitive. They must match the database schema name explicitly.
- LabKey offers automated tests that will compare the contents of your schema XML file with the actual tables present in the DB. To run this test, visit a URL similar to: http://localhost:8080/labkey/junit/begin.view?, but substitute the correct domain name and port. Depending on your server configuration, you may also need to omit "/labkey" if labkey is run as the root webapp. This page should give a list of all Junit test. Run the test called "org.labkey.core.admin.test.SchemaXMLTestCase".
- Schema delete. When developing a new module, schemas can change rapidly. During initial development, it may be useful to completely uninstall / reinstall a module in order to rebuild the schema from scratch, rather than make changes via a large number of incremental scripts. Uninstalling a module requires several steps: drop the schema, delete the entry in the core.Modules table, delete all the associated rows in the core.SqlScripts table. The "Module Details" page (from the Admin Console) provides a quick way to uninstall a module; when your server is restarted, the module will be reinstalled and the latest scripts run. Use extreme caution… deleting a schema or module should only be done on development machines. Also note that while this is useful for development, see warnings above about editing scripts once checked into subversion and/or otherwise made available to other instances of LabKey.
The conventions below are designed to help everyone write better scripts. They 1) allow developers to review & test each other's scripts and 2) produce schema that can be changed easily in the future. The conventions have been developed while building, deploying, and changing production LabKey installations over the last eight years; we've learned some lessons along the way.
Databases & Schemas
Most modules support both PostgreSQL and Microsoft SQL Server. LabKey Server uses a single primary database (typically named "labkey") divided into 20 - 30 "schemas" that provide separate namespaces, usually one per module. Note that, in the past, SQL Server used the term "owner" instead of "schema," but that term is being retired.
SQL keywords should be in all caps. This includes SQL commands (SELECT, CREATE TABLE, INSERT), type names (INT, VARCHAR), and modifiers (DEFAULT, NOT NULL).
Identifiers such as table, view, and column names are always initial cap camel case. For example, ProtInfoSources, IonPercent, ZScore, and RunId. Note that we append 'Id' (not 'ID') to identity column names.
We use a single underscore to separate individual identifiers in compound names. For example, a foreign key constraint might be named 'FK_BioSource_Material'. More on this below.
Constraints & IndexesDo not use the PRIMARY KEY modifier on a column definition to define a primary key. Do not use the FOREIGN KEY modifier on a column definition to define a foreign key.
Doing either will cause the database to create a random name that will make it very difficult to drop or change the index in the future. Instead, explicitly declare all primary and foreign keys as table constraints after defining all the columns. The SQL Script Manager will enforce this convention.
- Primary Keys should be named 'PK_<TableName>'
- Foreign Keys should be named 'FK_<TableName>_<RefTableName>'. If this is ambiguous (multiple foreign keys between the same two tables), append the column name as well
- Unique Constraints should be named 'UQ_<TableName>_<ColumnName>'
- Normal Indexes should be named 'IX_<TableName>_<ColumnName>'
- Defaults are also implemented as constraints in some databases, and should be named 'DF_<TableName>_<ColumnName>'
Keep Your SQL as Database-Independent as Possible
You may prefer using PostgreSQL over SQL Server (or vice versa), but don't forget about the other database… write your scripts to work with both databases and you'll save yourself many headaches. Test your scripts on both databases.
Every statement should end with a semicolon, on both PostgreSQL and SQL Server. In older versions of SQL Server, "GO" statements needed to be interjected frequently within SQL scripts. They are rarely needed now, except in a few isolated cases:
- After creating a new user-defined type (sp_addtype), which is rare
- Before and after a stored procedure definition; SQL Server requires each stored procedure definition to be executed in its own block
- After a DROP and re-CREATE
- After an ALTER statement, if the altered object is referenced later in the scripts
Scripting from SQL Server
It is often convenient to create SQL Server objects or data via visual tools first, and then have SQL Server generate the correct CREATE, INSERT, etc scripts. This is fine; however be aware that the script will have some artifacts which should be removed before committing your upgrade script:
- A "USE database name" statement at the top. The database name in other environments is entirely arbitrary.
- SET ANSI_NULLS ON and SET QUOTED_IDENTIFIER ON. These scripts are run on pooled connections, and these settings will remain for the next operation which gets the connection the script ran on.
- Square brackets around many identifiers (schema, table, and column names). These should only be used if absolutely necessary.
- Foreign keys constraints get created WITH CHECK. This should be removed.
- Foreign keys will then have a separate statement to CHECK the constraint. This statement should also be removed.
- Indexes are created with an explicit setting for every option, even though most/all of them will be the default setting. Only necessary options should be included.