Modules: SQL Script Conventions

2024-03-29

The conventions outlined in this topic are designed to help everyone write better SQL scripts. They 1) allow developers to review & test each other's scripts and 2) produce schema that can be changed easily in the future. The conventions have been developed while building, deploying, and changing production LabKey installations over the years; we've learned some lessons along the way.

Databases & Schemas

Most modules support both PostgreSQL and Microsoft SQL Server. LabKey Server uses a single primary database (typically named "labkey") divided into 100+ "schemas" that provide separate namespaces, usually one per module.

Capitalization

SQL keywords should be in all caps. This includes SQL commands (SELECT, CREATE TABLE, INSERT), type names (INT, VARCHAR), and modifiers (DEFAULT, NOT NULL).

Identifiers such as table, view, and column names are always initial cap camel case. For example, ProtInfoSources, IonPercent, ZScore, and RunId. Note that we append 'Id' (not 'ID') to identity column names.

We use a single underscore to separate individual identifiers in compound names. For example, a foreign key constraint might be named 'FK_BioSource_Material'. More on this below.

Constraints & Indexes

Do not use the PRIMARY KEY modifier on a column definition to define a primary key. Do not use the FOREIGN KEY modifier on a column definition to define a foreign key. Doing either will cause the database to create a random name that will make it very difficult to drop or change the index in the future. Instead, explicitly declare all primary and foreign keys as table constraints after defining all the columns. The SQL Script Manager will enforce this convention.

  • Primary Keys should be named 'PK_<TableName>'
  • Foreign Keys should be named 'FK_<TableName>_<RefTableName>'. If this is ambiguous (multiple foreign keys between the same two tables), append the column name as well
  • Unique Constraints should be named 'UQ_<TableName>_<ColumnName>'
  • Normal Indexes should be named 'IX_<TableName>_<ColumnName>'
  • Defaults are also implemented as constraints in some databases, and should be named 'DF_<TableName>_<ColumnName>'
  • Almost all columns that have a foreign key, whether explicitly defined as a constraint in the table definition or as a soft foreign key, should have an associated index for performance reasons.

Keep Your SQL as Database-Independent as Possible

You may prefer using PostgreSQL over SQL Server (or vice versa), but don't forget about the other database. Write your scripts to work with both databases and you'll save yourself many headaches. Test your scripts on both databases.

  • NVARCHAR is preferred for almost all text-based columns on SQL Server since it allows extended characters.
  • SQL Server did not support the DROP IF EXISTS syntax until 2016. Since LabKey continues to support SQL Server 2012, modules cannot use this syntax in their upgrade scripts. Postgres has long supported this syntax, so it is legal to use there. For SQL Server, use the core.fn_dropifexists stored procedure instead:
EXEC core.fn_dropifexists 'TABLE_NAME', 'SCHEMA_NAME', 'TABLE', NULL;

Statement Endings

Every statement should end with a semicolon, on both PostgreSQL and SQL Server. In older versions of SQL Server, "GO" statements needed to be interjected frequently within SQL scripts. They are rarely needed now, except in a few isolated cases:

  • After creating a new user-defined type (sp_addtype), which is rare
  • Before and after a stored procedure definition; SQL Server requires each stored procedure definition to be executed in its own block
  • After a DROP and re-CREATE
  • After an ALTER statement, if the altered object is referenced later in the scripts

Creating Scripts from DB Management Tools

It is often convenient to create tables or data via visual tools first, and then use a tool that generate CREATE, INSERT, etc based on the end result. This is fine; however be aware that the script will have some artifacts which should be removed before committing your upgrade script. Some specific recommendations related to SQL Server Management Studio:

  • A "USE database name" statement at the top. The database name in other deployments will likely differ from your database name, which will break the script.
  • SET ANSI_NULLS ON and SET QUOTED_IDENTIFIER ON. These scripts are run on pooled connections, and these settings will remain for the next operation which gets the connection the script ran on.
  • General style recommendations for consistency and to make scripts less verbose:
    • Square brackets around many identifiers (schema, table, and column names). These should only be used if absolutely necessary.
    • Foreign keys constraints get created WITH CHECK. This should be removed.
    • Foreign keys will then have a separate statement to CHECK the constraint. This statement should also be removed.
    • Indexes are created with an explicit setting for every option, even though most/all of them will be the default setting. Only necessary options should be included.

Related Topics