Premium Feature: Available with all Premium Editions of LabKey Server. Learn more or contact LabKey.

You can set a polling schedule that checks the source database for new data and automatically runs the ETL process when new data is found. Specify either a time interval or a full cron expression. When choosing a schedule, consider the timing of other processes, such as automated backups, that could conflict with your running ETL.

Enable ETL Scheduling

To enable a scheduled ETL, include a schedule expression in the ETL definition that follows one of the patterns shown under ETL Schedule Options. Then use the user interface to add and enable it:

  • Select (Admin) > Folder > Management. Click the ETLs tab.
  • Click (Insert new row) and enter the ETL definition in the XML panel. Click Save.
  • Select (Admin) > Go To Module > Data Integration.
  • Check the Enabled box for each ETL that you want to enable to run on the defined schedule.
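For orientation, here is a minimal sketch of where the schedule element sits inside a complete ETL definition entered in the XML panel. The ETL name, the schema and query names (`sourceSchema`, `SourceQuery`, `study`, `TargetQuery`), and the incremental filter settings are hypothetical placeholders, and the element order is assumed to follow the pattern used in the ETL Tutorial; compare it against your own definitions before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
  <name>Hourly Patient Sync</name>
  <description>Copy new or modified rows to the target query every hour.</description>
  <transforms>
    <!-- Hypothetical source and destination; replace with your own schema and query names. -->
    <transform id="copyPatients" type="org.labkey.di.pipeline.TransformTask">
      <source schemaName="sourceSchema" queryName="SourceQuery"/>
      <destination schemaName="study" queryName="TargetQuery" targetOption="merge"/>
    </transform>
  </transforms>
  <!-- Assumed filter: treat rows whose "Modified" timestamp changed since the last run as new data. -->
  <incrementalFilter className="ModifiedSinceFilterStrategy" timestampColumnName="Modified"/>
  <!-- The schedule element: poll for new data once an hour. -->
  <schedule>
    <poll interval="1h"/>
  </schedule>
</etl>
```

Once saved, the ETL still runs on this schedule only after its Enabled box is checked on the Data Integration page.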

For a walkthrough of adding a new ETL, see the ETL Tutorial.

ETL Schedule Options

The schedule below checks every hour for new data:

<schedule><poll interval="1h" /></schedule>
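If an hourly check is too infrequent, a shorter interval can be set in the same way. This sketch assumes the interval attribute accepts a minute suffix (`m`) just as it accepts `h`; only the hour form appears in this topic, so verify the accepted units against the ETL XML schema reference.

```xml
<!-- check for new data every 15 minutes (assumes the "m" suffix is supported) -->
<schedule><poll interval="15m" /></schedule>
```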

The following examples use cron expressions to schedule the job:

<!-- run at 10:15 every day -->
<schedule><cron expression="0 15 10 ? * *"/></schedule>

<!-- run every hour on the hour every day, i.e. 9:00, 10:00, etc. -->
<schedule><cron expression="0 0 * ? * *"/></schedule>

<!-- run on Tuesdays and Thursdays at 3:30 pm -->
<schedule><cron expression="0 30 15 ? * TUE,THU *"/></schedule>

Cron expressions consist of six or seven space-separated fields: seconds, minutes, hours, day-of-month, month, day-of-week, and an optional year, in that order. The wildcard '*' matches every valid value. The character '?' means 'no specific value' and is used in either the day-of-month or the day-of-week field when the other field defines the days on which the job runs.
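The sketches below annotate each field in order. The first fixes a day of the month and therefore puts the '?' in day-of-week, the reverse of the examples above; the second uses a day-of-week range. Both schedules are illustrative, not values taken from this topic.

```xml
<!-- run at 2:00 am on the first day of every month
     seconds=0 minutes=0 hours=2 day-of-month=1 month=* day-of-week=? -->
<schedule><cron expression="0 0 2 1 * ?"/></schedule>

<!-- run at 6:00 am Monday through Friday
     seconds=0 minutes=0 hours=6 day-of-month=? month=* day-of-week=MON-FRI -->
<schedule><cron expression="0 0 6 ? * MON-FRI"/></schedule>
```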

It is good practice to include a plain text comment clarifying the behavior of the cron expression you use.

A builder for the Quartz cron format can help you compose expressions. One is available at https://www.freeformatter.com/cron-expression-generator-quartz.html.

A full description of the cron syntax is available on the Quartz site.

Note: If checking the source database involves a long-running query, or if many ETLs are scheduled for the same time, there can be a delay between the scheduled time and the time the ETL job is placed in the pipeline queue, depending on how long the database takes to respond. As a result, closely scheduled ETLs may run in an order that differs from the chronological order of their schedules.

Sequence ETLs

If ETLs must run in a particular order, we recommend putting them as multiple steps in another ETL to guarantee the order of execution. This can be done in either of the following ways:
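As a minimal sketch of the multi-step approach, the hypothetical ETL below lists two transform steps in a single definition, so one schedule governs the whole sequence. It assumes the steps run in the order they are listed, as the recommendation above implies; the ETL name, step ids, and all schema and query names are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
  <name>Ordered Load</name>
  <description>Load parent rows first, then child rows that depend on them.</description>
  <transforms>
    <!-- Step 1: hypothetical parent table, loaded first. -->
    <transform id="step1LoadParents" type="org.labkey.di.pipeline.TransformTask">
      <source schemaName="sourceSchema" queryName="Parents"/>
      <destination schemaName="targetSchema" queryName="Parents" targetOption="merge"/>
    </transform>
    <!-- Step 2: hypothetical child table, loaded after step 1. -->
    <transform id="step2LoadChildren" type="org.labkey.di.pipeline.TransformTask">
      <source schemaName="sourceSchema" queryName="Children"/>
      <destination schemaName="targetSchema" queryName="Children" targetOption="merge"/>
    </transform>
  </transforms>
  <!-- One schedule for the whole sequence: run at 1:00 am every day. -->
  <schedule>
    <cron expression="0 0 1 ? * *"/>
  </schedule>
</etl>
```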

Disable ETL Schedules

Over time, you may wish to disable an ETL that was previously scheduled to run regularly. For example, if data formats or requirements change, an ETL may become obsolete or raise spurious errors.

You can edit the ETL definition to remove its <schedule> element. Alternatively, you can disable scheduled runs of the ETL as follows:

  • Select (Admin) > Go To Module > Data Integration.
  • Uncheck the Enabled box for each ETL that you want to disable.

An ETL disabled using this checkbox may still be run manually if needed.
