Study reload can be used to refresh study data from another source into LabKey Server, often on a nightly basis, to enable analysis and integration.
Study reload can be useful for refreshing studies whose data is managed externally. For example, if the database of record is SAS, a SAS script could automatically generate TSVs nightly to be reloaded into LabKey Server. This simplifies the process of using LabKey tools for administration, analysis, reporting, and data integration without forcing migration of existing storage or data collection frameworks.
Caution: Reloading a study will replace existing data with the data contained in the imported archive.
To manually reload a study, you need a study archive: a set of .xml files and directories. To generate a study archive, see this topic: Export a Study. Move the unzipped archive artifacts to the pipeline root directory. If you use the option "Pipeline root export directory, as individual files" in the export UI, the files will be in this location.
- In your study, click the Manage tab.
- Click Reload Study at the bottom of the page.
- Click Use Pipeline.
- Locate and select the file "study.xml".
- Click Import Data and confirm Reload Study is selected.
- Click Import.
- Select import options if desired.
- Click Start Import.
Inferring New Datasets and Lists
Upon reloading, the server will create new datasets, and infer column names and data types for both datasets and lists, according to the following rules.
- Datasets and lists can be provided as Excel files or as TSV files.
- The target study must already exist and have the same 'timepoint' style, either Date-based or Visit-based, as the incoming study archive.
- If lists.xml or dataset_metadata.xml are present in the incoming study archive, the server will use the column definitions therein to add columns if they are not already present.
- If lists.xml or dataset_metadata.xml are not present, the server will also infer new column names and column types based on the incoming data files.
- The datasets/lists must be located in the archive’s datasets and lists subdirectories. The relative path to these directories is set in the study.xml file. For example, the following study.xml file locates the datasets directory in /myDatasets and the lists directory in /myLists.
<study xmlns="http://labkey.org/study/xml" label="Study Reload" timepointType="DATE" subjectNounSingular="Mouse" subjectNounPlural="Mice" subjectColumnName="MouseId" startDate="2008-01-01-08:00" securityType="ADVANCED_WRITE">
    <cohorts type="AUTOMATIC" mode="SIMPLE" datasetId="5008" datasetProperty="Group"/>
    <datasets dir="myDatasets" />
    <lists dir="myLists" />
</study>
- When inferring new columns, column names are based on the first row of the file, and the column types are based on the first 5 rows of data.
- LabKey Server decides on the target dataset or list based on the name of the incoming file. For example, if a file named "DatasetA.xls" is present, the server will update an existing dataset “DatasetA”, or create a new dataset called "DatasetA". Using the same naming rules, the server will update (or add) Lists.
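The naming and inference rules above can be illustrated with a rough sketch. This is not LabKey Server's actual inference code: the function names, the type labels ("integer", "double", "string"), and the casting logic are all assumptions made for illustration. It only shows the general idea of taking the target name from the file name, the column names from the first row, and the column types from the first 5 data rows.

```python
import csv
import io
from pathlib import Path


def target_name(file_name: str) -> str:
    """Derive the dataset or list name from the file name, e.g. DatasetA.xls -> DatasetA."""
    return Path(file_name).stem


def guess_type(values):
    """Guess a column type from sample values (hypothetical labels, not LabKey's)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for caster, label in ((int, "integer"), (float, "double")):
        try:
            for v in non_empty:
                caster(v)
            return label
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "string"


def infer_columns(tsv_text: str, sample_rows: int = 5):
    """Map each header name to a type guessed from the first `sample_rows` data rows."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, sample = rows[0], rows[1:1 + sample_rows]
    return {
        name: guess_type([row[i] for row in sample if i < len(row)])
        for i, name in enumerate(header)
    }
```

Because only the first few rows are sampled, a column whose early values look numeric but whose later values do not could be typed incorrectly; supplying lists.xml or dataset_metadata.xml avoids relying on inference.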
study.xml - A simplified metadata description of the study.
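As an illustration of how study.xml declares the datasets and lists directories, the following sketch reads the `dir` attributes from a study.xml document using Python's standard library. The helper name `archive_dirs` and the embedded sample are assumptions for illustration; the sample mirrors the example study.xml shown earlier in this topic.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example study.xml in this topic.
NS = {"s": "http://labkey.org/study/xml"}

SAMPLE_STUDY_XML = """\
<study xmlns="http://labkey.org/study/xml" label="Study Reload" timepointType="DATE">
    <datasets dir="myDatasets" />
    <lists dir="myLists" />
</study>
"""


def archive_dirs(study_xml_text: str) -> dict:
    """Return the relative datasets and lists directories named in study.xml."""
    root = ET.fromstring(study_xml_text)
    result = {}
    for tag in ("datasets", "lists"):
        element = root.find(f"s:{tag}", NS)
        # None means the element is absent from this study.xml.
        result[tag] = element.get("dir") if element is not None else None
    return result
```

A script that generates the archive could use a check like this to confirm that its data files are written to the directories study.xml actually points at.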
API Based Reload
This API-based reload option is provided for backwards compatibility reasons. Using the file watcher mechanism for reloading a study is preferred and provides many more options.
The process for API-based reload typically involves an automated script updating the timestamp on a "studyload.txt" file placed in the pipeline root folder, then prompting the server to reload the study archive from this folder. LabKey Server ignores the contents of studyload.txt, looking only at the file's modification timestamp.
The script follows steps similar to these:
- Read dataset, specimen, and other important study data from a master database and/or specimen LIMS system.
- Write the data to the file system in the LabKey study archive format.
- Touch the studyload.txt file to update the timestamp to the current date/time.
- Signal to the LabKey Server that the archive is ready to load using a POST to checkForReload.api. Details are below.
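The "touch" step in the list above can be sketched in Python as follows. The function name and the way the pipeline root path is passed in are assumptions; the only behavior taken from this topic is that the file is named studyload.txt, lives in the pipeline root, and matters only for its modification timestamp.

```python
from pathlib import Path


def touch_studyload(pipeline_root: str) -> Path:
    """Create studyload.txt if needed and set its modification time to now."""
    marker = Path(pipeline_root) / "studyload.txt"
    # touch() creates the file when missing; with exist_ok=True it updates
    # the modification time of an existing file to the current time.
    marker.touch(exist_ok=True)
    return marker
```

Running this only after the archive files have been fully written helps ensure the timestamp reflects a complete archive.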
POST to checkForReload.api
This last step in the process causes the server to validate the request and, if successful, queue the reload pipeline job. It must be run as a logged-in user with administrator permissions, typically by use of an API Key.
The script signals the server by issuing a POST (including CSRF token) to checkForReload.api in the folder that should receive the reload. There are two optional parameters to this action:
- queryValidation: If true, it instructs the server to perform a query validation step after the study is reloaded. This process flags query errors that might not otherwise be noticed, but can be time consuming.
- failOnUndefinedVisits: By default, undefined visits in the upload will trigger creation of new visits in the study schedule. Set this parameter to true to prevent auto-creation of new visits, and instead fail the upload if it contains data associated with unknown visits.
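A minimal sketch of building such a POST with Python's standard library is shown below. Several details are assumptions rather than facts stated in this topic: the URL layout (`<base>/<folder path>/study-checkForReload.api`), sending the API key as HTTP Basic credentials with the user name `apikey`, and passing the CSRF token in an `X-LABKEY-CSRF` header. Check them against your server's documentation before relying on them. The function only constructs the request; it does not send it.

```python
import base64
import urllib.parse
import urllib.request


def build_reload_request(base_url, folder_path, api_key, csrf_token,
                         query_validation=False, fail_on_undefined_visits=False):
    """Build (but do not send) a POST to checkForReload.api for one folder."""
    # Assumed URL layout; adjust to match your server's URL pattern.
    url = f"{base_url.rstrip('/')}/{folder_path.strip('/')}/study-checkForReload.api"
    # The two optional parameters described in this topic.
    params = {
        "queryValidation": str(query_validation).lower(),
        "failOnUndefinedVisits": str(fail_on_undefined_visits).lower(),
    }
    body = urllib.parse.urlencode(params).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    # Assumed authentication scheme: API key as the Basic auth password.
    credentials = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {credentials}")
    # Assumed CSRF header name.
    request.add_header("X-LABKEY-CSRF", csrf_token)
    return request
```

To send the request, pass the returned object to `urllib.request.urlopen(...)`. Keeping construction separate from sending makes it easy to inspect the URL and parameters before the script triggers a reload that replaces existing data.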