Use study reload when you want to refresh study data. It is particularly useful when data is updated in another data source and brought into LabKey Server for analysis: rather than updating each dataset individually, you can reload the entire study in one step.

For example, if the database of record is outside LabKey Server, a script could automatically generate TSVs to be reloaded into the study every night. Study reload can simplify the process of managing data and ensure researchers see daily updates without forcing migration of existing storage or data collection frameworks.

Caution: Reloading a study will replace existing data with the data contained in the imported archive.

Manual Reloading

To manually reload a study, you need an unzipped study archive (a set of .xml files and directories) placed in the pipeline root directory. Learn more about generating study archives in this topic: Export a Study.

Export for Manual Reloading

  • Navigate to the study you want to refresh.
  • On the Manage tab, click Export Study.
  • Choose the objects to export and under Export to:, select "Pipeline root export directory, as individual files."
  • The files will be exported unzipped to the correct location from which you can refresh.

Explore the exported archive in the export directory using the file browser. The dataset and other study files are in its subfolders. To find the physical location of these files, determine the pipeline root for the study. Make changes as needed.

Update Datasets

In the unzipped archive directory, the exported datasets are stored as TSV (tab-separated values) text files in the datasets folder. They are named by ID number: for example, a dataset named "Lab Results" that was originally imported from an Excel file might be exported as "dataset5007.tsv". If you have a new Excel spreadsheet, "Lab_Results.xlsx", convert the data to tab-separated format and replace the contents of dataset5007.tsv with the new data. Make sure the column headers and the tab-separated format are preserved, or the reload will fail.
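If replacement data arrives as a spreadsheet on a regular basis, a small script can do the conversion and guard against header drift. The Python sketch below assumes the spreadsheet has already been saved as CSV (reading .xlsx directly would need a third-party library); the helper name `replace_dataset_tsv` and the file names are hypothetical. It refuses to write anything unless the CSV header matches the header of the existing exported file exactly.

```python
import csv
from pathlib import Path


def replace_dataset_tsv(new_csv: Path, dataset_tsv: Path) -> int:
    """Overwrite an exported dataset TSV with the rows of a CSV file.

    Hypothetical helper: the CSV header row must exactly match the header
    of the existing TSV; otherwise nothing is written and ValueError is raised.
    Returns the number of data rows written.
    """
    # Header of the file the server exported (e.g. dataset5007.tsv).
    with dataset_tsv.open(newline="", encoding="utf-8-sig") as f:
        expected = next(csv.reader(f, delimiter="\t"), None)

    # Replacement data; "utf-8-sig" also strips a byte-order mark if present.
    with new_csv.open(newline="", encoding="utf-8-sig") as f:
        rows = list(csv.reader(f))

    if expected is None or not rows or rows[0] != expected:
        found = rows[0] if rows else None
        raise ValueError(f"header mismatch: expected {expected}, found {found}")

    # Rewrite the dataset file as tab-separated values, header first.
    with dataset_tsv.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t", lineterminator="\n").writerows(rows)
    return len(rows) - 1
```

A call such as `replace_dataset_tsv(Path("Lab_Results.csv"), Path("export/study/datasets/dataset5007.tsv"))` would then update the file in place, with both paths adjusted to your pipeline root.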

Add New Datasets

If you are adding new datasets to the study:

  • First ensure that each Excel and TSV data file includes the same subject and time-identifying columns as the rest of your study.
  • Place the new Excel or TSV files into the /export/study/datasets directory directly alongside any existing exported datasets.
  • Delete the files "datasets_manifest.xml", "datasets_metadata.xml" and "XX.dataset" (the XX in this filename will be the first word in the name of your folder) from the directory. These files, when present, tell the study reloader what datasets to load. When they are missing, the server scans the entire directory.
  • Reload the study.
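When new datasets are added regularly, the deletion step can be scripted. In this Python sketch the helper name is hypothetical, and it assumes the "XX.dataset" file is the only file in the directory ending in `.dataset`, so it matches that file with a `*.dataset` pattern instead of reconstructing the folder-name prefix.

```python
from pathlib import Path


def remove_dataset_manifests(datasets_dir: Path) -> list:
    """Delete the files that tell the reloader which datasets to load.

    Without them, the server scans the whole datasets directory.
    Returns the names of the files that were actually deleted.
    """
    candidates = [
        datasets_dir / "datasets_manifest.xml",
        datasets_dir / "datasets_metadata.xml",
        # Assumption: "XX.dataset" is the only *.dataset file in this directory.
        *sorted(datasets_dir.glob("*.dataset")),
    ]
    removed = []
    for path in candidates:
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Running it a second time is harmless: files that are already gone are skipped and an empty list is returned.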

Reload from Pipeline

  • In your study, select (Admin) > Go To Module > FileContent.
    • Alternatively, on the Manage tab, click Reload Study, and then click Use Pipeline.
  • Locate the "study.xml" file and select it.
  • Click Import Data and confirm Reload Study is selected.
  • Click Import.
  • Select import options if desired.
  • Click Start Import.

Study Creation via Reload

You can use the manual reload mechanism to populate a new empty study by moving or creating the same study archive folder structure in the pipeline root of another study folder. This mechanism avoids the manual process of creating numerous individual datasets.

To get started, export an existing study and copy its folder structure to the new location you want to use. Edit individual files as needed to describe your study. When you reload the study following the same steps as above, the server creates the new datasets and other structure from scratch. For a tutorial, see the topic: Tutorial: Inferring Datasets from Excel and TSV Files.
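The copy itself can also be scripted. The following is a minimal Python sketch: the helper name and paths are hypothetical, and it assumes both pipeline roots are reachable as ordinary directories from the machine running the script.

```python
import shutil
from pathlib import Path


def copy_study_archive(exported_study: Path, target_pipeline_root: Path) -> Path:
    """Copy an unzipped study archive into another folder's pipeline root.

    exported_study is the directory that contains study.xml. The copy keeps
    the same directory name and layout under target_pipeline_root.
    """
    if not (exported_study / "study.xml").is_file():
        raise FileNotFoundError(f"study.xml not found in {exported_study}")
    destination = target_pipeline_root / exported_study.name
    # copytree raises FileExistsError if destination already exists, so an
    # archive that is already in place is never silently overwritten.
    shutil.copytree(exported_study, destination)
    return destination
```

Refusing to overwrite is a deliberate choice here: since a reload replaces existing data, it seems safer to make the operator remove an old archive explicitly.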

Inferring New Datasets and Lists

Upon reloading, the server will create new datasets for any that don't exist, and infer column names and data types for both datasets and lists, according to the following rules.

  • Datasets and lists can be provided as Excel files or as TSV files.
  • The target study must already exist and have the same 'timepoint' style, either Date-based or Visit-based, as the incoming study archive.
  • If lists.xml or datasets_metadata.xml is present in the incoming study archive, the server uses the column definitions it contains to add any columns that are not already present.
  • If lists.xml or datasets_metadata.xml is not present, the server infers new column names and column types from the incoming data files.
  • The datasets/lists must be located in the archive’s datasets and lists subdirectories. The relative path to these directories is set in the study.xml file. For example, the following study.xml file locates the datasets directory in /myDatasets and the lists directory in /myLists.
study.xml
```xml
<study xmlns="http://labkey.org/study/xml" label="Study Reload" timepointType="DATE" subjectNounSingular="Mouse" subjectNounPlural="Mice" subjectColumnName="MouseId" startDate="2008-01-01-08:00" securityType="ADVANCED_WRITE">
  <cohorts type="AUTOMATIC" mode="SIMPLE" datasetId="5008" datasetProperty="Group"/>
  <datasets dir="myDatasets" />
  <lists dir="myLists" />
</study>
```
  • When inferring new columns, column names are based on the first row of the file, and the column types are inferred from values in the first 5 rows of data.
  • LabKey Server chooses the target dataset or list based on the name of the incoming file. For example, if a file named "DatasetA.xls" is present, the server updates the existing dataset "DatasetA" or, if there is none, creates a new dataset with that name. Lists are updated or added using the same naming rule. An example archive layout:
```
root
  study
    study.xml - A simplified metadata description of the study.
    myDatasets
      DatasetA.tsv
      DatasetB.tsv
      Demographics.xlsx
    myLists
      ListA.tsv
      ListB.xlsx
```
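To check ahead of time which names a reload will target, a script can combine the rules above: read the `dir` attributes from study.xml and take each data file's base name. This Python sketch is an illustration only, not LabKey's own logic: the helper name is hypothetical, it assumes the `dir` paths are relative to the folder holding study.xml (as in the layout above), it only handles directories that study.xml names explicitly, and it treats .tsv, .xls and .xlsx as data files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

NS = {"s": "http://labkey.org/study/xml"}  # default namespace used in study.xml
DATA_SUFFIXES = {".tsv", ".xls", ".xlsx"}


def planned_targets(study_xml: Path) -> dict:
    """Map "datasets" and "lists" to the names derived from their data files."""
    root = ET.parse(study_xml).getroot()
    targets = {}
    for kind in ("datasets", "lists"):
        element = root.find(f"s:{kind}", NS)
        names = []
        if element is not None and "dir" in element.attrib:
            # Assumption: dir is relative to the folder that holds study.xml.
            directory = study_xml.parent / element.attrib["dir"]
            if directory.is_dir():
                names = sorted(
                    p.stem for p in directory.iterdir()
                    if p.is_file() and p.suffix.lower() in DATA_SUFFIXES
                )
        targets[kind] = names
    return targets
```

For the layout above, this would report DatasetA, DatasetB and Demographics as datasets and ListA and ListB as lists.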

Automated Study Reload: File Watchers (Premium Feature)

To enable automated reload, configure a file watcher to either reload datasets from files or reload the entire study.

API Based Reload (Deprecated)

This API-based reload option is provided for backward compatibility. Using a file watcher to reload a study is preferred and provides many more options.

The process for API-based reload typically involves an automated script updating the timestamp on a "studyload.txt" file placed in the pipeline root folder, then prompting the server to reload the study archive from this folder. LabKey Server ignores the contents of studyload.txt, looking only at the file's modification timestamp.

The script follows steps similar to these:

  1. Read dataset, specimen, and other important study data from a master database and/or specimen LIMS system.
  2. Write the data to the file system in the LabKey study archive format.
  3. Touch the studyload.txt file to update the timestamp to the current date/time.
  4. Signal to the LabKey Server that the archive is ready to load using a POST to checkForReload.api. Details are below.
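Step 3 needs nothing more than a timestamp update, since the server ignores the file's contents. In Python, `Path.touch()` does exactly this and creates the file on first use; the helper name below is hypothetical, and the pipeline root is whatever directory your study uses.

```python
from pathlib import Path


def touch_studyload(pipeline_root: Path) -> float:
    """Set the modification time of studyload.txt to now and return it.

    Creates an empty studyload.txt if it does not exist yet; only the
    timestamp matters, never the contents.
    """
    marker = pipeline_root / "studyload.txt"
    marker.touch(exist_ok=True)  # updates the mtime if the file already exists
    return marker.stat().st_mtime
```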

POST to checkForReload.api

This last step causes the server to validate the request and, if it succeeds, queue the reload pipeline job. The request must be made as a logged-in user with administrator permissions, typically authenticated with an API key.

The script signals the server by issuing a POST (including CSRF token) to checkForReload.api in the folder that should receive the reload. There are two optional parameters to this action:

  • queryValidation: If true, the server performs a query validation step after the study is reloaded. This step flags query errors that might not otherwise be noticed, but can be time-consuming.
  • failOnUndefinedVisits: By default, undefined visits in the upload will trigger creation of new visits in the study schedule. Set this parameter to true to prevent auto-creation of new visits, and instead fail the upload if it contains data associated with unknown visits.
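The POST can be prepared with Python's standard library alone. This is a sketch under several assumptions that should be checked against your server: the caller passes the full action URL for the target folder (any example URL, including a controller prefix such as `study-`, is hypothetical); the API key is sent with HTTP Basic authentication using the user name `apikey`; and the CSRF token, when supplied, goes in an `X-LABKEY-CSRF` header. The helper name is hypothetical, and the two optional parameters are sent only when set to true.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional


def build_reload_request(action_url: str, api_key: str,
                         csrf_token: Optional[str] = None,
                         query_validation: bool = False,
                         fail_on_undefined_visits: bool = False) -> urllib.request.Request:
    """Build, without sending, a POST to checkForReload.api for one folder."""
    params = {}
    if query_validation:
        params["queryValidation"] = "true"
    if fail_on_undefined_visits:
        params["failOnUndefinedVisits"] = "true"
    body = urllib.parse.urlencode(params).encode("ascii")

    # Assumption: the API key is the Basic-auth password for user "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    if csrf_token is not None:
        headers["X-LABKEY-CSRF"] = csrf_token  # assumed header name
    return urllib.request.Request(action_url, data=body, headers=headers, method="POST")
```

Sending it is then a matter of `urllib.request.urlopen(req)`; an error status from the server raises `urllib.error.HTTPError`, which a nightly script can log or use to alert someone.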
