Shared Datasets and Timepoints: /Documentation/Archive/23.11

Shared datasets and timepoints are experimental and advanced features. Please contact LabKey if you would like to use these features to their fullest.

Shared datasets and timepoints can be enabled at the project level and used in different ways for cross-study alignment. Either or both can be enabled in a top-level container (project) of type Study or Dataspace. Studies within that project can then share definitions, and combined data views can be created at the project level.

Overview
Configure Project Level Sharing

Shared Datasets
Shared Timepoints (or Visits)

Share Demographic Data
Create Study Subfolders
The Data Sharing Container

Overview

Shared datasets and timepoints can be used in different ways for cross-study alignment. When enabled in a project, subfolders of type "Study" within it will be able to make use of:

Shared Datasets: When this option is enabled, all studies in the project will see the definitions of any datasets defined in the root folder of the project.

This setting applies to the dataset definition: all studies share the structure, name, and metadata associated with each dataset defined at the project level. You can therefore ensure that the same table definitions are used across multiple containers, and control those definitions from a central location.
Optional sharing of demographic data in the shared datasets is separately controlled.
Studies in the project may also have their own additional datasets that are not shared.
Any changes you later make to the shared datasets at the project level will "cascade" into the child studies. For example, fields added to the dataset definition in the project will also appear in the child studies.

Shared Timepoints: This option applies to the sharing of either visits or timepoints, depending on how the top-level project is configured. All studies will use the same kind of timepoint.

When this option is enabled, all studies in the project will see the visits/timepoints defined in the root folder of the project.
This enables aligning data across studies in the same time-series "buckets".
Any changes you make to visits/timepoints in the parent project, including additions, will cascade into the child folders.
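
The cascading behavior can be pictured with a minimal Python sketch. It is a hypothetical in-memory model, not LabKey code or a LabKey API; the class names, the "Demographics" dataset, and its fields are invented for illustration. What it shows is that child studies read the project's definitions rather than keeping their own copies, so a field or timepoint added at the project level is visible in every child study right away.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetDefinition:
    name: str
    fields: list[str]


@dataclass
class Project:
    """Hypothetical model of the top-level project that owns the shared structures."""
    shared_datasets: dict[str, DatasetDefinition] = field(default_factory=dict)
    shared_timepoints: list[str] = field(default_factory=list)


@dataclass
class ChildStudy:
    """Hypothetical model of a study subfolder inside the project."""
    parent: Project

    def shared_dataset(self, name: str) -> DatasetDefinition:
        # No local copy: the parent's definition is read on every access,
        # so later project-level edits "cascade" automatically.
        return self.parent.shared_datasets[name]

    def timepoints(self) -> list[str]:
        # Visits/timepoints are likewise read from the parent.
        return list(self.parent.shared_timepoints)


project = Project()
project.shared_datasets["Demographics"] = DatasetDefinition("Demographics", ["ParticipantId", "Gender"])
project.shared_timepoints.append("Baseline")
study_a = ChildStudy(project)
study_b = ChildStudy(project)

# Change the shared structures at the project level only.
project.shared_datasets["Demographics"].fields.append("Country")
project.shared_timepoints.append("Month 6")

print(study_a.shared_dataset("Demographics").fields)  # ['ParticipantId', 'Gender', 'Country']
print(study_b.timepoints())                           # ['Baseline', 'Month 6']
```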

In addition, when Shared Datasets are enabled, administrators can make use of an additional sharing option:

Share demographic data: For any demographic dataset, meaning one whose Data Row Uniqueness is set to Participants Only, this setting enables sharing of the data rows across studies, not just of the dataset definition. Options:

By default, each study folder 'owns' its own data rows in this dataset and the data is not shared.
When you choose Share by ParticipantId, data rows are shared across the project and studies will only see data rows for participants that are part of that study.
Data added to the dataset in a child study is automatically added to the shared dataset in the parent project, creating a union table out of the child datasets.
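
The following hypothetical Python sketch models the "Share by ParticipantId" behavior described above. The row layout, the study names, and the idea that each study's membership is simply a set of ParticipantIds are assumptions made for illustration. This is not how LabKey stores or queries the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    participant_id: str
    source_study: str  # child study the row was entered in
    gender: str        # illustrative demographic column


def project_view(rows_by_study: dict[str, list[Row]]) -> list[Row]:
    """Project-level dataset: the union of every child study's rows."""
    union: list[Row] = []
    for rows in rows_by_study.values():
        union.extend(rows)
    return union


def study_view(union: list[Row], participants: set[str]) -> list[Row]:
    """Shared rows visible in one study: only those for the study's own participants."""
    return [row for row in union if row.participant_id in participants]


rows_by_study = {
    "StudyA": [Row("P1", "StudyA", "F"), Row("P2", "StudyA", "M")],
    "StudyB": [Row("P3", "StudyB", "F")],
}
union = project_view(rows_by_study)
# Assume StudyB also enrolls P1, so it sees the P1 row that was entered in StudyA.
visible_in_b = study_view(union, participants={"P1", "P3"})
print(len(union))                                    # 3
print([row.participant_id for row in visible_in_b])  # ['P1', 'P3']
```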

Note that if you combine shared datasets with 'nonshared' datasets defined only in a child study, giving a child-folder dataset the same name as a shared dataset will "shadow" the shared dataset. Also, once any data has been added to a dataset, you cannot change whether its data is shared. It is good practice to design your datasets and their configurations before adding any data.
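
Both rules in the paragraph above can be modeled with a short hypothetical Python sketch. It assumes that, within a child study, a local dataset with the same name takes precedence over the shared one, which is what "shadow" suggests. The class names and the error message are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChildStudyDatasets:
    """Hypothetical model of dataset lookup inside one child study."""
    shared: dict[str, int]  # name -> dataset ID, defined in the project
    local: dict[str, int] = field(default_factory=dict)  # name -> ID, this study only

    def add_local(self, name: str, dataset_id: int) -> bool:
        """Add a non-shared dataset; return True if it shadows a shared one."""
        self.local[name] = dataset_id
        return name in self.shared

    def resolve(self, name: str) -> int:
        # Assumed precedence: a local dataset with the same name hides the shared one.
        if name in self.local:
            return self.local[name]
        return self.shared[name]


@dataclass
class SharingSetting:
    """Hypothetical guard: the sharing choice is locked once rows exist."""
    mode: str = "No"
    row_count: int = 0

    def change(self, new_mode: str) -> None:
        if self.row_count > 0:
            raise ValueError("cannot change data sharing after data has been added")
        self.mode = new_mode


study = ChildStudyDatasets(shared={"Labs": 1001})
print(study.add_local("Labs", 5001))  # True: the local "Labs" now shadows the shared one
print(study.resolve("Labs"))          # 5001

setting = SharingSetting(row_count=10)
try:
    setting.change("Share by ParticipantId")
except ValueError as err:
    print(err)  # cannot change data sharing after data has been added
print(setting.mode)  # No
```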

Configure Project Level Sharing

First, set up the parent container (the project), which will be the source of the information shared with the subfolder studies.

Create a project of type Study.
Once the empty study project is created, click Create Study.
On the Create Study page, define your study properties and scroll down to the Shared Study Properties section. Note that this section is only available when creating a new study in a project; the options will not appear when creating a new study in a folder.
Enable Shared Datasets and Shared Timepoints. While it is possible to use only one or the other, they are most commonly used together.

Once Shared Datasets and/or Shared Timepoints have been enabled, you can optionally change the folder type from Study to Dataspace. This change is not required; if you make it, do so after enabling sharing.

Select (Admin) > Folder > Management
Click the Folder Type tab.
Select Dataspace and click Update Folder.

Shared Datasets

Shared datasets must be defined in the top-level project. You may also have 'child-study-specific' datasets in child folders that will not be shared, but they must be created with dataset IDs that do not conflict with existing parent datasets.

When creating a shared dataset in the project, we recommend manually assigning a dataset ID under Advanced Settings > Dataset ID. This prevents ID collisions later, especially if you plan to create folder-specific, non-shared datasets. Auto-generated dataset IDs follow the pattern 5001, 5002, 5003, etc. When assigning IDs manually, use a pattern (such as 1001, 1002, 1003, etc.) that will not collide with any auto-generated IDs created in the future.
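
The ID scheme above can be checked with a small hypothetical Python helper. Beyond what this page states, it assumes that manual IDs should stay strictly below 5001 so they never reach the auto-generated range. The function names are illustrative and not part of LabKey.

```python
# Auto-generated dataset IDs follow 5001, 5002, 5003, ... (per this page).
AUTO_ID_START = 5001
# Example manual pattern from this page; keeping manual IDs below
# AUTO_ID_START is an assumption of this sketch.
MANUAL_ID_START = 1001


def next_manual_id(used_ids: set[int]) -> int:
    """Return the lowest unused ID in the manual range 1001..5000."""
    candidate = MANUAL_ID_START
    while candidate in used_ids:
        candidate += 1
    if candidate >= AUTO_ID_START:
        raise ValueError("manual range exhausted; next ID would reach the auto-generated range")
    return candidate


def conflicting_ids(parent_ids: set[int], child_ids: set[int]) -> set[int]:
    """Return child-study dataset IDs that clash with shared (project) dataset IDs."""
    return parent_ids & child_ids


shared_ids = {1001, 1002}
print(next_manual_id(shared_ids))                 # 1003
print(conflicting_ids(shared_ids, {1002, 5001}))  # {1002}
```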

Shared Timepoints (or Visits)

When shared timepoints are enabled, the Manage tab in the top-level project includes a link to Manage Shared Timepoints (or Visits). Click it to manage the timepoints or visits using an interface similar to the one used for a single study.

The timepoints and visits created here will be shared by all study folders in the project.
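
The alignment that shared timepoints provide can be sketched in Python, assuming a date-based project whose timepoints are inclusive day ranges. The labels and day ranges below are made up. The sketch only shows that rows from different studies fall into the same bucket because every study uses one shared list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Timepoint:
    label: str
    start_day: int  # inclusive
    end_day: int    # inclusive


# Defined once at the project level; every child study uses this same tuple.
SHARED_TIMEPOINTS = (
    Timepoint("Baseline", 0, 6),
    Timepoint("Week 1", 7, 13),
    Timepoint("Week 2", 14, 20),
)


def bucket(day: int, timepoints: tuple[Timepoint, ...] = SHARED_TIMEPOINTS) -> Optional[str]:
    """Return the label of the shared timepoint whose day range contains `day`."""
    for tp in timepoints:
        if tp.start_day <= day <= tp.end_day:
            return tp.label
    return None  # the day falls outside every defined timepoint


# Rows recorded on day 3 in one study and on day 5 in another land in the same bucket.
print(bucket(3), bucket(5))  # Baseline Baseline
print(bucket(15))            # Week 2
print(bucket(30))            # None
```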

Share Demographic Data

Once shared datasets and shared timepoints have been enabled, you can enable sharing of demographic data, not just the dataset definitions.

For demographics datasets, this setting is used to enable data sharing across studies. When 'No' is selected (default), each study folder 'owns' its own data rows. If the study has shared visits/timepoints, then 'Share by ParticipantId' means that data rows are shared across the project and studies will only see data rows for participants that are part of that study.

Enabling data sharing means that any individual records entered at the folder level will appear at the project level. In effect, the project-level dataset becomes a union of the data in the child datasets. Note that inserting data directly into the project-level dataset is disabled.

Navigate to the dataset definition in the top level project.
Edit the dataset definition.
In the dataset designer, ensure Data Row Uniqueness is set to Participants Only (demographic data).
Click Advanced Settings, then use the dropdown Share Demographic Data:

No (the default) means that each child study folder 'owns' its own data rows. There is no data sharing for this dataset.
Share by Participants means that data rows are shared across the project, and studies will only see data rows for participants that are part of that study.

Create Study Subfolders

Create new subfolders of this project, each of type Study. You can create these 'sub studies' before or after starting to populate the shared structures at the project level.

Note that the parent container has already set the timepoint type and duration, which must match in all child studies, so you will not see those options when you Create Study.

Each of the 'sub-studies' will automatically include any definitions or timepoints you create at the project level. It is best practice to define these shared structures before you begin to add any study data or study-specific datasets.

The Data Sharing Container

Note that the project-level container that shares its datasets and timepoints with child sub-studies does not behave like an "ordinary" study. It is a different container type that does not follow the same rules and constraints enforced in regular studies. This is especially true of the uniqueness constraint normally associated with demographic datasets: it does not apply to datasets in the top-level project, so a demographics table there can contain duplicate ParticipantIds, along with similar unexpected behavior.

If the same ParticipantId occurs in multiple studies, participant groups may exhibit unexpected behavior. Participant groups do not track containers; they are merely lists of strings (ParticipantIds) and cannot distinguish the same ParticipantId in two different containers.

When viewed from the project-level study, a participant may have multiple demographics records that report different information about the same ID, and there may be different dates or cohort memberships for the same visit.
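
When investigating these effects, it can help to list the ParticipantIds that occur in more than one child study. The hypothetical Python helpers below assume the project-level demographics rows have been exported as (ParticipantId, source study) pairs. They are not a LabKey API. The second helper shows why a participant group, being only a list of IDs, matches rows from every study that uses the same ID.

```python
from collections import defaultdict

Row = tuple[str, str]  # (ParticipantId, source study)


def duplicated_participants(rows: list[Row]) -> dict[str, set[str]]:
    """Map each ParticipantId found in more than one child study to those studies."""
    studies_by_id: defaultdict[str, set[str]] = defaultdict(set)
    for participant_id, study in rows:
        studies_by_id[participant_id].add(study)
    return {pid: studies for pid, studies in studies_by_id.items() if len(studies) > 1}


def resolve_group(group: list[str], rows: list[Row]) -> list[Row]:
    """A participant group is only a list of IDs, so it matches rows from every study."""
    wanted = set(group)
    return [row for row in rows if row[0] in wanted]


rows = [("P1", "StudyA"), ("P2", "StudyA"), ("P1", "StudyB")]
print(sorted(duplicated_participants(rows)))  # ['P1']
print(resolve_group(["P1"], rows))            # [('P1', 'StudyA'), ('P1', 'StudyB')]
```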

LabKey Support