Shared datasets and timepoints are experimental and advanced features. Please contact LabKey if you would like to use these features to their fullest.
Shared datasets and timepoints let you:
- Share the same dataset definitions and timepoints across multiple studies. This lets you define datasets at the project level and use the same definitions in the child folders. With this option, data is not shared across studies; only the dataset definitions are. This is similar to defining an assay design at the project level so the same design can be available in child folders. In both cases, you can ensure that the same table definitions are used across multiple containers, and control those definitions from a central source.
- Share demographic datasets, both the definitions and the actual data within them, with child folders. Demographic data in the child folders is automatically added to the shared dataset in the parent project, creating a union table out of the child datasets.
- View combined data across multiple studies. Combined data views are available at the parent/project-level.
Shared Definitions and Timepoints
Shared dataset definitions and timepoints are defined at the project level and are available in any "sub-studies", that is, studies in the project's child folders. Any datasets and timepoints you define in the parent project automatically appear in the child folders. Any changes you make to the parent definitions and timepoints also cascade into the child folders, for example:
- Any fields added to the dataset definition in the project will also appear in the child studies.
- Any visits added to the project will also appear in the child studies.
All updates to the parent definition will be reflected in the child folders, including the addition of fields, deletion of fields, metadata configurations, etc.
This option does not share any data between studies; only the dataset definitions and timepoint structure are shared.
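To make the "definitions cascade, data does not" behavior concrete, here is a minimal Python sketch of a toy in-memory model. It assumes that child studies resolve shared definitions by reference to the project rather than holding their own copies. `StudyContainer` and `DatasetDefinition` are hypothetical names for illustration only; they are not LabKey classes or API calls.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class DatasetDefinition:
    """Hypothetical stand-in for a dataset definition (ID, name, fields)."""
    dataset_id: int
    name: str
    fields: list[str]


@dataclass
class StudyContainer:
    """Hypothetical container: definitions may be inherited, data never is."""
    name: str
    parent: StudyContainer | None = None
    own_definitions: dict[int, DatasetDefinition] = field(default_factory=dict)
    rows: dict[int, list[dict]] = field(default_factory=dict)

    def definitions(self) -> dict[int, DatasetDefinition]:
        # Parent definitions are returned by reference, so an edit made in
        # the project is immediately visible in every child study.
        inherited = self.parent.definitions() if self.parent else {}
        return {**inherited, **self.own_definitions}

    def insert(self, dataset_id: int, row: dict) -> None:
        # Rows are stored only in this container; nothing is copied to the
        # parent or to sibling studies (the definitions-only case).
        if dataset_id not in self.definitions():
            raise KeyError(f"no dataset {dataset_id} in {self.name}")
        self.rows.setdefault(dataset_id, []).append(row)


project = StudyContainer("Project")
project.own_definitions[1001] = DatasetDefinition(1001, "Labs", ["ParticipantId", "Date"])
study_a = StudyContainer("Study A", parent=project)
study_b = StudyContainer("Study B", parent=project)

# A field added at the project level shows up in both child studies.
project.own_definitions[1001].fields.append("Result")
print(study_b.definitions()[1001].fields)  # ['ParticipantId', 'Date', 'Result']

# Data entered in one child study is not visible in the other.
study_a.insert(1001, {"ParticipantId": "P1", "Date": "2024-01-01", "Result": 4.2})
print(study_b.rows.get(1001))  # None
```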
Note that the datasets in child folders must be created with dataset IDs that do not conflict with existing parent datasets.
To set up:
- Create a project of type Study. This project will form the source of the shared definition and timepoint structure.
- Once the empty study project is created, click Create Study.
- On the Create Study page, define your study properties and scroll down to the Shared Study Properties section. Note that this section is only available when creating a new study in a project; the options will not appear when creating a new study in a folder.
- Enable Shared Datasets and/or Shared Timepoints.
- Once Shared Datasets and/or Shared Timepoints have been enabled, change the folder type from Study to Dataspace:
  - Select (Admin) > Folder > Management.
  - Click the Folder Type tab.
  - Select Dataspace and click Update Folder.
- Create subfolders of this project, each of type Study.
- Now any definitions or timepoints in the project will also appear in the child studies.
Creating Shared Datasets
When creating a shared dataset, we recommend manually assigning a dataset ID under Advanced Settings > Dataset ID. This prevents ID collisions in the future, especially if you plan to create folder-specific, non-shared datasets. Note that auto-generated dataset IDs follow the pattern 5001, 5002, 5003, etc. When manually assigning a dataset ID, use a pattern (such as 1001, 1002, 1003, etc.) that will not collide with any auto-generated IDs that may be created in the future.
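The ID advice above amounts to reserving a block of IDs below the auto-generated range. The Python sketch below picks the lowest free ID in such a block. `next_manual_dataset_id`, the 1001 starting point, and using 5001 as the upper bound of the manual block are assumptions for illustration, not LabKey settings. The caller is expected to pass every ID already in use by the parent project and the child folders, which also covers the rule that child datasets must not reuse parent dataset IDs.

```python
from __future__ import annotations

AUTO_ID_START = 5001    # auto-generated dataset IDs follow 5001, 5002, 5003, ...
MANUAL_ID_START = 1001  # hypothetical manual block: 1001, 1002, 1003, ...


def next_manual_dataset_id(used_ids: set[int],
                           start: int = MANUAL_ID_START,
                           limit: int = AUTO_ID_START) -> int:
    """Return the lowest ID in [start, limit) that is not already used.

    used_ids should contain the parent project's shared dataset IDs plus any
    IDs already assigned in child folders (including auto-generated ones).
    """
    for candidate in range(start, limit):
        if candidate not in used_ids:
            return candidate
    raise ValueError(f"manual ID block {start}..{limit - 1} is exhausted")


# Two shared datasets in the project, one auto-numbered dataset in a child.
in_use = {1001, 1002, 5001}
print(next_manual_dataset_id(in_use))  # 1003
```

Keeping the manual block strictly below 5001 means a later auto-generated ID can never land on a manually chosen one, regardless of how many folder-specific datasets are created.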
Shared Demographic Datasets
Once shared datasets and shared timepoints have been enabled, you can enable shared data, not just shared definitions.
Enabling data sharing means that any individual records entered at the folder level will appear at the project level. In effect, the project-level dataset becomes a union of the data in the child datasets. Note that inserting data directly into the project-level dataset is disabled. To enable data sharing:
- Navigate to the dataset definition in the top level project.
- Edit the dataset definition.
- In the dataset designer, ensure there is a checkmark next to Demographic Data.
- Use the Share Demographic Data dropdown to enable data sharing:
  - When No is selected (the default), each child study folder 'owns' its own data rows.
  - If the study has shared visits/timepoints, Share by Participants means that data rows are shared across the project, and each study sees only the data rows for participants that are part of that study.
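The two dropdown options can be modeled as two ways of computing which rows a study sees. In this Python sketch, `ShareMode`, `project_view`, and `study_view` are hypothetical helpers, and each study's participant list is simply passed in as a set; how LabKey actually decides which participants belong to a study is outside the scope of the sketch.

```python
from __future__ import annotations
from enum import Enum


class ShareMode(Enum):
    NO = "No"                                  # default: each study owns its rows
    BY_PARTICIPANTS = "Share by Participants"


def project_view(rows_by_study: dict[str, list[dict]]) -> list[dict]:
    # With sharing enabled, the project-level dataset is the union of the
    # child studies' rows; nothing is inserted at the project level itself.
    return [row for rows in rows_by_study.values() for row in rows]


def study_view(study: str,
               rows_by_study: dict[str, list[dict]],
               participants_by_study: dict[str, set[str]],
               mode: ShareMode) -> list[dict]:
    if mode is ShareMode.NO:
        # Only the rows entered in this study's own folder.
        return list(rows_by_study.get(study, []))
    # Shared rows from all studies, filtered to this study's participants.
    members = participants_by_study.get(study, set())
    return [row for row in project_view(rows_by_study)
            if row["ParticipantId"] in members]


rows = {
    "Study A": [{"ParticipantId": "P1", "Cohort": "X"}],
    "Study B": [{"ParticipantId": "P2", "Cohort": "Y"}],
}
members = {"Study A": {"P1", "P2"}, "Study B": {"P2"}}

print(len(project_view(rows)))  # 2
print([r["ParticipantId"] for r in study_view("Study A", rows, members, ShareMode.NO)])  # ['P1']
print([r["ParticipantId"] for r in study_view("Study A", rows, members, ShareMode.BY_PARTICIPANTS)])  # ['P1', 'P2']
```

Here Study A's view under Share by Participants includes P2's row even though it was entered in Study B, because P2 is assumed to be one of Study A's participants.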
The Dataspace Container
Note that the project-level container that shares its datasets and timepoints with child sub-studies does not behave like an "ordinary" study. It is a different container type: a Dataspace container, which does not follow the same rules and constraints that are enforced in regular studies. This is especially true of the uniqueness constraints normally associated with demographic datasets. This uniqueness constraint does not apply to datasets in the top-level Dataspace project, so a project-level demographics table can contain duplicate participant IDs, which can lead to unexpected behavior such as that described below.
If the same participant ID occurs in multiple studies, participant groups may exhibit unexpected behavior. Participant groups do not track containers; they are merely lists of strings (participant IDs), so they cannot distinguish the same participant ID in two different containers.
When viewed from the project-level study, a participant may have multiple demographic records that report different information about the same ID; for example, there may be different dates or cohort memberships for the same visit.
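Both pitfalls can be reproduced in a few lines. This Python sketch assumes the project-level demographic rows are plain dictionaries carrying a hypothetical `Container` key, and models a participant group as a bare list of ID strings, as described above. `duplicate_participants` and `rows_for_group` are illustrative helpers, not LabKey functions.

```python
from __future__ import annotations
from collections import defaultdict


def duplicate_participants(rows: list[dict]) -> dict[str, list[dict]]:
    """Group project-level demographic rows by participant ID and return
    only the IDs that occur more than once (possible in a Dataspace)."""
    by_id: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_id[row["ParticipantId"]].append(row)
    return {pid: group for pid, group in by_id.items() if len(group) > 1}


def rows_for_group(group: list[str], rows: list[dict]) -> list[dict]:
    # The group holds only ID strings and no container, so an ID matches
    # every row with that ID, whichever study it came from.
    members = set(group)
    return [row for row in rows if row["ParticipantId"] in members]


project_rows = [
    {"ParticipantId": "P1", "Container": "Study A", "Cohort": "X"},
    {"ParticipantId": "P1", "Container": "Study B", "Cohort": "Y"},
    {"ParticipantId": "P2", "Container": "Study A", "Cohort": "X"},
]

print(sorted(duplicate_participants(project_rows)))  # ['P1']
# A group meant for Study A's P1 also pulls in Study B's unrelated P1.
print(len(rows_for_group(["P1"], project_rows)))  # 2
```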