From the Manage Datasets page, administrators can click a dataset name to see and edit its properties and fields. The same interface can be reached by clicking Manage from the grid itself.
Buttons offer the following options:
- View Data: See the current contents of the dataset.
- Edit Associated Visits/Timepoints: Select the visits/timepoints where data will be collected.
- Manage Datasets: Click to go to the page for managing all datasets.
- Delete Dataset: Delete the selected dataset including its definition and properties as well as all data rows and visitmap entries. You will be asked for confirmation as this action cannot be undone. After deleting a dataset, it is advisable to recalculate visits/dates and/or delete unused visits in order to clear any empty visit and participant references that may remain in the study.
- Delete All Rows: Deletes all data rows, but the dataset definition and properties will remain. You will be asked for confirmation as this action cannot be undone.
- Show Import History: Shows the files that were uploaded to the dataset after its initial creation. If a file was used to create or infer the dataset, it will not appear here, nor will records that were added or updated individually.
- Edit Definition: Modify dataset properties and add or modify the dataset fields.
Edit Dataset Properties
From the manage page, click Edit Definition. You will see and be able to edit the following properties:
- Name: Required. This name must be unique. It is used when identifying datasets during data upload.
- Label: The name of the dataset shown to users. If no Label is provided, the Name is used.
- Description: An optional longer description of your dataset.
- Category: Assigning a category to a dataset will group it with similar datasets in the navigator and data browser. Select an existing category from the dropdown, or type a new one in this box to add it. Learn more about categories in this topic: Manage Categories.
Data Row Uniqueness
In the Data Row Uniqueness section, select how the dataset is keyed, meaning how unique rows are determined. Note that changing the keying of a populated dataset may not be possible (or desirable).
- Participants only (demographic data): There is one row per participant.
  - For example, enrollment data is collected once per participant.
  - Each participant has a single date of birth.
- Participants and timepoints/visits: There is (at most) one row per participant at each timepoint or visit.
  - For example, physical exams would only be performed once for each participant at each visit. Fields here would have one measured value for a subject at a time, but there might be many rows for that subject: one for each time the value was measured.
  - Each participant/visit yields a single value for weight.
- Participants, timepoints, and additional key field: If there may be multiple rows for a participant/visit combination, such as assay data from testing a sample taken at an appointment, you would need a third key. Details are below.
  - Additional Key Field: Select the field that will uniquely identify rows alongside participant and time.
  - For example, each blood sample taken from a participant at a visit is run through a series of different tests with many results for that person on that day.
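The three keying choices above amount to choosing which columns must be unique together. The following is a minimal Python sketch of that rule only, assuming rows are plain dictionaries and using the hypothetical column names `ParticipantId` and `SequenceNum` (for the visit); it is not LabKey's implementation or API, and the weights are made-up values.

```python
from collections import Counter
from typing import Optional


def key_fields(keying: str, additional_key: Optional[str] = None) -> tuple[str, ...]:
    """Return the columns that must be unique together for a keying mode.

    Column names are hypothetical placeholders for participant and visit.
    """
    if keying == "participant":  # demographic data: one row per participant
        return ("ParticipantId",)
    if keying == "participant_visit":  # at most one row per participant per visit
        return ("ParticipantId", "SequenceNum")
    if keying == "participant_visit_extra":  # a third key is required
        if additional_key is None:
            raise ValueError("an additional key field is required")
        return ("ParticipantId", "SequenceNum", additional_key)
    raise ValueError(f"unknown keying: {keying}")


def duplicate_keys(rows: list[dict], fields: tuple[str, ...]) -> list[tuple]:
    """Return every key value that appears on more than one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Two weight measurements for the same participant at different visits.
rows = [
    {"ParticipantId": "P1", "SequenceNum": 1, "Weight": 70.2},
    {"ParticipantId": "P1", "SequenceNum": 2, "Weight": 69.8},
]
print(duplicate_keys(rows, key_fields("participant")))        # [('P1',)]
print(duplicate_keys(rows, key_fields("participant_visit")))  # []
```

The same two rows are valid under participant/visit keying but would break demographic keying, which is why the keying must match how the data is actually collected.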
Additional Key Fields
Some datasets may have more than one row for each participant/visit (i.e. time) pairing. For example, a sample taken from a participant at that visit might be tested for neutralizing antibodies to several different virus strains. Each test (participant/visit/virus combination) could then become a single unique row of a dataset. See example table below. These data rows are not "legal" in a standard dataset because they have the same participant/visit values. An additional key field is needed.
You have several options for this additional key:
- User Managed Key: If you know that there will always be a unique value in a data column, such as VirusId in the above example, you can use that data column to define row uniqueness.
  - Select the column from the Additional Key Field dropdown, which lists the fields in the dataset.
  - You will provide the value for this column and manage row uniqueness yourself.
- The time portion of the date/time field in a date-based or continuous study:
  - If a test reports a reading every few minutes, you could use the time portion of the date column.
  - Depending on how close together your events are, this may or may not provide row uniqueness.
- System Managed Key: If your data does not fit either of the above, you can create a new integer or text field that the server will auto-populate to keep rows unique. For example, you might create a new (empty) field called "ThirdKey", and check the following box:
  - Let server manage fields to make entries unique: When checked, the server will add values to the selected field to ensure row uniqueness. Integer fields will be assigned auto-incrementing integer values; Text fields will be assigned globally unique identifiers (GUIDs).
  - Note that you should not use this option for existing data columns or if you will be providing values for this field, such as in the above VirusId example.
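To illustrate what a system managed key does, the Python sketch below fills an empty key field so that otherwise identical participant/visit rows become distinct: auto-incrementing integers for an Integer field, random GUID strings for a Text field. It rests on simplifying assumptions (an in-memory list of dictionary rows, a hypothetical field named `ThirdKey`, and integer numbering that continues from the highest existing value) and is not how the server actually stores or generates these values.

```python
import uuid


def fill_managed_key(rows: list[dict], field: str, field_type: str) -> None:
    """Populate `field` on rows where it is empty, mimicking a server-managed key.

    Integer fields get the next integer after the current maximum;
    text fields get a random GUID string.
    """
    if field_type == "integer":
        existing = [r[field] for r in rows if r.get(field) is not None]
        next_value = max(existing, default=0) + 1
        for row in rows:
            if row.get(field) is None:
                row[field] = next_value
                next_value += 1
    elif field_type == "text":
        for row in rows:
            if row.get(field) is None:
                row[field] = str(uuid.uuid4())
    else:
        raise ValueError(f"unsupported field type: {field_type}")


# Two results for the same participant and visit (made-up values).
rows = [
    {"ParticipantId": "P1", "SequenceNum": 1, "Result": 0.41},
    {"ParticipantId": "P1", "SequenceNum": 1, "Result": 0.37},
]
fill_managed_key(rows, "ThirdKey", "integer")
print([r["ThirdKey"] for r in rows])  # [1, 2]
```

Because the generated value carries no meaning of its own, this only makes sense for a new, empty field; a column whose values you supply yourself, such as VirusId, should be used as a user managed key instead.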
Note that when creating a new dataset and configuring additional keys, you may see error messages in the wizard, because you cannot select a field until you have added it in the Fields section.
You should also use caution whenever editing an existing dataset's keying, because it is possible to put your dataset into an invalid configuration. For example, if you have a three-key dataset and change the keying to "demographic", you will see unique-constraint violations wherever a participant has more than one row.
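Before narrowing a dataset's keying, it can help to find the rows that would collide under the new key. This Python sketch groups rows by a proposed key and reports only the groups with more than one row; it assumes dictionary rows with hypothetical column names and made-up titer values modeled on the virus-strain example, and it is a local check only, not a LabKey feature.

```python
from collections import defaultdict


def conflicts_under(rows: list[dict], proposed_key: tuple[str, ...]) -> dict[tuple, list[dict]]:
    """Group rows by the proposed key and keep only groups with more than one row.

    Any group returned here would violate the unique constraint after re-keying.
    """
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in proposed_key)].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Three-key data: one row per virus strain tested for each participant/visit.
rows = [
    {"ParticipantId": "P1", "SequenceNum": 1, "VirusId": "V1", "Titer": 40},
    {"ParticipantId": "P1", "SequenceNum": 1, "VirusId": "V2", "Titer": 160},
    {"ParticipantId": "P2", "SequenceNum": 1, "VirusId": "V1", "Titer": 20},
]
print(list(conflicts_under(rows, ("ParticipantId", "SequenceNum", "VirusId"))))  # []
print(list(conflicts_under(rows, ("ParticipantId",))))  # [('P1',)]
```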
Using Time as an Additional Key
In a date-based or continuous study, an additional third key option is to use the Time (from Date/Time) portion of a datestamp field. In cases where multiple measurements happen on a given day or visit (tracking primate weight, for example), the time portion of the date field can be used as an additional key without requiring duplication of this information in an additional column.
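The effect of using the time portion as a third key can be sketched in Python: two weights recorded on the same day share a participant and date but differ in time of day, so keying on (participant, date, time) keeps them distinct, while two readings with an identical timestamp would still collide. The column names and values below are hypothetical, and the sketch models only the uniqueness rule, not the server's behavior.

```python
from collections import Counter
from datetime import datetime


def time_keyed(rows: list[dict]) -> list[tuple]:
    """Build (participant, date, time-of-day) keys from a single datetime column."""
    return [(r["ParticipantId"], r["Date"].date(), r["Date"].time()) for r in rows]


def has_duplicates(keys: list[tuple]) -> bool:
    """Return True if any key occurs more than once."""
    return any(n > 1 for n in Counter(keys).values())


rows = [
    {"ParticipantId": "P1", "Date": datetime(2024, 3, 5, 8, 30), "Weight": 6.1},
    {"ParticipantId": "P1", "Date": datetime(2024, 3, 5, 16, 45), "Weight": 6.3},
]
day_only = [(p, d) for p, d, _ in time_keyed(rows)]
print(has_duplicates(day_only))          # True: same participant and day
print(has_duplicates(time_keyed(rows)))  # False: the time of day separates them
```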
Miscellaneous options are offered if you click Advanced Settings for dataset properties:
- Show dataset in overview: Checked by default; uncheck if you don't want to include this dataset in the overview grid.
- DatasetID: By default, dataset IDs are automatically assigned. If you want to specify your own, use this option. All datasets must have a unique dataset ID.
- Cohort Association: If this dataset should be associated with a specific cohort only, select it here; otherwise, the dataset is associated with all cohorts.
- Tag: Use tags to provide additional ways to categorize your datasets.
Edit Dataset Fields
Below the dataset properties, you will see the set of fields, or columns, in your dataset. To edit fields and their properties, follow the instructions in this topic: Field Editor.