For bulk imports, you can select Update rows to update existing data rows during import. By default, any rows that do not match existing rows will cause the update to fail.
Check the box to Allow new rows during update to merge both existing and new rows.
If Update rows and Allow new rows during update are both selected, rows that match existing data are updated and rows that do not match are inserted as new rows.
Subscribers to premium editions of LabKey Server can learn about adding a Bulk Edit button with the example code in this topic. Note that SequenceNum values cannot be bulk edited, as they are used to align subject and visit uniqueness for datasets.
To define the location from which the pipeline will process files, follow the instructions in this topic: Set a Pipeline Override. You may use the standard pipeline root, or a pipeline override lets you load files from a location of your choosing.
A pipeline configuration file controls the operation of the pipeline job. For dataset archives, the configuration file is named with the .dataset extension and contains a set of property/value pairs.
The configuration file specifies how the data should be handled on import. For example, you can indicate whether existing data should be replaced, deleted, or appended to. You can also specify how to map data files to datasets using file names or a file pattern. The pipeline will then handle importing the data into the appropriate dataset(s).
Each line of the configuration file contains one property/value pair, where the string to the left of the '=' is the property and the string to the right is the value. The first part of the property name is the id of the dataset to import; in the example below, the dataset id is '1'. The dataset id is always an integer.
The remainder of the property name is used to configure some aspect of the import operation. Each valid property is described in the following section.
The following example shows a simple .dataset file:
1.action=REPLACE
1.deleteAfterImport=FALSE
# map a source tsv column (right side) to a property name or full propertyURI (left)
1.property.ParticipantId=ptid
1.property.SiteId=siteid
1.property.VisitId=visit
1.property.Created=dfcreate
In addition to defining per-dataset properties, you can use the .dataset file to configure default property settings. Use the "default" keyword in place of the dataset id. For example:
default.property.SiteId=siteid
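Assuming that dataset-specific properties take precedence over defaults, a hypothetical configuration could combine a shared mapping with a per-dataset override:
# shared mapping applied to every dataset
default.property.ParticipantId=ptid
default.action=APPEND
# dataset 2 overrides the default action (file name is illustrative)
2.file=labresults.tsv
2.action=REPLACE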
Also, the "participant" keyword can be used to import a .tsv file into the participant table, using a syntax similar to the dataset syntax. For example:
participant.file=005.tsv
participant.property.SiteId=siteId
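The 005.tsv file referenced above might then look like the following hypothetical sample, where the siteId column is mapped to the SiteId property by the line above (columns are tab-separated, and all values are illustrative):
ParticipantId	siteId
P001	101
P002	102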
The properties and their valid values are described below.
action
This property determines what happens to existing data when the new data is imported. The valid values are REPLACE, APPEND, and DELETE. REPLACE, the default, first deletes all the existing data and then imports the new data. APPEND leaves the existing data in place and appends the new data; as always, you must be careful to avoid importing duplicate rows (action=MERGE would be helpful, but is not yet supported). DELETE deletes the existing data without importing any new data.
enrollment.action=REPLACE
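For example, to append new rows to the existing enrollment data instead:
enrollment.action=APPEND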
deleteAfterImport
This property specifies that the source .tsv file should be deleted after the data is successfully imported. The valid values are TRUE or FALSE. The default is FALSE.
enrollment.deleteAfterImport=TRUE
file
This property specifies the name of the tsv (tab-separated values) file which contains the data for the named dataset. This property does not apply to the default dataset. In this example, the file enrollment.tsv contains the data to be imported into the enrollment dataset.
enrollment.file=enrollment.tsv
filePattern
This property applies to the default dataset only. If your dataset files are named consistently, you can use this property to specify how to match each file with the appropriate dataset. For instance, assume your data is stored in files with names like plate###.tsv, where ### corresponds to the appropriate DatasetId. In this case you could use the file pattern "plate(\d\d\d).tsv". Files will then be matched against this pattern, so you do not need to configure the source file for each dataset individually. Similarly, if your files are named like dataset###.tsv, where ### corresponds to the dataset name, you can use the file pattern "dataset(\w*).tsv".
default.filePattern=plate(\d\d\d).tsv
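For the dataset-name variant described above, the corresponding configuration line would be:
default.filePattern=dataset(\w*).tsv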
property
If the column names in the .tsv data file do not match the dataset property names, the "property" property can be used to map columns in the .tsv file to dataset properties. This mapping works for both user-defined and built-in properties. Assume that the ParticipantId value should be loaded from the column labeled ptid in the data file. The following line specifies this mapping:
enrollment.property.ParticipantId=ptid
Note that each dataset property may be specified only once on the left side of the equals sign, and each .tsv file column may be specified only once on the right.
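To illustrate, a hypothetical enrollment.tsv matching the mapping above could use the ptid column header; on import, its values are loaded into the dataset's ParticipantId property (columns are tab-separated, and the startdate column is illustrative only):
ptid	startdate
P001	2017-03-01
P002	2017-03-15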
sitelookup
This property applies to the participant dataset only. When importing the participant dataset, the user typically will not know the LabKey internal code of each site, so one of the other unique columns from the sites table must be used instead. The sitelookup property indicates which column is being used. For instance, to specify a site by name, use participant.sitelookup=label. The possible columns are label, rowid, ldmslabcode, labwarelabcode, and labuploadcode. Note that internal users may use scharpid as well, though that column name may not be supported indefinitely.
The virtual participant dataset is used as a way to import site information associated with a participant. This dataset has three columns in it: ParticipantId, EnrollmentSiteId, and CurrentSiteId. ParticipantId is required, while EnrollmentSiteId and CurrentSiteId are both optional.
As described above, you can use the sitelookup property to import a value from one of the other columns in this table. If any of the imported values are ambiguous, the import will fail.
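Putting these pieces together, a hypothetical participant configuration that resolves sites by their label might look like this (the .tsv column names on the right are illustrative):
participant.file=005.tsv
participant.sitelookup=label
participant.property.ParticipantId=ptid
participant.property.EnrollmentSiteId=enrollment_site
participant.property.CurrentSiteId=current_site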