Automated dataset update

General Server Forum (Inactive)
Mya Warren  2017-10-26 16:50
Status: Closed
 
I am evaluating LabKey for sharing clinical data in my institution. I am using LabKey 17.2-52553. Our clinical data is managed by external databases, and I'm wondering what the best way is to periodically sync LabKey with this data. I thought I would use the manage reloading feature described here:

https://www.labkey.org/Documentation/Archive/17.2/wiki-page.view?name=importExportStudy#reload

My idea was to maintain a study directory on my local file system. I would write a script to periodically update the data in the study/datasets/dataset*.tsv files. I would then compress the directory and upload the archive to the LabKey server for reload. I have many questions about this. For instance, where is the pipeline root, and how do I upload data there?

As a first step, I thought I would explore the study export/import/reload functionality (https://www.labkey.org/Documentation/Archive/17.2/wiki-page.view?name=importExportStudy&_docid=wiki%3Ab63f65f4-4c8a-1035-a0fc-fe851e084424). I tried to just export and then import the same study to a new directory on LabKey, and I'm already running into trouble. If I upload the zip archive (from local source) exactly as it was exported, then everything works fine. If I extract the archive, then rezip it, then I get this error when I try to import it:

"This archive doesn't contain a folder.xml or study.xml file."

I thought this might be because I am using a Mac; however, I get the same error when I rezip in Windows.

Questions:
- Are there additional options I need to use with zip to create a file with the correct format for loading to LabKey?
- Does this method of syncing my data make sense (or is there a better way)?
- Where do I store the archive for automatic reloading (the pipeline root)?
 
 
Jason Leadley responded:  2017-10-30 04:30
Hello Mya

Many thanks for contacting us; we appreciate your interest in and use of LabKey Server. I will ask my support colleagues to take a look at your questions here and reply accordingly. In the meantime, may I ask for your direct e-mail address, please?

Many thanks

Jason
Director, EMEA Operations
LabKey
 
Mya Warren responded:  2017-10-30 08:48
Thanks Jason,
My email is mwarren@bcgsc.ca.
Cheers,
Mya
 
jeckels responded:  2017-10-30 18:15
Hi Mya,

1. Study reloads key off a study.xml or folder.xml file in your ZIP archive (or just as files directly on the file system). You'll also need datasets_manifest.xml and datasets_metadata.xml files. You can find some information on these files here:

https://www.labkey.org/Documentation/wiki-page.view?name=studySerializationFormats

A good set of examples would be to look at the ImportableStudyExport.folder.zip file here:

https://www.labkey.org/Documentation/wiki-page.view?name=setupDemoStudy

The study subdirectory's study.xml file is key, as is the content of the study/datasets directory. The other XML files should be optional, as is the Interactive.dataset file in the study/datasets directory.

2. Yes, this is a reasonable way to handle periodic refreshes, and a number of groups do exactly that. Another approach would be to set up ETLs to pull data more directly from the database (and skip the TSV file generation), which would be a better approach if you wish to have more frequent syncs (more than a couple of times a day), though it would require setting up a module to define the ETLs.

https://www.labkey.org/Documentation/wiki-page.view?name=etlModule

3. You can see the actual path on the web server if you go to Admin->Site->Admin Console->Files and expand the folder you're wanting to import into. It should be shown at the @pipeline child node, and is likely identical to the @files node.
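
For the TSV-based refresh in point 2, the file-regeneration step could be sketched in Python roughly as follows. This is only an illustration under assumptions: the column names, the file name dataset5001.tsv, and the fetch_rows function (a stand-in for a query against the external database) are all hypothetical. Only the study/datasets layout comes from this thread.

```python
import csv
import os
from pathlib import Path

# Hypothetical column names; use whatever your exported dataset TSV already has.
COLUMNS = ["ParticipantId", "date", "Weight"]


def fetch_rows():
    """Hypothetical stand-in for a query against the external clinical database."""
    return [
        {"ParticipantId": "P001", "date": "2017-10-01", "Weight": "70.5"},
        {"ParticipantId": "P002", "date": "2017-10-02", "Weight": "82.1"},
    ]


def write_dataset_tsv(study_dir, file_name, columns, rows):
    """Rewrite one dataset file under <study_dir>/study/datasets as tab-separated text."""
    target = Path(study_dir) / "study" / "datasets" / file_name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a sibling temporary file, then swap it in, so that nothing
    # ever reads a half-written TSV.
    tmp = target.with_name(target.name + ".tmp")
    with open(tmp, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=columns, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp, target)
    return target
```

A scheduled job (cron, Task Scheduler) could call write_dataset_tsv once per dataset before the directory is compressed for reload. Keeping the header row identical to the exported file is the safest choice, since the XML metadata in the archive describes those columns.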

Thanks,
Josh
 
Mya Warren responded:  2017-10-31 15:09
Hi Josh,

Thanks for your insights, I'll definitely look into this documentation. However, I'm still confused. The zip file I tried to upload actually did have study.xml and folder.xml files. This was an archive that I directly downloaded from a study directory in LabKey. If I uploaded the zip file exactly as I downloaded it, the import worked fine. If I instead unzipped it then rezipped it with no other modification, the import did not recognize these files. Have you ever heard of this problem before?

Mya
 
jeckels responded:  2017-10-31 15:51
Hi Mya,

Sorry I missed that part of your initial posting. I have not seen that problem before. Can you share a copy of the modified ZIP file? I wonder if the directory structure might have been accidentally flattened or similar (I've had that happen in the past when trying to modify an existing archive).

Thanks,
Josh
 
Mya Warren responded:  2017-11-02 09:07
That's going to be complicated, for data privacy reasons. I'll look into it and get back to you.
 
jeckels responded:  2017-11-02 09:37
One option might be to exclude all of the TSV files with the actual data, and just include the XML files that give the metadata. That would likely be sufficient to see why the import isn't initializing correctly.
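
As a rough illustration of that suggestion, a metadata-only copy of an archive could be produced with Python's standard zipfile module. The function name and the file names in the usage note are hypothetical, and the sketch assumes that every data file ends in .tsv.

```python
import zipfile


def strip_tsv_files(src_zip, dest_zip):
    """Copy src_zip to dest_zip, skipping every .tsv entry; return the names kept."""
    kept = []
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as dest:
        for info in src.infolist():
            if info.filename.lower().endswith(".tsv"):
                continue  # data file: leave it out of the shared copy
            # Reusing the original entry keeps its path inside the archive,
            # so the directory structure is exactly what the import would see.
            dest.writestr(info, src.read(info.filename))
            kept.append(info.filename)
    return kept
```

For example, strip_tsv_files("Clinical.folder.zip", "Clinical_metadata_only.folder.zip") would return the list of entries that were kept, which is worth reading through before sharing the file.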

Thanks,
Josh
 
Mya Warren responded:  2017-11-02 10:20
Done! I've attached two files:
(1) the original study folder where I excluded the dataset data and assay data from the export (Admin -> Folder Management -> Export), and
(2) the folder that I unzipped and then immediately rezipped, with no other changes.
Any input would be greatly appreciated.
 
jeckels responded:  2017-11-02 16:59
I was able to import the archive without errors. Here's what I tried:

Admin->Folder->Management
Select Import tab
Click to browse to the .folder.zip file you shared
Click the Import Folder button

That launched the import job, which ran to completion. Are you following different steps?

If, in the same folder, you go to Admin->Go to Module->FileContent, do you see an "unzip" directory? If so, what does it contain?

Thanks,
Josh
 
Mya Warren responded:  2017-11-06 14:47
Hi Josh,
I had actually attached two files, and only one actually uploaded for some reason (Number 1 from my description above). I can also load that one. The one that I'm having trouble with is now attached! Can you try this one?
Mya
 
jeckels responded:  2017-11-06 15:00
Hi Mya,

Thanks for the file. The issue is that the import code expects folder.xml to be in the root of the ZIP file. In the file you shared, everything is under a Clinical_2017-11-02_09-57-52.folder directory within the ZIP. If you re-compress with a different root, you should be all set.
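
If the re-compression is scripted, one way to avoid the extra wrapping directory is to store each file under its path relative to the export folder. The following is a minimal Python sketch, not LabKey code: both function names are hypothetical, and the check simply mirrors the wording of the error message (folder.xml or study.xml at the top level).

```python
import zipfile
from pathlib import Path


def zip_folder_contents(folder_dir, zip_path):
    """Zip the contents of folder_dir so its files sit at the root of the archive.

    zip_path should live outside folder_dir, or the archive would include itself.
    """
    folder_dir = Path(folder_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to folder_dir, not to its parent, so the
                # wrapping directory name never appears inside the archive.
                archive.write(path, path.relative_to(folder_dir).as_posix())


def has_root_descriptor(zip_path):
    """Return True if folder.xml or study.xml is at the top level of the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        names = set(archive.namelist())
    return "folder.xml" in names or "study.xml" in names
```

Running has_root_descriptor on an archive before uploading it gives a quick local check for this particular problem. On the command line, the equivalent is to run the zip tool from inside the export directory rather than on the directory itself.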

Thanks,
Josh
 
Mya Warren responded:  2017-11-06 15:03
Thank you! I feel ridiculous. I'll try that.
 
sadams7703 responded:  2019-05-03 16:09
Status: Active
Hi,
I have a related question that I don't think is quite addressed, but maybe I missed it.

I would like to use study "Reloading" as a manual way to update a study, as described above.
Specifically I would plan to:
1. Create the study manually in LabKey.
2. Export the study.
3. Change, save, replace the data files in the datasets folder, and zip up.
4. Reload the study from LabKey.
So, not fully automated, but easy enough.

However, if I understand correctly and based on experimenting, this only seems to work if no new data TSV files are added.
I thought from some of the documentation that LabKey would see "new" TSV files and infer from them. But, they seem to be ignored.
Similarly, new variables added to an existing TSV file seem to be ignored.
This is a little awkward because it is not too uncommon for new files or variables to be added.
So, do I in fact also need to edit the XML files datasets_metadata and datasets_manifest to make this work?
Is there an easier way, or some setting that will make LabKey look for new files/variables?

Thanks,
Scott
 
sadams7703 responded:  2019-05-03 17:06
Hi,

Can you also tell me why one of these works (the one with the "a" added on), but the other does not? The one without the "a" was created from within Stata using a utility to zip files:

https://www.stata.com/help.cgi?zipfile

Otherwise it seems identical to me. Yet I get an error when I try to reload it as a study.

Created: 2019-05-03 17:08
Modified: 2019-05-03 17:08
Email: sadams@fredhutch.org
Status: ERROR
Info: Unable to get an instance of StudyDocument from study.xml
File Path: /labkey/labkey/files/Sandlot/Auto2/@files/folder_load_2019-05-03_17-07-59.log

Thanks,
Scott
 
Jon (LabKey DevOps) responded:  2019-05-17 15:28
Status: Closed
Hi Scott,

In the future, please submit all new questions as a new post rather than tacking them onto existing ones.

Regarding your questions, you can technically edit your zip file, but you would have to make full edits, including to the XML files that map to the various TSV files and set up the datasets, and you would also need to do your best to maintain the same format and structure. This isn't something we support, so edit at your own risk.

Regards,

Jon