Integrating with an external service

gkericks  2017-02-06 10:18
Status: Closed
 
Hi,

We are planning to use LabKey Server to meet the reporting needs of several of our researchers. We currently use an external service to collect and store our clients' data. This service keeps the data in a relational database and provides an API for accessing it. I am trying to figure out the best way to automate the integration between LabKey and this data store, and would like advice on which LabKey feature is best suited to the job. I can write a script to extract the data from the external service fairly easily; the API is available for most major programming languages, so the choice of scripting language is not that important (with the notable exception of SQL).

What I would ideally like to do is use an existing LabKey feature to run the script on a schedule and load its output into a LabKey dataset. I've narrowed it down to a few LabKey features that might help, but I'm not sure which is best suited to the task:

 1) ETL module: The ETL module seems to capture the essence of what I want, except that the "Extract" step appears to be limited to SQL queries, and I would need to invoke a script in a different language (R, Python, Java, etc.).
 2) Develop a custom module: I am aware that LabKey lets developers build custom modules for particular purposes. This seems powerful, but I feel I would be reinventing the wheel if I took this approach, and would like to know what already exists that I am missing.
 3) LabKey pipelines: This is a way to run a series of scripts. Does anyone know whether pipelines can be used to load data from external sources on a particular schedule?
 4) External schemas (https://www.labkey.org/home/Documentation/wiki-page.view?name=externalSchemas): This appears to require direct access to the relational database that our external service uses to store its data. It might be possible for me to expose those tables to LabKey, although for reasons that should be obvious I would like to avoid doing that.

Finally, it is within my capability to set up a service independent of LabKey that pushes data to LabKey via its API. I would prefer to use a LabKey feature if possible (mostly to keep LabKey-related code internal to the LabKey server), but I will go this route if nothing within LabKey fits my use case.
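For concreteness, a minimal sketch of that fallback using the LabKey Java remote client API might look like the following (the server URL, credentials, folder path, and "Demographics" dataset are placeholders; the row values would really come from the external service's API):

import java.util.HashMap;
import java.util.Map;

import org.labkey.remoteapi.Connection;
import org.labkey.remoteapi.query.InsertRowsCommand;
import org.labkey.remoteapi.query.SaveRowsResponse;

public class PushToLabKey
{
    public static void main(String[] args) throws Exception
    {
        // Placeholder server and credentials
        Connection cn = new Connection("https://labkey.example.org", "user@example.org", "password");

        // Target a study dataset; "Demographics" is a placeholder name
        InsertRowsCommand cmd = new InsertRowsCommand("study", "Demographics");

        // In practice this row would be built from the external service's API response
        Map<String, Object> row = new HashMap<>();
        row.put("ParticipantId", "P001");
        row.put("Date", "2017-02-06");
        cmd.addRow(row);

        // "/MyProject/MyStudy" is a placeholder folder path
        SaveRowsResponse resp = cmd.execute(cn, "/MyProject/MyStudy");
        System.out.println("Inserted " + resp.getRowsAffected() + " row(s)");
    }
}

The missing piece is exactly the scheduling, which is why I would rather keep this inside LabKey.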

Thank you, and feel free to ask questions if anything I wrote is unclear.
 
 
jeckels responded:  2017-02-10 17:30
Hello,

Your list looks pretty complete to me in terms of possible approaches. Some additional context that may be helpful:

1. ETLs are run as LabKey pipelines under the covers, and you can inject custom tasks, implemented in a custom Java module, into their execution sequence. If you have the LabKey Server source code, you can find a test/example at:

server\test\modules\ETLtest\resources\ETLs\appendAndTaskRefTask.xml

It uses the custom task as implemented here:

server\modules\dataintegration\src\org\labkey\di\steps\TestTaskRefTask.java

However, you're correct that there is not currently a way to pipe in data from a non-SQL source. ETLs do have the convenient ability to run on a schedule.
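The schedule is declared directly in the ETL's XML definition. Roughly (exact element names are per the ETL docs and may vary by version; the source and destination queries below are placeholders), an hourly ETL looks like:

<etl xmlns="http://labkey.org/etl/xml">
  <name>ExternalServiceSync</name>
  <description>Append rows from a staging query into a study dataset</description>
  <transforms>
    <transform id="step1" type="org.labkey.di.pipeline.TransformTask">
      <source schemaName="externalStaging" queryName="source" />
      <destination schemaName="study" queryName="Demographics" />
    </transform>
  </transforms>
  <!-- Poll-style schedule; a cron expression can be used instead -->
  <schedule>
    <poll interval="1h" />
  </schedule>
</etl>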

3. You could certainly use a script or a custom Java task that pulls from an external source, but the pipeline module itself does not currently have a scheduling mechanism.

4. Yes, this would be possible, but it would of course bypass the API that you're looking to leverage for extracting the data.

The server does have the Quartz library behind the scenes, which can be used for cron-like scheduling in custom modules. It's what the ETL system uses underneath to kick off its jobs on the requested schedule.
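As a sketch of what Quartz scheduling looks like from a custom module (the job class, group names, and cron expression below are placeholders, and the module startup wiring is omitted):

import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class ExternalSyncScheduler
{
    // Placeholder job: the pull from the external service would go here
    public static class SyncJob implements Job
    {
        @Override
        public void execute(JobExecutionContext context)
        {
            // Call the external service's API and insert the results into LabKey
        }
    }

    public static void schedule() throws Exception
    {
        JobDetail job = JobBuilder.newJob(SyncJob.class)
                .withIdentity("externalSync", "myModule")
                .build();

        // Fire at 2:00 AM every day; any Quartz cron expression works here
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("externalSyncTrigger", "myModule")
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
                .build();

        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}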

Thanks,
Josh