ETLs Lind-Thomsen  2016-01-05 05:22
Status: Closed
 
Hi

I'm working with ETL's and have some questions. They spring from the fact that we have a collection of source tables on an external database server, which needs some basic cleaning/transforming before loading them into labkey. e.g. we need to change some',' and '.' to be able to convert a column to floating point.

1: If I use a stored procedure server to populate a source table. How do I make sure that the stored procedure is executed before loading the data into the destination?

2: Is it possible to use a stored procedure which return a table/select query as a source in the ETL step so that it is loaded directly into the destination?

3: Is there a better way to do the transform step than in stored procedures when using Labkey?

4: How does <incrementalFilter> work?

I couldn't find answers to these questions in the documentation, but if I have overlooked something please point me in the right direction.

Best wishes
Allan
 
 
Jon (LabKey DevOps) responded:  2016-01-05 12:41
Hi Allan,

I've supplied the answers to each of your questions along with the link to the resource.

Question 1: If I use a stored procedure server to populate a source table. How do I make sure that the stored procedure is executed before loading the data into the destination?

Answer: The XML code works in a linear fashion. So just like the example mentioned here under "Append with Two Targets" (https://www.labkey.org/home/Documentation/wiki-page.view?name=etlSamples), the first transform Id listed called "step 1" will run first, then the next transform Id called "step 2" will run. If you are going to use a stored procedure and you want that to run before the data loads into the target from the source, you will want to list the stored procedure first in your ETL XML file under the transforms section and then subsequently list the targeting. You can even have the ETL module determine whether to actually run the ETL or not based on whether the stored procedure indicates if there is work to be done - https://www.labkey.org/home/Documentation/wiki-page.view?name=etlstoredproccheck


Question 2: Is it possible to use a stored procedure which return a table/select query as a source in the ETL step so that it is loaded directly into the destination?

Answer: Yes, a stored procedure can be used to move data from the source to the target as well. If you look at the example we provided here, the ETL XML is using a stored procedure to populate a Patients table. https://www.labkey.org/home/Documentation/wiki-page.view?name=etlsproc


Question 3: Is there a better way to do the transform step than in stored procedures when using Labkey?

Answer: Stored procedures are the only method to do the transformative work and puts the defining "T" in ETL. If transformative work needs to be done otherwise, the work would have to be done outside of the ETL module by editing the source or the destination directly.



Question 4: How does <incrementalFilter> work?

Answer: It is a filter strategy that allows the ETL to identify new rows in the source database. The strategy allows a special value on the destination table to be compared to the source and only pulls over newer rows based on that value. You only have three options for filters. Either have no filtering (SelectAllFilterStrategy), filtering based on an increasing integer column (RunFilterStrategy), or use a date/timestamp to identify the new records (ModifiedSinceFilterStrategy). Using this filter allows you to use the append option to add new rows to your target table and not accidentally run into any conflicts (i.e. adding duplicate records). - https://www.labkey.org/home/Documentation/wiki-page.view?name=etlSchedule

Regards,

Jon
 
Lind-Thomsen responded:  2016-01-05 22:33
Dear Jon

Just want to say thanks for some very good answers, it was most appreciated.

Best wishes
Allan