The ETL allows you to have a RemoteQuery source, such as a dataset on another server. If the source table is large and relatively stagnant, doing an incremental merge (based on the modified field) is in theory nice. However, I need to deal with the potential of deletes from the source table. In our case, the source has auditing, so I assume I could use the audit tables to try to figure that out (although queries against this are painfully slow). Is there an example of an existing ETL that selects from datasets, dealing with deletes?
Thanks,
Ben
|
|
Jon (LabKey DevOps) responded: |
2018-04-11 15:50 |
|
|
Ben Bimber responded: |
2018-04-11 15:59 |
So incrementalFilter has deletedRowsSource, which I assume is the select that identifies a list of PKs to remove; however, that needs to come from somewhere. Auditing is really the only can I can think to come up with this list of IDs. Because we're pulling from a fairly standard LK data source (datasets), I thought I'd see if any groups have experience with how to approach this. |
|
Jon (LabKey DevOps) responded: |
2018-04-20 12:13 |
Hi Ben,
You're right, https://www.labkey.org/Documentation/wiki-page.view?name=etlSchedule#deletedRowsSource would work in this case, but this typically is used on external sources and a database trigger would populate a deletedRowsSource (like an audit table) on the deletes.
You could use a query over the LabKey audit table like you're thinking, but it would be slow as you've seen. Indexing could possibly help speed things up here. But another alternative would be to create an afterDelete js trigger script on the dataset to populate a second (custom) table as the deletedRowsSource.
Regards,
Jon |
|
|
|