Lists disappearing?? Serious issue.

LabKey Support Forum (Inactive)
Ben Bimber  2014-08-27 07:59
Status: Closed
 
We have had something happen twice now. The first time I chalked it up to me being crazy, but this one seems real. We have a regular folder with about 4 lists. The data are not especially active, although this shouldn't matter. Once a couple of weeks ago, and again this morning, the lists and their DB tables were just gone. When I go through the UI, that container also doesn't report having any lists (as opposed to showing the list, but not finding the physical table).

The only reason I even noticed this is because I have a custom piece of Java code that creates an admin summary for me each day, SiteSummaryNotication.getListSummary(). This code calls DbSchema.get("list"), and then iterates over each table to get the row count. The odd thing is that getTableNames() still contains the names of these deleted tables (caching?). When it hits c13d367_grants (a deleted table), it blows up when it tries to run SQL because that physical table is no longer there. If this didn't happen I probably would never have known they were gone. Therefore whatever happened occurred between 5AM yesterday and 5AM today.
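A minimal sketch of how a daily summary like the one described above could tolerate a stale table-name list instead of failing on the first missing table. This is not the LabKey API: it uses Python's built-in sqlite3 as a stand-in database, and the function names, the second table name (c13d367_other), and the "stale snapshot" list are all hypothetical, chosen only to mimic a cached getTableNames() result that still contains a dropped table.

```python
import sqlite3


def summarize_row_counts(conn, table_names):
    """Return ({table: row_count}, [missing_tables]) without aborting on a missing table."""
    counts, missing = {}, []
    for name in table_names:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"{}"'.format(name.replace('"', '""'))
        try:
            row = conn.execute("SELECT COUNT(*) FROM " + quoted).fetchone()
            counts[name] = row[0]
        except sqlite3.OperationalError:
            # The name is still in the (possibly cached) list, but the
            # physical table is gone: record it and keep going.
            missing.append(name)
    return counts, missing


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE c13d367_grants (id INTEGER)")
    conn.execute("CREATE TABLE c13d367_other (id INTEGER)")  # hypothetical second list
    conn.executemany("INSERT INTO c13d367_other VALUES (?)", [(1,), (2,)])
    # Snapshot of names taken before the drop, standing in for a stale cache.
    stale_names = ["c13d367_grants", "c13d367_other"]
    conn.execute("DROP TABLE c13d367_grants")
    return summarize_row_counts(conn, stale_names)
```

Reporting the missing tables separately keeps the rest of the summary intact while still surfacing the mismatch, which in this case was the only signal that the lists had been deleted.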

The odd thing is that this happened to the same container both times. The users with access to this container are not all that savvy, but I am going to talk to them. This part makes no sense to me.

My question is: what steps can I take to troubleshoot? What audit entries and/or system logs will help figure out who deleted those lists, and why? I am looking through the audit tables, but do not see any mention of list deletes, or any real activity around them.

Second, are there other DB queries I can run to see whether pieces of the lists exist somewhere in the DB (ontology manager, for example)? Because the delete of the lists seems so clean and complete, this suggests it happened via LabKey, rather than something dropping the physical table only.

The server is onprc14.2 / sqlserver / RHEL. Thanks for any help.
 
 
Dax responded:  2014-08-27 10:55
Hi Ben,

As a starting point, look in your Tomcat access logs and see who accessed the server during the time when the list was deleted. Look for any list activity, i.e. hits on /list/ URLs (particularly deleteListDefinition actions), or perhaps even a failed import.
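The access-log search suggested above could be sketched roughly as follows. This assumes the log uses Tomcat's "common" access-log pattern (host, user, bracketed timestamp, quoted request line, status); the actual pattern on a given server may differ. The sample lines, user names, folder path, and UTC offset are invented for illustration and are not taken from the logs discussed in this thread. The time window matches the "between 5AM yesterday and 5AM today" estimate from the original post.

```python
import re
from datetime import datetime

# host ident user [timestamp] "METHOD URL PROTOCOL" status ...
LINE_RE = re.compile(r'^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3})')
STAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def list_hits(lines, start, end):
    """Return (time, user, method, url, status, is_delete) for /list/ hits in [start, end)."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not match the assumed pattern
        _host, user, stamp, method, url, status = m.groups()
        when = datetime.strptime(stamp, STAMP_FORMAT)
        lowered = url.lower()
        if start <= when < end and "/list/" in lowered:
            is_delete = "deletelistdefinition" in lowered
            hits.append((when, user, method, url, status, is_delete))
    return hits


# Made-up sample data, for illustration only.
SAMPLE = [
    '10.0.0.5 - alice [26/Aug/2014:13:04:58 -0700] "POST /labkey/list/Lab/Grants/deleteListDefinition.view?listId=3 HTTP/1.1" 302 -',
    '10.0.0.5 - alice [26/Aug/2014:13:05:10 -0700] "GET /labkey/project/Lab/Grants/begin.view HTTP/1.1" 200 5120',
    '10.0.0.9 - bob [25/Aug/2014:09:00:00 -0700] "GET /labkey/list/Lab/Grants/grid.view?listId=3 HTTP/1.1" 200 2048',
]


def demo():
    start = datetime.strptime("26/Aug/2014:05:00:00 -0700", STAMP_FORMAT)
    end = datetime.strptime("27/Aug/2014:05:00:00 -0700", STAMP_FORMAT)
    return list_hits(SAMPLE, start, end)
```

Matching on the URL rather than only on the action name also catches other list activity in the window, such as imports or grid views, which may help reconstruct what the user was doing at the time.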
 
adam responded:  2014-08-27 11:03
Also, since you suspect caching may be involved, I would "clear caches" and see if getTableNames() still returns those table names.
 
Ben Bimber responded:  2014-08-27 11:11
Hi Dax,

Thanks. Since the post I learned some more, but still can't explain this:

- The domain audit log has a record of this ('The domain Grants was deleted'). It tells me the user and time of day. That person did not consciously do this (thus far I believe them); however, it's tough to figure out how one does this accidentally. The audit event is logged at 2014-08-26 13:05. I'm somewhat glad to at least see this is logged; however, I'm still unsure how it originated.

I looked at the access log for this time period, but nothing stands out to me. List-related stuff was the first thing I grep'd for, but I didn't see anything at all. I can email the log if anyone wants to take a look.

Adam - I'm trying the cache idea now. It's not so much that I suspect caching as the root cause. I only noticed this problem because the DbSchema table-name cache still listed the deleted table, which caused my notification to attempt to run SQL against it.
 
adam responded:  2014-08-27 11:15
I agree that caching is likely not the underlying problem, but it's a puzzling symptom that could imply something about the code path that deleted this domain.

Adam
 
Ben Bimber responded:  2014-08-27 11:16
yep.

I'm posting the tomcat access logs in the secure forum, here:

https://www.labkey.org/issues/ONPRC/Support%20Tickets/details.view?issueId=21395

in case it helps. nothing is standing out to me though.
 
Ben Bimber responded:  2014-08-27 11:20
clearing the caches did allow the notification to run correctly, fwiw.
 
jeckels responded:  2014-08-27 16:55
Ben and I identified the underlying issue:

https://www.labkey.org/issues/home/Developer/issues/details.view?issueId=21400

Thanks,
Josh