Matthew Bellew responded: |
2011-11-08 11:12 |
Just for sanity I'd break it down into two problems
1) find assignments that overlaps the period of interest
2) find assignments that overlap each other
NOTE
1) (A,B) overlaps (C,D) can be accomplished with the test to B>C AND A<D
2) on postgres only labkey supports the method "overlaps(a,b,c,d)" which translates into the pgsql operator "overlaps"
Query: AssignmentsInPeriod
SELECT * FROM assignments a
WHERE a.startdate < EndDate AND a.enddate > StartDate
Query: OverlappingAssignments
SELECT ...
FROM assignments a1 INNER JOIN assignments a2 ON (a1.id=a2.id and a1.project < a2.project)
WHERE a1.start < a2.end AND a1.end > a2.start
That's pseudo code of course, but I hope that helps.
see also the use of query parameters in
https://www.labkey.org/wiki/home/Documentation/page.view?name=labkeySql |
|
Matthew Bellew responded: |
2011-11-08 11:13 |
Of course, the second query should be "FROM AssignmentsInPeriod a1..." |
|
Ben Bimber responded: |
2011-11-08 11:13 |
hi daniel,
in general, handling date overlaps in EHR queries is a pain. you could also consider looking at the EHR query study.housingRoommates, which does something conceptually similar to what you're talking about (housing overlaps instead of assignments).
that query finds housing overlaps and calculates the duration of the overlap.
study.PerDiems and study.PerDiemsByDay might also be good templates, as these sum the total overlapping assignments over a date range. in fact these might work straight away for your purpose. |
|
Daniel Nicolalde responded: |
2011-11-08 13:40 |
Hi Ben and Matthew,
Thank you for the quick responses, I look at both examples you mentioned and there is not a problem with calculating the overlap between intervals, the problem is with projects that started earlier than the cut of date and continue after the day.
The approach that Matthew recommends seams interesting, the question that I have is how to create the temporary table that will exclude any assignment outside the StartDate and EndDate. I created a table PreStudyOverlap:
PARAMETERS (StartDate TIMESTAMP, EndDate TIMESTAMP)
SELECT *
FROM Assignment h
WHERE h.date < CAST(StartDate as DATE) AND h.enddate > StartDate
and I call it from my main Query to do the inner join, the question that I have is how the parameters as pass between tables? that approach seam to work. I try including it in the main query and did not gave me the same results. I still need to hash out some day calculation but it is working. |
|
Ben Bimber responded: |
2011-11-08 14:04 |
Hi Daniel,
I believe Matt is suggesting write 2 different SQL statements, for the purpose of keeping it cleaner. It doesnt actually make a temp table, which isnt possible in labkey-sql. You can refer to one statement from the other.
If the problem is finding assignments overlapping a date range, look at study.AssignmentOverlaps, which has StartDate/EndDate params and returns any assignments overlapping those dates, accounting for the caveats you note, including null enddates (which i dont think your examples would handle). If you joined that query to itself i think you'll get your result.
Note on Matt's point: rather than trying to copy this SQL from here into your other statement you could just do:
select * from study.AssignmentOverlaps A1 join study.AssignmentOverlaps A2 on (.....)
params and everything should be taken care of. then you only need to worry about the quirks of date overlaps in 1 place. |
|
Daniel Nicolalde responded: |
2011-11-09 13:03 |
Hi Ben,
Thank you for the help, the table is working. I am now trying to calculate the exact number of days two project assignments overlap. I am separating into four categories:
1. Projects that started before and go beyond the time interval of interest
2. Projects that start before and finish with the time interval
3. Projects that started within the time interval and finish during the time interval
4. Project that start during the time interval and finish after the time interval.
For the first option I am using the following CASE, some project don't have endate therefore I am using curdate to set those dates:
CASE
WHEN ((CAST (h.date AS DATE) < CAST(StartDate as DATE)) AND (CAST (COALESCE(curdate(), h.enddate) AS DATE) > CAST(EndDate AS DATE)))
The problem is that it display almost all the alternatives but in some case it displays it adds time intervals that do not adhere to this statement, I am looking for project in 2007 and it show more than two projects that started before 2007 and finish in 2007.
Is there a bug in the way CASE is handle by LabKey or I am having a typo in my statement?
Thank you
Daniel |
|
Matthew Bellew responded: |
2011-11-09 13:17 |
Without look at this too hard, the COALESCE expression seems suspect. Since CURDATE() is never NULL, COALESCE(CURDATE(), h.enddate) will always equal CURDATE().
I think you need to switch the arguments. |
|
Ben Bimber responded: |
2011-11-09 13:38 |
hi daniel,
if i understand your question right, then in the interest of sanity i might normalize start/stop dates in an inner statement like:
select
h.startdate,
h.enddate,
--syntax might not be right
timestampdiff(h.enddate, h.startdate) as timedifference
from (
select
case
when h.date<STATEDATE THEN STATEDATE
else h.date
end as StateDate,
CASE
WHEN (h.enddate is null or h.enddate > ENDDATE) then ENDDATE
else h.enddate
END as enddate
)
this is also an example of why splitting it into multiple SQL statements can help. you have one that generates the overlaps then you can test/validate. you can have a second one to take this input and calculate days between or whatever else you need. |
|