SQL NOT IN using another query's column for filtering

LabKey Support Forum (Inactive)
SQL NOT IN using another query's column for filtering Anthony Corbett  2012-10-03 08:12
Status: Closed
 
I'm working on creating queries that will supply summary statistics for an NIH inclusion report. Data must be reported as count of Participants for each Race grouped by Sex.

The dataset that holds this information is named 'Demographics' and has the following fields defined:

Field: Data Type:
----------- --------------
ParticipantId String
SequenceNum Int // always -1
Sex Int // lookup to list that represents Sex: 0=Male 1=Female
White Int // lookup to list that represents Booleans: 0=False 1=True
Black Int // lookup to list that represents Booleans: 0=False 1=True
Asian Int // lookup to list that represents Booleans: 0=False 1=True
NatAmer Int // lookup to list that represents Booleans: 0=False 1=True
Pacific Int // lookup to list that represents Booleans 0=False 1=True
NotReported Int // lookup to list that represents Booleans 0=False 1=True

Data structure Rules:
1. Each row in the dataset represents a single Participant's Racial makeup
2. Each Participant can have multiple races indicated (Each row can have 1's in multiple columns).


To create the summary statistics for the count of participants for each race grouped by Sex I started with the following query:

SELECT
  demo.Sex,
  SUM(demo.White) as white,
  SUM(demo.Black) as black,
  SUM(demo.Asian) as asian,
  SUM(demo.NatAmer) as nativeAmerican,
  SUM(demo.Pacific) as pacific,
  SUM(demo.NotReported) as notReported,
  Count(demo.ParticipantId) as total,
FROM Demographics as demo
GROUP BY demo.Sex

Since each race value is a Int (0 or 1) I can use SUM to aggregate the count of races grouped by Sex, great.

However, because rule #2 above the counts of each race/sex doesn't match the total (count of Partipcants of each Sex). I will deal with counts of multi-racial participants in another way... Therefore, I need to use a WHERE clause to filter out those Participants with multiple races indicated. There is multiple ways to write this WHERE clause:

1. Computing the number of 1's (True's) for each row and checking if it equals 1 (only one race indicated)
WHERE demo.White + demo.Black + demo.Asian + demo.NatAmer + demo.Pacific + demo.NotReported = 1

2. Refactor the above WHERE computation into a subquery, then use a NOT IN to filter ParticiantId's
WHERE demo.ParticipantId NOT IN (
    SELECT
      Demographics.ParticipantId,
    FROM Demographics
    WHERE (
      Demographics.White +
      Demographics.Black +
      Demographics.Asian +
      Demographics.NatAmer +
      Demographics.Pacific +
      Demographics.NotReported
    ) > 1
)

Both work and produce the same results. I actually prefer the second WHERE clause as I can see other queries I'll need to write where knowing which participants are multi-racial will need to be known. So the next logical step is to put the subquery in it's own named query called 'MultipleRaceParticipants'.

Unfortunately, when I try to use 'MultipleRaceParticipants' in the WHERE's NOT IN clause it can't resolve the ParticipantId column of the named query.

SQL QUERY:
------------------------
SELECT
  demo.Sex,
  SUM(demo.White) as white,
  SUM(demo.Black) as black,
  SUM(demo.Asian) as asian,
  SUM(demo.NatAmer) as nativeAmerican,
  SUM(demo.Pacific) as pacific,
  SUM(demo.NotReported) as notReported,
  COUNT(demo.ParticipantId) as total,
FROM Demographics as demo
WHERE demo.ParticipantId NOT IN ( MultipleRaceParticipants.ParticipantId )
GROUP BY demo.Sex

ERROR:
-------------------
Error on line 11: Could not resolve column: MultipleRaceParticipants/ParticipantId


Do I have a [LABKEY] SQL syntax error or does the query service not allow named queries inside a NOT IN operation?

Any help would be appreciated!

Thanks,

Anthony Corbett
Department of Biostatistics and Computational Biology
University of Rochester
 
 
Anthony Corbett responded:  2012-10-03 08:55
I've attached an example folder export that contains:

the study dataset, lists, and example queries.

I hope this helps.
 
Matthew Bellew responded:  2012-10-03 09:29
This is not valid SQL. I think you're looking for

SELECT
  demo.Sex,
  SUM(demo.White) as white,
  SUM(demo.Black) as black,
  SUM(demo.Asian) as asian,
  SUM(demo.NatAmer) as nativeAmerican,
  SUM(demo.Pacific) as pacific,
  SUM(demo.NotReported) as notReported,
  COUNT(demo.ParticipantId) as total,
FROM Demographics as demo
WHERE demo.ParticipantId NOT IN (SELECT Participant ID FROM MultipleRaceParticipants)
GROUP BY demo.Sex
 
Anthony Corbett responded:  2012-10-03 11:14
Ah, sorry for the oversight.

Thanks so much for the quick reply! Working great now!