A pivot query helps you summarize and re-visualize data in a table. Data can be grouped or aggregated to help you focus on a particular aspect of your data. For example a pivot table can help you see how many data points of a particular kind are present, or it can represent your data by aggregating it into different categories.
Create Query
To try this yourself, you can download this sample file and import it as a Standard assay type. Remember your assay design name; we use "PivotDemo" below.
Create a new SQL query and edit its source.
- Select (Admin) > Go To Module > Query.
- Select a schema. In our example, we used the assay schema, specifically "assay.General.PivotDemo" in our example.
- Click Create New Query.
- Name it and confirm the correct query/table is selected. For our assay, we chose "Data" (the assay results).
- Click Create and Edit Source.
Syntax for PIVOT...BY Query
A PIVOT query is essentially a SELECT specifying which columns you want and how to PIVOT and GROUP them. To write a pivot query, follow these steps.
(1) Start with a base SELECT query.
SELECT ParticipantID, Date, RunName, M1
FROM Data
(2) Identify the data cells you want to pivot and how. In this example, we focus on the values in the
RunName column, to separate M1 values for each.
(3) Select an aggregating function to handle any non-unique data, even if you do not expect to need it. MAX, MIN, AVG, and SUM are possibilities. Here we have only one row for each participant/date/run combination, so all would produce the same result, but in other data you might have several sets with multiple values. When aggregating, we can also give the column a new name, here MaxM1.
(4) Identify columns which remain the same to determine the GROUP BY clause.
SELECT ParticipantId, Date, RunName, Max(M1) as MaxM1
FROM Data
GROUP BY ParticipantId, Date, RunName
(5) Finally, pivot the cells.
SELECT ParticipantId, Date, RunName, Max(M1) as MaxM1
FROM Data
GROUP BY ParticipantId, Date, RunName
PIVOT MaxM1 by RunName
PIVOT...BY...IN Syntax
You can focus on particular values using an
IN clause to identify specific columns (i.e. distinct rows in the original data) to use in the pivot.
(6) In our example from above, perhaps we want to see only data from two of the runs:
SELECT ParticipantId, Date, RunName, Max(M1) as MaxM1
FROM Data
GROUP BY ParticipantId, Date, RunName
PIVOT MaxM1 by RunName IN ('Run1', 'Run2')
Note that pivot column names are case-sensitive. You may need to use lower() or upper() in your query to work around this issue if you have two values who differ only by letter case.
In addition to reducing the set of result columns, specifying column names in an IN clause has a performance benefit, since LabKey won't need to execute the query to determine the column set.
PIVOT...IN using Sub-SELECT
If you won't know the specific column names for an IN clause (i.e. the distinct row values in the original data) at the time of writing your query, you can also use a sub-SELECT statement in the IN clause. You could either pull the column names from a separate list, a simpler query, or as in this simple query, from values in the same table you will pivot. For example:
SELECT Language, Gender, MAX(Height) AS H
FROM Demographics
GROUP BY Language, Gender
PIVOT H By Gender IN (SELECT DISTINCT Gender FROM Demographics)
Grouped Headers
You can add additional aggregators and present two pivoted columns, here we show both an minimum and a maximum M1 value for each participant/date/run combination. In this case they are the same, but the pattern can be applied to varying data.
SELECT ParticipantId, Date, RunName, Min(M1) as MinM1, Max(M1) as MaxM1
FROM Data
GROUP BY ParticipantId, Date, RunName
PIVOT MinM1, MaxM1 by RunName IN ('Run1', 'Run2')
"Summary" Columns
"Summary" columns are those columns not included in the group-by or pivoted list. When generating the pivot, these summary columns are aggregated as follows:
- a COUNT or SUM aggregate summary column is wrapped with SUM
- a MIN or MAX is wrapped with a MIN or MAX
For example:
SELECT
-- group by columns
AssignedTo,
Type,
-- summary columns. Turned into a SUM over the COUNT(*)
COUNT(*) AS Total,
-- pivoted columns
SUM(CASE WHEN Status = 'open' THEN 1 ELSE 0 END) AS "Open",
SUM(CASE WHEN Status = 'resolved' THEN 1 ELSE 0 END) AS Resolved
FROM issues.Issues
WHERE Status != 'closed'
GROUP BY AssignedTo, Type
PIVOT "Open", Resolved BY Type IN ('Defect', 'Performance', 'To Do')
Options for Pivoting by Two Columns
Two levels of PIVOT are not supported, however some options for structuring your query may give you the behavior you need. In this scenario, you have experiment data and want to pivot by both peak intensity (not known up front) and sample condition.
The closest option is to concatenate the two values together and pivot on that "calculated" column.
SELECT
Run.Batch,
Run.Sample,
COUNT(*) AS ResultsCount,
Run.SampleCondition || ' ' || PeakLabel AS ConditionPeak,
AVG(Data.PercTimeCorrArea) AS AvgPercTimeCorrArea,
FROM Data
GROUP BY Run.Batch, Run.Sample, Run.SampleCondition || ' ' || PeakLabel
PIVOT AvgPercTimeCorrArea BY ConditionPeak
If concatenation is not practical, you could use a version like this. It will include duplicate Batch and Sample columns, which you could get rid of by explicitly listing the aggregates from each sample condition sub-query. Syntax like "SELECT NR.* EXCEPT (NR_Batch, NR_Sample)" to get all of the pivot columns except the joining columns is not supported.
SELECT
NR.*,
RED.*
FROM (
SELECT
Run.Batch AS NR_Batch,
Run.Sample AS NR_Sample,
--Run.SampleCondition,
PeakLabel,
AVG(Data.PercTimeCorrArea) AS NR_AvgPercTimeCorrArea,
FROM Data
WHERE Run.SampleCondition = 'NR'
GROUP BY Run.Batch, Run.Sample, Run.SampleCondition, PeakLabel
PIVOT NR_AvgPercTimeCorrArea BY PeakLabel
) NR
INNER JOIN (
SELECT
Run.Batch AS RED_Batch,
Run.Sample AS RED_Sample,
--Run.SampleCondition,
PeakLabel,
AVG(Data.PercTimeCorrArea) AS RED_AvgPercTimeCorrArea,
FROM Data
WHERE Run.SampleCondition = 'RED'
GROUP BY Run.Batch, Run.Sample, Run.SampleCondition, PeakLabel
PIVOT RED_AvgPercTimeCorrArea BY PeakLabel
) RED
ON NR.NR_Batch = RED.RED_Batch AND NR.NR_Sample = RED.RED_Sample
Examples
Example 1 - Open issues by priority for each area
Another practical application of a pivot query is to display a list of how many issues of each priority are open for each feature area.
See the result of this pivot query:
Pivot query on the LabKey Issues tableSELECT Issues.Area, Issues.Priority, Count(Issues.IssueId)
AS CountOfIssues
FROM Issues
GROUP BY Issues.Area, Issues.Priority
PIVOT CountOfIssues BY Priority IN (1,2,3,4)
Example 2 - Open issues by type grouped by who they are assigned to
See the result of this pivot query:
Pivot query on the LabKey Issues tableSELECT
-- Group By columns
AssignedTo,
Type,
-- Summary columns. Turned into a SUM over the COUNT(*)
COUNT(*) AS Total,
-- Pivoted columns
SUM(CASE WHEN Status = 'open' THEN 1 ELSE 0 END) AS "Open",
SUM(CASE WHEN Status = 'resolved' THEN 1 ELSE 0 END) AS Resolved
FROM issues.Issues
WHERE Status != 'closed'
GROUP BY AssignedTo, Type
PIVOT "Open", Resolved BY Type IN ('Defect', 'Performance', 'To Do')
Example 3 - Luminex assay fluorescence intensity by analyte
SELECT ParticipantId, date, Analyte,
Count(Analyte) AS NumberOfValues,
AVG(FI) AS AverageFI,
MAX(FI) AS MaxFI
FROM "Luminex Assay 100"
GROUP BY ParticipantID, date, Analyte
PIVOT AverageFI, MaxFI BY Analyte
Example 4 - Support tickets by priority and status
SELECT SupportTickets.Client AS Client, SupportTickets.Status AS Status,
COUNT(CASE WHEN SupportTickets.Priority = 1 THEN SupportTickets.Status END) AS Pri1,
COUNT(CASE WHEN SupportTickets.Priority = 2 THEN SupportTickets.Status END) AS Pri2,
COUNT(CASE WHEN SupportTickets.Priority = 3 THEN SupportTickets.Status END) AS Pri3,
COUNT(CASE WHEN SupportTickets.Priority = 4 THEN SupportTickets.Status END) AS Pri4,
FROM SupportTickets
WHERE SupportTickets.Created >= (curdate() - 7)
GROUP BY SupportTickets.Client, SupportTickets.Status
PIVOT Pri1, Pri2, Pri3, Pri4 BY Status
Related Topics