The task:
From Termination_Call_Detail (TCD) we need to select the unique identifier of the final row for each ACD call, for the last 6 months.
The "6 months" clause: uses DateTime (indexed).
The "ACD call" clause: we use RouterCallKeyDay > 0 AND RouterCallKey > 0 (both are indexed).
The "unique identifier" part: let's not complicate things and just use RecoveryKey.
The challenge:
The "final row" - we know there are multiple rows per each ACD call in TCD.
Let's not complicate things and just pretend rows are ordered by RecoveryKey.
Analysis:
We've got multiple options:
- GROUP BY
- RANK/DENSE_RANK/ROW_NUMBER
- LAST_VALUE
The good old GROUP BY approach:
SELECT tcd.RouterCallKeyDay, tcd.RouterCallKey, MAX(tcd.RecoveryKey) AS maxTcdRecoveryKey
FROM Termination_Call_Detail tcd
WHERE tcd.DateTime > DATEADD(m,-6,CAST(GETDATE() AS DATE))
AND tcd.RouterCallKeyDay > 0 AND tcd.RouterCallKey > 0
GROUP BY tcd.RouterCallKeyDay, tcd.RouterCallKey
STATISTICS TIME: CPU time = 4297 ms, elapsed time = 17097 ms.
The execution plan is simple. Index seek 85%, Aggregate cost 15%:
The fancy ranking approach - using ROW_NUMBER:
;WITH xTcd AS (
SELECT tcd.RouterCallKeyDay, tcd.RouterCallKey, tcd.RecoveryKey, ROW_NUMBER() OVER (PARTITION BY tcd.RouterCallKeyDay, tcd.RouterCallKey ORDER BY tcd.RecoveryKey DESC) AS rowNumber
FROM Termination_Call_Detail tcd
WHERE tcd.DateTime > DATEADD(m,-6,CAST(GETDATE() AS DATE))
AND tcd.RouterCallKeyDay > 0 AND tcd.RouterCallKey > 0 )
SELECT xTcd.RouterCallKeyDay, xTcd.RouterCallKey, xTcd.RecoveryKey FROM xTcd xTcd WHERE xTcd.rowNumber = 1
Notice we're using a Common Table Expression (CTE) to "store" data temporarily so we can filter out the rows where ROW_NUMBER is 1.
STATISTICS TIME: CPU time = 6986 ms, elapsed time = 10153 ms.
Execution plan: yeah, the sort cost is quite high, 45%:
Finally, the funky analytic approach - using DISTINCT LAST_VALUE:
SELECT DISTINCT tcd.RouterCallKeyDay, tcd.RouterCallKey, LAST_VALUE(tcd.RecoveryKey) OVER (PARTITION BY tcd.RouterCallKeyDay, tcd.RouterCallKey ORDER BY tcd.RecoveryKey DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS RecoveryKey
FROM Termination_Call_Detail tcd
WHERE tcd.DateTime > DATEADD(m,-6,CAST(GETDATE() AS DATE))
AND tcd.RouterCallKeyDay > 0 AND tcd.RouterCallKey > 0
STATISTICS TIME: CPU time = 17375 ms, elapsed time = 13584 ms.
The execution plan is even more complicated, Index seek: 40%, sort cost: 37%:
It's not worth using the analytic approach, because... why?!
Now, comparing GROUP BY and ROW_NUMBER: the first approach uses less CPU time (4297ms) but the total elapsed time is 17097ms which is higher than the total elapsed time of the second approach (10153ms) where the CPU time is higher: 6986ms.
The winner is ROW_NUMBER. 🎉