Tuesday, April 8, 2025

Termination_Call_Detail: identifier of last row of an ACD call

The task:

From Termination_Call_Detail (TCD) we need to select the unique identifier of the final row for each ACD call, for the last 6 months.

The "6 months" clause: uses DateTime (indexed). 

The "ACD call" clause: we use RouterCallKeyDay > 0 AND RouterCallKey > 0 (both are indexed). 

The "unique identifier" part: let's not complicate things and just use RecoveryKey.  

The challenge:

The "final row" - we know there are multiple rows per each ACD call in TCD. 

Let's not complicate things and just pretend rows are ordered by RecoveryKey. 


Analysis:

We've got multiple options: 

  • GROUP BY 
  • RANK/DENSE_RANK/ROW_NUMBER 
  • LAST_VALUE 

 

The good old GROUP BY approach:

SELECT tcd.RouterCallKeyDay, tcd.RouterCallKey, MAX(tcd.RecoveryKey) AS maxTcdRecoveryKey 
FROM Termination_Call_Detail tcd 
WHERE tcd.DateTime > DATEADD(m,-6,CAST(GETDATE() AS DATE)) 
AND tcd.RouterCallKeyDay > 0 AND tcd.RouterCallKey > 0
GROUP BY tcd.RouterCallKeyDay, tcd.RouterCallKey

STATISTICS TIME:  CPU time = 4297 ms,  elapsed time = 17097 ms.

The execution plan is simple. Index seek 85%, Aggregate cost 15%:

 


The fancy ranking approach - using ROW_NUMBER:

;WITH xTcd AS (
SELECT tcd.RouterCallKeyDay, tcd.RouterCallKey, tcd.RecoveryKey, ROW_NUMBER() OVER (PARTITION BY tcd.RouterCallKeyDay, tcd.RouterCallKey ORDER BY tcd.RecoveryKey DESC) AS rowNumber 
FROM Termination_Call_Detail tcd 
WHERE tcd.DateTime > DATEADD(m,-6,CAST(GETDATE() AS DATE)) 
AND tcd.RouterCallKeyDay > 0 AND tcd.RouterCallKey > 0 )
SELECT xTcd.RouterCallKeyDay, xTcd.RouterCallKey, xTcd.RecoveryKey FROM xTcd xTcd WHERE xTcd.rowNumber = 1

Notice we're using a Common Table Expression (CTE) to "store" data temporarily so we can filter out the rows where ROW_NUMBER is 1. 

STATISTICS TIME: CPU time = 6986 ms,  elapsed time = 10153 ms.

Execution plan: yeah, the sort cost is quite high, 45%: 

Finally, the funky analytic approach - using DISTINCT LAST_VALUE:

SELECT DISTINCT tcd.RouterCallKeyDay, tcd.RouterCallKey, LAST_VALUE(tcd.RecoveryKey) OVER (PARTITION BY tcd.RouterCallKeyDay, tcd.RouterCallKey ORDER BY tcd.RecoveryKey DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS RecoveryKey 
FROM Termination_Call_Detail tcd 
WHERE tcd.DateTime > DATEADD(m,-6,CAST(GETDATE() AS DATE)) 
AND tcd.RouterCallKeyDay > 0 AND tcd.RouterCallKey > 0

STATISTICS TIME: CPU time = 17375 ms,  elapsed time = 13584 ms.

The execution plan is even more complicated, Index seek: 40%, sort cost: 37%:

 

 

It's not worth using the analytic approach, because... why?!

Now, comparing GROUP BY and ROW_NUMBER: the first approach uses less CPU time (4297ms) but the total elapsed time is 17097ms which is higher than the total elapsed time of the second approach (10153ms) where the CPU time is higher: 6986ms. 


The winner is ROW_NUMBER. 🎉

 

No comments:

Post a Comment