For this month’s T-SQL Tuesday Steve Jones (blog|twitter) asked us to share something neat using window functions. Who could resist?
Not me for certain. As a self confessed window function addict and regular presenter on the topic I couldn’t stop myself from sharing something.
Serendipitously I’ve been writing a query this week, not just with a window function but with window function features newly introduced with SQL Server 2022 and now available in Azure SQL DB and Managed Instances. What follows is a simplified and sanitised version of the code. Dial your server up to compatibility level 160 if you’d like to follow along!
The Problem With FIRST_VALUE and LAST_VALUE
FIRST_VALUE and LAST_VALUE functions were first introduced back in SQL Server 2012 but they didn’t see much use (or certainly not in my work) for a couple of reasons. Firstly because they didn’t do much with the default window window ranges, without dipping in to the ROWS or RANGES feature LAST_VALUE simply returns the values in the current row. More critically, as far as both functions were concerned the definition of “value” included null.
Take the table below, we record a value over time but often no record is available and we hit a NULL in the list. If I’m looking at row 12 I probably want my last value to be green from back on row 9 but depending on my window clause I’m likely to get either the value from row 12 (null), row 11(null) or possibly row 14 (you guessed it, null). Finding the last non null value in a table was possible but it wasn’t exactly straight forwards and would likely involve a couple of hours buried in Itzik Ben-Gan’s blogs. If you felt particularly daring you might end up taking some some dubious liberties with the query engine (hands up who remembers what a quirky update is).
Finding The Last Non Null Value – Enter SQL Server 2022
With SQL Server 2022 came a much requested additional feature added from the SQL standards – IGNORE_NULLS. You can probably guess what it does. Drop in IGNORE_NULLS after your function and you can blur the non null values over those gaps giving us results like this:
Definitely useful in itself where data is sporadic and you want to see whatever the most recent recorded value, in practice you’re likely to do this over multiple columns. The code for the above looks like this. I’m using the new WINDOW clause to define the window to act over. This has little impact on the size of the code for this example but if you need multiple columns with the same window it will save a lot of unneeded, messy repetition.
SELECT No, Colour,
LAST_VALUE(Colour) IGNORE_NULLS
OVER MyWindow AS LAST_VALUE,
FROM #MyData
WINDOW
MyWindow AS (
ORDER BY No
ROWS BETWEEN
UNBOUNDED PRECEDING
AND CURRENT ROW
)
Pivoting Data Over Time
But wait there’s more as they say in the adverts. LAST_VALUE is an aggregate function which means you can combine it with another of my favourite SQL techniques the crosstab/pivot query. Let’s see that in action.
I’m not a big fan of T-SQL’s PIVOT function, it’s too limiting and I can never remember the syntax. My preference has always been to embed an IIF statement inside an aggregate in order to control when the aggregate sees values and when it sees a NULL. Let’s look at an example building up in steps.
The below query takes a set of colours recorded over time. Applying an IIF for each colour we filter the results so we have a column that either contains the requested colour or a null value.
SELECT RecordDate, Colour,
IIF(Colour='Red’
,RecordDate,NULL)
AS RedOnly,
IIF(Colour='Green’
,RecordDate,NULL)
AS GreenOnly,
IIF(Colour='Blue’
,RecordDate,NULL)
AS BlueOnly
FROM #MyData
Giving us a result set looking like the one below. You may notice that each column looks a lot like our single column example from above.
So let’s combine the two and wrap those IIFs inside a LAST_VALUE.
SELECT RecordDate, Colour,
LEAD(RecordDate)
OVER MyOrder AS NextDate,
LAST_VALUE(
IIF(Colour='Red',RecordDate,NULL)
) IGNORE NULLS
OVER MyWindow AS LastSeenRed,
LAST_VALUE(
IIF(Colour='Green',RecordDate,NULL)
) IGNORE NULLS
OVER MyWindow AS LastSeenGreen,
LAST_VALUE(
IIF(Colour='Blue',RecordDate,NULL)
) IGNORE NULLS
OVER MyWindow AS LastSeenBlue
FROM #MyData
WINDOW
MyOrder AS (
ORDER BY RecordDate
),
MyWindow AS (
MyOrder
ROWS BETWEEN
UNBOUNDED PRECEDING
AND CURRENT ROW
)
Giving us a final set of results looking like this.
Now every gap is filled and by sneaking in a lead function on the date field we have a set of data with continuous date ranges showing the last seen date for every colour over time.
There’s plenty of room for variations around this concept, anywhere that there’s gaps in records or intermittent data flows this approach can come to your rescue. I’d love to hear about other use cases people have encountered. Feel free to chip in down in the comments.