T-SQL Tuesday #168 – FIRST_VALUE, LAST_VALUE and NULL Handling

FIRST_VALUE and LAST_VALUE had limited uses when they were first released to the world back in 2012, but with the introduced of the ability to ignore nulls in SQL Server 2022 they become a lot more useful. Let’s take a look at some examples of why.

TSQL Tuesday Logo

For this month’s T-SQL Tuesday Steve Jones (blog|twitter) asked us to share something neat using window functions. Who could resist?

Not me for certain. As a self confessed window function addict and regular presenter on the topic I couldn’t stop myself from sharing something.

Serendipitously I’ve been writing a query this week, not just with a window function but with window function features newly introduced with SQL Server 2022 and now available in Azure SQL DB and Managed Instances. What follows is a simplified and sanitised version of the code. Dial your server up to compatibility level 160 if you’d like to follow along!

The Problem With FIRST_VALUE and LAST_VALUE

FIRST_VALUE and LAST_VALUE functions were first introduced back in SQL Server 2012 but they didn’t see much use (or certainly not in my work) for a couple of reasons. Firstly because they didn’t do much with the default window window ranges, without dipping in to the ROWS or RANGES feature LAST_VALUE simply returns the values in the current row. More critically, as far as both functions were concerned the definition of “value” included null.

Take the table below, we record a value over time but often no record is available and we hit a NULL in the list. If I’m looking at row 12 I probably want my last value to be green from back on row 9 but depending on my window clause I’m likely to get either the value from row 12 (null), row 11(null) or possibly row 14 (you guessed it, null). Finding the last non null value in a table was possible but it wasn’t exactly straight forwards and would likely involve a couple of hours buried in Itzik Ben-Gan’s blogs. If you felt particularly daring you might end up taking some some dubious liberties with the query engine (hands up who remembers what a quirky update is).

Finding The Last Non Null Value – Enter SQL Server 2022

With SQL Server 2022 came a much requested additional feature added from the SQL standards – IGNORE_NULLS. You can probably guess what it does. Drop in IGNORE_NULLS after your function and you can blur the non null values over those gaps giving us results like this:

Definitely useful in itself where data is sporadic and you want to see whatever the most recent recorded value, in practice you’re likely to do this over multiple columns. The code for the above looks like this. I’m using the new WINDOW clause to define the window to act over. This has little impact on the size of the code for this example but if you need multiple columns with the same window it will save a lot of unneeded, messy repetition.

SELECT No, Colour,
LAST_VALUE(Colour) IGNORE_NULLS
	OVER MyWindow AS LAST_VALUE,
FROM #MyData
WINDOW 
MyWindow AS (
ORDER BY No
ROWS BETWEEN
UNBOUNDED PRECEDING
AND CURRENT ROW
)

Pivoting Data Over Time

But wait there’s more as they say in the adverts. LAST_VALUE is an aggregate function which means you can combine it with another of my favourite SQL techniques the crosstab/pivot query. Let’s see that in action.

I’m not a big fan of T-SQL’s PIVOT function, it’s too limiting and I can never remember the syntax. My preference has always been to embed an IIF statement inside an aggregate in order to control when the aggregate sees values and when it sees a NULL. Let’s look at an example building up in steps.

The below query takes a set of colours recorded over time. Applying an IIF for each colour we filter the results so we have a column that either contains the requested colour or a null value.

SELECT RecordDate, Colour,
IIF(Colour='Red’
	,RecordDate,NULL) 
	AS RedOnly,
IIF(Colour='Green’
	,RecordDate,NULL) 
	AS GreenOnly,
IIF(Colour='Blue’
	,RecordDate,NULL) 
	AS BlueOnly
FROM #MyData

Giving us a result set looking like the one below. You may notice that each column looks a lot like our single column example from above.

So let’s combine the two and wrap those IIFs inside a LAST_VALUE.

SELECT RecordDate, Colour,
LEAD(RecordDate) 
	OVER MyOrder AS NextDate,
LAST_VALUE(
	IIF(Colour='Red',RecordDate,NULL)
	) IGNORE NULLS
	OVER MyWindow AS LastSeenRed,
LAST_VALUE(
	IIF(Colour='Green',RecordDate,NULL)
	) IGNORE NULLS
	OVER MyWindow AS LastSeenGreen,
LAST_VALUE(
	IIF(Colour='Blue',RecordDate,NULL)
	) IGNORE NULLS
	OVER MyWindow AS LastSeenBlue
FROM #MyData
WINDOW 
MyOrder AS (
	ORDER BY RecordDate
),
MyWindow AS (
MyOrder
	ROWS BETWEEN
	UNBOUNDED PRECEDING
	AND CURRENT ROW
)

Giving us a final set of results looking like this.

Now every gap is filled and by sneaking in a lead function on the date field we have a set of data with continuous date ranges showing the last seen date for every colour over time.
There’s plenty of room for variations around this concept, anywhere that there’s gaps in records or intermittent data flows this approach can come to your rescue. I’d love to hear about other use cases people have encountered. Feel free to chip in down in the comments.

SQL Window Functions

Links to resources related to Window Functions and my sessions on them.

This page is here to act as an index to material I’ve produced relating to window functions.

GitHub: https://github.com/BarneyLawrence/Sessions-SQL-Window_Functions
My main repository, this holds PowerPoint presentations, sample queries and workbooks either used in sessions, to back them up or to develop the examples used.

Future Sessions

I’ll be presenting on window functions at:
PASS Data Community Summit – November 15th to 18th. Pre recorded session and live QA on Wednesday 15th 04:00 PM–04:30 PM PST

Further Resources

Itzik Ben-Gan’s Blogs at SQLPerformance.com
T-SQL Beyond the Basics – My 2021 PASS Summit session, includes sections on window functions amongst other intermediate techniques.