T-SQL Tuesday #168 – FIRST_VALUE, LAST_VALUE and NULL Handling

FIRST_VALUE and LAST_VALUE had limited uses when they were first released to the world back in 2012, but with the introduction of the ability to ignore nulls in SQL Server 2022 they became a lot more useful. Let’s take a look at some examples of why.

For this month’s T-SQL Tuesday Steve Jones (blog|twitter) asked us to share something neat using window functions. Who could resist?

Not me, for certain. As a self-confessed window function addict and regular presenter on the topic, I couldn’t stop myself from sharing something.

Serendipitously, I’ve been writing a query this week, not just with a window function but with window function features newly introduced in SQL Server 2022 and now available in Azure SQL DB and Managed Instances. What follows is a simplified and sanitised version of the code. Dial your server up to compatibility level 160 if you’d like to follow along!

The Problem With FIRST_VALUE and LAST_VALUE

The FIRST_VALUE and LAST_VALUE functions were first introduced back in SQL Server 2012, but they didn’t see much use (certainly not in my work) for a couple of reasons. Firstly, they didn’t do much with the default window range; without dipping into the ROWS or RANGE feature, LAST_VALUE simply returns the value in the current row. More critically, as far as both functions were concerned the definition of “value” included NULL.

Take the table below: we record a value over time, but often no record is available and we hit a NULL in the list. If I’m looking at row 12 I probably want my last value to be green from back on row 9, but depending on my window clause I’m likely to get either the value from row 12 (null), row 11 (null) or possibly row 14 (you guessed it, null). Finding the last non-null value in a table was possible, but it wasn’t exactly straightforward and would likely involve a couple of hours buried in Itzik Ben-Gan’s blogs. If you felt particularly daring you might end up taking some dubious liberties with the query engine (hands up who remembers what a quirky update is).
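
If you’d like to follow along, here’s a minimal sketch of a table matching that description. The table and column names, and most of the values, are my assumptions; only the green on row 9 and the NULLs around row 12 come from the example above.

-- Hypothetical sample data: green on row 9, then a run of NULLs
CREATE TABLE #MyData (No int, Colour varchar(10));
INSERT INTO #MyData (No, Colour) VALUES
(1, 'Red'), (2, NULL), (3, 'Blue'), (4, NULL), (5, NULL),
(6, 'Red'), (7, NULL), (8, NULL), (9, 'Green'), (10, NULL),
(11, NULL), (12, NULL), (13, NULL), (14, NULL), (15, 'Blue');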

Finding The Last Non Null Value – Enter SQL Server 2022

With SQL Server 2022 came a much-requested feature from the SQL standard – IGNORE NULLS. You can probably guess what it does. Drop in IGNORE NULLS after your function and you can blur the non-null values over those gaps, giving us results like this:

Definitely useful in itself where data is sporadic and you want to see whatever the most recent recorded value is; in practice you’re likely to do this over multiple columns. The code for the above looks like this. I’m using the new WINDOW clause to define the window to act over. This has little impact on the size of the code for this example, but if you need multiple columns with the same window it will save a lot of unneeded, messy repetition.

SELECT No, Colour,
LAST_VALUE(Colour) IGNORE NULLS
	OVER MyWindow AS LAST_VALUE
FROM #MyData
WINDOW 
MyWindow AS (
ORDER BY No
ROWS BETWEEN
UNBOUNDED PRECEDING
AND CURRENT ROW
)
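
For comparison, here’s a sketch of the same query with the window specification written inline. With one column there’s little in it, but every extra column would have to repeat the whole OVER clause:

SELECT No, Colour,
LAST_VALUE(Colour) IGNORE NULLS
	OVER (ORDER BY No
		ROWS BETWEEN UNBOUNDED PRECEDING
		AND CURRENT ROW) AS LAST_VALUE
FROM #MyData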

Pivoting Data Over Time

But wait, there’s more, as they say in the adverts. LAST_VALUE takes a full expression rather than just a bare column, which means you can combine it with another of my favourite SQL techniques, the crosstab/pivot query. Let’s see that in action.

I’m not a big fan of T-SQL’s PIVOT operator; it’s too limiting and I can never remember the syntax. My preference has always been to embed an IIF expression inside an aggregate in order to control when the aggregate sees values and when it sees a NULL. Let’s look at an example building up in steps.

The query below takes a set of colours recorded over time. Applying an IIF for each colour, we filter the results so we have a column that either contains the record date for the requested colour or a null value.

SELECT RecordDate, Colour,
IIF(Colour='Red'
	,RecordDate,NULL)
	AS RedOnly,
IIF(Colour='Green'
	,RecordDate,NULL)
	AS GreenOnly,
IIF(Colour='Blue'
	,RecordDate,NULL)
	AS BlueOnly
FROM #MyData

Giving us a result set looking like the one below. You may notice that each column looks a lot like our single column example from above.

So let’s combine the two and wrap those IIFs inside a LAST_VALUE.

SELECT RecordDate, Colour,
LEAD(RecordDate) 
	OVER MyOrder AS NextDate,
LAST_VALUE(
	IIF(Colour='Red',RecordDate,NULL)
	) IGNORE NULLS
	OVER MyWindow AS LastSeenRed,
LAST_VALUE(
	IIF(Colour='Green',RecordDate,NULL)
	) IGNORE NULLS
	OVER MyWindow AS LastSeenGreen,
LAST_VALUE(
	IIF(Colour='Blue',RecordDate,NULL)
	) IGNORE NULLS
	OVER MyWindow AS LastSeenBlue
FROM #MyData
WINDOW 
MyOrder AS (
	ORDER BY RecordDate
),
MyWindow AS (
MyOrder
	ROWS BETWEEN
	UNBOUNDED PRECEDING
	AND CURRENT ROW
)

Giving us a final set of results looking like this.

Now every gap is filled, and by sneaking in a LEAD function on the date field we have a set of data with continuous date ranges showing the last-seen date for every colour over time.
There’s plenty of room for variations around this concept; anywhere there are gaps in records or intermittent data flows, this approach can come to your rescue. I’d love to hear about other use cases people have encountered. Feel free to chip in down in the comments.

T-SQL Tuesday #140: Containers for Business and Pleasure

Containers aren’t always for boring work stuff (although they’re great for that).
This post gives some details on how to set up Docker from scratch and create containers for both a SQL Server test lab and a Minecraft Bedrock Server.

For this month’s T-SQL Tuesday Anthony Nocentino (blog|twitter) asked us to share our experiences of using containers.

Not being a DBA type, containers aren’t what you would consider my home ground. My work usually starts once an environment is in place and running, but lacking my own personal DBA I’ve started to dabble with Docker on my personal laptop. Partly this is to make my life easier, but also partly because I expect that sooner or later containerised development environments will be the norm, and a bit of experimentation now will hopefully keep me ahead of the game.

To show the variety of things you can get up to with a container I’m going to share not one but two examples: setting up a SQL Server container to support experimentation and testing code for blog posts and, on the other side of my work-life balance, setting up a Minecraft server to enable the family to play together.

None of this is my area of expertise, and I’ve never had the Linux/Unix experience that would make this easy going, but to some extent I see that as an advantage. There are some giants in the field who know far more than I ever will; I’m just going to show the rest of us how you can cling precariously on to their shoulders and get stuff done.

How To Run SQL Server In A Docker Container

Docker is now my preferred route to running SQL Server on a personal machine. It saves the need for a direct install (and a potential attempted uninstall later), and it’s a smoother process than a full-fat VM setup with the need to install and configure a whole OS (I do miss the days when Microsoft provided a complete image with a 180-day time limit). It has a relatively light footprint and gives you room to experiment easily with multiple versions. The only real downside is we’re looking at the Linux version of SQL Server, so if you want to work with either SSAS or SSIS you might need to look to the alternatives.

My needs are relatively simple: a recent version of SQL Server, access to the most common sample data sets, and to avoid having to rebuild and reinstall on a regular basis. The below gives me all that, and it’s been everything I’ve needed while creating and testing my series of posts on querying XML with SQL Server.

I’m not going to retread ground here, and as I hinted above this isn’t really my work, so I’m going to share links and give as much credit as I can; if I’ve missed someone do let me know and I’ll get this updated. I’ve been through a few iterations of setting up these containers and this is the route that I’m happiest with to date.

Installing Docker

Go get Docker from their web site and install it. If you’re running on Windows you will want to be using WSL2; if this is your very first time with Docker you’ll probably need some details from Microsoft’s guidance on that. I installed the Ubuntu 18 Linux distribution as that appears to be the current preferred option for Docker.
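
On recent versions of Windows the WSL2 side of things can usually be handled with a single command. The distribution name below is an assumption based on what was current when I did this:

# Install WSL2 with an Ubuntu distribution (run from an elevated prompt)
wsl --install -d Ubuntu-18.04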

Changing Default Local Storage Location

WSL2 by default puts all its data into virtual drive files on your system (C:) drive. This may be a problem if you have a small boot drive and a big data drive like I do; I want to handle some big databases like the Stack Overflow data sets many like to use for query examples.
This answer over on Stack Overflow gives the step-by-step commands to move everything over to the location of your choice.
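
The gist of that answer is an export and re-import of the relevant WSL distribution; roughly the following, with the target folder on D: being my assumption (adjust paths to taste):

# Stop WSL, export the Docker data distribution, then re-import it elsewhere
wsl --shutdown
wsl --export docker-desktop-data D:\Docker\docker-desktop-data.tar
wsl --unregister docker-desktop-data
wsl --import docker-desktop-data D:\Docker\wsl D:\Docker\docker-desktop-data.tar --version 2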

Setting Up a SQL Server Container

I’ve tried a few approaches for creating a SQL Server container, and the one that balances ease of creation with covering long-term needs comes from the cupboard of Container King and all-round nice guy Andrew Pruski (b|t). His guide to SQL Server and Containers gives you pretty much everything you need to get up and running quickly. I’d recommend putting the effort in to follow his more advanced Docker Compose example as it solves a few permissions issues you might encounter when taking simpler approaches, and splits folders into neat volumes similar to those you’d see on a physical install.

My current setup is almost exactly the same as this, only I’ve switched the container port to 1433, which is the default SSMS looks for. This saves me remembering the correct number when connecting. I also changed the password from Andrew’s default; it’s probably best we don’t all have the same password (make sure it’s complex enough, as very short, simple passwords may be rejected and cause the process to fail).
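
For a rough idea of the shape of the thing, a minimal compose file along those lines looks something like this. It’s a sketch rather than Andrew’s actual file; the image tag, password, volume names and directory paths are all my assumptions, with only the 1433 port mapping coming from the setup described above:

version: '3.8'

services:
  sqlserver:
    image: "mcr.microsoft.com/mssql/server:2019-latest"
    environment:
      ACCEPT_EULA: "Y"
      MSSQL_SA_PASSWORD: "YourStr0ng!PasswordHere"
      MSSQL_DATA_DIR: "/var/opt/sqlserver/data"
      MSSQL_LOG_DIR: "/var/opt/sqlserver/log"
      MSSQL_BACKUP_DIR: "/var/opt/sqlserver/backups"
    ports:
      - 1433:1433
    volumes:
      - sqldata:/var/opt/sqlserver/data
      - sqllog:/var/opt/sqlserver/log
      - sqlbackup:/var/opt/sqlserver/backups

volumes:
  sqldata: {}
  sqllog: {}
  sqlbackup: {}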

If you’ve followed all of the above then by this point you’ll have an empty instance of SQL Server in a container. Connect up and give it a few test queries; if all you need is an empty server you can stop now.
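
Something as simple as this will confirm you’re talking to the Linux-based instance:

-- Quick sanity check from SSMS or Azure Data Studio
SELECT @@VERSION AS ServerVersion;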

Adding Some Data

Now we have our server, we need to add some data. With WSL2 we have a route to access the drive volumes we’ve created from within Windows, but finding them can be tricky. The path I use with the above setup is:

\\wsl$\docker-desktop-data\version-pack-data\community\docker\volumes

You can then see sub-folders for all the volumes you have created with Docker. Copy backups of your preferred data sets into the backup folder (I’m using varieties of Adventure Works and Wide World Importers) and restore as you would anywhere else.
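
The restore itself then looks much like it would on any other Linux instance. The database and logical file names below are assumptions for whichever sample you’ve copied over; running RESTORE FILELISTONLY against the backup will tell you the real logical names:

-- Restore from the backup volume into the data and log volumes
RESTORE DATABASE AdventureWorks2019
FROM DISK = '/var/opt/sqlserver/backups/AdventureWorks2019.bak'
WITH MOVE 'AdventureWorks2019' TO '/var/opt/sqlserver/data/AdventureWorks2019.mdf',
     MOVE 'AdventureWorks2019_log' TO '/var/opt/sqlserver/log/AdventureWorks2019_log.ldf';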

Alternatively, copy mdf, ndf and ldf files into the respective data and log folders and attach them (I did this for the Stack Overflow samples Brent Ozar maintains).
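
The attach is standard T-SQL; the database and file names here are hypothetical:

-- Attach copied data and log files as a new database
CREATE DATABASE StackOverflow2013
ON (FILENAME = '/var/opt/sqlserver/data/StackOverflow2013.mdf'),
   (FILENAME = '/var/opt/sqlserver/data/StackOverflow2013_log.ldf')
FOR ATTACH;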
I had a few troubles with these; my first attempt with this approach left me with a read-only database. This may have been because I copied the files directly from another container volume, or it might be something that happens every time. I ended up having to run the command from Andrew’s setup that gives permissions to his SQL containers again against these files, then detach and re-attach.

# Run inside the container: hand ownership of the problem files back to the mssql user
chown -R mssql:mssql /var/opt/sqlserver/[Whatever the file is that's causing the problems]

And we’re done: we have a shiny SQL Server 2019 instance in Docker. It starts when I need it, I can stop it when I don’t, and if I need to upgrade I just re-build the container and all that sample data lives on in the volumes. Those visiting this page in a strictly professional capacity can stop reading now.

How to Run a Minecraft Bedrock Server In Docker

Many of you out there may, like me, find yourselves sharing a home with a miniature Minecraft addict. I do, and in between showing off her latest builds she’s begun pestering me to find a way to play with friends and family. I figured I could maybe save some work and re-use my newfound containerised powers for fun as well as profit, and a few quick searches proved me right.

Here’s the container I’ve been using. I don’t know who ‘itzg’ is, but he’s a hero for setting this up. The server version used is a little out of date but it self-updates, and thanks to the use of a volume for data it remembers updates, configurations and maps, so you don’t lose anything between sessions.

I found a few issues with the docker-compose.yml file, so here’s my revised version with a few tweaks, extras and a nice world seed that drops you in on the edge of a village for added in-game friendship. If you didn’t get that SQL Server container up and running in the previous section then follow the steps on installation and setting the storage location before coming back here.

version: '3.8'

services:
  Minecraft:
    image: "itzg/minecraft-bedrock-server"
    environment:
      EULA: "TRUE"
      GAMEMODE: survival
      DIFFICULTY: normal
      LEVEL_SEED: "-850418298"
      ALLOW_CHEATS: "false"
      SERVER_NAME: "My Server Name"
      VIEW_DISTANCE: "64"
      LEVEL_NAME: "My Survival World"
      # WHITE_LIST: "true"
      DEFAULT_PLAYER_PERMISSION_LEVEL: member
    ports:
      - 19132:19132/udp
    volumes:
      - bds:/data
    stdin_open: true
    tty: true

volumes:
  bds: {}

That’s all you need for a local network game. If you want to play with others then you’ll need to work out how to forward port 19132 to the computer running the container, and possibly figure out how to set up the whitelist file in the volume created so there’s no risk of unexpected strangers dropping in on you.
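
For reference, the whitelist is a small JSON file in the data volume; it should look something along these lines, with the player names here being placeholders:

[
  { "ignoresPlayerLimit": false, "name": "PlayerOne" },
  { "ignoresPlayerLimit": false, "name": "PlayerTwo" }
]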

Have fun and look out for those creepers!

T-SQL Tuesday #138: Managing Technology Changes

How Do You Respond When Technology Changes Under You? Grab it with both hands and cling on tight!

In this month’s T-SQL Tuesday Andy Leonard (b|t) asks: How Do You Respond When Technology Changes Under You?
My response would be: wherever possible, embrace it. Despite inevitable teething troubles most software changes are for the better, and while you probably can’t prevent change you can make sure it doesn’t leave you behind.

One aspect of T-SQL that everyone should embrace is window functions. They were first added in SQL Server 2005 with simple ranking functions and have been built on since then with features such as greater control over window size and shape and neat functions like LAG and LEAD.

Despite window functions being around for over a decade, many people haven’t really embraced them as far as they could and probably should.

Here’s an example of how a window function can make your code neater and faster. The example I’ll share is somewhat contrived to fit in the Adventure Works sample databases but the general pattern is one I’ve been fighting against for at least a decade.

Imagine that for some reason the Adventure Works company wants to identify the most fertile customer that has purchased each product sold. Our analyst sits down and thinks “the most fertile customer has the most children”, so they write a query to find the maximum number of children in the customer dimension of the data warehouse for each product. This doesn’t give us the customer, just the number of children they have, so we join the result back to the same table to get their name. Unfortunately we get multiple results because more than one person has that number of children.
Our analyst thinks again and decides that as a tie-breaker we should pick the customer with the most recent date of birth (they had those children fastest), so picks the maximum date of birth from that list and again joins it to the customer table… only to find that there’s more than one customer with the same date of birth. Finally we pick the lowest customer id, and as there are no duplicates in this we get our answer.

The resulting query looks like this. Actually, I’m being kind here and using common table expressions to make the steps clear; more often than not when I see code like this it would be sub-queries nested in sub-queries and so on.

USE AdventureWorksDW2017;

WITH MaxChildren AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
GROUP BY S.ProductKey
),
MaxBirth AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren, MAX(C.BirthDate) AS MaxBirth
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MaxChildren AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
GROUP BY S.ProductKey
), 
MinCustomer AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren, MAX(C.BirthDate) AS MaxBirth, MIN(C.CustomerKey) AS MinCustomer
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MaxBirth AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
	AND M.MaxBirth = C.BirthDate
GROUP BY S.ProductKey
)
SELECT S.ProductKey, C.CustomerKey
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MinCustomer AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
	AND C.BirthDate = M.MaxBirth
	AND C.CustomerKey = M.MinCustomer
ORDER BY ProductKey, CustomerKey;

Apart from the length of the code, the big problem is the number of times we go through the involved tables. SQL Server isn’t going to do anything clever here; we reference the two tables four times, and so those tables will get read four times.

Enter window functions and the most common of them all, ROW_NUMBER(). We use PARTITION BY to give us a number sequence that resets per product, and then we order our customers by all our criteria in one go. The result is ranked customers, with the one coming first in our sequence being the choice we need. Finally we just filter those results to get those top picks.

USE AdventureWorksDW2017;

WITH CustomersOrdered AS
(
SELECT S.ProductKey, C.CustomerKey, C.TotalChildren, C.BirthDate
, ROW_NUMBER() OVER (
						PARTITION BY S.ProductKey 
						ORDER BY C.TotalChildren DESC, C.BirthDate DESC, C.CustomerKey ASC
					) AS CustomerSequence
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
)
SELECT ProductKey, CustomerKey
FROM CustomersOrdered
WHERE CustomerSequence = 1
ORDER BY ProductKey, CustomerKey;

All this happens in 17 lines of code rather than the 43 above, and the intention of the query is clearer as well.
As an added bonus the query runs faster: we go through the tables only once, admittedly with a big sort of the data, but the gains outweigh the losses by a long distance.

There are many more examples of where window functions make your queries better and faster; it’s something I’m hoping to cover once I’ve finished my blog series on querying XML. Until then, here are some links to some good reading material.

Using SQL Notebooks For Knowledge Sharing

Jupyter notebooks aren’t just there for the script commands you struggle to remember (although they’re pretty good for that too).

For this month’s T-SQL Tuesday #137, Steve Jones (b|t) asked us to share how we have used, or would like to use, Jupyter notebooks.

I don’t really think about Jupyter notebooks often. I’ve seen some nice demos of use cases for us SQL Server people (see this session by Rob Sewell for a few examples) but never given them much thought beyond that.

Which possibly shows how unglamorous they can be, given that I use one virtually every day at work.
Just the one though. It’s a PowerShell notebook, and as PowerShell can be used to run git commands I use it to hold the scripts that either take me too long to type out (like refreshing all the branches for our shared UAT and live environments in one go) or that I forget the syntax for because I don’t use it often enough and there’s no friendly button in Visual Studio to save me.
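
For a flavour, the cells are mostly small chains of git commands along these lines; the branch names below are placeholders for our real ones:

# Refresh local copies of the shared branches in one go
git fetch --all --prune
git checkout uat
git pull
git checkout live
git pull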

I thought I’d focus on something a little more interesting for this blog instead, and so went back to one of the use cases that interested me when I first encountered SQL notebooks: using them to combine both text and code into a nice neat demo of a T-SQL feature. When I first investigated this most of my ideas fell flat because each code block was considered a different session, so temporary tables couldn’t be carried between them; this prevented you building up a decent example data set and then using it in subsequent steps. The good news is that that issue has now been fixed, and so I thought I’d try converting my series of blog posts on querying XML in SQL Server into a set of notebooks.

The process was pleasantly simple; it’s just a matter of copying and pasting over, and most of the formatting for headings, bullet points and links carried over without issue. A few times I had to delve into the markdown code to strip out a rogue change of font, but each post took me 5-10 minutes to adapt.

You can access the results here. They’re even viewable online at GitHub, although the formatting of results tables isn’t quite as neat as in Azure Data Studio and there’s no way to run the sample code.

There’s a lot to be said for the approach: the long bits of code that set up the sample data can be hidden, and results can be optionally saved with the notebook. As long as you have access to SQL Server you can click to run the sample code or start to experiment by changing it.

There are a few downsides. The most obvious is that while temp tables work between blocks of code, IntelliSense isn’t smart enough to know this and so throws up a lot of red squiggly lines. I’d also love to be able to collapse result sets without wiping them completely, in the same way that you can do for code; sometimes I wanted the results to be there, but they were too long-winded to be filling up the page by default.
Limits come from Azure Data Studio as well. It’s not an issue for these posts, but the lack of query plans will likely send me running back to good old SSMS at some point in the future.

Overall I like the result. I have something that I can easily point people to that allows them to run my example code but it also allows for more control over the surrounding text than is possible in a classic SQL script. Given that it’s relatively easy to convert a blog post into a notebook I’m going to keep at it for now and where appropriate attach a notebook version to each blog post.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form