T-SQL Tuesday – Barney Lawrence

T-SQL Tuesday #168 – FIRST_VALUE, LAST_VALUE and NULL Handling

FIRST_VALUE and LAST_VALUE had limited uses when they were first released to the world back in 2012, but with the introduced of the ability to ignore nulls in SQL Server 2022 they become a lot more useful. Let’s take a look at some examples of why.

For this month’s T-SQL Tuesday Steve Jones (blog|twitter) asked us to share something neat using window functions. Who could resist?

Not me for certain. As a self confessed window function addict and regular presenter on the topic I couldn’t stop myself from sharing something.

Serendipitously I’ve been writing a query this week, not just with a window function but with window function features newly introduced with SQL Server 2022 and now available in Azure SQL DB and Managed Instances. What follows is a simplified and sanitised version of the code. Dial your server up to compatibility level 160 if you’d like to follow along!

The Problem With FIRST_VALUE and LAST_VALUE

FIRST_VALUE and LAST_VALUE functions were first introduced back in SQL Server 2012 but they didn’t see much use (or certainly not in my work) for a couple of reasons. Firstly because they didn’t do much with the default window window ranges, without dipping in to the ROWS or RANGES feature LAST_VALUE simply returns the values in the current row. More critically, as far as both functions were concerned the definition of “value” included null.

Take the table below, we record a value over time but often no record is available and we hit a NULL in the list. If I’m looking at row 12 I probably want my last value to be green from back on row 9 but depending on my window clause I’m likely to get either the value from row 12 (null), row 11(null) or possibly row 14 (you guessed it, null). Finding the last non null value in a table was possible but it wasn’t exactly straight forwards and would likely involve a couple of hours buried in Itzik Ben-Gan’s blogs. If you felt particularly daring you might end up taking some some dubious liberties with the query engine (hands up who remembers what a quirky update is).

Finding The Last Non Null Value – Enter SQL Server 2022

With SQL Server 2022 came a much requested additional feature added from the SQL standards – IGNORE_NULLS. You can probably guess what it does. Drop in IGNORE_NULLS after your function and you can blur the non null values over those gaps giving us results like this:

Definitely useful in itself where data is sporadic and you want to see whatever the most recent recorded value, in practice you’re likely to do this over multiple columns. The code for the above looks like this. I’m using the new WINDOW clause to define the window to act over. This has little impact on the size of the code for this example but if you need multiple columns with the same window it will save a lot of unneeded, messy repetition.

SELECT No, Colour,
LAST_VALUE(Colour) IGNORE_NULLS
	OVER MyWindow AS LAST_VALUE,
FROM #MyData
WINDOW 
MyWindow AS (
ORDER BY No
ROWS BETWEEN
UNBOUNDED PRECEDING
AND CURRENT ROW
)

Pivoting Data Over Time

But wait there’s more as they say in the adverts. LAST_VALUE is an aggregate function which means you can combine it with another of my favourite SQL techniques the crosstab/pivot query. Let’s see that in action.

I’m not a big fan of T-SQL’s PIVOT function, it’s too limiting and I can never remember the syntax. My preference has always been to embed an IIF statement inside an aggregate in order to control when the aggregate sees values and when it sees a NULL. Let’s look at an example building up in steps.

The below query takes a set of colours recorded over time. Applying an IIF for each colour we filter the results so we have a column that either contains the requested colour or a null value.

SELECT RecordDate, Colour,
IIF(Colour='Red’
	,RecordDate,NULL) 
	AS RedOnly,
IIF(Colour='Green’
	,RecordDate,NULL) 
	AS GreenOnly,
IIF(Colour='Blue’
	,RecordDate,NULL) 
	AS BlueOnly
FROM #MyData

Giving us a result set looking like the one below. You may notice that each column looks a lot like our single column example from above.

So let’s combine the two and wrap those IIFs inside a LAST_VALUE.

SELECT RecordDate, Colour,
LEAD(RecordDate) 
	OVER MyOrder AS NextDate,
LAST_VALUE(
	IIF(Colour='Red',RecordDate,NULL)
	) IGNORE NULLS
	OVER MyWindow AS LastSeenRed,
LAST_VALUE(
	IIF(Colour='Green',RecordDate,NULL)
	) IGNORE NULLS
	OVER MyWindow AS LastSeenGreen,
LAST_VALUE(
	IIF(Colour='Blue',RecordDate,NULL)
	) IGNORE NULLS
	OVER MyWindow AS LastSeenBlue
FROM #MyData
WINDOW 
MyOrder AS (
	ORDER BY RecordDate
),
MyWindow AS (
MyOrder
	ROWS BETWEEN
	UNBOUNDED PRECEDING
	AND CURRENT ROW
)

Giving us a final set of results looking like this.

Now every gap is filled and by sneaking in a lead function on the date field we have a set of data with continuous date ranges showing the last seen date for every colour over time.
There’s plenty of room for variations around this concept, anywhere that there’s gaps in records or intermittent data flows this approach can come to your rescue. I’d love to hear about other use cases people have encountered. Feel free to chip in down in the comments.

T-SQL Tuesday #140: Containers for Business and Pleasure

Containers aren’t always for boring work stuff (although they’re great for that).
This post gives some details on how to set up Docker from scratch and create containers for both a SQL Server test lab and a Minecraft Bedrock Server.

For this month’s T-SQL Tuesday Anthony Nocentino (blog|twitter) asked us to share our experiences of using containers.

Not being a DBA type containers aren’t what you would consider my home ground. My work usually starts once an environment is in place and running but lacking my own personal DBA I’ve started to dabble with Docker on my personal laptop. Partly this is to make my life easier but also partly because I expect that sooner or later containerised development environments will be the norm and a bit of experimentation now will hopefully keep me ahead of the game.

To show the variety of things you can get up to with a container I’m going to share not one but two examples; Setting up a SQL Server container to support experimentation and testing code for blog posts and, on the other side of my work life balance setting up a Minecraft server to enable the family to play together.

None of this is my area of expertise and I’ve never had the Linux\Unix experience that would make this easy going but to some extent I see that as an advantage, there are some giants in the field who know far more than I ever will. I’m just going to show the rest of us how you can cling precariously on to their shoulders and get stuff done.

How To Run SQL Server In A Docker Container

Docker is now my preferred route to running SQL Server on a personal machine. It saves the need for a direct install (and potential of an attempted uninstall), it’s a smoother process than a full fat VM type setup with the need to install and configure a whole OS (I do miss the days when Microsoft provided a complete image with a 180 day time limit). It has a relatively light footprint and, gives you room to experiment easily with multiple versions. The only real downside is we’re looking at the Linux version of SQL Server so if you want to work with either SSAS or SSIS you might need to look to the alternatives.

My needs are relatively simple, a recent version of SQL Server, access to the most common sample data sets and to avoid having to rebuild and reinstall on a regular basis. The below gives me all that and it’s been everything I’ve needed while creating and testing my series of posts on querying XML with SQL Server.

I’m not going to retread ground here and as I hinted above this isn’t really my work so I’m going to share links and give as much credit as I can, if I’ve missed someone do let me know and I’ll get this updated. I’ve been through a few iterations of setting up these containers and this is the route that I’m happiest with to date.

Installing Docker

Go get Docker from their web site and install it if you’re running on Windows you will want to be using WSL2, if this is your very first time with Docker you’ll probably need to some details from Microsoft’s guidance on that. I installed the Ubuntu 18 Linux distribution as that appears to be the current preferred option for Docker.

Changing Default Local Storage Location

WSL2 by default puts all data into virtual drive files on your system (c:) drive. This may be a problem if you have a small boot drive and a big data drive like I do. I want to handle some big databases like the Stack Overflow data sets many like to use for query examples.
This answer over on Stack Overflow gives the step by step commands to move everything over to the location of your choice.

Setting Up a SQL Server Container

I’ve tried a few approaches for creating an SQL Server Container and the one that balances ease of creation with covering long term needs comes from the cupboard of Container King and all round nice guy Andrew Pruski (b|t). His guide to SQL Server and Containers gives you pretty much everything you need to get up and running quickly. I’d recommend putting the effort in to follow his more advanced Docker Compose example as it solves a few permissions issues you might encounter when taking simpler approaches and splits folders into neat volumes similar to those you’d see on a physical install.

My current setup is almost exactly the same as this only I’ve switched the container port to 1433 which is the default SSMS looks for. This saves me remembering the correct number when connecting. I also changed the password from Andrew’s default, it’s probably best we don’t all have the same password (make sure it’s complex enough, very sort simple passwords may be rejected and cause the process to fail).

If you’ve followed all of the above then by this point you’ll have an empty instance of SQL Server in a container. Connect up and give it a few test queries, if all you need is an empty server you can stop now.

Adding Some Data

Now we have our server we need to add some data. With WSL2 we have a route to access the drive volumes we’ve created from in windows but finding them can be tricky. The path I use with the above setup is:

\\wsl$\docker-desktop-data\version-pack-data\community\docker\volumes

You can then see sub folders for all the volumes you have created with Docker. You can then copy backups of your preferred data sets into the backup folder (I’m using varieties of Adventure Works and World Wide Importers) and restore as you would anywhere else.

Alternatively copy mdf, ndf and ldf files into the respective data and log folders and attach them (I did this for the Stack Overflow samples Brent Ozar maintains).
I had a few troubles with these, my first attempt with this approach left me with a read only database. This may have been because I copied the files directly from another container volume or it might be something that happens every time. I ended up having to adapt the command from Andrew’s setup that gives permissions to his SQL containers again against these file, detaching and re-attaching.

chown -R mssql:mssql /var/opt/sqlserver/[Whatever the file is that's causing the problems]

And we’re done, we have a shiny SQL Server 2019 instance in Docker, it starts when I need it, I can stop it when I don’t and if I need to upgrade I just re-build the container and all that sample data lives on in the volumes. Those visiting this page in a strictly professional capacity can stop reading now.

How to Run a Minecraft Bedrock Server In Docker

Many of you out there may, like me, find yourselves sharing a home with a miniature Minecraft Addict. I do and in between showing off her latest builds she’s begun pestering me to find a way to play with friends and family. I figured I could maybe save some work and re-use my newfound containerised powers for fun as well as profit and a few quick searches proved me right.

Here’s the container I’ve been using I don’t know who ‘itzg’ is but he’s a hero for setting this up. The server version used is a little out of date but it self updates and thanks to the use of a volume for data it remembers updates, configurations and maps so you don’t lose anything between sessions.

I found a few issues with the docker-compose.yml file so here’s my revised version with a few tweaks, extras and, a nice world seed that drops you in on the edge of a village for added in game friendship. If you didn’t get that SQL Server container up and running in the previous section then follow the steps on installation and setting the storage location before coming back here.

version: '3.8'

services:
  Minecraft:
    image: "itzg/minecraft-bedrock-server"
    environment:
      EULA: "TRUE"
      GAMEMODE: survival
      DIFFICULTY: normal
      LEVEL_SEED: "-850418298"
      ALLOW_CHEATS: "false"
      SERVER_NAME: "My Server Name"
      VIEW_DISTANCE: "64"
      LEVEL_NAME: "My Survival World"
     #  WHITE_LIST: "true"
      DEFAULT_PLAYER_PERMISSION_LEVEL: member
    ports:
      - 19132:19132/udp
    volumes:
      - bds:/data
    stdin_open: true
    tty: true

volumes:
  bds: {}

That’s all you need for a local network game. If you want to play with others than you’ll need to work out how to forward port 19132 to the computer running the container and possibly figure out how to set up the whitelist file in the volume created so there’s no risks of unexpected strangers dropping in on you.

Have fun and look out for those creepers!

Using SQL Notebooks For Knowledge Sharing

Jupyter notebooks aren’t just there for the script commands you struggle to remember (although they’re pretty good for that too).

For this month’s T-SQL Tuesday #137 Steve Jones (b|t) asked to share how we have used, or would like to use, Jupyter notebooks.

I don’t really think about Jupyter Notebooks often, I’ve seen some nice demos of use cases for us SQL Server people (See this session by Rob Sewell for a few examples) but never given them much thought beyond that.

Which possibly shows how unglamorous they can be given that I use one virtually every day at work.
Just the one though, it’s a PowerShell notebook and as PowerShell can be used to run git commands I use it to hold the scripts that either take me too long to type out (like refreshing all the branches for our shared UAT and live environments in one go) or that I forget the syntax for because I don’t use it often enough and there’s no friendly button in Visual Studio to save me.

I thought I’d focus on something a little more interesting for this blog instead and so went back to one of the use cases that interested me when I first encountered SQL Notebooks. Using them to combine both text and code into a nice neat demo of a T-SQL feature. When I first investigated this most of my ideas fell flat because each code block was considered a different session and so temporary tables couldn’t be carried between them, this prevented you building up a decent example data set and then using it in subsequent steps. The good news is that that issue has now been fixed and so I thought I’d try converting my series of blog posts on querying XML in SQL Server into a set of notebooks.

The process was pleasantly simple, it’s just a matter of copying and pasting over and most of the formatting for headings, bullet points and links carried over without issue. A few times I had to delve into the markdown code to strip out a rogue change of font but each post took me 5-10 minutes to adapt.

You can access the results here they’re even viewable online at GitHub although the formatting of results tables isn’t quite as neat as in Azure Data Studio and there’s no way to run the sample code.

There’s a lot to be said for the approach, the long bits of code that set up the sample data can be hidden and results can be optionally saved with the notebook. As long as you have access to SQL Server you can click to run the sample code or start to experiment by changing it.

There are a few down sides. The most obvious is that while temp tables work between blocks of code, intellisense isn’t smart enough to know this and so throws up a lot of red wiggly lines. I’d also love to be able to collapse results sets without wiping them completely in the same way that you can do for code. Sometimes I wanted the results to be there but they were too long winded to want them to be filling up the page by default.
Limits come from Azure Data Studio as well, it’s not an issue for these posts but the lack of query plans will likely send me running back to good old SSMS at some point in the future.

Overall I like the result. I have something that I can easily point people to that allows them to run my example code but it also allows for more control over the surrounding text than is possible in a classic SQL script. Given that it’s relatively easy to convert a blog post into a notebook I’m going to keep at it for now and where appropriate attach a notebook version to each blog post.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

I’ve Got 99.0991 Problems But Floating Point Precision ‘Aint One

In which the blogger finds an excuse to use a title he’s been holding on to for some time and we learn why float probably isn’t the best data type to hold your numerical data in.

For this week’s T-SQL Tuesday, Brent Ozar (b|t) asked us to share either our most or least favorite SQL Server data type. I can’t confess to having any particularly strong feelings on the subject in either direction even if you take into account my keenness to add the number 2 on the end every time I see a datetime.

I do however struggle to resist the opportunity to use a catchy title and this week’s subject was enough to finally give me the motivation to put the furloughed part of my current four day working week to good use and set up a blog.

The float data type can often look like a tempting, easy choice for numerical data types. Why worry about all that precision and scale nonsense when you can just use a data type that will hold literally* any number!

*do not take this claim literally

Let’s test this out by counting problems. We’ll start in a relaxed state and steadily increase the pressure level until we reach 99 problems (one metric Jay-Z).

DECLARE @Problems float(24) = 0.0;

While NOT(@Problems > 99)
BEGIN
	SET @Problems += 0.1
END;

Print 'I''ve got ' + CAST(@Problems as varchar(10)) + ' problems but floating point precision ''aint one';

Unfortunately we end up a little way over 99 at 99.0991 (henceforth known as an imperial Jay-Z).

Why? Because accuracy isn’t the main aim of a float, it’s a way of storing something that’s roughly the right number in a small space, in a way computers find easy to process quickly, there are values that it can get really close to but never accurately store. That’s great if you’re not that worried about precision (imagine streams of IOT readings with an inherent error to them), it’s less good if you want the exact right number (think applying interest rates to your savings).

You might also be caught out by code like this:

DECLARE @Problems float(24) = 0.0;

While @Problems <> 99
BEGIN
	SET @Problems += 0.1;
	PRINT 'I''ve got ' + CAST(@Problems as varchar(10)) + ' problems but floating point precision ''aint one';
END;

It looks innocent enough but because we never actually hit a value of 99 on the nose the loop never ends, unless you want unending loops running on your server then that’s a Bad Thing™

Is using a floating point data type the end of the world? Probably not, most likely you’ll just sooner or later waste a few hours working out why the percentages on a report don’t total to 100.
Probably the biggest benefit I get as a senior developer from the float datatype is code smell. It’s a nice clear flag that that code possibly hasn’t been thought through as carefully as it should and I probably want to take a closer look.

Thanks for reading and do come back in the future, I’m planning on starting up a series over the next few weeks on another data type I’ve been spending a lot of time with lately at work – XML.

Further Reading

Bert Wagner talks about how to recreate floating point maths in SQL.
Check out SQL Server’s data type precedencies to see where you might get an implicit conversion to a float.
Phil Factor gives his take on floating point numbers in SQL Server.
Hugo Kornelis provides a great counter point in his post discussing when float is the best choice to use T-SQL Tuesday #136 – Float does NOT suck!