Using SQL Notebooks For Knowledge Sharing

Jupyter notebooks aren’t just there for the script commands you struggle to remember (although they’re pretty good for that too).


For this month’s T-SQL Tuesday #137, Steve Jones (b|t) asked us to share how we have used, or would like to use, Jupyter notebooks.

I don’t really think about Jupyter notebooks often. I’ve seen some nice demos of use cases for us SQL Server people (see this session by Rob Sewell for a few examples) but I’ve never given them much thought beyond that.

Which possibly shows how unglamorous they can be, given that I use one virtually every day at work.
Just the one though. It’s a PowerShell notebook and, as PowerShell can be used to run git commands, I use it to hold the scripts that either take me too long to type out (like refreshing all the branches for our shared UAT and live environments in one go) or that I forget the syntax for because I don’t use them often enough and there’s no friendly button in Visual Studio to save me.

I thought I’d focus on something a little more interesting for this blog instead, and so went back to one of the use cases that interested me when I first encountered SQL Notebooks: using them to combine both text and code into a nice neat demo of a T-SQL feature. When I first investigated this, most of my ideas fell flat because each code block was treated as a separate session, so temporary tables couldn’t be carried between them. That prevented you from building up a decent example data set and then using it in subsequent steps. The good news is that the issue has now been fixed, so I thought I’d try converting my series of blog posts on querying XML in SQL Server into a set of notebooks.
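As a rough sketch of what that fix allows, here are two code cells from a hypothetical notebook. The #Orders table, its columns and the sample XML are invented for illustration and aren’t taken from the actual posts.

```sql
-- Cell 1: build the sample data once
-- (#Orders and its contents are illustrative assumptions)
CREATE TABLE #Orders
(
    OrderId  int PRIMARY KEY,
    OrderXml xml NOT NULL
);

INSERT INTO #Orders (OrderId, OrderXml)
VALUES (1, N'<order customer="Alice"><item sku="A1" qty="2" /></order>'),
       (2, N'<order customer="Bob"><item sku="B7" qty="1" /></order>');

-- Cell 2: a later code cell queries the same temp table,
-- which only works if both cells share one session
SELECT OrderId,
       OrderXml.value('(/order/@customer)[1]', 'nvarchar(50)') AS Customer,
       OrderXml.value('(/order/item/@qty)[1]', 'int')          AS Quantity
FROM #Orders;
```

With a text cell between the two explaining what the query does, the set-up code can be collapsed out of the way while the query stays front and centre.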

The process was pleasantly simple. It’s just a matter of copying and pasting, and most of the formatting for headings, bullet points and links carried over without issue. A few times I had to delve into the markdown code to strip out a rogue change of font, but each post took me 5-10 minutes to adapt.

You can access the results here. They’re even viewable online at GitHub, although the formatting of results tables isn’t quite as neat as in Azure Data Studio and there’s no way to run the sample code.

There’s a lot to be said for the approach. The long bits of code that set up the sample data can be hidden, and results can optionally be saved with the notebook. As long as you have access to SQL Server you can click to run the sample code or start to experiment by changing it.

There are a few downsides. The most obvious is that while temp tables work between blocks of code, IntelliSense isn’t smart enough to know this and so throws up a lot of red wiggly lines. I’d also love to be able to collapse result sets without wiping them completely, in the same way that you can for code. Sometimes I wanted the results to be there, but they were too long to have filling up the page by default.
Some limits come from Azure Data Studio itself. It’s not an issue for these posts, but the lack of query plans will likely send me running back to good old SSMS at some point in the future.

Overall I like the result. I have something I can easily point people to that lets them run my example code, while giving me more control over the surrounding text than is possible in a classic SQL script. Given that it’s relatively easy to convert a blog post into a notebook, I’m going to keep at it for now and, where appropriate, attach a notebook version to each blog post.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

I’ve Got 99.0991 Problems But Floating Point Precision ‘Aint One

In which the blogger finds an excuse to use a title he’s been holding on to for some time and we learn why float probably isn’t the best data type to hold your numerical data in.


For this week’s T-SQL Tuesday, Brent Ozar (b|t) asked us to share either our most or least favorite SQL Server data type. I can’t confess to having any particularly strong feelings on the subject in either direction even if you take into account my keenness to add the number 2 on the end every time I see a datetime.

I do, however, struggle to resist the opportunity to use a catchy title, and this week’s subject was enough to finally give me the motivation to put the furloughed part of my current four-day working week to good use and set up a blog.

The float data type can often look like a tempting, easy choice for numerical data. Why worry about all that precision and scale nonsense when you can just use a data type that will hold literally* any number?

*do not take this claim literally

Let’s test this out by counting problems. We’ll start in a relaxed state and steadily increase the pressure level until we reach 99 problems (one metric Jay-Z).

DECLARE @Problems float(24) = 0.0;

-- Keep adding 0.1 until we have passed 99
WHILE NOT (@Problems > 99)
	SET @Problems += 0.1;

PRINT 'I''ve got ' + CAST(@Problems AS varchar(10)) + ' problems but floating point precision ''aint one';

Unfortunately we end up at 99.0991 (henceforth known as an imperial Jay-Z) rather than the clean 99.1 that exact arithmetic would give.

Why? Because accuracy isn’t the main aim of a float. It’s a way of storing something that’s roughly the right number in a small space, in a way computers find easy to process quickly. There are values that it can get really close to but never store exactly, and 0.1 is one of them, so every pass through the loop adds a tiny error. That’s great if you’re not that worried about precision (imagine streams of IoT readings with an inherent error to them). It’s less good if you want the exact right number (think applying interest rates to your savings).
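For comparison, here is a minimal sketch of the same loop using an exact numeric type. The choice of decimal(5, 1) is my assumption for this example (one decimal place, room for values up to 9999.9), not a general recommendation for precision and scale.

```sql
-- Same loop as before, but with an exact numeric type
-- decimal(5, 1) is an assumed precision/scale chosen for this demo
DECLARE @Problems decimal(5, 1) = 0.0;

-- Each addition of 0.1 is stored exactly, so no error accumulates
WHILE NOT (@Problems > 99)
    SET @Problems += 0.1;

-- Reports 99.1: the first exact multiple of 0.1 that is greater than 99
PRINT 'I''ve got ' + CAST(@Problems AS varchar(10)) + ' problems';
```

The loop still steps past 99 because of how the condition is written, but it lands on a clean multiple of 0.1 instead of drifting away from it.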

You might also be caught out by code like this:

DECLARE @Problems float(24) = 0.0;

-- @Problems never equals exactly 99, so this loop never terminates
WHILE @Problems <> 99
	SET @Problems += 0.1;

-- Never reached
PRINT 'I''ve got ' + CAST(@Problems AS varchar(10)) + ' problems but floating point precision ''aint one';

It looks innocent enough, but because we never actually hit a value of 99 on the nose the loop never ends. Unless you want unending loops running on your server, that’s a Bad Thing™.
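If you’re stuck with a float, one way to sidestep the trap is to guard the loop with an inequality rather than testing for an exact value. This is only a sketch of that idea; it makes the loop terminate but does nothing about the accumulated rounding error.

```sql
DECLARE @Problems float(24) = 0.0;

-- An inequality still stops the loop once the value passes 99,
-- even though the value never lands exactly on 99
WHILE @Problems < 99
    SET @Problems += 0.1;

-- Reached this time, though the total still carries the rounding error
PRINT 'I''ve got ' + CAST(@Problems AS varchar(10)) + ' problems';
```

Because the value only ever goes up, the guard is bound to become false eventually, which is exactly what the equality test could not promise.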

Is using a floating point data type the end of the world? Probably not. Most likely you’ll just, sooner or later, waste a few hours working out why the percentages on a report don’t total 100.
Probably the biggest benefit I get as a senior developer from the float data type is as a code smell. It’s a nice clear flag that the code possibly hasn’t been thought through as carefully as it should have been, and that I probably want to take a closer look.

Thanks for reading, and do come back in the future. I’m planning on starting up a series over the next few weeks on another data type I’ve been spending a lot of time with lately at work: XML.

Further Reading