PASS Summit Favourites (Part 1 of Many)

PASS Summit has been great this year and I’ve been honoured with the chance to both attend and speak.
This post is my ongoing highlights with links out to the videos of sessions I’ve enjoyed and learned from.

10 Productivity Tools

Rob Sewell and John Morehouse take 10 minutes to race through 10 useful tools to help with your development. A really fun session to watch, and by the end you’ll certainly be heading towards the download page for a few of these suggestions.

10 minutes into SQL Notebooks

Another 10-minute session, this time from Deepthi Goguri on notebooks. In summary: notebooks are a great way to merge code, results and explanations, and you should probably be using them yourself. Watch to find out why.

An Introduction to Delta Lakes and Delta Lake-Houses

Paul Andrew knows both his lakes and his houses. This is the best primer on the subject I’ve watched so far. Great to watch both for those new to the subject and those looking for a good clear introduction to share with others.

Business Benefits of Good Governance

Victoria Holt finds Data Governance exciting and I’m really glad there are people who do; it’s important! Well worth a watch to find out why and what to do about it.

MS SQL New Functions, Syntaxes, Tips & Tricks

If you know me then you probably know that I’m a big T-SQL fan and that I’ll be in my element with a session title like the one above. Damir Matešić nails it in this session covering the key new T-SQL features added since SQL 2016.

Modern Data Warehousing with Azure Synapse Analytics

A few entries up I said Paul Andrew had the best primer on lakes and lakehouses. The good news for us is he has competition! Unfortunately there are technical issues with the video at present (it cuts off at about 13 minutes) but having listened to Ginger Grant’s live Q&A session I’m sure that as soon as it’s fixed her session on a similar topic will be just as good.

Practical Experiences from Working with Synapse

Sessions that give real world experience are always valuable, and this session from Brian Bønk and Mathias Halkjær shows why. Synapse can be intimidating, and stories from those who have hit pain points, explaining how to avoid them, are always welcome.

Data Transformation Magic with Power Query

For me personally this session didn’t teach me many new tricks, but it was a really good worked example of how to put together a data extract with Power Query. I can see myself sharing this with others as a resource if/when it comes out from behind the Summit paywall. Watch it here.

Building a Regret-free Foundation for your Data Factory

I didn’t know I needed Meagan Longoria’s session until I started watching it but it’s one I’m sure I’ll be coming back to over and over as I start on new projects. One of the best things a session can do is provide warnings from past experience to help others avoid mistakes they’ll later regret. This session is over an hour’s worth of suggestions to make sure that the next person (usually future you) doesn’t regret what you did in Data Factory.

The Dream Team: Synapse Analytics Serverless SQL Pools and Pipelines

Andy Cutler takes us through how to make the most of serverless pools and pipelines in Synapse. As an old school SSIS and SQL Server type BI developer who’s starting in on Synapse projects, I can see myself having this session running on repeat over the next few weeks.

An Introduction to Window Functions

What’s this Barney Lawrence guy doing on this list? Perhaps it’s a little self-indulgent but presenting at PASS Summit is always going to be a highlight and I’m not going to miss a chance to promote my own session. Feedback has been good so far and I’m pleased with what I’ve produced.

And More…

I aim to add to this list over the next few weeks. There are several sessions I haven’t had time to watch yet that I’m sure will be brilliant, and PASS have promised to make many of the sessions that weren’t live streamed available in the near future.

SQLBits Buddies

Looking to get the most out of SQLBits? Why not find yourself a Bits Buddy?

Big conferences like SQLBits can be intimidating; they can feel like you’re walking into a room of a thousand strangers, all of whom have known each other for years. That at least was how my first SQLBits felt. I’d driven down to Telford early one Saturday morning and slipped into a session about five minutes after it had started, dropped in to a few more, awkwardly ate lunch in a corner, talked briefly to the one former colleague I recognised and then headed for home. It wasn’t terrible, but equally I didn’t feel like I’d had the full SQLBits experience.

The year after I was attending for the full conference and determined to get the most out of it. I knew a few people from other events and had managed to secure the funding to bring along a few of my team from work.
Keen to make an impact early, we’d travelled into central London the day before the conference for a Q&A session with some of the Microsoft team. I figured now was the time to put on my best impersonation of a sociable individual, and headed over to someone who occasionally dropped into the local user group and who was chatting with an unfamiliar face in the corner.
Reaching a suitable pause, I mentioned that I was looking forward to my training days and how exciting it was having face to face training with someone whose book had sat on my desk for the last year, and enquired of my acquaintance’s conversation partner (I’d failed to catch his name) which sessions he was booked in to. He politely let me know his name and explained he was leading one of the training days, mine in fact.

Chances are you won’t have to put your foot in it quite as far as I did to start up a conversation, but to make getting to know a few more people at SQLBits even easier this year they’ve introduced the Bits Buddies team, and I’m one of them. We’ll be sporting the usual volunteer’s orange shirt but with an extra armband to show that we’re extra keen to have a chat, help find you a team for the Thursday night quiz, or point you towards others who are heading out in the evening for your food or drink of preference. Do say hello if you see us about between sessions or hanging out at the community corner. There are also online drop-in sessions planned for the next week if you have any questions beforehand.

You’re also welcome to get in touch with me in advance; you can find me on Twitter or LinkedIn, or keep an eye out for me in person in London. I’ll talk about pretty much anything, but if you’re after a conversation starter, I’m a BI developer by trade with time working in the NHS, social care, and travel industries. Outside of work, subjects that will likely spark an interest are films, running, living with young humans (3 & 9), comics, the joys and pains of adopting a child, board games and dad jokes.
If none of those are your thing, then pick another subject, or alternatively there are plenty more of us to choose from.

T-SQL Tuesday #140: Containers for Business and Pleasure

Containers aren’t always for boring work stuff (although they’re great for that).
This post gives some details on how to set up Docker from scratch and create containers for both a SQL Server test lab and a Minecraft Bedrock Server.

T-SQL Tuesday Logo

For this month’s T-SQL Tuesday Anthony Nocentino (blog|twitter) asked us to share our experiences of using containers.

Not being a DBA type, containers aren’t what you would consider my home ground. My work usually starts once an environment is in place and running, but lacking my own personal DBA I’ve started to dabble with Docker on my personal laptop. Partly this is to make my life easier, but also partly because I expect that sooner or later containerised development environments will be the norm, and a bit of experimentation now will hopefully keep me ahead of the game.

To show the variety of things you can get up to with a container I’m going to share not one but two examples: setting up a SQL Server container to support experimentation and testing code for blog posts, and, on the other side of my work-life balance, setting up a Minecraft server to enable the family to play together.

None of this is my area of expertise and I’ve never had the Linux/Unix experience that would make this easy going, but to some extent I see that as an advantage. There are some giants in the field who know far more than I ever will; I’m just going to show the rest of us how you can cling precariously on to their shoulders and get stuff done.

How To Run SQL Server In A Docker Container

Docker is now my preferred route to running SQL Server on a personal machine. It saves the need for a direct install (and the potential of an attempted uninstall), and it’s a smoother process than a full fat VM type setup with the need to install and configure a whole OS (I do miss the days when Microsoft provided a complete image with a 180 day time limit). It has a relatively light footprint and gives you room to experiment easily with multiple versions. The only real downside is we’re looking at the Linux version of SQL Server, so if you want to work with either SSAS or SSIS you might need to look to the alternatives.
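For a quick taste of how light the process is, a single docker run is enough to stand up a disposable instance before you commit to the fuller setup below (a minimal sketch; the container name, image tag and password here are just examples):

docker run -d --name sql2019 -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=Y0urStr0ng!Passw0rd" -p 1433:1433 mcr.microsoft.com/mssql/server:2019-latest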

My needs are relatively simple, a recent version of SQL Server, access to the most common sample data sets and to avoid having to rebuild and reinstall on a regular basis. The below gives me all that and it’s been everything I’ve needed while creating and testing my series of posts on querying XML with SQL Server.

I’m not going to retread ground here, and as I hinted above this isn’t really my work, so I’m going to share links and give as much credit as I can; if I’ve missed someone do let me know and I’ll get this updated. I’ve been through a few iterations of setting up these containers and this is the route that I’m happiest with to date.

Installing Docker

Go get Docker from their web site and install it. If you’re running on Windows you’ll want to be using WSL2; if this is your very first time with Docker you’ll probably need some details from Microsoft’s guidance on that. I installed the Ubuntu 18 Linux distribution as that appears to be the current preferred option for Docker.

Changing Default Local Storage Location

WSL2 by default puts all data into virtual drive files on your system (C:) drive. This may be a problem if you have a small boot drive and a big data drive like I do, and I want to handle some big databases like the Stack Overflow data sets many like to use for query examples.
This answer over on Stack Overflow gives the step by step commands to move everything over to the location of your choice.
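For reference, the gist of that answer is an export and re-import of the data distribution with wsl.exe (a sketch; the D:\ paths are just examples, point them at wherever your big drive lives):

wsl --shutdown
wsl --export docker-desktop-data "D:\wsl\docker-desktop-data.tar"
wsl --unregister docker-desktop-data
wsl --import docker-desktop-data "D:\wsl\docker-desktop-data" "D:\wsl\docker-desktop-data.tar" --version 2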

Setting Up a SQL Server Container

I’ve tried a few approaches for creating an SQL Server Container and the one that balances ease of creation with covering long term needs comes from the cupboard of Container King and all round nice guy Andrew Pruski (b|t). His guide to SQL Server and Containers gives you pretty much everything you need to get up and running quickly. I’d recommend putting the effort in to follow his more advanced Docker Compose example as it solves a few permissions issues you might encounter when taking simpler approaches and splits folders into neat volumes similar to those you’d see on a physical install.

My current setup is almost exactly the same as his, except I’ve switched the container port to 1433, which is the default SSMS looks for; this saves me remembering the correct number when connecting. I also changed the password from Andrew’s default, as it’s probably best we don’t all have the same password (make sure it’s complex enough; very short, simple passwords may be rejected and cause the process to fail).
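For the curious, the shape of the result is something like the sketch below. This isn’t Andrew’s file verbatim; the volume names, password and image tag are my own illustrative choices, with the MSSQL_* directory settings pointing the instance at the mounted volumes:

version: '3.8'

services:
  sqlserver:
    image: "mcr.microsoft.com/mssql/server:2019-latest"
    environment:
      ACCEPT_EULA: "Y"
      MSSQL_SA_PASSWORD: "Pick-Your-0wn-Complex-Password!"
      MSSQL_DATA_DIR: "/var/opt/sqlserver/data"
      MSSQL_LOG_DIR: "/var/opt/sqlserver/log"
      MSSQL_BACKUP_DIR: "/var/opt/sqlserver/backups"
    ports:
      - 1433:1433
    volumes:
      - sqldata:/var/opt/sqlserver/data
      - sqllog:/var/opt/sqlserver/log
      - sqlbackups:/var/opt/sqlserver/backups

volumes:
  sqldata: {}
  sqllog: {}
  sqlbackups: {}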

If you’ve followed all of the above then by this point you’ll have an empty instance of SQL Server in a container. Connect up and give it a few test queries; if all you need is an empty server you can stop now.

Adding Some Data

Now we have our server we need to add some data. With WSL2 we have a route to access the drive volumes we’ve created from Windows, but finding them can be tricky. The path I use with the above setup is:

\\wsl$\docker-desktop-data\version-pack-data\community\docker\volumes

There you’ll see subfolders for all the volumes you have created with Docker. You can then copy backups of your preferred data sets into the backup folder (I’m using varieties of Adventure Works and World Wide Importers) and restore as you would anywhere else.
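A restore then looks just like it would on any Linux install. Here’s a sketch using the paths from the setup above; run RESTORE FILELISTONLY against the backup first if you’re unsure of the logical file names, which vary between the sample databases:

RESTORE DATABASE AdventureWorksDW2017
FROM DISK = '/var/opt/sqlserver/backups/AdventureWorksDW2017.bak'
WITH MOVE 'AdventureWorksDW2017' TO '/var/opt/sqlserver/data/AdventureWorksDW2017.mdf',
	MOVE 'AdventureWorksDW2017_log' TO '/var/opt/sqlserver/log/AdventureWorksDW2017_log.ldf';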

Alternatively, copy mdf, ndf and ldf files into the respective data and log folders and attach them (I did this for the Stack Overflow samples Brent Ozar maintains).
I had a few troubles with these; my first attempt with this approach left me with a read-only database. This may have been because I copied the files directly from another container volume, or it might be something that happens every time. I ended up having to re-run the command from Andrew’s setup that grants permissions to his SQL containers, this time against these files, then detaching and re-attaching.

# run inside the container as root, e.g. docker exec -u root -it <container name> bash
chown -R mssql:mssql /var/opt/sqlserver/[Whatever the file is that's causing the problems]
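
With the permissions sorted, the attach itself is standard T-SQL (a sketch; the file names here are examples based on the Stack Overflow 2010 sample):

CREATE DATABASE StackOverflow2010
ON (FILENAME = '/var/opt/sqlserver/data/StackOverflow2010.mdf'),
	(FILENAME = '/var/opt/sqlserver/log/StackOverflow2010_log.ldf')
FOR ATTACH;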

And we’re done: we have a shiny SQL Server 2019 instance in Docker. It starts when I need it, I can stop it when I don’t, and if I need to upgrade I just rebuild the container and all that sample data lives on in the volumes. Those visiting this page in a strictly professional capacity can stop reading now.

How to Run a Minecraft Bedrock Server In Docker

Many of you out there may, like me, find yourselves sharing a home with a miniature Minecraft addict. I do, and in between showing off her latest builds she’s begun pestering me to find a way to play with friends and family. I figured I could maybe save some work and re-use my new-found containerised powers for fun as well as profit, and a few quick searches proved me right.

Here’s the container I’ve been using. I don’t know who ‘itzg’ is, but he’s a hero for setting this up. The server version used is a little out of date, but it self-updates, and thanks to the use of a volume for data it remembers updates, configurations and maps, so you don’t lose anything between sessions.

I found a few issues with the docker-compose.yml file, so here’s my revised version with a few tweaks, extras and a nice world seed that drops you in on the edge of a village for added in-game friendship. If you didn’t get that SQL Server container up and running in the previous section, then follow the steps on installation and setting the storage location before coming back here.

version: '3.8'

services:
  Minecraft:
    image: "itzg/minecraft-bedrock-server"
    environment:
      EULA: "TRUE"
      GAMEMODE: survival
      DIFFICULTY: normal
      LEVEL_SEED: "-850418298"
      ALLOW_CHEATS: "false"
      SERVER_NAME: "My Server Name"
      VIEW_DISTANCE: "64"
      LEVEL_NAME: "My Survival World"
      # WHITE_LIST: "true"
      DEFAULT_PLAYER_PERMISSION_LEVEL: member
    ports:
      - 19132:19132/udp
    volumes:
      - bds:/data
    stdin_open: true
    tty: true

volumes:
  bds: {}

That’s all you need for a local network game. If you want to play with others then you’ll need to work out how to forward port 19132 to the computer running the container, and possibly figure out how to set up the whitelist file in the volume created so there’s no risk of unexpected strangers dropping in on you.
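For the whitelist, flip the WHITE_LIST setting in the compose file above to "true" and drop a whitelist.json into the data volume. As far as I can tell the format is along these lines (the gamertag is a placeholder):

[
  {
    "ignoresPlayerLimit": false,
    "name": "SomeGamertag"
  }
]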

Have fun and look out for those creepers!

T-SQL Tuesday 138: Managing Technology Changes

How Do You Respond When Technology Changes Under You? Grab it with both hands and cling on tight!

T-SQL Tuesday Logo

In this month’s T-SQL Tuesday Andy Leonard (b|t) asks How Do You Respond When Technology Changes Under You?
My response would be that, wherever possible, you should embrace it. Despite inevitable teething troubles most software changes are for the better, and while you probably can’t prevent change you can make sure it doesn’t leave you behind.

One aspect of T-SQL that everyone should embrace is window functions. They were first added in SQL Server 2005 with simple ranking functions and have been built on since then with features such as greater control over window size and shape and neat functions like LAG and LEAD.
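As a quick taste of the non-ranking side of the family, LAG reaches back to a previous row in the window. This sketch, against the same AdventureWorksDW sample used below, pairs each sale of a product with the amount of the sale before it:

SELECT S.ProductKey, S.OrderDate, S.SalesAmount
, LAG(S.SalesAmount) OVER (
						PARTITION BY S.ProductKey
						ORDER BY S.OrderDate
					) AS PreviousSalesAmount
FROM dbo.FactInternetSales AS S;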

Despite being around for over a decade many haven’t really embraced window functions as far as they could and probably should.

Here’s an example of how a window function can make your code neater and faster. The example I’ll share is somewhat contrived to fit in the Adventure Works sample databases but the general pattern is one I’ve been fighting against for at least a decade.

Imagine that for some reason the Adventure Works company wants to identify the most fertile customer that has purchased each product sold. Our analyst sits down and thinks “the most fertile customer has the most children”, so they write a query to find the maximum number of children in the customer dimension of the data warehouse for each product. This doesn’t give us the customer, just the number of children they have, so we join the result back to the same table to get their name. Unfortunately we get multiple results because more than one person has that number of children.
Our analyst thinks again and decides that as a tie breaker we should pick the customer with the most recent date of birth (they had those children fastest), so picks the maximum date of birth from that list and again joins it to the customer table… only to find that there’s more than one with the same date of birth. Finally we pick the lowest customer id, and as there are no duplicates in this we get our answer.

The resulting query looks like this. Actually, I’m being kind here and using common table expressions to make the steps clear; more often than not, when I see code like this it’s subqueries nested within subqueries and so on.

USE AdventureWorksDW2017;

WITH MaxChildren AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
GROUP BY S.ProductKey
),
MaxBirth AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren, MAX(C.BirthDate) AS MaxBirth
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MaxChildren AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
GROUP BY S.ProductKey
), 
MinCustomer AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren, MAX(C.BirthDate) AS MaxBirth, MIN(C.CustomerKey) AS MinCustomer
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MaxBirth AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
	AND M.MaxBirth = C.BirthDate
GROUP BY S.ProductKey
)
SELECT S.ProductKey, C.CustomerKey
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MinCustomer AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
	AND C.BirthDate = M.MaxBirth
	AND C.CustomerKey = M.MinCustomer
ORDER BY ProductKey, CustomerKey;

Apart from the length of the code, the big problem is the number of times we go through the involved tables. SQL Server isn’t going to do anything clever here; we’ve written the two tables into the query four times, and so those tables will get read four times.

Enter window functions, and the most common of them all: ROW_NUMBER(). We use PARTITION BY to give us a number sequence that resets per product, and then we order our customers by all our criteria in one go. The result is ranked customers, with the one coming first in our sequence being the choice we need. Finally we just filter those results to get the top picks.

USE AdventureWorksDW2017;

WITH CustomersOrdered AS
(
SELECT S.ProductKey, C.CustomerKey, C.TotalChildren, C.BirthDate
, ROW_NUMBER() OVER (
						PARTITION BY S.ProductKey 
						ORDER BY C.TotalChildren DESC, C.BirthDate DESC, C.CustomerKey ASC
					) AS CustomerSequence
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
)
SELECT ProductKey, CustomerKey
FROM CustomersOrdered
WHERE CustomerSequence = 1
ORDER BY ProductKey, CustomerKey;

All this happens in 17 lines of code rather than the 43 above, and the intention of the query is clearer as well.
As an added bonus the query runs faster: we go through the table only once, admittedly with a big sort of the data, but the gains outweigh the losses by a long distance.

There are many more examples of where window functions make your queries better and faster. It’s something I’m hoping to cover once I’ve finished my blog series on querying XML; until then, here are some links to some good reading material.

Using SQL Notebooks For Knowledge Sharing

Jupyter notebooks aren’t just there for the script commands you struggle to remember (although they’re pretty good for that too).

T-SQL Tuesday Logo

For this month’s T-SQL Tuesday #137, Steve Jones (b|t) asked us to share how we have used, or would like to use, Jupyter notebooks.

I don’t really think about Jupyter notebooks often. I’ve seen some nice demos of use cases for us SQL Server people (see this session by Rob Sewell for a few examples) but never given them much thought beyond that.

Which possibly shows how unglamorous they can be given that I use one virtually every day at work.
Just the one though. It’s a PowerShell notebook, and as PowerShell can be used to run git commands I use it to hold the scripts that either take me too long to type out (like refreshing all the branches for our shared UAT and live environments in one go) or that I forget the syntax for because I don’t use it often enough and there’s no friendly button in Visual Studio to save me.
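The branch-refreshing cell, for example, is nothing more exotic than a handful of git commands along these lines (the branch names are just illustrative):

git fetch --all --prune
git checkout uat
git pull
git checkout live
git pull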

I thought I’d focus on something a little more interesting for this blog instead, and so went back to one of the use cases that interested me when I first encountered SQL notebooks: using them to combine both text and code into a nice neat demo of a T-SQL feature. When I first investigated this, most of my ideas fell flat because each code block was considered a different session, so temporary tables couldn’t be carried between them; this prevented you building up a decent example data set and then using it in subsequent steps. The good news is that that issue has now been fixed, so I thought I’d try converting my series of blog posts on querying XML in SQL Server into a set of notebooks.
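To illustrate the fix, this is the kind of two-block pattern that now works: the first code block builds a temp table and a later block queries it, because both run in the same session (a minimal sketch):

-- Code block 1: build up some sample data
CREATE TABLE #Example (Id INT, Label VARCHAR(20));
INSERT INTO #Example VALUES (1, 'Alpha'), (2, 'Beta');

-- Code block 2, elsewhere in the notebook: the same session still sees the temp table
SELECT Id, Label
FROM #Example;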

The process was pleasantly simple: it’s just a matter of copying and pasting over, and most of the formatting for headings, bullet points and links carried over without issue. A few times I had to delve into the markdown code to strip out a rogue change of font, but each post took me 5-10 minutes to adapt.

You can access the results here; they’re even viewable online at GitHub, although the formatting of results tables isn’t quite as neat as in Azure Data Studio and there’s no way to run the sample code.

There’s a lot to be said for the approach: the long bits of code that set up the sample data can be hidden, and results can be optionally saved with the notebook. As long as you have access to SQL Server you can click to run the sample code or start to experiment by changing it.

There are a few downsides. The most obvious is that while temp tables work between blocks of code, IntelliSense isn’t smart enough to know this and so throws up a lot of red wiggly lines. I’d also love to be able to collapse result sets without wiping them completely, in the same way that you can for code; sometimes I wanted the results to be there, but they were too long-winded to want them filling up the page by default.
There are limits that come from Azure Data Studio as well. It’s not an issue for these posts, but the lack of query plans will likely send me running back to good old SSMS at some point in the future.

Overall I like the result. I have something that I can easily point people to that allows them to run my example code but it also allows for more control over the surrounding text than is possible in a classic SQL script. Given that it’s relatively easy to convert a blog post into a notebook I’m going to keep at it for now and where appropriate attach a notebook version to each blog post.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form