Blog

PASS Summit Favourites (Part 1 of Many)

PASS Summit has been great this year and I’ve been honoured with the chance to both attend and speak.
This post is my ongoing highlights with links out to the videos of sessions I’ve enjoyed and learned from.

10 Productivity Tools

Rob Sewell and John Morehouse take 10 minutes to race though 10 useful tools to help with your development. A really fun session to watch and by the end you’ll certainly be heading towards the download page for a few of these suggestions.

10 minutes into SQL Notebooks

Another 10-minute session, this time from Deepthi Goguri on notebooks. Summary notebooks are a great way to merge code, results and explanations, you should probably be using them yourself. Watch to find out why.

An Introduction to Delta Lakes and Delta Lake-Houses

Paul Andrew knows both his lakes and his houses. This is the best primer on the subject I’ve watched so far. Great to watch both for those new to the subject and those looking for a good clear introduction to share with others.

Business Benefits of Good Governance

Victoria Holt finds Data Governance exciting and I’m really glad there are people that do, it’s important! Well worth a watch to find out why and what to do about it.

MS SQL New Functions, Syntaxes, Tips & Tricks

If you know me then you probably know that I’m a big T-SQL fan and that I’ll be in my element with a session title like the one above. Damir Matešić nails it in this session covering the key new T-SQL features added since SQL 2016.

Modern Data Warehousing with Azure Synapse Analytics

A few steps up I said Paul Andrew had the best primer on Lakes and Lakehouses. The good news for us is he has competition! Unfortunately there are technical issues with the video at present (it cuts off at about 13 minutes) but having listened to Ginger Grant’s live Q&A session I’ sure that as soon as it’s fixed her session on a similar topic will be just as good.

Practical Experiences from Working with Synapse

Sessions that give real world experience are always valuable and this session from Brian Bønk and Mathias Halkjær shows why. Synapse can be intimidating and stories from those who have hit pain points and explaining how to avoid them are always welcome.

Data Transformation Magic with Power Query

For me personally this session didn’t teach me many new tricks, but it was a really good, worked example of how to put together a data extract with Power Query. I can see myself sharing this with others as a resource if/when it comes out from behind the Summit paywall. Watch it here.

Building a Regret-free Foundation for your Data Factory

I didn’t know I needed Meagan Longoria’s session until I started watching it but it’s one I’m sure I’ll be coming back to over and over as I start on new projects. One of the best things a session can do is provide warnings from past experience to help others avoid mistakes they’ll later regret. This session is over an hour’s worth of suggestions to make sure that the next person (usually future you) doesn’t regret what you did in Data Factory.

The Dream Team: Synapse Analytics Serverless SQL Pools and Pipelines

Andy Cutler takes us through how to make the most of serverless pools and pipelines in Synapse. As an old school SSIS and SQL Server type BI developer who’s starting in on Synapse projects I can see myself having this session running on repeat over the next few weeks.

An Introduction to Window Functions

What’s this Barney Lawrence guy doing on this list? Perhaps it’s a little self-indulgent but presenting at PASS Summit is always going to be a highlight and I’m not going to miss a chance to promote my own session. Feedback has been good so far and I’m pleased with what I’ve produced.

And More…

I aim to add to this list over the next few weeks. There are several sessions I haven’t had time to watch yet that I’m sure will be brilliant and PASS have promised to make available many of the sessions that weren’t live streamed in the near future.

SQL Window Functions

Links to resources related to Window Functions and my sessions on them.

This page is here to act as an index to material I’ve produced relating to window functions.

GitHub: https://github.com/BarneyLawrence/Sessions-SQL-Window_Functions
My main repository, this holds PowerPoint presentations, sample queries and workbooks either used in sessions, to back them up or to develop the examples used.

Future Sessions

I’ll be presenting on window functions at:
PASS Data Community Summit – November 15th to 18th. Pre recorded session and live QA on Wednesday 15th 04:00 PM–04:30 PM PST

Further Resources

Itzik Ben-Gan’s Blogs at SQLPerformance.com
T-SQL Beyond the Basics – My 2021 PASS Summit session, includes sections on window functions amongst other intermediate techniques.

How to Add a “None” Option to Power BI Field Parameters

Field Parameters are a great new addition to Power BI but a little rough around the edges. Let’s look at getting down and dirty with DAX to add a ‘none’ option.

Field Parameters are one of my favourite recent additions to Power BI. The ability to turn a single chart into potentially dozens changes the way we think about putting variations of visuals on the page. It was a real wow moment for a client recently when I showed how field parameters for 5 fields and 5 measures could produce a single report page that replaced 25 of their existing reports.

While they theoretically don’t allow you to do much that you couldn’t previously with a disconnected slicer and a lot of DAX they build it faster and without the need to get heavily in to coding DAX. Anything that lowers the difficulty bar for users trying to make the most out of Power BI is a good thing in my book.

Unfortunately being a new feature and still in preview they’re still a little rough around the edges. The big two issues for me are:

  • There’s no interface to make subsequent edits.
  • Once you pick one or more options in a field parameter there’s no way to unselect all of them or switch to a ‘none’ option.

The first can be bypassed for now by editing the DAX that appears when you select the parameter in the fields list. The second is officially not possible but I stumbled on a couple of options recently.

Option 1 – Create A None Dimension

Option 1 is a bit more work but also a little more ‘official’ as an approach. Simply add an extra dimension to the field parameter called ‘None’ based on a new single valued column.
In my test run I downloaded a copy of the Contoso sample dataset and added the following calculated column to the Calendar table (calendar seemed like a good choice as in most models it will be connected to all your fact tables).

None = "None"

After it’s in place just add it to your field parameter as you would any other field.

The up side to this approach is that it’s all above board, the down sides are that you’ve added a new column to your model and that None doesn’t actually mean nothing is selected and instead you have something called “nothing” this means with none selected your legends don’t vanish and instead you have a single item called none.

The DAX for the field parameter now looks like this:

Chart Legend = {
    ("None", NAMEOF('Calendar'[None]), 0),
    ("ChannelName", NAMEOF('Channel'[ChannelName]), 1),
    ("ContinentName", NAMEOF('Geography'[ContinentName]), 2),
    ("StoreType", NAMEOF('Stores'[StoreType]), 3),
    ("ProductCategory", NAMEOF('ProductCategory'[ProductCategory]), 4)
}

Option 2 – Getting Dirty With The Dax

I stumbled on this approach while dealing with the issue mentioned above that there’s no edit interface for the list of fields in your parameter. The only way round this if you want to add another field or change the ordering is to pull up your sleeves, select the parameter in the fields list and start changing the definition.

While I was doing this a few days ago I wondered what would happen if I changed the NAMEOF() function to be a fixed value. Lo and behold we have a none option that behaves just as you’d expect it to, pick it and we get an empty legend\column\wherever you’re using your parameter. You don’t even have to create a placeholder column.

The code from my Contoso sample now looks like this:

Chart Legend = {
    ("None", "None", 0),
    ("ChannelName", NAMEOF('Channel'[ChannelName]), 1),
    ("ContinentName", NAMEOF('Geography'[ContinentName]), 2),
    ("StoreType", NAMEOF('Stores'[StoreType]), 3),
    ("ProductCategory", NAMEOF('ProductCategory'[ProductCategory]), 4)
}

This option comes with the warning that it’s in no way a supported approach. We’re basically breaking the parameter by making it look for a field that doesn’t exist and there could be consequences I’m not aware of. If you fel this is too much of a risk then stick with option 1.

SQLBits Buddies

Looking to get the most out of SQLBits? Why not find yourself a Bits Buddy.

Big conferences like SQLBits can be intimidating, they can feel like you’re walking into a room of a thousand strangers all of whom have known each other for years. That at least was how my first SQLBits felt. I’d driven down to Telford early one Saturday morning and slipped into a session about five minutes after it had started, dropped in to a few more, awkwardly ate lunch in a corner, talked briefly to the one former colleague I recognised and then headed for home. It wasn’t terrible but equally I didn’t feel like I’d had the full SQLBits experience.

The year after I was attending for the full conference and determined to get the most out of it, I knew a few people from other events and had managed to secure the funding to bring along a few my team from work.
Keen to make an impact early we’d travelled into central London the day before the conference for a Q&A session with some of the Microsoft team. I figured now was the time to put on my best impersonation of a sociable individual and headed over to someone who occasionally dropped into the local user group who was chatting with an unfamiliar face in the corner.
Reaching a suitable pause, I mentioned that I was looking forwards to my training days, how exciting it was having face to face training with someone whose book had sat on my desk for the last year and enquiring of my acquaintance’s conversation partner (I’d failed to catch his name) which sessions he was booked in to. He politely let me know his name and explained he was leading one of the training days, mine in fact.

Chances are you won’t have to put your foot in it quite as far as I did to start up a conversation, but to make getting to know a few more people at SQLBits even easier this year they’ve introduced the Bits Buddies team and I’m one of them. We’ll be sporting the usual volunteer’s orange shirt but with an extra armband to show that we’re extra keen to have a chat, help find you a team for the Thursday night quiz or, point you towards others who are heading out in the evening for your food or drink of preference. Do say hello if you see us about between sessions or hanging out at the community corner. There are also online drop-in sessions planned for the next week if you have any questions beforehand.

You’re also welcome to get in touch with me in advance, you can find me at twitter or LinkedIn or keep an eye out for me in person in London. I’ll talk about pretty much anything if you’re after a conversation starter I’m a BI developer by trade with time working in the NHS, social care, and travel industries. Outside of work subjects that will likely spark a interest are films, running, living with young humans (3 & 9), comics, the joys and pains of adopting a child, board games and dad jokes.
If none of those are your thing, then pick another subject or alternatively there’s plenty more of us to choose from.

SSMS Tips and Tricks: The Amazing Alt Key

Once you know how to use it the Alt key is almost magical, reducing many tedious typing tasks down to a handful of clicks and keystrokes.

I went years without knowing how useful the Alt key was, it’s not well documented but I can guarantee that once you know about it and give it a little practice you won’t be able to live without it.

While I’m filing this under SSMS Tips and Tricks but it works equally in Visual Studio, VS Code, Azure Data Studio and even Notepad ++ (but not plain old vanilla notepad). It’s worth a test in other places too.

What does the Alt Key Do In SSMS?

Holding down the Alt allows you to select an arbitrary block in a text entry area. With that done you can act on the entire selection e.g. by typing, copying, or pasting.

My most common use is adding commas. Have you ever had a list of field names or values that you need to be comma separated? Perhaps you’ve used the Alt + F1 keyboard shortcut to list fields in a table and now you want them in a SELECT statement. Perhaps you have a list of row identifiers of test cases in a large data set that you want to drop into an IN operator so you can see the results. Whatever the need, at some point you’ve probably sat pressing the down arrow and comma key over and over thinking there must be an easier approach (I used to do this kind of stuff in Excel before I knew better).

With the Alt key you can just select every row and type your text into all of them at the same time. A picture is worth a thousand words so here’s an animated gif of the process in action.

Using the Alt key to select multiple rows and type the same values on all of them.

How Does the Alt Key Affect Copying and Pasting?

Alt also allows you to vary how you copy and paste. A normal multi line copy in the clipboard combined with Alt to select multiple rows results in those multiple rows being pasted multiple times. Not often useful unless you want to generate a lot of lines of text quickly.
If you hold down the Alt key to select your copy area then the multiple select is remembered and it fills downward when pasting. Again, this is easier illustrated than described.

Using the Alt key to select multiple rows of text and then paste them to a different location.

Putting It All Together

Once you understand the behaviour you can start to put this together to do something more advanced, this comes in handy where you need to apply the same function to multiple fields and then also need to give those fields unique names. It’s a common and tedious process without the Alt key. With it we can type on multiple lines at the same time for the function and then paste to multiple rows for the field name. The final process looks like this.

Using the Alt key to wrap text in a function and then add the copy and paste the field name for each row.

I couldn’t live without the Alt key now and after a bit of practice doing things like the above are now so embedded into my muscle memory they just happen as I write queries. It’s well worth the effort to get to that point in my opinion, I’m a big fan of anything that reduces a dull task to something quick.

SSMS Tips and Tricks: Custom Keyboard Shortcuts

Why limit yourself to the built in SSMS shortcuts when you can build new ones for yourself?

One of the first things I do when faced with a fresh install of SQL Server Management Studio is to set up a couple of custom keyboard shortcuts.
You’ll find this feature in Options under the Tools menu and then by navigating through Environment > Keyboard > Query Shortcuts

Screenshot of query shortcuts section of SSMS options.

The screenshot above shows my usual additions to the defaults:

Under Ctrl + F1 ‘SELECT TOP 100 * FROM ‘
Under Ctrl + 3 ‘SELECT COUNT(*) FROM ‘

The trailing spaces are important, this won’t work without them.

Feel free to fill in any of the gaps if you’d prefer a different shortcut or one of the above causes issues (I like Ctrl + F1 as it’s easy to hit with one hand but on some machines it ignores the Ctrl and opens up a help page). I’ll assume you’re using the same as me for the rest of this blog.

Once set up, we can open up a new query window and give them a test (they may not be applied to windows already open). How these work is that specified value gets appended to whatever text you have and run as a query. So selecting a table inside a query and hitting our new shortcut gives us either the top 100 rows or a count of rows.

Peeking At Data With Ctrl + F1

Here’s my Ctrl + F1 in action, note how I can use anything that’s valid after a from clause, it can be multiple columns and WHERE clauses too.

Using our new Ctrl + F1 shortcut to peek ad data sets

I use this any time I need to peek at what’s in some table(s) while I’m writing a query to get a feel for the data and to remind me of what’s where.

You don’t have to stick to a TOP 100 in this shortcut but be warned if you get any bigger then things will get slower as there appears to be some sort of performance optimisation applied to values of 100 or below.

Grabbing Metadata With Alt + F1

Recently I’ve become a fan of the built in Alt + F1 option as well, it calls sp_help which amongst other things lists indexes, column names and data types. Don’t have a fancy tool to expand SELECT * into column names? Just use this shortcut and copy and paste from the results.

Counting Rows With Ctrl + 3

The need to count rows comes up a lot for me. Is this a big table I need to be careful with or a smaller table but the real power of this shortcut comes from it’s ability to count across joins.
Have you ever been in a scenario where you have multiple joins and you know one of them is introducing duplicate rows but not which one? Use this to count your starting table and then steadily expand your selection to include more tables until the count changes.
Here’s a simple demo, imagine I wanted customers and their first purchase but it hadn’t dawned on me that a customer might buy more than one item at the same time.

Using our new Ctrl + 3 to get row counts from parts of a query

The above can of course become quite slow if you’re counting a lot of rows but that’s what the cancel query button is for right?

Ctrl + 4 to Ctrl + 0

That leaves a lot of empty spaces that I’m sure can be put to a creative use, I’d love to hear suggestions for other useful shortcuts. The only real limitations are the text needs to fit on a single line and as it must be prefixed on to the selected text and executed we’re looking at simple SQL statements and stored procedures that take a single (or no) parameter.

Feel free to drop suggestions into the comments.

T-SQL Tuesday #140: Containers for Business and Pleasure

Containers aren’t always for boring work stuff (although they’re great for that).
This post gives some details on how to set up Docker from scratch and create containers for both a SQL Server test lab and a Minecraft Bedrock Server.

TSQL Tuesday Logo

For this month’s T-SQL Tuesday Anthony Nocentino (blog|twitter) asked us to share our experiences of using containers.

Not being a DBA type containers aren’t what you would consider my home ground. My work usually starts once an environment is in place and running but lacking my own personal DBA I’ve started to dabble with Docker on my personal laptop. Partly this is to make my life easier but also partly because I expect that sooner or later containerised development environments will be the norm and a bit of experimentation now will hopefully keep me ahead of the game.

To show the variety of things you can get up to with a container I’m going to share not one but two examples; Setting up a SQL Server container to support experimentation and testing code for blog posts and, on the other side of my work life balance setting up a Minecraft server to enable the family to play together.

None of this is my area of expertise and I’ve never had the Linux\Unix experience that would make this easy going but to some extent I see that as an advantage, there are some giants in the field who know far more than I ever will. I’m just going to show the rest of us how you can cling precariously on to their shoulders and get stuff done.

How To Run SQL Server In A Docker Container

Docker is now my preferred route to running SQL Server on a personal machine. It saves the need for a direct install (and potential of an attempted uninstall), it’s a smoother process than a full fat VM type setup with the need to install and configure a whole OS (I do miss the days when Microsoft provided a complete image with a 180 day time limit). It has a relatively light footprint and, gives you room to experiment easily with multiple versions. The only real downside is we’re looking at the Linux version of SQL Server so if you want to work with either SSAS or SSIS you might need to look to the alternatives.

My needs are relatively simple, a recent version of SQL Server, access to the most common sample data sets and to avoid having to rebuild and reinstall on a regular basis. The below gives me all that and it’s been everything I’ve needed while creating and testing my series of posts on querying XML with SQL Server.

I’m not going to retread ground here and as I hinted above this isn’t really my work so I’m going to share links and give as much credit as I can, if I’ve missed someone do let me know and I’ll get this updated. I’ve been through a few iterations of setting up these containers and this is the route that I’m happiest with to date.

Installing Docker

Go get Docker from their web site and install it if you’re running on Windows you will want to be using WSL2, if this is your very first time with Docker you’ll probably need to some details from Microsoft’s guidance on that. I installed the Ubuntu 18 Linux distribution as that appears to be the current preferred option for Docker.

Changing Default Local Storage Location

WSL2 by default puts all data into virtual drive files on your system (c:) drive. This may be a problem if you have a small boot drive and a big data drive like I do. I want to handle some big databases like the Stack Overflow data sets many like to use for query examples.
This answer over on Stack Overflow gives the step by step commands to move everything over to the location of your choice.

Setting Up a SQL Server Container

I’ve tried a few approaches for creating an SQL Server Container and the one that balances ease of creation with covering long term needs comes from the cupboard of Container King and all round nice guy Andrew Pruski (b|t). His guide to SQL Server and Containers gives you pretty much everything you need to get up and running quickly. I’d recommend putting the effort in to follow his more advanced Docker Compose example as it solves a few permissions issues you might encounter when taking simpler approaches and splits folders into neat volumes similar to those you’d see on a physical install.

My current setup is almost exactly the same as this only I’ve switched the container port to 1433 which is the default SSMS looks for. This saves me remembering the correct number when connecting. I also changed the password from Andrew’s default, it’s probably best we don’t all have the same password (make sure it’s complex enough, very sort simple passwords may be rejected and cause the process to fail).

If you’ve followed all of the above then by this point you’ll have an empty instance of SQL Server in a container. Connect up and give it a few test queries, if all you need is an empty server you can stop now.

Adding Some Data

Now we have our server we need to add some data. With WSL2 we have a route to access the drive volumes we’ve created from in windows but finding them can be tricky. The path I use with the above setup is:

\\wsl$\docker-desktop-data\version-pack-data\community\docker\volumes

You can then see sub folders for all the volumes you have created with Docker. You can then copy backups of your preferred data sets into the backup folder (I’m using varieties of Adventure Works and World Wide Importers) and restore as you would anywhere else.

Alternatively copy mdf, ndf and ldf files into the respective data and log folders and attach them (I did this for the Stack Overflow samples Brent Ozar maintains).
I had a few troubles with these, my first attempt with this approach left me with a read only database. This may have been because I copied the files directly from another container volume or it might be something that happens every time. I ended up having to adapt the command from Andrew’s setup that gives permissions to his SQL containers again against these file, detaching and re-attaching.

chown -R mssql:mssql /var/opt/sqlserver/[Whatever the file is that's causing the problems]

And we’re done, we have a shiny SQL Server 2019 instance in Docker, it starts when I need it, I can stop it when I don’t and if I need to upgrade I just re-build the container and all that sample data lives on in the volumes. Those visiting this page in a strictly professional capacity can stop reading now.

How to Run a Minecraft Bedrock Server In Docker

Many of you out there may, like me, find yourselves sharing a home with a miniature Minecraft Addict. I do and in between showing off her latest builds she’s begun pestering me to find a way to play with friends and family. I figured I could maybe save some work and re-use my newfound containerised powers for fun as well as profit and a few quick searches proved me right.

Here’s the container I’ve been using I don’t know who ‘itzg’ is but he’s a hero for setting this up. The server version used is a little out of date but it self updates and thanks to the use of a volume for data it remembers updates, configurations and maps so you don’t lose anything between sessions.

I found a few issues with the docker-compose.yml file so here’s my revised version with a few tweaks, extras and, a nice world seed that drops you in on the edge of a village for added in game friendship. If you didn’t get that SQL Server container up and running in the previous section then follow the steps on installation and setting the storage location before coming back here.

version: '3.8'

services:
  Minecraft:
    image: "itzg/minecraft-bedrock-server"
    environment:
      EULA: "TRUE"
      GAMEMODE: survival
      DIFFICULTY: normal
      LEVEL_SEED: "-850418298"
      ALLOW_CHEATS: "false"
      SERVER_NAME: "My Server Name"
      VIEW_DISTANCE: "64"
      LEVEL_NAME: "My Survival World"
     #  WHITE_LIST: "true"
      DEFAULT_PLAYER_PERMISSION_LEVEL: member
    ports:
      - 19132:19132/udp
    volumes:
      - bds:/data
    stdin_open: true
    tty: true

volumes:
  bds: {}

That’s all you need for a local network game. If you want to play with others than you’ll need to work out how to forward port 19132 to the computer running the container and possibly figure out how to set up the whitelist file in the volume created so there’s no risks of unexpected strangers dropping in on you.

Have fun and look out for those creepers!

Querying XML in SQL Server – A Cheat Sheet

Need to learn as much as you can about XML in SQL Server as quickly as possible or just looking for a helpful reminder of the methods? Here’s the place to start.

I’ve been giving a few lightening talks on the topic of querying XML in SQL Server recently and this has really forced me to squeeze my blog content down into as small a space as possible. Part of that process was gathering examples that cover the maximum number of concepts in the minimum number of examples.

To make sure all my examples actually worked I gathered them together in another SQL Notebook this was for my own benefit to start with but I quickly realised it also to gives others the chance to run or change the examples themselves, it’s missing the full detail of my other blog posts and the notebooks I copied them into but if all you need is a few quick examples of values, nodes, exist or query methods it’s a great thing to have to hand.

Here’s the link:

A Cheat Sheet For Querying XML In SQL Server

It’s viewable directly in GitHub but best used in Azure Data Studio, preferably connected to an instance of SQL Server so you can run the queries.

Have fun with it and do let me know if there are any key examples you feel are missing.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

T-SQL Tuesday 138: Managing Technology Changes

How Do You Respond When Technology Changes Under You? Grab it with both hands and cling on tight!

T-SQL Tuesday Logo

In this month’s T-SQL Tuesday Andy Leonard (b|t) asks How Do You Respond When Technology Changes Under You? .
My response to this would be that wherever possible embrace it. Despite inevitable teething troubles most software changes are for the better and while you probably can’t prevent change you can make sure it doesn’t leave you behind.

One aspect of T-SQL that everyone should embrace is window functions. They were first added in SQL Server 2005 with simple ranking functions and have built since then with features such as greater control over window size and shape and neat functions like lag, lead .

Despite being around for over a decade many haven’t really embraced window functions as far as they could and probably should.

Here’s an example of how a window function can make your code neater and faster. The example I’ll share is somewhat contrived to fit in the Adventure Works sample databases but the general pattern is one I’ve been fighting against for at least a decade.

Imagine that for some reasonthe Adventure Works Company wants to identify the most fertile customer that has purchased each product sold. Our analyst sits down and thinks “the most fertile customer has the most children” they therefore write a query to find the maximum number of children in the customer dimension of the data warehouse for each product. This doesn’t give us the customer just the number of children they have so we join the result back to the same table to get their name. Unfortunately we get multiple results because more than one person has that number of children.
Out analyst thinks again and decides that as a tie breaker we should pick the customer with the most recent date of birth (they had those children fastest) so picks the maximum date of birth from that list and again joins it to the customer table… only to find that there’s more than one with the same date of birth, finally we pick the lowest customer id and as there are no duplicates in this we get our answer.

The resulting query looks like this, actually I’m being kind here and using common table expressions to make the steps clear, more often than not when I see code like this it would be sub queries nested in sub queries and so on.

USE AdventureWorksDW2017;

WITH MaxChildren AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
GROUP BY S.ProductKey
	),
MaxBirth AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren, MAX(C.BirthDate) as MaxBirth
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MaxChildren AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
GROUP BY S.ProductKey
), 
MinCustomer AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren, MAX(C.BirthDate) as MaxBirth,MIN(C.CustomerKey) AS MinCustomer
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MaxBirth AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
	AND M.MaxBirth = C.BirthDate
GROUP BY S.ProductKey
)
SELECT S.ProductKey, C.CustomerKey
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MinCustomer AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
	AND C.BirthDate = M.MaxBirth
	AND C.CustomerKey = M.MinCustomer
ORDER BY ProductKey, CustomerKey;

Apart from the length of the code the big problem is the number of times we go though the involved tables. SQL Server isn’t going to do anything clever here, we code in the two tables four times and so those tables will get read four times.

Enter window functions and the most common of them all ROW_NUMBER(), we use PARTITION BY to give us a number sequence that resets per product and then we order our customers by all our criteria in one go. The result is ranked customers with the one coming first in our sequence being the choice we need. Finally we just filter those results to get those top picks.

USE AdventureWorksDW2017;

WITH CustomersOrdered AS
(
SELECT S.ProductKey, C.CustomerKey, C.TotalChildren, C.BirthDate
, ROW_NUMBER() OVER (
						PARTITION BY S.ProductKey 
						ORDER BY C.TotalChildren DESC, C.BirthDate DESC, C.CustomerKey ASC
					) AS CustomerSequence
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
	)
SELECT ProductKey, CustomerKey
FROM CustomersOrdered
WHERE CustomerSequence = 1
ORDER BY ProductKey, CustomerKey;

All this happens in 17 lines of code rather than the 43 above. The intention of the query is clearer as well.
As an added bonus the query runs faster, we go through the table only once, admittedly with a big sort of the data but the gains outweigh the losses by a long distance.

There are many more examples of where window functions make your queries better and faster, it’s something I’m hoping to cover once I’ve finished my blog series on querying XML until then here are some links to some good reading material.

Querying XML in SQL Server – Part 7 – Creating XML

A brief break from querying XML this week while I prepare some test data. This naturally turns our eyes to building XML.

Last time I promised to start looking at performance and XML indexes. I’ll get there but I realised I needed a decent test data set and couldn’t easily lay my hands on one that matched my needs so had to start building one myself. Part way through the process I figured I may as well squeeze an extra post out of the process and blog the creation process.

In my experience creating XML is a less common task than reading it with one semi-exception (that being the old trick of using FOR XML PATH(”) to concatenate strings) but it can be an important one. The biggest piece of XML I’ve built was for a government mandated return, it was the only XML I built on that job but a sizeable proportion of the entire department’s work culminated in it. The switch from a set of Excel macros which took over eight hours to run to something that could rebuilt in minutes had a huge impact on our efficiency, particularly when there would be multiple rounds of validation failures against initial attempts at submission.

There are a few variants when it comes to converting queries into XML but in my opinion the only one you really need to know is the XML PATH approach, the others being either fiddly or not providing enough control.

Because all the cool kids are using it and I had a copy available from his recent live steamed courses I’m going to use a copy of the Stack Overflow database that Brent Ozar maintains. You can get your own copy here. I’m using the 2013 version but if you want to follow along any variant should work here. I’ve used TOP to limit results returned so that they’re short enough to place in the Blog.

How Do I Build XML In SQL Server?

At the simplest level we just drop FOR XML PATH after a query, we give a value to the path which becomes the element each row is nested in and give a root value that wraps the whole output.

SELECT TOP 10
	U.Id AS 'UserId',
	U.DisplayName,
	U.UpVotes,
	U.DownVotes
FROM dbo.Users AS U
FOR XML PATH('User'), ROOT('Users');

Which gets something like this (this is going to be a post where the SQL notebook version and its embedded query results is going to be really helpful, you can find these here):

<Users>
  <User>
    <UserId>17</UserId>
    <DisplayName>Nick Berardi</DisplayName>
    <UpVotes>884</UpVotes>
    <DownVotes>215</DownVotes>
  </User>
  <User>
    <UserId>19</UserId>
    <DisplayName>Mads Kristiansen</DisplayName>
    <UpVotes>36</UpVotes>
    <DownVotes>12</DownVotes>
  </User>
</Users>

How do I Create Attributes In XML With SQL Server?

Adding Attributes is easy, we just give the relevant column a name commencing with an @ symbol.

SELECT TOP 2
	U.Id AS '@UserId',
	U.DisplayName,
	U.UpVotes,
	U.DownVotes
FROM dbo.Users AS U
FOR XML PATH('User'), ROOT('Users');

And now the XML looks like this:

<Users>
  <User UserId="17">
    <DisplayName>Nick Berardi</DisplayName>
    <UpVotes>884</UpVotes>
    <DownVotes>215</DownVotes>
  </User>
  <User UserId="19">
    <DisplayName>Mads Kristiansen</DisplayName>
    <UpVotes>36</UpVotes>
    <DownVotes>12</DownVotes>
  </User>
</Users>

How Do I Nest Elements In XML With SQL Server?

If we want to start grouping elements under parent tags then we just name the columns with the needed path. If two consecutive fields have a common element to the path they get grouped together. Here we nest up and down votes in a votes element.

SELECT TOP 2
	U.Id AS '@UserId',
	U.DisplayName,
	U.UpVotes AS 'Votes/UpVotes',
	U.DownVotes AS 'Votes/DownVotes'
FROM dbo.Users AS U
FOR XML PATH('User'), ROOT('Users');

And now we have this:

<Users>
  <User UserId="17">
    <DisplayName>Nick Berardi</DisplayName>
    <Votes>
      <UpVotes>884</UpVotes>
      <DownVotes>215</DownVotes>
    </Votes>
  </User>
  <User UserId="19">
    <DisplayName>Mads Kristiansen</DisplayName>
    <Votes>
      <UpVotes>36</UpVotes>
      <DownVotes>12</DownVotes>
    </Votes>
  </User>
</Users>

How Do I Add Repeating Sub Elements to XML With SQL Server

Things get a little messier when it comes to adding repeating elements but the approach is relatively simple. An XML element can contain element’s itself so all we need to do is push in a sub query that generates that XML. We’ve already seen how to do this above.
I’m going to get some details on badges for each user, limited and grouped in this case to keep the results presentable.

Note in the sub query we specify the TYPE directive. This tells the server to keep the results as XML, if we don’t do this then the server assumes you want the results formatted as text, escapes all the special characters and makes a big mess.

SELECT TOP 2
U.Id AS '@UserId',
U.DisplayName,
U.UpVotes AS 'Votes/UpVotes',
U.DownVotes AS 'Votes/DownVotes',
(	SELECT TOP 2
		B.Name, MIN(B.Date) AS FirstAwarded FROM dbo.Badges AS B
	WHERE B.UserId = U.Id
	GROUP BY B.Name
	FOR XML PATH ('Badge'), TYPE
) AS 'Badges'
FROM dbo.Users AS U
FOR XML PATH('User'), ROOT('Users');

And now we get this:

<Users>
  <User UserId="17">
    <DisplayName>Nick Berardi</DisplayName>
    <Votes>
      <UpVotes>884</UpVotes>
      <DownVotes>215</DownVotes>
    </Votes>
    <Badges>
      <Badge>
        <Name>.net</Name>
        <FirstAwarded>2009-02-10T05:11:54.607</FirstAwarded>
      </Badge>
      <Badge>
        <Name>asp.net</Name>
        <FirstAwarded>2009-04-15T03:45:30.720</FirstAwarded>
      </Badge>
    </Badges>
  </User>
  <User UserId="19">
    <DisplayName>Mads Kristiansen</DisplayName>
    <Votes>
      <UpVotes>36</UpVotes>
      <DownVotes>12</DownVotes>
    </Votes>
    <Badges>
      <Badge>
        <Name>Beta</Name>
        <FirstAwarded>2008-09-16T00:00:00</FirstAwarded>
      </Badge>
      <Badge>
        <Name>Citizen Patrol</Name>
        <FirstAwarded>2008-09-15T08:55:05.157</FirstAwarded>
      </Badge>
    </Badges>
  </User>
</Users>

I’m not a fan of sub queries directly in the SELECT statement of a query and possibly a little too keen on using APPLY so I’d probably re-write the above like the following.
The ‘*’ on the field name is a place holder for nothing so we don’t force in another element level. Equally we could exclude the Root of ‘users’ and use that as the field name and get the same result.

SELECT TOP 2
U.Id AS '@UserId',
U.DisplayName,
U.UpVotes AS 'Votes/UpVotes',
U.DownVotes AS 'Votes/DownVotes',
B.x AS '*'
FROM dbo.Users AS U
OUTER APPLY
(
	SELECT B.Name, MIN(B.Date) AS FirstAwarded FROM dbo.Badges AS B
	WHERE B.UserId = U.Id
	GROUP BY B.Name
	FOR XML PATH ('Badge'), ROOT('Badges'), TYPE
) B(x)
FOR XML PATH('User'), ROOT('Users')
;

And that should cover 99% of needs when it comes to building XML. You might need to take the sub query concept a little further, potentially needing sub queries in your sub queries but with the above you can put together pretty much anything.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form