Querying XML in SQL Server – A Cheat Sheet

Need to learn as much as you can about XML in SQL Server as quickly as possible or just looking for a helpful reminder of the methods? Here’s the place to start.

I’ve been giving a few lightening talks on the topic of querying XML in SQL Server recently and this has really forced me to squeeze my blog content down into as small a space as possible. Part of that process was gathering examples that cover the maximum number of concepts in the minimum number of examples.

To make sure all my examples actually worked I gathered them together in another SQL Notebook this was for my own benefit to start with but I quickly realised it also to gives others the chance to run or change the examples themselves, it’s missing the full detail of my other blog posts and the notebooks I copied them into but if all you need is a few quick examples of values, nodes, exist or query methods it’s a great thing to have to hand.

Here’s the link:

A Cheat Sheet For Querying XML In SQL Server

It’s viewable directly in GitHub but best used in Azure Data Studio, preferably connected to an instance of SQL Server so you can run the queries.

Have fun with it and do let me know if there are any key examples you feel are missing.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

T-SQL Tuesday 138: Managing Technology Changes

How Do You Respond When Technology Changes Under You? Grab it with both hands and cling on tight!

T-SQL Tuesday Logo

In this month’s T-SQL Tuesday Andy Leonard (b|t) asks How Do You Respond When Technology Changes Under You? .
My response to this would be that wherever possible embrace it. Despite inevitable teething troubles most software changes are for the better and while you probably can’t prevent change you can make sure it doesn’t leave you behind.

One aspect of T-SQL that everyone should embrace is window functions. They were first added in SQL Server 2005 with simple ranking functions and have built since then with features such as greater control over window size and shape and neat functions like lag, lead .

Despite being around for over a decade many haven’t really embraced window functions as far as they could and probably should.

Here’s an example of how a window function can make your code neater and faster. The example I’ll share is somewhat contrived to fit in the Adventure Works sample databases but the general pattern is one I’ve been fighting against for at least a decade.

Imagine that for some reasonthe Adventure Works Company wants to identify the most fertile customer that has purchased each product sold. Our analyst sits down and thinks “the most fertile customer has the most children” they therefore write a query to find the maximum number of children in the customer dimension of the data warehouse for each product. This doesn’t give us the customer just the number of children they have so we join the result back to the same table to get their name. Unfortunately we get multiple results because more than one person has that number of children.
Out analyst thinks again and decides that as a tie breaker we should pick the customer with the most recent date of birth (they had those children fastest) so picks the maximum date of birth from that list and again joins it to the customer table… only to find that there’s more than one with the same date of birth, finally we pick the lowest customer id and as there are no duplicates in this we get our answer.

The resulting query looks like this, actually I’m being kind here and using common table expressions to make the steps clear, more often than not when I see code like this it would be sub queries nested in sub queries and so on.

USE AdventureWorksDW2017;

WITH MaxChildren AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
GROUP BY S.ProductKey
	),
MaxBirth AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren, MAX(C.BirthDate) as MaxBirth
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MaxChildren AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
GROUP BY S.ProductKey
), 
MinCustomer AS
(
SELECT S.ProductKey, MAX(C.TotalChildren) AS MaxChildren, MAX(C.BirthDate) as MaxBirth,MIN(C.CustomerKey) AS MinCustomer
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MaxBirth AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
	AND M.MaxBirth = C.BirthDate
GROUP BY S.ProductKey
)
SELECT S.ProductKey, C.CustomerKey
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
INNER JOIN MinCustomer AS M
	ON S.ProductKey = M.ProductKey
	AND C.TotalChildren = M.MaxChildren
	AND C.BirthDate = M.MaxBirth
	AND C.CustomerKey = M.MinCustomer
ORDER BY ProductKey, CustomerKey;

Apart from the length of the code the big problem is the number of times we go though the involved tables. SQL Server isn’t going to do anything clever here, we code in the two tables four times and so those tables will get read four times.

Enter window functions and the most common of them all ROW_NUMBER(), we use PARTITION BY to give us a number sequence that resets per product and then we order our customers by all our criteria in one go. The result is ranked customers with the one coming first in our sequence being the choice we need. Finally we just filter those results to get those top picks.

USE AdventureWorksDW2017;

WITH CustomersOrdered AS
(
SELECT S.ProductKey, C.CustomerKey, C.TotalChildren, C.BirthDate
, ROW_NUMBER() OVER (
						PARTITION BY S.ProductKey 
						ORDER BY C.TotalChildren DESC, C.BirthDate DESC, C.CustomerKey ASC
					) AS CustomerSequence
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
	ON S.CustomerKey = C.CustomerKey
	)
SELECT ProductKey, CustomerKey
FROM CustomersOrdered
WHERE CustomerSequence = 1
ORDER BY ProductKey, CustomerKey;

All this happens in 17 lines of code rather than the 43 above. The intention of the query is clearer as well.
As an added bonus the query runs faster, we go through the table only once, admittedly with a big sort of the data but the gains outweigh the losses by a long distance.

There are many more examples of where window functions make your queries better and faster, it’s something I’m hoping to cover once I’ve finished my blog series on querying XML until then here are some links to some good reading material.

Querying XML in SQL Server – Part 7 – Creating XML

A brief break from querying XML this week while I prepare some test data. This naturally turns our eyes to building XML.

Last time I promised to start looking at performance and XML indexes. I’ll get there but I realised I needed a decent test data set and couldn’t easily lay my hands on one that matched my needs so had to start building one myself. Part way through the process I figured I may as well squeeze an extra post out of the process and blog the creation process.

In my experience creating XML is a less common task than reading it with one semi-exception (that being the old trick of using FOR XML PATH(”) to concatenate strings) but it can be an important one. The biggest piece of XML I’ve built was for a government mandated return, it was the only XML I built on that job but a sizeable proportion of the entire department’s work culminated in it. The switch from a set of Excel macros which took over eight hours to run to something that could rebuilt in minutes had a huge impact on our efficiency, particularly when there would be multiple rounds of validation failures against initial attempts at submission.

There are a few variants when it comes to converting queries into XML but in my opinion the only one you really need to know is the XML PATH approach, the others being either fiddly or not providing enough control.

Because all the cool kids are using it and I had a copy available from his recent live steamed courses I’m going to use a copy of the Stack Overflow database that Brent Ozar maintains. You can get your own copy here. I’m using the 2013 version but if you want to follow along any variant should work here. I’ve used TOP to limit results returned so that they’re short enough to place in the Blog.

How Do I Build XML In SQL Server?

At the simplest level we just drop FOR XML PATH after a query, we give a value to the path which becomes the element each row is nested in and give a root value that wraps the whole output.

SELECT TOP 10
	U.Id AS 'UserId',
	U.DisplayName,
	U.UpVotes,
	U.DownVotes
FROM dbo.Users AS U
FOR XML PATH('User'), ROOT('Users');

Which gets something like this (this is going to be a post where the SQL notebook version and its embedded query results is going to be really helpful, you can find these here):

<Users>
  <User>
    <UserId>17</UserId>
    <DisplayName>Nick Berardi</DisplayName>
    <UpVotes>884</UpVotes>
    <DownVotes>215</DownVotes>
  </User>
  <User>
    <UserId>19</UserId>
    <DisplayName>Mads Kristiansen</DisplayName>
    <UpVotes>36</UpVotes>
    <DownVotes>12</DownVotes>
  </User>
</Users>

How do I Create Attributes In XML With SQL Server?

Adding Attributes is easy, we just give the relevant column a name commencing with an @ symbol.

SELECT TOP 2
	U.Id AS '@UserId',
	U.DisplayName,
	U.UpVotes,
	U.DownVotes
FROM dbo.Users AS U
FOR XML PATH('User'), ROOT('Users');

And now the XML looks like this:

<Users>
  <User UserId="17">
    <DisplayName>Nick Berardi</DisplayName>
    <UpVotes>884</UpVotes>
    <DownVotes>215</DownVotes>
  </User>
  <User UserId="19">
    <DisplayName>Mads Kristiansen</DisplayName>
    <UpVotes>36</UpVotes>
    <DownVotes>12</DownVotes>
  </User>
</Users>

How Do I Nest Elements In XML With SQL Server?

If we want to start grouping elements under parent tags then we just name the columns with the needed path. If two consecutive fields have a common element to the path they get grouped together. Here we nest up and down votes in a votes element.

SELECT TOP 2
	U.Id AS '@UserId',
	U.DisplayName,
	U.UpVotes AS 'Votes/UpVotes',
	U.DownVotes AS 'Votes/DownVotes'
FROM dbo.Users AS U
FOR XML PATH('User'), ROOT('Users');

And now we have this:

<Users>
  <User UserId="17">
    <DisplayName>Nick Berardi</DisplayName>
    <Votes>
      <UpVotes>884</UpVotes>
      <DownVotes>215</DownVotes>
    </Votes>
  </User>
  <User UserId="19">
    <DisplayName>Mads Kristiansen</DisplayName>
    <Votes>
      <UpVotes>36</UpVotes>
      <DownVotes>12</DownVotes>
    </Votes>
  </User>
</Users>

How Do I Add Repeating Sub Elements to XML With SQL Server

Things get a little messier when it comes to adding repeating elements but the approach is relatively simple. An XML element can contain element’s itself so all we need to do is push in a sub query that generates that XML. We’ve already seen how to do this above.
I’m going to get some details on badges for each user, limited and grouped in this case to keep the results presentable.

Note in the sub query we specify the TYPE directive. This tells the server to keep the results as XML, if we don’t do this then the server assumes you want the results formatted as text, escapes all the special characters and makes a big mess.

SELECT TOP 2
U.Id AS '@UserId',
U.DisplayName,
U.UpVotes AS 'Votes/UpVotes',
U.DownVotes AS 'Votes/DownVotes',
(	SELECT TOP 2
		B.Name, MIN(B.Date) AS FirstAwarded FROM dbo.Badges AS B
	WHERE B.UserId = U.Id
	GROUP BY B.Name
	FOR XML PATH ('Badge'), TYPE
) AS 'Badges'
FROM dbo.Users AS U
FOR XML PATH('User'), ROOT('Users');

And now we get this:

<Users>
  <User UserId="17">
    <DisplayName>Nick Berardi</DisplayName>
    <Votes>
      <UpVotes>884</UpVotes>
      <DownVotes>215</DownVotes>
    </Votes>
    <Badges>
      <Badge>
        <Name>.net</Name>
        <FirstAwarded>2009-02-10T05:11:54.607</FirstAwarded>
      </Badge>
      <Badge>
        <Name>asp.net</Name>
        <FirstAwarded>2009-04-15T03:45:30.720</FirstAwarded>
      </Badge>
    </Badges>
  </User>
  <User UserId="19">
    <DisplayName>Mads Kristiansen</DisplayName>
    <Votes>
      <UpVotes>36</UpVotes>
      <DownVotes>12</DownVotes>
    </Votes>
    <Badges>
      <Badge>
        <Name>Beta</Name>
        <FirstAwarded>2008-09-16T00:00:00</FirstAwarded>
      </Badge>
      <Badge>
        <Name>Citizen Patrol</Name>
        <FirstAwarded>2008-09-15T08:55:05.157</FirstAwarded>
      </Badge>
    </Badges>
  </User>
</Users>

I’m not a fan of sub queries directly in the SELECT statement of a query and possibly a little too keen on using APPLY so I’d probably re-write the above like the following.
The ‘*’ on the field name is a place holder for nothing so we don’t force in another element level. Equally we could exclude the Root of ‘users’ and use that as the field name and get the same result.

SELECT TOP 2
U.Id AS '@UserId',
U.DisplayName,
U.UpVotes AS 'Votes/UpVotes',
U.DownVotes AS 'Votes/DownVotes',
B.x AS '*'
FROM dbo.Users AS U
OUTER APPLY
(
	SELECT B.Name, MIN(B.Date) AS FirstAwarded FROM dbo.Badges AS B
	WHERE B.UserId = U.Id
	GROUP BY B.Name
	FOR XML PATH ('Badge'), ROOT('Badges'), TYPE
) B(x)
FOR XML PATH('User'), ROOT('Users')
;

And that should cover 99% of needs when it comes to building XML. You might need to take the sub query concept a little further, potentially needing sub queries in your sub queries but with the above you can put together pretty much anything.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

Querying XML In SQL Server – Part 6 – Sequencing Elements and Finding Positions

XML Elements aren’t always neatly labelled with a unique identifier which makes joining different levels back together in the database tricky. In this post we look at a number of solutions to this issue.

This post looks at a problem that can come up in XML messages that have perhaps not been thought out as well as we’d like and where we can’t uniquely identify one instance of a repeating element. The example we’ve used so far avoids this problem so we’ll switch to a sample that illustrates this problem.

CREATE TABLE #MyXML (MyXML XML NOT NULL);
 
INSERT INTO #MyXML(MyXML)
VALUES
(
'
<Order name="XML User Group">
	<Pizza size="medium">
		<Topping>Mushrooms</Topping>
		<Topping>Ham</Topping>
		<Topping>Jalapenos</Topping>
	</Pizza>
	<Pizza size="medium">
		<Topping>Pineapple</Topping>
		<Topping>Onion</Topping>
		<Topping>Jalapenos</Topping>
	</Pizza>
	<Pizza size="large">
		<Topping>Pepperoni</Topping>
		<Topping>Extra Cheese</Topping>
	</Pizza>
</Order>
'
);
 
SELECT 
X.MyXML.value('Order[1]/@name', 'varchar(50)') AS Name,
Pizza.PizzaXML.value('./@size', 'varchar(50)') AS Size,
Topping.ToppingXML.value('.', 'varchar(50)') AS Topping
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('Order/Pizza') AS Pizza(PizzaXML)
CROSS APPLY Pizza.PizzaXML.nodes('Topping') AS Topping(ToppingXML);

The XML looks understandable but when we query it we get something that looks like this:

SizeTopping
mediumMushrooms
mediumHam
mediumJalapenos
mediumPineapple
mediumOnion
mediumJalapenos
largePepperoni
largeExtra Cheese

How many Pizzas do we have? Which toppings go on which pizza? Will the vegetarians of the XML User Group go hungry?
The problem here is we have two elements with the same value (medium) and when we convert to rows in the table we lose the structure that would allow us to identify which toppings belong to which.
I’d love to say the solution is to just ask the developers of the source system to change the XML, but the answer to that request in most cases will be a two letter word beginning with N so what can we do instead?

How do I Sequence XML Elements in SQL Server With a Numbers Table?

One common solution has already appeared back in part 2, if we have a list of numbers we can push them in to the query to pick out the pizzas one at a time.
The query looks something like this:

With MyNumbers AS
(
SELECT N
FROM (VALUES(1),(2),(3),(4),(5),(6),(7)) AS N(N)
)
SELECT
MyNumbers.N,
X.MyXML.value('(Order/Pizza[sql:column("MyNumbers.N")]/@size)[1]' ,'varchar(50)')  AS Size
,Topping.ToppingXML.value('.', 'varchar(50)') AS Topping
FROM  #MyXML AS X
CROSS JOIN MyNumbers
OUTER APPLY X.MyXML.nodes('Order/Pizza[sql:column("MyNumbers.N")]/Topping') AS Topping(ToppingXML)
ORDER BY N;

And we have our numbering. This is the solution that you’ll probably find if you hit Google or StackOverflow. It works, it looks neat and it’s nicely self contained. There are possible issues with it unfortunately. The first is that we’ve specified our numbers. What if we have more than seven pizzas? We loose the rest. The observant might also notice I’ve swapped to OUTER APPLY to illustrate another issue, SQL Server doesn’t know how many nodes are in the XML and so just keeps on trying with all our numbers giving us empty results when there’s no match.
My hands on experience suggests that while this approach works for small sized XML messages, if you have thousands of messages each with potentially hundreds of elements then performance nosedives fast particularly if you have a high variance in the numbers (100 every time is less of a problem than a random number between 1 and 1000).
There are ways to mitigate this (use the query method to count the nodes and then limit the numbers pushed in) but they get messy and one begins to question if the work put into the fix outweighs the benefits of it.

How do I Sequence XML Elements in SQL Server With a Window Function?

We could look to using window functions to sequence the XML elements like this:

WITH Pizzas AS
(
SELECT 
X.MyXML.value('Order[1]/@name', 'varchar(50)') AS Name,
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS PizzaSequence,
Pizza.PizzaXML.value('./@size', 'varchar(50)') AS Size,
Pizza.PizzaXML.query('.') AS PizzaXML
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('Order/Pizza') AS Pizza(PizzaXML)
)
SELECT 
Pizzas.PizzaSequence,
Pizzas.Size,
Topping.ToppingXML.value('.', 'varchar(50)') AS Topping,
PizzaXML
FROM Pizzas
OUTER APPLY Pizzas.PizzaXML.nodes('Pizza/Topping') AS Topping(ToppingXML);

We have to run in two parts because we aren’t allowed to partition on the XML data type and so we can’t drop the results of the first nodes expression in directly, even then what happens if we have two 100% identical pizzas, toppings and all? We need to sequence the pizzas and then use the nodes method to break out the toppings.
Unfortunately that’s enough to break the logical pointer we’ve used in the examples above, we have to effectively materialise each pizza as a separate piece of XML with the query method and then apply the nodes method to that. Here’s a copy of the query plan over at Paste the Plan I need to work out a larger data set to prove it (watch this space) but those estimated vs actual numbers make me nervous.
It also makes our code about twice as long without improving readability.

If you’ve been following along with this series then you’ll know that often there’s a solution inside XPath or XQuery that is either neater, faster or both because we can go into the XML, get a single answer and then come back out again without having to break the data into rows.

How do I Sequence XML Elements in SQL Server With XQuery?

Here’s my preferred solution to allow nodes in XML to be assigned a sequence number. It isn’t quite as elegant as I’d like as there are limits to XQuery’s position function and a few features missing from SQL Server’s implementation of XQuery (handling of siblings).
This approach makes use of the << comparison operator in XQuery which in essence means ‘before’, we assign our current element to a variable, move up to the parent level and then count the elements before the current one.
This leads us to a query that looks like this:

SELECT 
X.MyXML.value('Order[1]/@name', 'varchar(50)') AS Name,
Pizza.PizzaXML.value('let $i := . return count(../Pizza[. << $i])', 'int') + 1  AS PizzaSequence,
Pizza.PizzaXML.value('./@size', 'varchar(50)') AS Size,
Topping.ToppingXML.value('.', 'varchar(50)') AS Topping,
Topping.ToppingXML.value('let $i := . return count(../Topping[. << $i])', 'int') + 1  AS ToppingSequence,
Topping.ToppingXML.value('let $i := . return count(../../Pizza/Topping[. << $i])', 'int') + 1  AS ToppingSequenceAbsolute,
Topping.ToppingXML.value('let $i := .. return count(../../Pizza[. << $i])', 'int') + 1  AS ToppingPizzaSequence
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('Order/Pizza') AS Pizza(PizzaXML)
CROSS APPLY Pizza.PizzaXML.nodes('Topping') AS Topping(ToppingXML);

This gives us what we need in a single line of code and while not perfectly performant I’ve used this approach over tens of thousands of XML messages containing millions of elements in total with better results than the above two approaches.
The last three lines of the above query give variants on this technique to give us a sequence to the toppings within each pizza, an absolute sequence to the toppings across all pizzas and finally our pizza sequence but calculated starting from the topping level nodes rather than the pizza (if we wanted this would allow us to drop to one cross apply going directly down to toppings).

Next I’m going to turn my eye to performance by putting together a larger example data set and taking a look at the impact of XML Indexes.

See this post in SQL Notebook form.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

Querying XML In SQL Server – Part 5 – An Introduction to the Query Method and XQuery

Did you know there’s a whole extra query language hidden inside SQL Server. Time to see XQuery in action with our XML.

Last time round we looked at exists, this time round it’s the query method. Similar to exists there’s rarely anything that you can do that you can’t also handle in SQL but there’s a lot of cases where it makes things either easier, neater or faster.
There is probably as much if not more to say about the query method as there the preceding methods put together so this post will be more of a guide on how to use it and some inspiration rather than comprehensive coverage. I’ll add more posts with examples as and when I’m inspired.

The query method does what you’d expect, it allows you to run a query against a piece of XML. Documentation for the method is here, it takes a string as the input and behind that string is XQuery an entire* new language hidden away in your SQL Server instance. Documentation for the XQuery language as implemented in SQL Server is here. It’s worth a browse, particularly the coded examples which often go a long way to providing models you can adapt to your needs.
You can think of XQuery as a superset of the XPath we’ve been using so far to navigate the levels of our XML, it does that and then a whole lot more on top to manipulate and apply functions to those paths. I’m not intending this to be a full tutorial on XQuery, that’s too big a job for a single post. Consider this more a recipe book and inspiration. I’m also going to focus on getting data out of XML. You can use XQuery to build XML as well but I’m not sure there are a lot of use cases for that inside SQL Server (do share in the comments if you have some).
It returns the XML data type, sometimes that’s actual XML other times it’s just text or a number that we’ll want to convert appropriately.

Let’s start as ever with our sample XML.

CREATE TABLE #MyXML (MyXML XML NOT NULL);
 
INSERT INTO #MyXML(MyXML)
VALUES
(
'<?xml version="1.0" encoding="UTF-8"?>
<items termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
   <item type="boardgame" id="91514">
      <thumbnail>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__thumb/img/zpzD1TJLfuNEjuTjpPbN1y1mpss=/fit-in/200x150/filters:strip_icc()/pic3271388.jpg</thumbnail>
      <image>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__original/img/hVn3HtD2_5vKNjhEBkL1qFB4FaU=/0x0/filters:format(jpeg)/pic3271388.jpg</image>
      <name type="primary" sortindex="1" value="Rhino Hero" />
      <name type="alternate" sortindex="1" value="Rino Ercolino" />
      <name type="alternate" sortindex="1" value="Super Rhino!" />
      <name type="alternate" sortindex="1" value="Super Rino!" />
      <description>
        Super Rhino! presents players with an incredibly heroic &amp;ndash; and regrettably heavy &amp;ndash; rhinoceros who is eager to climb a tall building and leap other tall buildings in a single bound. 
      </description>
      <yearpublished value="2011" />
      <minplayers value="2" />
      <maxplayers value="5" />
      <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="61">
         <results numplayers="1">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="7" />
            <result value="Not Recommended" numvotes="29" />
         </results>
         <results numplayers="2">
            <result value="Best" numvotes="10" />
            <result value="Recommended" numvotes="40" />
            <result value="Not Recommended" numvotes="3" />
         </results>
         <results numplayers="3">
            <result value="Best" numvotes="44" />
            <result value="Recommended" numvotes="11" />
            <result value="Not Recommended" numvotes="0" />
         </results>
         <results numplayers="4">
            <result value="Best" numvotes="18" />
            <result value="Recommended" numvotes="30" />
            <result value="Not Recommended" numvotes="1" />
         </results>
         <results numplayers="5">
            <result value="Best" numvotes="5" />
            <result value="Recommended" numvotes="25" />
            <result value="Not Recommended" numvotes="13" />
         </results>
         <results numplayers="5+">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="3" />
            <result value="Not Recommended" numvotes="24" />
         </results>
      </poll>
      <playingtime value="15" />
      <minplaytime value="5" />
      <maxplaytime value="15" />
      <minage value="5" />
   </item>
</items>'
);
 
SELECT * FROM #MyXML;

We encountered the simplest form of XQuery earlier when looking at nodes.
The below gives a neat way to view the XML in the current context and up at the parent level. I use this often as a help during development. If the results are going in to a view I’ll probably leave them there as the test team can find them helpful too but it’s not something I’d want filling up disk space.

SELECT
R.ResultXML.query('.') AS ResultLevelXML,
R.ResultXML.value('.','varchar(max)') AS ResultLevelXMLValue,
cast(R.ResultXML.query('.') as varchar(max)) AS ResultLevelXMLCast,
R.ResultXML.query('..') AS PollLevelXML
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R(ResultXML);

Note how in this case the value method gives us nothing but we can cast the query result into text.

Most other uses for the query method fall into similar categories as the exists method these are things you can do in SQL but that would require you to drop down to a level in the structure below the one you want in your final output. Let’s look at a few examples.

How Do I Aggregate XML Values In SQL Server?

We can use Xquery to sum values without the need to break them down at row level and group them up again. This can give us a result in the select statement or could be used in a WHERE clause. The format is simple and looks just like a SQL aggregate with a path inside.
Note the data types, even though maximum and minimum are only handling integers I get a type conversion error if I ask for an integer result.

SELECT
R.ResultXML.value('./@numplayers','varchar(50)') AS NumPlayers,
R.ResultXML.query('sum(./result/@numvotes)') AS SumVotesQuery,
R.ResultXML.value('sum(./result/@numvotes)','int') AS SumVotesValue,
R.ResultXML.value('max(./result/@numvotes)','float') AS MaxVotesValue,
R.ResultXML.value('min(./result/@numvotes)','float') AS MinVotesValue,
R.ResultXML.value('count(./result/@numvotes)','int') AS CountVotesValue,
R.ResultXML.value('avg(./result/@numvotes)','numeric(10,2)') AS SumVotesValue,
R.ResultXML.query('.') AS ResultsXML
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R(ResultXML)
WHERE
R.ResultXML.value('sum(./result/@numvotes)','int') 	> 50;

If we really want that integer then we need to explicitly convert the @numvotes attribute to an integer.

SELECT
--R.ResultXML.value('min(xs:integer(./result/@numvotes))','int') AS MinVotesValueBroken,
R.ResultXML.value('min(for $v in ./result/@numvotes return xs:integer($v))','int') AS MinVotesValue,
R.ResultXML.query('.') AS ResultsXML
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R(ResultXML);

That’s not simple, the commented out version looks like a viable approach but that’s not asking to convert the values into an integer it’s trying to convert all the elements as one.
Instead we need to iterate through each item converting it. It’s unlikely to be worth the effort in most cases but there’s a small risk of floating point errors if you leave things to the defaults which use the double data type.
The above is a nice introduction to looping through data in XQuery which follows a fairly common pattern in programming languages:

for $MyOneThing in AllTheThings return DoSomethingOneAtATimeTo($MyOneThing)

This sets us up for the next example.

How Do I Concatenate XML Values Into a Single String?

The ability to loop through sub nodes gives us a route to a concatenated string because sometimes (e.g. for report outputs) that’s all we need. Here’s an example concatenating all the alternative names for our example game into one.

SELECT
X.MyXML.query(
	'for $name in 
		/items/item/name 
	return 
		concat($name/@type,'': '',$name/@value, '' aka '')'
		).value('.','varchar(max)') AS AllTheNames
FROM #MyXML AS X;

The value method doesn’t work directly here but we can stack methods, query gives us the result as XML type and then we apply the value method to get us to text. We could also do this in SQL and might want to as we probably want to remove that stray ‘aka’ at the end. This next example shows STUFF () being used to solve the extra aka issue and also the text conversion running via CAST().

SELECT
STUFF(
	CAST(
		X.MyXML.query(
		'for $name in 
			/items/item/name 
		return 
			concat( '' aka '', $name/@type,'': '',$name/@value)'
			) 
	AS varchar(max))
,1,5,'') AS 	
		AllTheNames
FROM #MyXML AS X;

How to Do Almost Anything Else In SQL Server With XQuery?

The catch all solution applies here… Google it (other search engines are available). You’ll probably find someone has asked how to do the same thing on Stack Overflow (or tech forum of choice) and dropping that solution into the query method has a good chance of solving your problem or getting you most of the way there.
Other good resources are the aforementioned examples in Microsoft’s XQuery documentation and the W3 School’s tutorials both of which I’ve bounced in and out of while writing these blogs and the worktime activities that inspired them.

Observant readers may have noticed there was a somewhat ominous footnote promised earlier on, here it is.

*Why Won’t My Entirely Valid XQuery Work in SQL Server?

Long term users of SQL Server may be wholly unsurprised to discover that XQuery falls into the rather large category of “cool sounding features in SQL Server that haven’t been developed further since they were sufficiently functional to tick all the right boxes on a list of features but no more” (see elsewhere for temporal tables, graph queries and many others). To quote Microsoft on the subject:

This XQuery implementation is aligned with the July 2004 Working Draft of XQuery

Some Person At Microsoft

That’s a draft of version 1.0 and we’re now up to version 3.1, don’t be surprised if features are missing.

One set of features I’ve missed personally are the more advanced abilities to move around nodes in the XML specifying preceding or following siblings in a group to work out relative or absolute positions. Expect to see some alternative solutions to that in a future post.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

Using SQL Notebooks For Knowledge Sharing

Jupyter notebooks aren’t just there for the script commands you struggle to remember (although they’re pretty good for that too).

T-SQL Tuesday Logo

For this month’s T-SQL Tuesday #137 Steve Jones (b|t) asked to share how we have used, or would like to use, Jupyter notebooks.

I don’t really think about Jupyter Notebooks often, I’ve seen some nice demos of use cases for us SQL Server people (See this session by Rob Sewell for a few examples) but never given them much thought beyond that.

Which possibly shows how unglamorous they can be given that I use one virtually every day at work.
Just the one though, it’s a PowerShell notebook and as PowerShell can be used to run git commands I use it to hold the scripts that either take me too long to type out (like refreshing all the branches for our shared UAT and live environments in one go) or that I forget the syntax for because I don’t use it often enough and there’s no friendly button in Visual Studio to save me.

I thought I’d focus on something a little more interesting for this blog instead and so went back to one of the use cases that interested me when I first encountered SQL Notebooks. Using them to combine both text and code into a nice neat demo of a T-SQL feature. When I first investigated this most of my ideas fell flat because each code block was considered a different session and so temporary tables couldn’t be carried between them, this prevented you building up a decent example data set and then using it in subsequent steps. The good news is that that issue has now been fixed and so I thought I’d try converting my series of blog posts on querying XML in SQL Server into a set of notebooks.

The process was pleasantly simple, it’s just a matter of copying and pasting over and most of the formatting for headings, bullet points and links carried over without issue. A few times I had to delve into the markdown code to strip out a rogue change of font but each post took me 5-10 minutes to adapt.

You can access the results here they’re even viewable online at GitHub although the formatting of results tables isn’t quite as neat as in Azure Data Studio and there’s no way to run the sample code.

There’s a lot to be said for the approach, the long bits of code that set up the sample data can be hidden and results can be optionally saved with the notebook. As long as you have access to SQL Server you can click to run the sample code or start to experiment by changing it.

There are a few down sides. The most obvious is that while temp tables work between blocks of code, intellisense isn’t smart enough to know this and so throws up a lot of red wiggly lines. I’d also love to be able to collapse results sets without wiping them completely in the same way that you can do for code. Sometimes I wanted the results to be there but they were too long winded to want them to be filling up the page by default.
Limits come from Azure Data Studio as well, it’s not an issue for these posts but the lack of query plans will likely send me running back to good old SSMS at some point in the future.

Overall I like the result. I have something that I can easily point people to that allows them to run my example code but it also allows for more control over the surrounding text than is possible in a classic SQL script. Given that it’s relatively easy to convert a blog post into a notebook I’m going to keep at it for now and where appropriate attach a notebook version to each blog post.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

Querying XML In SQL Server – Part 4 – Filtering With The Exist Method

In part four of our series on querying XML in SQL server we look at the exist method and how it can help to filter results or flag for missing elements.

Back in Part 3 of this series we covered what I consider to be the essentials of querying XML. Between the values and nodes methods you can break up XML into rows and extract the values into columns and from there you can do anything you could with data lifted from a standard table in the database.

However there’s a lot that you can do that might perform better or look neater if a bit more of the query work was handled inside the XML and a bit less on the SQL side of the equation.

You can see what Microsoft has to say about the exist method here but you can probably guess a lot from the name and what we’ve seen of the values and nodes methods so far. Exist runs a query and returns a bit field, giving a 1 if the query produces a result and 0 if the result is empty.

Let’s start out as we always do with our sample XML.

CREATE TABLE #MyXML (MyXML XML NOT NULL);

INSERT INTO #MyXML(MyXML)
VALUES
(
'<?xml version="1.0" encoding="UTF-8"?>
<items termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
   <item type="boardgame" id="91514">
      <thumbnail>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__thumb/img/zpzD1TJLfuNEjuTjpPbN1y1mpss=/fit-in/200x150/filters:strip_icc()/pic3271388.jpg</thumbnail>
      <image>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__original/img/hVn3HtD2_5vKNjhEBkL1qFB4FaU=/0x0/filters:format(jpeg)/pic3271388.jpg</image>
      <name type="primary" sortindex="1" value="Rhino Hero" />
      <name type="alternate" sortindex="1" value="Rino Ercolino" />
      <name type="alternate" sortindex="1" value="Super Rhino!" />
      <name type="alternate" sortindex="1" value="Super Rino!" />
      <description>
        Super Rhino! presents players with an incredibly heroic &amp;ndash; and regrettably heavy &amp;ndash; rhinoceros who is eager to climb a tall building and leap other tall buildings in a single bound. 
      </description>
      <yearpublished value="2011" />
      <minplayers value="2" />
      <maxplayers value="5" />
      <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="61">
         <results numplayers="1">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="7" />
            <result value="Not Recommended" numvotes="29" />
         </results>
         <results numplayers="2">
            <result value="Best" numvotes="10" />
            <result value="Recommended" numvotes="40" />
            <result value="Not Recommended" numvotes="3" />
         </results>
         <results numplayers="3">
            <result value="Best" numvotes="44" />
            <result value="Recommended" numvotes="11" />
            <result value="Not Recommended" numvotes="0" />
         </results>
         <results numplayers="4">
            <result value="Best" numvotes="18" />
            <result value="Recommended" numvotes="30" />
            <result value="Not Recommended" numvotes="1" />
         </results>
         <results numplayers="5">
            <result value="Best" numvotes="5" />
            <result value="Recommended" numvotes="25" />
            <result value="Not Recommended" numvotes="13" />
         </results>
         <results numplayers="5+">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="3" />
            <result value="Not Recommended" numvotes="24" />
         </results>
      </poll>
      <playingtime value="15" />
      <minplaytime value="5" />
      <maxplaytime value="15" />
      <minage value="5" />
   </item>
</items>'
);

SELECT * FROM #MyXML;

Imagine we want to find all the results elements that had a result with more than 15 votes. We might write it like this.

SELECT
R2.ResultsXML.value('../@numplayers','varchar(50)') AS NumPlayers
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results/result') AS R2(ResultsXML)
WHERE R2.ResultsXML.value('./@numvotes','int') > 15;

But if you run that you’ll notice we get an extra row because Numplayers =4 has two different results that meet this condition. We could of course sweep this issue under the carpet with DISTINCT but that’s not going to perform well with larger data sets. We might also be able to produce something more complicated using EXISTS like this.

SELECT
R.ResultXML.query('.') AS ResultLevelXML,
R.ResultXML.value('./@numplayers','varchar(50)') AS NumPlayers
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R(ResultXML)
WHERE EXISTS (
SELECT 1
FROM R.ResultXML.nodes('./result') AS R2(ResultsXML)
WHERE R2.ResultsXML.value('./@numvotes','int') > 15
);

That’s a lot more code to type out, it’s harder to read and a quick peek at the query plans for the above two makes me think that second query is going to get ugly if you threw more data at it.

How can I use the exist method to filter results from XML in SQL Server?

We can use exist to test a piece of xml to see if a matching element exists inside it, any time a result is returned we get a 1 and if nothing is found we get a 0. This can then be used to test within the WHERE clause of a TSQL statement like the one below.

SELECT
R2.ResultsXML.value('./@numplayers','varchar(50)') AS NumPlayers
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R2(ResultsXML)
WHERE ResultsXML.exist('./result[@numvotes > 15]') = 1;

This looks a lot neater to me both in code view and as a query plan. Hopefully I’ll get the chance to do some performance tests in a later post in this series once I’ve built a suitable data set to test against. Anecdotally I can confirm I’ve had no issues using this approach in a work environment.

The exist method doesn’t have to be used in the where clause we can use it to give us useful columns in our results too.

How can I use the exist method to test for existence of XML elements in SQL Server?

The exist method can be added to a column definition to show if a particular element or value exists within the XML without filtering it out. Consider this example that lists all results and gives us a flag to show if any of the result level elements had 30 or more votes.

SELECT
R2.ResultsXML.value('./@numplayers','varchar(50)') AS NumPlayers,
ResultsXML.exist('./result[@numvotes > 30]') AS HasThirtyVotes
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R2(ResultsXML);

This approach can be really useful if there are optional elements to a piece of XML. I’ve used this where the XML represents a process performed in distinct steps and there’s a need to see which of those steps of the process have been started without needing any more detail. Exist allows you to quickly dip in and see what’s there and what isn’t without going any further than that.

Why Am I Getting Errors When Using The Node and Exist Methods Together?

A word of warning, while testing the above code I received syntax errors in SSMS like the one in the image below.
The good news is this appears to be an issue with SSMS rather than the code, it runs just fine but it is a little off putting to be told there’s a problem.

That’s all for this post, next time round we’re going to delve deeper into what more we can do inside XML with the query method.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

Querying XML In SQL Server – Part 3 – Handling Repeating Regions With the Nodes Method

In part 3 of our XML series we meet the nodes method and see how it can be used in a query to split data over multiple rows.

Last time we looked at how to extract a specific single value from XML in a variety of ways. We’ll build on that now to look at how to handle XML that has repeating regions and we want to obtain all the values rather than just one of them.

Lets start out again by loading our sample data into a temporary table.

CREATE TABLE #MyXML (MyXML XML NOT NULL);
 
INSERT INTO #MyXML(MyXML)
VALUES
(
'<?xml version="1.0" encoding="UTF-8"?>
<items termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
   <item type="boardgame" id="91514">
      <thumbnail>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__thumb/img/zpzD1TJLfuNEjuTjpPbN1y1mpss=/fit-in/200x150/filters:strip_icc()/pic3271388.jpg</thumbnail>
      <image>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__original/img/hVn3HtD2_5vKNjhEBkL1qFB4FaU=/0x0/filters:format(jpeg)/pic3271388.jpg</image>
      <name type="primary" sortindex="1" value="Rhino Hero" />
      <name type="alternate" sortindex="1" value="Rino Ercolino" />
      <name type="alternate" sortindex="1" value="Super Rhino!" />
      <name type="alternate" sortindex="1" value="Super Rino!" />
      <description>
        Super Rhino! presents players with an incredibly heroic &amp;ndash; and regrettably heavy &amp;ndash; rhinoceros who is eager to climb a tall building and leap other tall buildings in a single bound. 
      </description>
      <yearpublished value="2011" />
      <minplayers value="2" />
      <maxplayers value="5" />
      <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="61">
         <results numplayers="1">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="7" />
            <result value="Not Recommended" numvotes="29" />
         </results>
         <results numplayers="2">
            <result value="Best" numvotes="10" />
            <result value="Recommended" numvotes="40" />
            <result value="Not Recommended" numvotes="3" />
         </results>
         <results numplayers="3">
            <result value="Best" numvotes="44" />
            <result value="Recommended" numvotes="11" />
            <result value="Not Recommended" numvotes="0" />
         </results>
         <results numplayers="4">
            <result value="Best" numvotes="18" />
            <result value="Recommended" numvotes="30" />
            <result value="Not Recommended" numvotes="1" />
         </results>
         <results numplayers="5">
            <result value="Best" numvotes="5" />
            <result value="Recommended" numvotes="25" />
            <result value="Not Recommended" numvotes="13" />
         </results>
         <results numplayers="5+">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="3" />
            <result value="Not Recommended" numvotes="24" />
         </results>
      </poll>
      <playingtime value="15" />
      <minplaytime value="5" />
      <maxplaytime value="15" />
      <minage value="5" />
   </item>
</items>'
);
 
SELECT * FROM #MyXML;

We dipped in to the poll section last time round to experiment with XMLPath queries but this time further we want the whole lot.

How Do I Use the Nodes Method to Shred Repeating Regions in XML?

The nodes method is used in conjunction with CROSS APPLY to create repeating rows each holding a copy of the parent XML for each node within the path specified. To quote the documentation exactly:

The result of the nodes() method is a rowset that contains logical copies of the original XML instances. In these logical copies, the context node of every row instance is set to one of the nodes that is identified with the query expression. This way, later queries can navigate relative to these context nodes.

Here’s our first example.

SELECT
R.ResultXML.query('.') AS ResultLevelXML,
R.ResultXML.value('./@numplayers','varchar(50)') AS NumPlayers,
R.ResultXML.value('(./result[@value = "Best"]/@numvotes)[1]','int') AS BestVotes,
R.ResultXML.value('(./result[@value = "Recommended"]/@numvotes)[1]','int') AS RecommendedVotes,
R.ResultXML.value('(./result[@value = "Not Recommended"]/@numvotes)[1]','int') AS NotRecommendedVotes
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R(ResultXML);

Nodes takes us down to the results level, creates a row per result element and as we have six results elements we get six rows.
The first column in the data set uses the query method to obtain ‘.’ which simply means the current element. This method will have a whole post to itself later in the series but I’m dropping this in now as it’s a really helpful approach when debugging your code. Just click on the XML to see what you’re currently working with. Note how we just get a single snippet of XML holding one results element and its associated children.
The remaining columns should be familiar if you’ve been following the series so far. The value method pulls the number of players from the attribute, because each of the three result elements have a unique attribute value and we know what they are we can query for that to add each element to a column.

This works well if you have well defined children in your XML but what if we want to do the same again for the children of each node?

How Do I Extract Repeating Sub Elements of XML With The Nodes Method?

There are a few ways of approaching this. The first is to build on our work above and use the nodes method a second time on the results of the first. That approach looks as follows.

SELECT
R.ResultXML.query('.') AS ResultLevelXML,
R.ResultXML.value('./@numplayers','varchar(50)') AS NumPlayers,
R2.ResultsXML.value('./@value','varchar(50)') AS PollValue,
R2.ResultsXML.value('./@numvotes','int') AS NumberOfVotes
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R(ResultXML)
CROSS APPLY R.ResultXML.nodes('./result') AS R2(ResultsXML);

R from the first query gives us XML and we repeat the process to give us R2 which repeats for each repetition of the full XML document. Six rows from the first CROSS APPLY each hold three result values so we end up with 18 rows in total. We pick some values from each level to give us the full picture on each row.

We can instead go directly to the bottom level rather than using CROSS APPLY twice but how do we then get those values (NumPlayers in our case)?
This is where the detail of what the nodes method does becomes important, it’s a common misconception (one I held anyway) that nodes cuts up the XML, splits it between rows and throws away everything else but that’s not accurate. Look back up to the official definition and note the word “logical” and the final sentence “This way, later queries can navigate relative to these context nodes“.
Nodes is less like a pair of scissors and more like an open all sub folders option in a file explorer. We get one new window for each sub folder but any files or folders above that aren’t gone and we can always navigate back up to them.

With that in mind and an awareness that we can use ‘..’ within XPath to move up to the parent of an object we can re-write to give this.

SELECT
R2.ResultsXML.query('.') AS ResultLevelXML,
R2.ResultsXML.query('..') AS ResultsLevelXML,
R2.ResultsXML.value('../@numplayers','varchar(50)') AS NumPlayers,
R2.ResultsXML.value('./@value','varchar(50)') AS PollValue,
R2.ResultsXML.value('./@numvotes','int') AS NumberOfVotes,
R2.ResultsXML.value('../../@title','varchar(50)') AS Poll
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results/result') AS R2(ResultsXML);

Results in the same eighteen rows. To demonstrate the concept a step further we go up two levels to obtain the poll title as well.

Note: I’ve seen suggestions that the second approach can lead to performance issues when you make heavy use of navigating back up the XML structure. I’ve never seen this myself but it’s worth doing some tests to decide which approach you prefer.

That covers the nodes method. Next time round we’ll take a look at the exists method and how it can be used to filter results or check inside XML without having to fully shred it.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

Querying XML In SQL Server – Part 2 – Simple Paths and Attributes With the Value Method

In part 2 of our series we’ll start to learn how to pull values out of XML with an SQL query and how we can use the XPath language to navigate a tree based on position or values within the document.

Last week we met some XML and got a feel for its structure, this week we’ll run some simple queries using XPath the XML Path Language. Anyone who has ever used a file system should feel at home here because XPath simply allows you to navigate the tree structure of a piece of XML in the same way you would the files and folders on a disk.

First off I’m going to create a temporary table and insert the XML into it. We could use a simple variable but a table makes it a little easier to run examples a piece at a time.

CREATE TABLE #MyXML (MyXML XML NOT NULL);

INSERT INTO #MyXML(MyXML)
VALUES
(
'<?xml version="1.0" encoding="UTF-8"?>
<items termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
   <item type="boardgame" id="91514">
      <thumbnail>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__thumb/img/zpzD1TJLfuNEjuTjpPbN1y1mpss=/fit-in/200x150/filters:strip_icc()/pic3271388.jpg</thumbnail>
      <image>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__original/img/hVn3HtD2_5vKNjhEBkL1qFB4FaU=/0x0/filters:format(jpeg)/pic3271388.jpg</image>
      <name type="primary" sortindex="1" value="Rhino Hero" />
      <name type="alternate" sortindex="1" value="Rino Ercolino" />
      <name type="alternate" sortindex="1" value="Super Rhino!" />
      <name type="alternate" sortindex="1" value="Super Rino!" />
      <description>
        Super Rhino! presents players with an incredibly heroic &amp;ndash; and regrettably heavy &amp;ndash; rhinoceros who is eager to climb a tall building and leap other tall buildings in a single bound. 
      </description>
      <yearpublished value="2011" />
      <minplayers value="2" />
      <maxplayers value="5" />
      <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="61">
         <results numplayers="1">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="7" />
            <result value="Not Recommended" numvotes="29" />
         </results>
         <results numplayers="2">
            <result value="Best" numvotes="10" />
            <result value="Recommended" numvotes="40" />
            <result value="Not Recommended" numvotes="3" />
         </results>
         <results numplayers="3">
            <result value="Best" numvotes="44" />
            <result value="Recommended" numvotes="11" />
            <result value="Not Recommended" numvotes="0" />
         </results>
         <results numplayers="4">
            <result value="Best" numvotes="18" />
            <result value="Recommended" numvotes="30" />
            <result value="Not Recommended" numvotes="1" />
         </results>
         <results numplayers="5">
            <result value="Best" numvotes="5" />
            <result value="Recommended" numvotes="25" />
            <result value="Not Recommended" numvotes="13" />
         </results>
         <results numplayers="5+">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="3" />
            <result value="Not Recommended" numvotes="24" />
         </results>
      </poll>
      <playingtime value="15" />
      <minplaytime value="5" />
      <maxplaytime value="15" />
      <minage value="5" />
   </item>
</items>'
);

SELECT * FROM #MyXML;

Run the above and we have our table. Querying the table gives us our single row, in SSMS you can even click on the result and have it pop out in a new window in its full glory. Out in the real world you may well have a table with hundreds of rows each with a slightly different piece of XML, all of the below continues to apply. Why not try yourself by adding in the follow up game Rhino Hero Super Battle.

How Do I Query XML Attributes and Elements in SQL Server?

We can start to pull the XML apart or ‘shred’ it as it’s commonly described using the value method and XPath. Here is a query with some examples.

SELECT
--I can get the value inside an element
X.MyXML.value('(/items/item/description)[1]','varchar(500)') AS GameDescription,
--I can get an attribute on a tag
X.MyXML.value('(/items/item/yearpublished/@value)[1]','int') AS YearPublished,
--I can go straight to the first instance of an element
X.MyXML.value('(/items/item/poll/results/result/@numvotes)[1]','int') AS Poll1,
--Or I can pick a path
X.MyXML.value('(/items/item/poll/results[1]/result[2]/@numvotes)[1]','int') AS Poll2,
--three different routes to the same value
X.MyXML.value('(/items/item/poll/results[2]/result[3]/@numvotes)[1]','int') AS Poll3a,
X.MyXML.value('(/items/item/poll/results/result/@numvotes)[6]','int') AS Poll3b,
X.MyXML.value('(//@numvotes)[6]','int') AS Poll3c
FROM #MyXML AS X;

We can see a few variants above depending on whether we’re obtaining an element or an attribute. We also need to specify the data type to convert to, it’s your call as to if you feel confident enough in your inputs to go straight to the more fragile data types or if you want to grab everything as text and then convert when you have the luxury of something like TRY_CAST to clear up issues. If you’re handling dates do yourself a favour and use datetime2, sooner or later something will throw a 0001-01-01 style date your way and datetime won’t be happy.

The other key feature is the numbers in square brackets. Because elements can repeat we need to know which repetition to pick. Even if there’s only one we have to drop in a [1] because we can’t be certain there’s only one occurrence (the blessing and curse of XML being that it’s so much less rigid than a set of tables in a database).
When we specify that path we can either be very specific or more general. See the Poll3 columns.

  • Poll3a gives us a precise path down to our value
  • Poll3b gets us down to the right section of the XML and then grabs the 6th
  • Poll3c goes a step further and just picks the 6th occurrence of the attribute. That gives us the same result in this case but if we hade the same attribute inside different elements all of them would be considered.

We’re not just limited to specifying numerical positions, the next query gives some more advanced approaches. We have an example of picking the last possible element, how to pick based on attributes of parent elements and how to pick a parent based on values in the child elements.

SELECT
--Find the last value
X.MyXML.value('(//@numvotes)[last()]','int') AS LastPoll,
--Search based on parent attributes
X.MyXML.value('(/items/item/poll/results[@numplayers = 2]/result[@value = "Not Recommended"]/@numvotes)[1]','int')  AS SearchPoll,
--search based on child attributes
X.MyXML.value('(/items/item/poll/results[result/@numvotes = 24]/@numplayers)[1]','varchar(10)')  AS ChildPollPoll
FROM #MyXML AS X;

We may not want to be so static in our choices and fortunately we have options for that too.

How Can I Use an SQL Server Variable to Obtain an XML Attribute?

We can use sql:variable to take SQL variables and use them inside our path.

DECLARE @MyNumPlayers int = 2, @MyValue varchar(50) = 'Not Recommended';

SELECT
X.MyXML.value(
	'(/items/item/poll/
		results[@numplayers = sql:variable("@MyNumPlayers")]/
		result[@value = sql:variable("@MyValue")]/@numvotes)[1]'
	,'int')  AS SearchPoll
FROM #MyXML AS X;

How Can I Use an SQL Server Column to Obtain an XML Attribute?

We can a use sql:column to push values from a table and into the path of the values method. This time we query the XML six times, getting a different result each time.

With MyNumbers AS
(
SELECT N
FROM (VALUES(1),(2),(3),(4),(5),(6),(7)) AS N(N)
)
SELECT 
MyNumbers.N,
X.MyXML.value(
	'(/items/item/poll/
	results[sql:column("MyNumbers.N")]/
	result[@value = "Best"]/
	@numvotes)[1]'
	,'int'
		)  AS BestPollVotes
FROM  #MyXML AS X
CROSS JOIN MyNumbers
ORDER BY N;

The row for N=7 gives a NULL because there is no 7th Element, SQL Server doesn’t know this though and will take a look for you anyway.

Don’t go wild with the above. If you want to split repeating XML elements into rows of a table there’s a better way. Tune in next week when we’ll cover the nodes method.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

Querying XML in SQL Server – Part 1 – Introduction

Time to meet our new best friend XML and some of the things we can do with it.

Sooner or later we all encounter XML, much like the urban legend that you’re never more than 6ft away from a rat in London the same is true of data and XML.
It’s understandable, XML is easy to read, easy to put together and unconstrained by the need to mess around with difficult tasks such as defining your data structure up front. JSON may be XMLs cooler younger brother but in my day to day life as a BI developer it’s XML that tends to be sent my way to make sense of and it’s XML I’m asked to pull together for things like the stacks of data returns I fed back to the government in my Health and Social Care days.

I’ve been working with it a lot again lately to either receive messages from company systems or to share customer summaries from the data warehouse back to those systems. As I can never quite remember the correct syntax to use and I’ve learnt a few neat tricks over the last few months I thought I’d get this all in writing. At the very least it puts it all in one place for myself and it may well help someone else.

Today I’m going to keep things light (this is only my third post after all) and give a brief tour of a piece of XML as a way of introducing the key terms.

Below is a copy of the XML api output from BoardGameGeek for family favourite game Rhino Hero. I’ve edited it down a little for brevity, you can see the full output here.

<?xml version="1.0" encoding="UTF-8"?>
<items termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
   <item type="boardgame" id="91514">
      <thumbnail>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__thumb/img/zpzD1TJLfuNEjuTjpPbN1y1mpss=/fit-in/200x150/filters:strip_icc()/pic3271388.jpg</thumbnail>
      <image>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__original/img/hVn3HtD2_5vKNjhEBkL1qFB4FaU=/0x0/filters:format(jpeg)/pic3271388.jpg</image>
      <name type="primary" sortindex="1" value="Rhino Hero" />
      <name type="alternate" sortindex="1" value="Rino Ercolino" />
      <name type="alternate" sortindex="1" value="Super Rhino!" />
      <name type="alternate" sortindex="1" value="Super Rino!" />
      <description>
		Super Rhino! presents players with an incredibly heroic &amp;ndash; and regrettably heavy &amp;ndash; rhinoceros who is eager to climb a tall building and leap other tall buildings in a single bound. 
	  </description>
      <yearpublished value="2011" />
      <minplayers value="2" />
      <maxplayers value="5" />
      <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="61">
         <results numplayers="1">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="7" />
            <result value="Not Recommended" numvotes="29" />
         </results>
         <results numplayers="2">
            <result value="Best" numvotes="10" />
            <result value="Recommended" numvotes="40" />
            <result value="Not Recommended" numvotes="3" />
         </results>
         <results numplayers="3">
            <result value="Best" numvotes="44" />
            <result value="Recommended" numvotes="11" />
            <result value="Not Recommended" numvotes="0" />
         </results>
         <results numplayers="4">
            <result value="Best" numvotes="18" />
            <result value="Recommended" numvotes="30" />
            <result value="Not Recommended" numvotes="1" />
         </results>
         <results numplayers="5">
            <result value="Best" numvotes="5" />
            <result value="Recommended" numvotes="25" />
            <result value="Not Recommended" numvotes="13" />
         </results>
         <results numplayers="5+">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="3" />
            <result value="Not Recommended" numvotes="24" />
         </results>
      </poll>
      <playingtime value="15" />
      <minplaytime value="5" />
      <maxplaytime value="15" />
      <minage value="5" />
   </item>
</items>

One of the benefits of XML is that in most cases you can read it by eye, I won’t go as far as describing it as self documenting but you can get a fair idea of how and what any given piece of XML is doing.

So what is XML made up of?

Firstly we have tags like <description> these are like the bones of XML. They either come in pairs like this:

<description> </description>

Or as a single item like this:

<description/>

A pair of tags and everything in between is an element, elements can contain further elements and those can contain further elements. Note this part of the XML above.

         <results numplayers="1">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="7" />
            <result value="Not Recommended" numvotes="29" />
         </results>

We have multiple <result> tags inside a <results> tag but if you look back up you’ll also note there are multiple <results> nested inside a <poll> tag.

Finally we have attibutes, these are extra pieces of information we can attach to give us further properties of a tag, sometimes this is all the tag needs:

<name type=”alternate” sortindex=”1″ value=”Super Rhino!”/>

Other times we’ll have both attributes and further information or tags inside the element (see the <results> tag example above).

That’s all for our whirlwind tour of XML, next we’ll start looking at how we use T-SQL to pull information out of a piece of XML and into something that will look a little more normal to the average SQL Server user.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form