Querying XML In SQL Server – Part 6 – Sequencing Elements and Finding Positions

XML Elements aren’t always neatly labelled with a unique identifier which makes joining different levels back together in the database tricky. In this post we look at a number of solutions to this issue.

This post looks at a problem that can come up in XML messages that have perhaps not been thought out as well as we’d like and where we can’t uniquely identify one instance of a repeating element. The example we’ve used so far avoids this problem so we’ll switch to a sample that illustrates this problem.

CREATE TABLE #MyXML (MyXML XML NOT NULL);
 
INSERT INTO #MyXML(MyXML)
VALUES
(
'
<Order name="XML User Group">
	<Pizza size="medium">
		<Topping>Mushrooms</Topping>
		<Topping>Ham</Topping>
		<Topping>Jalapenos</Topping>
	</Pizza>
	<Pizza size="medium">
		<Topping>Pineapple</Topping>
		<Topping>Onion</Topping>
		<Topping>Jalapenos</Topping>
	</Pizza>
	<Pizza size="large">
		<Topping>Pepperoni</Topping>
		<Topping>Extra Cheese</Topping>
	</Pizza>
</Order>
'
);
 
SELECT 
X.MyXML.value('Order[1]/@name', 'varchar(50)') AS Name,
Pizza.PizzaXML.value('./@size', 'varchar(50)') AS Size,
Topping.ToppingXML.value('.', 'varchar(50)') AS Topping
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('Order/Pizza') AS Pizza(PizzaXML)
CROSS APPLY Pizza.PizzaXML.nodes('Topping') AS Topping(ToppingXML);

The XML looks understandable but when we query it we get something that looks like this:

SizeTopping
mediumMushrooms
mediumHam
mediumJalapenos
mediumPineapple
mediumOnion
mediumJalapenos
largePepperoni
largeExtra Cheese

How many Pizzas do we have? Which toppings go on which pizza? Will the vegetarians of the XML User Group go hungry?
The problem here is we have two elements with the same value (medium) and when we convert to rows in the table we lose the structure that would allow us to identify which toppings belong to which.
I’d love to say the solution is to just ask the developers of the source system to change the XML, but the answer to that request in most cases will be a two letter word beginning with N so what can we do instead?

How do I Sequence XML Elements in SQL Server With a Numbers Table?

One common solution has already appeared back in part 2, if we have a list of numbers we can push them in to the query to pick out the pizzas one at a time.
The query looks something like this:

With MyNumbers AS
(
SELECT N
FROM (VALUES(1),(2),(3),(4),(5),(6),(7)) AS N(N)
)
SELECT
MyNumbers.N,
X.MyXML.value('(Order/Pizza[sql:column("MyNumbers.N")]/@size)[1]' ,'varchar(50)')  AS Size
,Topping.ToppingXML.value('.', 'varchar(50)') AS Topping
FROM  #MyXML AS X
CROSS JOIN MyNumbers
OUTER APPLY X.MyXML.nodes('Order/Pizza[sql:column("MyNumbers.N")]/Topping') AS Topping(ToppingXML)
ORDER BY N;

And we have our numbering. This is the solution that you’ll probably find if you hit Google or StackOverflow. It works, it looks neat and it’s nicely self contained. There are possible issues with it unfortunately. The first is that we’ve specified our numbers. What if we have more than seven pizzas? We loose the rest. The observant might also notice I’ve swapped to OUTER APPLY to illustrate another issue, SQL Server doesn’t know how many nodes are in the XML and so just keeps on trying with all our numbers giving us empty results when there’s no match.
My hands on experience suggests that while this approach works for small sized XML messages, if you have thousands of messages each with potentially hundreds of elements then performance nosedives fast particularly if you have a high variance in the numbers (100 every time is less of a problem than a random number between 1 and 1000).
There are ways to mitigate this (use the query method to count the nodes and then limit the numbers pushed in) but they get messy and one begins to question if the work put into the fix outweighs the benefits of it.

How do I Sequence XML Elements in SQL Server With a Window Function?

We could look to using window functions to sequence the XML elements like this:

WITH Pizzas AS
(
SELECT 
X.MyXML.value('Order[1]/@name', 'varchar(50)') AS Name,
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS PizzaSequence,
Pizza.PizzaXML.value('./@size', 'varchar(50)') AS Size,
Pizza.PizzaXML.query('.') AS PizzaXML
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('Order/Pizza') AS Pizza(PizzaXML)
)
SELECT 
Pizzas.PizzaSequence,
Pizzas.Size,
Topping.ToppingXML.value('.', 'varchar(50)') AS Topping,
PizzaXML
FROM Pizzas
OUTER APPLY Pizzas.PizzaXML.nodes('Pizza/Topping') AS Topping(ToppingXML);

We have to run in two parts because we aren’t allowed to partition on the XML data type and so we can’t drop the results of the first nodes expression in directly, even then what happens if we have two 100% identical pizzas, toppings and all? We need to sequence the pizzas and then use the nodes method to break out the toppings.
Unfortunately that’s enough to break the logical pointer we’ve used in the examples above, we have to effectively materialise each pizza as a separate piece of XML with the query method and then apply the nodes method to that. Here’s a copy of the query plan over at Paste the Plan I need to work out a larger data set to prove it (watch this space) but those estimated vs actual numbers make me nervous.
It also makes our code about twice as long without improving readability.

If you’ve been following along with this series then you’ll know that often there’s a solution inside XPath or XQuery that is either neater, faster or both because we can go into the XML, get a single answer and then come back out again without having to break the data into rows.

How do I Sequence XML Elements in SQL Server With XQuery?

Here’s my preferred solution to allow nodes in XML to be assigned a sequence number. It isn’t quite as elegant as I’d like as there are limits to XQuery’s position function and a few features missing from SQL Server’s implementation of XQuery (handling of siblings).
This approach makes use of the << comparison operator in XQuery which in essence means ‘before’, we assign our current element to a variable, move up to the parent level and then count the elements before the current one.
This leads us to a query that looks like this:

SELECT 
X.MyXML.value('Order[1]/@name', 'varchar(50)') AS Name,
Pizza.PizzaXML.value('let $i := . return count(../Pizza[. << $i])', 'int') + 1  AS PizzaSequence,
Pizza.PizzaXML.value('./@size', 'varchar(50)') AS Size,
Topping.ToppingXML.value('.', 'varchar(50)') AS Topping,
Topping.ToppingXML.value('let $i := . return count(../Topping[. << $i])', 'int') + 1  AS ToppingSequence,
Topping.ToppingXML.value('let $i := . return count(../../Pizza/Topping[. << $i])', 'int') + 1  AS ToppingSequenceAbsolute,
Topping.ToppingXML.value('let $i := .. return count(../../Pizza[. << $i])', 'int') + 1  AS ToppingPizzaSequence
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('Order/Pizza') AS Pizza(PizzaXML)
CROSS APPLY Pizza.PizzaXML.nodes('Topping') AS Topping(ToppingXML);

This gives us what we need in a single line of code and while not perfectly performant I’ve used this approach over tens of thousands of XML messages containing millions of elements in total with better results than the above two approaches.
The last three lines of the above query give variants on this technique to give us a sequence to the toppings within each pizza, an absolute sequence to the toppings across all pizzas and finally our pizza sequence but calculated starting from the topping level nodes rather than the pizza (if we wanted this would allow us to drop to one cross apply going directly down to toppings).

Next I’m going to turn my eye to performance by putting together a larger example data set and taking a look at the impact of XML Indexes.

See this post in SQL Notebook form.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

One thought on “Querying XML In SQL Server – Part 6 – Sequencing Elements and Finding Positions”

Leave a comment