Querying XML In SQL Server – Part 3 – Handling Repeating Regions With the Nodes Method

In part 3 of our XML series we meet the nodes method and see how it can be used in a query to split data over multiple rows.

Last time we looked at how to extract a specific single value from XML in a variety of ways. We’ll build on that now to look at how to handle XML that has repeating regions and we want to obtain all the values rather than just one of them.

Lets start out again by loading our sample data into a temporary table.

CREATE TABLE #MyXML (MyXML XML NOT NULL);
 
INSERT INTO #MyXML(MyXML)
VALUES
(
'<?xml version="1.0" encoding="UTF-8"?>
<items termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
   <item type="boardgame" id="91514">
      <thumbnail>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__thumb/img/zpzD1TJLfuNEjuTjpPbN1y1mpss=/fit-in/200x150/filters:strip_icc()/pic3271388.jpg</thumbnail>
      <image>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__original/img/hVn3HtD2_5vKNjhEBkL1qFB4FaU=/0x0/filters:format(jpeg)/pic3271388.jpg</image>
      <name type="primary" sortindex="1" value="Rhino Hero" />
      <name type="alternate" sortindex="1" value="Rino Ercolino" />
      <name type="alternate" sortindex="1" value="Super Rhino!" />
      <name type="alternate" sortindex="1" value="Super Rino!" />
      <description>
        Super Rhino! presents players with an incredibly heroic &amp;ndash; and regrettably heavy &amp;ndash; rhinoceros who is eager to climb a tall building and leap other tall buildings in a single bound. 
      </description>
      <yearpublished value="2011" />
      <minplayers value="2" />
      <maxplayers value="5" />
      <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="61">
         <results numplayers="1">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="7" />
            <result value="Not Recommended" numvotes="29" />
         </results>
         <results numplayers="2">
            <result value="Best" numvotes="10" />
            <result value="Recommended" numvotes="40" />
            <result value="Not Recommended" numvotes="3" />
         </results>
         <results numplayers="3">
            <result value="Best" numvotes="44" />
            <result value="Recommended" numvotes="11" />
            <result value="Not Recommended" numvotes="0" />
         </results>
         <results numplayers="4">
            <result value="Best" numvotes="18" />
            <result value="Recommended" numvotes="30" />
            <result value="Not Recommended" numvotes="1" />
         </results>
         <results numplayers="5">
            <result value="Best" numvotes="5" />
            <result value="Recommended" numvotes="25" />
            <result value="Not Recommended" numvotes="13" />
         </results>
         <results numplayers="5+">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="3" />
            <result value="Not Recommended" numvotes="24" />
         </results>
      </poll>
      <playingtime value="15" />
      <minplaytime value="5" />
      <maxplaytime value="15" />
      <minage value="5" />
   </item>
</items>'
);
 
SELECT * FROM #MyXML;

We dipped in to the poll section last time round to experiment with XMLPath queries but this time further we want the whole lot.

How Do I Use the Nodes Method to Shred Repeating Regions in XML?

The nodes method is used in conjunction with CROSS APPLY to create repeating rows each holding a copy of the parent XML for each node within the path specified. To quote the documentation exactly:

The result of the nodes() method is a rowset that contains logical copies of the original XML instances. In these logical copies, the context node of every row instance is set to one of the nodes that is identified with the query expression. This way, later queries can navigate relative to these context nodes.

Here’s our first example.

SELECT
R.ResultXML.query('.') AS ResultLevelXML,
R.ResultXML.value('./@numplayers','varchar(50)') AS NumPlayers,
R.ResultXML.value('(./result[@value = "Best"]/@numvotes)[1]','int') AS BestVotes,
R.ResultXML.value('(./result[@value = "Recommended"]/@numvotes)[1]','int') AS RecommendedVotes,
R.ResultXML.value('(./result[@value = "Not Recommended"]/@numvotes)[1]','int') AS NotRecommendedVotes
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R(ResultXML);

Nodes takes us down to the results level, creates a row per result element and as we have six results elements we get six rows.
The first column in the data set uses the query method to obtain ‘.’ which simply means the current element. This method will have a whole post to itself later in the series but I’m dropping this in now as it’s a really helpful approach when debugging your code. Just click on the XML to see what you’re currently working with. Note how we just get a single snippet of XML holding one results element and its associated children.
The remaining columns should be familiar if you’ve been following the series so far. The value method pulls the number of players from the attribute, because each of the three result elements have a unique attribute value and we know what they are we can query for that to add each element to a column.

This works well if you have well defined children in your XML but what if we want to do the same again for the children of each node?

How Do I Extract Repeating Sub Elements of XML With The Nodes Method?

There are a few ways of approaching this. The first is to build on our work above and use the nodes method a second time on the results of the first. That approach looks as follows.

SELECT
R.ResultXML.query('.') AS ResultLevelXML,
R.ResultXML.value('./@numplayers','varchar(50)') AS NumPlayers,
R2.ResultsXML.value('./@value','varchar(50)') AS PollValue,
R2.ResultsXML.value('./@numvotes','int') AS NumberOfVotes
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R(ResultXML)
CROSS APPLY R.ResultXML.nodes('./result') AS R2(ResultsXML);

R from the first query gives us XML and we repeat the process to give us R2 which repeats for each repetition of the full XML document. Six rows from the first CROSS APPLY each hold three result values so we end up with 18 rows in total. We pick some values from each level to give us the full picture on each row.

We can instead go directly to the bottom level rather than using CROSS APPLY twice but how do we then get those values (NumPlayers in our case)?
This is where the detail of what the nodes method does becomes important, it’s a common misconception (one I held anyway) that nodes cuts up the XML, splits it between rows and throws away everything else but that’s not accurate. Look back up to the official definition and note the word “logical” and the final sentence “This way, later queries can navigate relative to these context nodes“.
Nodes is less like a pair of scissors and more like an open all sub folders option in a file explorer. We get one new window for each sub folder but any files or folders above that aren’t gone and we can always navigate back up to them.

With that in mind and an awareness that we can use ‘..’ within XPath to move up to the parent of an object we can re-write to give this.

SELECT
R2.ResultsXML.query('.') AS ResultLevelXML,
R2.ResultsXML.query('..') AS ResultsLevelXML,
R2.ResultsXML.value('../@numplayers','varchar(50)') AS NumPlayers,
R2.ResultsXML.value('./@value','varchar(50)') AS PollValue,
R2.ResultsXML.value('./@numvotes','int') AS NumberOfVotes,
R2.ResultsXML.value('../../@title','varchar(50)') AS Poll
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results/result') AS R2(ResultsXML);

Results in the same eighteen rows. To demonstrate the concept a step further we go up two levels to obtain the poll title as well.

Note: I’ve seen suggestions that the second approach can lead to performance issues when you make heavy use of navigating back up the XML structure. I’ve never seen this myself but it’s worth doing some tests to decide which approach you prefer.

That covers the nodes method. Next time round we’ll take a look at the exists method and how it can be used to filter results or check inside XML without having to fully shred it.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s