Last week we met some XML and got a feel for its structure, this week we’ll run some simple queries using XPath the XML Path Language. Anyone who has ever used a file system should feel at home here because XPath simply allows you to navigate the tree structure of a piece of XML in the same way you would the files and folders on a disk.
First off I’m going to create a temporary table and insert the XML into it. We could use a simple variable but a table makes it a little easier to run examples a piece at a time.
CREATE TABLE #MyXML (MyXML XML NOT NULL);
INSERT INTO #MyXML(MyXML)
VALUES
(
'<?xml version="1.0" encoding="UTF-8"?>
<items termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
<item type="boardgame" id="91514">
<thumbnail>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__thumb/img/zpzD1TJLfuNEjuTjpPbN1y1mpss=/fit-in/200x150/filters:strip_icc()/pic3271388.jpg</thumbnail>
<image>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__original/img/hVn3HtD2_5vKNjhEBkL1qFB4FaU=/0x0/filters:format(jpeg)/pic3271388.jpg</image>
<name type="primary" sortindex="1" value="Rhino Hero" />
<name type="alternate" sortindex="1" value="Rino Ercolino" />
<name type="alternate" sortindex="1" value="Super Rhino!" />
<name type="alternate" sortindex="1" value="Super Rino!" />
<description>
Super Rhino! presents players with an incredibly heroic &ndash; and regrettably heavy &ndash; rhinoceros who is eager to climb a tall building and leap other tall buildings in a single bound.
</description>
<yearpublished value="2011" />
<minplayers value="2" />
<maxplayers value="5" />
<poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="61">
<results numplayers="1">
<result value="Best" numvotes="0" />
<result value="Recommended" numvotes="7" />
<result value="Not Recommended" numvotes="29" />
</results>
<results numplayers="2">
<result value="Best" numvotes="10" />
<result value="Recommended" numvotes="40" />
<result value="Not Recommended" numvotes="3" />
</results>
<results numplayers="3">
<result value="Best" numvotes="44" />
<result value="Recommended" numvotes="11" />
<result value="Not Recommended" numvotes="0" />
</results>
<results numplayers="4">
<result value="Best" numvotes="18" />
<result value="Recommended" numvotes="30" />
<result value="Not Recommended" numvotes="1" />
</results>
<results numplayers="5">
<result value="Best" numvotes="5" />
<result value="Recommended" numvotes="25" />
<result value="Not Recommended" numvotes="13" />
</results>
<results numplayers="5+">
<result value="Best" numvotes="0" />
<result value="Recommended" numvotes="3" />
<result value="Not Recommended" numvotes="24" />
</results>
</poll>
<playingtime value="15" />
<minplaytime value="5" />
<maxplaytime value="15" />
<minage value="5" />
</item>
</items>'
);
SELECT * FROM #MyXML;
Run the above and we have our table. Querying the table gives us our single row, in SSMS you can even click on the result and have it pop out in a new window in its full glory. Out in the real world you may well have a table with hundreds of rows each with a slightly different piece of XML, all of the below continues to apply. Why not try yourself by adding in the follow up game Rhino Hero Super Battle.
How Do I Query XML Attributes and Elements in SQL Server?
We can start to pull the XML apart or ‘shred’ it as it’s commonly described using the value method and XPath. Here is a query with some examples.
SELECT
--I can get the value inside an element
X.MyXML.value('(/items/item/description)[1]','varchar(500)') AS GameDescription,
--I can get an attribute on a tag
X.MyXML.value('(/items/item/yearpublished/@value)[1]','int') AS YearPublished,
--I can go straight to the first instance of an element
X.MyXML.value('(/items/item/poll/results/result/@numvotes)[1]','int') AS Poll1,
--Or I can pick a path
X.MyXML.value('(/items/item/poll/results[1]/result[2]/@numvotes)[1]','int') AS Poll2,
--three different routes to the same value
X.MyXML.value('(/items/item/poll/results[2]/result[3]/@numvotes)[1]','int') AS Poll3a,
X.MyXML.value('(/items/item/poll/results/result/@numvotes)[6]','int') AS Poll3b,
X.MyXML.value('(//@numvotes)[6]','int') AS Poll3c
FROM #MyXML AS X;
We can see a few variants above depending on whether we’re obtaining an element or an attribute. We also need to specify the data type to convert to, it’s your call as to if you feel confident enough in your inputs to go straight to the more fragile data types or if you want to grab everything as text and then convert when you have the luxury of something like TRY_CAST to clear up issues. If you’re handling dates do yourself a favour and use datetime2, sooner or later something will throw a 0001-01-01 style date your way and datetime won’t be happy.
The other key feature is the numbers in square brackets. Because elements can repeat we need to know which repetition to pick. Even if there’s only one we have to drop in a [1] because we can’t be certain there’s only one occurrence (the blessing and curse of XML being that it’s so much less rigid than a set of tables in a database).
When we specify that path we can either be very specific or more general. See the Poll3 columns.
- Poll3a gives us a precise path down to our value
- Poll3b gets us down to the right section of the XML and then grabs the 6th
- Poll3c goes a step further and just picks the 6th occurrence of the attribute. That gives us the same result in this case but if we hade the same attribute inside different elements all of them would be considered.
We’re not just limited to specifying numerical positions, the next query gives some more advanced approaches. We have an example of picking the last possible element, how to pick based on attributes of parent elements and how to pick a parent based on values in the child elements.
SELECT
--Find the last value
X.MyXML.value('(//@numvotes)[last()]','int') AS LastPoll,
--Search based on parent attributes
X.MyXML.value('(/items/item/poll/results[@numplayers = 2]/result[@value = "Not Recommended"]/@numvotes)[1]','int') AS SearchPoll,
--search based on child attributes
X.MyXML.value('(/items/item/poll/results[result/@numvotes = 24]/@numplayers)[1]','varchar(10)') AS ChildPollPoll
FROM #MyXML AS X;
We may not want to be so static in our choices and fortunately we have options for that too.
How Can I Use an SQL Server Variable to Obtain an XML Attribute?
We can use sql:variable to take SQL variables and use them inside our path.
DECLARE @MyNumPlayers int = 2, @MyValue varchar(50) = 'Not Recommended';
SELECT
X.MyXML.value(
'(/items/item/poll/
results[@numplayers = sql:variable("@MyNumPlayers")]/
result[@value = sql:variable("@MyValue")]/@numvotes)[1]'
,'int') AS SearchPoll
FROM #MyXML AS X;
How Can I Use an SQL Server Column to Obtain an XML Attribute?
We can a use sql:column to push values from a table and into the path of the values method. This time we query the XML six times, getting a different result each time.
With MyNumbers AS
(
SELECT N
FROM (VALUES(1),(2),(3),(4),(5),(6),(7)) AS N(N)
)
SELECT
MyNumbers.N,
X.MyXML.value(
'(/items/item/poll/
results[sql:column("MyNumbers.N")]/
result[@value = "Best"]/
@numvotes)[1]'
,'int'
) AS BestPollVotes
FROM #MyXML AS X
CROSS JOIN MyNumbers
ORDER BY N;
The row for N=7 gives a NULL because there is no 7th Element, SQL Server doesn’t know this though and will take a look for you anyway.
Don’t go wild with the above. If you want to split repeating XML elements into rows of a table there’s a better way. Tune in next week when we’ll cover the nodes method.
Querying XML In SQL Server Series