Querying XML In SQL Server – Part 4 – Filtering With The Exist Method

In part four of our series on querying XML in SQL server we look at the exist method and how it can help to filter results or flag for missing elements.

Back in Part 3 of this series we covered what I consider to be the essentials of querying XML. Between the values and nodes methods you can break up XML into rows and extract the values into columns and from there you can do anything you could with data lifted from a standard table in the database.

However there’s a lot that you can do that might perform better or look neater if a bit more of the query work was handled inside the XML and a bit less on the SQL side of the equation.

You can see what Microsoft has to say about the exist method here but you can probably guess a lot from the name and what we’ve seen of the values and nodes methods so far. Exist runs a query and returns a bit field, giving a 1 if the query produces a result and 0 if the result is empty.

Let’s start out as we always do with our sample XML.

CREATE TABLE #MyXML (MyXML XML NOT NULL);

INSERT INTO #MyXML(MyXML)
VALUES
(
'<?xml version="1.0" encoding="UTF-8"?>
<items termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
   <item type="boardgame" id="91514">
      <thumbnail>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__thumb/img/zpzD1TJLfuNEjuTjpPbN1y1mpss=/fit-in/200x150/filters:strip_icc()/pic3271388.jpg</thumbnail>
      <image>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__original/img/hVn3HtD2_5vKNjhEBkL1qFB4FaU=/0x0/filters:format(jpeg)/pic3271388.jpg</image>
      <name type="primary" sortindex="1" value="Rhino Hero" />
      <name type="alternate" sortindex="1" value="Rino Ercolino" />
      <name type="alternate" sortindex="1" value="Super Rhino!" />
      <name type="alternate" sortindex="1" value="Super Rino!" />
      <description>
        Super Rhino! presents players with an incredibly heroic &amp;ndash; and regrettably heavy &amp;ndash; rhinoceros who is eager to climb a tall building and leap other tall buildings in a single bound. 
      </description>
      <yearpublished value="2011" />
      <minplayers value="2" />
      <maxplayers value="5" />
      <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="61">
         <results numplayers="1">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="7" />
            <result value="Not Recommended" numvotes="29" />
         </results>
         <results numplayers="2">
            <result value="Best" numvotes="10" />
            <result value="Recommended" numvotes="40" />
            <result value="Not Recommended" numvotes="3" />
         </results>
         <results numplayers="3">
            <result value="Best" numvotes="44" />
            <result value="Recommended" numvotes="11" />
            <result value="Not Recommended" numvotes="0" />
         </results>
         <results numplayers="4">
            <result value="Best" numvotes="18" />
            <result value="Recommended" numvotes="30" />
            <result value="Not Recommended" numvotes="1" />
         </results>
         <results numplayers="5">
            <result value="Best" numvotes="5" />
            <result value="Recommended" numvotes="25" />
            <result value="Not Recommended" numvotes="13" />
         </results>
         <results numplayers="5+">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="3" />
            <result value="Not Recommended" numvotes="24" />
         </results>
      </poll>
      <playingtime value="15" />
      <minplaytime value="5" />
      <maxplaytime value="15" />
      <minage value="5" />
   </item>
</items>'
);

SELECT * FROM #MyXML;

Imagine we want to find all the results elements that had a result with more than 15 votes. We might write it like this.

SELECT
R2.ResultsXML.value('../@numplayers','varchar(50)') AS NumPlayers
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results/result') AS R2(ResultsXML)
WHERE R2.ResultsXML.value('./@numvotes','int') > 15;

But if you run that you’ll notice we get an extra row because Numplayers =4 has two different results that meet this condition. We could of course sweep this issue under the carpet with DISTINCT but that’s not going to perform well with larger data sets. We might also be able to produce something more complicated using EXISTS like this.

SELECT
R.ResultXML.query('.') AS ResultLevelXML,
R.ResultXML.value('./@numplayers','varchar(50)') AS NumPlayers
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R(ResultXML)
WHERE EXISTS (
SELECT 1
FROM R.ResultXML.nodes('./result') AS R2(ResultsXML)
WHERE R2.ResultsXML.value('./@numvotes','int') > 15
);

That’s a lot more code to type out, it’s harder to read and a quick peek at the query plans for the above two makes me think that second query is going to get ugly if you threw more data at it.

How can I use the exist method to filter results from XML in SQL Server?

We can use exist to test a piece of xml to see if a matching element exists inside it, any time a result is returned we get a 1 and if nothing is found we get a 0. This can then be used to test within the WHERE clause of a TSQL statement like the one below.

SELECT
R2.ResultsXML.value('./@numplayers','varchar(50)') AS NumPlayers
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R2(ResultsXML)
WHERE ResultsXML.exist('./result[@numvotes > 15]') = 1;

This looks a lot neater to me both in code view and as a query plan. Hopefully I’ll get the chance to do some performance tests in a later post in this series once I’ve built a suitable data set to test against. Anecdotally I can confirm I’ve had no issues using this approach in a work environment.

The exist method doesn’t have to be used in the where clause we can use it to give us useful columns in our results too.

How can I use the exist method to test for existence of XML elements in SQL Server?

The exist method can be added to a column definition to show if a particular element or value exists within the XML without filtering it out. Consider this example that lists all results and gives us a flag to show if any of the result level elements had 30 or more votes.

SELECT
R2.ResultsXML.value('./@numplayers','varchar(50)') AS NumPlayers,
ResultsXML.exist('./result[@numvotes > 30]') AS HasThirtyVotes
FROM #MyXML AS X
CROSS APPLY X.MyXML.nodes('/items/item/poll/results') AS R2(ResultsXML);

This approach can be really useful if there are optional elements to a piece of XML. I’ve used this where the XML represents a process performed in distinct steps and there’s a need to see which of those steps of the process have been started without needing any more detail. Exist allows you to quickly dip in and see what’s there and what isn’t without going any further than that.

Why Am I Getting Errors When Using The Node and Exist Methods Together?

A word of warning, while testing the above code I received syntax errors in SSMS like the one in the image below.
The good news is this appears to be an issue with SSMS rather than the code, it runs just fine but it is a little off putting to be told there’s a problem.

That’s all for this post, next time round we’re going to delve deeper into what more we can do inside XML with the query method.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s