Querying XML In SQL Server – Part 2 – Simple Paths and Attributes With the Value Method

In part 2 of our series we’ll start to learn how to pull values out of XML with an SQL query and how we can use the XPath language to navigate a tree based on position or values within the document.

Last week we met some XML and got a feel for its structure, this week we’ll run some simple queries using XPath the XML Path Language. Anyone who has ever used a file system should feel at home here because XPath simply allows you to navigate the tree structure of a piece of XML in the same way you would the files and folders on a disk.

First off I’m going to create a temporary table and insert the XML into it. We could use a simple variable but a table makes it a little easier to run examples a piece at a time.

CREATE TABLE #MyXML (MyXML XML NOT NULL);

INSERT INTO #MyXML(MyXML)
VALUES
(
'<?xml version="1.0" encoding="UTF-8"?>
<items termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
   <item type="boardgame" id="91514">
      <thumbnail>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__thumb/img/zpzD1TJLfuNEjuTjpPbN1y1mpss=/fit-in/200x150/filters:strip_icc()/pic3271388.jpg</thumbnail>
      <image>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__original/img/hVn3HtD2_5vKNjhEBkL1qFB4FaU=/0x0/filters:format(jpeg)/pic3271388.jpg</image>
      <name type="primary" sortindex="1" value="Rhino Hero" />
      <name type="alternate" sortindex="1" value="Rino Ercolino" />
      <name type="alternate" sortindex="1" value="Super Rhino!" />
      <name type="alternate" sortindex="1" value="Super Rino!" />
      <description>
        Super Rhino! presents players with an incredibly heroic &amp;ndash; and regrettably heavy &amp;ndash; rhinoceros who is eager to climb a tall building and leap other tall buildings in a single bound. 
      </description>
      <yearpublished value="2011" />
      <minplayers value="2" />
      <maxplayers value="5" />
      <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="61">
         <results numplayers="1">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="7" />
            <result value="Not Recommended" numvotes="29" />
         </results>
         <results numplayers="2">
            <result value="Best" numvotes="10" />
            <result value="Recommended" numvotes="40" />
            <result value="Not Recommended" numvotes="3" />
         </results>
         <results numplayers="3">
            <result value="Best" numvotes="44" />
            <result value="Recommended" numvotes="11" />
            <result value="Not Recommended" numvotes="0" />
         </results>
         <results numplayers="4">
            <result value="Best" numvotes="18" />
            <result value="Recommended" numvotes="30" />
            <result value="Not Recommended" numvotes="1" />
         </results>
         <results numplayers="5">
            <result value="Best" numvotes="5" />
            <result value="Recommended" numvotes="25" />
            <result value="Not Recommended" numvotes="13" />
         </results>
         <results numplayers="5+">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="3" />
            <result value="Not Recommended" numvotes="24" />
         </results>
      </poll>
      <playingtime value="15" />
      <minplaytime value="5" />
      <maxplaytime value="15" />
      <minage value="5" />
   </item>
</items>'
);

SELECT * FROM #MyXML;

Run the above and we have our table. Querying the table gives us our single row, in SSMS you can even click on the result and have it pop out in a new window in its full glory. Out in the real world you may well have a table with hundreds of rows each with a slightly different piece of XML, all of the below continues to apply. Why not try yourself by adding in the follow up game Rhino Hero Super Battle.

How Do I Query XML Attributes and Elements in SQL Server?

We can start to pull the XML apart or ‘shred’ it as it’s commonly described using the value method and XPath. Here is a query with some examples.

SELECT
--I can get the value inside an element
X.MyXML.value('(/items/item/description)[1]','varchar(500)') AS GameDescription,
--I can get an attribute on a tag
X.MyXML.value('(/items/item/yearpublished/@value)[1]','int') AS YearPublished,
--I can go straight to the first instance of an element
X.MyXML.value('(/items/item/poll/results/result/@numvotes)[1]','int') AS Poll1,
--Or I can pick a path
X.MyXML.value('(/items/item/poll/results[1]/result[2]/@numvotes)[1]','int') AS Poll2,
--three different routes to the same value
X.MyXML.value('(/items/item/poll/results[2]/result[3]/@numvotes)[1]','int') AS Poll3a,
X.MyXML.value('(/items/item/poll/results/result/@numvotes)[6]','int') AS Poll3b,
X.MyXML.value('(//@numvotes)[6]','int') AS Poll3c
FROM #MyXML AS X;

We can see a few variants above depending on whether we’re obtaining an element or an attribute. We also need to specify the data type to convert to, it’s your call as to if you feel confident enough in your inputs to go straight to the more fragile data types or if you want to grab everything as text and then convert when you have the luxury of something like TRY_CAST to clear up issues. If you’re handling dates do yourself a favour and use datetime2, sooner or later something will throw a 0001-01-01 style date your way and datetime won’t be happy.

The other key feature is the numbers in square brackets. Because elements can repeat we need to know which repetition to pick. Even if there’s only one we have to drop in a [1] because we can’t be certain there’s only one occurrence (the blessing and curse of XML being that it’s so much less rigid than a set of tables in a database).
When we specify that path we can either be very specific or more general. See the Poll3 columns.

  • Poll3a gives us a precise path down to our value
  • Poll3b gets us down to the right section of the XML and then grabs the 6th
  • Poll3c goes a step further and just picks the 6th occurrence of the attribute. That gives us the same result in this case but if we hade the same attribute inside different elements all of them would be considered.

We’re not just limited to specifying numerical positions, the next query gives some more advanced approaches. We have an example of picking the last possible element, how to pick based on attributes of parent elements and how to pick a parent based on values in the child elements.

SELECT
--Find the last value
X.MyXML.value('(//@numvotes)[last()]','int') AS LastPoll,
--Search based on parent attributes
X.MyXML.value('(/items/item/poll/results[@numplayers = 2]/result[@value = "Not Recommended"]/@numvotes)[1]','int')  AS SearchPoll,
--search based on child attributes
X.MyXML.value('(/items/item/poll/results[result/@numvotes = 24]/@numplayers)[1]','varchar(10)')  AS ChildPollPoll
FROM #MyXML AS X;

We may not want to be so static in our choices and fortunately we have options for that too.

How Can I Use an SQL Server Variable to Obtain an XML Attribute?

We can use sql:variable to take SQL variables and use them inside our path.

DECLARE @MyNumPlayers int = 2, @MyValue varchar(50) = 'Not Recommended';

SELECT
X.MyXML.value(
	'(/items/item/poll/
		results[@numplayers = sql:variable("@MyNumPlayers")]/
		result[@value = sql:variable("@MyValue")]/@numvotes)[1]'
	,'int')  AS SearchPoll
FROM #MyXML AS X;

How Can I Use an SQL Server Column to Obtain an XML Attribute?

We can a use sql:column to push values from a table and into the path of the values method. This time we query the XML six times, getting a different result each time.

With MyNumbers AS
(
SELECT N
FROM (VALUES(1),(2),(3),(4),(5),(6),(7)) AS N(N)
)
SELECT 
MyNumbers.N,
X.MyXML.value(
	'(/items/item/poll/
	results[sql:column("MyNumbers.N")]/
	result[@value = "Best"]/
	@numvotes)[1]'
	,'int'
		)  AS BestPollVotes
FROM  #MyXML AS X
CROSS JOIN MyNumbers
ORDER BY N;

The row for N=7 gives a NULL because there is no 7th Element, SQL Server doesn’t know this though and will take a look for you anyway.

Don’t go wild with the above. If you want to split repeating XML elements into rows of a table there’s a better way. Tune in next week when we’ll cover the nodes method.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

Querying XML in SQL Server – Part 1 – Introduction

Time to meet our new best friend XML and some of the things we can do with it.

Sooner or later we all encounter XML, much like the urban legend that you’re never more than 6ft away from a rat in London the same is true of data and XML.
It’s understandable, XML is easy to read, easy to put together and unconstrained by the need to mess around with difficult tasks such as defining your data structure up front. JSON may be XMLs cooler younger brother but in my day to day life as a BI developer it’s XML that tends to be sent my way to make sense of and it’s XML I’m asked to pull together for things like the stacks of data returns I fed back to the government in my Health and Social Care days.

I’ve been working with it a lot again lately to either receive messages from company systems or to share customer summaries from the data warehouse back to those systems. As I can never quite remember the correct syntax to use and I’ve learnt a few neat tricks over the last few months I thought I’d get this all in writing. At the very least it puts it all in one place for myself and it may well help someone else.

Today I’m going to keep things light (this is only my third post after all) and give a brief tour of a piece of XML as a way of introducing the key terms.

Below is a copy of the XML api output from BoardGameGeek for family favourite game Rhino Hero. I’ve edited it down a little for brevity, you can see the full output here.

<?xml version="1.0" encoding="UTF-8"?>
<items termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
   <item type="boardgame" id="91514">
      <thumbnail>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__thumb/img/zpzD1TJLfuNEjuTjpPbN1y1mpss=/fit-in/200x150/filters:strip_icc()/pic3271388.jpg</thumbnail>
      <image>https://cf.geekdo-images.com/R_VAhiLCzl5RXKwSluvEbg__original/img/hVn3HtD2_5vKNjhEBkL1qFB4FaU=/0x0/filters:format(jpeg)/pic3271388.jpg</image>
      <name type="primary" sortindex="1" value="Rhino Hero" />
      <name type="alternate" sortindex="1" value="Rino Ercolino" />
      <name type="alternate" sortindex="1" value="Super Rhino!" />
      <name type="alternate" sortindex="1" value="Super Rino!" />
      <description>
		Super Rhino! presents players with an incredibly heroic &amp;ndash; and regrettably heavy &amp;ndash; rhinoceros who is eager to climb a tall building and leap other tall buildings in a single bound. 
	  </description>
      <yearpublished value="2011" />
      <minplayers value="2" />
      <maxplayers value="5" />
      <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="61">
         <results numplayers="1">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="7" />
            <result value="Not Recommended" numvotes="29" />
         </results>
         <results numplayers="2">
            <result value="Best" numvotes="10" />
            <result value="Recommended" numvotes="40" />
            <result value="Not Recommended" numvotes="3" />
         </results>
         <results numplayers="3">
            <result value="Best" numvotes="44" />
            <result value="Recommended" numvotes="11" />
            <result value="Not Recommended" numvotes="0" />
         </results>
         <results numplayers="4">
            <result value="Best" numvotes="18" />
            <result value="Recommended" numvotes="30" />
            <result value="Not Recommended" numvotes="1" />
         </results>
         <results numplayers="5">
            <result value="Best" numvotes="5" />
            <result value="Recommended" numvotes="25" />
            <result value="Not Recommended" numvotes="13" />
         </results>
         <results numplayers="5+">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="3" />
            <result value="Not Recommended" numvotes="24" />
         </results>
      </poll>
      <playingtime value="15" />
      <minplaytime value="5" />
      <maxplaytime value="15" />
      <minage value="5" />
   </item>
</items>

One of the benefits of XML is that in most cases you can read it by eye, I won’t go as far as describing it as self documenting but you can get a fair idea of how and what any given piece of XML is doing.

So what is XML made up of?

Firstly we have tags like <description> these are like the bones of XML. They either come in pairs like this:

<description> </description>

Or as a single item like this:

<description/>

A pair of tags and everything in between is an element, elements can contain further elements and those can contain further elements. Note this part of the XML above.

         <results numplayers="1">
            <result value="Best" numvotes="0" />
            <result value="Recommended" numvotes="7" />
            <result value="Not Recommended" numvotes="29" />
         </results>

We have multiple <result> tags inside a <results> tag but if you look back up you’ll also note there are multiple <results> nested inside a <poll> tag.

Finally we have attibutes, these are extra pieces of information we can attach to give us further properties of a tag, sometimes this is all the tag needs:

<name type=”alternate” sortindex=”1″ value=”Super Rhino!”/>

Other times we’ll have both attributes and further information or tags inside the element (see the <results> tag example above).

That’s all for our whirlwind tour of XML, next we’ll start looking at how we use T-SQL to pull information out of a piece of XML and into something that will look a little more normal to the average SQL Server user.

Querying XML In SQL Server Series

Cheat Sheet

Posts in SQL Notebook form

I’ve Got 99.0991 Problems But Floating Point Precision ‘Aint One

In which the blogger finds an excuse to use a title he’s been holding on to for some time and we learn why float probably isn’t the best data type to hold your numerical data in.

T-SQL Tuesday Logo

For this week’s T-SQL Tuesday, Brent Ozar (b|t) asked us to share either our most or least favorite SQL Server data type. I can’t confess to having any particularly strong feelings on the subject in either direction even if you take into account my keenness to add the number 2 on the end every time I see a datetime.

I do however struggle to resist the opportunity to use a catchy title and this week’s subject was enough to finally give me the motivation to put the furloughed part of my current four day working week to good use and set up a blog.

The float data type can often look like a tempting, easy choice for numerical data types. Why worry about all that precision and scale nonsense when you can just use a data type that will hold literally* any number!

*do not take this claim literally

Let’s test this out by counting problems. We’ll start in a relaxed state and steadily increase the pressure level until we reach 99 problems (one metric Jay-Z).

DECLARE @Problems float(24) = 0.0;

While NOT(@Problems > 99)
BEGIN
	SET @Problems += 0.1
END;

Print 'I''ve got ' + CAST(@Problems as varchar(10)) + ' problems but floating point precision ''aint one';

Unfortunately we end up a little way over 99 at 99.0991 (henceforth known as an imperial Jay-Z).

Why? Because accuracy isn’t the main aim of a float, it’s a way of storing something that’s roughly the right number in a small space, in a way computers find easy to process quickly, there are values that it can get really close to but never accurately store. That’s great if you’re not that worried about precision (imagine streams of IOT readings with an inherent error to them), it’s less good if you want the exact right number (think applying interest rates to your savings).

You might also be caught out by code like this:

DECLARE @Problems float(24) = 0.0;

While @Problems <> 99
BEGIN
	SET @Problems += 0.1;
	PRINT 'I''ve got ' + CAST(@Problems as varchar(10)) + ' problems but floating point precision ''aint one';
END;

It looks innocent enough but because we never actually hit a value of 99 on the nose the loop never ends, unless you want unending loops running on your server then that’s a Bad Thing™

Is using a floating point data type the end of the world? Probably not, most likely you’ll just sooner or later waste a few hours working out why the percentages on a report don’t total to 100.
Probably the biggest benefit I get as a senior developer from the float datatype is code smell. It’s a nice clear flag that that code possibly hasn’t been thought through as carefully as it should and I probably want to take a closer look.

Thanks for reading and do come back in the future, I’m planning on starting up a series over the next few weeks on another data type I’ve been spending a lot of time with lately at work – XML.

Further Reading