Improve Xquery performance in SQL Server
Asked Answered
S

1

6

I have an Azure SQL Database with 1 table and a lot of records (more than 75 000). The table contains a column of the XML datatype. This column looks like this:

<error application="application" host="host" type="exception" message="message" ...>
  <serverVariables>
    <item name="name1">
      <value string="text" />
    </item>
    <item name="name2">
      <value string="text2" />
    </item>
    <item name="name3">
      <value string="text3" />
    </item>
    <item name="name4">
      <value string="text4" />
    </item>
    <item name="name5">
      <value string="text5" />
    </item>
    <item name="name6">
      <value string="text6" />
    </item>
    <item name="name7">
      <value string="text7" />
    </item>
  </serverVariables>
</error>

If I want to get all records where the item attribute name is name5 and the value attribute string is text5 I would write a query like this:

SELECT *
FROM Table
WHERE XmlColumn.exist('//item[@name[. = "name5"] and value/@string[. = "text5"]]') = 1

This uses an XQuery and has to query the whole document. This is also very slow.

My question is how would it be possible to make this query execute faster? Would it be possible to declare a XML index on that column? Are there other possibilities to make XQueries execute faster?

Send answered 23/2, 2016 at 11:12 Comment(3)
Using a bigger count of XML for sorting/filtering is always a pain in the nexk... As you found yourself, creating an XML index was a choice. If you search/filter for the same thing always you could use an insert/update trigger to write this tiny information in an indexed "normal" column automatically. But I'm afraid there is no easy and fast way with this approach...Esque
Hi, I placed a question myself, might be interesting for you: stackoverflow.com/q/35597670/5089204Esque
Thanks, I will look into that question! Your test scenario was also helpfull.Send
E
4

I just did a little test. With .nodes() you could get some 3%... Not really much actually. In my test machine (just a simple laptop) I got a result out of 100.000 rows within ~5 seconds. Not that bad in fact... If you want it fast, you'll have to get the search values out of the XML or you use an XML index:

Test scenario

First I create a test table and fill it with 100.000 rows. A random number (0 to 1000) should lead to ~100 rows for each random number. This number is put into a varchar col and as a value into your XML.

Then I do a call like you'd need it with .exist() and with .nodes() with a small advantage for the second, but both take 5 to 6 seconds. In fact I do the calls twice: a second time in swapped order and with slightly changed search params and with "//item" instead of the full path to avoid false positives via cached results or plans.

Then I create an XML index and do the same calls

Now - what really did surprise me! - the .nodes with full path is much slower than before (9 secs) but the .exist() is down to half a second, with full path even down to about 0.10 sec

So my advise: Use an index and do it with .exist()

And here's the code for copy'n'paste and self testing

CREATE TABLE #testTbl(ID INT IDENTITY PRIMARY KEY, SomeData VARCHAR(100),XmlColumn XML);
GO

DECLARE @RndNumber VARCHAR(100)=(SELECT CAST(CAST(RAND()*1000 AS INT) AS VARCHAR(100)));

INSERT INTO #testTbl VALUES('Data_' + @RndNumber,
'<error application="application" host="host" type="exception" message="message" >
  <serverVariables>
    <item name="name1">
      <value string="text" />
    </item>
    <item name="name2">
      <value string="text2" />
    </item>
    <item name="name3">
      <value string="text3" />
    </item>
    <item name="name4">
      <value string="text4" />
    </item>
    <item name="name5">
      <value string="My test ' +  @RndNumber + '" />
    </item>
    <item name="name6">
      <value string="text6" />
    </item>
    <item name="name7">
      <value string="text7" />
    </item>
  </serverVariables>
</error>');

GO 100000

DECLARE @d DATETIME=GETDATE()
SELECT #testTbl.*
FROM #testTbl
CROSS APPLY XmlColumn.nodes('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') AS a(b);
SELECT CAST(GETDATE()-@d AS TIME) AS NodesFullPath_no_index;
GO

DECLARE @d DATETIME=GETDATE();
SELECT * 
FROM #testTbl
--WHERE XmlColumn.exist('//item[@name[. = "name5"] and value/@string[. = "My test 600"]]') = 1
--The same, just a bit shorter...
WHERE XmlColumn.exist('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') = 1;
SELECT CAST(GETDATE()-@d AS TIME) AS ExistFullPath_no_index;
GO

DECLARE @d DATETIME=GETDATE();
SELECT * 
FROM #testTbl
--WHERE XmlColumn.exist('//item[@name[. = "name5"] and value/@string[. = "My test 600"]]') = 1
--The same, just a bit shorter...
WHERE XmlColumn.exist('//item[@name="name5" and value/@string="My test 500"]') = 1;
SELECT CAST(GETDATE()-@d AS TIME) AS ExistShortPath_no_index;
GO

DECLARE @d DATETIME=GETDATE()
SELECT #testTbl.*
FROM #testTbl
CROSS APPLY XmlColumn.nodes('//item[@name="name5" and value/@string="My test 500"]') AS a(b);
SELECT CAST(GETDATE()-@d AS TIME) AS NodesShortPath_no_index;
GO

CREATE PRIMARY XML INDEX PXML_test_XmlColum1 ON #testTbl(XmlColumn);
CREATE XML INDEX IXML_test_XmlColumn2 ON #testTbl(XmlColumn) USING XML INDEX PXML_test_XmlColum1 FOR PATH;
GO

DECLARE @d DATETIME=GETDATE()
SELECT #testTbl.*
FROM #testTbl
CROSS APPLY XmlColumn.nodes('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') AS a(b);
SELECT CAST(GETDATE()-@d AS TIME) AS NodesFullPath_with_index;
GO

DECLARE @d DATETIME=GETDATE();
SELECT * 
FROM #testTbl
--WHERE XmlColumn.exist('//item[@name[. = "name5"] and value/@string[. = "My test 600"]]') = 1
--The same, just a bit shorter...
WHERE XmlColumn.exist('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') = 1;
SELECT CAST(GETDATE()-@d AS TIME) AS ExistFullPath_with_index;
GO

DECLARE @d DATETIME=GETDATE();
SELECT * 
FROM #testTbl
--WHERE XmlColumn.exist('//item[@name[. = "name5"] and value/@string[. = "My test 600"]]') = 1
--The same, just a bit shorter...
WHERE XmlColumn.exist('//item[@name="name5" and value/@string="My test 500"]') = 1;
SELECT CAST(GETDATE()-@d AS TIME) AS ExistShortPath_with_index;
GO
    
DECLARE @d DATETIME=GETDATE()
SELECT #testTbl.*
FROM #testTbl
CROSS APPLY XmlColumn.nodes('//item[@name="name5" and value/@string="My test 500"]') AS a(b);
SELECT CAST(GETDATE()-@d AS TIME) AS NodesShortPath_with_index;
GO

DROP TABLE #testTbl;
Esque answered 24/2, 2016 at 8:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.