Sql Server: Selective XML Index not being efficiently used
Asked Answered
E

1

6

I'm exploring ways of improving the performance of an application which I can only affect on the database level to a limited degree. The SQL Server version is 2012 SP2 and the table and view structure in question is (I cannot really affect this + note that the xml document may have several hundred elements in total):

CREATE TABLE Orders(
    id nvarchar(64) NOT NULL,
    xmldoc xml NULL,
    CONSTRAINT PK_Order_id PRIMARY KEY CLUSTERED (id)
);

CREATE VIEW V_Orders as
SELECT 
    a.id, a.xmldoc
    ,a.xmldoc.value('data(/row/c1)[1]', 'nvarchar(max)') "Stuff"
    ,a.xmldoc.value('data(/row/c2)[1]', 'nvarchar(max)') "OrderType"
etc..... many columns
from Orders a;

A typical query (and the one being used for testing below):

SELECT id FROM V_Orders WHERE OrderType = '30791'

All the queries are performed against the view and I can affect neither the queries nor the table/view structure.

I thought adding a selective XML index to the table would be my saviour:

CREATE SELECTIVE XML INDEX I_Orders_OrderType ON Orders(xmldoc)
FOR(
    pathOrderType = '/row/c2' as SQL [nvarchar](20)
)

But even after updating the statistics the execution plan is looking weird. Couldn't post a pic as new account so the relevant details as text:

  • Clustered index seek from selectiveXml (Cost: 2% of total). Expected number of rows 1 but expected number of execution times 1269 (number of rows in the table)
  • -> Top N sort (Cost: 95% of total)
  • -> Compute scalar (Cost 0)

  • Separate branch: Clustered index scan PK_Order_id (Cost: 3% of total). Expected number of rows 1269

  • -> Merged to the Computer scalar results with Nested loops (Left outer join)
  • -> Filter
  • -> Final result (Expected number of rows 1269)

In actuality with my test data the query doesn't even return any results but whether it returns one or few doesn't make any difference. Execution times support the query really taking as long as could be deduced from the execution plan and have read counts in the thousands.

So my question is why is the selective xml index not being used properly by the optimizer? Or have I got something wrong? How would I optimize this specific query's performance with selective xml indexing (or perhaps persisted column)?

Edit: I did additional testing with larger sample data (~274k rows in the table with XML documents close to average production sizes) and compared the selective XML index to a promoted column. The results are from Profiler trace, concentrating on CPU usage and read counts. The execution plan for selective xml indexing is basically identical to what is described above.

Selective XML index and 274k rows (executing the query above): CPU: 6454, reads: 938521

After I updated the values in the searched field to be unique (total records still 274k) I got the following results:

Selective XML index and 274k rows (executing the query above): CPU: 10077, reads: 1006466

Then using a promoted (i.e. persisted) separately indexed column and using it directly in the view: CPU: 0, reads: 23

Selective XML index performance seems to be closer to full table scan than proper SQL indexed column fetch. I read somewhere that using schema for the table might help drop the TOP N step from execution plan (assuming we're searching for a non-repeating field) but I'm not sure whether that's a realistic possibility in this case.

Eupatorium answered 9/6, 2015 at 14:3 Comment(3)
You can upload the picture on other image hosting site and post the link here. People with enough reputation will happily help to include the image in your post if they think it is valuable to the question.Tobacconist
Thanks for the tip. I don't think the pic would really add much since I typed out the details in text already anyway.Eupatorium
More info on this in a question and answer at DBARepression
R
7

The selective XML index you create is stored in an internal table with the primary key from Orders as the leading column for the clustered key for the internal table and the paths specified stored as sparse columns.

The query plan you get probably looks a something like this:

enter image description here

You have a scan over the entire Orders table with a seek in the internal table on the primary key for each row in Orders. The final Filter operator is responsible for checking the value of OrderType returning only the matching rows.

Not really what you would expect from something called an index.

To the rescue comes a secondary selective XML index. They are created for one of the paths specified in the primary selective index and will create a non-clustered key on the values extracted in the path expression.

It is however not all that easy. SQL Server will not use the secondary index on predicates used on values extracted by the values() function. You have to use exists() instead. Also, exists() requires the use of XQUERY data types in the path expressions where value() uses SQL data types.

Your primary selective XML index could look like this:

CREATE SELECTIVE XML INDEX I_Orders_OrderType ON Orders(xmldoc)
FOR 
(
  pathOrderType = '/row/c2' as sql nvarchar(20), 
  pathOrderTypeX = '/row/c2/text()' as xquery 'xs:string' maxlength (20)
)

With a secondary on pathOrderTypeX.

CREATE XML INDEX I_Orders_OrderType2 ON Orders(xmldoc)
  USING XML INDEX I_Orders_OrderType FOR (pathOrderTypeX) 

And with a query that uses exist() you will get this plan.

select id
from V_Orders
where xmldoc.exist('/row/c2/text()[. = "30791"]') = 1

enter image description here

The first seek is a seek for the value you are looking for in the non-clustered index of the internal table. The key lookup is done on the clustered key on the internal table (don't know why that is necessary). And the last seek is on primary key in the Orders table followed by a filter that checks for null values in the column xmldoc.

If you can get away with using property promotion, creating calculated indexed columns in the Orders table from the XML, I guess you would still get better performance than using secondary selective XML indexes.

Repression answered 10/6, 2015 at 12:47 Comment(2)
Excellent answer! Thank you. This helped me understand how the selective xml indexing works under the hood and the restrictions. Unfortunately it also means that practically I cannot use it in my current scenario but hopefully this will be useful elsewhere and to others. Reason why selective indexing doesn't seem to apply to my exact problem is that the cases where .exist is being used, part of the xpath query is in the view so it's a combination of the actual query and view query and this results in the nice execution plan being shredded.Eupatorium
No worries. I understand that you need to expose the view to some user that does not want to / can deal with the xpath expressions in the exists(). Once I solved that by building a table valued UDF that in this case would take orderid as an argument, use that in an exists() query with sql:column() and return all id that match. Then the users of the view only need to join the UDF to the view and provide the orderid as an argument to the UDF. Or the UDF can be used in the where clause as id in (select id from UDF(123456)).Repression

© 2022 - 2024 — McMap. All rights reserved.