XQuery Plan Complexity

I have a SQLCLR scalar function that will stream an XmlReader that I need to shred on demand into an inline result set. These XML objects are generated on demand, so I can't use an XML index. It will be common to have over 100 columns in the resulting data sets. Consider this sample code:

CREATE XML SCHEMA COLLECTION RAB AS '
<xsd:schema xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />

  <xsd:element name="r" type="r"/>

  <xsd:complexType name="r">
    <xsd:attribute name="a" type="sqltypes:int" use="required"/>
    <xsd:attribute name="b" type="sqltypes:int" use="required"/>
    <xsd:attribute name="c" type="sqltypes:int" use="required"/>
  </xsd:complexType>
</xsd:schema>';
GO

DECLARE @D TABLE(x XML(DOCUMENT RAB) NOT NULL);

INSERT INTO @D
VALUES
('<r a="3" b="4" c="34"/>'),
('<r a="5" b="6" c="56"/>'),
('<r a="7" b="8" c="78"/>')

SELECT x.value('/r/@a', 'int') a, x.value('/r/@b', 'int') b, x.value('/r/@c', 'int') c
FROM @d a

This fills a typed XML column in a table variable with some XML values and breaks the attributes into separate columns. The execution plan for this seems overly messy:

|--Compute Scalar(DEFINE:([Expr1009]=[Expr1008], [Expr1016]=[Expr1015], [Expr1023]=[Expr1022]))
     |--Nested Loops(Inner Join, OUTER REFERENCES:([a].[x]))
          |--Nested Loops(Inner Join, OUTER REFERENCES:([a].[x]))
          |    |--Nested Loops(Inner Join, OUTER REFERENCES:([a].[x]))
          |    |    |--Table Scan(OBJECT:(@d AS [a]))
          |    |    |--Stream Aggregate(DEFINE:([Expr1008]=MIN([Expr1024])))
          |    |         |--Compute Scalar(DEFINE:([Expr1024]=CASE WHEN datalength(CONVERT_IMPLICIT(sql_variant,CONVERT_IMPLICIT(nvarchar(64),xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],XML Reader with XPath filter.[lvalue],XML Reader wi
          |    |              |--Table-valued function
          |    |--Stream Aggregate(DEFINE:([Expr1015]=MIN([Expr1025])))
          |         |--Compute Scalar(DEFINE:([Expr1025]=CASE WHEN datalength(CONVERT_IMPLICIT(sql_variant,CONVERT_IMPLICIT(nvarchar(64),xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],XML Reader with XPath filter.[lvalue],XML Reader with XP
          |              |--Table-valued function
          |--Stream Aggregate(DEFINE:([Expr1022]=MIN([Expr1026])))
               |--Compute Scalar(DEFINE:([Expr1026]=CASE WHEN datalength(CONVERT_IMPLICIT(sql_variant,CONVERT_IMPLICIT(nvarchar(64),xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],XML Reader with XPath filter.[lvalue],XML Reader with XPath f
                    |--Table-valued function

It's got a nested loop for each column! The query plan is going to be way too complex if I join several of these tables with 100 columns each. Also, I don't understand the purpose of those Stream Aggregate operators. Their contents go like this:

MIN(
    CASE WHEN @d.[x] as [a].[x] IS NULL 
    THEN NULL ELSE 
    CASE WHEN datalength(CONVERT_IMPLICIT(sql_variant,
            CONVERT_IMPLICIT(nvarchar(64),xsd_cast_to_maybe_large(xrpf.[value],xrpf.[lvalue],xrpf.[lvaluebin],xrpf.[tid],(15),(7))
            ,0),0))>=(128)
    THEN CONVERT_IMPLICIT(int,
        CASE WHEN datalength(xsd_cast_to_maybe_large(xrpf.[value],xrpf.[lvalue],xrpf.[lvaluebin],xrpf.[tid],(15),(7)))<(128)
        THEN NULL
        ELSE xsd_cast_to_maybe_large(xrpf.[value],xrpf.[lvalue],xrpf.[lvaluebin],xrpf.[tid],(15),(7))
        END,0)
    ELSE CONVERT_IMPLICIT(int,
        CONVERT_IMPLICIT(sql_variant,CONVERT_IMPLICIT(nvarchar(64),xsd_cast_to_maybe_large(xrpf.[value],xrpf.[lvalue],xrpf.[lvaluebin],xrpf.[tid],(15),(7)),0),0),0)
    END
END)

Yuck! I thought using a typed XML schema collection with a sqltypes type was supposed to avoid the conversions?

Either I'm overestimating how efficient this is going to be, or I'm doing something wrong. My question is: how can I fix this so that extra query plan operators aren't added for each column (and ideally avoid the conversions), or should I give up and find a non-XPath way of doing this?

References:

sqltypes: http://msdn.microsoft.com/en-us/library/ee320775%28v=sql.105%29.aspx

XML data type methods: http://technet.microsoft.com/en-us/library/ms190798%28v=sql.105%29.aspx


Nelsonnema answered 31/10, 2014 at 22:34 Comment(1)
The usual way I do it is SELECT r.value('@a','int'),r.value('@b','int'),r.value('@c','int') FROM @D CROSS APPLY x.nodes('/r') as ca(r) but the plan looks much the same shape (with the TVF nodes given a much higher estimated subtree cost) – Proselytism

There are some mysteries in the query plan that need to be sorted out first: what does the Compute Scalar do, and why is there a Stream Aggregate?

The table-valued function returns a node table of the shredded XML, one row per shredded node. When you use typed XML, those columns are value, lvalue, lvaluebin and tid, and they are used in the Compute Scalar to calculate the actual value. The code in there looks a bit strange, and I can't say that I understand why it is the way it is, but the gist of it is that the function xsd_cast_to_maybe_large returns the value, and there is code that handles the case where the value is 128 bytes or larger.

CASE WHEN datalength(
                     CONVERT_IMPLICIT(sql_variant,
                        CONVERT_IMPLICIT(nvarchar(64),
                                         xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],
                                                                 XML Reader with XPath filter.[lvalue],
                                                                 XML Reader with XPath filter.[lvaluebin],
                                                                 XML Reader with XPath filter.[tid],(15),(5),(0)),0),0))>=(128) 
  THEN CONVERT_IMPLICIT(int,CASE WHEN datalength(xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],
                                                                         XML Reader with XPath filter.[lvalue],
                                                                         XML Reader with XPath filter.[lvaluebin],
                                                                         XML Reader with XPath filter.[tid],(15),(5),(0)))<(128) 
                              THEN NULL 
                              ELSE xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],
                                                           XML Reader with XPath filter.[lvalue],
                                                           XML Reader with XPath filter.[lvaluebin],
                                                           XML Reader with XPath filter.[tid],(15),(5),(0)) 
                            END,0) 
  ELSE CONVERT_IMPLICIT(int,CONVERT_IMPLICIT(sql_variant,
                                               CONVERT_IMPLICIT(nvarchar(64),
                                                                xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],
                                                                                        XML Reader with XPath filter.[lvalue],
                                                                                        XML Reader with XPath filter.[lvaluebin],
                                                                                        XML Reader with XPath filter.[tid],(15),(5),(0)),0),0),0) 
END

The same Compute Scalar for untyped XML is much simpler and actually understandable.

CASE WHEN datalength(XML Reader with XPath filter.[value])>=(128) 
  THEN CONVERT_IMPLICIT(int,XML Reader with XPath filter.[lvalue],0) 
  ELSE CONVERT_IMPLICIT(int,XML Reader with XPath filter.[value],0) 
END

If there are 128 bytes or more in value, fetch from lvalue; otherwise fetch from value. With untyped XML, the returned node table only outputs the columns id, value and lvalue.
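
For comparison, here is a minimal sketch of the question's query against an untyped column (schema collection omitted), which produces the simpler Compute Scalar above:

declare @D table(x xml not null); -- untyped: no schema collection

insert into @D
values
('<r a="3" b="4" c="34"/>'),
('<r a="5" b="6" c="56"/>'),
('<r a="7" b="8" c="78"/>');

-- untyped XML needs the [1] predicate to guarantee a singleton
select x.value('(/r/@a)[1]', 'int') as a,
       x.value('(/r/@b)[1]', 'int') as b,
       x.value('(/r/@c)[1]', 'int') as c
from @D as d;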

When you use typed XML, the storage of the node values is optimized based on the data type specified in the schema. It looks like a value can end up in either value, lvalue or lvaluebin in the node table depending on what type of value it is, and xsd_cast_to_maybe_large is there to help sort things out.

The stream aggregate does a MIN() over the values returned from the compute scalar. We know, and SQL Server (at least sometimes) knows, that there will only ever be one row returned from the table-valued function when you specify an XPath in the value() function. The parser makes sure we build the XPath correctly, but when the query optimizer looks at the estimated rows it sees 200. The base estimate for the table-valued function that parses XML is 10,000 rows, and some adjustments are then made based on the XPath used; in this case it ends up with 200 rows where there is only one. Pure speculation on my part, but the stream aggregate appears to be there to take care of this discrepancy. It never actually aggregates anything, it only passes through the one row that is returned, but it does affect the cardinality estimate for the entire branch and makes sure the optimizer uses 1 row as the estimate for that branch. That is of course really important when the optimizer chooses join strategies etc.
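
If you want to see the estimate-vs-actual gap yourself, one way (a sketch, not from the original answer; it assumes the question's RAB collection exists) is to capture the actual plan with SET STATISTICS XML and look at the table-valued function node:

declare @D table(x xml(document RAB) not null);
insert into @D values ('<r a="3" b="4" c="34"/>');

set statistics xml on;  -- returns the actual execution plan as XML after the result set

select x.value('/r/@a', 'int') as a
from @D as d;

set statistics xml off;
-- In the returned plan, the "XML Reader with XPath filter" TVF node carries the inflated
-- row estimate, while the Stream Aggregate above it brings the branch back to one row.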

So how about 100 attributes? Yes, there will be 100 branches if you use the value function 100 times. But there are some optimizations to be done here. I created a test rig to see what shape and form of the query would be the fastest using 100 attributes over 10 rows.

Among the per-column value() queries, the winner was to use untyped XML and not to use the nodes() function to shred on r.

select X.value('(/r/@a1)[1]', 'int') as a1,
       X.value('(/r/@a2)[1]', 'int') as a2,
       X.value('(/r/@a3)[1]', 'int') as a3
from @T

There is also a way to avoid the 100 branches by using PIVOT, although depending on what your actual query looks like it might not be possible. The data type coming out of the pivot must be the same for all columns; you could of course extract them as strings and convert to the appropriate type in the column list (see the sketch after the query below). It also requires that your table has a primary/unique key.

select a1, a2, a3
from (
     select T.ID, -- primary key of @T
            A.X.value('local-name(.)', 'nvarchar(50)') as Name,
            A.X.value('.', 'int') as Value
     from @T as T
       cross apply T.X.nodes('/r/@*') as A(X)
     ) as T
pivot(min(T.Value) for Name in (a1, a2, a3)) as P  
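
If the attributes do not all share one type, the string-then-convert variant mentioned above could look like this sketch (casting every column to int here only because the sample data is all int):

select cast(a1 as int) as a1, cast(a2 as int) as a2, cast(a3 as int) as a3
from (
     select T.ID, -- primary key of @T
            A.X.value('local-name(.)', 'nvarchar(50)') as Name,
            A.X.value('.', 'nvarchar(50)') as Value
     from @T as T
       cross apply T.X.nodes('/r/@*') as A(X)
     ) as T
pivot(min(T.Value) for Name in (a1, a2, a3)) as P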

Query plan for the pivot query, 10 rows and 100 attributes:

[plan image]

Below are the results and the test rig I used. I tested with 100 attributes, 10 rows, and all int attributes.

Result:

Test                                                Duration (ms)
--------------------------------------------------  -------------
untyped XML value('/r[1]/@a')                       195     
untyped XML value('(/r/@a)[1]')                     108
untyped XML value('@a') cross apply nodes('/r')     131  
untyped XML value('@a') cross apply nodes('/r[1]')  127
typed XML value('/r/@a')                            185
typed XML value('(/r/@a)[1]')                       148
typed XML value('@a') cross apply nodes('/r')       176
untyped XML pivot                                    34
typed XML pivot                                      52

Code:

drop type dbo.TRABType
drop type dbo.TType;
drop xml schema collection dbo.RAB;

go

declare @NumAtt int = 100;
declare @Attribs nvarchar(max);

with xmlnamespaces('http://www.w3.org/2001/XMLSchema' as xsd)
select @Attribs = (
select top(@NumAtt) 'a'+cast(row_number() over(order by 1/0) as varchar(11)) as '@name',
                    'sqltypes:int' as '@type',
                    'required' as '@use'
from sys.columns
for xml path('xsd:attribute')
)
--CREATE XML SCHEMA COLLECTION RAB AS 

declare @Schema nvarchar(max) = 
'
<xsd:schema xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
  <xsd:element name="r" type="r"/>
  <xsd:complexType name="r">[ATTRIBS]</xsd:complexType>
</xsd:schema>';

set @Schema = replace(@Schema, '[ATTRIBS]', @Attribs)

create xml schema collection RAB as @Schema

go

create type dbo.TType as table
(
  ID int identity primary key,
  X xml not null
);

go

create type dbo.TRABType as table
(
  ID int identity primary key,
  X xml(document rab) not null
);


go

declare @NumAtt int = 100;
declare @NumRows int = 10;

declare @X nvarchar(max);
declare @C nvarchar(max);
declare @M nvarchar(max);

declare @S1 nvarchar(max);
declare @S2 nvarchar(max);
declare @S3 nvarchar(max);
declare @S4 nvarchar(max);
declare @S5 nvarchar(max);
declare @S6 nvarchar(max);
declare @S7 nvarchar(max);
declare @S8 nvarchar(max);
declare @S9 nvarchar(max);

set @X = N'<r '+
  (
  select top(@NumAtt) 'a'+cast(row_number() over(order by 1/0) as varchar(11))+'="'+cast(row_number() over(order by 1/0) as varchar(11))+'" '
  from sys.columns
  for xml path('')
  )+
'/>';

set @C = 
  stuff((
  select top(@NumAtt) ',a'+cast(row_number() over(order by 1/0) as varchar(11))
  from sys.columns
  for xml path('') 
  ), 1, 1, '')

set @M = 
  stuff((
  select top(@NumAtt) ',MAX(CASE WHEN name = ''a'+cast(row_number() over(order by 1/0) as varchar(11))+''' THEN val END)'
  from sys.columns
  for xml path('') 
  ), 1, 1, '')


declare @T dbo.TType;
insert into @T(X)
select top(@NumRows) @X
from sys.columns;

declare @TRAB dbo.TRABType;
insert into @TRAB(X)
select top(@NumRows) @X
from sys.columns;


-- value('/r[1]/@a')
set @S1 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T.X.value(''/r[1]/@a'+cast(row_number() over(order by 1/0) as varchar(11))+''', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @T as T
option (maxdop 1)';

-- value('(/r/@a)[1]')
set @S2 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T.X.value(''(/r/@a'+cast(row_number() over(order by 1/0) as varchar(11))+')[1]'', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @T as T
option (maxdop 1)';

-- value('@a') cross apply nodes('/r')
set @S3 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T2.X.value(''@a'+cast(row_number() over(order by 1/0) as varchar(11))+''', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @T as T
  cross apply T.X.nodes(''/r'') as T2(X)
option (maxdop 1)';


-- value('@a') cross apply nodes('/r[1]')
set @S4 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T2.X.value(''@a'+cast(row_number() over(order by 1/0) as varchar(11))+''', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @T as T
  cross apply T.X.nodes(''/r[1]'') as T2(X)
option (maxdop 1)';

-- value('/r/@a') typed XML
set @S5 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T.X.value(''/r/@a'+cast(row_number() over(order by 1/0) as varchar(11))+''', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @TRAB as T
option (maxdop 1)';

-- value('(/r/@a)[1]')
set @S6 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T.X.value(''(/r/@a'+cast(row_number() over(order by 1/0) as varchar(11))+')[1]'', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @TRAB as T
option (maxdop 1)';

-- value('@a') cross apply nodes('/r') typed XML
set @S7 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T2.X.value(''@a'+cast(row_number() over(order by 1/0) as varchar(11))+''', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @TRAB as T
  cross apply T.X.nodes(''/r'') as T2(X)
option (maxdop 1)';

-- pivot
set @S8 = N'
select ID, '+@C+' 
from (
     select T.ID,
            A.X.value(''local-name(.)'', ''nvarchar(50)'') as Name,
            A.X.value(''.'', ''int'') as Value
     from @T as T
       cross apply T.X.nodes(''/r/@*'') as A(X)
     ) as T
pivot(min(T.Value) for Name in ('+@C+')) as P
option (maxdop 1)';  

-- typed pivot
set @S9 = N'
select ID, '+@C+' 
from (
     select T.ID,
            A.X.value(''local-name(.)'', ''nvarchar(50)'') as Name,
            cast(cast(A.X.query(''string(.)'') as varchar(11)) as int) as Value
     from @TRAB as T
       cross apply T.X.nodes(''/r/@*'') as A(X)
     ) as T
pivot(min(T.Value) for Name in ('+@C+')) as P
option (maxdop 1)';  


exec sp_executesql @S1, N'@T dbo.TType readonly', @T;
exec sp_executesql @S2, N'@T dbo.TType readonly', @T;
exec sp_executesql @S3, N'@T dbo.TType readonly', @T;
exec sp_executesql @S4, N'@T dbo.TType readonly', @T;
exec sp_executesql @S5, N'@TRAB dbo.TRABType readonly', @TRAB;
exec sp_executesql @S6, N'@TRAB dbo.TRABType readonly', @TRAB;
exec sp_executesql @S7, N'@TRAB dbo.TRABType readonly', @TRAB;
exec sp_executesql @S8, N'@T dbo.TType readonly', @T;
exec sp_executesql @S9, N'@TRAB dbo.TRABType readonly', @TRAB;
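
The rig above only executes the statements; the duration capture is not shown. One way to time a single variant (an assumption, not necessarily how the numbers above were measured) is to wrap the call:

declare @t0 datetime2 = sysdatetime();
exec sp_executesql @S2, N'@T dbo.TType readonly', @T;
select datediff(millisecond, @t0, sysdatetime()) as duration_ms;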
Lifegiving answered 3/11, 2014 at 10:58 Comment(1)
+1 Detailed and informative with profiled alternatives! I have considered the PIVOT option; I may as well generate the key/name/value tuples directly from the CLR in that case. I was hoping to use XML to make it more efficient. This is the kind of answer that makes SO great! – Nelsonnema
