FOR XML cannnot serialize char(0x0000) even though REPLACE function already included to replace Char(0x0000)
Asked Answered
P

1

2

I am trying to concatenate comments field per ID separated by ';' from multiple rows into one per ID in SQL Server 13. For the same, I'm using the below query:

--table1 in the table name; --element_id is the ID needed for concatenation/aggregation; --value contains the text which is to be concatenated

--Data type of value is nvarchar(max)

       ( SELECT casecomment + ';' 
           FROM (select 
    REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
REPLACE( REPLACE( REPLACE( REPLACE( 
  value
,char(0x0000),'') ,char(0x0001),'') ,char(0x0002),'') ,char(0x0003),'') ,char(0x0004),'') 
,char(0x0005),'') ,char(0x0006),'') ,char(0x0007),'') ,char(0x0008),'') ,char(0x000B),'') 
,char(0x000C),'') ,char(0x000E),'') ,char(0x000F),'') ,char(0x0010),'') ,char(0x0011),'') 
,char(0x0012),'') ,char(0x0013),'') ,char(0x0014),'') ,char(0x0015),'') ,char(0x0016),'') 
,char(0x0017),'') ,char(0x0018),'') ,char(0x0019),'') ,char(0x001A),'') ,char(0x001B),'') 
,char(0x001C),'') ,char(0x001D),'') ,char(0x001E),'') ,char(0x001F),'')
 as casecomment, element_id from table1
where element = 'comments'
) y
          WHERE x.element_id = y.element_id
          ORDER BY element_id
            FOR XML PATH('')  , TYPE).value('.','varchar(max)') as Comments     FROM (select REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
REPLACE( REPLACE( REPLACE( REPLACE( 
 value
,char(0x0000),'') ,char(0x0001),'') ,char(0x0002),'') ,char(0x0003),'') ,char(0x0004),'') 
,char(0x0005),'') ,char(0x0006),'') ,char(0x0007),'') ,char(0x0008),'') ,char(0x000B),'') 
,char(0x000C),'') ,char(0x000E),'') ,char(0x000F),'') ,char(0x0010),'') ,char(0x0011),'') 
,char(0x0012),'') ,char(0x0013),'') ,char(0x0014),'') ,char(0x0015),'') ,char(0x0016),'') 
,char(0x0017),'') ,char(0x0018),'') ,char(0x0019),'') ,char(0x001A),'') ,char(0x001B),'') 
,char(0x001C),'') ,char(0x001D),'') ,char(0x001E),'') ,char(0x001F),'') as casecomments, element_id from table1

where element = 'comments'
) x
    GROUP BY element_id```

Even though I have used a replace for char(0x0000), I am still getting the following error:

FOR XML could not serialize the data for node 'NoName' because it contains a character (0x0000) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data type and use the BINARY BASE64 directive.

My data contains many special characters (Latin characters, etc) but I am unable to find the offending row. Please note I have more than 400K rows so it is not possible to evaluate manually. Also, I have an old version of SQL Server, so translate function does not work.

Any help would be appreciated. Thank you.

Placia answered 17/4, 2020 at 6:29 Comment(0)
C
4

The 0-character (0x00) is very special... On the lowest level it marks the end of a string in many environments.

Try this

DECLARE @string VARCHAR(10)=CONCAT('a',CHAR(0),'b');

SELECT LEN(@string) AS LenString
      ,CAST(@string AS VARBINARY(10)) AS Internal
      ,@string AS cut_after_a
      ,CHARINDEX(CHAR(0),@string) AS Pos0_not_found
      ,REPLACE(@string,CHAR(0),'') AS Replace_not_working;

The result

LenString   Internal    cut_after_a Pos0_not_found  Replace_not_working
3           0x610062    a           0               a

But you can trick this out with a BIN-collation

SELECT LEN(@string) AS LenString
      ,CAST(@string AS VARBINARY(10)) AS Internal
      ,@string COLLATE Latin1_General_BIN AS BIN_but_cut
      ,CHARINDEX(CHAR(0) COLLATE Latin1_General_BIN,@string COLLATE Latin1_General_BIN) AS Pos0_found_at_2
      ,REPLACE(@string COLLATE Latin1_General_BIN,CHAR(0) COLLATE Latin1_General_BIN, '') AS Replace_working;

The result

LenString   Internal    BIN_but_cut Pos0_found_at_2 Replace_working
3           0x610062    a           2               ab

UPDATE: FOR XML works for me...

Try it with PATH (the recommended approach in most cases):

DECLARE @string VARCHAR(10)=CONCAT('a',CHAR(0),'b');

SELECT @string FOR XML PATH('test');

The result

<test>a&#x0;b</test>

You might read this related question.

Cryogenics answered 17/4, 2020 at 12:2 Comment(2)
Thank you so much for your help! Worked after I added COLLATE Latin1_General_BIN to my value field. Much appreciated!Placia
Your FOR XML query is missing one important option: TYPE. If you omit TYPE, the XML is returned as nvarchar(max), and the server will not enforce the disallowed characters. But if you specify TYPE (which you must do if you want to nest a subquery element), then SQL Server returns the xml as xml data type, in which case it will not allow those characters that xml "disallows" (they are forbidden - escaped or not)Ruyle

© 2022 - 2024 — McMap. All rights reserved.