I have a table with a couple thousand rows. The description and summary fields are NTEXT, and sometimes have non-ASCII chars in them. How can I locate all of the rows with non ASCII characters?
I have sometimes been using this "cast" statement to find "strange" chars
select
*
from
<Table>
where
<Field> != cast(<Field> as varchar(1000))
The data types ntext and nvarchar(max) are incompatible in the not equal to operator.
Ideas? - Using SQL Server 2005 –
Funky First build a string with all the characters you're not interested in (the example uses the 0x20 - 0x7F range, or 7 bits without the control characters.) Each character is prefixed with |, for use in the escape clause later.
-- Start with tab, line feed, carriage return
declare @str varchar(1024)
set @str = '|' + char(9) + '|' + char(10) + '|' + char(13)
-- Add all normal ASCII characters (32 -> 127)
declare @i int
set @i = 32
while @i <= 127
begin
-- Uses | to escape, could be any character
set @str = @str + '|' + char(@i)
set @i = @i + 1
end
The next snippet searches for any character that is not in the list. The % matches 0 or more characters. The [] matches one of the characters inside the [], for example [abc] would match either a, b or c. The ^ negates the list, for example [^abc] would match anything that's not a, b, or c.
select *
from yourtable
where yourfield like '%[^' + @str + ']%' escape '|'
The escape character is required because otherwise searching for characters like ], % or _ would mess up the LIKE expression.
Hope this is useful, and thanks to JohnFX's comment on the other answer.
Here ya go:
SELECT *
FROM Objects
WHERE
ObjectKey LIKE '%[^0-9a-zA-Z !"#$%&''()*+,\-./:;<=>?@\[\^_`{|}~\]\\]%' ESCAPE '\'
It's probably not the best solution, but maybe a query like:
SELECT *
FROM yourTable
WHERE yourTable.yourColumn LIKE '%[^0-9a-zA-Z]%'
Replace the "0-9a-zA-Z" expression with something that captures the full ASCII set (or a subset that your data contains).
Technically, I believe that an NCHAR(1) is a valid ASCII character IF & Only IF UNICODE(@NChar) < 256 and ASCII(@NChar) = UNICODE(@NChar) though that may not be exactly what you intended. Therefore this would be a correct solution:
;With cteNumbers as
(
Select ROW_NUMBER() Over(Order By c1.object_id) as N
From sys.system_columns c1, sys.system_columns c2
)
Select Distinct RowID
From YourTable t
Join cteNumbers n ON n <= Len(CAST(TXT As NVarchar(MAX)))
Where UNICODE(Substring(TXT, n.N, 1)) > 255
OR UNICODE(Substring(TXT, n.N, 1)) <> ASCII(Substring(TXT, n.N, 1))
This should also be very fast.
SELECT DISTINCT <RowID> FROM <Table> t JOIN (SELECT ROW_NUMBER() OVER(ORDER BY c1.object_id) as rNum From sys.system_columns c1, sys.system_columns c2) AS n ON n.rNum <= Len(<Field>) Where UNICODE(Substring(<Field>, n.rNum, 1)) > 255 OR UNICODE(Substring(<Field>, n.rNum, 1)) <> ASCII(Substring(<Field>, n.rNum, 1))
–
Crew UNICODE(Substring(<Field>, n.rNum, 1)) > 255 OR
from the where statement of my previous query resulted in a very small time reduction (5:12 vs 5:34). –
Crew I started with @CC1960's solution but found an interesting use case that caused it to fail. It seems that SQL Server will equate certain Unicode characters to their non-Unicode approximations. For example, SQL Server considers the Unicode character "fullwidth comma" (http://www.fileformat.info/info/unicode/char/ff0c/index.htm) the same as a standard ASCII comma when compared in a WHERE clause.
To get around this, have SQL Server compare the strings as binary. But remember, nvarchar and varchar binaries don't match up (16-bit vs 8-bit), so you need to convert your varchar back up to nvarchar again before doing the binary comparison:
select *
from my_table
where CONVERT(binary(5000),my_table.my_column) != CONVERT(binary(5000),CONVERT(nvarchar(1000),CONVERT(varchar(1000),my_table.my_column)))
If you are looking for a specific unicode character, you might use something like below.
select Fieldname from
(
select Fieldname,
REPLACE(Fieldname COLLATE Latin1_General_BIN,
NCHAR(65533) COLLATE Latin1_General_BIN,
'CustomText123') replacedcol
from table
) results where results.replacedcol like '%CustomText123%'
My previous answer was confusing UNICODE/non-UNICODE data. Here is a solution that should work for all situations, although I'm still running into some anomalies. It seems like certain non-ASCII unicode characters for superscript characters are being confused with the actual number character. You might be able to play around with collations to get around that.
Hopefully you already have a numbers table in your database (they can be very useful), but just in case I've included the code to partially fill that as well.
You also might need to play around with the numeric range, since unicode characters can go beyond 255.
CREATE TABLE dbo.Numbers
(
number INT NOT NULL,
CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (number)
)
GO
DECLARE @i INT
SET @i = 0
WHILE @i < 1000
BEGIN
INSERT INTO dbo.Numbers (number) VALUES (@i)
SET @i = @i + 1
END
GO
SELECT *,
T.ID, N.number, N'%' + NCHAR(N.number) + N'%'
FROM
dbo.Numbers N
INNER JOIN dbo.My_Table T ON
T.description LIKE N'%' + NCHAR(N.number) + N'%' OR
T.summary LIKE N'%' + NCHAR(N.number) + N'%'
and t.id = 1
WHERE
N.number BETWEEN 127 AND 255
ORDER BY
T.id, N.number
GO
-- This is a very, very inefficient way of doing it but should be OK for -- small tables. It uses an auxiliary table of numbers as per Itzik Ben-Gan and simply -- looks for characters with bit 7 set.
SELECT *
FROM yourTable as t
WHERE EXISTS ( SELECT *
FROM msdb..Nums as NaturalNumbers
WHERE NaturalNumbers.n < LEN(t.string_column)
AND ASCII(SUBSTRING(t.string_column, NaturalNumbers.n, 1)) > 127)
© 2022 - 2024 — McMap. All rights reserved.