How can I remove accents on a string?
Asked Answered
Y

6

73

I have the following string

áéíóú

which I need to convert it to

aeiou

How can I achieve it? (I don't need to compare, I need the new string to save)

Yeung answered 26/8, 2010 at 19:7 Comment(0)
G
135

Try using COLLATE:

select 'áéíóú' collate SQL_Latin1_General_Cp1251_CS_AS

For Unicode data, try the following:

select cast(N'áéíóú' as varchar(max)) collate SQL_Latin1_General_Cp1251_CS_AS

I am not sure what you may lose in the translation when using the second approach.

Update

It looks like œ is a special case, and we have to handle upper and lower case separately. You can do it like this (this code is a good candidate for a user-defined function):

declare @str nvarchar(max) = N'ñaàeéêèioô; Œuf un œuf'
select cast(
    replace((
        replace(@str collate Latin1_General_CS_AS, 'Œ' collate Latin1_General_CS_AS, 'OE' collate Latin1_General_CS_AS) 
    ) collate Latin1_General_CS_AS, 'œ' collate Latin1_General_CS_AS, 'oe' collate Latin1_General_CS_AS) as varchar(max)
) collate SQL_Latin1_General_Cp1251_CS_AS 
-- Output:
-- naaeeeeioo; Oeuf un oeuf

User Defined Function

create function dbo.fnRemoveAccents(@str nvarchar(max))  
returns varchar(max) as
begin
return cast(
    replace((
        replace(@str collate Latin1_General_CS_AS, 'Œ' collate Latin1_General_CS_AS, 'OE' collate Latin1_General_CS_AS) 
    ) collate Latin1_General_CS_AS, 'œ' collate Latin1_General_CS_AS, 'oe' collate Latin1_General_CS_AS) as varchar(max)
) collate SQL_Latin1_General_Cp1251_CS_AS 
end
Gwen answered 26/8, 2010 at 19:14 Comment(6)
Only works for non-unicode columns. I.e., will not work for N'áéíóú'.Dodie
just a heads up: this cast will mess up case insensitive like queries due to the collate, make sure you address thatWandis
Those solutions don't work for the word "bœuf". SELECT cast(N'bœuf' as varchar(max)) collate SQL_Latin1_General_CP1253_CI_AI. It results in "b?uf". I always use those "œ" letters when testing encoding because they often break everything...Solenne
@RedFilter Yes, I do something really similar, but also with the "æ"/"Æ" character too, which seem to be the other special case! Also, in your example, I think 'Oe' should be 'OE'.Solenne
Added UDF example.Torrid
Hi, just wonder why convert into SQL_Latin1_General_Cp1251_CS_AS in the end? What special about this particular Code Page?Lucas
L
12

Use the translate function:

SELECT TRANSLATE(
N'INPUT: ïÜ×ÌùµŪč©īĐÃÙěÓńÿâŘåòÔÕłćýçÀŻūìóèůüíÄûØõäÕťżîŃà£êřßøŽÖáďÉęúÂĪāËžŮōÑÇĆź®Š¥ĘĒśŹĚŚšŸ¢ŁéąÈđÆÍÛĄÝĎēČÊŌŇöÏňëÎæãŤñÒÚĀÅÁô',
N'ÁÀÂÃÄÅàáâãäåĀāąĄæÆÇçćĆčČ¢©đĐďĎÈÉÊËèéêëěĚĒēęĘÌÍÎÏìíîïĪīłŁ£ÑñňŇńŃÒÓÔÕÕÖØòóôõöøŌōřŘ®ŠšśŚßťŤÙÚÛÜùúûüůŮŪūµ×¥ŸÿýÝŽžżŻźŹ', 
N'aaaaaaaaaaaaaaaaaaccccccccddddeeeeeeeeeeeeeeiiiiiiiiiilllnnnnnnooooooooooooooorrrsssssttuuuuuuuuuuuuuxyyyyyzzzzzz');

-- OUTPUT: 'INPUT: iuxiuuuccidaueonyaraooolcycazuioeuuiauooaotzinalersozoadeeuaiaezuoncczrsyeeszessycleaedaiuaydeceonoineiaatnouaaao'

Check this link to find more 'look-a-like' characters:

https://github.com/apache/lucene-solr/blob/1ca7067a810578d4e246b5434b9cdcec7145d230/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.java#L189

Leucoplast answered 1/4, 2020 at 9:39 Comment(4)
FYI: Works in SQL Server 2017+Harpoon
NOTE that there are two bugs in the 3rd parameter: The correct translation should be N'aaaaaaaaaaaaaaaaaaccccccccddddeeeeeeeeeeeeeeiiiiiiiiiilllnnnnnnooooooooooooooorrrsssssttuuuuuuuuuuuuuxyyyyyzzzzzz'Standup
Chek my answer for a simpler approach also using TRANSLATE()Determinant
@Standup I fixed the third parameter with your recommendation. Waiting for approval.Overkill
H
5

Sometimes, the string can have another COLLATION, so you still have accents in the result. In that case, you can use this line (based on this solution here):

SELECT convert(varchar, your_string) COLLATE SQL_Latin1_General_Cp1251_CS_AS;
Hipped answered 27/6, 2019 at 13:11 Comment(0)
M
4

I had the same problem. In Greek for proper conversion to UPPER() you must suppress accent. Changing collation caused issues in other applications. Putting some REPLACE() functions I had more control on the behavior maintaining collation. Below is my ToUpperCaseGR function.

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO

    create FUNCTION ToUpperCaseGR
    (
     @word nvarchar(max)
    )
    RETURNS nvarchar(max)
    AS
    BEGIN
        -- Declare the return variable here
        declare @res nvarchar(max)
        set @res = UPPER(@word)
        set @res = replace(@res,'Ά','Α')
        set @res = replace(@res,'Έ','Ε')
        set @res = replace(@res,'Ί','Ι')
        set @res = replace(@res,'Ή','Η')
        set @res = replace(@res,'Ό','Ο')
        set @res = replace(@res,'Ύ','Υ')
        set @res = replace(@res,'Ώ','Ω')


        -- Return the result of the function
        RETURN @res

    END
    GO
Mendy answered 21/2, 2017 at 14:43 Comment(1)
The collate does not work for me, but this one is deadproof.Sharitasharity
D
4

Use the TRANSLATE() function, together with COLLATE and an accent-insensitive (AI) collation to do an easy catch-all replace:

SELECT TRANSLATE('TÉST ínpüt' COLLATE Latin1_General_CS_AI, 'AEIOUaeiou', 'AEIOUaeiou')

-- OUTPUT: 'TEST input'
Determinant answered 8/2, 2023 at 7:28 Comment(0)
B
2

Use this function:

CREATE FUNCTION [dbo].[F_RemoveDiacritics] (
 @String varchar(max)
)   RETURNS varchar(max)

AS BEGIN
DECLARE @StringResult VARCHAR(max);

select @StringResult= @String collate SQL_Latin1_General_Cp1251_CS_AS

return @StringResult


END
Blader answered 31/12, 2017 at 11:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.