Change special characters in array Delphi

A string that I am receiving is UTF-8 encoded and contains special characters such as Å¡, Ä‘, Ä, etc. I am using StringReplace() to convert it to plain text, but each call can only replace one kind of character. PHP has a similar function, str_replace, which also accepts arrays, as seen in "how to replace special characters with the ones they're based on in PHP?":

<?php
  $vOriginalString = "¿Dónde está el niño que vive aquí? En el témpano o en el iglú. ÁFRICA, MÉXICO, ÍNDICE, CANCIÓN y NÚMERO.";

  $vSomeSpecialChars = array("á", "é", "í", "ó", "ú", "Á", "É", "Í", "Ó", "Ú", "ñ", "Ñ");
  $vReplacementChars = array("a", "e", "i", "o", "u", "A", "E", "I", "O", "U", "n", "N");

  $vReplacedString = str_replace($vSomeSpecialChars, $vReplacementChars, $vOriginalString);

  echo $vReplacedString; // outputs '¿Donde esta el nino que vive aqui? En el tempano o en el iglu. AFRICA, MEXICO, INDICE, CANCION y NUMERO.'
?>

How can I do this in Delphi? StringReplace doesn't support arrays.

Lecky answered 6/7, 2011 at 16:15 Comment(2)
The string is UTF-8 encoded and contains "special characters"? What's a "special character"? Check out this answer too -- if you have access to iconv. - Josefina
If you want this for comparison, then use CompareString with at least NORM_IGNORENONSPACE in dwCmpFlags. - Aluminum
function str_replace(const oldChars, newChars: array of Char; const str: string): string;
var
  i: Integer;
begin
  Assert(Length(oldChars)=Length(newChars));
  Result := str;
  for i := 0 to high(oldChars) do
    Result := StringReplace(Result, oldChars[i], newChars[i], [rfReplaceAll])
end;

If you are concerned about all the needless heap allocations caused by StringReplace then you could write it this way:

function str_replace(const oldChars, newChars: array of Char; const str: string): string;
var
  i, j: Integer;
begin
  Assert(Length(oldChars)=Length(newChars));
  Result := str;
  for i := 1 to Length(Result) do
    for j := 0 to high(oldChars) do
      if Result[i]=oldChars[j] then
      begin
        Result[i] := newChars[j];
        break;
      end;
end;

Call it like this:

newStr := str_replace(
  ['á','é','í'],
  ['a','e','i'], 
  oldStr
);
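
PHP's str_replace also accepts whole strings, not only single characters. A minimal sketch of the same loop for strings follows; the name str_replace_strings is made up for illustration and the code has not been benchmarked. It could help when a "special character" shows up as a sequence of several characters.

```delphi
// Requires SysUtils in the uses clause (for StringReplace).
// str_replace_strings is a hypothetical name, not an RTL function.
function str_replace_strings(const oldStrs, newStrs: array of string;
  const str: string): string;
var
  i: Integer;
begin
  Assert(Length(oldStrs) = Length(newStrs));
  Result := str;
  // Apply the replacements in array order, as PHP's str_replace does.
  for i := 0 to High(oldStrs) do
    Result := StringReplace(Result, oldStrs[i], newStrs[i], [rfReplaceAll]);
end;
```

Because the replacements run one after another, a later pair can match text produced by an earlier pair, so the order of the arrays matters.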
Hydropic answered 6/7, 2011 at 16:23 Comment(1)
@Lecky To save you some work, here's a complete array (among others I'm sure). - Aluminum

Getting rid of your accents is called Normalization.

Since you are using Unicode, you probably want to handle more than the short list of accented characters in your question. What you are looking for is Unicode Normalization Form D (NFD) or KD (NFKD), which is available in Windows and therefore in Delphi.

This answer should get you going on the theoretical side.

This Delphi code and this answer should get you started on the implementation.
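
As a rough sketch of that idea, not the linked code: decompose the string to NFD with the Windows NormalizeString function, then drop the combining marks that the decomposition split off. The import declaration is written by hand here, StripAccents is a made-up helper name, and the code assumes Delphi 2009 or later (string = UnicodeString) on Windows Vista or later. Note that dropping the marks is a step on top of normalization, and letters without a canonical decomposition (for example đ) pass through unchanged.

```delphi
const
  NormalizationD = 2; // NORM_FORM value for canonical decomposition (NFD)

// Hand-written import; NormalizeString is exported by Normaliz.dll.
function NormalizeString(NormForm: Integer; lpSrcString: PWideChar;
  cwSrcLength: Integer; lpDstString: PWideChar;
  cwDstLength: Integer): Integer; stdcall; external 'Normaliz.dll';

// StripAccents is a hypothetical helper name, not part of the RTL.
function StripAccents(const S: string): string;
var
  Decomposed: string;
  Len, i, n: Integer;
begin
  Result := S;
  if S = '' then
    Exit;
  // With a zero-length destination the call returns an estimated buffer size.
  Len := NormalizeString(NormalizationD, PWideChar(S), Length(S), nil, 0);
  if Len <= 0 then
    Exit; // leave the input unchanged on failure
  SetLength(Decomposed, Len);
  // Production code should retry with a larger buffer if the estimate
  // turns out to be too small (ERROR_INSUFFICIENT_BUFFER).
  Len := NormalizeString(NormalizationD, PWideChar(S), Length(S),
    PWideChar(Decomposed), Length(Decomposed));
  if Len <= 0 then
    Exit;
  // Copy everything except combining diacritical marks (U+0300..U+036F).
  SetLength(Result, Len);
  n := 0;
  for i := 1 to Len do
    if (Decomposed[i] < #$0300) or (Decomposed[i] > #$036F) then
    begin
      Inc(n);
      Result[n] := Decomposed[i];
    end;
  SetLength(Result, n);
end;
```

Calling StripAccents('MÉXICO') should then behave like the array-based replacement for the accented Latin letters in the question, without having to list them by hand.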

Yumuk answered 6/7, 2011 at 17:28 Comment(8)
This sounds like the right approach. I just naively answered the question as asked. - Hydropic
Sorry, "getting rid of accents" is not normalization -- it's just getting rid of accents! Normalization doesn't change the semantics of the character, it just chooses between "base plus diacritic" and "legacy Latin-1" form (and some other forms if appropriate) in a consistent fashion so that two normalized strings compare equal if they're semantically equal. The OP's goal appears to be transliteration to ASCII-only characters. - Josefina
@Kerrek: I inferred Normalization because the OP links to the PHP solution mentioning Normalization. - Yumuk
@Jeroen: Even the linked accepted SO answer is wrong, as pointed out in its comments; the PHP normalize function does exactly what the Unicode standard says and what I said. It does not transliterate ä to a! - Josefina
@Jeroen: Well, I would prefer iconving to ASCII//TRANSLIT and then regexing \w out, I think that's a bit simpler and more foolproof... - Josefina
@Kerrek: there is no iconv in Delphi. - Yumuk
@Jeroen: Shame. No way to pull in a C library? Oh well, in that case you'll have to roll your own transliterator, indeed. - Josefina
@Kerrek: importing LIB files in Delphi is a pain; OBJ files can be imported, but it is hard. The easiest option is to import DLLs. The same, BTW, goes for .NET: importing OBJ/LIB files there is virtually impossible too. - Yumuk
