How do i construct a WideString with a diacritic in a non-Unicode Delphi version?

i am trying to construct a (test) WideString of:

á (U+00E1 Small Letter Latin A with acute)

but using its decomposed form:

LATIN SMALL LETTER A (U+0061) COMBINING ACUTE ACCENT (U+0301)

So i have the code fragment:

var
    test: WideString;
begin
   test := #$0061#$0301;
   MessageBoxW(0, PWideChar(test), 'Character with diacritic', MB_ICONINFORMATION or MB_OK);
end;

Except it doesn't appear to work:

[screenshot of the message box]

This could be a bug in MessageBox, but i'm going to go ahead and say that it's more likely the bug is in my code.

Some other variations i have tried:

test := WideString(#$0061#$0301);


const
    SmallLetterLatinAWithAcuteDecomposed: WideString = #$0061#$0301;
test := SmallLetterLatinAWithAcuteDecomposed


test := #$0061+#$0301;  (Doesn't compile; incompatible types)


test := WideString(#$0061)+WideString(#$0301);  (Doesn't compile; crashes compiler)


test := 'a'+WideString(#$0301);  (Doesn't compile; crashes compiler)


//Arnauld's thought:
test := #$0301#$0061;

Bonus chatter

Canthus answered 11/8, 2011 at 16:12 Comment(7)
What Delphi version are you using? I seem to recall that some versions are more sensitive to the compilation environment's locale settings than others are. - Sachikosachs
i am, indeed, still using Delphi 5. Updated tags. - Canthus
Works OK with D2007 (tested the first snippet). - Description
@Ian Good for you trying to scrag some life out of the old dog that is D5. But for bonus points you really should be doing this in D2!! ;-) - Sourdough
Let's have some love for Delphi 1. 16 bits ought to be enough for anybody. - Copenhagen
@Warren The love ran out when 64 bit systems arrived. - Sourdough
@David Heffernan: Our company was using D1 back when you had to sign an NDA to get the VCL source code! - Canthus

Best answer:

const
    n: WideString = '';  //n=Nothing

s := n+#$0061+#$0301;

This fixes all cases i have below that otherwise fail.


The only variant that works is to declare it as a constant:

AccentAcute: WideString = #$0301;
AccentAcute: WideString = WideChar($0301);
AccentAcute: WideString = WideChar(#$0301);
AccentAcute: WideString = WideString(#$0301);

Sample Usage:

s := 'Pasta'+AccentAcute;

Constant based syntaxes that do not work

  • AccentAcute: WideString = $0301;
    incompatible types
  • AccentAcute: WideString = #0301;
    gives [screenshot]
  • AccentAcute: WideString = WideString($0301);
    invalid typecast
  • AccentAcute: WideString = WideString(#$0301);
    invalid typecast
  • AccentAcute: WideChar = WideChar(#0301); gives Pastai
  • AccentAcute: WideChar = WideChar($0301); gives Pasta´

Other syntaxes that fail

  • 'Pasta'+WideChar($0301)
    gives Pasta´
  • 'Pasta'+#$0301
    gives Pasta´
  • WideString('Pasta')+#$0301
    gives [screenshot]

Summary of all constant based syntaxes i could think up:

AccentAcute: WideString =            #$0301;   //works
AccentAcute: WideString =   WideChar(#$0301);  //works
AccentAcute: WideString = WideString(#$0301);  //works
AccentAcute: WideString =             $0301;   //incompatible types
AccentAcute: WideString =    WideChar($0301);  //works
AccentAcute: WideString =  WideString($0301);  //invalid typecast

AccentAcute: WideChar =            #$0301;     //fails, gives Pasta´
AccentAcute: WideChar =   WideChar(#$0301);    //fails, gives Pasta´
AccentAcute: WideChar = WideString(#$0301);    //incompatible types
AccentAcute: WideChar =             $0301;     //incompatible types
AccentAcute: WideChar =    WideChar($0301);    //fails, gives Pasta´
AccentAcute: WideChar =  WideString($0301);    //invalid typecast

Rearranging the WideChar concatenation can work, as long as you only append to a variable:

//Works
t := '0123401234012340123';
t := t+WideChar(#$D840);
t := t+WideChar(#$DC00);

//fails
t := '0123401234012340123'+WideChar(#$D840);
t := t+WideChar(#$DC00);

//fails
t := '0123401234012340123'+WideChar(#$D840)+WideChar(#$DC00);

//works
t := '0123401234012340123';
t := t+WideChar(#$D840)+WideChar(#$DC00);

//works
t := '';
t := t+WideChar(#$D840)+WideChar(#$DC00);

//fails; gives junk
t := ''+WideChar(#$D840)+WideChar(#$DC00);

//crashes compiler
t := WideString('')+WideChar(#$D840)+WideChar(#$DC00);

//doesn't compile
t := WideChar(#$D840)+WideChar(#$DC00);
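
The pair $D840 $DC00 used above is the UTF-16 surrogate pair for U+20000. As a hedged sketch, the pair for any code point in U+10000..U+10FFFF can be computed at run time and appended to a variable, which is the pattern reported to work above. The names CodePointToSurrogates, HighUnit and LowUnit are made up for illustration, and this has not been verified on Delphi 5 itself.

```pascal
program SurrogateDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper (not from the original posts).
// Splits a code point in U+10000..U+10FFFF into its UTF-16 surrogate pair.
procedure CodePointToSurrogates(CodePoint: Cardinal; var HighUnit, LowUnit: Word);
begin
  Dec(CodePoint, $10000);                            // leaves a 20-bit value
  HighUnit := Word($D800 or (CodePoint shr 10));     // top 10 bits
  LowUnit  := Word($DC00 or (CodePoint and $3FF));   // bottom 10 bits
end;

var
  h, l: Word;
  t: WideString;
begin
  CodePointToSurrogates($20000, h, l);   // h = $D840, l = $DC00
  t := '0123401234012340123';
  // Append to the variable one unit at a time, without any #$xxxx literals
  t := t + WideChar(h);
  t := t + WideChar(l);
  Writeln(Length(t));                    // 19 digits + 2 code units = 21
end.
```

Because the code units come from Word variables rather than literals, the compiler's handling of #$xxxx constants should not come into play.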

Definitely hitting compiler nonsense here; cases that weren't tested fully. Yes, i know David, we should upgrade.

Canthus answered 11/8, 2011 at 19:50 Comment(4)
Essentially, Delphi 5 is not a good fit for Unicode development. :-) - Copenhagen
Hopefully this nonsense is just limited to the compiler's ability to read my source code files. This should all be fixed once i've set up my generation of unit-test data of characters outside the Basic Multilingual Plane (BMP) and tests of surrogate pairs. - Canthus
... Then Delphi 5 was still way out of date! - Copenhagen
David Heffernan's got you covered there. - Canthus

This works in Delphi 5/7:

var
  test: WideString;
begin

   test := WideChar($0061);
   test := test + WideChar($0301);

   MessageBoxW(0, PWideChar(test), 'Character with diacritic', MB_ICONINFORMATION or MB_OK);
end;

In short:

  • In Delphi 5 and Delphi 7, concatenating WideChars to a WideString does not appear to work when using #$xxxx form literals.
  • # doesn't seem to work as you'd expect for Unicode literals.

  • You can't just add two or more widechars in a single expression, like this:

    test := WideChar(a)+WideChar(b);  // won't compile in D5/D7.
    
Copenhagen answered 11/8, 2011 at 18:35 Comment(5)
The 'string' + WideChar($xxxx) syntax fails for 'Pasta'+WideChar($0301): it generates Pasta´ rather than Pastá. - Canthus
Ian, you can't use 'ansistring' literals and just append WideChars to them! It seems crazy to me that you would even try that in Delphi 5. - Copenhagen
Why would you think 'string' is an AnsiString? The compiler can treat it as a WideString during compilation. - Canthus
You can add more than one WideChar in a single expression, as long as you're appending to an existing string: t := t+WideChar(#$D840)+WideChar(#$DC00); or t := t+WideChar($D840)+WideChar($DC00); - Canthus
Building on our idea of splitting up otherwise equivalent concatenations led me to find more cases that work, and some that don't. In the end i found the best fix was always to concatenate with a variable or constant. - Canthus

Did you try #$0301#$0061 (i.e. diacritic first)?

OK.

So #$.... only handles 8-bit ASCII constants in this version.

You can work around this at the memory level:

type
    TWordArray  = array[1..MaxInt div SizeOf(word)-2] of word;
    // start at [1], just as WideStrings
    // or: TWordArray  = array[0..MaxInt div SizeOf(word)-1] of word;
    PWordArray = ^TWordArray;

var
  test: WideString;
begin
  test := '12'; // or SetLength(test,2);
  PWordArray(test)[1] := $61; 
  PWordArray(test)[2] := $301;
  MessageBoxW(0, pointer(test), 'Character with diacritic', MB_ICONINFORMATION or MB_OK);
end;

This will always work since you don't play with chars/widechars and such.

And it will also work as expected with Unicode versions of Delphi.
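
A variation on the same idea, as a minimal sketch: fill the WideString one code unit at a time from Word values, so the compiler never has to parse a #$xxxx literal. The helper name WideStringFromCodeUnits is hypothetical (it is not from the posts above), and this has not been verified on Delphi 5 itself.

```pascal
program DecomposedDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper (not from the original posts).
// Builds a WideString from raw UTF-16 code units given as Word values.
function WideStringFromCodeUnits(const Units: array of Word): WideString;
var
  i: Integer;
begin
  SetLength(Result, High(Units) + 1);
  for i := 0 to High(Units) do
    Result[i + 1] := WideChar(Units[i]);  // WideString indexes start at 1
end;

var
  test: WideString;
begin
  // LATIN SMALL LETTER A followed by COMBINING ACUTE ACCENT
  test := WideStringFromCodeUnits([$0061, $0301]);
  Writeln(Length(test));  // 2 code units
end.
```

The result can then be passed to MessageBoxW(0, PWideChar(test), ...) just as in the question. Unlike the pointer cast, this relies only on ordinary WideString indexing.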

Currajong answered 11/8, 2011 at 17:33 Comment(4)
i tried it; updated question. i would expect it to not work since there is no "previous" character for the diacritic to apply to. - Canthus
The problem is that # only works for ASCII code points in Delphi 5 ($61 is fine, but $301 is greater than $FF, so no go). - Copenhagen
You are allowed to use #$xxxx syntax in Delphi; it works for me (see above), it works in VirtualTreeview (VirtualTrees.pas), and it works in Jedi (JclUnicode.pas). The bugs i'm hitting seem to revolve around the compiler's interpretation of them in various forms. But at a basic level the syntax ThreeEighths: WideString = #$215C; is accepted. - Canthus
@Ian To circumvent the #$.... inconsistency between Delphi versions, direct in-memory access with words via PWordArray(test)[] is safe and fast. - Currajong
