How can I make fixed-length Delphi strings use wide characters?

Under Delphi 2010 (and probably under D2009 also) the default string type is UnicodeString.

However, if we declare...

const
  s: string = 'Test';
  ss: string[4] = 'Test';

... then the first string, s, is declared as a UnicodeString, but the second one, ss, is declared as a ShortString of one-byte AnsiChars!

We can check this: SizeOf(s[1]) returns 2 and SizeOf(ss[1]) returns 1.
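The check above can be run as a small console program. This is a sketch that assumes Delphi 2009 or later, where string means UnicodeString and Char means WideChar; the program name is made up for illustration. It also prints SizeOf(ss), which shows the ShortString layout: one length byte followed by four one-byte characters.

```pascal
program CharSizes;

{$APPTYPE CONSOLE}

// Assumes Delphi 2009+: string = UnicodeString, Char = WideChar.
const
  s: string = 'Test';      // long string: elements are 2-byte WideChars
  ss: string[4] = 'Test';  // ShortString: elements are always 1-byte AnsiChars

begin
  Writeln('SizeOf(s[1])  = ', SizeOf(s[1]));   // 2
  Writeln('SizeOf(ss[1]) = ', SizeOf(ss[1]));  // 1
  Writeln('SizeOf(ss)    = ', SizeOf(ss));     // 5: length byte + 4 AnsiChars
end.
```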

If I declare...

var
  s: string;
  ss: string[4];

... then I want ss to be a UnicodeString as well.

  1. How can I tell Delphi 2010 that both strings should be of type UnicodeString?
  2. How else can I declare that ss holds four WideChars? The compiler will not accept the type declarations WideString[4] or UnicodeString[4].
  3. Why does the compiler give the same type name, string, two different meanings?
Immitigable answered 25/1, 2011 at 13:52 Comment(4)
You should be aware that the default string type is not WideString; it's UnicodeString. They both use wide chars, but the semantics are very different. For one thing, WideString is not reference-counted, but UnicodeString is.Lasso
@Mason This is a good point. As an aside, I find the term semantics rather confusing. Semantics is the study of meaning. But what's really different about these two types is their implementation. The key difference is that UnicodeString uses reference counting and copy-on-write, whereas WideString does not. This gives the types different performance characteristics, but the same meaning when viewed from the outside. I fully appreciate that the world of computer programmers uses the term semantics in this particular way, but it just always confuses the heck out of me!Newsdealer
@Mason, since GJ's faulty assumption about the default type doesn't really change the point of the question, I hope everyone can agree that my editing it to say UnicodeString doesn't affect the validity of any answers. The question is about how to declare fixed-length Unicode strings, whatever the actual type might be.Bertero
possible duplicate of Delphi Unicode String Type Stored Directly at its Address (or "Unicode ShortString")Emlin

The answer to this lies in the fact that string[n], which is a ShortString, is now considered a legacy type. Embarcadero took the decision not to convert ShortString to support Unicode. Since long strings were introduced (in Delphi 2, if my memory serves correctly), that seems a reasonable decision to me.

If you really want fixed-length arrays of WideChar, you can simply declare array [1..n] of Char.
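A minimal sketch of that approach, assuming Delphi 2009 or later (so Char is a 2-byte WideChar). The type TChars4 and the helpers ToChars4 and FromChars4 are hypothetical names introduced here for illustration, not RTL routines. Unlike a ShortString, a static Char array carries no length byte, so this sketch pads unused slots with #0 and stops reading at the first #0.

```pascal
program FixedWideChars;

{$APPTYPE CONSOLE}

// Assumes Delphi 2009+, where Char = WideChar (2 bytes).
type
  TChars4 = array[1..4] of Char;  // hypothetical type: exactly four WideChars, no length prefix

// Hypothetical helper: copy up to 4 chars of S and pad the rest with #0.
function ToChars4(const S: string): TChars4;
var
  I: Integer;
begin
  for I := 1 to 4 do
    if I <= Length(S) then
      Result[I] := S[I]
    else
      Result[I] := #0;
end;

// Hypothetical helper: rebuild a string up to the first #0 (or all 4 chars).
function FromChars4(const A: TChars4): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to 4 do
  begin
    if A[I] = #0 then
      Break;
    Result := Result + A[I];
  end;
end;

var
  ss: TChars4;
begin
  ss := ToChars4('Test');
  Writeln(SizeOf(ss[1]));   // 2: each element is a WideChar
  Writeln(SizeOf(ss));      // 8: four 2-byte chars
  Writeln(FromChars4(ss));  // Test
end.
```

The #0 padding is just one convention; if the data must hold embedded #0 characters, a separate length field would be needed instead.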

Newsdealer answered 25/1, 2011 at 14:2 Comment(0)
  1. You can't while using string[4] as the type. Declaring it that way automatically makes it a ShortString.

  2. Declare it as an array of Char instead (for example, array[1..4] of Char), which in Delphi 2009 and later is an array of 4 WideChars.

  3. Because string[4] declares a string containing 4 characters. However, since WideChars can be more than one byte in size, extending that syntax to them would be a) wrong and b) confusing. ShortStrings are still around for backward compatibility, and their characters are always one-byte AnsiChars because a string[x] consists of x one-byte chars.

Petal answered 25/1, 2011 at 14:1 Comment(7)
You said WideChars can be more than one byte in size. Yes, but the size of a WideChar is exactly 2 bytes, no less and no more!Immitigable
What Ken meant is that in Unicode a code point can consist of more than one code unit. So a "char" could be 4 bytes. It's the meaning of the word "character" that's a bit confusing here: what does "character" mean? A code point or a code unit? In Delphi it's a "code unit" (so 8-bit for AnsiChar and 16-bit for WideChar).Pascia
@Ken: An Ansi glyph can consist of more than one byte (think of multi-byte encodings). Windows even considers UTF-8 an Ansi encoding, as does Delphi 2009+.Pascia
@Jens: But ShortString wasn't actually ANSI, but ASCII (TP/BP days), IIRC. That's why it was a single-byte signed char. Or am I remembering wrong (it's possible - TP was eons ago, wasn't it? <g>)?Petal
Sure, DOS used OEM character sets for values beyond 127 (and DOS really only had 1 byte character sets IIRC). Char was always defined as #0..#255 until Delphi.NET and Delphi 2009, so I think it wasn't signed in Turbo Pascal.Pascia
Yes. They called it extended ASCII at the time.Blakeslee
@Jens/@Marco: Thanks for confirming that my memory still works once in a while. :-)Petal
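The code point vs. code unit distinction from the comments above can be seen directly. This is a sketch assuming Delphi 2009 or later, where a UnicodeString is UTF-16 and Length counts 16-bit code units rather than code points. The example character, U+1F600, is an arbitrary choice from outside the Basic Multilingual Plane, so UTF-16 stores it as the surrogate pair $D83D $DE00.

```pascal
program CodeUnits;

{$APPTYPE CONSOLE}

// Assumes Delphi 2009+: string is UTF-16, Char is one 16-bit code unit.
var
  s: string;
begin
  // U+1F600 is a single code point encoded as a UTF-16 surrogate pair.
  s := #$D83D#$DE00;
  Writeln(Length(s));                 // 2 code units for one code point
  Writeln(SizeOf(s[1]));              // 2 bytes per code unit
  Writeln(Length(s) * SizeOf(Char));  // 4 bytes for this one "character"
end.
```

So each WideChar is always 2 bytes, as noted above, while one user-visible character may need two of them.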
