Numeric equivalence between HTML Character Entities and Delphi?

The HTML character entity 𝕒 can be created from the number 120146 with this HTML code:

<!DOCTYPE html>
<html>
<style>
body {
    font-size: 20px;
}
</style>
<body>

<p>I will display &#120146;</p>

</body>
</html>

Some of these extended character symbols can be created from the identical number value both in HTML and in Delphi 10.1.2. For example:

Both &#174; and Chr(174) create the "registered trademark" symbol character ®

Both &#163; and Chr(163) create the "pound" symbol character £

Etc.

Unfortunately, this is not the case with the above number 120146 where Chr(120146) in Delphi creates a "funny Chinese symbol".

So how can I create the above &aopf; character symbol from the number 120146 in Delphi? And over which numeric range does this equivalence between HTML and Delphi hold, and where does it break down?

Childers answered 21/9, 2017 at 19:6

This is 'MATHEMATICAL DOUBLE-STRUCK SMALL A' (U+1D552). It lies outside the Basic Multilingual Plane, so in UTF-16 it is encoded as a surrogate pair, which means that two UTF-16 code units are required.

Look at your attempt: Chr(120146). Now, 120146 > High(Word) (= 65535), which tells you that your code cannot succeed. Remember that each UTF-16 code unit, and therefore each Delphi Char, is 16 bits in size. It would be nice if the compiler warned about this. Does it?

The link above tells you how to encode it. It is given by this surrogate pair:

0xD835 0xDD52

In Delphi that would be most easily written as:

#$D835#$DD52
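
To see where those two values come from, here is a minimal sketch of the standard UTF-16 surrogate arithmetic: subtract $10000 from the code point, then put the upper 10 bits on top of $D800 and the lower 10 bits on top of $DC00. The function name CodePointToUtf16 is made up for this illustration, and the sketch does no validation (it does not reject values above $10FFFF or values that fall inside the surrogate range).

```delphi
program SurrogatePairSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper, not part of the RTL: encodes one Unicode code point
// as a Delphi (UTF-16) string. No validation is performed.
function CodePointToUtf16(CodePoint: Cardinal): string;
var
  Offset: Cardinal;
begin
  if CodePoint <= $FFFF then
    // Fits in a single 16-bit code unit, like Chr(174) or Chr(163).
    Result := Char(CodePoint)
  else
  begin
    // Above the Basic Multilingual Plane: split into a surrogate pair.
    Offset := CodePoint - $10000;
    Result := Char($D800 + (Offset shr 10)) +    // high surrogate: upper 10 bits
              Char($DC00 + (Offset and $3FF));   // low surrogate: lower 10 bits
  end;
end;

var
  S: string;
begin
  S := CodePointToUtf16(120146); // 120146 = $1D552
  Writeln(Length(S));            // 2 code units
  // Low(S)/High(S) avoid assuming 1-based or 0-based string indexing.
  Writeln(IntToHex(Ord(S[Low(S)]), 4), ' ', IntToHex(Ord(S[High(S)]), 4)); // D835 DD52
end.
```

For 120146 the offset is $D552, whose upper 10 bits are $35 and lower 10 bits are $152, which gives $D835 and $DD52.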

If you are starting with the UTF-32 code as a numeric value then you can convert it to a Delphi string using TCharacter.ConvertFromUtf32 from the System.Character unit:

TCharacter.ConvertFromUtf32($1D552)

Obviously the argument to this function can be a variable.
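
As a minimal sketch of that, assuming a console program and a variable named CodePoint (both chosen only for this illustration), the decimal number from the question can be passed in directly. Recent Delphi versions may emit a deprecation warning for TCharacter and point to the Char record helper in the same unit instead.

```delphi
program ConvertFromUtf32Sketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Character;

var
  CodePoint: UCS4Char; // 32-bit code point type declared in the System unit
  S: string;
begin
  CodePoint := 120146; // same value as $1D552, the number used in &#120146;
  S := TCharacter.ConvertFromUtf32(CodePoint);

  Writeln(Length(S));          // 2, because the result is a surrogate pair
  Writeln(S = #$D835#$DD52);   // compares against the literal pair shown earlier
end.
```

Length counts UTF-16 code units rather than visible characters, which is why a single symbol reports a length of 2.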

If much of the above Unicode terminology is unknown to you, read these articles:

Parhe answered 21/9, 2017 at 19:37
