How to convert strange strong/bold Unicode to non bold UTF-8 chars in php?
I'm trying to store a tweet in my database using the Twitter API, but I get these strange characters, which seem to be "natural" bold characters.

NORMAL CHARS:

azertyuio

STRANGE CHARS:

๐˜€๐—ฒ๐˜ ๐—ถ๐˜€ ๐—ฟ๐—ฒ๐—ฎ๐—ฑ๐˜† ๐—ณ๐—ผ๐—ฟ ๐˜๐—ต๐—ฒ ๐—ฑ๐—ถ๐˜€๐—ฐ๐˜‚๐˜€๐˜€๐—ถ๐—ผ๐—ป!!

If I paste the bold characters into my NetBeans editor, I get something like square characters...

I've never seen that before. Could you help me convert this text to non-bold characters in PHP?

Rocaille answered 15/2, 2017 at 16:0 Comment(6)
What database? What is the table structure, and specifically the character set/collation you are using? This looks like a character set issue. It seems that you need to be using UTF-8 within your PHP client script and for storage in the field in your table. See this question: #8275472 – Audun
For example: var_dump(ord('𝘀')); // returns 240, while var_dump(ord('s')); // returns 115 – Rocaille
These are Unicode characters, specifically MATHEMATICAL SANS-SERIF BOLD SMALL letters, from the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF). – Avron
OK, thanks, but how can I convert these characters to "classic" characters? So strange... why does Twitter use this kind of character? – Rocaille
Can you call iconv or a related library/plug-in for PHP? $ echo 𝘀𝗲𝘁 𝗶𝘀 𝗿𝗲𝗮𝗱𝘆 𝗳𝗼𝗿 𝘁𝗵𝗲 𝗱𝗶𝘀𝗰𝘂𝘀𝘀𝗶𝗼𝗻 | iconv -f UTF-8 -t ASCII//TRANSLIT yields set is ready for the discussion. – Clamatorial
FYI: 3v4l.org/ZfobW See also: Remove Strange Font Coding – Thiosinamine

This is one of the reasons for using UTF-8 or HTML entity character encoding rather than ANSI. UTF-8 allows you to store and display characters like these (and those from other languages), handle searches when someone inputs these characters in those languages/charsets (which will only match things written in those same characters), and so on.

The alternative would be for you to write a "conversion" for every odd character set that people choose to use. Still, converting these is possible to do -- you'll just need to decide whether it is really worth your time.
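As one possible shortcut, Unicode compatibility normalization (NFKC) folds many of these styled letters back to their plain forms, because the mathematical alphanumeric symbols carry compatibility mappings to the base letters. The sketch below assumes PHP's intl extension is installed; `foldStyledText()` is a hypothetical name. Characters without a compatibility mapping (small capitals or emoji-style punctuation, for example) pass through unchanged, and NFKC also folds other compatibility characters such as ligatures and superscripts, so it may change more than intended.

```php
<?php
// Hypothetical helper; assumes the intl extension (Normalizer class) is loaded.
function foldStyledText(string $text): string
{
    if (!class_exists('Normalizer')) {
        return $text; // intl not available: leave the text untouched
    }
    // FORM_KC = compatibility decomposition followed by canonical composition.
    $folded = Normalizer::normalize($text, Normalizer::FORM_KC);

    // normalize() returns false on invalid input; fall back to the original.
    return $folded === false ? $text : $folded;
}

echo foldStyledText("\u{1D600}\u{1D5F2}\u{1D601}"), PHP_EOL; // styled "set"
```

Whether this is acceptable depends on how much of the original text must be preserved byte for byte.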

The characters you submitted are called Sans-Serif Mathematical Bold characters. You can find the list here at w3.org. As well, there are standard, slanted, slanted bold variations for just these (use the previous and next links at the top of that page).

The problem you will encounter is that, unlike switching capitalized characters to lowercase (add 32 to the decimal value, or chr(ord(x)+32)), there isn't a single decimal offset that converts every Mathematical Bold character to its ANSI equivalent; each character group needs its own. As well, ord() and chr() will not work for these characters, because they operate on single bytes and each of these characters takes four bytes in UTF-8.

Example:

𝗮 is 120302, a is 97. 120302 - 97 = 120205
𝗔 is 120276, A is 65. 120276 - 65 = 120211

Thus, subtracting 120205 would give you the correct lowercase a for 𝗮; however, the same would not work for 𝗔. That means you would have to determine which charset the character is in (Mathematical Bold, Slanted Mathematical, etc.), identify the subset it belongs to (a-z, A-Z, 0-9), then use the corresponding offset you calculated to correct it. In order to do that, you have to check every character of every tweet for characters that fit in one of your supported conversion charsets, then convert them to plain letters.
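A minimal sketch of that per-subset offset idea, limited to the sans-serif bold style from the question. It assumes PHP 7.4+ with the mbstring extension (for `mb_str_split()` and `mb_ord()`); `unboldSansSerif()` and the range table are illustrative names, not part of any library.

```php
<?php
// Hypothetical helper: fold three sans-serif bold ranges back to ASCII.
function unboldSansSerif(string $text): string
{
    // [first code point, last code point, ASCII code of the first replacement]
    $ranges = [
        [0x1D5D4, 0x1D5ED, ord('A')], // sans-serif bold A..Z
        [0x1D5EE, 0x1D607, ord('a')], // sans-serif bold a..z
        [0x1D7EC, 0x1D7F5, ord('0')], // sans-serif bold 0..9
    ];

    $out = '';
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        $cp = mb_ord($char, 'UTF-8'); // full code point, unlike ord()
        foreach ($ranges as [$first, $last, $asciiBase]) {
            if ($cp >= $first && $cp <= $last) {
                // Same position inside the range, shifted onto ASCII.
                $char = chr($asciiBase + $cp - $first);
                break;
            }
        }
        $out .= $char;
    }

    return $out;
}

echo unboldSansSerif("\u{1D5D4}\u{1D5EE}\u{1D7EC}!"), PHP_EOL;
```

Supporting another style would mean adding its three ranges to the table; styles with gaps (script, fraktur, double-struck) also need individual entries for the letters that live in other blocks.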

That might be worth doing if there are a large number of tweets using Mathematical Bold only, but if you're importing large sets of tweets that can contain all sorts of potential characters, you're in for a lot of work.

If you think it is worthwhile, the first thing you'll need to do is look at the raw character encoding you're receiving from the API, whether it needs to be converted, then decide whether you want to map between charsets using an array of characters, use a range of values for the subsets, or some other method. You also need to decide how you'll scan for those characters.
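For the scanning step, one option is a Unicode-aware regular expression over the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF). This is only a sketch: `findMathAlphanumerics()` is a hypothetical name, and the range deliberately ignores the styled letters that sit in other blocks.

```php
<?php
// Hypothetical helper: list the distinct characters of $text that fall in
// the Mathematical Alphanumeric Symbols block.
function findMathAlphanumerics(string $text): array
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    preg_match_all('/[\x{1D400}-\x{1D7FF}]/u', $text, $matches);

    // $matches[0] may be missing if the subject is not valid UTF-8.
    return array_values(array_unique($matches[0] ?? []));
}

var_dump(findMathAlphanumerics("a\u{1D5EE}b\u{1D5EE}") !== []); // bool(true)
```

A tweet for which this returns an empty array can be stored as-is, which keeps the more expensive conversion off the common path.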

All in all, the answer to your question is that it is possible to convert them, but your situation and particulars are going to determine whether it is worthwhile and how you accomplish it. It's not something that can be written for you.

Tallula answered 15/2, 2017 at 18:34 Comment(3)
Wow! Big thanks for this reply :) Now I understand ;) I will see if I can find a function on the web for this issue (but I doubt it...). I'll keep you informed ;) Thanks – Rocaille
And for info, tweets with this kind of character are made by an extension, not natively by Twitter – Rocaille
@J.Doe FYI. The problem you are facing could be similarly described as trying to convert Emoji to words. Rather than Emoji, you are attempting to handle characters. In either case, what would be required is the same -- you would need to know every Emoji for every type of phone, and every corresponding word to replace it with. Same goes for various charsets and the intended characters they should be replaced by. Edit: I say the same, because from the perspective of the computer they are the same -- simply Unicode characters. – Tallula

Using http://slothsoft.net/getResource.php/slothsoft/unicode-mapper source, I made a function:

public function convertSpecialCharToNormalChar($text) {
    $target = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"];
    $specialList = [
        'serifBold' => ['𝐚', '𝐛', '𝐜', '𝐝', '𝐞', '𝐟', '𝐠', '𝐡', '𝐢', '𝐣', '𝐤', '𝐥', '𝐦', '𝐧', '𝐨', '𝐩', '𝐪', '𝐫', '𝐬', '𝐭', '𝐮', '𝐯', '𝐰', '𝐱', '𝐲', '𝐳', '𝐀', '𝐁', '𝐂', '𝐃', '𝐄', '𝐅', '𝐆', '𝐇', '𝐈', '𝐉', '𝐊', '𝐋', '𝐌', '𝐍', '𝐎', '𝐏', '𝐐', '𝐑', '𝐒', '𝐓', '𝐔', '𝐕', '𝐖', '𝐗', '𝐘', '𝐙', '𝟎', '𝟏', '𝟐', '𝟑', '𝟒', '𝟓', '𝟔', '𝟕', '𝟖', '𝟗', '❗', '❓', '.', ',', '"', "'"],
        'serifItalic' => ['𝑎', '𝑏', '𝑐', '𝑑', '𝑒', '𝑓', '𝑔', 'ℎ', '𝑖', '𝑗', '𝑘', '𝑙', '𝑚', '𝑛', '𝑜', '𝑝', '𝑞', '𝑟', '𝑠', '𝑡', '𝑢', '𝑣', '𝑤', '𝑥', '𝑦', '𝑧', '𝐴', '𝐵', '𝐶', '𝐷', '𝐸', '𝐹', '𝐺', '𝐻', '𝐼', '𝐽', '𝐾', '𝐿', '𝑀', '𝑁', '𝑂', '𝑃', '𝑄', '𝑅', '𝑆', '𝑇', '𝑈', '𝑉', '𝑊', '𝑋', '𝑌', '𝑍', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'serifBoldItalic' => ['𝒂', '𝒃', '𝒄', '𝒅', '𝒆', '𝒇', '𝒈', '𝒉', '𝒊', '𝒋', '𝒌', '𝒍', '𝒎', '𝒏', '𝒐', '𝒑', '𝒒', '𝒓', '𝒔', '𝒕', '𝒖', '𝒗', '𝒘', '𝒙', '𝒚', '𝒛', '𝑨', '𝑩', '𝑪', '𝑫', '𝑬', '𝑭', '𝑮', '𝑯', '𝑰', '𝑱', '𝑲', '𝑳', '𝑴', '𝑵', '𝑶', '𝑷', '𝑸', '𝑹', '𝑺', '𝑻', '𝑼', '𝑽', '𝑾', '𝑿', '𝒀', '𝒁', '𝟎', '𝟏', '𝟐', '𝟑', '𝟒', '𝟓', '𝟔', '𝟕', '𝟖', '𝟗', '❗', '❓', '.', ',', '"', "'"],
        'sans' => ['𝖺', '𝖻', '𝖼', '𝖽', '𝖾', '𝖿', '𝗀', '𝗁', '𝗂', '𝗃', '𝗄', '𝗅', '𝗆', '𝗇', '𝗈', '𝗉', '𝗊', '𝗋', '𝗌', '𝗍', '𝗎', '𝗏', '𝗐', '𝗑', '𝗒', '𝗓', '𝖠', '𝖡', '𝖢', '𝖣', '𝖤', '𝖥', '𝖦', '𝖧', '𝖨', '𝖩', '𝖪', '𝖫', '𝖬', '𝖭', '𝖮', '𝖯', '𝖰', '𝖱', '𝖲', '𝖳', '𝖴', '𝖵', '𝖶', '𝖷', '𝖸', '𝖹', '𝟢', '𝟣', '𝟤', '𝟥', '𝟦', '𝟧', '𝟨', '𝟩', '𝟪', '𝟫', '!', '?', '.', ',', '"', "'"],
        'sansBold' => ['𝗮', '𝗯', '𝗰', '𝗱', '𝗲', '𝗳', '𝗴', '𝗵', '𝗶', '𝗷', '𝗸', '𝗹', '𝗺', '𝗻', '𝗼', '𝗽', '𝗾', '𝗿', '𝘀', '𝘁', '𝘂', '𝘃', '𝘄', '𝘅', '𝘆', '𝘇', '𝗔', '𝗕', '𝗖', '𝗗', '𝗘', '𝗙', '𝗚', '𝗛', '𝗜', '𝗝', '𝗞', '𝗟', '𝗠', '𝗡', '𝗢', '𝗣', '𝗤', '𝗥', '𝗦', '𝗧', '𝗨', '𝗩', '𝗪', '𝗫', '𝗬', '𝗭', '𝟬', '𝟭', '𝟮', '𝟯', '𝟰', '𝟱', '𝟲', '𝟳', '𝟴', '𝟵', '❗', '❓', '.', ',', '"', "'"],
        'sansItalic' => ['𝘢', '𝘣', '𝘤', '𝘥', '𝘦', '𝘧', '𝘨', '𝘩', '𝘪', '𝘫', '𝘬', '𝘭', '𝘮', '𝘯', '𝘰', '𝘱', '𝘲', '𝘳', '𝘴', '𝘵', '𝘶', '𝘷', '𝘸', '𝘹', '𝘺', '𝘻', '𝘈', '𝘉', '𝘊', '𝘋', '𝘌', '𝘍', '𝘎', '𝘏', '𝘐', '𝘑', '𝘒', '𝘓', '𝘔', '𝘕', '𝘖', '𝘗', '𝘘', '𝘙', '𝘚', '𝘛', '𝘜', '𝘝', '𝘞', '𝘟', '𝘠', '𝘡', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'sansBoldItalic' => ['𝙖', '𝙗', '𝙘', '𝙙', '𝙚', '𝙛', '𝙜', '𝙝', '𝙞', '𝙟', '𝙠', '𝙡', '𝙢', '𝙣', '𝙤', '𝙥', '𝙦', '𝙧', '𝙨', '𝙩', '𝙪', '𝙫', '𝙬', '𝙭', '𝙮', '𝙯', '𝘼', '𝘽', '𝘾', '𝘿', '𝙀', '𝙁', '𝙂', '𝙃', '𝙄', '𝙅', '𝙆', '𝙇', '𝙈', '𝙉', '𝙊', '𝙋', '𝙌', '𝙍', '𝙎', '𝙏', '𝙐', '𝙑', '𝙒', '𝙓', '𝙔', '𝙕', '𝟎', '𝟏', '𝟐', '𝟑', '𝟒', '𝟓', '𝟔', '𝟕', '𝟖', '𝟗', '❗', '❓', '.', ',', '"', "'"],
        'script' => ['𝒶', '𝒷', '𝒸', '𝒹', 'ℯ', '𝒻', 'ℊ', '𝒽', '𝒾', '𝒿', '𝓀', '𝓁', '𝓂', '𝓃', 'ℴ', '𝓅', '𝓆', '𝓇', '𝓈', '𝓉', '𝓊', '𝓋', '𝓌', '𝓍', '𝓎', '𝓏', '𝒜', 'ℬ', '𝒞', '𝒟', 'ℰ', 'ℱ', '𝒢', 'ℋ', 'ℐ', '𝒥', '𝒦', 'ℒ', 'ℳ', '𝒩', '𝒪', '𝒫', '𝒬', 'ℛ', '𝒮', '𝒯', '𝒰', '𝒱', '𝒲', '𝒳', '𝒴', '𝒵', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'scriptBold' => ['𝓪', '𝓫', '𝓬', '𝓭', '𝓮', '𝓯', '𝓰', '𝓱', '𝓲', '𝓳', '𝓴', '𝓵', '𝓶', '𝓷', '𝓸', '𝓹', '𝓺', '𝓻', '𝓼', '𝓽', '𝓾', '𝓿', '𝔀', '𝔁', '𝔂', '𝔃', '𝓐', '𝓑', '𝓒', '𝓓', '𝓔', '𝓕', '𝓖', '𝓗', '𝓘', '𝓙', '𝓚', '𝓛', '𝓜', '𝓝', '𝓞', '𝓟', '𝓠', '𝓡', '𝓢', '𝓣', '𝓤', '𝓥', '𝓦', '𝓧', '𝓨', '𝓩', '𝟎', '𝟏', '𝟐', '𝟑', '𝟒', '𝟓', '𝟔', '𝟕', '𝟖', '𝟗', '❗', '❓', '.', ',', '"', "'"],
        'fraktur' => ['𝔞', '𝔟', '𝔠', '𝔡', '𝔢', '𝔣', '𝔤', '𝔥', '𝔦', '𝔧', '𝔨', '𝔩', '𝔪', '𝔫', '𝔬', '𝔭', '𝔮', '𝔯', '𝔰', '𝔱', '𝔲', '𝔳', '𝔴', '𝔵', '𝔶', '𝔷', '𝔄', '𝔅', 'ℭ', '𝔇', '𝔈', '𝔉', '𝔊', 'ℌ', 'ℑ', '𝔍', '𝔎', '𝔏', '𝔐', '𝔑', '𝔒', '𝔓', '𝔔', 'ℜ', '𝔖', '𝔗', '𝔘', '𝔙', '𝔚', '𝔛', '𝔜', 'ℨ', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'frakturBold' => ['𝖆', '𝖇', '𝖈', '𝖉', '𝖊', '𝖋', '𝖌', '𝖍', '𝖎', '𝖏', '𝖐', '𝖑', '𝖒', '𝖓', '𝖔', '𝖕', '𝖖', '𝖗', '𝖘', '𝖙', '𝖚', '𝖛', '𝖜', '𝖝', '𝖞', '𝖟', '𝕬', '𝕭', '𝕮', '𝕯', '𝕰', '𝕱', '𝕲', '𝕳', '𝕴', '𝕵', '𝕶', '𝕷', '𝕸', '𝕹', '𝕺', '𝕻', '𝕼', '𝕽', '𝕾', '𝕿', '𝖀', '𝖁', '𝖂', '𝖃', '𝖄', '𝖅', '𝟎', '𝟏', '𝟐', '𝟑', '𝟒', '𝟓', '𝟔', '𝟕', '𝟖', '𝟗', '❗', '❓', '.', ',', '"', "'"],
        'monospace' => ['𝚊', '𝚋', '𝚌', '𝚍', '𝚎', '𝚏', '𝚐', '𝚑', '𝚒', '𝚓', '𝚔', '𝚕', '𝚖', '𝚗', '𝚘', '𝚙', '𝚚', '𝚛', '𝚜', '𝚝', '𝚞', '𝚟', '𝚠', '𝚡', '𝚢', '𝚣', '𝙰', '𝙱', '𝙲', '𝙳', '𝙴', '𝙵', '𝙶', '𝙷', '𝙸', '𝙹', '𝙺', '𝙻', '𝙼', '𝙽', '𝙾', '𝙿', '𝚀', '𝚁', '𝚂', '𝚃', '𝚄', '𝚅', '𝚆', '𝚇', '𝚈', '𝚉', '𝟶', '𝟷', '𝟸', '𝟹', '𝟺', '𝟻', '𝟼', '𝟽', '𝟾', '𝟿', '！', '？', '．', '，', '"', '＇'],
        'fullwidth' => ['ａ', 'ｂ', 'ｃ', 'ｄ', 'ｅ', 'ｆ', 'ｇ', 'ｈ', 'ｉ', 'ｊ', 'ｋ', 'ｌ', 'ｍ', 'ｎ', 'ｏ', 'ｐ', 'ｑ', 'ｒ', 'ｓ', 'ｔ', 'ｕ', 'ｖ', 'ｗ', 'ｘ', 'ｙ', 'ｚ', 'Ａ', 'Ｂ', 'Ｃ', 'Ｄ', 'Ｅ', 'Ｆ', 'Ｇ', 'Ｈ', 'Ｉ', 'Ｊ', 'Ｋ', 'Ｌ', 'Ｍ', 'Ｎ', 'Ｏ', 'Ｐ', 'Ｑ', 'Ｒ', 'Ｓ', 'Ｔ', 'Ｕ', 'Ｖ', 'Ｗ', 'Ｘ', 'Ｙ', 'Ｚ', '０', '１', '２', '３', '４', '５', '６', '７', '８', '９', '！', '？', '．', '，', '"', '＇'],
        'doublestruck' => ['𝕒', '𝕓', '𝕔', '𝕕', '𝕖', '𝕗', '𝕘', '𝕙', '𝕚', '𝕛', '𝕜', '𝕝', '𝕞', '𝕟', '𝕠', '𝕡', '𝕢', '𝕣', '𝕤', '𝕥', '𝕦', '𝕧', '𝕨', '𝕩', '𝕪', '𝕫', '𝔸', '𝔹', 'ℂ', '𝔻', '𝔼', '𝔽', '𝔾', 'ℍ', '𝕀', '𝕁', '𝕂', '𝕃', '𝕄', 'ℕ', '𝕆', 'ℙ', 'ℚ', 'ℝ', '𝕊', '𝕋', '𝕌', '𝕍', '𝕎', '𝕏', '𝕐', 'ℤ', '𝟘', '𝟙', '𝟚', '𝟛', '𝟜', '𝟝', '𝟞', '𝟟', '𝟠', '𝟡', '❕', '❔', '.', ',', '"', "'"],
        'capitalized' => ['ᴀ', 'ʙ', 'ᴄ', 'ᴅ', 'ᴇ', 'ꜰ', 'ɢ', 'ʜ', 'ɪ', 'ᴊ', 'ᴋ', 'ʟ', 'ᴍ', 'ɴ', 'ᴏ', 'ᴘ', 'q', 'ʀ', 'ꜱ', 'ᴛ', 'ᴜ', 'ᴠ', 'ᴡ', 'x', 'ʏ', 'ᴢ', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '﹗', '﹖', '﹒', '﹐', '"', "'"],
        'circled' => ['ⓐ', 'ⓑ', 'ⓒ', 'ⓓ', 'ⓔ', 'ⓕ', 'ⓖ', 'ⓗ', 'ⓘ', 'ⓙ', 'ⓚ', 'ⓛ', 'ⓜ', 'ⓝ', 'ⓞ', 'ⓟ', 'ⓠ', 'ⓡ', 'ⓢ', 'ⓣ', 'ⓤ', 'ⓥ', 'ⓦ', 'ⓧ', 'ⓨ', 'ⓩ', 'Ⓐ', 'Ⓑ', 'Ⓒ', 'Ⓓ', 'Ⓔ', 'Ⓕ', 'Ⓖ', 'Ⓗ', 'Ⓘ', 'Ⓙ', 'Ⓚ', 'Ⓛ', 'Ⓜ', 'Ⓝ', 'Ⓞ', 'Ⓟ', 'Ⓠ', 'Ⓡ', 'Ⓢ', 'Ⓣ', 'Ⓤ', 'Ⓥ', 'Ⓦ', 'Ⓧ', 'Ⓨ', 'Ⓩ', '⓪', '①', '②', '③', '④', '⑤', '⑥', '⑦', '⑧', '⑨', '!', '?', '.', ',', '"', "'"],
        'parenthesized' => ['⒜', '⒝', '⒞', '⒟', '⒠', '⒡', '⒢', '⒣', '⒤', '⒥', '⒦', '⒧', '⒨', '⒩', '⒪', '⒫', '⒬', '⒭', '⒮', '⒯', '⒰', '⒱', '⒲', '⒳', '⒴', '⒵', '🄐', '🄑', '🄒', '🄓', '🄔', '🄕', '🄖', '🄗', '🄘', '🄙', '🄚', '🄛', '🄜', '🄝', '🄞', '🄟', '🄠', '🄡', '🄢', '🄣', '🄤', '🄥', '🄦', '🄧', '🄨', '🄩', '⓿', '⑴', '⑵', '⑶', '⑷', '⑸', '⑹', '⑺', '⑻', '⑼', '!', '?', '.', ',', '"', "'"],
        'underlinedSingle' => ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'underlinedDouble' => ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'strikethroughSingle' => ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'crosshatch' => ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
    ];

    foreach ($specialList as $list) {
        $text = str_replace($list, $target, $text);
    }

    return $text;
}
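As a possible variation on the function above, the per-style lists can be flattened once into a single lookup table and applied with `strtr()`, which walks the string in one pass instead of calling `str_replace()` once per style. This is a sketch with shortened lists; `buildStyleMap()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: flatten every style list into one [styled => plain] table.
function buildStyleMap(array $target, array $specialList): array
{
    $map = [];
    foreach ($specialList as $list) {
        foreach ($list as $i => $styled) {
            // Skip entries that are already plain; keep the first mapping seen.
            if ($styled !== $target[$i] && !isset($map[$styled])) {
                $map[$styled] = $target[$i];
            }
        }
    }

    return $map;
}

// Shortened lists, for illustration only.
$target = ['a', 'b', 'c', '!'];
$specialList = [
    'serifBold' => ['𝐚', '𝐛', '𝐜', '❗'],
    'sansBold'  => ['𝗮', '𝗯', '𝗰', '❗'],
];

// strtr() with an array never re-replaces text it has already substituted.
echo strtr('𝗮 𝐛𝗰❗', buildStyleMap($target, $specialList)), PHP_EOL;
```

Building the table once (for example in a static property) avoids repeating the flattening work for every tweet.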
Beseem answered 24/7, 2020 at 7:23 Comment(5)
I had a MySQL field with utf-8 encoding. Some special chars need mb4, and some of the mathematical chars here even need utf-16. I was not able to upgrade the column due to limitations, but this function was the saviour. – Casuist
I feel bad using it, but it worked :D Used it to have bold text in an input. – Franny
@Franny why do you feel bad when using it? – Beseem
I guess "the right way" would have been a div (optionally with contenteditable), but with this solution I didn't have to rebuild all my forms – Franny
Also, I appreciate the effort in your answer! +1 – Franny

If you don't mind using command line: Using Jacob's mapping and some Perl magic, you can convert between char sets (for serifBold):

$ echo "𝐇𝐞𝐥𝐥𝐨" | perl -Mopen=locale -Mutf8 -pe 'y/𝐚-𝐳/a-z/' | perl -Mopen=locale -Mutf8 -pe 'y/𝐀-𝐙/A-Z/'
Hello

Piping from and to xsel -b (pbcopy on Mac) you can convert any text currently on system clipboard.

Immethodical answered 27/3, 2022 at 20:35 Comment(2)
How about numbering? :-) – Clamatorial
@KenSharp I don't know which Unicode block you mean, but it would be the same: map that range to 0-9. – Immethodical
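For the digits, a PHP sketch of the same range-mapping idea: the styled digits occupy U+1D7CE to U+1D7FF as five consecutive runs of ten (bold, double-struck, sans-serif, sans-serif bold, monospace), so the offset from U+1D7CE modulo 10 gives the digit value. `foldMathDigits()` is a hypothetical name, and the mbstring extension is assumed for `mb_ord()`.

```php
<?php
// Hypothetical helper: fold styled digits U+1D7CE..U+1D7FF to ASCII 0-9.
function foldMathDigits(string $text): string
{
    $result = preg_replace_callback(
        '/[\x{1D7CE}-\x{1D7FF}]/u',
        static function (array $m): string {
            // Five styles of ten digits sit back to back in this range.
            return (string) ((mb_ord($m[0], 'UTF-8') - 0x1D7CE) % 10);
        },
        $text
    );

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8).
    return $result ?? $text;
}

echo foldMathDigits("\u{1D7CE}\u{1D7D9}\u{1D7EC}\u{1D7FF}"), PHP_EOL;
```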

Guided by @envy's answer, I was able to put together 2 functions to convert UTF8 special font characters both to and fro.

<?php

/* 2 global arrays used to convert a string to and from special UTF8 font characters. [BEGIN] */

$utf8_norm_fnt_char_arr = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"];

$utf8_spcl_fnt_type_arr = array 
    (
        'serifBold' => ['𝐚', '𝐛', '𝐜', '𝐝', '𝐞', '𝐟', '𝐠', '𝐡', '𝐢', '𝐣', '𝐤', '𝐥', '𝐦', '𝐧', '𝐨', '𝐩', '𝐪', '𝐫', '𝐬', '𝐭', '𝐮', '𝐯', '𝐰', '𝐱', '𝐲', '𝐳', '𝐀', '𝐁', '𝐂', '𝐃', '𝐄', '𝐅', '𝐆', '𝐇', '𝐈', '𝐉', '𝐊', '𝐋', '𝐌', '𝐍', '𝐎', '𝐏', '𝐐', '𝐑', '𝐒', '𝐓', '𝐔', '𝐕', '𝐖', '𝐗', '𝐘', '𝐙', '𝟎', '𝟏', '𝟐', '𝟑', '𝟒', '𝟓', '𝟔', '𝟕', '𝟖', '𝟗', '❗', '❓', '.', ',', '"', "'"],
        'serifItalic' => ['𝑎', '𝑏', '𝑐', '𝑑', '𝑒', '𝑓', '𝑔', 'ℎ', '𝑖', '𝑗', '𝑘', '𝑙', '𝑚', '𝑛', '𝑜', '𝑝', '𝑞', '𝑟', '𝑠', '𝑡', '𝑢', '𝑣', '𝑤', '𝑥', '𝑦', '𝑧', '𝐴', '𝐵', '𝐶', '𝐷', '𝐸', '𝐹', '𝐺', '𝐻', '𝐼', '𝐽', '𝐾', '𝐿', '𝑀', '𝑁', '𝑂', '𝑃', '𝑄', '𝑅', '𝑆', '𝑇', '𝑈', '𝑉', '𝑊', '𝑋', '𝑌', '𝑍', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'serifBoldItalic' => ['𝒂', '𝒃', '𝒄', '𝒅', '𝒆', '𝒇', '𝒈', '𝒉', '𝒊', '𝒋', '𝒌', '𝒍', '𝒎', '𝒏', '𝒐', '𝒑', '𝒒', '𝒓', '𝒔', '𝒕', '𝒖', '𝒗', '𝒘', '𝒙', '𝒚', '𝒛', '𝑨', '𝑩', '𝑪', '𝑫', '𝑬', '𝑭', '𝑮', '𝑯', '𝑰', '𝑱', '𝑲', '𝑳', '𝑴', '𝑵', '𝑶', '𝑷', '𝑸', '𝑹', '𝑺', '𝑻', '𝑼', '𝑽', '𝑾', '𝑿', '𝒀', '𝒁', '𝟎', '𝟏', '𝟐', '𝟑', '𝟒', '𝟓', '𝟔', '𝟕', '𝟖', '𝟗', '❗', '❓', '.', ',', '"', "'"],
        'sans' => ['𝖺', '𝖻', '𝖼', '𝖽', '𝖾', '𝖿', '𝗀', '𝗁', '𝗂', '𝗃', '𝗄', '𝗅', '𝗆', '𝗇', '𝗈', '𝗉', '𝗊', '𝗋', '𝗌', '𝗍', '𝗎', '𝗏', '𝗐', '𝗑', '𝗒', '𝗓', '𝖠', '𝖡', '𝖢', '𝖣', '𝖤', '𝖥', '𝖦', '𝖧', '𝖨', '𝖩', '𝖪', '𝖫', '𝖬', '𝖭', '𝖮', '𝖯', '𝖰', '𝖱', '𝖲', '𝖳', '𝖴', '𝖵', '𝖶', '𝖷', '𝖸', '𝖹', '𝟢', '𝟣', '𝟤', '𝟥', '𝟦', '𝟧', '𝟨', '𝟩', '𝟪', '𝟫', '!', '?', '.', ',', '"', "'"],
        'sansBold' => ['𝗮', '𝗯', '𝗰', '𝗱', '𝗲', '𝗳', '𝗴', '𝗵', '𝗶', '𝗷', '𝗸', '𝗹', '𝗺', '𝗻', '𝗼', '𝗽', '𝗾', '𝗿', '𝘀', '𝘁', '𝘂', '𝘃', '𝘄', '𝘅', '𝘆', '𝘇', '𝗔', '𝗕', '𝗖', '𝗗', '𝗘', '𝗙', '𝗚', '𝗛', '𝗜', '𝗝', '𝗞', '𝗟', '𝗠', '𝗡', '𝗢', '𝗣', '𝗤', '𝗥', '𝗦', '𝗧', '𝗨', '𝗩', '𝗪', '𝗫', '𝗬', '𝗭', '𝟬', '𝟭', '𝟮', '𝟯', '𝟰', '𝟱', '𝟲', '𝟳', '𝟴', '𝟵', '❗', '❓', '.', ',', '"', "'"],
        'sansItalic' => ['𝘢', '𝘣', '𝘤', '𝘥', '𝘦', '𝘧', '𝘨', '𝘩', '𝘪', '𝘫', '𝘬', '𝘭', '𝘮', '𝘯', '𝘰', '𝘱', '𝘲', '𝘳', '𝘴', '𝘵', '𝘶', '𝘷', '𝘸', '𝘹', '𝘺', '𝘻', '𝘈', '𝘉', '𝘊', '𝘋', '𝘌', '𝘍', '𝘎', '𝘏', '𝘐', '𝘑', '𝘒', '𝘓', '𝘔', '𝘕', '𝘖', '𝘗', '𝘘', '𝘙', '𝘚', '𝘛', '𝘜', '𝘝', '𝘞', '𝘟', '𝘠', '𝘡', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'sansBoldItalic' => ['𝙖', '𝙗', '𝙘', '𝙙', '𝙚', '𝙛', '𝙜', '𝙝', '𝙞', '𝙟', '𝙠', '𝙡', '𝙢', '𝙣', '𝙤', '𝙥', '𝙦', '𝙧', '𝙨', '𝙩', '𝙪', '𝙫', '𝙬', '𝙭', '𝙮', '𝙯', '𝘼', '𝘽', '𝘾', '𝘿', '𝙀', '𝙁', '𝙂', '𝙃', '𝙄', '𝙅', '𝙆', '𝙇', '𝙈', '𝙉', '𝙊', '𝙋', '𝙌', '𝙍', '𝙎', '𝙏', '𝙐', '𝙑', '𝙒', '𝙓', '𝙔', '𝙕', '𝟎', '𝟏', '𝟐', '𝟑', '𝟒', '𝟓', '𝟔', '𝟕', '𝟖', '𝟗', '❗', '❓', '.', ',', '"', "'"],
        'script' => ['𝒶', '𝒷', '𝒸', '𝒹', 'ℯ', '𝒻', 'ℊ', '𝒽', '𝒾', '𝒿', '𝓀', '𝓁', '𝓂', '𝓃', 'ℴ', '𝓅', '𝓆', '𝓇', '𝓈', '𝓉', '𝓊', '𝓋', '𝓌', '𝓍', '𝓎', '𝓏', '𝒜', 'ℬ', '𝒞', '𝒟', 'ℰ', 'ℱ', '𝒢', 'ℋ', 'ℐ', '𝒥', '𝒦', 'ℒ', 'ℳ', '𝒩', '𝒪', '𝒫', '𝒬', 'ℛ', '𝒮', '𝒯', '𝒰', '𝒱', '𝒲', '𝒳', '𝒴', '𝒵', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'scriptBold' => ['𝓪', '𝓫', '𝓬', '𝓭', '𝓮', '𝓯', '𝓰', '𝓱', '𝓲', '𝓳', '𝓴', '𝓵', '𝓶', '𝓷', '𝓸', '𝓹', '𝓺', '𝓻', '𝓼', '𝓽', '𝓾', '𝓿', '𝔀', '𝔁', '𝔂', '𝔃', '𝓐', '𝓑', '𝓒', '𝓓', '𝓔', '𝓕', '𝓖', '𝓗', '𝓘', '𝓙', '𝓚', '𝓛', '𝓜', '𝓝', '𝓞', '𝓟', '𝓠', '𝓡', '𝓢', '𝓣', '𝓤', '𝓥', '𝓦', '𝓧', '𝓨', '𝓩', '𝟎', '𝟏', '𝟐', '𝟑', '𝟒', '𝟓', '𝟔', '𝟕', '𝟖', '𝟗', '❗', '❓', '.', ',', '"', "'"],
        'fraktur' => ['𝔞', '𝔟', '𝔠', '𝔡', '𝔢', '𝔣', '𝔤', '𝔥', '𝔦', '𝔧', '𝔨', '𝔩', '𝔪', '𝔫', '𝔬', '𝔭', '𝔮', '𝔯', '𝔰', '𝔱', '𝔲', '𝔳', '𝔴', '𝔵', '𝔶', '𝔷', '𝔄', '𝔅', 'ℭ', '𝔇', '𝔈', '𝔉', '𝔊', 'ℌ', 'ℑ', '𝔍', '𝔎', '𝔏', '𝔐', '𝔑', '𝔒', '𝔓', '𝔔', 'ℜ', '𝔖', '𝔗', '𝔘', '𝔙', '𝔚', '𝔛', '𝔜', 'ℨ', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'frakturBold' => ['𝖆', '𝖇', '𝖈', '𝖉', '𝖊', '𝖋', '𝖌', '𝖍', '𝖎', '𝖏', '𝖐', '𝖑', '𝖒', '𝖓', '𝖔', '𝖕', '𝖖', '𝖗', '𝖘', '𝖙', '𝖚', '𝖛', '𝖜', '𝖝', '𝖞', '𝖟', '𝕬', '𝕭', '𝕮', '𝕯', '𝕰', '𝕱', '𝕲', '𝕳', '𝕴', '𝕵', '𝕶', '𝕷', '𝕸', '𝕹', '𝕺', '𝕻', '𝕼', '𝕽', '𝕾', '𝕿', '𝖀', '𝖁', '𝖂', '𝖃', '𝖄', '𝖅', '𝟎', '𝟏', '𝟐', '𝟑', '𝟒', '𝟓', '𝟔', '𝟕', '𝟖', '𝟗', '❗', '❓', '.', ',', '"', "'"],
        'monospace' => ['𝚊', '𝚋', '𝚌', '𝚍', '𝚎', '𝚏', '𝚐', '𝚑', '𝚒', '𝚓', '𝚔', '𝚕', '𝚖', '𝚗', '𝚘', '𝚙', '𝚚', '𝚛', '𝚜', '𝚝', '𝚞', '𝚟', '𝚠', '𝚡', '𝚢', '𝚣', '𝙰', '𝙱', '𝙲', '𝙳', '𝙴', '𝙵', '𝙶', '𝙷', '𝙸', '𝙹', '𝙺', '𝙻', '𝙼', '𝙽', '𝙾', '𝙿', '𝚀', '𝚁', '𝚂', '𝚃', '𝚄', '𝚅', '𝚆', '𝚇', '𝚈', '𝚉', '𝟶', '𝟷', '𝟸', '𝟹', '𝟺', '𝟻', '𝟼', '𝟽', '𝟾', '𝟿', '！', '？', '．', '，', '"', '＇'],
        'fullwidth' => ['ａ', 'ｂ', 'ｃ', 'ｄ', 'ｅ', 'ｆ', 'ｇ', 'ｈ', 'ｉ', 'ｊ', 'ｋ', 'ｌ', 'ｍ', 'ｎ', 'ｏ', 'ｐ', 'ｑ', 'ｒ', 'ｓ', 'ｔ', 'ｕ', 'ｖ', 'ｗ', 'ｘ', 'ｙ', 'ｚ', 'Ａ', 'Ｂ', 'Ｃ', 'Ｄ', 'Ｅ', 'Ｆ', 'Ｇ', 'Ｈ', 'Ｉ', 'Ｊ', 'Ｋ', 'Ｌ', 'Ｍ', 'Ｎ', 'Ｏ', 'Ｐ', 'Ｑ', 'Ｒ', 'Ｓ', 'Ｔ', 'Ｕ', 'Ｖ', 'Ｗ', 'Ｘ', 'Ｙ', 'Ｚ', '０', '１', '２', '３', '４', '５', '６', '７', '８', '９', '！', '？', '．', '，', '"', '＇'],
        'doublestruck' => ['𝕒', '𝕓', '𝕔', '𝕕', '𝕖', '𝕗', '𝕘', '𝕙', '𝕚', '𝕛', '𝕜', '𝕝', '𝕞', '𝕟', '𝕠', '𝕡', '𝕢', '𝕣', '𝕤', '𝕥', '𝕦', '𝕧', '𝕨', '𝕩', '𝕪', '𝕫', '𝔸', '𝔹', 'ℂ', '𝔻', '𝔼', '𝔽', '𝔾', 'ℍ', '𝕀', '𝕁', '𝕂', '𝕃', '𝕄', 'ℕ', '𝕆', 'ℙ', 'ℚ', 'ℝ', '𝕊', '𝕋', '𝕌', '𝕍', '𝕎', '𝕏', '𝕐', 'ℤ', '𝟘', '𝟙', '𝟚', '𝟛', '𝟜', '𝟝', '𝟞', '𝟟', '𝟠', '𝟡', '❕', '❔', '.', ',', '"', "'"],
        'capitalized' => ['ᴀ', 'ʙ', 'ᴄ', 'ᴅ', 'ᴇ', 'ꜰ', 'ɢ', 'ʜ', 'ɪ', 'ᴊ', 'ᴋ', 'ʟ', 'ᴍ', 'ɴ', 'ᴏ', 'ᴘ', 'q', 'ʀ', 'ꜱ', 'ᴛ', 'ᴜ', 'ᴠ', 'ᴡ', 'x', 'ʏ', 'ᴢ', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '﹗', '﹖', '﹒', '﹐', '"', "'"],
        'circled' => ['ⓐ', 'ⓑ', 'ⓒ', 'ⓓ', 'ⓔ', 'ⓕ', 'ⓖ', 'ⓗ', 'ⓘ', 'ⓙ', 'ⓚ', 'ⓛ', 'ⓜ', 'ⓝ', 'ⓞ', 'ⓟ', 'ⓠ', 'ⓡ', 'ⓢ', 'ⓣ', 'ⓤ', 'ⓥ', 'ⓦ', 'ⓧ', 'ⓨ', 'ⓩ', 'Ⓐ', 'Ⓑ', 'Ⓒ', 'Ⓓ', 'Ⓔ', 'Ⓕ', 'Ⓖ', 'Ⓗ', 'Ⓘ', 'Ⓙ', 'Ⓚ', 'Ⓛ', 'Ⓜ', 'Ⓝ', 'Ⓞ', 'Ⓟ', 'Ⓠ', 'Ⓡ', 'Ⓢ', 'Ⓣ', 'Ⓤ', 'Ⓥ', 'Ⓦ', 'Ⓧ', 'Ⓨ', 'Ⓩ', '⓪', '①', '②', '③', '④', '⑤', '⑥', '⑦', '⑧', '⑨', '!', '?', '.', ',', '"', "'"],
        'parenthesized' => ['⒜', '⒝', '⒞', '⒟', '⒠', '⒡', '⒢', '⒣', '⒤', '⒥', '⒦', '⒧', '⒨', '⒩', '⒪', '⒫', '⒬', '⒭', '⒮', '⒯', '⒰', '⒱', '⒲', '⒳', '⒴', '⒵', '🄐', '🄑', '🄒', '🄓', '🄔', '🄕', '🄖', '🄗', '🄘', '🄙', '🄚', '🄛', '🄜', '🄝', '🄞', '🄟', '🄠', '🄡', '🄢', '🄣', '🄤', '🄥', '🄦', '🄧', '🄨', '🄩', '⓿', '⑴', '⑵', '⑶', '⑷', '⑸', '⑹', '⑺', '⑻', '⑼', '!', '?', '.', ',', '"', "'"],
        'underlinedSingle' => ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'underlinedDouble' => ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'strikethroughSingle' => ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"],
        'crosshatch' => ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '?', '.', ',', '"', "'"]
    );

/* 2 global arrays used to convert a string to and from special UTF8 font characters. [END] */

/* A global function to convert a string into special UTF8 font characters. [BEGIN] */

/* Valid Font Types ($utf8_spcl_fnt_type): 

'serifBold', 'serifItalic', 'serifBoldItalic', 'sans', 'sansBold', 'sansItalic', 'sansBoldItalic', 'script', 'scriptBold', 
'fraktur', 'frakturBold', 'monospace', 'fullwidth', 'doublestruck', 'capitalized', 'circled', 'parenthesized', 
'underlinedSingle', 'underlinedDouble', 'strikethroughSingle', 'crosshatch' 
*/

if (!function_exists('cvtIntoSpclUTF8FntChars'))
    {
        function cvtIntoSpclUTF8FntChars($utf8_spcl_fnt_type, $str)
            {
                global $utf8_norm_fnt_char_arr, $utf8_spcl_fnt_type_arr;

                if (isset($utf8_spcl_fnt_type) && is_array($utf8_spcl_fnt_type_arr) && array_key_exists($utf8_spcl_fnt_type, $utf8_spcl_fnt_type_arr))
                    {
                        $str = str_replace($utf8_norm_fnt_char_arr, $utf8_spcl_fnt_type_arr[$utf8_spcl_fnt_type], $str);
                    };

                return $str;
            };
    };

/* A global function to convert a string into special UTF8 font characters. [END] */

/* A global function to revert a string from special UTF8 font characters. [BEGIN] */

if (!function_exists('rvtFromSpclUTF8FntChars'))
    {
        function rvtFromSpclUTF8FntChars($str)
            {
                global $utf8_norm_fnt_char_arr, $utf8_spcl_fnt_type_arr;

                foreach ($utf8_spcl_fnt_type_arr as $utf8_spcl_fnt_char_arr)
                    {
                        $str = str_replace($utf8_spcl_fnt_char_arr, $utf8_norm_fnt_char_arr, $str);
                    };

                return $str;
            };
    };

/* A global function to revert a string from special UTF8 font characters. [END] */

?>

Usage:

echo cvtIntoSpclUTF8FntChars('script', 'Hello, World!');

becomes...

ℋℯ𝓁𝓁ℴ, 𝒲ℴ𝓇𝓁𝒹!

and...

echo rvtFromSpclUTF8FntChars('ℋℯ𝓁𝓁ℴ, 𝒲ℴ𝓇𝓁𝒹!');

becomes...

Hello, World!

I hope this helps someone. Enjoy! 😉

Boresome answered 26/4, 2024 at 17:5 Comment(0)
