How can I write a Unicode symbol in Lua? For example, I need to write the symbol with code point 9658.
When I write
string.char( 9658 );
I get an error. So how can I write such a symbol?
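Lua does not look inside strings: a string is just a sequence of bytes, so it can hold UTF-8 text directly. (string.char(9658) fails because string.char only accepts byte values from 0 to 255.) So, as long as your source file is saved as UTF-8, you can just write
mychar = "►"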
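A minimal illustration, assuming the script file is saved as UTF-8: Lua keeps the three UTF-8 bytes of U+25BA unchanged, and the same string can also be spelled with decimal byte escapes, which do not depend on the source encoding and work in every Lua version.

```lua
-- Assumes this source file is saved as UTF-8.
local mychar = "►"

-- Lua stores the literal's bytes unchanged: U+25BA (9658) is E2 96 BA in UTF-8.
print(#mychar)            -- 3 (bytes, not characters)
print(mychar:byte(1, -1)) -- 226, 150 and 186, tab-separated

-- The same string built from decimal byte escapes; this spelling does not
-- depend on how the source file is encoded and works in any Lua version.
local escaped = "\226\150\186"
print(mychar == escaped)  -- true
```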
(added in 2015)
Lua 5.3 introduced support for UTF-8 escape sequences:
The UTF-8 encoding of a Unicode character can be inserted in a literal string with the escape sequence \u{XXX} (note the mandatory enclosing brackets), where XXX is a sequence of one or more hexadecimal digits representing the character code point.
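For 9658 (hexadecimal 25BA) that is "\u{25BA}". You can also use utf8.char(9658), from the utf8 library that was also added in Lua 5.3.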
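A short sketch of both Lua 5.3 options side by side, assuming Lua 5.3 or later (older versions reject the \u{...} escape at parse time and have no built-in utf8 table):

```lua
-- Requires Lua 5.3 or later (\u{...} escapes and the standard utf8 library).
-- 9658 in hexadecimal is 0x25BA.
local a = "\u{25BA}"          -- escape sequence inside a string literal
local b = utf8.char(9658)     -- string built at run time from the code point

print(a, b, a == b)           -- ►, ► and true, tab-separated
print(#a)                     -- 3 (UTF-8 bytes)
print(utf8.codepoint(a))      -- 9658 (back to the code point)

-- utf8.char accepts several code points and concatenates their encodings.
print(utf8.char(72, 105, 32, 9658))  -- Hi ►
```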
Here is an encoder for Lua that takes a Unicode code point and produces a UTF-8 string for the corresponding character:
do
local bytemarkers = { {0x7FF,192}, {0xFFFF,224}, {0x1FFFFF,240} }
-- Caution: in Lua 5.3 and later this global replaces the standard utf8 library table.
function utf8(decimal)
if decimal<128 then return string.char(decimal) end
local charbytes = {}
for bytes,vals in ipairs(bytemarkers) do
if decimal<=vals[1] then
for b=bytes+1,2,-1 do
local mod = decimal%64
decimal = (decimal-mod)/64
charbytes[b] = string.char(128+mod)
end
charbytes[1] = string.char(vals[2]+decimal)
break
end
end
return table.concat(charbytes)
end
end
c=utf8(0x24) print(c.." is "..#c.." bytes.") --> $ is 1 bytes.
c=utf8(0xA2) print(c.." is "..#c.." bytes.") --> ¢ is 2 bytes.
c=utf8(0x20AC) print(c.." is "..#c.." bytes.") --> € is 3 bytes.
c=utf8(0x24B62) print(c.." is "..#c.." bytes.") --> 𤭢 is 4 bytes.
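One way to sanity-check this encoder on Lua 5.3 or later is to compare it with the built-in utf8.char at every byte-length boundary. The sketch below copies the encoder under the hypothetical local name encode_utf8 so that the standard utf8 table is not shadowed; the list of sample code points is an arbitrary choice.

```lua
-- Copy of the encoder above under a hypothetical local name, so that the
-- standard utf8 table (Lua 5.3+) stays available for comparison.
local bytemarkers = { {0x7FF, 192}, {0xFFFF, 224}, {0x1FFFFF, 240} }

local function encode_utf8(decimal)
  if decimal < 128 then return string.char(decimal) end
  local charbytes = {}
  for bytes, vals in ipairs(bytemarkers) do
    if decimal <= vals[1] then
      -- Fill continuation bytes from the last one backwards, 6 bits each.
      for b = bytes + 1, 2, -1 do
        local mod = decimal % 64
        decimal = (decimal - mod) / 64
        charbytes[b] = string.char(128 + mod)
      end
      -- The remaining high bits go into the lead byte.
      charbytes[1] = string.char(vals[2] + decimal)
      break
    end
  end
  return table.concat(charbytes)
end

-- Boundaries between 1-, 2-, 3- and 4-byte encodings, plus the question's 9658.
local samples = { 0, 0x24, 0x7F, 0x80, 0xA2, 0x7FF, 0x800, 9658, 0x20AC,
                  0xFFFF, 0x10000, 0x24B62, 0x10FFFF }
local mismatches = 0
for _, cp in ipairs(samples) do
  if encode_utf8(cp) ~= utf8.char(cp) then
    mismatches = mismatches + 1
    print(string.format("mismatch at U+%04X", cp))
  end
end
print(mismatches == 0 and "all samples match utf8.char" or "mismatches found")
```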
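Maybe this can help you. This function goes the other way: it decodes the UTF-8 sequence that starts at byte position pos and returns the code point, the position of the last byte read, and the sequence length. It reads bytes through editor.CharAt, which appears to be the SciTE editor's Lua API (it returns signed byte values, hence the + 256 correction):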
function FromUTF8(pos)
local mod = math.fmod or math.mod -- math.mod is gone in newer Lua versions; math.fmod replaces it
local function charat(p)
local v = editor.CharAt[p]; if v < 0 then v = v + 256 end; return v
end
local v, c, n = 0, charat(pos), 1
if c < 128 then v = c
elseif c < 192 then
error("Byte values between 0x80 and 0xBF cannot start a multibyte sequence")
elseif c < 224 then v = mod(c, 32); n = 2
elseif c < 240 then v = mod(c, 16); n = 3
elseif c < 248 then v = mod(c, 8); n = 4
elseif c < 252 then v = mod(c, 4); n = 5
elseif c < 254 then v = mod(c, 2); n = 6
else
error("Byte values between 0xFE and 0xFF cannot start a multibyte sequence")
end
for i = 2, n do
pos = pos + 1; c = charat(pos)
if c < 128 or c > 191 then
error("Following bytes must have values between 0x80 and 0xBF")
end
v = v * 64 + mod(c, 64)
end
return v, pos, n
end
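If you are not inside SciTE, the same decoding logic can read from an ordinary Lua string. The sketch below is a hypothetical variant (decode_utf8_at is an illustrative name, not from the answer above): string.byte already returns unsigned values, so no + 256 fix is needed, and unlike the original it only accepts sequences of up to 4 bytes, the maximum that RFC 3629 allows.

```lua
-- Hypothetical variant of FromUTF8 that reads from a plain Lua string.
-- Returns the code point, the position of the last byte read, and the length.
local function decode_utf8_at(s, pos)
  local c = s:byte(pos)
  if c == nil then error("position out of range") end
  local v, n
  if c < 0x80 then v, n = c, 1                 -- ASCII, single byte
  elseif c < 0xC0 then error("byte 0x80..0xBF cannot start a sequence")
  elseif c < 0xE0 then v, n = c % 0x20, 2      -- 110xxxxx: keep 5 bits
  elseif c < 0xF0 then v, n = c % 0x10, 3      -- 1110xxxx: keep 4 bits
  elseif c < 0xF8 then v, n = c % 0x08, 4      -- 11110xxx: keep 3 bits
  else error("byte 0xF8..0xFF cannot start a sequence") end
  for _ = 2, n do
    pos = pos + 1
    local cc = s:byte(pos)
    if cc == nil or cc < 0x80 or cc > 0xBF then
      error("continuation bytes must be in 0x80..0xBF")
    end
    v = v * 64 + cc % 64                       -- append 6 payload bits
  end
  return v, pos, n
end

-- Walk a string one character at a time: "A", U+25BA, U+24B62.
local s = "A\226\150\186\240\164\173\162"
local i = 1
while i <= #s do
  local cp, last = decode_utf8_at(s, i)
  print(string.format("U+%04X", cp))           -- U+0041, U+25BA, U+24B62
  i = last + 1
end
```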
To get broader support for Unicode string content, one approach is slnunicode, which was developed as part of the Selene database library. It gives you a module that supports most of what the standard string library does, but with Unicode characters and UTF-8 encoding.
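If you are on Lua 5.3 or later and only need a few basics, the standard utf8 library (not slnunicode) already covers counting and iterating over code points. A minimal sketch, assuming Lua 5.3+:

```lua
-- Requires Lua 5.3 or later; this uses the standard utf8 library, not slnunicode.
local s = "A\u{25BA}\u{20AC}"   -- "A►€": 1 + 3 + 3 = 7 bytes

print(#s)            -- 7: the # operator counts bytes
print(utf8.len(s))   -- 3: number of code points

-- Iterate over code points together with their starting byte positions.
for pos, cp in utf8.codes(s) do
  print(pos, cp)     -- 1 65, then 2 9658, then 5 8364
end

-- Byte position where the 3rd character starts (useful with string.sub).
local start = utf8.offset(s, 3)
print(start, s:sub(start))    -- 5 and €

-- On invalid UTF-8, utf8.len returns nil plus the position of the first bad byte.
print(utf8.len("A\255"))      -- nil and 2
```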
Lua code:
function utf8Char (decimal)
if decimal < 128 then
return string.char(decimal)
elseif decimal < 2048 then
local byte2 = (128 + (decimal % 64))
local byte1 = (192 + math.floor(decimal / 64))
return string.char(byte1, byte2)
elseif decimal < 65536 then
local byte3 = (128 + (decimal % 64))
decimal = math.floor(decimal / 64)
local byte2 = (128 + (decimal % 64))
local byte1 = (224 + math.floor(decimal / 64))
return string.char(byte1, byte2, byte3)
elseif decimal < 1114112 then
local byte4 = (128 + (decimal % 64))
decimal = math.floor(decimal / 64)
local byte3 = (128 + (decimal % 64))
decimal = math.floor(decimal / 64)
local byte2 = (128 + (decimal % 64))
local byte1 = (240 + math.floor(decimal / 64))
return string.char(byte1, byte2, byte3, byte4)
else
return nil -- Invalid Unicode code point
end
end
Examples:
print ('65', utf8Char (65))
print ('255', utf8Char (255))
print ('256', utf8Char (256))
print ('512', utf8Char (512))
print ('1060', utf8Char (1060))
print ('2768', utf8Char (2768))
print ('12040', utf8Char (12040))
print ('64256', utf8Char (64256))
print ('66360', utf8Char (66360))
Output:
65 A
255 ÿ
256 Ā
512 Ȁ
1060 Ф
2768 ૐ
12040 ⼈
64256 ﬀ
66360 𐌸
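Just require the utf8 library (built into Lua 5.3 and later, where require simply returns the standard utf8 table):
utf8 = require ('utf8')
local unpack = table.unpack or unpack -- table.unpack in Lua 5.2+, global unpack in Lua 5.1
local numbers = {65, 126, 161, 255, 256, 512, 1060, 2768, 12040, 64256, 66360}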
print ('symbols: '..'"'..utf8.char (unpack(numbers))..'"')
for _, num in ipairs (numbers) do
local char = utf8.char (num)
print (num, char, utf8.codepoint (char))
end
Result:
symbols: "A~¡ÿĀȀФૐ⼈ﬀ𐌸"
65 A 65
126 ~ 126
161 ¡ 161
255 ÿ 255
256 Ā 256
512 Ȁ 512
1060 Ф 1060
2768 ૐ 2768
12040 ⼈ 12040
64256 ﬀ 64256
66360 𐌸 66360