How to write a Unicode symbol in Lua

How can I write a Unicode symbol in Lua? For example, I need to write the symbol with code 9658 (U+25BA, ►). When I write

string.char(9658)

I get an error. So how is it possible to write such a symbol?

Longlegged asked 2/11, 2011 at 16:10
It would help to know what encoding you want the resulting string in. – Lindsylindy

Lua treats strings as plain byte sequences and does not interpret their contents, so you can write the character directly:

mychar = "►"

(added in 2015)

Lua 5.3 introduced support for UTF-8 escape sequences:

The UTF-8 encoding of a Unicode character can be inserted in a literal string with the escape sequence \u{XXX} (note the mandatory enclosing brackets), where XXX is a sequence of one or more hexadecimal digits representing the character code point.

You can also use utf8.char(9658) (the utf8 library was also introduced in Lua 5.3).
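Both Lua 5.3 options side by side, as a minimal sketch for the symbol from the question: the escape takes the code point in hexadecimal (0x25BA), while utf8.char takes it as a number (9658 decimal). The result is the same three-byte UTF-8 string.

```lua
-- Requires Lua 5.3 or later (the \u{...} escape and the utf8 library).
local a = "\u{25BA}"        -- code point written in hexadecimal inside the literal
local b = utf8.char(9658)   -- the same code point given as a decimal number
print(a, b, a == b, #a)     -- prints the symbol twice, then true and 3 (# counts bytes)
```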

Dandridge answered 2/11, 2011 at 16:18
Note that this would only work if the file itself is UTF-8 encoded. Of course, you can't shove a Lua script at the interpreter unless it's ASCII or UTF-8. – Winebaum
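One way to avoid depending on the source file's encoding, and to support Lua versions before 5.3 that lack \u{...}, is to spell out the UTF-8 bytes with decimal escapes, which every Lua version since 5.1 accepts. A small sketch for U+25BA, whose UTF-8 encoding is the byte sequence 0xE2 0x96 0xBA (226 150 186 in decimal):

```lua
-- Works in Lua 5.1 and later: decimal byte escapes spell out the UTF-8
-- encoding of U+25BA, so the source file itself stays pure ASCII.
local mychar = "\226\150\186"
print(mychar, #mychar)   -- the symbol, then its length in bytes (3)
```

The trade-off is readability: the escapes are opaque, so a comment naming the code point helps.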

Here is an encoder for Lua that takes a Unicode code point and produces a UTF-8 string for the corresponding character:

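do
  -- Upper bound of each multi-byte range and the lead-byte marker for it:
  -- up to 0x7FF -> 2 bytes (110xxxxx), up to 0xFFFF -> 3 bytes (1110xxxx),
  -- up to 0x1FFFFF -> 4 bytes (11110xxx). Larger values yield an empty string.
  local bytemarkers = { {0x7FF,192}, {0xFFFF,224}, {0x1FFFFF,240} }
  -- Note: on Lua 5.3+ this global function replaces the built-in utf8 library.
  function utf8(decimal)
    if decimal<128 then return string.char(decimal) end  -- ASCII: one byte
    local charbytes = {}
    for bytes,vals in ipairs(bytemarkers) do
      if decimal<=vals[1] then
        -- Fill the continuation bytes (10xxxxxx), last to second, 6 bits each.
        for b=bytes+1,2,-1 do
          local mod = decimal%64
          decimal = (decimal-mod)/64
          charbytes[b] = string.char(128+mod)
        end
        -- The remaining high bits go into the lead byte.
        charbytes[1] = string.char(vals[2]+decimal)
        break
      end
    end
    return table.concat(charbytes)
  end
end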

c=utf8(0x24)    print(c.." is "..#c.." bytes.") --> $ is 1 bytes.
c=utf8(0xA2)    print(c.." is "..#c.." bytes.") --> ¢ is 2 bytes.
c=utf8(0x20AC)  print(c.." is "..#c.." bytes.") --> € is 3 bytes.  
c=utf8(0x24B62) print(c.." is "..#c.." bytes.") --> 𤭢 is 4 bytes.   
Misadventure answered 27/9, 2014 at 3:33
Endianness dependent, right? – Sac
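On the endianness question: UTF-8 is defined as a sequence of single bytes with the lead byte first, so an encoder's output is the same on every machine. Byte order only matters for encodings with multi-byte code units, such as UTF-16 and UTF-32. Printing the bytes makes the fixed order visible; hexbytes below is a hypothetical helper written for this illustration.

```lua
-- Return the bytes of s as space-separated hex pairs, in string order.
local function hexbytes(s)
  local out = {}
  for i = 1, #s do
    out[#out + 1] = string.format("%02X", s:byte(i))
  end
  return table.concat(out, " ")
end

print(hexbytes("\226\130\172"))   -- E2 82 AC (U+20AC, the euro sign)
print(hexbytes("\226\150\186"))   -- E2 96 BA (U+25BA, the symbol from the question)
```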

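Maybe this can help you. It decodes the UTF-8 sequence starting at a given position in the SciTE editor buffer into a code point: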

function FromUTF8(pos)
  -- Returns the code point, the position of the sequence's last byte, and its length.
  -- editor.CharAt is SciTE's editor API; it can return a signed (negative) byte,
  -- so values are mapped back to 0..255. The % operator replaces math.mod,
  -- which no longer exists in Lua 5.2+.
  local function charat(p)
    local v = editor.CharAt[p]; if v < 0 then v = v + 256 end; return v
  end
  local v, c, n = 0, charat(pos), 1
  if c < 128 then v = c
  elseif c < 192 then
    error("Byte values between 0x80 and 0xBF cannot start a multibyte sequence")
  elseif c < 224 then v = c % 32; n = 2
  elseif c < 240 then v = c % 16; n = 3
  elseif c < 248 then v = c %  8; n = 4
  elseif c < 252 then v = c %  4; n = 5
  elseif c < 254 then v = c %  2; n = 6
  else
    error("Byte values between 0xFE and 0xFF cannot start a multibyte sequence")
  end
  for i = 2, n do
    pos = pos + 1; c = charat(pos)
    if c < 128 or c > 191 then
      error("Following bytes must have values between 0x80 and 0xBF")
    end
    v = v * 64 + c % 64
  end
  return v, pos, n
end
Returnable answered 2/11, 2011 at 16:13
I'm pretty sure that function is the opposite of what he wants. He has a Unicode codepoint that he wants to encode in UTF-8. – Winebaum
Opposite can go a long way, too! :) – Feldt
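Outside SciTE, the same decoding approach can be applied to an ordinary Lua string. This is a sketch adapted from the answer above, not the answer's own code: the name decodeUtf8 is hypothetical, bytes are read with string.byte instead of editor.CharAt, and only the 1- to 4-byte forms allowed by current UTF-8 (RFC 3629) are accepted. Decoding the bytes of ► gives back 9658, the number from the question.

```lua
-- Decode the UTF-8 sequence in s starting at byte index pos.
-- Returns the code point, the index of the sequence's last byte, and its length.
-- Simplified: overlong forms and code points above U+10FFFF are not rejected.
local function decodeUtf8(s, pos)
  local c = s:byte(pos)
  local v, n
  if c < 0x80 then v, n = c, 1                 -- 0xxxxxxx: ASCII
  elseif c < 0xC0 then
    error("a continuation byte (0x80-0xBF) cannot start a sequence")
  elseif c < 0xE0 then v, n = c % 0x20, 2      -- 110xxxxx
  elseif c < 0xF0 then v, n = c % 0x10, 3      -- 1110xxxx
  elseif c < 0xF8 then v, n = c % 0x08, 4      -- 11110xxx
  else error("invalid lead byte") end
  for i = 1, n - 1 do
    local b = s:byte(pos + i)
    if b == nil or b < 0x80 or b > 0xBF then
      error("expected a continuation byte (0x80-0xBF)")
    end
    v = v * 64 + b % 64                        -- append 6 more bits
  end
  return v, pos + n - 1, n
end

print(decodeUtf8("\226\150\186", 1))           -- 9658  3  3
```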

To get broader support for Unicode string content, one option is slnunicode, which was developed as part of the Selene database library. It provides a module that supports most of what the standard string library does, but operates on Unicode characters in UTF-8 encoding.
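If Lua 5.3 or later is available, the built-in utf8 library (a different and much smaller module than slnunicode) already covers the most common needs: counting characters rather than bytes, iterating over code points, and finding where the n-th character starts. A minimal sketch using only standard functions:

```lua
-- Requires Lua 5.3+. The string holds "A", the euro sign (U+20AC) and U+25BA.
local s = "A\u{20AC}\u{25BA}"

print(#s, utf8.len(s))            -- 7 bytes, 3 characters

-- utf8.codes yields the byte position and code point of each character.
for pos, cp in utf8.codes(s) do
  print(pos, cp)                  -- 1 65, then 2 8364, then 5 9658
end

-- utf8.offset returns the byte index where the 3rd character starts (5 here).
print(s:sub(utf8.offset(s, 3)))   -- the last character on its own
```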

Consist answered 3/11, 2011 at 5:26

Lua code:

-- Encode a Unicode code point as a UTF-8 string; returns nil if it is out of range.
function utf8Char (decimal)
    if decimal < 128 then              -- 1 byte:  0xxxxxxx
        return string.char(decimal)
    elseif decimal < 2048 then         -- 2 bytes: 110xxxxx 10xxxxxx
        local byte2 = (128 + (decimal % 64))
        local byte1 = (192 + math.floor(decimal / 64))
        return string.char(byte1, byte2)
    elseif decimal < 65536 then        -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        local byte3 = (128 + (decimal % 64))
        decimal = math.floor(decimal / 64)
        local byte2 = (128 + (decimal % 64))
        local byte1 = (224 + math.floor(decimal / 64))
        return string.char(byte1, byte2, byte3)
    elseif decimal < 1114112 then      -- 4 bytes: 11110xxx then three 10xxxxxx
        local byte4 = (128 + (decimal % 64))
        decimal = math.floor(decimal / 64)
        local byte3 = (128 + (decimal % 64))
        decimal = math.floor(decimal / 64)
        local byte2 = (128 + (decimal % 64))
        local byte1 = (240 + math.floor(decimal / 64))
        return string.char(byte1, byte2, byte3, byte4)
    else
        return nil  -- Invalid Unicode code point (above U+10FFFF)
    end
end

Examples:

print ('65', utf8Char (65))
print ('255', utf8Char (255))
print ('256', utf8Char (256))
print ('512', utf8Char (512))
print ('1060', utf8Char (1060))
print ('2768', utf8Char (2768))
print ('12040', utf8Char (12040))
print ('64256', utf8Char (64256))
print ('66360', utf8Char (66360))

Output:

65  A
255 ÿ
256 Ā
512 Ȁ
1060    Ф
2768    ૐ
12040   ⼈
64256   ﬀ
66360   𐌸
Berezina answered 25/4, 2024 at 13:48

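Just use the utf8 library (part of the standard library since Lua 5.3, where require simply returns the preloaded module):

utf8 = require ('utf8')
local unpack = table.unpack or unpack  -- table.unpack on Lua 5.2+, global unpack on 5.1

local numbers = {65, 126, 161, 255, 256, 512, 1060, 2768, 12040, 64256, 66360}
print ('symbols: '..'"'..utf8.char (unpack(numbers))..'"')
for _, num in ipairs (numbers) do
    local char = utf8.char (num)
    print (num, char, utf8.codepoint (char))
end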

Result:

symbols: "A~¡ÿĀȀФૐ⼈ﬀ𐌸"
65  A   65
126 ~   126
161 ¡   161
255 ÿ   255
256 Ā   256
512 Ȁ   512
1060    Ф   1060
2768    ૐ   2768
12040   ⼈   12040
64256   ﬀ   64256
66360   𐌸  66360
Berezina answered 26/4, 2024 at 9:53
