Ruby 1.9.2: irb throws ArgumentError: invalid byte sequence in UTF-8 when entering German Umlaut

I want to enter German Umlauts in my irb but get a weird error. I can enter any character of äöü without problems, but each of ÄÖÜß leads to the following error:

$ irb
ruby-1.9.2-p136 :001 > ? # here I entered Ü but it displays only ?
/Users/lorenz/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/ruby-lex.rb:728:in `block in lex_int2': invalid byte sequence in UTF-8 (ArgumentError)

I have looked at a lot of SO questions regarding Ruby, rvm, and UTF-8 but none helped. Most are tied to rails or database configuration. I specifically checked the following:

The locale is set correctly:

$ locale
LANG="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_CTYPE="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_ALL="de_DE.UTF-8"

Terminal.app is set to Unicode (UTF-8) and Encoding.default_external is set correctly:

$ irb
ruby-1.9.2-p136 :001 > Encoding.default_external
 => #<Encoding:UTF-8>

Why is this still so difficult in Ruby?

Nichani answered 13/2, 2011 at 15:13 Comment(6)
Maybe it's a keyboard driver problem? Have you tried pasting the characters instead of typing them?Glaswegian
To help triangulate the problem, put the commands you're using in IRB into a source file and let Ruby run them. That will tell you if it's an IRB problem, or if Ruby itself is not happy.Indefensible
Looks like it's a problem with Terminal.app. I'm getting the same question-mark problem here, with OSX 10.6.6. I can enter an uppercase U with umlaut in xterm without a problem, however. (You can access xterm by launching X11 and choosing "Terminal" from the Applications menu.) Even after this fix, though, IRB can't handle it: if I enter string = 'Ü', I get an "invalid multibyte char (UTF-8)" Ruby error.Preferment
@adamax: the same thing happens when I copy&paste instead of type.Nichani
@the Tin Man: same problem, slightly different error message: "test.rb:1: invalid multibyte char (US-ASCII)". It works fine after I add "# -*- encoding : utf-8 -*-" as the first line.Nichani
@hansengel: I cannot enter any German characters in xterm (tried äöüÄÖÜß)Nichani
D
2

For a source file, you usually set the encoding with a magic comment such as # coding: UTF-8 on the first line.

For irb, it may be necessary to set the encoding explicitly when starting the session:

irb -E UTF-8:UTF-8

This sets both the default external and the default internal encoding to UTF-8 for the irb session.

Alternatively, try

irb -U

which sets the default internal encoding to UTF-8.
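
To check whether either flag actually took effect, you could print the encodings Ruby reports from inside the session. This is only a minimal sketch; `encoding_report` is a hypothetical helper name, not something irb provides:

```ruby
# Hypothetical helper: collect the encodings Ruby is currently using.
def encoding_report
  {
    external: Encoding.default_external, # used for IO and, by default, for irb input
    internal: Encoding.default_internal, # nil unless -U or -E ex:in was given
    source:   __ENCODING__,              # encoding of the code being evaluated
    locale:   Encoding.find("locale")    # encoding derived from LANG / LC_* settings
  }
end

encoding_report.each { |name, enc| puts "#{name}: #{enc.inspect}" }
```

If `external` and `locale` both report UTF-8 and the error still appears, the encoding settings themselves are probably not the culprit.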

Donelson answered 7/3, 2011 at 3:47 Comment(1)
I tried both commands but get the same result: invalid byte sequence :-(Nichani
D
2

I don't know how to solve the problem, but I am fairly sure it is specific to irb. I have noticed many times that irb has its own way of handling user input (it may well be a limitation in readline), and it only works well with some characters.

You can run a simple test to check that. Create a new .rb file with:

# encoding: utf-8
puts "test: Ü"

and execute it. Does it work?
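
If you want the script to tell you a bit more than pass or fail, a slightly extended version could report how Ruby sees the literal. This is a sketch only; `describe` is a hypothetical helper name:

```ruby
# encoding: utf-8

# Hypothetical helper: summarize how Ruby sees a string.
def describe(str)
  {
    encoding: str.encoding.name,   # name of the string's encoding
    valid:    str.valid_encoding?, # false if the bytes are not valid in that encoding
    chars:    str.length,          # number of characters
    bytes:    str.bytesize         # number of bytes
  }
end

puts "source encoding: #{__ENCODING__}"
puts describe("test: Ü").inspect
```

With a UTF-8 source file the literal has 7 characters but 8 bytes, because Ü takes two bytes in UTF-8.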

While it is a nuisance, it has not been a big enough problem for me so far to bother looking for a real solution.

Delaney answered 27/3, 2011 at 12:24 Comment(3)
I already tried that, but with your file the error remains the same: test.rb:1: invalid multibyte char (US-ASCII). It seems not to be an IRB problem.Nichani
Add "# encoding: utf-8" on the first line in your script.Delaney
How about adding that to your answer? :)Adhamh
S
0

If you're running on Mac OS, it might be a readline issue. See http://henrik.nyh.se/2008/03/irb-readline .
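
One way to look at this from the Ruby side is to compare the UTF-8 bytes of the characters that work with those that fail. The sketch below is only a diagnostic aid; the constant and method names are made up for illustration, and whether the line editor really drops or mangles these bytes is an assumption, not something confirmed here:

```ruby
# encoding: utf-8

WORKING = %w[ä ö ü]   # characters the question reports as working
FAILING = %w[Ä Ö Ü ß] # characters the question reports as failing

# Hex representation of a character's UTF-8 bytes.
def utf8_bytes(char)
  char.encode("UTF-8").bytes.map { |b| format("%02X", b) }
end

# True if the second UTF-8 byte falls into 0x80..0x9F.
def second_byte_in_c1_range?(char)
  (0x80..0x9F).cover?(char.encode("UTF-8").bytes.to_a[1])
end

(WORKING + FAILING).each do |char|
  puts "#{char}: #{utf8_bytes(char).join(' ')} c1=#{second_byte_in_c1_range?(char)}"
end

# If only the second byte of Ü reaches Ruby, the string is not valid UTF-8.
truncated = "Ü".bytes.to_a.last(1).pack("C*").force_encoding("UTF-8")
puts truncated.valid_encoding?
```

All four failing characters have a second byte between 0x80 and 0x9F, while the three working ones do not, which would be consistent with the input layer mishandling that byte range.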

Signalman answered 29/4, 2011 at 17:32 Comment(0)
