Incompatible Character Encoding in Rails - how to just fail/skip sensibly?

I'm having an issue importing email subjects via IMAP. I think the problem is related to the £ sign in email subjects. Having spent a couple of hours going through various answers, I can't find anything that works. Here is what I have tried, with the error each attempt produces:

Using Ruby 2.1.2, in views/emails/index:

=email.subject
incompatible character encodings: ASCII-8BIT and UTF-8

=email.subject.scrub
incompatible character encodings: ASCII-8BIT and UTF-8

= email.subject.encode!('UTF-8', 'UTF-8', :invalid => :replace)
invalid byte sequence in UTF-8

= email.subject.force_encoding('UTF-8')
invalid byte sequence in UTF-8

= email.subject.encode("UTF-8", invalid: :replace)
"\xA3" from ASCII-8BIT to UTF-8

\xA3 is the '£' sign (in Latin-1 / ISO-8859-1), which shouldn't be that unusual.
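To illustrate why that single byte trips up every attempt above, here is a minimal sketch. It assumes the subject arrived as Latin-1 (ISO-8859-1) bytes that Ruby tagged as ASCII-8BIT; the sample subject and variable names are made up for the example.

```ruby
# A subject as it might come off the wire: raw bytes, tagged ASCII-8BIT (binary).
raw = "Invoice for \xA3100".b

# Relabelled as UTF-8, the lone 0xA3 byte is not a valid UTF-8 sequence,
# which is why force_encoding('UTF-8') alone does not help.
as_utf8 = raw.dup.force_encoding('UTF-8')
utf8_valid = as_utf8.valid_encoding?

# Relabelled as ISO-8859-1 (an assumption about the sender) and then
# transcoded, the same byte becomes a proper UTF-8 pound sign.
fixed = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
```

The key distinction is that force_encoding only changes the label on the bytes, while encode actually converts them from one encoding to another.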

I'm currently working with the following...

-if email.subject.force_encoding('UTF-8').valid_encoding?
  =email.subject
-else
  "Can't display"

What I would ideally like is something that checks whether the encoding works and, if not, does what #scrub is supposed to do. I'd even happily take it with '\xA3' in the text, so long as it didn't throw an error and I could basically see the text.

Any ideas on either how to do it properly or a fudge to solve the issue?

Sham answered 18/10, 2014 at 0:51 Comment(0)

After much pain this is how I solved it.

You need to set the default encodings in your environment.rb file, like so:

# Load the rails application
require File.expand_path('../application', __FILE__)
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
# Initialize the rails application
Stma::Application.initialize!

Apparently this has something to do with Ruby's roots in Japan. When dealing with Japanese (or Russian) characters a fixed default like this wouldn't be helpful, so it isn't set as standard.
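As a rough sketch of what those two settings do and don't cover: they influence how Ruby tags data it reads from outside (files, for instance), but they do not relabel a string that is already tagged as binary, which is why the explicit encode step below is still needed. The temp file and variable names here are purely illustrative.

```ruby
require 'tempfile'

# Same two settings as in environment.rb above.
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

# Data read from a file without an explicit encoding is now tagged UTF-8.
read_encoding = Tempfile.create('subject') do |file|
  file.write('plain text')
  file.flush
  File.read(file.path).encoding
end

# A string that is already tagged binary is not touched by the defaults.
raw = "\xA3".b
raw_encoding = raw.encoding
```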

I've then done the following:

# Parse the raw RFC822 message fetched over IMAP.
mail_object = Mail.new(mail[0].attr["RFC822"])
# Transcoding from 'binary' with undef: :replace and an empty replacement
# drops every non-ASCII byte, so the result is always safe to render.
subject = mail_object.subject.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') if mail_object.subject
body_part = (mail_object.text_part || mail_object.html_part || mail_object).body.decoded
body = body_part.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') if body_part

from = mail_object.from.join(",") if mail_object.from # deals with multiple addresses
to = mail_object.to.join(",") if mail_object.to # deals with multiple addresses

That should get all the main pieces into strings you can easily work with, and it won't fail nastily if something's missing or unusual. Hope that helps somebody.
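One trade-off worth spelling out: because the encode! call above treats the source as 'binary', every byte above 0x7F is dropped, the £ included. If you would rather keep such characters, a possible alternative is to try UTF-8 first and fall back to an assumed legacy charset. The sketch below is not part of the original fix; to_display_utf8 is a hypothetical helper, and ISO-8859-1 as the fallback is an assumption about where the mail came from.

```ruby
# Hypothetical helper (not from the original answer): returns a UTF-8 string
# that is safe to render, or nil if given nil.
def to_display_utf8(str, fallback: 'ISO-8859-1')
  return nil if str.nil?

  # If the bytes are already valid UTF-8, just relabel them.
  candidate = str.dup.force_encoding('UTF-8')
  return candidate if candidate.valid_encoding?

  # Otherwise assume the fallback charset and transcode, replacing anything
  # that still cannot be converted with a visible placeholder.
  str.dup.force_encoding(fallback)
     .encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
end

raw = "Refund of \xA350".b

# The approach above: non-ASCII bytes are removed.
stripped = raw.dup.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

# The fallback approach keeps the pound sign.
kept = to_display_utf8(raw)
```

Either way the view no longer raises; the difference is only whether non-ASCII characters survive.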

Sham answered 28/10, 2014 at 8:48 Comment(1)
In my case, using mail.html_part.body.decoded.force_encoding('UTF-8') worked fine, while this answer removes many characters. - Metonymy
