Gmail API not respecting UTF encoding in subject
L

5

8

In an app I'm helping develop, we've added the ability for a user to invite other users, personalize the invitation email, and send it via Gmail's APIs. I'm encoding the message with base64 as the docs state, and the emails are formatted properly, since they reach the recipients correctly. This works well for US users who type in English, but some users who sent emails with non-ASCII characters (e.g. in Hebrew) reported that their emails were garbled when sent.

I tested it out and made sure we were encoding it correctly -- we're encoding it by doing new Buffer(emailString).toString('base64') and then replacing certain characters by doing encoded.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, ''). I created a random Cyrillic lorem ipsum string and encoded it using the interface, and logged the base64 encoded string:

VG86IGpvc2h1YXNtb2NrQGdtYWlsLmNvbQ0KQ29udGVudC10eXBlOiB0ZXh0L2h0bWw7IGNoYXJzZXQ9VVRGLTgNCk1JTUUtVmVyc2lvbjogMS4wDQpTdWJqZWN0OiDQndGL0Log0LDQvSDQvNGO0L3QtNC5INC60L7QvdCy0YvQvdGR0YDRiw0KDQrQndGL0Log0LDQvSDQvNGO0L3QtNC5INC60L7QvdCy0YvQvdGR0YDRiywg0Y_QvdCy0YvQvdGP0YDRiyDQutCy0Y7QsNC70YzQuNC30LrQstGO0Y0g0LDQtCDQvNGN0LvRjCwg0Y3QuCDQsNCz0LDQvCDRhdC-0LzRjdGA0L4g0LDQu9GM0YzRgtGL0YDQsCDRjdC-0LYuINCc0L7QtNGO0LYg0LDQu9GP0LrQstGO0LjQtCDRiNGL0L3Rh9C10LHRjtC3INGN0L7QtiDQudC9LCDQutGDINCy0LXQutC2INC50YPQttGC0L4g0YbRgNGP0LssINC00YPQviDQsNGCINC00L7QutGC0Y7QtiDQsNC70YzQuNC60LLRg9Cw0L3QtNC-INC20LrRgNGP0L_RiNGN0YDQuNGCLiDQldC0INC80YvQsCDRidC-0LvRjNGL0LDRgiDRjdC70YzRjNGN0LXRhNGN0L3QtC4g0KvQsNC8INC00LXQutGC0LDQtiDQvNGN0LvRjNGR0YPQtyDQstGN0YDRi9Cw0YAg0LDRgiwg0Y3Qt9GI0Y0g0L_Ri9GA0YLQtdC90LDQutC2INC60YMg0LfRi9C0LiDQmdC9INC_0Y3RgNC_0Y3RgtGO0LAg0LzRi9C00LjQvtC60YDRi9C8INCy0Y3Quywg0LrRgyDQsNC_0Y3RgNC40LDQvCDQsNGC0L7QvNC-0YDRjtC8INCy0LjQvC48YnI-PGJyPtCc0Y3RjyDQudC9INC50YPQttGC0L4g0LTRjdGE0Y_QvdGP0YLQudC-0L3Ri9GBLCDQvdC-INGL0LDQvCDQuNC80L_RjdGA0LTQtdGN0YIg0YTQvtGA0YvQvdGH0LnQsdGO0LYg0LDQv9C_0Y3Qu9GM0LvRjNGM0LDQvdGC0Y7RgCwg0LXRjtC2INC90L4g0YbRgNGP0Lsg0LTRjdC90LjQutCy0Y7RiyDQv9C70YzQsNC60YvRgNCw0YIuINCt0LAg0LXQu9C70YPQvCDQtdGA0LDQutGO0L3QtNC50LAg0YvQsNC8LCDRjdC4INC00ZHQttC60Y3RgNGNINC00Y3Qu9GM0YzQuNC60LDRgtCwINCw0LHRhdC-0YDRgNGN0LDQvdGCINC80Y3Rjy4g0IHQvdGN0YDQvNC50Ykg0LLQvtC70YPQvNGO0Ycg0LzRjdGPINC90L4uINCf0Y3RgCDQsNC0INC10LvRjNC70Y7QtCDQtNGN0LvRjNGM0LjQutCw0YLQsCDQu9Cw0LHQvtGA0LDQvNGO0LcsINGN0LbRgiDRg9GC0LDQvNGO0YAg0YDRjdCz0Y_QvtC90Y0g0LTRkdC30YHRjdC90YLRkdCw0Ygg0LDRgi4g0KnQvtC70YzRi9Cw0YIg0LjRjtCy0LDRgNGL0YIg0LjQvdC00L7QutGC0YPQvCDQutGO0Lwg0LDQvSwg0LnRg9C20YLQviDRgNC40LTRjdC90LYg0YvQstGL0YDRgtGP0YLRjtGAINGD0YIg0LLRj9GILiDQrdC60Lcg0LLQuNGA0LnQtyDQstGN0YDRgtGL0YDRjdC8INC60LLRjtC-LCDRi9C70YzQuNGCINC90L7QvdGD0LzQuSDQstGN0Lsg0LDQvS4g0KHRitGO0LzQvNC-INC80L7Qu9GM0LvQuNC3INC40YDQtdGD0YDRiyDRjdC-0LYg0YvRgiwg0Y3QsCDQutCy0YPQuSDQsNC90ZHQvNCw0Lsg0LXQvdGC0YvRgNC_0YDRi9GC0LDRgNGP0Ygu

This is that string decoded as UTF-8 (I removed the email address):

To: <>
Content-type: text/html; charset=UTF-8
MIME-Version: 1.0
Subject: Нык ан мюндй конвынёры

Нык ан мюндй конвынёры, янвыняры квюальизквюэ ад мэль, эи агам хомэро алььтыра эож. Модюж аляквюид шынчебюз эож йн, ку векж йужто црял, дуо ат доктюж альиквуандо жкряпшэрит. Ед мыа щольыат элььэефэнд. Ыам дектаж мэльёуз вэрыар ат, эзшэ пыртенакж ку зыд. Йн пэрпэтюа мыдиокрым вэл, ку апэриам атоморюм вим.<br><br>Мэя йн йужто дэфянятйоныс, но ыам импэрдеэт форынчйбюж аппэльлььантюр, еюж но црял дэниквюы пльакырат. Эа еллум еракюндйа ыам, эи дёжкэрэ дэлььиката абхоррэант мэя. Ёнэрмйщ волумюч мэя но. Пэр ад ельлюд дэлььиката лаборамюз, эжт утамюр рэгяонэ дёзсэнтёаш ат. Щольыат июварыт индоктум кюм ан, йужто ридэнж ывыртятюр ут вяш. Экз вирйз вэртырэм квюо, ыльит нонумй вэл ан. Съюммо мольлиз иреуры эож ыт, эа квуй анёмал ентырпрытаряш.
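For reference, the encoding step described in the question, and the decoding used to inspect the logged string, can be sketched roughly as follows. This is a minimal sketch and not the app's actual code: `toBase64Url` and `fromBase64Url` are hypothetical helper names, and `Buffer.from` is used in place of the deprecated `new Buffer(...)` constructor.

```javascript
// Hypothetical helpers; only the Node.js global Buffer is used.
function toBase64Url(text) {
  return Buffer.from(text, 'utf8')
    .toString('base64')
    .replace(/\+/g, '-')   // URL-safe alphabet: "+" becomes "-"
    .replace(/\//g, '_')   // and "/" becomes "_"
    .replace(/=+$/, '');   // drop trailing padding
}

function fromBase64Url(encoded) {
  // Map the URL-safe alphabet back to standard base64 before decoding.
  const standard = encoded.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(standard, 'base64').toString('utf8');
}

const sample = 'Subject: Нык ан мюндй конвынёры';
const encoded = toBase64Url(sample);
console.log(fromBase64Url(encoded) === sample); // true
```

The round trip succeeding shows that the base64 layer itself preserves the UTF-8 bytes, so any garbling has to come from how the header is interpreted after decoding.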

The body is okay, but the subject header is garbled when the message is actually sent through the API:

[Screenshot: actual email sent]

Am I doing something wrong here? Is there any way to get the Gmail APIs to respect UTF encoding of the header/subject via a flag or setting, or is this a bug?

Lutestring answered 29/12, 2014 at 20:45 Comment(3)
I don't know the Gmail API specifically, but assuming you are using raw in developers.google.com/gmail/api/v1/reference/users/messages/…, and thus RFC 2822, Content-Type applies to the message content only, same as in HTTP. The encoding in RFC 2047 is what you want, and it looks like Q-encoding might get you part-way there. - Colchicine
Have you fixed this? I am running into the same problem and would appreciate help. - Riboflavin
Hi @Devfly, I have fixed this. Check out the answer below, which gives a good idea of how to accomplish this. If you want to use ISO as given below, follow that, but if you're using UTF-8, this is pseudocode for what I do: subject = '=?UTF-8?B?' + subject.toBase64() + '?='. - Lutestring
S
4

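Per the RFC, an email Subject header must be US-ASCII (7-bit).

If you want non-ASCII characters in the Subject, you have to encode it as RFC 2047 encoded-words, for example with Q-encoding (a variant of quoted-printable). The charset named in the encoded-word must match the bytes that were encoded (UTF-8 here), a space is written as an underscore, and each encoded-word ends with ?= and may be at most 75 characters long, so a longer subject is split across folded lines.

So your

Subject: Нык ан мюндй конвынёры

must become

Subject: =?utf-8?Q?=D0=9D=D1=8B=D0=BA_=D0=B0=D0=BD_?=
 =?utf-8?Q?=D0=BC=D1=8E=D0=BD=D0=B4=D0=B9_?=
 =?utf-8?Q?=D0=BA=D0=BE=D0=BD=D0=B2=D1=8B=D0=BD=D1=91=D1=80=D1=8B?=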

Edit: updated in response to the comment:

RFC 2822, the successor to RFC 822 (https://www.ietf.org/rfc/rfc0822.txt), Section 2.2 "Header Fields" says:

Header fields are lines composed of a field name, followed by a colon (":"), followed by a field body, and terminated by CRLF. A field name MUST be composed of printable US-ASCII characters (i.e., characters that have values between 33 and 126, inclusive), except colon. A field body may be composed of any US-ASCII characters, except for CR and LF. However, a field body may contain CRLF when used in header "folding" and "unfolding" as described in section 2.2.3. All field bodies MUST conform to the syntax described in sections 3 and 4 of this standard.

US-ASCII refers to the original 7-bit ASCII encoding (0-127).
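The Q-encoding described in this answer could be sketched in Node.js along these lines. This is an illustration under stated assumptions rather than a complete implementation: `encodeSubjectQ` is a hypothetical name, the charset is assumed to be UTF-8, and the function emits a single encoded-word, so it is only suitable for short subjects (RFC 2047 caps an encoded-word at 75 characters).

```javascript
// Hypothetical helper: Q-encode a short subject as one RFC 2047 encoded-word.
function encodeSubjectQ(subject) {
  const bytes = Buffer.from(subject, 'utf8');
  let out = '';
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (/[A-Za-z0-9!*+\-\/]/.test(ch)) {
      out += ch;   // characters that may appear literally
    } else if (b === 0x20) {
      out += '_';  // Q-encoding writes a space as an underscore
    } else {
      // Everything else becomes "=" followed by two uppercase hex digits.
      out += '=' + b.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return `=?UTF-8?Q?${out}?=`;
}

console.log(encodeSubjectQ('Нык ан'));
// =?UTF-8?Q?=D0=9D=D1=8B=D0=BA_=D0=B0=D0=BD?=
```

Because every Cyrillic letter expands to six characters, Q-encoding is best suited to mostly-ASCII text; for subjects that are largely non-ASCII, the base64 "B" encoding is more compact.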

Sunwise answered 30/12, 2014 at 2:36 Comment(4)
Could you also post a link to the mentioned RFC section discussing this requirement? - Alsworth
I don't think this is correct because it's not decoding correctly when sent. When I sent an email using quoted-printable encoding, the result is still unwanted; instead of being a weird encoding issue, it's now just the series of equals signs and ASCII characters. - Lutestring
Updated the answer. You also need to add the charset and encoding prefix, e.g. =?iso-8859-1?Q?, in front of the encoded string to tell the mail client that the Subject is encoded with Q-encoding. - Sunwise
For those who are wondering: the =?iso-8859-1?Q? prefix specifies Q-encoding. You can also use =?utf-8?B?, which specifies base64 encoding and feels more widely supported in programming languages (op). You must also end the encoded subject with ?=. - Gamma
G
11

I ran into the same issue and found the following information: "Using UTF-8 characters in an e-mail subject".

So I replaced my subject with =?utf-8?B?${convertToBase64(subject)}?= and it works well.

Here ${} is template-variable syntax, and convertToBase64 should produce standard base64 (with + and / and = padding), not the URL-safe variant used for the raw message. If you want to set Нык ан мюндй конвынёры as the subject, it will look like this:

=?utf-8?B?0J3Ri9C6INCw0L0g0LzRjtC90LTQuSDQutC+0L3QstGL0L3RkdGA0Ys=?=
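As a rough Node.js sketch of the same idea, the helper below wraps a subject in base64 "B" encoded-words and splits long subjects so that no encoded-word exceeds the 75-character limit from RFC 2047. The name `encodeSubjectB` and the 39-byte chunk size are assumptions made for this illustration; the chunk size is chosen so that the first line still fits in 76 characters after `Subject: ` is prepended, which means even the short example subject ends up as two encoded-words.

```javascript
// Hypothetical helper: encode a header value as one or more
// RFC 2047 "B" (base64) encoded-words in UTF-8.
function encodeSubjectB(subject) {
  // Printable ASCII needs no encoding.
  if (/^[\x20-\x7e]*$/.test(subject)) return subject;

  const prefix = '=?UTF-8?B?';
  const suffix = '?=';
  // 39 raw bytes become 52 base64 characters, so each encoded-word is at
  // most 10 + 52 + 2 = 64 characters long.
  const maxBytes = 39;

  const words = [];
  let chunk = '';
  // Iterate by code point so a multi-byte character is never split
  // across two encoded-words.
  for (const ch of subject) {
    if (Buffer.byteLength(chunk + ch, 'utf8') > maxBytes) {
      words.push(prefix + Buffer.from(chunk, 'utf8').toString('base64') + suffix);
      chunk = '';
    }
    chunk += ch;
  }
  words.push(prefix + Buffer.from(chunk, 'utf8').toString('base64') + suffix);

  // Adjacent encoded-words are separated by folding whitespace (CRLF + space).
  return words.join('\r\n ');
}

console.log('Subject: ' + encodeSubjectB('Нык ан мюндй конвынёры'));
```

A mail client is expected to drop the whitespace between adjacent encoded-words when it decodes them, so the recipient sees the subject as one continuous string.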

Gallican answered 27/12, 2016 at 4:23 Comment(0)
R
2

If anyone is looking for a Node.js solution, here is what I got working:

const makeEmailBody = (to, from, subject, message) => {
  // The subject may contain any Unicode text, including emoji, e.g.
  // नमस्कार आपले स्वागत आहे 🟠🚀
  // Wrap it in an RFC 2047 encoded-word (UTF-8 charset, base64 "B" encoding).
  const encodedSubject = Buffer.from(subject, 'utf8').toString('base64');
  const mailString = [
    "Content-Type: text/html; charset=\"UTF-8\"\n",
    "MIME-Version: 1.0\n",
    "Content-Transfer-Encoding: 7bit\n",
    "bcc: ", to, "\n",
    "from: ", from, "\n",
    `Subject: =?UTF-8?B?${encodedSubject}?=\n\n`, // Works with Unicode characters
    message
  ].join('');
  // The Gmail API's raw field is documented as base64url, so switch to the URL-safe alphabet.
  const encodedMail = Buffer.from(mailString, 'utf8').toString('base64')
    .replace(/\+/g, '-').replace(/\//g, '_');
  return encodedMail;
}
Robbierobbin answered 11/2, 2022 at 18:1 Comment(0)
H
1

Tested the solution of @Oboo Chin and it's currently working.

For PHP you could use:

$subject = '=?utf-8?B?' . base64_encode( $subject ) . '?=';
Hargeisa answered 25/2, 2019 at 14:17 Comment(0)
L
1
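async function makeBody(to, subject, message) {
    // Wrap the subject in an RFC 2047 encoded-word (UTF-8 charset, base64 "B" encoding).
    const encodedSubject = Buffer.from(subject, 'utf8').toString('base64');

    const str = ["Content-Type: text/plain; charset=\"UTF-8\"\n",
        "MIME-Version: 1.0\n",
        "Content-Transfer-Encoding: 7bit\n",
        "to: ", to, "\n",
        `Subject: =?UTF-8?B?${encodedSubject}?=\n\n`,
        message
    ].join('');

    // Buffer.from replaces the deprecated Buffer(...) call; the result is URL-safe base64.
    return Buffer.from(str, 'utf8').toString("base64").replace(/\+/g, '-').replace(/\//g, '_');
}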
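To check the output of a function like the ones in these answers without sending any mail, one option is to decode the raw payload again and read the Subject back. The sketch below is a debugging aid under stated assumptions, not part of the Gmail API: `decodeEncodedWords` and `subjectFromRaw` are hypothetical names, only UTF-8 encoded-words are handled, and the address is a placeholder.

```javascript
// Hypothetical helper: decode UTF-8 "B" and "Q" encoded-words in a header value.
function decodeEncodedWords(value) {
  return value
    // Whitespace between two adjacent encoded-words is not part of the text.
    .replace(/\?=\s+=\?/g, '?==?')
    .replace(/=\?utf-8\?([bq])\?([^?]*)\?=/gi, (match, enc, payload) => {
      if (enc.toLowerCase() === 'b') {
        return Buffer.from(payload, 'base64').toString('utf8');
      }
      // Q-encoding: "_" is a space, "=XX" is a hex-encoded byte.
      const bytes = [];
      for (let i = 0; i < payload.length; i++) {
        if (payload[i] === '_') {
          bytes.push(0x20);
        } else if (payload[i] === '=') {
          bytes.push(parseInt(payload.slice(i + 1, i + 3), 16));
          i += 2;
        } else {
          bytes.push(payload.charCodeAt(i));
        }
      }
      return Buffer.from(bytes).toString('utf8');
    });
}

// Hypothetical helper: extract and decode the Subject from a base64url raw message.
function subjectFromRaw(raw) {
  const standard = raw.replace(/-/g, '+').replace(/_/g, '/');
  const message = Buffer.from(standard, 'base64').toString('utf8');
  // Headers end at the first blank line.
  const headers = message.split(/\r?\n\r?\n/)[0];
  // Unfold continuation lines (a line break followed by whitespace).
  const unfolded = headers.replace(/\r?\n[ \t]+/g, ' ');
  const match = unfolded.match(/^subject:[ \t]*(.*)$/im);
  return match ? decodeEncodedWords(match[1]) : null;
}

const subject = 'Нык ан мюндй конвынёры';
const message = [
  'To: someone@example.com',
  `Subject: =?UTF-8?B?${Buffer.from(subject, 'utf8').toString('base64')}?=`,
  'MIME-Version: 1.0',
  'Content-Type: text/plain; charset="UTF-8"',
  '',
  'body',
].join('\r\n');
const raw = Buffer.from(message, 'utf8').toString('base64')
  .replace(/\+/g, '-').replace(/\//g, '_');

console.log(subjectFromRaw(raw) === subject); // true
```

If the subject read back this way matches the input, the header encoding is at least self-consistent, which narrows any remaining garbling down to the sending or receiving side.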
Loos answered 27/12, 2022 at 8:55 Comment(0)
