IMAP folder path encoding (IMAP UTF-7) for Python

I would like to know whether an "official" function or library exists in Python for IMAP4 UTF-7 folder path encoding.

In the output of imapInstance.list() I get the following IMAP UTF-7 encoded path:

'(\\HasNoChildren) "." "[Mails].Test&AOk-"',

If I do the following encoding :

(u"[Mails].Testé").encode('utf-7')

I get :

'[Mails].Test+AOk-'

This is standard UTF-7, not IMAP UTF-7: I get Test+AOk- instead of Test&AOk-. I need an official function or library that produces the IMAP UTF-7 encoded version.
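To make the difference concrete, here is a minimal sketch that builds both forms of the folder name from the question. It assumes Python 3 and encodes the single non-ASCII character by hand (unpadded base64 of its UTF-16BE bytes, with "," in place of "/"); it is an illustration of the format, not a general-purpose encoder.

```python
import base64

name = "[Mails].Testé"

# Standard UTF-7 from Python's built-in codec: '+' opens the base64 run.
standard = name.encode("utf-7")

# IMAP's modified UTF-7: '&' opens the run, and the payload is unpadded
# base64 of the UTF-16BE bytes, using ',' instead of '/'.
payload = base64.b64encode("é".encode("utf-16-be"), altchars=b"+,").rstrip(b"=")
imap_form = b"[Mails].Test&" + payload + b"-"

print(standard)
print(imap_form)
```

For this particular name the two results differ only in the shift character, but that is not true in general, because the two formats also differ in which ASCII characters are copied literally and in the base64 alphabet.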

Watford answered 8/10, 2012 at 6:54 Comment(2)
The modified UTF-7 codec is described in RFC 2060. - Millford
There is an issue for this: bugs.python.org/issue22598. Unfortunately, it's currently still open. - Interaction

The IMAPClient package can encode and decode IMAP's modified UTF-7. Have a look at the imapclient.imap_utf7 module. You can use this module standalone, or you can use IMAPClient itself, which encodes and decodes folder names transparently.

The project's home page is: https://github.com/mjs/imapclient

Example code:

from imapclient import imap_utf7
decoded = imap_utf7.decode(b'&BdAF6QXkBdQ-')

Disclaimer: I'm the original author of the IMAPClient package.

Poetess answered 9/10, 2012 at 16:1 Comment(2)
That example doesn't work as written in Python 3. It seems imap_utf7.decode() expects bytes, so imap_utf7.decode(b'&BdAF6QXkBdQ-') does work. - Intermediary
Fixed! This post was written when Python 2 usage was much more common than Python 3 :) I've also added a disclaimer. - Poetess

I wrote a very simple IMAP UTF-7 implementation for Python 3 that follows the specification, and it seems to work: "foo\rbar\n\n\n\r\r" and many other strings round-trip, and '&BdAF6QXkBdQ-', 'Test&Co', "[Mails].Test&AOk-" and '~peter/mail/&ZeVnLIqe-/&U,BTFw-' behave as expected.

#works with python 3

import base64

def b64padanddecode(b):
    """Decode unpadded modified-base64 data (',' in place of '/') to str"""
    b+=(-len(b)%4)*'=' #restore base64 padding (a remainder of 1 gives '===', which is invalid input anyway)
    return base64.b64decode(b,altchars='+,',validate=True).decode('utf-16-be')

def imaputf7decode(s):
    """Decode a string encoded according to RFC2060 aka IMAP UTF7.

    Minimal validation of input, only works with trusted data"""
    lst=s.split('&')
    out=lst[0]
    for e in lst[1:]:
        u,a=e.split('-',1) #u: base64 between & and 1st -, a: ASCII chars following it
        if u=='' : out+='&' #'&-' is an escaped literal '&'
        else: out+=b64padanddecode(u)
        out+=a
    return out

def imaputf7encode(s):
    """Encode a string into RFC2060 aka IMAP UTF7"""
    def b64chunk(u):
        #modified base64: ',' instead of '/', no '=' padding, wrapped in '&'...'-'
        return '&'+base64.b64encode(u.encode('utf-16-be'),altchars=b'+,').decode('ascii').rstrip('=')+'-'
    s=s.replace('&','&-') #a literal '&' is escaped as '&-'
    unipart=out=''
    for c in s:
        if 0x20<=ord(c)<=0x7e : #printable US-ASCII is copied literally
            if unipart!='' :
                out+=b64chunk(unipart)
                unipart=''
            out+=c
        else : unipart+=c
    if unipart!='' :
        out+=b64chunk(unipart)
    return out

Given the simplicity of this code, I release it into the public domain, so feel free to use it however you want.
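To tie a decoder back to the LIST output shown in the question, here is a rough, self-contained sketch. It uses a regex-based decoder rather than the split-based one above, and the names LIST_LINE, decode_modified_utf7 and parse_list_line are made up for this example. It assumes each response line has the shape (flags) "delimiter" "name" with a quoted name and no escaped quotes; real servers can also send unquoted names or literals, which this does not handle.

```python
import base64
import re

# Assumed shape of one LIST response line: (flags) "delimiter" "name".
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) "(?P<delim>[^"]*)" "(?P<name>[^"]*)"')

def decode_modified_utf7(raw: bytes) -> str:
    """Decode IMAP modified UTF-7 bytes into a str (minimal validation)."""
    def replace_run(match):
        chunk = match.group(1)
        if not chunk:  # "&-" is an escaped literal ampersand
            return "&"
        padded = chunk + "=" * (-len(chunk) % 4)  # restore stripped padding
        return base64.b64decode(padded, altchars=b"+,").decode("utf-16-be")
    return re.sub(r"&([^-]*)-", replace_run, raw.decode("ascii"))

def parse_list_line(line: bytes):
    """Return (flags, delimiter, decoded_name) for one LIST line, or None."""
    match = LIST_LINE.match(line)
    if match is None:
        return None
    flags = match.group("flags").decode("ascii").split()
    delim = match.group("delim").decode("ascii")
    return flags, delim, decode_modified_utf7(match.group("name"))

print(parse_list_line(b'(\\HasNoChildren) "." "[Mails].Test&AOk-"'))
```

The input is bytes because imaplib returns bytes in Python 3. Treat the regex as a starting point only; for anything beyond trusted, well-formed input, a library that parses IMAP responses properly is the safer choice.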

Nehemiah answered 20/8, 2017 at 22:0 Comment(1)
There's a variant of this approach in github.com/cpackham/imapdu/pull/3 - Nathanson

The imapclient implementation is kind of broken though:

x = "foo\rbar\n\n\n\r\r"
imap_utf7.decode(imap_utf7.encode(x))

Result:

>> 'foo&bar\n\n\r-'

Edit:

After some research I found an implementation in Mailpile that passes this round-trip test. I also ported it to Python 3, if you're interested: https://github.com/MarechJ/py3_imap_utf7
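If you want to run the same kind of check against several implementations, a small harness along these lines may help. The helper name roundtrip_failures and the sample list are my own, and the UTF-8 stand-in pair is there only so the sketch runs without third-party packages; swap in the encode/decode pair you actually want to test.

```python
def roundtrip_failures(encode, decode, samples):
    """Return (sample, result) pairs for which decode(encode(sample)) != sample."""
    failures = []
    for sample in samples:
        result = decode(encode(sample))
        if result != sample:
            failures.append((sample, result))
    return failures

# Sample names taken from this thread; extend as needed.
SAMPLES = ["foo\rbar\n\n\n\r\r", "Test&Co", "[Mails].Testé"]

# Stand-in codec pair (plain UTF-8), NOT an IMAP UTF-7 implementation.
def stand_in_encode(text):
    return text.encode("utf-8")

def stand_in_decode(data):
    return data.decode("utf-8")

for sample, result in roundtrip_failures(stand_in_encode, stand_in_decode, SAMPLES):
    print(f"round trip changed {sample!r} into {result!r}")
```

An empty result only shows that the pair is self-consistent on these samples. It does not show that the encoded form matches the RFC, so it is worth comparing against known encodings such as the ones quoted elsewhere in this thread as well.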

Conch answered 17/8, 2015 at 14:37 Comment(1)
It seems that issue was resolved, at least in version 0.13:

In [4]: x = "foo\rbar\n\n\n\r\r"
In [5]: imap_utf7.decode(imap_utf7.encode(x))
Out[5]: u'foo\rbar\n\n\n\r\r'

- Tightrope

You can use the imap_tools package: https://pypi.org/project/imap-tools/

from imap_tools.imap_utf7 import encode, decode

print(encode('привет'))
# b'&BD8EQAQ4BDIENQRC-'

print(decode(b'&BD8EQAQ4BDIENQRC-'))
# привет

print(repr(decode(encode("foo\rbar\n\n\n\r\r"))))
# 'foo\rbar\n\n\n\r\r'

Disclaimer: I am the author of this library.

Vacillating answered 9/10, 2019 at 12:42 Comment(0)