How to read the Content-Type header and convert to UTF-8 when Gmail IMAP uses UTF-8 and Outlook uses ISO-8859-7?

So I fetch emails over IMAP from Gmail and Outlook.

Gmail encodes like this: =?UTF-8?B?UmU6IM69zq3OvyDOtc68zrHOuc67IG5ldyBlbWFpbA==?=, and Outlook encodes like this: =?iso-8859-7?B?UmU6IOXr6+ft6er8IHN1YmplY3Q=?=

Unfortunately I have not yet found any solution that turns this into readable text. Instead I am messing with:

mb_convert_encoding($body, "UTF-8", "UTF-8"); 

and

mb_convert_encoding($body, "UTF-8", "iso-8859-7");

but I am struggling to solve this.

This is how I open my IMAP account (which has a lot of Gmail and Outlook messages):

$hostname = '{imappro.zoho.com:993/imap/ssl}INBOX';
$username = '[email protected]';
$password = 'password';


/* try to connect */
$inbox = imap_open($hostname,$username ,$password) or die('Cannot connect to Zoho: ' . imap_last_error());

/* grab emails */
$emails = imap_search($inbox,'UNSEEN');

Any help?

Comment answered 29/7, 2017 at 18:17 Comment(3)
Those aren't body encodings, those are header encodings. You'll need to read the Content Type header or parse the structure response. – Dessau
@Dessau, can you please suggest a guide or something? – Comment
Please take a look at the edit and let me know if that solved your problem. – Masbate

Unfortunately I did not find yet any solution that will help me make this into readable text.

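Solution: your strings are Base64-encoded. Each one is a MIME encoded-word of the form =?charset?B?text?=, where B means the text is Base64 and charset names the character set of the decoded bytes.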

=?UTF-8?B?UmU6IM69zq3OvyDOtc68zrHOuc67IG5ldyBlbWFpbA==?=

echo base64_decode('UmU6IM69zq3OvyDOtc68zrHOuc67IG5ldyBlbWFpbA==');

prints "Re: νέο εμαιλ new email"

=?iso-8859-7?B?UmU6IOXr6+ft6er8IHN1YmplY3Q=?=

echo base64_decode('UmU6IOXr6+ft6er8IHN1YmplY3Q=');

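prints "Re: " and " subject" with the Greek word in between left as raw ISO-8859-7 bytes, which look like garbage on a UTF-8 page until you convert them (the converted result is "Re: ελληνικό subject").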

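The answer is to use base64_decode in conjunction with your current solutions: decode the Base64 text first, then pass the result to mb_convert_encoding() with the charset named in the encoded-word as the source encoding.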
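A minimal sketch of that combination as one function: it finds every =?charset?B?...?= (or ?Q?) encoded-word with a regular expression, decodes the payload and converts it from the declared charset to UTF-8. It assumes the iconv extension (mb_convert_encoding() would do the same job if mbstring is installed), the name decodeEncodedWords() is made up for this example, and it does not cover every corner of RFC 2047.

```php
<?php
// Sketch: decode RFC 2047 encoded-words (=?charset?B?...?= or =?charset?Q?...?=)
// to UTF-8. Assumes the iconv extension; decodeEncodedWords() is a made-up name.
function decodeEncodedWords(string $header): string
{
    // Whitespace between two adjacent encoded-words is not part of the text.
    $header = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $header);

    return preg_replace_callback(
        '/=\?([^?]+)\?([BbQq])\?([^?]*)\?=/',
        function (array $m): string {
            [, $charset, $mode, $payload] = $m;
            if (strtoupper($mode) === 'B') {
                // "B" form: the payload is Base64
                $bytes = (string) base64_decode($payload);
            } else {
                // "Q" form: "_" means a space, everything else is =XX escapes
                $bytes = quoted_printable_decode(str_replace('_', ' ', $payload));
            }
            // Convert from the charset named in the encoded-word to UTF-8;
            // keep the raw bytes if iconv does not know the charset.
            $utf8 = @iconv($charset, 'UTF-8', $bytes);
            return $utf8 === false ? $bytes : $utf8;
        },
        $header
    );
}

echo decodeEncodedWords('=?UTF-8?B?UmU6IM69zq3OvyDOtc68zrHOuc67IG5ldyBlbWFpbA==?='), "\n"; // Re: νέο εμαιλ new email
echo decodeEncodedWords('=?iso-8859-7?B?UmU6IOXr6+ft6er8IHN1YmplY3Q=?='), "\n";             // Re: ελληνικό subject
```

Text outside encoded-words is passed through unchanged, so the function can be applied to a whole Subject or From header.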

The way to identify Base64-encoded text is that it consists only of the letters a-z and A-Z, the digits 0-9 and two other characters (usually + and /), and it is usually right-padded with =.
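That rule of thumb can be turned into a quick check. This is only a heuristic sketch with a hypothetical name, looksLikeBase64(): it assumes padded Base64 (as used inside encoded-words) and can rule strings out, but it cannot prove a string is Base64, since an ordinary word such as "abcd" also passes.

```php
<?php
// Heuristic: could $s be padded Base64? looksLikeBase64() is a hypothetical helper.
function looksLikeBase64(string $s): bool
{
    if ($s === '' || strlen($s) % 4 !== 0) {
        return false;          // padded Base64 always comes in 4-character groups
    }
    if (!preg_match('~^[A-Za-z0-9+/]+={0,2}\z~', $s)) {
        return false;          // characters outside the Base64 alphabet
    }
    // strict mode makes base64_decode() return false on invalid input
    return base64_decode($s, true) !== false;
}

var_dump(looksLikeBase64('UmU6IOXr6+ft6er8IHN1YmplY3Q=')); // bool(true)
var_dump(looksLikeBase64('Re: subject'));                  // bool(false)
```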

EDIT:

Sorry, I forgot that the question was about converting from iso-8859-7 to UTF-8 and making the text readable.

<?php
$str = base64_decode('UmU6IPP03evt+SDs3u317OE=');
$str = mb_convert_encoding($str,'UTF-8','iso-8859-7');
echo $str;
?>

The result is "Re: στέλνω μήνυμα"

Masbate answered 5/8, 2017 at 17:51 Comment(6)
But how about this? echo base64_decode('ZP3OUC66Z4ZOU86XZ4IGZP3OUC66ZR/OU86XZPDOTM63Z4I='); It returns d��P.�g�NSΗg�d��P.�e�SΗd��Lηg� – Comment
Not all data is text, and not all text data is single byte per character. Is that the complete string in one of the examples you've come across, or just an excerpt? Also, what was the character encoding that preceded it in your response? – Masbate
This is a similar example: =?iso-8859-7?B?UmU6IPP03evt+SDs3u317OE=?= – Comment
If you are looking at it in your browser, you'll need to set the charset header, like header('Content-Type: text/html; charset=iso-8859-7'); – Masbate
I had made a mistaken note about it being Arabic, but really it's Greek, I think. – Masbate
Somehow I thought that since you already knew to use mb_convert_encoding, base64 was all you still needed to know. – Masbate

look here

    /* connect to gmail */
    $hostname = '{imap.gmail.com:993/imap/ssl}INBOX';
    $username = '[email protected]';
    $password = 'davidwalsh';

    /* try to connect */
    $inbox = imap_open($hostname,$username,$password) or die('Cannot connect to Gmail: ' . imap_last_error());

    /* grab emails */
    $emails = imap_search($inbox,'ALL');

    /* if emails are returned, cycle through each... */
    if($emails) {

        /* begin output var */
        $output = '';

        /* put the newest emails on top */
        rsort($emails);

        /* for every email... */
        foreach($emails as $email_number) {

            /* get information specific to this email */
            $overview = imap_fetch_overview($inbox,$email_number,0);
            $message = imap_fetchbody($inbox,$email_number,2);

            /* output the email header information */
            $output.= '<div class="toggler '.($overview[0]->seen ? 'read' : 'unread').'">';
            $output.= '<span class="subject">'.$overview[0]->subject.'</span> ';
            $output.= '<span class="from">'.$overview[0]->from.'</span>';
            $output.= '<span class="date">on '.$overview[0]->date.'</span>';
            $output.= '</div>';

            /* output the email body */
            $output.= '<div class="body">'.$message.'</div>';
        }

        echo $output;
    } 

    /* close the connection */
    imap_close($inbox);

For reading and decoding, look here:

<?php
$hostname = '{********:993/imap/ssl}INBOX';
$username = '*********';
$password = '******';

$inbox = imap_open($hostname,$username,$password) or die('Cannot connect to server: ' . imap_last_error());

$emails = imap_search($inbox,'ALL');

if($emails) {
    $output = '';
    rsort($emails);

    foreach($emails as $email_number) {
        $overview = imap_fetch_overview($inbox,$email_number,0);
        $structure = imap_fetchstructure($inbox, $email_number);

        // reset per message so values from the previous email are not reused
        $message = '';
        $encoding = null;

        if(isset($structure->parts) && is_array($structure->parts) && isset($structure->parts[1])) {
            $part = $structure->parts[1];          // parts[1] is body section "2"
            $encoding = $part->encoding;
            $message = imap_fetchbody($inbox,$email_number,2);

            // transfer encodings: 3 = BASE64, 4 = QUOTED-PRINTABLE;
            // 0/1/2 (7BIT/8BIT/BINARY) need no decoding
            if($part->encoding == 3) {
                $message = imap_base64($message);
            } else if($part->encoding == 4) {
                $message = imap_qprint($message);
            }
        }

        // imap_utf8() already returns UTF-8; wrapping it in utf8_decode()
        // would convert to ISO-8859-1 and turn Greek letters into '?'
        $output.= '<div class="toggler '.($overview[0]->seen ? 'read' : 'unread').'">';
        $output.= '<span class="from">From: '.imap_utf8($overview[0]->from).'</span>';
        $output.= '<span class="date">on '.imap_utf8($overview[0]->date).'</span>';
        $output.= '<br /><span class="subject">Subject('.$encoding.'): '.imap_utf8($overview[0]->subject).'</span> ';
        $output.= '</div>';

        $output.= '<div class="body">'.$message.'</div><hr />';
    }

    echo $output;
}

imap_close($inbox);
?>
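The loop above undoes the transfer encoding of the body but not its charset, so an ISO-8859-7 body would still display wrongly on a UTF-8 page. A possible helper, sketched under assumptions: it expects the part object shape that imap_fetchstructure() documents (->encoding plus ->parameters entries with ->attribute and ->value), uses iconv for the conversion, and the name decodeBodyPart() is hypothetical.

```php
<?php
// Sketch: decode a body section fetched with imap_fetchbody() using its
// imap_fetchstructure() part object. decodeBodyPart() is a hypothetical name.
function decodeBodyPart(string $raw, object $part): string
{
    // 1. Undo the Content-Transfer-Encoding (3 = BASE64, 4 = QUOTED-PRINTABLE).
    $encoding = $part->encoding ?? 0;
    if ($encoding == 3) {
        $raw = (string) base64_decode($raw);
    } elseif ($encoding == 4) {
        $raw = quoted_printable_decode($raw);
    }

    // 2. Look up the charset parameter, e.g. UTF-8 or ISO-8859-7.
    $charset = null;
    foreach (($part->parameters ?? []) as $param) {
        if (strcasecmp($param->attribute, 'charset') === 0) {
            $charset = $param->value;
        }
    }

    // 3. Convert to UTF-8 unless it already is or no charset was declared.
    if ($charset !== null && strcasecmp($charset, 'utf-8') !== 0) {
        $converted = @iconv($charset, 'UTF-8', $raw);
        if ($converted !== false) {
            $raw = $converted;
        }
    }
    return $raw;
}

// Simulated part object, shaped like what imap_fetchstructure() returns:
$part = (object) [
    'encoding'   => 3,
    'parameters' => [(object) ['attribute' => 'CHARSET', 'value' => 'ISO-8859-7']],
];
echo decodeBodyPart('UmU6IOXr6+ft6er8IHN1YmplY3Q=', $part), "\n"; // Re: ελληνικό subject
```

Inside the loop it could replace the if/else on $part->encoding, for example $message = decodeBodyPart(imap_fetchbody($inbox, $email_number, 2), $structure->parts[1]);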

Look here for a great tutorial on email structure and a function to extract it.

Superincumbent answered 1/8, 2017 at 13:22 Comment(0)

If you want to decode header elements, there is a PHP function for that: imap_mime_header_decode().
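A short sketch of how imap_mime_header_decode() is typically used: it returns one object per piece of the header, each with ->charset ("default" for unencoded text) and ->text (the decoded but not yet converted bytes), so every piece still has to be converted to UTF-8. The helper name headerToUtf8() is hypothetical; it uses iconv for the conversion and falls back to iconv_mime_decode() when the imap extension is not loaded (it was unbundled from core in PHP 8.4).

```php
<?php
// Sketch: decode a MIME-encoded header (Subject, From, ...) to UTF-8.
// headerToUtf8() is a hypothetical helper name.
function headerToUtf8(string $header): string
{
    if (!function_exists('imap_mime_header_decode')) {
        // fallback when ext/imap is missing
        $decoded = iconv_mime_decode($header, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');
        return $decoded === false ? $header : $decoded;
    }
    $out = '';
    foreach (imap_mime_header_decode($header) as $piece) {
        // "default" marks text that was not encoded at all
        $charset = strtolower($piece->charset) === 'default' ? 'US-ASCII' : $piece->charset;
        $converted = @iconv($charset, 'UTF-8', $piece->text);
        $out .= $converted === false ? $piece->text : $converted;
    }
    return $out;
}

echo headerToUtf8('=?iso-8859-7?B?UmU6IOXr6+ft6er8IHN1YmplY3Q=?='), "\n"; // Re: ελληνικό subject
```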

Also, you will need some MIME parser class to decode multipart messages.

Bombe answered 1/8, 2017 at 13:7 Comment(0)

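To get the headers of a message, pass your stream ($inbox) and the message number to imap_headerinfo(); its documentation lists all the values you can get in the response. (imap_headers() takes only the stream and returns a one-line summary string for every message in the mailbox.)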

For the actual messages, plain text can be read using imap_body(), passing the stream and the number of the message you want (in $emails after your search). Getting an html/multipart email is a bit trickier. First you need imap_fetchstructure(), which identifies the parts of the message, then imap_fetchbody() to get the piece you are interested in.

Once you have a result from imap_fetchbody(), if you still need to adjust the encoding, it could be done at this point.
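Picking the right section number for imap_fetchbody() is the fiddly part. One possible approach, sketched with a hypothetical helper findTextSection(): walk the imap_fetchstructure() result recursively, numbering top-level parts "1", "2", ... and nested parts "1.1", "1.2", ..., and return the first text part (type 0, TYPETEXT) with the wanted subtype. A single-part message has no ->parts and its body is section "1". It does not handle the special numbering inside attached message/rfc822 parts.

```php
<?php
// Sketch: find the imap_fetchbody() section of the first text part with the
// given subtype ('PLAIN' or 'HTML'). findTextSection() is a hypothetical helper.
function findTextSection(object $structure, string $subtype = 'PLAIN', string $prefix = ''): ?string
{
    if (empty($structure->parts)) {
        // leaf part: type 0 is TYPETEXT in the imap extension
        if (($structure->type ?? -1) == 0 && strcasecmp($structure->subtype ?? '', $subtype) === 0) {
            return $prefix === '' ? '1' : $prefix;   // single-part body is section "1"
        }
        return null;
    }
    foreach ($structure->parts as $i => $part) {
        // parts are numbered from 1; nested parts get "parent.child" numbers
        $section = ($prefix === '' ? '' : $prefix . '.') . ($i + 1);
        $found = findTextSection($part, $subtype, $section);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

// Simulated multipart/mixed containing multipart/alternative plus a PDF:
$alt = (object) ['type' => 1, 'subtype' => 'ALTERNATIVE', 'parts' => [
    (object) ['type' => 0, 'subtype' => 'PLAIN'],
    (object) ['type' => 0, 'subtype' => 'HTML'],
]];
$mixed = (object) ['type' => 1, 'subtype' => 'MIXED', 'parts' => [
    $alt,
    (object) ['type' => 3, 'subtype' => 'PDF'],
]];
echo findTextSection($mixed), "\n";          // 1.1
echo findTextSection($mixed, 'HTML'), "\n";  // 1.2
```

With a live mailbox the result would be used as imap_fetchbody($inbox, $email_number, findTextSection(imap_fetchstructure($inbox, $email_number))), after checking it is not null.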

City answered 1/8, 2017 at 13:18 Comment(0)

I had a task to fetch emails from a particular mailbox, parse them and index certain content.

I wanted a small microservice that would provide me with the data:

  1. Download the required content
  2. Convert the received data into a readable format
  3. Process the content

So I decided to use ready-made tools:

  1. imap2maildir, a script for downloading emails
  2. mu, a Unix client for processing messages
  3. the dos2unix converter

Next, I wrote a small bash script and scheduled it with cron:

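#!/bin/bash
# download new messages into a Maildir
python /var/mail_dump/imap2maildir/imap2maildir -c /var/mail_dump/imap2maildir/deploy.conf
# (re)index the Maildir so mu can search it
mu index --maildir=/var/mail_dump/dumps/new
#clean old data
rm -rf /var/mail_dump/extract/*

# search for matching messages (--fields="l" prints only the file path) and copy them;
# -r stops xargs from running cp when mu finds nothing
mu find jivo --fields="l" --nocolor | xargs -r cp -t /var/mail_dump/extract
# convert CRLF line endings to LF (-f also converts files dos2unix considers binary)
dos2unix -f /var/mail_dump/extract/*

#reassembly of messages in html
# mu extract writes into the current directory, so change into it first
cd /var/mail_dump/extract/ || exit 1
for i in /var/mail_dump/extract/*
do
  mu extract --parts=0 --overwrite "$i"
  rm "$i"
done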

Done! I now have a service that continuously receives emails and prepares them for processing, so the PHP code works with the prepared data without having to deal with the low-level details.
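On the PHP side, "working with the prepared data" can be as simple as reading whatever the cron job left in the extract directory. A minimal sketch: the path /var/mail_dump/extract comes from the script, readExtractedParts() is a hypothetical helper, and no particular file naming produced by mu extract is assumed.

```php
<?php
// Sketch: load every file the cron job left in the extract directory,
// keyed by file name. readExtractedParts() is a hypothetical helper.
function readExtractedParts(string $dir): array
{
    $parts = [];
    $paths = glob(rtrim($dir, '/') . '/*') ?: [];   // glob() may return false
    foreach ($paths as $path) {
        if (is_file($path)) {
            $parts[basename($path)] = file_get_contents($path);
        }
    }
    return $parts;
}

foreach (readExtractedParts('/var/mail_dump/extract') as $name => $content) {
    // index $content here
    echo $name, ': ', strlen($content), " bytes\n";
}
```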

Anuska answered 8/8, 2017 at 11:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.