base64 encoded string gets truncated through fgets call while parsing IMAP
I am parsing emails with Zend_Mail, and some content gets truncated for no obvious reason, which corrupts the email parts.

For example

Content-Disposition: attachment; filename="file.sdv"

DQogICAgICBTT05FO0xBTkRJTkdTREE7U0FMR1NEQVRPIDtOQVNKIDtSRURTS0FQICAgICAgICAg
ICAgIDsgRklTS0VTTEFHO1BSRVNFUlYgICA7ICBUSUxTVEFORDsgU1TYUlJFTFNFOyAgS1ZBTElU
RVQ7T01TVFlQRSAgO01JTlNURVBSSVM7ICAgICBWRVJESTsgICBLVkFOVFVNOyAgUlVORFZFS1Qg
IA0KLS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS07LS0tLS0tLS0tLS0tLS0t
LS0tLS07LS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS0tLTstLS0tLS0t
LS0tOy0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS0tLTstLS0tLS0tLS0t
ICANCiAgICAgICAgIDA7MjAxMC4wOS4wODsyMDEwLjA5LjA4O05vcnNrO0dhcm4gICAgICAgICAg
ICAgICAgOyAgICAgIDEwMjE7RkVSU0sgICAgIDsgICAgICAgMjEwOyAgIDQwMjA5OTk7ICAgICAg
ICAyMDtFZ2Vub3ZlcnQ7ICAgICAgICAgIDsgICAzMDcyLDE2OyAgICAgICAyMTE7ICAgICAyNTMs
MiAgDQogICAgICAgICAwOzIwMTAuMDkuMDg7MjAxMC4wOS4wODtOb3JzaztHYXJuICAgICAgICAg

Gets truncated to

Content-Disposition: attachment; filename="file.sdv"

DQogICAgICBTT05FO0xBTkRJTkdTREE7U0FMR1NEQVRPIDtOQVNKIDtSRURTS0FQICAgICAgICAg
ICAgIDsgRklTS0VTTEFHO1BSRVNFUlYgICA7ICBUSUxTVEFORDsgU1TYUlJFTFNFOyAgS1ZBTElU
RVQ7T01TVFlQRSAgO01JTlNURVBSSVM7ICAgICBWRVJESTsgICBLVkFOVFVNOyAgUlVORFZFS1Qg
IA0KLS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS07LS0tLS0tLS0tLS0tLS0t
LS0tLS07LS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS0tLTstLS0tLS0t
LS

A var_dump of each line shows this:

string(78) "DQogICAgICBTT05FO0xBTkRJTkdTREE7U0FMR1NEQVRPIDtOQVNKIDtSRURTS0FQICAgICAgICAg
"
string(78) "ICAgIDsgRklTS0VTTEFHO1BSRVNFUlYgICA7ICBUSUxTVEFORDsgU1TYUlJFTFNFOyAgS1ZBTElU
"
string(78) "RVQ7T01TVFlQRSAgO01JTlNURVBSSVM7ICAgICBWRVJESTsgICBLVkFOVFVNOyAgUlVORFZFS1Qg
"
string(78) "IA0KLS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS07LS0tLS0tLS0tLS0tLS0t
"
string(78) "LS0tLS07LS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS0tLTstLS0tLS0t
"
string(5) "LS)
"
string(17) "TAG5 OK Success
"    

Or, in another email, it stops at:

DQogICAgICBTT05FO0xBTkRJTkdTREE7U0FMR1NEQVRPIDtOQVNKIDtSRURTS0FQICAgICAgICAg
ICAgIDsgRklTS0VTTEFHO1BSRVNFUlYgICA7ICBUSUxTVEFORDsgU1TYUlJFTFNFOyAgS1ZBTElU
RVQ7T01TVFlQRSAgO01JTlNURVBSSVM7ICAgICBWRVJESTsgICBLVkFOVFVNOyAgUlVORFZFS1Qg
IA0KLS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS07LS0tLS0tLS0tLS0tLS0t
LS0tLS07LS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS0tLTstLS0tLS0t
LS0tOy0tLS0tLS0tLTstLS0tLS0tLS0tO

I cannot figure out why it stops there. The transmission should only stop at the end of a line. This is the line that reads the string from the IMAP server:

$line = @fgets($this->_socket);

The encoded text contains a string like the one below (shown decoded), but the truncation happens at different positions in different emails.

----------;----------;----------;-----;--------------------;----------;----------;--

I've tried passing a length to fgets(), with no effect. I also enabled and disabled the "auto_detect_line_endings" php.ini setting, again with no effect.

I've also opened a bug report with ZF although the error does not seem to be in the library.

Do you see anything strange with this encoded string?
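One quick way to locate where a base64 body was cut off is to check line lengths: a complete base64 line has a length that is a multiple of 4, so a line such as "LS" stands out. The sketch below is only an illustration; `findTruncatedBase64Line` is a hypothetical helper, not part of Zend_Mail, and it assumes the lines have already been read from the socket (for example with fgets).

```php
<?php
/**
 * Return the index of the first base64 line whose length is not a
 * multiple of 4, or null if every line looks complete.
 * Hypothetical helper for illustration; not part of Zend_Mail.
 */
function findTruncatedBase64Line(array $lines): ?int
{
    foreach ($lines as $i => $line) {
        $clean = rtrim($line, "\r\n");
        if ($clean === '') {
            continue; // skip blank separator lines
        }
        if (strlen($clean) % 4 !== 0) {
            return $i; // this line was cut in the middle of a 4-character group
        }
    }
    return null;
}

// The last full line and the truncated line from the example above.
$lines = [
    "LS0tLS07LS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS0tLTstLS0tLS0t\r\n",
    "LS\r\n",
];
var_dump(findTruncatedBase64Line($lines));
```

A length check is used here rather than base64_decode() in strict mode, because strict mode may still accept input with missing padding.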

UPDATE

Further investigation shows that the emails get truncated after 584 characters. I still don't know why. I sent a question to Google as well. See here.

Headers of a bad email:

Delivered-To: [email protected]
Received: by 10.216.3.208 with SMTP id 58cs248812weh;
    Fri, 20 Nov 2009 05:14:14 -0800 (PST)
Received: by 10.204.153.217 with SMTP id l25mr1285471bkw.108.1258722853863;
    Fri, 20 Nov 2009 05:14:13 -0800 (PST)
Return-Path: <>
Received: from MTX4.mbn1.net (mtx4.mbn1.net [213.188.129.252])
    by mx.google.com with SMTP id 2si1800716bwz.60.2009.11.20.05.14.12;
    Fri, 20 Nov 2009 05:14:13 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of MTX4.mbn1.net designates 213.188.129.252 as permitted sender) client-ip=213.188.129.252;
Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of MTX4.mbn1.net designates 213.188.129.252 as permitted sender) smtp.mail=
Resent-From: <[email protected]>
Content-Type: multipart/mixed; boundary="===============1703099044=="
MIME-Version: 1.0
From: <[email protected]>
To: <[email protected]>
CC:
Subject: some subject
Message-ID: <[email protected]>
X-OriginalArrivalTime: 20 Nov 2009 13:14:08.0121 (UTC) FILETIME=[5792C690:01CA69E3]
Date: Fri, 20 Nov 2009 14:14:08 +0100
X-STA-Metric: 0 (engine=030)
X-STA-NotSpam: tlf: vedlagt skip:__ 40 fil cc:2**0
X-STA-Spam: header:MIME-Version: charset:us-ascii header:Subject:1 to:2**0 header:From:1
X-BTI-AntiSpam: score:0,sta:0/030,dnsbl:passed,sw:off,bsn:38/passed,spf:off,bsctr:passed/1,dk:off,pbmf:none,ipr:0/3,trusted:no,ts:no,bs:no,ubl:passed
X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply
Resent-Message-Id: <[email protected]>
Resent-Date: Fri, 20 Nov 2009 14:14:11 +0100 (CET)

--===============1703099044==
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="file.sdv"

DQpHUlVQUEVOQVZOICAgICAgICAgIDtLSthQRTtQUk9EQU5MO1BBS0tFTlI7TU9UVEFLTkFWTiAg
ICAgICAgICAgICAgICAgICAgO1NPTjtMQU5ESU5HU0RBO1NBTEdTREFUTyA7TkFTSiA7UkVEU0tB
UCAgIDtGSVNLRVNMQUcgO1BSRVNFUlYgICA7VElMU1RBTkQ7U1TYUlJFTFM7S1ZBTElURVQ7TUlO
U1RFUFJJUzsgICAgICAgIFZFUkRJOyAgICAgS1ZBTlRVTTsgICAgUlVORFZFS1QgICAgDQotLS0t
LS0tLS0tLS0tLS0tLS0tLTstLS0tLTstLS0tLS0tOy0tLS0tLS07LS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tOy0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS07LS0tLS0tLS0tLTst
LS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS07LS0tLS0tLS07LS0tLS0tLS07LS0tLS0tLS0t
LTstLS0tLS0tLS0tLS0tOy0tLS0tLS0tLS0tLTstLS0tLS0tLS0tLS0gICAgDQpMb3JlbnR6ZW4g
....

For those interested in an answer and not in the (ex) bounty, more clues.

Gmail is returning a short value in response to RFC822.SIZE, which can lead to truncated messages. (They are off by one byte for each header line, apparently not counting two characters for CR/LF.)
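To make the size mismatch concrete, here is a small sketch of the arithmetic. It assumes the behaviour described above (each header line's CRLF counted as one byte instead of two); that is a model of the symptom, not a confirmed description of Gmail's internals. The function name and the sample message are made up for illustration.

```php
<?php
/**
 * Size of a raw message if every CRLF in the header block were counted
 * as a single byte. Hypothetical model of the reported behaviour.
 */
function sizeCountingHeaderNewlinesAsOneByte(string $raw): int
{
    // The header block ends at the first blank line (CRLF CRLF).
    $headerEnd = strpos($raw, "\r\n\r\n");
    $headers = $headerEnd === false ? $raw : substr($raw, 0, $headerEnd + 2);

    // One byte is lost for each header line terminator.
    return strlen($raw) - substr_count($headers, "\r\n");
}

// Made-up message with three header lines.
$raw = "From: a@example.com\r\nTo: b@example.com\r\nSubject: test\r\n\r\nbody\r\n";

$actual   = strlen($raw);
$reported = sizeCountingHeaderNewlinesAsOneByte($raw);
printf("actual=%d reported=%d missing=%d\n", $actual, $reported, $actual - $reported);
```

Under this model, a client that trusts the reported size reads one byte too few per header line, so the tail of the last part goes missing.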

Erlineerlinna answered 16/3, 2011 at 7:27 Comment(4)
Why do you want to read it line-wise anyway?Gangrel
@Gangrel I don't read it, Zend Framework does it in Zend_Mail_Protocol_Imap. They are not using the imap functions and treat the imap as a stream, probably to avoid a library dependency.Erlineerlinna
You could remove the @ error suppression operator for testing purposes. Maybe a more conclusive error notice shows up.Gangrel
I know, that was the first thing I did :), but there is no actual PHP error. I suspect there is something wrong with the string.Erlineerlinna
A
5

I think you're looking in the wrong place.

The imap server gives you the mail message truncated, and then returns its status line TAG5 OK Success.

I don't see how your (/php's) handling of the socket would make a few kb worth of stream disappear, to magically fix the stream right before this status line.

So either the message is truncated by itself (have you verified the message contents through some other way?) or the imap server is just broken.
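The var_dump output in the question supports this reading: the 5-byte line is "LS)" plus CR/LF, and in an IMAP FETCH response the closing parenthesis comes from the server after the message data, followed by the tagged status line. A minimal sketch for making that visible is below; the helper names are hypothetical and the regular expression is a simplification of the IMAP grammar.

```php
<?php
// Show a line's byte count with CR/LF made visible.
function showLine(string $line): string
{
    $visible = strtr($line, ["\r" => '\r', "\n" => '\n']);
    return sprintf('%d bytes: %s', strlen($line), $visible);
}

// True if the line ends with ")", i.e. the server closed the FETCH response here.
function endsFetchResponse(string $line): bool
{
    return substr(rtrim($line, "\r\n"), -1) === ')';
}

// True for a tagged completion line such as "TAG5 OK Success" (simplified check).
function isTaggedStatus(string $line): bool
{
    return preg_match('/^[A-Za-z0-9]+ (OK|NO|BAD)\b/', $line) === 1;
}

// The last two lines from the var_dump in the question.
foreach (["LS)\r\n", "TAG5 OK Success\r\n"] as $line) {
    echo showLine($line), "\n";
    var_dump(endsFetchResponse($line), isTaggedStatus($line));
}
```

If the server itself sends ")" right after the truncated data, the client-side fgets call has read everything that was offered.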

The first things I would do, are:

  • move your project to a sufficiently quiet environment, where you can run strace -f -s 10240 -p <pid> on Apache's process to verify the socket interaction (assuming a Linux/Apache environment)
  • and/or: use tcpdump, ethereal or equivalent to check what's coming in on the line

My guess is that you will see the exact same truncated strings coming in on the wire. Meaning you can shift your focus to the imap server.

Reassuring yourself that you're looking in the right place can save a lot of time.

Antonetteantoni answered 25/3, 2011 at 21:17 Comment(2)
Didn't I ask the same? Why did you get the bounty?Avocado
The bounty was awarded by default to the highest-voted answer.Erlineerlinna
P
2

1: try removing the @ for more verbosity

2: try using http://www.php.net/manual/en/function.fread.php instead of fgets

This might have something to do with the IMAP server, because I see TAG5 OK Success as a response, even though it's not supposed to be there.

Petronia answered 25/3, 2011 at 12:40 Comment(1)
I tried both, with the same result. fgets is binary-safe as well. I suspect the server too, but what can I do to prevent it? I cannot debug Gmail... this is the 300-point question.Erlineerlinna
B
0

Have you tried issuing another fgets to see if you get the rest of the data? You may be retrieving a multi-part email, which would require multiple requests.

But regardless, you are using functions designed for file access on a network. Usually this works fine, but depending on the network, issues can arise. For example, you can use file_get_contents to retrieve a web page. But if the page issues a redirect, then it fails, whereas curl will be much more successful.

If you truly want to read the network socket, you should try socket_read. That is designed with the network in mind, like curl.

Bash answered 20/3, 2011 at 0:51 Comment(1)
One fgets is executed for each line in the _nextLine function. framework.zend.com/svn/framework/standard/trunk/library/Zend/… As I said, this is not a framework error; most of the time the emails come through fine.Erlineerlinna
F
0

Do not know Zend and forgot all about PHP but played with MIME and HTTP before (C++).

I suggest you start by finding a way to add a Content-Length header entry. It gives a hint to the "message decoder/loader" to expect a certain size in the content (message payload). (Not sure if IMAP does that.)

In the code above I would try to convince fgets to read a specific amount of expected data from the network. It could be that the data is buffered or not yet sent over the network (async communication) and fgets only reads an internal buffer thus stopping before the whole message was read.

  • To see if this is the case, send a small message that falls under your "584 breaking point".
  • Do some network tracing to see if all the data actually flows. (You would probably need to do some local setup.)
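The idea of reading a specific amount of data can be sketched as a loop that keeps calling fread until the expected byte count has arrived. This is an illustration only, not how Zend_Mail_Protocol_Imap is implemented: `readExactly` is a hypothetical helper, the in-memory stream stands in for the IMAP socket, and the sample response is made up. In IMAP the byte count comes from the literal marker `{N}` at the end of the response line, so if the server announces too small a number, a loop like this will still return truncated data.

```php
<?php
/**
 * Read up to $length bytes from a stream, looping over short reads.
 * Returns fewer bytes only on EOF, timeout, or error.
 */
function readExactly($stream, int $length): string
{
    $data = '';
    while (strlen($data) < $length) {
        $chunk = fread($stream, $length - strlen($data));
        if ($chunk === false || $chunk === '') {
            break; // nothing more available
        }
        $data .= $chunk;
    }
    return $data;
}

// Stand-in for the socket: a made-up FETCH response with a 10-byte literal.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "* 1 FETCH (RFC822 {10}\r\n0123456789)\r\nTAG5 OK Success\r\n");
rewind($stream);

$first = fgets($stream);                  // response line ending in {N}
$body  = '';
if (preg_match('/\{(\d+)\}\r?\n$/', $first, $m) === 1) {
    $body = readExactly($stream, (int) $m[1]);
}
$rest = fgets($stream);                   // closing ")" of the FETCH response

var_dump($body, $rest);
fclose($stream);
```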

The code you are referring to is here?

Fled answered 20/3, 2011 at 1:21 Comment(11)
Is there no limit on the amount of memory a PHP session can use? I guess an error would have been printed... right?Fled
Yes, this is the code; see the _nextLine() function. Using fgets with a length parameter made no difference; it was the first thing I tried. It is quite difficult to do a local setup that simulates Gmail correctly. Besides, it happens randomly. I will check the Content-Length header on Monday and let you know. I did not check this particular header because the same headers were present in good and bad emails.Erlineerlinna
Hmm... if we are only talking IMAP here, the header might not help. I have not analyzed the code yet to see whether this implementation really depends on the header. One suggestion is to get a local IMAP server and then do some comparisons. My idea for checking the header is to form a loop that reads until the expected size is read or you reach a timeout, to help "clear the network buffer".Fled
There is no Content-Length header. I've updated the post with the headers.Erlineerlinna
Ok, message seems fine. multipart/mixed; boundary= says the message parts should be parsed by this boundary.. so there is no reason not to complete. You are not running via a proxy? Did network trace show a socket disconnect?Fled
The message is truncated, as you can see in the var_dump part. The server where this script runs is not behind a proxy. I cannot do a network trace, or rather I don't know how to.Erlineerlinna
If you are not on SSL, you can use Wireshark to trace on port 143. See wiki.wireshark.org/IMAP. The best option is to have a local IMAP server to test against, but some providers, like Google, force you to use SSL. Perhaps even with SSL you can see whether the socket disconnects or expects more data on the wire.Fled
Another idea: Increase TIMEOUT_CONNECTIONFled
Another idea: Give PHP a bit more memory in php.ini. Does the size of the message have an impact?Fled
I will check the timeout tomorrow. I think it is 60 seconds for a default Zend HTTP client, but I don't know what is used internally. As for the size, I have 1 GB and it is only parsing one email; that should be a breeze, and running out of memory should trigger other kinds of errors.Erlineerlinna
There is a connection timeout of 30 seconds, but this is not the cause.Erlineerlinna
A
0

Most likely some of your server hardware is compromised, so you may want to replace it completely, or just replace the RAM modules or disk drives. I have some experience with web and mail encodings, and I can confirm that a base64 encoded string is very secure. At least it uses a texture mapping algorithm.

Avocado answered 25/3, 2011 at 10:38 Comment(4)
My server hardware? Meaning Gmail's servers?Erlineerlinna
You lost me but I use Gmail and IMAP all the time and I have no problem whatsoever.Avocado
Just a question: is file.sdv extension allowed with googlemail? If so can you be sure that the e-mails are not spam?Avocado
sdv is a CSV-like file type, semicolon-separated. It is allowed, and I receive it correctly most of the time, roughly 85%. It is not an attachment issue.Erlineerlinna
