PHP - Filtering Email Body, Removing Reply Quotes

I'm working on an email piping script that needs to save only the reply content, not the original quoted email. I'm using a MIME parser class (http://www.phpclasses.org/package/3169-PHP-Decode-MIME-e-mail-messages.html) to get all the information I need from the email:

Message ID: [email protected]
Reply ID: [email protected]

Subject: Re: MessageX
To:  [email protected]
From: Someone [email protected]

Body: Hello,
Blah Blah Blah
-Someone

On Wed, Mar 16, 2011 at 3:52 PM,  <[email protected]> wrote:
> Hello,
>
> Some other blah, blah, blah.
>
> Thank you,
> Me

In the body section, I'm also getting the original quoted email. How can I filter it out? I know email clients often prefix quoted lines with ">", but I'm not sure that alone is a reliable enough signal. Thanks for your help.
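The naive version of the ">" idea can be sketched as a line filter that drops every line starting with ">". This is a minimal sketch assuming a plain-text body; the function name dropQuotedLines is hypothetical. It also shows the weakness of the idea: the "On ... wrote:" attribution line survives, and any legitimate line beginning with ">" would be lost.

```php
<?php
/**
 * Hypothetical helper: drop every line whose first non-blank character
 * is ">" (the usual quote prefix) and keep everything else.
 */
function dropQuotedLines(string $body): string
{
    // Split on LF or CRLF so both line-ending styles are handled.
    $lines = preg_split('/\r?\n/', $body);
    $kept = array_filter($lines, function (string $line): bool {
        return !preg_match('/^\s*>/', $line);
    });
    // Rejoin with LF; the attribution line ("On ... wrote:") is NOT removed.
    return trim(implode("\n", $kept));
}
```

For the sample body in the question, this leaves "Hello, / Blah Blah Blah / -Someone" followed by the "On Wed, ... wrote:" line.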

Mckenzie asked 16/3, 2011 at 22:0
It sounds a little like you're building some sort of customer-support system that pulls email replies into it. I've often seen a string like "=============REPLY ABOVE THIS LINE==================" placed in the original email sent to the "customer"; it is easy to find in the reply, and cutting everything from it onward removes all of the quoted text. This may not be what you're trying to do at all, but it might be a valid option for you. - Vallee
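The marker approach from this comment could look roughly like the sketch below. It assumes you control the outgoing email and that the marker survives in the quoted copy (possibly behind a "> " prefix); the constant REPLY_MARKER and the function stripBelowMarker are hypothetical names, and treating a trailing "... wrote:" line as the attribution is an added heuristic.

```php
<?php
// Hypothetical marker placed at the top of outgoing notification emails.
const REPLY_MARKER = '=============REPLY ABOVE THIS LINE==================';

/**
 * Return only the text the customer typed above the marker.
 * Assumes the marker appears in the quoted copy of the original email.
 */
function stripBelowMarker(string $body, string $marker = REPLY_MARKER): string
{
    $pos = strpos($body, $marker);
    if ($pos === false) {
        return trim($body); // No marker found: leave the body untouched.
    }
    // Cut at the start of the line containing the marker, so a quote
    // prefix such as "> " is dropped along with it.
    $lineStart = strrpos(substr($body, 0, $pos), "\n");
    $reply = $lineStart === false ? '' : substr($body, 0, $lineStart);
    // Heuristic: drop a trailing attribution line such as "On <date>, <address> wrote:".
    $reply = preg_replace('/\n[^\n]*wrote:\s*$/', '', rtrim($reply));
    return trim($reply);
}
```

Because the cut happens at a fixed, known string, this avoids guessing which ">" lines are quotes; the trade-off is that it only works for replies to emails you sent with the marker.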

This might be doable with a regular expression. Try:

$text = preg_replace('#(^\w.+:\n)?(^>.*(\n|$))+#mi', "", $text);
Oversew answered 16/3, 2011 at 22:12
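The answer's one-liner can be wrapped in a small function, sketched below. The function name stripQuotedReply and the CRLF normalisation are assumptions, not part of the answer: many messages use "\r\n" line endings, and then the attribution part (^\w.+:\n) does not match "wrote:\r\n", because the colon is followed by "\r" rather than "\n".

```php
<?php
/**
 * Strip quoted reply text from a plain-text body using the answer's regex.
 * Hypothetical wrapper; the CRLF-to-LF step is an added assumption.
 */
function stripQuotedReply(string $text): string
{
    // Normalise Windows line endings so "\n" in the pattern matches.
    $text = str_replace("\r\n", "\n", $text);
    // Optional "On ... wrote:" line, then one or more lines starting with ">".
    $text = preg_replace('#(^\w.+:\n)?(^>.*(\n|$))+#mi', '', $text);
    return trim($text);
}
```

Applied to the question's sample body (with CRLF endings), this returns just "Hello,", "Blah Blah Blah" and "-Someone".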
Thanks for the quick answer; it's working well so far. Are there any cases where this might not work? - Mckenzie
There are certainly email clients that allow or use a literal > in message text, and the same happens if you email source code, diffs, etc. To make the regex a bit more resilient, you could replace the last + with {2,}. It will then only match runs of at least two consecutive > lines, which is a much stronger sign of a quoted part. - Oversew
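The {2,} tweak from the comment above can be sketched as a variant function (stripQuotedReplyStrict is a hypothetical name; the CRLF normalisation is again an added assumption). The test case shows the difference: a reply with a single ">" line is left alone.

```php
<?php
/**
 * Variant of the answer's regex with the final "+" replaced by "{2,}":
 * a block only counts as quoted when at least two consecutive lines
 * start with ">".
 */
function stripQuotedReplyStrict(string $text): string
{
    $text = str_replace("\r\n", "\n", $text);
    $text = preg_replace('#(^\w.+:\n)?(^>.*(\n|$)){2,}#mi', '', $text);
    return trim($text);
}
```

With the original "+", the body "Run this:\n> make install\nThen restart." would lose both its first two lines (the "Run this:" line is taken for an attribution); the strict version returns it unchanged, while a real two-line quote is still removed.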
There won't ever be a perfect solution to this problem. It's possible (although unlikely) for the email to contain content that looks like the replied-to section. Consider lines 10 to 13 of this snippet (I modified your original example): in all likelihood, those lines would end up being removed. - Nyx
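A small illustration of this point, using the answer's regex unchanged on a made-up reply that legitimately contains ">"-prefixed lines (a pasted shell session):

```php
<?php
// Made-up reply: the ">" lines are shell commands, not quoted email text.
$reply = "Here is what I ran:\n> cd /var/www\n> ls\nIt printed nothing. Any idea?";

// The answer's regex, unchanged.
$stripped = preg_replace('#(^\w.+:\n)?(^>.*(\n|$))+#mi', '', $reply);

// The "Here is what I ran:" line and both commands are removed,
// even though none of them were quoted from an earlier email.
echo $stripped, "\n"; // It printed nothing. Any idea?
```

Replacing the final + with {2,} does not help here, since the commands span two lines.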
Alright, I think it's working pretty well. One question: what do the # symbols do in the regex? I couldn't find any information on them. Thanks a lot for your help. - Mckenzie
@davishmcclurg, the # characters are just alternative regex delimiters: #...# is equivalent to /.../. Using # has the advantage that a / inside the pattern doesn't need escaping. - Oversew
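A quick check of the delimiter point from the comment above: PHP's preg_* functions take the first character of the pattern string as the delimiter, so the same pattern behaves identically with # or /. The sample text and URL are made up for illustration.

```php
<?php
$text = "Thanks!\n\nOn Wed, Mar 16, 2011 at 3:52 PM, <x@example.com> wrote:\n> Hello,\n> Me";

// Same pattern, two different delimiters.
$withHash  = preg_replace('#(^\w.+:\n)?(^>.*(\n|$))+#mi', '', $text);
$withSlash = preg_replace('/(^\w.+:\n)?(^>.*(\n|$))+/mi', '', $text);
var_dump($withHash === $withSlash); // bool(true)

// The choice matters when the pattern contains "/": with "#" it needs
// no escaping, with "/" as the delimiter it must be written as "\/".
var_dump(preg_match('#^https?://#', 'http://example.com'));   // int(1)
var_dump(preg_match('/^https?:\/\//', 'http://example.com')); // int(1)
```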
@Oversew This works perfectly, but can you show more regexes for other popular email clients, like Yahoo, Rediff, Outlook, etc.? - Fungi
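Clients that don't prefix quoted lines with ">" need a different signal. One possible approach, sketched here under stated assumptions, is to cut the body at the first line matching any of a list of separator patterns. The two default patterns (a Gmail-style "On ... wrote:" line and the "-----Original Message-----" separator used by some Outlook versions) are illustrative assumptions, not a verified list for Yahoo, Rediff, Outlook or any other client; collect real messages from each client before relying on them. The function name cutAtSeparator is hypothetical.

```php
<?php
/**
 * Cut a reply body at the first line matching any separator pattern.
 * The default patterns are illustrative assumptions only.
 */
function cutAtSeparator(string $body, ?array $patterns = null): string
{
    $patterns = $patterns ?? [
        '/^On .+wrote:$/m',                       // e.g. "On Wed, Mar 16, 2011 at 3:52 PM, <...> wrote:"
        '/^-{2,}\s*Original Message\s*-{2,}$/mi', // e.g. "-----Original Message-----"
    ];
    $body = str_replace("\r\n", "\n", $body);
    $cut = strlen($body);
    foreach ($patterns as $pattern) {
        // PREG_OFFSET_CAPTURE stores the byte offset of the match in $m[0][1].
        if (preg_match($pattern, $body, $m, PREG_OFFSET_CAPTURE) && $m[0][1] < $cut) {
            $cut = $m[0][1];
        }
    }
    // Keep everything before the earliest separator found.
    return trim(substr($body, 0, $cut));
}
```

Keeping the patterns in a list makes it easy to add a new client's separator once you have seen real examples, without touching the cutting logic.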
