PHP: Gmail's messages contain invalid HTML and random jargon

I'm creating an email-based CMS with PHP, and I'm required to use Gmail as the email service. The script is insanely simple for now, and the only problem I'm having is dealing with Gmail's email syntax.

I was expecting something a bit more manageable, like this, when getting an email:

<u>asfasfasf</u> <u style="font-style: italic;">asdfaf</u> <u style="font-style: italic; font-weight: bold;">asfsaf</u> asfasf <a href="http://asfasfafs">asfasf</a>
<br />
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent sodales mauris quis nisl pellentesque eleifend. Sed convallis turpis quis turpis malesuada feugiat. Fusce sed metus non orci convallis congue. Integer egestas vulputate ipsum, sed fringilla velit elementum scelerisque. Pellentesque convallis metus sit amet enim faucibus adipiscing.

But I'm getting this instead (duck and cover):

<u>asfasfasf </u><u style=3D"font-style: italic; ">asdfaf =A0</u><u style=
=3D"font-style: italic; font-weight: bold; ">asfsaf </u>asfasf <a href=3D"h=
ttp://asfasfafs">asfasf</a><div><br></div><div><meta http-equiv=3D"content-=
type" content=3D"text/html; charset=3Dutf-8"><span class=3D"Apple-style-spa=
n" style=3D"font-family: Arial, Helvetica, sans; font-size: 11px; "><p styl=
e=3D"text-align: justify; font-size: 11px; line-height: 14px; margin-top: 0=
px; margin-right: 0px; margin-bottom: 14px; margin-left: 0px; padding-top: =
0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; ">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent sodales m=
auris quis nisl pellentesque eleifend. Sed convallis turpis quis turpis mal=
esuada feugiat. Fusce sed metus non orci convallis congue. Integer egestas =
vulputate ipsum, sed fringilla velit elementum scelerisque. Pellentesque co=
nvallis metus sit amet enim faucibus adipiscing.</p>
</span>

I tried Tidy, but it can't cope with Gmail's links and 'line breaks'. The breaks are just an = at the end of each line, which completely messes up Tidy, and the links sometimes (at random, I think) look like this: <a href=3D"http://asfasfafs">asfasf</a>, with those =\n sequences right in the middle!

How would I train Tidy to deal with this sort of blasphemous HTML and output something I can pipe directly into a <div> inside of a website?

Thanks!

Flashover answered 6/12, 2010 at 18:50 Comment(0)

That looks like quoted-printable encoding. Check the message's "Content-Transfer-Encoding:" header to see whether the body is encoded (for example as base64 or quoted-printable), and decode it before trying to parse the content.
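
As a rough illustration of that advice, here is a minimal sketch in PHP. The helper name decode_mail_body() and its parameters are made up for this example; it assumes you have already extracted the body text and the value of the Content-Transfer-Encoding header yourself. It relies only on PHP's built-in quoted_printable_decode() and base64_decode().

```php
<?php
// Hypothetical helper (not part of PHP or any library): decode a message body
// according to the value of its Content-Transfer-Encoding header.
function decode_mail_body(string $body, string $encoding): string
{
    switch (strtolower(trim($encoding))) {
        case 'quoted-printable':
            // Turns =3D back into "=" and drops the "=" soft line breaks.
            return quoted_printable_decode($body);
        case 'base64':
            // Non-strict mode ignores the line breaks inside the encoded text.
            return (string) base64_decode($body);
        default:
            // 7bit, 8bit, binary, or unknown: leave the body untouched.
            return $body;
    }
}

// A fragment shaped like the one in the question, with a soft line break
// in the middle of the URL.
$raw = "<a href=3D\"h=\nttp://asfasfafs\">asfasf</a>";
echo decode_mail_body($raw, 'quoted-printable'), "\n";
```

The decoded result should then be ordinary HTML that Tidy can handle. Note that this only undoes the transfer encoding; it does not sanitize the HTML or convert its character set.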

Aspergillum answered 6/12, 2010 at 18:54 Comment(1)
Thank you! PHP does have a function for this (quoted_printable_decode()). I'm playing with it right now, and I'm getting valid HTML! - Flashover
