Decoding quoted-printable messages in Swift

I have a quoted-printable string such as "The cost would be =C2=A31,000". How do I convert this to "The cost would be £1,000"?

I'm just converting text manually at the moment and this doesn't cover all cases. I'm sure there is just one line of code that will help with this.

Here is my code:

func decodeUTF8(message: String) -> String
{
    var newMessage = message.stringByReplacingOccurrencesOfString("=2E", withString: ".", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A2", withString: "•", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=C2=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9C", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A6", withString: "…", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9D", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=92", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=3D", withString: "=", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=20", withString: " ", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=99", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)

    return newMessage
}

Thanks

Greenebaum answered 24/8, 2015 at 14:26 Comment(2)
This isn't a complete solution, but I'd just like to make sure you've seen this answer to a slightly different problem: https://mcmap.net/q/135161/-base64-decoding-in-ios-7 - Teller
Base 64 encoding I'm good with, it's the text/plain; quoted-printable that I'm having a problem with. Thanks - Greenebaum
P
7

An easy way would be to utilize the (NS)String method stringByRemovingPercentEncoding for this purpose. This was observed in decoding quoted-printables, so the first solution is mainly a translation of the answers in that thread to Swift.

The idea is to replace the quoted-printable "=NN" encoding by the percent encoding "%NN" and then use the existing method to remove the percent encoding.

Continuation lines are handled separately. Also, percent characters in the input string must be encoded first, otherwise they would be treated as the leading character in a percent encoding.

func decodeQuotedPrintable(message : String) -> String? {
    return message
        .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
        .stringByReplacingOccurrencesOfString("=\n", withString: "")
        .stringByReplacingOccurrencesOfString("%", withString: "%25")
        .stringByReplacingOccurrencesOfString("=", withString: "%")
        .stringByRemovingPercentEncoding
}

The function returns an optional string which is nil for invalid input. Invalid input can be:

  • A "=" character which is not followed by two hexadecimal digits, e.g. "=XX".
  • A "=NN" sequence which does not decode to a valid UTF-8 sequence, e.g. "=E2=64".

Examples:

if let decoded = decodeQuotedPrintable("=C2=A31,000") {
    print(decoded) // £1,000
}

if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
    print(decoded) // “Hello … world!”
}
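
For readers on current toolchains, the same percent-encoding trick can be sketched with the renamed Foundation APIs. This is only a sketch assuming Swift 5 and UTF-8 input; the function name `decodeQuotedPrintableViaPercent` is made up for this illustration and is not part of Foundation.

```swift
import Foundation

/// Hypothetical helper: quoted-printable decoding via percent-decoding.
/// Assumes the quoted bytes are UTF-8.
func decodeQuotedPrintableViaPercent(_ message: String) -> String? {
    return message
        // Remove soft line breaks ("=" at the end of a line).
        .replacingOccurrences(of: "=\r\n", with: "")
        .replacingOccurrences(of: "=\n", with: "")
        // Protect literal percent signs before introducing new ones.
        .replacingOccurrences(of: "%", with: "%25")
        // Turn every "=NN" into "%NN" ...
        .replacingOccurrences(of: "=", with: "%")
        // ... and let Foundation decode the percent escapes as UTF-8.
        .removingPercentEncoding
}
```

Calling `decodeQuotedPrintableViaPercent("=C2=A31,000")` should give `"£1,000"`, and a literal percent sign as in `"100%=3D"` survives as `"100%="`.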

Update 1: The above code assumes that the message uses the UTF-8 encoding for quoting non-ASCII characters, as in most of your examples: C2 A3 is the UTF-8 encoding for "£", E2 80 A6 is the UTF-8 encoding for "…".

If the input is "Rub=E9n" then the message is using a single-byte encoding such as Windows-1252 (E9 is "é" there, but E9 followed by "n" is not a valid UTF-8 sequence). To decode that correctly, you have to replace

.stringByRemovingPercentEncoding

by

.stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding)

There are also ways to detect the encoding from a "Content-Type" header field, compare e.g. https://mcmap.net/q/1329943/-why-my-return-is-nil-but-if-i-press-the-url-in-chrome-safari-i-can-get-data.
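
As a rough illustration of the header-based idea, the charset parameter of a "Content-Type" value can be mapped to a `String.Encoding` by hand. This is a sketch in Swift 5 syntax, not the method from the linked thread; the function name `stringEncoding(fromContentType:)` and the small charset table are assumptions for this example, and a real message may use many more charset names.

```swift
import Foundation

/// Hypothetical helper: pick a String.Encoding from a Content-Type value
/// such as "text/plain; charset=UTF-8". Only a few charsets are mapped.
func stringEncoding(fromContentType header: String) -> String.Encoding? {
    // Skip the media type, then look at each "name=value" parameter.
    for part in header.split(separator: ";").dropFirst() {
        let pair = part.split(separator: "=", maxSplits: 1).map {
            $0.trimmingCharacters(in: .whitespaces)
        }
        guard pair.count == 2, pair[0].lowercased() == "charset" else { continue }
        // The value may be quoted, e.g. charset="utf-8".
        let name = pair[1]
            .trimmingCharacters(in: CharacterSet(charactersIn: "\""))
            .lowercased()
        switch name {
        case "utf-8": return .utf8
        case "iso-8859-1": return .isoLatin1
        case "windows-1252": return .windowsCP1252
        case "us-ascii": return .ascii
        default: return nil // Unknown charset: let the caller decide.
        }
    }
    return nil // No charset parameter present.
}
```

The result can then be passed as the encoding when decoding the quoted-printable body; when it is `nil`, falling back to UTF-8 is one possible choice.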


Update 2: The stringByReplacingPercentEscapesUsingEncoding method is marked as deprecated, so the above code will always generate a compiler warning. Unfortunately, it seems that no alternative method has been provided by Apple.

So here is a new, completely self-contained decoding method which does not cause any compiler warning. This time I have written it as an extension method for String. Explanatory comments are in the code.

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding. 
    /// - parameter encoding:     A string encoding. The default is UTF-8.
    /// - returns:                The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences. 
        return self
            .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
            .stringByReplacingOccurrencesOfString("=\n", withString: "")
            .decodeQuotedPrintableSequences(enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = rangeOfString("=", range: position ..< endIndex) {
            result.appendContentsOf(self[position ..< range.startIndex])
            position = range.startIndex

            // Decode one or more successive "=HH" sequences to a byte array:
            let bytes = NSMutableData()
            repeat {
                let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                if hexCode.characters.count < 2 {
                    return nil // Incomplete hex code
                }
                guard var byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.appendBytes(&byte, length: 1)
                position = position.advancedBy(3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.appendContentsOf(dec)
        }

        // Copy remaining characters to the result:
        result.appendContentsOf(self[position ..< endIndex])

        return result
    }
}

Example usage:

if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
    print(decoded) // Rubén
}

Update for Swift 4 (and later):

import Foundation

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding:     A string encoding. The default is UTF-8.
    /// - returns:                The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .replacingOccurrences(of: "=\r\n", with: "")
            .replacingOccurrences(of: "=\n", with: "")
            .decodeQuotedPrintableSequences(encoding: enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = range(of: "=", range: position..<endIndex) {
            result.append(contentsOf: self[position ..< range.lowerBound])
            position = range.lowerBound

            // Decode one or more successive "=HH" sequences to a byte array:
            var bytes = Data()
            repeat {
                let hexCode = self[position...].dropFirst().prefix(2)
                if hexCode.count < 2 {
                    return nil // Incomplete hex code
                }
                guard let byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.append(byte)
                position = index(position, offsetBy: 3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.append(contentsOf: dec)
        }

        // Copy remaining characters to the result:
        result.append(contentsOf: self[position ..< endIndex])

        return result
    }
}

Example usage:

if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
    print(decoded) // Rubén
}
Puca answered 28/9, 2015 at 16:22 Comment(5)
This is the kind of thing that I was looking at. I put it in my code to try it and was instantly hit with a problem. - Greenebaum
Sorry hit return too early... decodeQuotedPrintable("Rub=E9n") should return Rubén. I have tried this on motobit.com/util/quoted-printable-decoder.asp and this site decoded it OK. Any thoughts? - Greenebaum
@iphaaw: It depends on the encoding (or character set) which is used in the message. That online decoder seems to detect the encoding automatically, perhaps by trying different encodings. I have added some information to the answer, let me know if that helps. - Puca
Thank you that works but the compiler complains in 10.11 that stringByReplacingPercentEscapesUsingEncoding is deprecated. The docs rather unhelpfully don't suggest a replacement :-( - Greenebaum
Thank you very much for your in-depth answer. That's a very elegant solution and I hope it helps many other people too. Your bounty was well and truly deserved. Andrew - Greenebaum
R
1

This encoding is called 'quoted-printable'. What you need to do is convert the string to NSData using ASCII encoding, then iterate over the data, replacing every 3-character sequence like '=A3' with the byte 0xA3, and finally convert the resulting data back to a string using NSUTF8StringEncoding.
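
The byte-level procedure described above could look roughly like this. It is a sketch in Swift 5 syntax rather than tested production code; the names `decodeQuotedPrintableBytes` and `hexValue` are invented for this example, and soft line breaks are handled in the same loop as an extra assumption.

```swift
import Foundation

/// Hypothetical helper: value of one ASCII hex digit, or nil if not a hex digit.
func hexValue(_ byte: UInt8) -> UInt8? {
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"): return byte - UInt8(ascii: "0")
    case UInt8(ascii: "A")...UInt8(ascii: "F"): return byte - UInt8(ascii: "A") + 10
    case UInt8(ascii: "a")...UInt8(ascii: "f"): return byte - UInt8(ascii: "a") + 10
    default: return nil
    }
}

/// Hypothetical helper: decode quoted-printable text byte by byte.
func decodeQuotedPrintableBytes(_ input: String,
                                encoding: String.Encoding = .utf8) -> String? {
    // Quoted-printable text is pure ASCII; anything else is rejected here.
    guard let ascii = input.data(using: .ascii) else { return nil }
    let bytes = [UInt8](ascii)
    var output = [UInt8]()
    var i = 0
    while i < bytes.count {
        let b = bytes[i]
        if b != UInt8(ascii: "=") {
            output.append(b) // Ordinary character: copy as-is.
            i += 1
            continue
        }
        // Soft line break: "=" followed by LF or CRLF is dropped.
        if i + 1 < bytes.count, bytes[i + 1] == 0x0A { i += 2; continue }
        if i + 2 < bytes.count, bytes[i + 1] == 0x0D, bytes[i + 2] == 0x0A { i += 3; continue }
        // Otherwise "=" must be followed by two hex digits.
        guard i + 2 < bytes.count,
              let hi = hexValue(bytes[i + 1]),
              let lo = hexValue(bytes[i + 2]) else { return nil }
        output.append(hi << 4 | lo)
        i += 3
    }
    // Interpret the collected bytes in the message's character encoding.
    return String(bytes: output, encoding: encoding)
}
```

With the default UTF-8 encoding, `decodeQuotedPrintableBytes("The cost would be =C2=A31,000")` should return "The cost would be £1,000"; the two bytes C2 A3 are collected first and only then decoded together as a single character.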

Rodas answered 26/9, 2015 at 11:40 Comment(2)
That would sort of work but from my example you can see I'm getting two byte characters sometimes. I would have thought there would be a single line method I could call to do this more efficiently. BTW Thanks for pointing out the correct name of the encoding. Thanks - Greenebaum
You are getting 2 bytes for a single character because in UTF-8 encoding it takes two bytes. Only English letters, digits, commas and so on are encoded as one byte. - Rodas
C
1

Unfortunately, I'm a bit late with my answer. It might be helpful for others though.

var string = "The cost would be =C2=A31,000"

var finalString: String? = nil

if let regEx = try? NSRegularExpression(pattern: "={1}?([a-f0-9]{2}?)", options: NSRegularExpressionOptions.CaseInsensitive)
{
    // NSRange counts UTF-16 code units, so use utf16.count rather than characters.count.
    let intermediatePercentEscapedString = regEx.stringByReplacingMatchesInString(string, options: NSMatchingOptions.WithTransparentBounds, range: NSMakeRange(0, string.utf16.count), withTemplate: "%$1")
    print(intermediatePercentEscapedString)
    finalString = intermediatePercentEscapedString.stringByRemovingPercentEncoding
    print(finalString)
}
Camala answered 2/10, 2015 at 8:16 Comment(0)
Q
0

In order to give an applicable solution, some more information is required. So, I will make some assumptions.

In an HTTP or mail message, for example, you can apply one or more encodings to some kind of source data. For example, you could encode a binary file, e.g. a PNG file, with base64 and then zip it. The order is important.

In your example, as you say, the source data is a String and has been encoded via UTF-8.

In an HTTP message, your Content-Type is thus text/plain; charset=UTF-8. In your example, an additional encoding also seems to have been applied, a "Content-Transfer-Encoding": possibly quoted-printable or base64 (not sure about that, though).

In order to revert it back, you would need to apply the corresponding decodings in reverse order.

Hint:

You can view the headers (Content-Type and Content-Transfer-Encoding) of a mail message when viewing the raw source of the mail.

Qualified answered 26/9, 2015 at 11:50 Comment(1)
Base 64 encoding I'm good with, it's the text/plain; quoted-printable that I'm having a problem with. Thanks - Greenebaum
N
0

You can also look at this working solution - https://github.com/dunkelstern/QuotedPrintable

let result = QuotedPrintable.decode(string: quoted)
Northwestwards answered 5/7, 2016 at 21:33 Comment(0)
