An easy way would be to use the (NS)String method
stringByRemovingPercentEncoding for this purpose.
This was observed in the thread "decoding quoted-printables",
so the first solution is mainly a translation of the answers in
that thread to Swift.
The idea is to replace the quoted-printable "=NN" encoding by the
percent encoding "%NN" and then use the existing method to remove
the percent encoding.
Continuation lines are handled separately.
Also, percent characters in the input string must be encoded first,
otherwise they would be treated as the leading character in a percent
encoding.
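To make the order of the steps concrete, here is a minimal sketch that applies them one at a time to a sample string. It uses the current Foundation spellings (replacingOccurrences(of:with:), removingPercentEncoding) rather than the Swift 2 names used elsewhere in this answer, and the sample input is made up for illustration.

```swift
import Foundation

// Made-up sample: a literal "%", an encoded "£", and a soft line break.
let input = "100% =C2=A3=\r\n1,000"

// Step 1: remove soft line breaks ("=" directly followed by a line ending).
let noSoftBreaks = input
    .replacingOccurrences(of: "=\r\n", with: "")
    .replacingOccurrences(of: "=\n", with: "")
// noSoftBreaks is now "100% =C2=A31,000"

// Step 2: protect literal percent characters before "=" becomes "%".
let escapedPercent = noSoftBreaks.replacingOccurrences(of: "%", with: "%25")
// escapedPercent is now "100%25 =C2=A31,000"

// Step 3: turn every quoted-printable "=NN" into a percent escape "%NN".
let percentEncoded = escapedPercent.replacingOccurrences(of: "=", with: "%")
// percentEncoded is now "100%25 %C2%A31,000"

// Step 4: let Foundation decode the percent escapes (as UTF-8).
let decoded = percentEncoded.removingPercentEncoding

print(decoded ?? "invalid input") // 100% £1,000
```

If step 2 were skipped, the literal "%" in the sample would be read as the start of a percent escape, which is why it has to come before step 3.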
    func decodeQuotedPrintable(message : String) -> String? {
        return message
            .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
            .stringByReplacingOccurrencesOfString("=\n", withString: "")
            .stringByReplacingOccurrencesOfString("%", withString: "%25")
            .stringByReplacingOccurrencesOfString("=", withString: "%")
            .stringByRemovingPercentEncoding
    }
The function returns an optional string which is nil
for invalid input.
Invalid input can be:
- A "=" character which is not followed by two hexadecimal digits,
e.g. "=XX".
- A "=NN" sequence which does not decode to a valid UTF-8 sequence,
e.g. "=E2=64".
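The two failure cases can be checked with a short sketch. The function name decodeQuotedPrintableViaPercent is hypothetical; it is only a respelling of the chain described above with current Foundation method names, and the nil results noted in the comments are the ones this answer describes for invalid input.

```swift
import Foundation

// Hypothetical respelling of the percent-encoding approach with
// current Foundation method names.
func decodeQuotedPrintableViaPercent(_ message: String) -> String? {
    return message
        .replacingOccurrences(of: "=\r\n", with: "")
        .replacingOccurrences(of: "=\n", with: "")
        .replacingOccurrences(of: "%", with: "%25")
        .replacingOccurrences(of: "=", with: "%")
        .removingPercentEncoding
}

// "=" not followed by two hexadecimal digits: expected to be nil.
let badHex = decodeQuotedPrintableViaPercent("=XX")

// Bytes E2 64 are not a valid UTF-8 sequence: expected to be nil.
let badUTF8 = decodeQuotedPrintableViaPercent("=E2=64")

// Well-formed input decodes to a string.
let valid = decodeQuotedPrintableViaPercent("=C2=A31,000")

print(String(describing: badHex))
print(String(describing: badUTF8))
print(String(describing: valid))
```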
Examples:
    if let decoded = decodeQuotedPrintable("=C2=A31,000") {
        print(decoded) // £1,000
    }
    if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
        print(decoded) // “Hello … world!”
    }
Update 1: The above code assumes that the message uses the UTF-8
encoding for quoting non-ASCII characters, as in most of your examples:
C2 A3 is the UTF-8 encoding for "£", and E2 80 A6
is the UTF-8 encoding for "…".
If the input is "Rub=E9n" then the message is using the
Windows-1252 encoding.
To decode that correctly, you have to replace
    .stringByRemovingPercentEncoding
by
    .stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding)
There are also ways to detect the encoding from a "Content-Type"
header field, compare e.g. https://mcmap.net/q/1329943/-why-my-return-is-nil-but-if-i-press-the-url-in-chrome-safari-i-can-get-data.
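As a rough illustration of that idea, the sketch below pulls the charset parameter out of a "Content-Type" header value and maps it to a String.Encoding. The helper name encoding(fromContentType:) and the small lookup table are assumptions made for this example; the table is deliberately not exhaustive, and real header parsing has more edge cases than this handles.

```swift
import Foundation

/// Hypothetical helper: returns the encoding named by the "charset"
/// parameter of a Content-Type header value, or nil if the parameter is
/// missing or the name is not in the small table below.
func encoding(fromContentType header: String) -> String.Encoding? {
    // Illustrative subset of charset names only.
    let known: [String: String.Encoding] = [
        "utf-8": .utf8,
        "us-ascii": .ascii,
        "iso-8859-1": .isoLatin1,
        "windows-1252": .windowsCP1252,
    ]
    // Skip the media type itself, then look at each "name=value" parameter.
    for part in header.split(separator: ";").dropFirst() {
        let pair = part.split(separator: "=", maxSplits: 1).map {
            $0.trimmingCharacters(in: .whitespaces)
        }
        guard pair.count == 2, pair[0].lowercased() == "charset" else { continue }
        // The value may be quoted; charset names are case-insensitive.
        let name = pair[1]
            .trimmingCharacters(in: CharacterSet(charactersIn: "\""))
            .lowercased()
        return known[name]
    }
    return nil
}

let detected = encoding(fromContentType: "text/plain; charset=\"windows-1252\"")
print(detected == .windowsCP1252) // true
```

The detected value could then be handed to whichever decoding call accepts an encoding.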
Update 2: The stringByReplacingPercentEscapesUsingEncoding
method is marked as deprecated, so the above code will always generate
a compiler warning. Unfortunately, it seems that no alternative method
has been provided by Apple.
So here is a new, completely self-contained decoding method which
does not cause any compiler warning. This time I have written it
as an extension method for String. Explanatory comments are in the
code.
    extension String {

        /// Returns a new string made by removing in the `String` all "soft line
        /// breaks" and replacing all quoted-printable escape sequences with the
        /// matching characters as determined by a given encoding.
        /// - parameter encoding: A string encoding. The default is UTF-8.
        /// - returns: The decoded string, or `nil` for invalid input.
        func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {

            // Handle soft line breaks, then replace quoted-printable escape sequences.
            return self
                .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
                .stringByReplacingOccurrencesOfString("=\n", withString: "")
                .decodeQuotedPrintableSequences(enc)
        }

        /// Helper function doing the real work.
        /// Decode all "=HH" sequences with respect to the given encoding.
        private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {

            var result = ""
            var position = startIndex

            // Find the next "=" and copy characters preceding it to the result:
            while let range = rangeOfString("=", range: position ..< endIndex) {
                result.appendContentsOf(self[position ..< range.startIndex])
                position = range.startIndex

                // Decode one or more successive "=HH" sequences to a byte array:
                let bytes = NSMutableData()
                repeat {
                    let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                    if hexCode.characters.count < 2 {
                        return nil // Incomplete hex code
                    }
                    guard var byte = UInt8(hexCode, radix: 16) else {
                        return nil // Invalid hex code
                    }
                    bytes.appendBytes(&byte, length: 1)
                    position = position.advancedBy(3)
                } while position != endIndex && self[position] == "="

                // Convert the byte array to a string, and append it to the result:
                guard let dec = String(data: bytes, encoding: enc) else {
                    return nil // Decoded bytes not valid in the given encoding
                }
                result.appendContentsOf(dec)
            }

            // Copy remaining characters to the result:
            result.appendContentsOf(self[position ..< endIndex])

            return result
        }
    }
Example usage:
    if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
        print(decoded) // £1,000
    }
    if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
        print(decoded) // “Hello … world!”
    }
    if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
        print(decoded) // Rubén
    }
Update for Swift 4 (and later):
    extension String {

        /// Returns a new string made by removing in the `String` all "soft line
        /// breaks" and replacing all quoted-printable escape sequences with the
        /// matching characters as determined by a given encoding.
        /// - parameter encoding: A string encoding. The default is UTF-8.
        /// - returns: The decoded string, or `nil` for invalid input.
        func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {

            // Handle soft line breaks, then replace quoted-printable escape sequences.
            return self
                .replacingOccurrences(of: "=\r\n", with: "")
                .replacingOccurrences(of: "=\n", with: "")
                .decodeQuotedPrintableSequences(encoding: enc)
        }

        /// Helper function doing the real work.
        /// Decode all "=HH" sequences with respect to the given encoding.
        private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {

            var result = ""
            var position = startIndex

            // Find the next "=" and copy characters preceding it to the result:
            while let range = range(of: "=", range: position..<endIndex) {
                result.append(contentsOf: self[position ..< range.lowerBound])
                position = range.lowerBound

                // Decode one or more successive "=HH" sequences to a byte array:
                var bytes = Data()
                repeat {
                    let hexCode = self[position...].dropFirst().prefix(2)
                    if hexCode.count < 2 {
                        return nil // Incomplete hex code
                    }
                    guard let byte = UInt8(hexCode, radix: 16) else {
                        return nil // Invalid hex code
                    }
                    bytes.append(byte)
                    position = index(position, offsetBy: 3)
                } while position != endIndex && self[position] == "="

                // Convert the byte array to a string, and append it to the result:
                guard let dec = String(data: bytes, encoding: enc) else {
                    return nil // Decoded bytes not valid in the given encoding
                }
                result.append(contentsOf: dec)
            }

            // Copy remaining characters to the result:
            result.append(contentsOf: self[position ..< endIndex])

            return result
        }
    }
Example usage:
    if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
        print(decoded) // £1,000
    }
    if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
        print(decoded) // “Hello … world!”
    }
    if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
        print(decoded) // Rubén
    }