Why would I choose simple over relaxed canonicalization for DKIM?

Asked 28/12, 2011 at 19:25 Answered 11/9, 2022 at 14:15

DKIM supports two canonicalization schemes: relaxed and simple. The former is more lenient and allows for intermediary mailers to modify the email to a limited degree.

The only data I could find is a survey of implementations showing that the vast majority of email senders use relaxed canonicalization for both headers and body. (Noticeably fewer use relaxed for the body, but it's still a definite majority.)

The DKIM specification says that all clients have to support both canonicalization forms if they support DKIM, so that doesn't seem like a major factor. Both schemes allow intermediaries to add headers. The only distinction I can tell is in the handling of the case of header names (not values) and the whitespace within a header. Given that, it seems like relaxed will always have at least as good deliverability, which is the aim of DKIM.

(Of course, if I want to actually sign my emails to attest to their contents, I'd use S/MIME and certificates. DKIM is strictly about deliverability, right?)

Mendelism answered 28/12, 2011 at 19:25 Comment(0)

I suppose that simple canonicalization is available as a choice for senders who wish to have a less computationally intensive signing method, at the possible cost of some deliverability. The difference in complexity isn't that much, but it might make an appreciable difference for large bulk senders.

Stupefy answered 28/12, 2011 at 19:30 Comment(0)

"good deliverability, which is the aim of DKIM...DKIM is strictly about deliverability, right?"

You seem to have a faulty premise. DKIM is not strictly about deliverability. The purpose of DKIM is to authenticate who a sender is. The DKIM RFC explains it quite well:

DomainKeys Identified Mail (DKIM) defines a domain-level authentication framework for email using public-key cryptography and key server technology to permit verification of the source and contents of messages by either Mail Transfer Agents (MTAs) or Mail User Agents (MUAs).

A DKIM-signed message is generally no more or less deliverable than an unsigned message. For example, SpamAssassin gives a message with a valid DKIM signature the same score as an unsigned message. If the message fails DKIM validation, then, as you'd expect, it gets a worse score.

What DKIM provides is the ability to reliably determine if a message purporting to be from Bank of America truly is. If it claims so, but the DKIM signature in the message fails validation, then I can know the message is a forgery, or a legit message that's been tampered with. In either case, it should not be delivered.

Now, whether or not I want to receive messages from that DKIM authenticated domain, and whether or not the content in the message is desirable is a whole different issue, and that has a much more profound impact upon deliverability than DKIM.

Exclamatory answered 1/5, 2013 at 20:14 Comment(3)

Words, words, words, but this doesn't address the question at all! – Lilley 21/1, 2016 at 20:27

Try re-reading the original post and see if there's another question in there you missed. Perhaps it's one that my answer specifically addresses. – Exclamatory 22/1, 2016 at 18:0

You are taking a pedantic and unhelpful view of the word "deliverability". In the sense the asker clearly intended, it means "make my mail less subject to being rejected as false/spam/etc". This is being achieved by signing it to prove who the sender is. The question then, is does "relaxed" or "simple" have an effect on that goal. – Godred 14/5, 2021 at 17:12

I don't think the difference has much to do with CPU load: the differences will be insignificant compared to the time required to generate, sign, and send a message over SMTP, which you're going to have to do anyway; there's no escaping Amdahl's law.

The big payoff for relaxed canonicalization is robustness. With simple canonicalization, even the tiniest difference, including one that does not affect the content, can cause validation to fail. For example:

A mail client that uses LF instead of CRLF (as PHP's mail() function does on Linux) sends through another server that converts the line endings to CRLF (as it should).
An intermediate mail server re-folds headers to a different line length.
A spam filter changes the case of a message header label, e.g. Content-Type -> content-type.

None of these will cause validation problems with relaxed canonicalization. So I'd say the angle is not what using simple canonicalization saves you from doing, it's what advantages you gain from doing the additional work required by relaxed canonicalization, and that's why it is more popular.
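The header cases above can be illustrated with a minimal Python sketch of the "relaxed" header algorithm from RFC 6376, section 3.4.2. The function names relaxed_header and simple_header are made up for this illustration and do not come from any library. The sketch assumes a single CRLF-terminated header field and covers only the re-folding and header-name case examples, not the bare-LF one.

```python
import re


def simple_header(field: str) -> str:
    """"simple" header canonicalization: the field is used unchanged."""
    return field


def relaxed_header(field: str) -> str:
    """"relaxed" header canonicalization for one CRLF-terminated field."""
    name, value = field.split(":", 1)
    # Unfold continuation lines by dropping the CRLFs inside the value.
    value = value.replace("\r\n", "")
    # Collapse every run of spaces/tabs to a single space.
    value = re.sub(r"[ \t]+", " ", value)
    # Lowercase the name, trim whitespace around the colon and at the end,
    # and terminate the field with exactly one CRLF.
    return name.strip(" \t").lower() + ":" + value.strip(" \t") + "\r\n"


# Hypothetical header as signed (folded, mixed-case name) ...
signed = "Content-Type: text/plain;\r\n\tcharset=utf-8\r\n"
# ... and as it might look after a relay re-folds it and lowercases the name.
relayed = "content-type: text/plain; charset=utf-8\r\n"

print(relaxed_header(signed) == relaxed_header(relayed))  # True
print(simple_header(signed) == simple_header(relayed))    # False
```

Both versions reduce to the same canonical bytes under relaxed, so the data fed to the signature check is identical; under simple the two inputs differ byte for byte.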

Hypothyroidism answered 2/9, 2020 at 13:10 Comment(0)

Intro:

I've spent quite some time analyzing the question of "relaxed" vs. "simple" canonicalization methods with respect to two measures:

Measure 1: Compatibility. The ability of exchanging mail servers to receive & compare data presented in a standardized format negating the need to "fix" things upon receipt...
Measure 2: Security. That security is not reduced to any degree based on the choice for Measure 1.

RFC 6376 gives express guidance on the "relaxed" & "simple" choices in turn each for the HEADER and then for the BODY.
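Because the header and body algorithms are chosen separately, the c= tag of the DKIM-Signature header names them as "header/body". A rough Python sketch of reading that tag, assuming the RFC 6376 defaults (no tag means "simple/simple", and a single name applies to the header with "simple" for the body); parse_canonicalization is a hypothetical helper name, not a library function:

```python
VALID_ALGORITHMS = {"simple", "relaxed"}


def parse_canonicalization(c_tag=None):
    """Return (header_algorithm, body_algorithm) for a c= tag value."""
    if c_tag is None:
        # Tag absent: both header and body default to "simple".
        return ("simple", "simple")
    header, _, body = c_tag.strip().partition("/")
    # Only one algorithm named: it applies to the header, body is "simple".
    body = body or "simple"
    if header not in VALID_ALGORITHMS or body not in VALID_ALGORITHMS:
        raise ValueError(f"unknown canonicalization: {c_tag!r}")
    return (header, body)


print(parse_canonicalization("relaxed/relaxed"))  # ('relaxed', 'relaxed')
print(parse_canonicalization("relaxed"))          # ('relaxed', 'simple')
print(parse_canonicalization())                   # ('simple', 'simple')
```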

It's important to note that any changes to formatting of either/both the header or body are made BEFORE being presented for signing.

It's further important that we agree about our understanding of how hashes work when considering this question: That ANY change to a message after it's been signed by the sending mail server will result in the hash being recomputed differently when it's tested by the receiving mail server causing it to fail. Were it otherwise there simply would be no point signing the message if subsequent changes could not change the hash.

HEADER:

RFC 6376 section 3.4 expressly directs:

3.4.1. The "simple" Header Canonicalization Algorithm ... does not change header fields in any way.

However: Section 3.4.2 DOES impose formatting requirements for "relaxed" and prescribes how and in what order such formatting will be done BEFORE the message is signed.

BODY:

3.4.3. ...the "simple" body canonicalization algorithm converts "*CRLF" at the end of the body to a single "CRLF".

However: Once again, the "relaxed" method in Section 3.4.4 requires the same formatting as the "simple" method, but imposes the following additional requirements:

Ignore all whitespace at the end of lines. Implementations MUST NOT remove the CRLF at the end of the line.

Reduce all sequences of WSP within a line to a single SP character.
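The two body rule sets quoted above can be sketched in Python as follows. This is only an illustration under the assumption that the body already uses CRLF line endings; simple_body, relaxed_body and body_hash are names invented here, and body_hash assumes the SHA-256 variant, where the bh= tag carries the base64 of the hash of the canonicalized body.

```python
import base64
import hashlib
import re


def simple_body(body: bytes) -> bytes:
    """"simple": reduce any run of trailing CRLFs to exactly one CRLF."""
    while body.endswith(b"\r\n"):
        body = body[:-2]
    # An empty body, or one lacking a final CRLF, gets a single CRLF added.
    return body + b"\r\n"


def relaxed_body(body: bytes) -> bytes:
    """"relaxed": also normalize whitespace inside and at the end of lines."""
    lines = body.split(b"\r\n")
    # Collapse runs of spaces/tabs to one space, then drop end-of-line space.
    lines = [re.sub(rb"[ \t]+", b" ", line).rstrip(b" ") for line in lines]
    # Ignore empty lines at the end of the body.
    while lines and lines[-1] == b"":
        lines.pop()
    # Every remaining line keeps its CRLF; an empty body stays empty.
    return b"".join(line + b"\r\n" for line in lines)


def body_hash(canonical_body: bytes) -> str:
    """Base64 of the SHA-256 digest of the canonicalized body."""
    return base64.b64encode(hashlib.sha256(canonical_body).digest()).decode()


sample = b"Hello  world \t\r\nBye\r\n\r\n\r\n"
print(simple_body(sample))   # b'Hello  world \t\r\nBye\r\n'
print(relaxed_body(sample))  # b'Hello world\r\nBye\r\n'
print(body_hash(simple_body(sample)) == body_hash(relaxed_body(sample)))  # False
```

The hash is always computed over the canonical form, so the same message yields different bh= values depending on which body algorithm is selected.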

Analysis:

Measure 1: Compatibility:

It would appear that if all mail servers used "relaxed/relaxed", they could all expect to receive mail presented with a standardized format of both the body and headers. I would consider this a "GOOD" thing, unless somebody can provide a rationale for how the RFC could be flawed.

Measure 2 Security:

Once a message is signed (and whatever changes the sender makes to the formatting of the header and/or body are made BEFORE they sign the message), any later modifications will cause the hash to be computed differently and not match when tested by the receiving mail server.

Thus, whether "relaxed" or "simple" is chosen as the canonicalization method, once that message leaves the sender's mail server any changes to either/both the header and/or body either by the receiver or any forwarding mail server in between will cause the match of the hash to fail.

The only way the message could now pass "DKIM" validation would be to simply ignore that the hash has changed since it left the sender.

Conclusion:

The view that "relaxed" can allow a message which has been changed AFTER it has been signed and left the sender's mail server to pass validation is mistaken: Either the hash matches, or it does not. It cannot allow for any difference, "tiny" or otherwise. The hash is being tested and that is a binary test, yes or no: there are no shades of grey when comparing hashes.

From reading RFC 6376, it would appear that if DKIM senders all were to use "relaxed/relaxed", we would all receive messages with standardized, uniform formatting, which appears to be a big plus with respect to compatibility/interoperability. And since any change to a message under either canonicalization method after it leaves the sending server should fail testing, there doesn't appear to be any downside I can see to using "relaxed/relaxed" if it enhances uniformity of data being exchanged between different systems.

Studley answered 11/9, 2022 at 14:15 Comment(0)