Microsoft IP addresses are visiting 'broken' links on my web app after we send a valid link to one of their hosted email users. I validated this by checking more than 15,000 requests against 6,924 Microsoft subnets.
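For anyone wanting to repeat that check, here is a rough Python sketch of the idea, not the exact script I ran. The subnets and request IPs are placeholder documentation ranges, and `is_in_subnets` is just a hypothetical helper name.

```python
import ipaddress

# Placeholder CIDR ranges (reserved documentation ranges) standing in for
# the real list of Microsoft subnets, which is not reproduced here.
SUBNETS = [ipaddress.ip_network(c) for c in ("192.0.2.0/24", "198.51.100.0/24")]


def is_in_subnets(ip: str, subnets=SUBNETS) -> bool:
    """Return True if the address falls inside any of the given subnets."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in subnets)


# Placeholder request IPs, e.g. as pulled from access logs.
request_ips = ["192.0.2.17", "203.0.113.5"]
matches = [ip for ip in request_ips if is_in_subnets(ip)]
print(matches)
```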
After viewing some other posts, I found that their "Advanced Threat Protection" (ATP) service routinely visits links in incoming emails, but doesn't use an identifiable User Agent.
We send emails to users with a 'magic'-style unique / secure link to access content on our platform. Unfortunately, this is a public-facing product, so we're not sending to people who have a homogeneous IT configuration (could be any OS/browser, any email program, any email host, etc.).
The format of the URL is: https://domain.tld/email/[parameters-encoded-as-base64-string]
Basically, I'm using a stringified JSON object and converting it to base64.
Original/correct object:
{
"companyID": 63, // example companyID
"videoID": "CA220502FR", // example videoID
"log_click": 1,
"userID": 123456 // example userID
}
when converted to base64, becomes ->
eyJjb21wYW55SUQiOiA2MywidmlkZW9JRCI6ICJDQTIyMDUwMkZSIiwibG9nX2NsaWNrIjoxInVzZXJJRCI6IDEyMzQ1Nn0=
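To make the encoding step concrete, here is a minimal Python sketch of how such a link could be built and parsed. It is an illustration under assumptions, not our production code: `build_link` and `parse_token` are hypothetical names, standard base64 (with `=` padding) is assumed because that is what the strings here look like, and the exact token will differ from the one above depending on JSON whitespace.

```python
import base64
import json

BASE_URL = "https://domain.tld/email/"


def build_link(params: dict, base_url: str = BASE_URL) -> str:
    """Serialize params to JSON, base64-encode it, and append it to the URL."""
    raw = json.dumps(params).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return base_url + token


def parse_token(token: str) -> dict:
    """Reverse of build_link: base64-decode the token and parse the JSON."""
    return json.loads(base64.b64decode(token))


params = {"companyID": 63, "videoID": "CA220502FR", "log_click": 1, "userID": 123456}
link = build_link(params)
print(link)
```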
The problem:
I routinely see an issue coming up in our bug tracker, where a user visits / clicks a link, but the base64 encoded code/string is 'corrupted'...
e.g.
eyJpdmVmYlZFIjoiREI1NTM4MzVBRSIsImRiemNiYWxWRSI6OTYsInlidF9keXZkeCI6NCwiaGZmZVZFIjo0MzcyN30=
What is particularly interesting is that the code is not scrambled entirely: only the alphanumeric characters are changed, while the JSON punctuation (braces, commas, quotes, etc.) is intact.
So the above 'correct' object, when decoded from a "corrupted" string (which is still valid base64, however) such as
eyJpdmVmYlZFIjoiREI1NTM4MzVBRSIsImRiemNiYWxWRSI6OTYsInlidF9keXZkeCI6NCwiaGZmZVZFIjo0MzcyN30=
ends up like:
{
"ivefbVE":"DB553835AE", // videoID
"dbzcbalVE":96, // companyID
"ybt_dyvdx":4, // log_click
"hffeVE":2924 // userID
}
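A quick way to inspect such a string is to decode it and compare the keys against the ones the app expects. This is a small sketch: `EXPECTED_KEYS` is my own name for the four keys of the original object, and the check itself is only an illustration.

```python
import base64
import json

corrupted = (
    "eyJpdmVmYlZFIjoiREI1NTM4MzVBRSIsImRiemNiYWxWRSI6OTYs"
    "InlidF9keXZkeCI6NCwiaGZmZVZFIjo0MzcyN30="
)

# The string is still valid base64 and still valid JSON after decoding.
decoded = json.loads(base64.b64decode(corrupted))

# Keys the application expects in a well-formed link.
EXPECTED_KEYS = {"companyID", "videoID", "log_click", "userID"}
unexpected = sorted(set(decoded) - EXPECTED_KEYS)
print(unexpected)
```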
So, I'm seeing that something is parsing and changing the alphanumeric components of a base64 string as follows:
KEYS:
videoID -> becomes -> ivefbVE
companyID -> becomes -> dbzcbalVE
userID -> becomes -> hffeVE
log_click -> becomes -> ybt_dyvdx
VALUES:
CA220502FR -> becomes -> DB553835AE
63 -> becomes -> 96
1 -> becomes -> 4
??? -> becomes -> 2924 // I don't know which user this is originating from
Conclusions:
- The process conserves the case (upper/lower) of the character.
- It affects both [a-zA-Z] and [0-9] but not punctuation.
- After some creative visualization of the conversions, I found something really interesting:
There is a relatively simple rule to follow to 'encode' the text:
Letters between a -> f get shifted +1 in alphabetical position (wrapping around, since F -> A in the videoID value above)
Letters between g -> m get shifted +13 in alphabetical position
Letters between n -> z get shifted -13 in alphabetical position
Digits get shifted +3 (63 -> 96, 1 -> 4)
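The rule can be written as a short Python sketch that reproduces every key and value listed above. The function names are mine, the +3 digit shift is inferred from the value examples (63 -> 96, 1 -> 4), and wrapping digits past 9 is an assumption, since no example shows it.

```python
def scramble_char(ch: str) -> str:
    """Apply the observed substitution to a single character."""
    if "0" <= ch <= "9":
        # Inferred from 63 -> 96 and 1 -> 4; wrap-around past 9 is an assumption.
        return str((int(ch) + 3) % 10)
    lower = ch.lower()
    if "a" <= lower <= "f":
        # +1, wrapping f -> a (F -> A in CA220502FR -> DB553835AE).
        shifted = chr((ord(lower) - ord("a") + 1) % 6 + ord("a"))
    elif "g" <= lower <= "m":
        shifted = chr(ord(lower) + 13)
    elif "n" <= lower <= "z":
        shifted = chr(ord(lower) - 13)
    else:
        return ch  # punctuation and everything else is left untouched
    # Case is conserved, as noted in the conclusions.
    return shifted.upper() if ch.isupper() else shifted


def scramble(text: str) -> str:
    """Apply the substitution to every character of a string."""
    return "".join(scramble_char(c) for c in text)


for original in ("videoID", "companyID", "userID", "log_click", "CA220502FR", "63"):
    print(original, "->", scramble(original))
```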
Because that process is not reversible (for example, both a and o become b), it doesn't seem like it's meant to be an 'encoding' or 'encryption' of the text, but more like a character set or base issue...
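The non-reversibility can be checked mechanically. The sketch below applies the three letter rules to the lowercase alphabet (with the a -> f range wrapping f -> a, as the F -> A example suggests) and lists which outputs have more than one source letter and which letters are never produced. `shift_letter` is a hypothetical helper name.

```python
import string
from collections import defaultdict


def shift_letter(c: str) -> str:
    """Letter rules only, for a lowercase ASCII letter."""
    if c <= "f":
        return chr((ord(c) - ord("a") + 1) % 6 + ord("a"))  # +1, f wraps to a
    if c <= "m":
        return chr(ord(c) + 13)
    return chr(ord(c) - 13)


# Group source letters by the letter they are mapped to.
preimages = defaultdict(list)
for c in string.ascii_lowercase:
    preimages[shift_letter(c)].append(c)

# Outputs reachable from more than one input, and letters never produced.
collisions = {out: srcs for out, srcs in preimages.items() if len(srcs) > 1}
unreachable = sorted(set(string.ascii_lowercase) - set(preimages))

print(collisions)
print(unreachable)
```

Under these rules every output in a -> f has two possible sources, and n -> s never appear in the output, so a scrambled string cannot be mapped back unambiguously.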
Questions:
What sort of application/process would scramble the parameters, but not the structure of the JSON object syntax around them? I suspect something like a Norton Chrome extension or Outlook extension that tries to defeat email trackers used in marketing emails, but we are a subscription platform, so that shouldn't apply to us.
Does anyone see a relationship between the before/after of the keys that might hint as to what kind of hashing/scrambling/modification process they're going through?