In an OSS web app, we have JS code that performs an Ajax update (it uses jQuery, but that is not relevant here). After the page update, a call is made to the HTML5 history interface, History.pushState, in the following code:
    var updateHistory = function(url) {
        var context = { state:1, rand:Math.random() };
        /* -----> before the problem call <------- */
        History.pushState( context, "Questions", url );
        /* -----> after the problem call <------- */
        setTimeout(function (){
            /* HACK: For some weird reason, sometimes something overrides the above pushState so we re-apply it
               This might be caused by some other JS plugin.
               The delay of 10msec allows the other plugin to override the URL.
            */
            History.replaceState( context, "Questions", url );
        }, 10);
    };
[Please note: the full code segment is provided for context, the HACK part is not the issue of this question]
The app is i18n'ed and uses URL-encoded Unicode segments in its URLs, so just before the marked problem call in the above code, the url argument contains (as inspected in Firebug):
"/%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9/scope:all/sort:activity-desc/page:1/"
The encoded segment is UTF-8 in percent-encoding. The URL in the browser window (just for completeness, it doesn't really matter) is:
http://<base-url>/%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9/
Just after the call, the URL displayed in the browser window changes to:
http://<base-url>/%C3%98%C2%A7%C3%99%C2%84%C3%98%C2%A3%C3%98%C2%B3%C3%98%C2%A6%C3%99%C2%84%C3%98%C2%A9/scope:all/sort:activity-desc/page:1/
The URL encoded segment is just mojibake, the result of using the wrong encoding at some level. The correct URL would've been:
http://<base-url>/%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9/scope:all/sort:activity-desc/page:1/
This behavior has been tested on both FF and Chrome.
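The mojibake above can be reproduced outside the app. The following is a minimal sketch, assuming the bad URL comes from decoding each percent-encoded byte as a separate Latin-1 character and then re-encoding the result as UTF-8. The function name reencodeAsLatin1 is hypothetical, and this is not confirmed to be what any particular library does internally; it only shows that this sequence of steps yields the same output.

```javascript
// The correctly encoded segment, as passed to pushState.
var segment = "%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9";

// Hypothetical helper: unescape() turns every %XX into one character with
// that code (i.e. it treats the bytes as Latin-1), and encodeURI() then
// encodes each of those characters as UTF-8 again.
function reencodeAsLatin1(percentEncoded) {
    var misdecoded = unescape(percentEncoded);
    return encodeURI(misdecoded);
}

// For comparison: decodeURIComponent() interprets the bytes as UTF-8,
// so a decode/encode round trip gives back the original segment.
function reencodeAsUtf8(percentEncoded) {
    return encodeURIComponent(decodeURIComponent(percentEncoded));
}

console.log(reencodeAsLatin1(segment));
// → %C3%98%C2%A7%C3%99%C2%84%C3%98%C2%A3%C3%98%C2%B3%C3%98%C2%A6%C3%99%C2%84%C3%98%C2%A9
console.log(reencodeAsUtf8(segment) === segment);
// → true
```

Note that unescape is deprecated but still available in browsers; it is used here only because it decodes byte by byte, which is exactly the mistake being illustrated.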
The history interface specs don't mention anything about encoded URLs, but I assume the default standard for URL formation (UTF-8, percent-encoding, etc.) would apply when passing URLs to the interface's functions.
Any idea what's going on here?
Edit:
I wasn't paying attention to the uppercase H in History: this code is actually using the History.js wrapper for the history interface. I replaced it with a direct call to history.pushState (notice the lowercase h), without going through the wrapper, and the code works as expected as far as I can tell. The issue with the original code still stands, so it seems to be an issue with the History.js library.
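For reference, the direct call described in this edit could look roughly like the sketch below. The name updateHistoryNative and the optional nativeHistory parameter are assumptions added for illustration (the parameter only makes the function easy to try outside a browser); the original code may differ.

```javascript
// Hypothetical variant of updateHistory that bypasses the History.js wrapper
// and calls the browser's own history.pushState (lowercase h).
var updateHistoryNative = function (url, nativeHistory) {
    // nativeHistory is an assumption for testability; in a browser it
    // falls back to window.history.
    var h = nativeHistory || (typeof window !== "undefined" ? window.history : null);
    var context = { state: 1, rand: Math.random() };

    if (h && typeof h.pushState === "function") {
        // The URL is passed through unchanged, already percent-encoded.
        h.pushState(context, "Questions", url);
        return true;
    }
    // No native history API available; the caller decides what to do.
    return false;
};
```

In a page, it would be called as updateHistoryNative("/%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9/scope:all/sort:activity-desc/page:1/").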
[...]
%D____8 %A____7 %D____9 %8____4 %D____8 %A____3 %D____8 %B____3 %D____8 %A____6 %D____9 %8____4 %D____8 %A____9
and
%C3%9 8 %C2%A 7 %C3%9 9 %C2%8 4 %C3%9 8 %C2%A 3 %C3%9 8 %C2%B 3 %C3%9 8 %C2%A 6 %C3%9 9 %C2%8 4 %C3%9 8 %C2%A 9
. – Ichneumon

[...] %C3%9 for a %D and %C2%B for a %B, for example). But even then I couldn't find the exact algorithm or logic that the encode function is using, nor could I reproduce this buggy behavior in a controlled environment, either by calling "encodeURI" multiple times or by looking through the History.js source code. If you have any tips, I would love to hear them, just for the sake of learning. – Ichneumon

[...] window.encodeURI(window.unescape('%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9')) in your JS console; the output should be the same as History.js' output. My answer below goes into more detail, if that helps! – Placencia

[...] %D8. This is bigger than 7F, so it gets split into two bytes when encoded in UTF-8, formatted as 110xxxxx 10xxxxxx. (See the Wikipedia page on UTF-8 for why.) So first we convert it to binary and get 11011000. Now, substitute those bits for the xs in the format pattern, using leading zeroes to pad it out: 11000011 10011000. Convert to hex, and you'll get C3 98. Voilà! – Placencia
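
The bit manipulation described in the last comment can be sketched as follows. This is a minimal illustration of the standard two-byte UTF-8 pattern (110xxxxx 10xxxxxx) for code points from 80 to 7FF hex; the function names utf8TwoByte and toHex are hypothetical.

```javascript
// Encode a code point in the range 0x80..0x7FF as two UTF-8 bytes.
function utf8TwoByte(codePoint) {
    if (codePoint < 0x80 || codePoint > 0x7FF) {
        throw new RangeError("this sketch only handles two-byte UTF-8 sequences");
    }
    // Lead byte: 110 followed by the top 5 of the 11 payload bits.
    var lead = 0xC0 | (codePoint >> 6);
    // Continuation byte: 10 followed by the low 6 payload bits.
    var continuation = 0x80 | (codePoint & 0x3F);
    return [lead, continuation];
}

// Format bytes as uppercase hex separated by spaces.
function toHex(bytes) {
    return bytes.map(function (b) {
        return b.toString(16).toUpperCase();
    }).join(" ");
}

console.log(toHex(utf8TwoByte(0xD8))); // → C3 98
console.log(toHex(utf8TwoByte(0xA7))); // → C2 A7
```

Applied to D8 and A7, the first two bytes of the original segment, this gives C3 98 and C2 A7, which is how the mangled URL begins.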