@njlarsson has already explained well what to do:
The conversion you report from Firefox, Cla%CC%81ssico, did this slightly differently: it changed the á into a followed by U+0301, the COMBINING ACUTE ACCENT character. In UTF-8, U+0301 makes 0xCC 0x81.
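As a quick check of the quoted claim, here is a minimal PHP sketch comparing the two spellings of the same word. It assumes PHP 7 or later for the `\u{...}` escape syntax, and the variable names are mine, not from the question.

```php
<?php
// Precomposed form: U+00E1 LATIN SMALL LETTER A WITH ACUTE.
$precomposed = "Cl\u{00E1}ssico";
// Decomposed form: plain "a" followed by U+0301 COMBINING ACUTE ACCENT.
$decomposed = "Cla\u{0301}ssico";

// UTF-8 bytes of the combining accent on its own.
echo bin2hex("\u{0301}"), "\n";        // cc81

// rawurlencode() percent-encodes every non-ASCII byte of the string.
echo rawurlencode($precomposed), "\n"; // Cl%C3%A1ssico
echo rawurlencode($decomposed), "\n";  // Cla%CC%81ssico
```

Both results are valid percent-encoded UTF-8; they differ only in which Unicode form the text was in before encoding.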
More generally I wanted to know why and how that's correct, so here's my thinking.
Why might one be motivated to do this?
Beyond the original motivation, that a Spanish user should not need to know anything about encoding or decoding (unless they're an engineer or developer tasked with fixing broken implementations), another example can be found in the Google JavaScript style guide, whose advice applies independently of programming language:
Tip: Never make your code less readable simply out of fear that some programs might not handle non-ASCII characters properly. If that happens, those programs are broken and they must be fixed.
At a high level, using percent sign (%) encoding in the URL is consistent with IETF RFC 1738 Section 2.2. Note that the RFC doesn't say which character encoding the % escapes represent, though by convention the web is UTF-8, as can be seen from Firefox and Chrome's correct behaviour back in 2013.
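A minimal sketch of that two-step reading, using the file name from the question: the % escapes only yield raw bytes, and treating those bytes as UTF-8 is a separate, conventional step. `mb_check_encoding` needs the mbstring extension, and the variable names are illustrative.

```php
<?php
$escaped = "LA-MAR_Cebiche-Cla%CC%81ssico_foto-Henrique-Peron-470x120-1371827671.jpg";

// rawurldecode() only turns %XX escapes back into raw bytes;
// it does not know which character encoding those bytes use.
$bytes = rawurldecode($escaped);

// Interpreting the bytes as UTF-8 is the convention, checked here explicitly.
var_dump(mb_check_encoding($bytes, "UTF-8"));           // bool(true)

// The decoded name contains "a" followed by U+0301 COMBINING ACUTE ACCENT.
var_dump(strpos($bytes, "Cla\u{0301}ssico") !== false); // bool(true)
```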
Where this breaks down is that in PHP (and so in WordPress), it's likely the file name string is not encoded in UTF-8. Which encoding it is in instead is a natural question.
Encoding, decoding and re-encoding
The string could be provided encoded in UTF-8 initially, decoded to some internal format, perhaps UCS-2LE (which can make some string operations faster but breaks others, such as handling emoji 😉, which are encoded outside the Basic Multilingual Plane), and then re-encoded as UTF-8 for printing.
Continuing in PHP, for example using mb_convert_encoding, which may require php-mbstring to be installed for the PHP CLI or server:
php > $encoded = "http://www.themediacouncil.com/test/nonascii/LA-MAR_Cebiche-Cla%CC%81ssico_foto-Henrique-Peron-470x120-1371827671.jpg";
php > $decoded = mb_convert_encoding($encoded, "UTF-8", "UCS-2LE");
php > $reencoded = mb_convert_encoding($decoded, "UCS-2LE", "UTF-8");
php > echo $reencoded;
http://www.themediacouncil.com/test/nonascii/LA-MAR_Cebiche-Cla%CC%81ssico_foto-Henrique-Peron-470x120-1371827671.jpg
Or the string might not initially be encoded in UTF-8 at all. That will depend on things like where it came from, which aren't provided here.
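To illustrate that second possibility, here is a sketch that assumes, purely hypothetically, the name arrived as ISO-8859-1 (Latin-1). Nothing in the question confirms this. It relies on mbstring, and the variable names are illustrative.

```php
<?php
// Hypothetical input: "Clássico" in ISO-8859-1, where á is the single byte 0xE1.
$latin1 = "Cl\xE1ssico";

// A lone 0xE1 byte followed by "s" is not a valid UTF-8 sequence.
var_dump(mb_check_encoding($latin1, "UTF-8")); // bool(false)

// Argument order is (string, to_encoding, from_encoding).
$utf8 = mb_convert_encoding($latin1, "UTF-8", "ISO-8859-1");

var_dump(mb_check_encoding($utf8, "UTF-8"));   // bool(true)
echo rawurlencode($utf8), "\n";                // Cl%C3%A1ssico
```

Checking validity before percent-encoding is one way to notice that a string is not UTF-8, although a check like this cannot tell you which encoding it actually is.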
Aside: The $decoded string is likely to be nonsense if naively printed, which looks a bit like the Python 2 "mojibake" problem:
php > echo $decoded; # UCS-2LE printed naively likely shows nonsense
瑨灴⼺眯睷琮敨敭楤捡畯据汩挮浯琯獥⽴潮慮捳楩䰯ⵁ䅍归敃楢档ⵥ汃╡䍃㠥猱楳潣晟瑯ⵯ效牮煩敵倭牥湯㐭〷ㅸ〲ㄭ㜳㠱㜲㜶⸱灪?
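For comparison, here is a small sketch of a UTF-8 to UCS-2LE round trip with the argument order spelled out, since `mb_convert_encoding` takes the target encoding before the source encoding. The variable names are mine, and the last lines reproduce the first character of the nonsense output above by reading two ASCII bytes as one UCS-2LE unit.

```php
<?php
$original = "Cla\u{0301}ssico"; // UTF-8: 10 bytes for 9 code points

// UTF-8 -> UCS-2LE: every code point becomes exactly two bytes.
$internal = mb_convert_encoding($original, "UCS-2LE", "UTF-8");
echo strlen($original), " ", strlen($internal), "\n"; // 10 18

// UCS-2LE -> UTF-8: back to the original bytes.
$roundTrip = mb_convert_encoding($internal, "UTF-8", "UCS-2LE");
var_dump($roundTrip === $original); // bool(true)

// Bytes read under the wrong encoding become mojibake:
// ASCII "ht" (0x68 0x74) read as little-endian UCS-2 is U+7468.
echo mb_convert_encoding("ht", "UTF-8", "UCS-2LE"), "\n"; // 瑨

// Code points outside the Basic Multilingual Plane, such as U+1F609,
// have no UCS-2 representation, so this is not expected to round-trip.
$emoji = "\u{1F609}";
$emojiTrip = mb_convert_encoding(
    mb_convert_encoding($emoji, "UCS-2LE", "UTF-8"), "UTF-8", "UCS-2LE");
var_dump($emojiTrip === $emoji);
```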
How to perform the UTF-8 conversion?
The precise low-level details and mathematics, assuming one is curious enough to think about how the computer physically represents the data as binary or hexadecimal, can be found elsewhere on StackOverflow.
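As a rough illustration of those details, the sketch below encodes a single code point by hand using the standard UTF-8 bit layout. `encodeCodePointAsUtf8` is a hypothetical helper written for this answer, not a PHP or mbstring function, and it does no validation (surrogates and values above U+10FFFF are not rejected).

```php
<?php
// Hypothetical helper for illustration only; not a library function.
function encodeCodePointAsUtf8(int $cp): string
{
    if ($cp < 0x80) {      // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {     // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// U+0301 = binary 01100 000001 -> 110_01100 10_000001 -> 0xCC 0x81
echo bin2hex(encodeCodePointAsUtf8(0x0301)), "\n";  // cc81
echo bin2hex(encodeCodePointAsUtf8(0x00E1)), "\n";  // c3a1
echo bin2hex(encodeCodePointAsUtf8(0x1F609)), "\n"; // f09f9889
```

Percent-encoding then just writes each of those bytes as `%` followed by two hex digits, which is where `%CC%81` in the URL comes from.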