How do I transcode a Javascript string to ISO-8859-1?

I'm writing a Chrome extension that works with a website that uses ISO-8859-1. Just to give some context, what my extension does is make posting in the site's forums quicker by adding a more convenient post form. The value of the textarea where the message is written is then sent through an Ajax call (using jQuery).

If the message contains characters like á, these characters appear as Ã¡ in the posted message. Forcing the browser to display UTF-8 instead of ISO-8859-1 makes the á appear correctly.
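A minimal sketch of how this kind of garbling arises, assuming the message leaves the browser as UTF-8 and the site reads the bytes back as ISO-8859-1: á is the two UTF-8 bytes 0xC3 0xA1, and ISO-8859-1 maps each byte to the code point with the same value, so one character turns into two (Ã¡). The helper name `latin1View` is hypothetical, used only for illustration.

```javascript
// Hypothetical helper: encode a string as UTF-8, then read every byte back
// as one ISO-8859-1 character (in ISO-8859-1, byte value == code point).
function latin1View(text) {
  const utf8Bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  let result = "";
  for (const byte of utf8Bytes) {
    result += String.fromCharCode(byte);
  }
  return result;
}

console.log(latin1View("á")); // "Ã¡"
```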

It is my understanding that Javascript uses UTF-8 for its strings, so it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem. However, there seems to be no direct way to do this transcoding in Javascript, and I can't touch the server-side code. Any advice?

I've tried setting the created form to use iso-8859-1 like this:

var form = document.createElement("form");
form.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";

And also:

var form = document.createElement("form");
form.encoding = "ISO-8859-1";

But that doesn't seem to work.

EDIT:

The problem actually lay in how jQuery was URL-encoding the message (or something along the way). I fixed it by telling jQuery not to process the data and doing the encoding myself, as shown in the following snippet:

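function cfaqs_post_message(msg) {
  var url = cfaqs_build_post_url();
  // escape() turns characters up to U+00FF into single %XX bytes (ISO-8859-1),
  // but it leaves "+" alone, which a form parser would read as a space.
  msg = escape(msg).replace(/\+/g, "%2B");
  $.ajax({
    type: "POST",
    url: url,
    processData: false, // send the pre-encoded string untouched
    data: "message=" + msg + "&post=Preview+Message",
    success: function(html) {
      // ...
    },
    dataType: "html",
    contentType: "application/x-www-form-urlencoded"
  });
}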
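This fix works because `escape()` percent-encodes every character below U+0100 as a single `%XX` byte, which is exactly its ISO-8859-1 value, while `encodeURIComponent()` always emits UTF-8. A small comparison, wrapping the same two calls in a hypothetical `latin1FormValue` helper. Keep in mind that `escape()` is deprecated and that characters outside ISO-8859-1 come out as `%uXXXX`, which a form parser will not understand.

```javascript
// Hypothetical helper mirroring the snippet above: escape() yields ISO-8859-1
// bytes for U+0000..U+00FF but leaves "+" as-is, so "+" is encoded explicitly.
function latin1FormValue(value) {
  return escape(value).replace(/\+/g, "%2B");
}

console.log(latin1FormValue("á"));    // "%E1"    (one ISO-8859-1 byte)
console.log(encodeURIComponent("á")); // "%C3%A1" (two UTF-8 bytes)
console.log(latin1FormValue("a+b"));  // "a%2Bb"
console.log(latin1FormValue("€"));    // "%u20AC" (not valid form encoding)
```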
Volta asked 17/2/2010 at 19:37
Comment from Austroasiatic: How are you submitting the message (e.g., a full example of the failing AJAX code)?

It is my understanding that Javascript uses UTF-8 for its strings

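No. JavaScript strings are sequences of UTF-16 code units, not UTF-8. A byte encoding such as UTF-8 or ISO-8859-1 only comes into play when the string is turned into bytes, for example when it is sent to the server.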
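A quick way to see this, assuming a runtime with `TextEncoder` (modern browsers and Node.js): the string itself holds UTF-16 code units, and UTF-8 bytes only appear once you explicitly encode it.

```javascript
const s = "á";
console.log(s.length);                      // 1 (a single UTF-16 code unit)
console.log(s.charCodeAt(0).toString(16));  // e1 (code point U+00E1)
console.log(Array.from(new TextEncoder().encode(s))); // [195, 161], i.e. 0xC3 0xA1 in UTF-8
console.log("\u{1F600}".length);            // 2 (outside the BMP: a surrogate pair)
```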

Each page declares its charset encoding in a meta tag, just inside the head element:

<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

or

<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"/>

Besides that, each page file must actually be saved in the encoding it declares. Otherwise, it will not work as expected.

It is also a good idea to declare the charset encoding on the server side.

Java
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

PHP
header("Content-Type: text/html; charset=UTF-8");

C#
I do not know how to...

It can also be a good idea to declare the charset of each script file that contains sensitive characters (á, é, í, ó, ú and so on):

<script type="text/javascript" charset="UTF-8" src="/PATH/TO/FILE.js"></script>

...

So it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem

Not necessarily.

The target server may decode requests in a charset other than the one you expect. For instance, Tomcat decodes request parameters as ISO-8859-1 by default, no matter how you set up your page. So, on the server side, you may have to set the request encoding to match how you set up your page.

Java
request.setCharacterEncoding("UTF-8"); // must be called before reading any parameter

PHP
// I do not know how to...
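To illustrate why the server-side setting matters, here is a rough sketch of what a server does with a URL-encoded value: it percent-decodes the value to raw bytes and then interprets those bytes in some charset. The function `decodeFormValue` is hypothetical, written in JavaScript purely for illustration, and it only handles the two charsets discussed here.

```javascript
// Hypothetical sketch: decode one application/x-www-form-urlencoded value
// using the given charset ("utf-8" or "iso-8859-1").
function decodeFormValue(value, charset) {
  const text = value.replace(/\+/g, " "); // "+" means space in form data
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === "%" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits
    } else {
      bytes.push(text.charCodeAt(i)); // assumes unencoded characters are ASCII
    }
  }
  if (charset === "iso-8859-1") {
    // ISO-8859-1: every byte maps to the code point with the same value.
    return String.fromCharCode(...bytes);
  }
  return new TextDecoder("utf-8").decode(new Uint8Array(bytes));
}

console.log(decodeFormValue("%C3%A1", "utf-8"));      // "á"
console.log(decodeFormValue("%C3%A1", "iso-8859-1")); // "Ã¡" (charset mismatch)
console.log(decodeFormValue("%E1", "iso-8859-1"));    // "á"
```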

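If you really want to change the target charset encoding, TRY the following:

// isInternetExplorer is a placeholder for your own check; older IE needs "encoding".
// enctype is an enumerated attribute, so browsers may ignore the "; charset=..." part.
if (isInternetExplorer) {
    formElement.encoding = "application/x-www-form-urlencoded; charset=ISO-8859-1";
} else {
    formElement.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";
}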

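Alternatively, you can work with each character's numeric representation in the Unicode Character Set. Writing characters as \u escapes keeps your script ASCII-only, so it works regardless of the file's charset encoding. For instance, á in the Unicode Character Set is U+00E1, written \u00E1 in a JavaScript string.

alert("á without its Unicode Character Set numerical representation");
// Returns the "\uXXXX" escape text for every UTF-16 code unit in value.
function convertToUnicodeCharacterSet(value) {
    var result = "";
    for (var i = 0; i < value.length; i++) {
        var hex = value.charCodeAt(i).toString(16).toUpperCase();
        result += "\\u" + ("000" + hex).slice(-4);
    }
    return result;
}
alert("á numerical representation in Unicode Character Set is: " + convertToUnicodeCharacterSet("á"));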

You can use this link as a guideline (see JavaScript escapes).

Added to the original answer: how I implement the jQuery functionality.

var dataArray = $(formElement).serializeArray();
var pairs = [];
for (var i = 0; i < dataArray.length; i++) {
    // encodeURIComponent() always produces UTF-8 percent-encodings.
    pairs.push(encodeURIComponent(dataArray[i]["name"]) + "=" + encodeURIComponent(dataArray[i]["value"]));
}
var queryString = pairs.join("&");
$.ajax({
    url: "url.htm",
    data: queryString,
    contentType: "application/x-www-form-urlencoded; charset=UTF-8",
    success: function(response) {
        // process response
    }
});

It works fine without any headache.
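In current browsers (and Node.js), the manual loop can likely be replaced by `URLSearchParams`, which applies the standard form-urlencoded rules: UTF-8 bytes, with spaces written as `+`. A sketch, where the field list is made-up example data shaped like `serializeArray()` output:

```javascript
// Example data shaped like jQuery's serializeArray() output.
const dataArray = [
  { name: "message", value: "México, Perú" },
  { name: "post", value: "Preview Message" },
];
const params = new URLSearchParams();
for (const field of dataArray) {
  params.append(field.name, field.value); // percent-encoded as UTF-8 by toString()
}
const queryString = params.toString();
console.log(queryString);
// message=M%C3%A9xico%2C+Per%C3%BA&post=Preview+Message
```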

Regards,

Sherlocke answered 20/2/2010 at 18:08
Comment from Volta: Thanks for the informative answer. I'm marking it as correct even though this was not exactly the solution; my post didn't really give enough information to show the real issue. (I only found out about that after banging my head against the wall for a few more hours.)
Comment from Sherlocke: @Marcos Marin I added content to the original answer.
Comment from Hynda: For C#: <%@ Page RequestEncoding="utf-8" ResponseEncoding="utf-8" %>

I had a very similar problem. I needed to pass a URL parameter using jQuery to make an Ajax call, and most of the time the parameter values included accents.

Both pages had to be set to charset=ISO-8859-1, but JavaScript's functions (encodeURI, encodeURIComponent, etc.) only use UTF-8.

What I did was create a link in the original page that included all the parameters without any encoding, for example:

var myLink = document.getElementById("myHiddenLink");
myLink.setAttribute("href", "México, Perú, María and any other words with accents and spaces");

and then assign the href value to a variable, like this:

var theLink = myLink.getAttribute("href");

In the end, the value of the theLink variable was ISO-8859-1 encoded, and everything worked just fine.
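If the server really expects ISO-8859-1, another option is to percent-encode the ISO-8859-1 bytes yourself instead of relying on `encodeURIComponent`. A minimal sketch; the function name `encodeLatin1Component` is hypothetical, and it throws on characters that ISO-8859-1 cannot represent rather than silently corrupting them.

```javascript
// Hypothetical helper: percent-encode a string as ISO-8859-1 bytes for an
// application/x-www-form-urlencoded body or query string.
function encodeLatin1Component(str) {
  let out = "";
  for (const ch of str) {
    const code = ch.codePointAt(0);
    if (code > 0xff) {
      throw new RangeError("Not representable in ISO-8859-1: " + ch);
    }
    if (/[A-Za-z0-9*\-._]/.test(ch)) {
      out += ch; // left as-is by form encoding
    } else if (ch === " ") {
      out += "+"; // space becomes "+" in form encoding
    } else {
      out += "%" + code.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(encodeLatin1Component("México, Perú")); // "M%E9xico%2C+Per%FA"
```

Compared with `escape()`, this fails loudly on characters such as € instead of producing `%u20AC`.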

Spacing answered 22/10/2013 at 8:03

You can now decode ISO-8859-1 bytes into a string using TextDecoder:

const decoded = new TextDecoder('windows-1252').decode(encoded)

Note that, per the Encoding Standard, the label ISO-8859-1 is treated as windows-1252, which differs from ISO-8859-1 only in bytes 0x80 to 0x9F. For more, check out https://developer.mozilla.org/en-US/docs/Web/API/Encoding_API/Encodings
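This works in one direction only: `TextDecoder` turns ISO-8859-1 bytes into a string, but `TextEncoder` can only produce UTF-8. A small sketch of the decoding side with example bytes; outside browsers, it assumes a Node.js build with full ICU (the default for official builds) so that `windows-1252` is available.

```javascript
// Example bytes: "México" in ISO-8859-1 (é is the single byte 0xE9).
const encoded = new Uint8Array([0x4d, 0xe9, 0x78, 0x69, 0x63, 0x6f]);
const decoded = new TextDecoder("windows-1252").decode(encoded);
console.log(decoded); // "México"

// TextEncoder always emits UTF-8, so é becomes two bytes.
console.log(Array.from(new TextEncoder().encode("é"))); // [195, 169]
```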

Kutuzov answered 16/6/2022 at 6:59
