Is it possible to set accept-charset for a new FormData (XHR2) object, or is there a workaround?

Here is some example code (http://jsfiddle.net/epsSZ/1/):

HTML:

<form enctype="multipart/form-data" action="/echo/html" method="post" name="fileinfo" accept-charset="windows-1251">
  <label>Label:</label>
  <input type="text" name="label" size="12" maxlength="32" value="får løbende" /><br />
  <input type="submit" value="Send standard">
</form>
<button onclick="sendForm()">Send ajax!</button>

JS:

window.sendForm = function() {
  // Collect all fields of <form name="fileinfo"> into a FormData object
  var oData = new FormData(document.forms.namedItem("fileinfo"));
  var oReq = new XMLHttpRequest();
  oReq.open("POST", "/echo/html", true);
  // A FormData body is sent as multipart/form-data with a generated boundary
  oReq.send(oData);
};

When I submit the form the old way, via a standard form submit, the request payload looks like this:

------WebKitFormBoundary2890GbzEKCmB08rz
Content-Disposition: form-data; name="label"

f&#229;r l&#248;bende

But when I submit it via AJAX, it looks a little different:

------WebKitFormBoundaryPO2mPRFKj3zsKVM5
Content-Disposition: form-data; name="label"

får løbende

As you can see, in the former case some characters are replaced with character entities, while with FormData the string is sent as-is. That is of course good, because it's UTF-8, but is there any way to make it behave like a standard form submit?

Crawly asked 26/2, 2014 at 16:10

The answer to your question is no; you cannot change it. According to the XMLHttpRequest Level 2 TR, data built from a FormData object is explicitly encoded as UTF-8, and nothing in it allows changing that.

Setting a MIME type or a charset on the Content-Type header does not help for multipart requests either, because for FormData the spec generates the Content-Type itself, for exactly the same reason.

To quote:

If data is a FormData: Let the request entity body be the result of running the multipart/form-data encoding algorithm with data as form data set and with UTF-8 as the explicit character encoding.

Let mime type be the concatenation of "multipart/form-data;", a U+0020 SPACE character, "boundary=", and the multipart/form-data boundary string generated by the multipart/form-data encoding algorithm.
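To see this behaviour outside XHR, here is a minimal sketch that serializes a FormData object with the Fetch API's Response, which, like XHR, runs the multipart/form-data encoding algorithm with UTF-8. It assumes an environment with global FormData, Response and TextEncoder (a modern browser, or Node.js 18+); serializeFormData and containsBytes are hypothetical helper names.

```javascript
// Serialize a FormData object the way a request body would be encoded.
// Assumes global FormData, Response and TextEncoder (modern browsers, Node.js 18+).
async function serializeFormData(formData) {
  const response = new Response(formData);
  return {
    // "multipart/form-data; boundary=..." (the boundary is generated, not chosen)
    contentType: response.headers.get("content-type"),
    body: new Uint8Array(await response.arrayBuffer()),
  };
}

// True if `needle` occurs as a contiguous byte sequence inside `haystack`.
function containsBytes(haystack, needle) {
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return true;
  }
  return false;
}

const fd = new FormData();
fd.append("label", "får løbende");

serializeFormData(fd).then(({ contentType, body }) => {
  console.log(contentType);
  // The value appears as raw UTF-8 bytes ("å" is C3 A5), not as &#229;
  console.log(containsBytes(body, new TextEncoder().encode("får løbende"))); // true
});
```

There is no parameter anywhere in this path for choosing another charset, which matches the quoted spec text.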

Hope this helps!

Update

If you are willing to forgo

new FormData(document.forms.namedItem("fileinfo"));

in favor of

var oData = new FormData();
oData.append("name", "value");

there might be a workable solution. Let me know if that's what you are looking for.
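A minimal sketch of the append() route, assuming you collect the field values yourself instead of passing the form element to the constructor; buildFormData and its fields/transform parameters are hypothetical names. The point is that every value passes through your own code, so it can be transformed before it is appended.

```javascript
// Build a FormData object field by field, optionally transforming each value first.
// `fields` is a plain object of name -> string value (hypothetical input shape).
function buildFormData(fields, transform = (value) => value) {
  const formData = new FormData();
  for (const [name, value] of Object.entries(fields)) {
    formData.append(name, transform(value));
  }
  return formData;
}

// In the browser you would then send it as in the question:
//   var oReq = new XMLHttpRequest();
//   oReq.open("POST", "/echo/html", true);
//   oReq.send(buildFormData({ label: "får løbende" }));
const fd = buildFormData({ label: "får løbende" }, (v) => v.toUpperCase());
console.log(fd.get("label")); // "FÅR LØBENDE"
```

The transform here is only a placeholder; the modes below plug in different encodings at that spot.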

Another Update

I did a little more digging and updated the fiddle with all modes.

So here is the story:

1. Form with accept-charset="utf8" => default behavior

The content needs no additional escaping or encoding, so the request fires with the text intact as får løbende

2. Form with accept-charset="windows-1251" => your case

The content has to be encoded as windows-1251, and characters that windows-1251 cannot represent, such as å and ø, are escaped as HTML numeric character references before the request fires. So the content sent is f&#229;r l&#248;bende

3. FormData constructed from the form element

The content needs no additional escaping or encoding, since FormData always uses UTF-8. So the request fires with the text får løbende.

4. FormData constructed, then appended with escape()-encoded data

The content is still UTF-8 encoded, but it doesn't hurt to call escape(content) before appending it to the form data. The request then fires with the text f%E5r%20l%F8bende. Still no dice, right?
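Mode 4 in isolation, as a small comparison sketch: the legacy global escape() (still available in browsers and Node.js) writes characters below U+0100 as %XX using their Latin-1 code, which is where %E5 comes from, whereas encodeURIComponent() percent-encodes the UTF-8 bytes.

```javascript
const text = "får løbende";

// Legacy escape(): one %XX per character below U+0100 (its Latin-1 code point).
console.log(escape(text));             // "f%E5r%20l%F8bende"

// encodeURIComponent(): one %XX per UTF-8 byte, so "å" becomes %C3%A5.
console.log(encodeURIComponent(text)); // "f%C3%A5r%20l%C3%B8bende"
```

For characters at or above U+0100, such as Cyrillic letters, escape() switches to %uXXXX sequences instead.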

Actually, I was wrong. Looking closer (read: staring for a few minutes) at

f&#229;r l&#248;bende and

f%E5r%20l%F8bende

Then it all fell into place: %E5 (hexadecimal) = &#229; (decimal). So escape() is JavaScript's way of doing things, the %-based encoding, which is not HTML-friendly.

Similarly, &#NNN; is, as we know, HTML's way of encoding. So I added another AJAX mode, which I'm guessing is what you are looking for.
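The hex/decimal observation can be checked directly: the same code point written in hexadecimal gives the %XX form, and written in decimal gives the HTML numeric character reference. bothForms is a hypothetical helper, and it deliberately rejects characters at or above U+0100, where escape() no longer uses %XX.

```javascript
// Show the %XX (hex) and &#NNN; (decimal) spellings of one Latin-1 character.
function bothForms(ch) {
  const code = ch.codePointAt(0);
  if (code >= 0x100) throw new RangeError("only characters below U+0100 use the %XX form");
  return {
    percent: "%" + code.toString(16).toUpperCase().padStart(2, "0"),
    ncr: "&#" + code.toString(10) + ";",
  };
}

console.log(bothForms("å")); // { percent: '%E5', ncr: '&#229;' }
console.log(bothForms("ø")); // { percent: '%F8', ncr: '&#248;' }
```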

5. FormData constructed, then appended with HTML-escaped data

The content is still UTF-8 encoded, but it doesn't hurt to HTML-encode it first, using this wonderful piece of code from Stack Overflow. And voilà, the request fires with the text f&#229;r l&#248;bende
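The linked Stack Overflow snippet isn't reproduced here; as a stand-in, this minimal sketch replaces every non-ASCII character with a decimal numeric character reference before appending the value. htmlEncodeNonAscii is a hypothetical name.

```javascript
// Replace every non-ASCII character with &#NNN; (decimal code point).
// The "u" flag makes a character outside the BMP (e.g. an emoji) one match, not two.
function htmlEncodeNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/gu, (ch) => "&#" + ch.codePointAt(0) + ";");
}

const fd = new FormData();
fd.append("label", htmlEncodeNonAscii("får løbende"));
console.log(fd.get("label")); // "f&#229;r l&#248;bende"
```

Because it escapes every non-ASCII character, Cyrillic text is escaped too (привет becomes &#1087;&#1088;...), which is not what a windows-1251 form submit does.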

Updated fiddle with all modes

Hope this helps clear it up!

Update for full windows-1251 support

The input привет får løbende was failing in the earlier mode 5. Updated fiddle: http://jsfiddle.net/epsSZ/6/

This combines the solution at https://mcmap.net/q/906404/-encoding-conversation-utf-8-to-1251-in-javascript with mine. The problem was that mode 5 escaped everything; now only characters not present in the windows-1251 charset are escaped.
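A minimal sketch of the selective escaping step (not the code from the fiddle), assuming TextDecoder supports the "windows-1251" label, as current browsers and Node.js builds with full ICU (the default) do. It decodes all 256 byte values to learn which characters the code page contains, then escapes only the others, mirroring what the browser does for a form with accept-charset="windows-1251". WINDOWS_1251_CHARS and escapeForWindows1251 are hypothetical names. Note that FormData still sends the result as UTF-8, so the Cyrillic letters themselves arrive as UTF-8 bytes unless they are converted separately.

```javascript
// Build the set of characters windows-1251 can represent by decoding
// every possible byte value with the WHATWG decoder.
const WINDOWS_1251_CHARS = new Set(
  new TextDecoder("windows-1251").decode(
    Uint8Array.from({ length: 256 }, (_, i) => i)
  )
);

// Escape only characters that have no windows-1251 byte, as &#NNN;.
function escapeForWindows1251(text) {
  let out = "";
  for (const ch of text) { // for...of iterates by code point
    out += WINDOWS_1251_CHARS.has(ch) ? ch : "&#" + ch.codePointAt(0) + ";";
  }
  return out;
}

console.log(escapeForWindows1251("привет får løbende"));
// "привет f&#229;r l&#248;bende"
```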

I hope this helps!

Lactometer answered 20/3, 2014 at 10:12
It would be nice to see a solution with append if it isn't too complex, but this answer is already satisfactory. (Crawly)
Updated the answer with an append solution. I have to say, thank you for the question! (Lactometer)
@Crawly Does the solution with append help in any way? (Lactometer)
Not quite, because it also escapes all windows-1251 characters, which is not desired. Currently I'm escaping it on the server with the InLatin-1_Supplement Unicode block. (Crawly)
Can you give me an example? (Lactometer)
Yes, here is an example Perl regular expression: $data =~ s/(\p{InLatin-1_Supplement})/"&#".ord($1).";"/eg (Crawly)
OK, isn't the last mode, 5, in jsfiddle.net/aravindbaskaran/epsSZ/4 doing the same? Can you give me an example input that doesn't work as expected with mode 5? (Lactometer)
Yes, here it is: "привет får løbende" (Crawly)
Ah, yes. I think I see your problem; there is a way we can do it just like the browser does. :) Hold on. (Lactometer)
Updated the answer with full windows-1251 support. (Lactometer)
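For comparison, a JavaScript port of the server-side Perl substitution from the comments, written with the capture group that $1 refers to. It escapes every character in the Latin-1 Supplement block, U+0080 to U+00FF; escapeLatin1Supplement is a hypothetical name.

```javascript
// Equivalent of: $data =~ s/(\p{InLatin-1_Supplement})/"&#".ord($1).";"/eg
function escapeLatin1Supplement(text) {
  return text.replace(/[\u0080-\u00FF]/g, (ch) => "&#" + ch.charCodeAt(0) + ";");
}

console.log(escapeLatin1Supplement("привет får løbende")); // "привет f&#229;r l&#248;bende"
console.log(escapeLatin1Supplement("©"));                  // "&#169;"
```

Unlike a real windows-1251 check, this also escapes Latin-1 characters that windows-1251 does contain, such as © (0xA9) and « (0xAB), which the browser would send unescaped.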

Thank you for this question, I enjoyed myself! :)
Replace

<form enctype="multipart/form-data" action="/echo/html" method="post" name="fileinfo" accept-charset="windows-1251">

by

<form enctype="multipart/form-data" action="/echo/html" method="post" name="fileinfo" accept-charset="utf-8">

The problem is that the accept-charset is windows-1251 instead of utf-8.

After

oReq.open("POST", "/echo/html", true);

you can also add

oReq.overrideMimeType('text/html; charset=UTF-8');
oReq.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");

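but this is not what fixes the problem. Note that overrideMimeType() only changes how the response is interpreted, and setting Content-Type by hand when sending FormData replaces the multipart/form-data header (with its boundary) that the browser would otherwise generate.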

Good luck. :)

Regenerate answered 17/3, 2014 at 19:58
Unfortunately this solution didn't work for me. It sounds weird, but I need to post this form as windows-1251 and get those character entities from the browser. (Crawly)
Maybe you can send UTF-8 and convert it to windows-1251 on the PHP side, with this for example. (Regenerate)
