How to convert character encoding from CP932 to UTF-8 in nodejs javascript, using the nodejs-iconv module (or other solution)
I'm trying to convert a string from CP932 (aka Windows-31J) to UTF-8 in JavaScript. I'm crawling a site that ignores the UTF-8 request in the request header and returns CP932-encoded text (even though the HTML meta tag declares the page as shift_jis).

Anyway, I have the entire page stored in a string variable called "html". From there I'm attempting to convert it to utf8 using this code:

var Iconv = require('iconv').Iconv;
var conv = new Iconv('CP932', 'UTF-8//TRANSLIT//IGNORE');

var myBuffer = new Buffer(html.length * 3);
myBuffer.write(html, 0, 'utf8')
var utf8html = (conv.convert(myBuffer)).toString('utf8');

The result is not what it's supposed to be. For example, the string: "投稿者さんの 稚内全日空ホテル のクチコミ (感想・情報)" comes out as "ソスソスソスeソスメゑソスソスソスソスソス ソスtソスソスソスSソスソスソスソスソスzソスeソスソス ソスフクソス`ソスRソス~ (ソスソスソスzソスEソスソスソスソス)"

If I remove //TRANSLIT//IGNORE (which should make it substitute similar characters for missing ones and, failing that, omit characters that can't be transcoded), I get this error: Error: EILSEQ, Illegal character sequence.

I'm open to using any solution that can be implemented in nodejs, but my search results haven't yielded many options outside of the nodejs-iconv module.

nodejs-iconv ref: https://github.com/bnoordhuis/node-iconv

Thanks!

Edit 24.06.2011: I've gone ahead and implemented a solution in Java. However I'd still be interested in a javascript solution to this problem if somebody can solve it.

Concentrate answered 20/6, 2011 at 13:4 Comment(2)
Have you confused FROM and TO by chance? – Barkentine
The way I have it set up matches the examples in the module documentation, but just for kicks I tried swapping it, and the result appears worse. I get this string: "e tSze N`R~ (zE)" – Concentrate
D
5

I ran into the same trouble today :)
It depends on libiconv. You need libiconv-1.13-ja-1.patch.
Please check the following.

Alternatively, you can avoid the problem by using iconv-jp. Try:

npm install iconv-jp
Dalessandro answered 28/8, 2011 at 11:30 Comment(0)
R
5

I had the same problem, but with CP1250. I looked for the cause everywhere and everything was OK except the call to request – I had to add encoding: 'binary'.

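var request = require('request');
var Iconv = require('iconv').Iconv;

request({uri: url, encoding: 'binary'}, function (err, response, body) {
    if (err) throw err;
    // Re-pack the 'binary' string into the raw bytes the server sent.
    // (Buffer.from replaces the deprecated new Buffer on current Node.)
    body = Buffer.from(body, 'binary');
    var iconv = new Iconv('CP1250', 'UTF8');
    body = iconv.convert(body).toString();
    // ...
});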
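
For the CP932 case in the question, here is a minimal sketch of the same idea using only what ships with Node. It assumes a recent Node build with full ICU, so that the built-in TextDecoder accepts the 'shift_jis' label (which the WHATWG Encoding Standard also aliases as windows-31j). The helper name decodeCp932 and the sample bytes are made up for illustration; they are not part of request or node-iconv. The point it tries to show is that the raw bytes must reach the converter untouched, because decoding them as UTF-8 first destroys them.

```javascript
// Hypothetical helper: decode raw CP932 / Shift_JIS bytes into a JS string.
// Assumes Node was built with full ICU so 'shift_jis' is a known label.
function decodeCp932(rawBytes) {
    // fatal: true makes invalid input throw instead of silently
    // producing U+FFFD replacement characters.
    return new TextDecoder('shift_jis', { fatal: true }).decode(rawBytes);
}

// Sample input: the Shift_JIS bytes for "日本語"
// (日 = 93 FA, 本 = 96 7B, 語 = 8C EA).
const raw = Buffer.from([0x93, 0xfa, 0x96, 0x7b, 0x8c, 0xea]);

// Correct path: raw bytes -> JS string -> UTF-8 bytes.
const text = decodeCp932(raw);
const utf8Bytes = Buffer.from(text, 'utf8');

// A 'binary' string (as returned with encoding: 'binary') keeps one
// character per byte, so it can be turned back into the original bytes.
const binaryString = raw.toString('binary');
const roundTripped = Buffer.from(binaryString, 'binary');

// Failure mode: treating the bytes as UTF-8 first is lossy. Invalid
// sequences become U+FFFD and the original bytes cannot be recovered.
const lossy = raw.toString('utf8');

console.log(text);
console.log(utf8Bytes.length, roundTripped.equals(raw), lossy.includes('\uFFFD'));
```

If the page body is already a lossy string like the last case, no converter can repair it afterwards, which would be consistent with the EILSEQ error and the garbled output described in the question.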
Ravin answered 10/1, 2013 at 10:26 Comment(0)
D
0

https://github.com/bnoordhuis/node-iconv/issues/19

I ran the module's test script (/Users/Me/node_modules/iconv/test.js) with node test.js. It returned an error.

On Mac OS X Lion, this problem seems to depend on gcc.

Dalessandro answered 2/9, 2011 at 1:53 Comment(0)
