Converting a string from utf8 to latin1 in NodeJS

Asked 18/2, 2015 at 21:42 Answered 19/4, 2021 at 9:26

I'm using a Latin1-encoded DB and can't change it to UTF-8, which means I run into issues with certain application data. I'm using Tesseract to OCR a document (Tesseract outputs UTF-8) and tried to use iconv-lite; however, it creates a buffer, which I then have to convert into a string. But the buffer-to-string conversion does not allow "latin1" as an encoding.

I've read a bunch of questions/answers; however, all I get is setting client encoding and stuff like that.

Any ideas?

Saransk answered 18/2, 2015 at 21:42 Comment(1)

Man...thanks. Idk how I missed that. – Saransk 18/2, 2015 at 22:9

Since Node.js v7.1.0, you can use the transcode function from the buffer module:
https://nodejs.org/api/buffer.html#buffer_buffer_transcode_source_fromenc_toenc

For example:

const buffer = require('buffer');
// utf8String is the string you want to convert, e.g. the Tesseract output
const latin1Buffer = buffer.transcode(Buffer.from(utf8String), "utf8", "latin1");
const latin1String = latin1Buffer.toString("latin1");
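To see what the transcoding does at the byte level, here is a minimal sketch. The sample string is my own assumption (it stands in for the Tesseract output), and it assumes a Node.js build with ICU support, which the official binaries have:

```javascript
const buffer = require('buffer');

// Hypothetical sample input: "café", written with an escape so the
// source file's own encoding does not matter.
const utf8String = 'caf\u00e9';

// In UTF-8 the "é" takes two bytes (c3 a9).
const utf8Bytes = Buffer.from(utf8String, 'utf8');

// After transcoding to Latin-1 it takes a single byte (e9).
const latin1Bytes = buffer.transcode(utf8Bytes, 'utf8', 'latin1');

// Decoding the Latin-1 bytes as Latin-1 gives back the original string.
const roundTrip = latin1Bytes.toString('latin1');

console.log(utf8Bytes.length, latin1Bytes.length);
console.log(roundTrip === utf8String);
```

The Buffer (not the decoded string) is the part that actually holds Latin-1 bytes, so that is probably what you want to hand to anything expecting Latin-1 input.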

Hesketh answered 19/4, 2021 at 9:26 Comment(0)

You can create a buffer from the UTF-8 string you have, and then decode that buffer as Latin-1 using iconv-lite, like this:

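var iconv  = require('iconv-lite');
// Buffer.from() replaces the deprecated new Buffer() constructor
var buff   = Buffer.from(tesseract_string, 'utf8');
var DB_str = iconv.decode(buff, 'ISO-8859-1');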
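One thing worth checking is the direction of the conversion: decoding turns bytes into a JS string, while encoding turns a JS string into bytes. A rough sketch of the difference, using only the built-in Buffer as a stand-in for iconv-lite (the sample string is an assumption on my part):

```javascript
// Hypothetical sample input ("café"); stands in for the Tesseract output.
const text = 'caf\u00e9';

// Decoding direction: the UTF-8 bytes (63 61 66 c3 a9) are read as if
// each byte were one Latin-1 character, so "é" turns into two characters.
const misdecoded = Buffer.from(text, 'utf8').toString('latin1');

// Encoding direction: the string is written out as Latin-1 bytes.
// Note: Buffer's 'latin1' encoding is only meaningful for code points
// up to U+00FF; anything above that does not fit in one byte.
const latin1Bytes = Buffer.from(text, 'latin1');

console.log(JSON.stringify(misdecoded));
console.log(latin1Bytes);
```

Which of the two you need depends on what your DB driver expects, so it is worth inspecting the bytes before writing them.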

Scute answered 18/2, 2015 at 22:16 Comment(0)

I've found a way to convert a text file in any encoding to UTF-8:

var 
  fs = require('fs'),
  charsetDetector = require('node-icu-charset-detector'),
  iconvlite = require('iconv-lite');

/* The text files in a git repo
 * have different encodings, but
 * they always need to be served
 * as standard 'utf-8'.
 */
function getFileContentsInUTF8(file_path) {
  var content = fs.readFileSync(file_path);
  var original_charset = charsetDetector.detectCharset(content);
  var jsString = iconvlite.decode(content, original_charset.toString());
  return jsString;
}

It's also in a gist here: https://gist.github.com/jacargentina/be454c13fa19003cf9f48175e82304d5

Maybe you can try this, with content being your database buffer data (in Latin-1 encoding).
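If the charset is already known to be Latin-1, the detection step can probably be skipped. A minimal sketch with built-ins only; readLatin1File and the temp file are hypothetical names used for illustration, not part of the gist:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: reads a file that is assumed to be Latin-1
// and returns its contents as a regular JS string.
function readLatin1File(filePath) {
  const content = fs.readFileSync(filePath); // raw bytes, nothing decoded yet
  return content.toString('latin1');         // one byte becomes one character
}

// Usage: write the Latin-1 bytes for "café" to a temp file and read them back.
const tmpFile = path.join(os.tmpdir(), `latin1-sample-${process.pid}.txt`);
fs.writeFileSync(tmpFile, Buffer.from([0x63, 0x61, 0x66, 0xe9]));
const decoded = readLatin1File(tmpFile);
fs.unlinkSync(tmpFile); // clean up the temp file

console.log(decoded);
```

The same toString('latin1') call should work on a Buffer coming from the database instead of a file.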

Observer answered 30/3, 2016 at 23:53 Comment(0)
