Conversion between UTF-8 ArrayBuffer and String

Asked 19/6, 2013 at 13:3 Answered 5/12, 2019 at 3:57

Solved javascript string utf-8 arraybuffer

I have an ArrayBuffer which contains a string encoded using UTF-8 and I can't find a standard way of converting such ArrayBuffer into a JS String (which I understand is encoded using UTF-16).

I've seen this code in numerous places, but I fail to see how it would work with any UTF-8 code points that are longer than 1 byte.

return String.fromCharCode.apply(null, new Uint8Array(data));

Similarly, I can't find a standard way of converting from a String to a UTF-8 encoded ArrayBuffer.
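To illustrate the concern, here is a small sketch of what the one-liner does with a multi-byte sequence. It assumes an environment that provides TextDecoder for comparison, and the variable names are only illustrative:

```javascript
// UTF-8 bytes of "h€l": the euro sign occupies three bytes (0xE2 0x82 0xAC).
var sampleBytes = new Uint8Array([0x68, 0xE2, 0x82, 0xAC, 0x6C]);

// Each byte becomes its own UTF-16 code unit, so the three bytes of "€"
// end up as three separate characters instead of one.
var naive = String.fromCharCode.apply(null, sampleBytes);

// TextDecoder treats the three bytes as a single code point.
var decoded = new TextDecoder("utf-8").decode(sampleBytes);

console.log(naive.length);   // 5
console.log(decoded.length); // 3
console.log(decoded);        // h€l
```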

Photosynthesis answered 19/6, 2013 at 13:3 Comment(9)

@LightStyle Thanks, completely missed that spelling mistake! :P – Photosynthesis 19/6, 2013 at 13:6

var uintArray = new Uint8Array("string".split('').map(function(char) {return char.charCodeAt(0);})); – Loanloanda 19/6, 2013 at 13:10

If that is what you need, I can explain it in an answer; otherwise I'll leave it as a comment ;) – Loanloanda 19/6, 2013 at 13:16

Will that definitely work on UTF code points that are longer than 1 byte? – Photosynthesis 19/6, 2013 at 13:19

I don't know, but it should, can't you try? – Loanloanda 19/6, 2013 at 13:21

I tried it with new Uint8Array("h€l".split('').map(function(char) {return char.charCodeAt(0);})); and it returned an array with 3 bytes. However, I believe it should be 5 bytes, because according to fileformat.info/info/unicode/char/20ac/index.htm the UTF-8 encoding of € is 0xE2 0x82 0xAC. – Photosynthesis 19/6, 2013 at 13:24

The one-liner you posted will decode bytes in the range 0x00–0xFF to their corresponding Unicode code points U+0000–U+00FF. In other words, it can’t represent anywhere near the whole Unicode range. However, it just so happens that Unicode code points U+0000–U+00FF correspond exactly to ISO 8859-1 (Latin 1), so what you have written is in effect an ISO 8859-1 decoder. LightStyle’s oneliner is the encoder that corresponds to the decoder in the question. In other words, it is an ISO 8859-1 encoder. – Leeward 24/3, 2014 at 14:40

@TomLeese You fixed the spelling mistake and now I have no idea what it was :( – Conjoined 3/11, 2017 at 19:22

Up-to-date answer here: stackoverflow.com/questions/6965107/… – Cloakanddagger 23/4, 2020 at 12:24

function stringToUint(string) {
    var string = btoa(unescape(encodeURIComponent(string))),
        charList = string.split(''),
        uintArray = [];
    for (var i = 0; i < charList.length; i++) {
        uintArray.push(charList[i].charCodeAt(0));
    }
    return new Uint8Array(uintArray);
}

function uintToString(uintArray) {
    var encodedString = String.fromCharCode.apply(null, uintArray),
        decodedString = decodeURIComponent(escape(atob(encodedString)));
    return decodedString;
}

With some help from the internet I wrote these little functions; they should solve your problem. Here is the working JSFiddle.

EDIT:

Since the source of the Uint8Array is external and you can't use atob, you just need to remove it (working fiddle):

function uintToString(uintArray) {
    var encodedString = String.fromCharCode.apply(null, uintArray),
        decodedString = decodeURIComponent(escape(encodedString));
    return decodedString;
}

Warning: escape and unescape are deprecated and have been removed from the web standards. See this.
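For large arrays, String.fromCharCode.apply can exceed the engine's argument limit and throw a RangeError. A minimal sketch of a chunked variant follows; the function name and the chunk size of 0x8000 are assumptions chosen for illustration, the input is assumed to be a Uint8Array, and it still relies on the deprecated escape function:

```javascript
function uintToStringChunked(uintArray) {
    var CHUNK_SIZE = 0x8000; // assumed to be safely below engine argument limits
    var binary = "";
    // Build the byte string piece by piece so apply() never receives too many arguments.
    for (var i = 0; i < uintArray.length; i += CHUNK_SIZE) {
        binary += String.fromCharCode.apply(null, uintArray.subarray(i, i + CHUNK_SIZE));
    }
    // Decode only after all chunks are joined, so a multi-byte sequence
    // split across a chunk boundary is still decoded correctly.
    return decodeURIComponent(escape(binary));
}

var euroBytes = new Uint8Array([0x68, 0xE2, 0x82, 0xAC, 0x6C]);
console.log(uintToStringChunked(euroBytes)); // h€l
```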

Welcome answered 19/6, 2013 at 13:42 Comment(11)

atob/btoa do base64 encoding/decoding; if you pass an honest UTF-8 byte array, it won't work: jsfiddle.net/Z9pQE/1 – Paletot 19/6, 2013 at 13:46

It is designed to work only with a UintArray of an encoded string; otherwise it won't work because of the btoa and atob conversion. – Loanloanda 19/6, 2013 at 13:47

I probably should've specified, but the UTF-8 string in the ArrayBuffer comes from a separate program written in a different programming language which produces pure UTF-8 strings, so as Esailija said, I can't use this as it does base64 encoding. – Photosynthesis 19/6, 2013 at 13:49

Wait. You can easily use this if the source is external, just don't use atob function. I'm going to update this with a new fiddle, just 1 minute – Loanloanda 19/6, 2013 at 13:51

Done. The same is true for the stringToUint function, just remove the btoa function and you're done :) – Loanloanda 19/6, 2013 at 13:55

You're welcome! Anyway, @Paletot your solution is great, worth +1! :D – Loanloanda 19/6, 2013 at 13:57

You saved my day! Just one addition, that if you use it with huge arrays, you can easily get: [Error] RangeError: Maximum call stack size exceeded. To fix that I use .slice() and apply it in chunks – Dineen 14/2, 2014 at 18:32

Glad to help! Feel free to edit the answer and add your solution :) – Loanloanda 14/2, 2014 at 21:27

why the btoa() call in stringToUint()? To me that's completely wrong and reducing that line to var string = unescape(encodeURIComponent(string)); works better for me. – Cajole 23/4, 2015 at 12:27

Just something that should be noted: If your array is sufficiently large, this solution will cause a stack overflow on the call to String.fromCharCode.apply. For some solutions, a loop may be better. – Mcqueen 28/7, 2016 at 16:36

This answer is outdated, go here: stackoverflow.com/questions/6965107/… – Cloakanddagger 23/4, 2020 at 12:23


Using TextEncoder and TextDecoder

var uint8array = new TextEncoder("utf-8").encode("Plain Text");
var string = new TextDecoder().decode(uint8array);
console.log(uint8array, string);
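Since the question is about ArrayBuffers and multi-byte characters, here is a slightly longer sketch of the same approach. The variable names are illustrative, and it assumes an environment with TextEncoder and TextDecoder available as globals:

```javascript
// Encode a string containing 1-, 3- and 4-byte characters.
var encoded = new TextEncoder().encode("h€l 💩"); // Uint8Array of UTF-8 bytes

// Copy out exactly the bytes of the view as a standalone ArrayBuffer.
var buffer = encoded.buffer.slice(encoded.byteOffset, encoded.byteOffset + encoded.byteLength);

// decode() accepts an ArrayBuffer as well as typed arrays.
var roundTripped = new TextDecoder("utf-8").decode(buffer);
console.log(encoded.length);             // 10
console.log(roundTripped === "h€l 💩"); // true

// With { fatal: true }, malformed input throws instead of producing U+FFFD.
var strictDecoder = new TextDecoder("utf-8", { fatal: true });
try {
    strictDecoder.decode(new Uint8Array([0xE2, 0x82])); // truncated 3-byte sequence
} catch (e) {
    console.log("invalid UTF-8 rejected");
}
```

Without the fatal option, the truncated sequence would decode to the replacement character U+FFFD rather than raising an error.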

Sayette answered 16/12, 2016 at 8:47 Comment(7)

Support for this feature is sorely lacking in IE and Edge. – Matrix 14/11, 2017 at 6:58

And for some reason there is only a polyfill for TextEncoder, I'm assuming TextDecoding just simply wouldn't work in IE right now. – Patronizing 13/3, 2019 at 16:56

Good answer, but using "Plain Text" is misleading; we aren't doing any cryptography here. encode != encrypt – Eelpout 4/10, 2019 at 21:34

If you need IE support, you can use the FastestSmallestTextEncoderDecoder polyfill, recommended by the MDN website. – Agan 5/12, 2019 at 3:37

Notice that the TextEncoder constructor doesn't accept any argument (it's always utf-8, no matter what you pass in). However, the decoder does accept an argument (both the documentation and the actual behavior agree on this). – Vedi 26/6, 2020 at 11:32

@JosephGarrone "plain text" isn't a term that is restricted to cryptography... – Corenecoreopsis 20/6, 2021 at 15:5

For anyone coming across this question in 2021, every major browser supports TextEncoder/Decoder now: caniuse.com/textencoder – Spillway 4/7, 2021 at 9:48

This should work:

// http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt

/* utf.js - UTF-8 <=> UTF-16 convertion
 *
 * Copyright (C) 1999 Masanao Izumo <[email protected]>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free.  You can redistribute it and/or modify it.
 */

function Utf8ArrayToStr(array) {
  var out, i, len, c;
  var char2, char3;

  out = "";
  len = array.length;
  i = 0;
  while (i < len) {
    c = array[i++];
    switch (c >> 4)
    { 
      case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
        // 0xxxxxxx
        out += String.fromCharCode(c);
        break;
      case 12: case 13:
        // 110x xxxx   10xx xxxx
        char2 = array[i++];
        out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
        break;
      case 14:
        // 1110 xxxx  10xx xxxx  10xx xxxx
        char2 = array[i++];
        char3 = array[i++];
        out += String.fromCharCode(((c & 0x0F) << 12) |
                                   ((char2 & 0x3F) << 6) |
                                   ((char3 & 0x3F) << 0));
        break;
    }
  }    
  return out;
}

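It's somewhat cleaner than the other solutions because it doesn't use any hacks and doesn't depend on browser JS functions, so it also works in other JS environments. Note that it only handles sequences of up to three bytes: four-byte sequences (characters outside the Basic Multilingual Plane) are silently skipped.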

Check out the JSFiddle demo.

Also see the related questions: here, here
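For the opposite direction (string to UTF-8 bytes), a hand-written encoder in the same spirit might look like the sketch below. The function name strToUtf8Array is made up for illustration and is not part of utf.js. It assumes String.prototype.codePointAt is available, and it encodes lone surrogates as they are instead of replacing them with U+FFFD the way TextEncoder does:

```javascript
function strToUtf8Array(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    var cp = str.codePointAt(i);
    if (cp > 0xFFFF) {
      i++; // the code point used two UTF-16 code units, skip the low surrogate
    }
    if (cp < 0x80) {
      // 0xxxxxxx
      bytes.push(cp);
    } else if (cp < 0x800) {
      // 110x xxxx   10xx xxxx
      bytes.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      // 1110 xxxx  10xx xxxx  10xx xxxx
      bytes.push(0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    } else {
      // 1111 0xxx  10xx xxxx  10xx xxxx  10xx xxxx
      bytes.push(0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    }
  }
  return new Uint8Array(bytes);
}

console.log(strToUtf8Array("h€l")); // bytes 0x68 0xE2 0x82 0xAC 0x6C
```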

Pommel answered 13/3, 2014 at 8:38 Comment(2)

What about when going from string to utf-8 buffer? – Grummet 24/5, 2017 at 11:14

This is the least readable code I've ever seen to implement char-code to string conversion. I appreciate and admire the effort put into it, but there's 100s of more maintainable ways to achieve that. – Stefa 31/8, 2023 at 20:27

There's a polyfill for Encoding over on GitHub: text-encoding. It's easy to use in Node or the browser, and the Readme advises the following:

var uint8array = new TextEncoder(encoding).encode(string);
var string = new TextDecoder(encoding).decode(uint8array);

If I recall, 'utf-8' is the encoding you need, and of course you'll need to wrap your buffer:

var uint8array = new Uint8Array(utf8buffer);

Hope it works as well for you as it has for me.

Kc answered 13/5, 2014 at 22:5 Comment(3)

For anyone lazy like me, npm install text-encoding, var textEncoding = require('text-encoding'); var TextDecoder = textEncoding.TextDecoder;. No thanks. – Goldwin 29/11, 2016 at 6:12

@KarthikHande That's what the polyfill is for. It's not supported by all browsers, so you also supply a pure JS implementation as an alternative. – Matrix 14/11, 2017 at 6:56

Beware the library is HUGE – Chor 19/11, 2017 at 1:44

If you are doing this in the browser, there are no built-in character encoding libraries, but you can get by with:

function pad(n) {
    return n.length < 2 ? "0" + n : n;
}

var array = new Uint8Array(data);
var str = "";
for (var i = 0, len = array.length; i < len; ++i) {
    str += "%" + pad(array[i].toString(16));
}

str = decodeURIComponent(str);

Here's a demo that decodes a 3-byte UTF-8 sequence: http://jsfiddle.net/Z9pQE/
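The same percent-encoding trick can be run in reverse to go from a string to UTF-8 bytes. This is only a sketch: the function name is made up for illustration, and encodeURIComponent throws a URIError if the string contains a lone surrogate:

```javascript
function stringToUtf8Bytes(str) {
    // encodeURIComponent emits UTF-8 bytes as %XX and leaves some ASCII characters as-is.
    var encoded = encodeURIComponent(str);
    var bytes = [];
    for (var i = 0; i < encoded.length; i++) {
        if (encoded.charAt(i) === "%") {
            // "%E2" -> 0xE2; skip the two hex digits just consumed.
            bytes.push(parseInt(encoded.substr(i + 1, 2), 16));
            i += 2;
        } else {
            // Unescaped characters are plain ASCII, so the char code is the byte.
            bytes.push(encoded.charCodeAt(i));
        }
    }
    return new Uint8Array(bytes);
}

console.log(stringToUtf8Bytes("h€l")); // bytes 0x68 0xE2 0x82 0xAC 0x6C
```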

Paletot answered 19/6, 2013 at 13:39 Comment(1)

You're the best person in the world. – Halsy 19/5, 2017 at 16:28

The readAsArrayBuffer and readAsText methods of a FileReader object asynchronously convert a Blob object to an ArrayBuffer or to a DOMString.

A Blob can be created from raw text or from a byte array, for example:

let blob = new Blob([text], { type: "text/plain" });

let reader = new FileReader();
reader.onload = event =>
{
    let buffer = event.target.result;
};
reader.readAsArrayBuffer(blob);

I think it's better to wrap this in a promise-like helper with a done callback:

function textToByteArray(text)
{
    let blob = new Blob([text], { type: "text/plain" });
    let reader = new FileReader();
    let done = function() { };

    reader.onload = event =>
    {
        done(new Uint8Array(event.target.result));
    };
    reader.readAsArrayBuffer(blob);

    return { done: function(callback) { done = callback; } }
}

function byteArrayToText(bytes, encoding)
{
    let blob = new Blob([bytes], { type: "application/octet-stream" });
    let reader = new FileReader();
    let done = function() { };

    reader.onload = event =>
    {
        done(event.target.result);
    };

    if(encoding) { reader.readAsText(blob, encoding); } else { reader.readAsText(blob); }

    return { done: function(callback) { done = callback; } }
}

let text = "\uD83D\uDCA9 = \u2661";
textToByteArray(text).done(bytes =>
{
    console.log(bytes);
    byteArrayToText(bytes, 'UTF-8').done(text => 
    {
        console.log(text); // 💩 = ♡
    });
});
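The callback wrappers above can also be written with real promises. The sketch below is a swapped-in variant rather than the FileReader code itself: it assumes an environment where Blob exposes the promise-returning arrayBuffer() and text() methods (current browsers and recent Node.js versions). Because text() always decodes as UTF-8, there is no encoding parameter, and the function names are only illustrative:

```javascript
// Encode text to UTF-8 bytes; strings placed in a Blob are stored as UTF-8.
function textToBytes(text) {
    return new Blob([text], { type: "text/plain" })
        .arrayBuffer()
        .then(function (buffer) { return new Uint8Array(buffer); });
}

// Decode UTF-8 bytes back to a string; Blob.text() always decodes as UTF-8.
function bytesToText(bytes) {
    return new Blob([bytes], { type: "application/octet-stream" }).text();
}

textToBytes("\uD83D\uDCA9 = \u2661")
    .then(function (bytes) {
        console.log(bytes.length); // 10
        return bytesToText(bytes);
    })
    .then(function (text) {
        console.log(text); // 💩 = ♡
    });
```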

Gerontology answered 28/10, 2017 at 13:12 Comment(0)

If you don't want to use any external polyfill library, you can use this function provided by the Mozilla Developer Network website:

function utf8ArrayToString(aBytes) {
    var sView = "";
    
    for (var nPart, nLen = aBytes.length, nIdx = 0; nIdx < nLen; nIdx++) {
        nPart = aBytes[nIdx];
        
        sView += String.fromCodePoint( /* fromCharCode would truncate code points above U+FFFF */
            nPart > 251 && nPart < 254 && nIdx + 5 < nLen ? /* six bytes */
                /* (nPart - 252 << 30) may be not so safe in ECMAScript! So...: */
                (nPart - 252) * 1073741824 + (aBytes[++nIdx] - 128 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 247 && nPart < 252 && nIdx + 4 < nLen ? /* five bytes */
                (nPart - 248 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 239 && nPart < 248 && nIdx + 3 < nLen ? /* four bytes */
                (nPart - 240 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 223 && nPart < 240 && nIdx + 2 < nLen ? /* three bytes */
                (nPart - 224 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 191 && nPart < 224 && nIdx + 1 < nLen ? /* two bytes */
                (nPart - 192 << 6) + aBytes[++nIdx] - 128
            : /* nPart < 127 ? */ /* one byte */
                nPart
        );
    }
    
    return sView;
}

let str = utf8ArrayToString([50,72,226,130,130,32,43,32,79,226,130,130,32,226,135,140,32,50,72,226,130,130,79]);

// Must show 2H₂ + O₂ ⇌ 2H₂O
console.log(str);

Agan answered 5/12, 2019 at 3:57 Comment(1)

see up-to-date answer: stackoverflow.com/questions/6965107/… – Cloakanddagger 23/4, 2020 at 12:23

The main difficulty when converting a byte array into a string is decoding the UTF-8 encoding (a variable-length encoding) of Unicode characters. This code will help you:

var getString = function (strBytes) {

    var MAX_SIZE = 0x4000;
    var codeUnits = [];
    var highSurrogate;
    var lowSurrogate;
    var index = -1;

    var result = '';

    while (++index < strBytes.length) {
        var codePoint = Number(strBytes[index]);

        if (codePoint === (codePoint & 0x7F)) {

        } else if (0xF0 === (codePoint & 0xF0)) {
            codePoint ^= 0xF0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        } else if (0xE0 === (codePoint & 0xE0)) {
            codePoint ^= 0xE0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        } else if (0xC0 === (codePoint & 0xC0)) {
            codePoint ^= 0xC0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        }

        if (!isFinite(codePoint) || codePoint < 0 || codePoint > 0x10FFFF || Math.floor(codePoint) != codePoint)
            throw RangeError('Invalid code point: ' + codePoint);

        if (codePoint <= 0xFFFF)
            codeUnits.push(codePoint);
        else {
            codePoint -= 0x10000;
            highSurrogate = (codePoint >> 10) | 0xD800;
            lowSurrogate = (codePoint % 0x400) | 0xDC00;
            codeUnits.push(highSurrogate, lowSurrogate);
        }
        if (index + 1 == strBytes.length || codeUnits.length > MAX_SIZE) {
            result += String.fromCharCode.apply(null, codeUnits);
            codeUnits.length = 0;
        }
    }

    return result;
}

All the best !

Pockmark answered 18/6, 2017 at 12:23 Comment(5)

That's not complete. For example, German umlauts are missing! – Eba 19/1, 2018 at 13:45

By the way, I noticed that the if statements were in the wrong order. Maybe that was why your string was not processed. I corrected it in my own code but forgot to correct it in this post. – Pockmark 20/1, 2018 at 14:26

ö = RangeError: Invalid code point: 1581184, ü = RangeError: Invalid code point: 3678336 – Eba 21/1, 2018 at 7:48

I have changed the code above; please try it one more time. There was a problem with the ordering of the "else if" statements. Now it should work for your case too. The code was tested with more than 30 languages, including Japanese, Korean and Arabic. – Pockmark 21/1, 2018 at 9:3

For example here are words I have transferred using bytes and restored to string in Javascript: Hälfte, Über, – Pockmark 21/1, 2018 at 9:10
