Why is the string literal considered a primitive type in JavaScript?

The official documentation, as well as many articles on the internet, says that 'some string' is a primitive value, meaning that the string is copied each time we assign it to a variable.

However, this question (and answer to it) How to force JavaScript to deep copy a string? demonstrates that actually V8 does not copy a string even on the substr method.

It would also make no sense to copy strings every time we pass them into functions. In languages like C#, Java, or Python, the String data type is definitely a reference type.

Furthermore, this link shows the V8 class hierarchy, and we can see that SlicedString derives from HeapObject after all: https://thlorenz.com/v8-dox/build/v8-3.25.30/html/d7/da4/classv8_1_1internal_1_1_sliced_string.html

Finally, after inspecting

let copy = someStringInitializedAbove

in Devtools it is clear that a new copy of that string has not been created!

So I am pretty sure that strings are not copied on assignment. But I still do not understand why so many articles like JS Primitives vs Reference say that they are.

Pedant answered 25/4, 2020 at 15:4 Comment(2)
The language spec dictates that a string is a primitive and behaves as such as far as JS userland is concerned. That it may not be represented by a primitive under the hood is an implementation detail, something JS userland is agnostic about.Gassman
Is it because, at design time, the JS designers were not concerned with proximity to native computer memory? Think of JS running in your browser (Firefox is written in C/C++) rather than worrying about how it would run optimally on a particular memory system. Had Node.js concepts been around back then, I am sure the same designers would have thought differently when they specced out JS.Navarino

Fundamentally, because the specification says so:

string value

primitive value that is a finite ordered sequence of zero or more 16-bit unsigned integer values

The specification also defines that there are String objects, as distinct from primitive strings. (Similarly there are primitive number, boolean, and symbol types, and Number and Boolean and Symbol objects.)

Primitive strings follow all the rules of other primitives. At a language level, they're treated exactly the way primitive numbers and booleans are. For all intents and purposes, they are primitive values. But as you say, it would be insane for a = b to literally make a copy of the string in b and put that copy in a. Implementations don't have to do that because primitive string values are immutable (just like primitive number values). You can't change any characters in a string, you can only create a new string. If strings were mutable, the implementation would have to make a copy when you did a = b (but if they were mutable the spec would be written differently).
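
As a minimal sketch of the immutability point above (the helper name tryToMutate and the variable names are hypothetical, chosen only for illustration): there is no way for code to change a string in place, so sharing the underlying storage between a and b can never be observed.

```javascript
// Hypothetical helper: attempts an in-place write and reports what happened.
function tryToMutate(str) {
    "use strict"; // in strict mode a failed write throws instead of being silently ignored
    try {
        str[0] = "J";
        return "no error";
    } catch (e) {
        return e.name;
    }
}

const original = "hello";
const alias = original; // the engine may share storage here; code can't tell the difference

console.log(tryToMutate(original)); // "TypeError"
console.log(original);              // "hello"

// "Changing" a string really means building a new string value.
const changed = "J" + original.slice(1);
console.log(changed); // "Jello"
console.log(alias);   // "hello"
```

In loose mode the same write is silently ignored rather than throwing, but the string is unchanged either way.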

Note that primitive strings and String objects really are different things:

const s = "hey";
const o = new String("hey");

// Here, the string `s` refers to is temporarily
// converted to a string object so we can perform an
// object operation on it (setting a property).
s.foo = "bar";
// But that temporary object is never stored anywhere,
// `s` still just contains the primitive, so getting
// the property won't find it:
console.log(s.foo); // undefined

// `o` is a String object, which means it can have properties
o.foo = "bar";
console.log(o.foo); // "bar"

So why have primitive strings? You'd have to ask Brendan Eich (and he's reasonably responsive on Twitter), but I suspect it was so that the definition of the equality operators (==, ===, !=, and !==) didn't have to either be something that could be overloaded by an object type for its own purposes, or special-cased for strings.
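
To illustrate how those operators treat the two kinds of string, here is a small sketch (the variable names are made up for the example). It only shows the observable behavior and says nothing about why the language was designed this way:

```javascript
const p1 = "hey";
const p2 = "h" + "ey"; // a separately built primitive with the same contents
const o1 = new String("hey");
const o2 = new String("hey");

console.log(typeof p1); // "string"
console.log(typeof o1); // "object"

console.log(p1 === p2); // true: primitive strings compare by value
console.log(o1 === o2); // false: objects compare by identity
console.log(o1 == o2);  // false: both operands are objects, so still identity
console.log(p1 == o1);  // true: the object is converted to a primitive first
console.log(p1 === o1); // false: the operands have different types
```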

So why have string objects? Having String objects (and Number objects, and Boolean objects, and Symbol objects) along with rules saying when a temporary object version of a primitive is created makes it possible to define methods on primitives. When you do:

console.log("example".toUpperCase());

in specification terms, a String object is created (by the GetValue operation) and then the property toUpperCase is looked up on that object and (in the above) called. Primitive strings therefore get their toUpperCase (and other standard methods) from String.prototype and Object.prototype. But the temporary object that gets created is not accessible to code except in some edge cases,¹ and JavaScript engines can avoid literally creating the object outside of those edge cases. The advantage to that is that new methods can be added to String.prototype and used on primitive strings.
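
A quick way to see where those methods come from (a sketch only; the variable name word is arbitrary) is to compare what a primitive string resolves against the prototype objects directly:

```javascript
const word = "example";

// The method found via a primitive string is the very function stored on String.prototype.
console.log(word.toUpperCase === String.prototype.toUpperCase); // true

// String.prototype has no hasOwnProperty of its own, so the lookup continues to Object.prototype.
console.log(word.hasOwnProperty === Object.prototype.hasOwnProperty); // true

// Since ES2015, Object.getPrototypeOf converts a primitive argument to an object first.
console.log(Object.getPrototypeOf(word) === String.prototype); // true

// None of this changes what `word` is.
console.log(typeof word); // "string"
```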


¹ "What edge cases?" I hear you ask. The most common one I can think of is when you've added your own method to String.prototype (or similar) in loose mode code:

Object.defineProperty(String.prototype, "example", {
    value() {
        console.log(`typeof this: ${typeof this}`);
        console.log(`this instanceof String: ${this instanceof String}`);
    },
    writable: true,
    configurable: true
});

"foo".example();
// typeof this: object
// this instanceof String: true

There, the JavaScript engine was forced to create the String object because this can't be a primitive in loose mode.

Strict mode makes it possible to avoid creating the object, because in strict mode this isn't required to be an object type, it can be a primitive (in this case, a primitive string):

"use strict";
Object.defineProperty(String.prototype, "example", {
    value() {
        console.log(`typeof this: ${typeof this}`);
        console.log(`this instanceof String: ${this instanceof String}`);
    },
    writable: true,
    configurable: true
});

"foo".example();
// typeof this: string
// this instanceof String: false
Constringe answered 25/4, 2020 at 15:19 Comment(4)
Thank you so much for shedding light on this topic and for the possibility to refer people to this question.Pedant
The main issue was that, in most languages, the actual value of a reference-type local variable lives on the heap, whereas a primitive value lives on the stack. Hence the misconception that a string's characters are on the stack. But it was not clear to me that a specification can dictate such a fundamental thing.Pedant
@Pedant - Sure, specs have dictated much more fundamental things than that. But JavaScript's doesn't. It defines the semantics and behavior of the language in detail, but provided an implementation behaves the way the specification says JavaScript behaves, the implementation can do anything it wants to optimize. Notably, JavaScript's spec doesn't say where anything other than the underlying data of typed arrays is allocated. There's no reason for it to, that's an implementation detail.Constringe
@Pedant - As a side note, the idea that "reference types (objects) live in the heap" isn't always true. For instance, modern JIT systems can do escape analysis and allocate an object on the stack if the object doesn't survive termination of the function call and isn't larger than the available stack space. Oracle's JVM does that, as do some JavaScript engines. (And of course, when a primitive is a field within an object and that object is on the heap, that primitive is on the heap.)Constringe
