Why are string literals lvalues while all other literals are rvalues?

C++03 5.1 Primary expressions §2 says:

A literal is a primary expression. Its type depends on its form (2.13). A string literal is an lvalue; all other literals are rvalues.

Similarly, C99 6.5.1 §4 says:

A string literal is a primary expression. It is an lvalue with type as detailed in 6.4.5.

What is the rationale behind this?

As I understand it, string literals are objects, while all other literals are not. And an lvalue always refers to an object.

But then the question is: why are string literals objects while all other literals are not? This reasoning seems to me more like a chicken-and-egg problem.

I understand the answer may be related to hardware architecture rather than to C/C++ as programming languages; nevertheless, I would like to hear it.

Doti answered 4/4, 2012 at 3:23 Comment(10)
Lvalues are not objects. Lvalues are values which can appear on the left-hand side of an assignment, such as variables, members of structures, and array element lookups. (L = Left.) - Comedown
@duskwuff: The Committee begs to differ. Per 6.3.2.1, "An lvalue is an expression with an object type or an incomplete type other than void; if an lvalue does not designate an object when it is evaluated, the behavior is undefined." Per the footnote (53) referenced in that citation, an lvalue should be thought of as an "object locator value". - Braga
@JohnCalsbeek C++11 'fixed' that, e.g. alias<T[N]> {} is possible now. U {}.arr is also an rvalue of array type if arr is declared as such in the class definition for U. - Theater
BTW, a better approximation of lvalue is "syntactically valid operand of the & operator". I suspect that definition is actually equivalent to the standard's definition, unless I'm missing something... - Braga
Update: It is only approximate. Register-storage-class objects are not valid as operands of &, but are lvalues. Also, I'm rather unclear on why it's (presumably) invalid to apply & to the return value of a function, which is specified to have object type... - Braga
@R.. And in C, function designators are not lvalues. - Minium
@R.. Bit-field members are objects but don't have their own addresses. - Steamroller
@curiousguy: Indeed, nor do register-class objects. However, I think (it's been a long time) I was trying to get at that with "syntactically valid". - Braga
In three sentences: strings occupy memory, so you can take their address. &"hello" is valid C++. Hence, they should be l-values. - Pannonia
@duskwuff-inactive- Aren't the names originally from assembly language? Location value and register value? - Gastrointestinal
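The claim in the comments that &"hello" is valid C++ while the address of an integer literal cannot be taken can be checked with a small C++11 sketch. Applied to an expression that is not a plain name, decltype yields T& for an lvalue and plain T for a prvalue, so it exposes each literal's value category. HelloArray and literal_address are illustrative names, not taken from the thread.

```cpp
#include <type_traits>

// decltype on a non-name expression: T& for an lvalue, T for a prvalue.
static_assert(std::is_lvalue_reference<decltype("hello")>::value,
              "a string literal is an lvalue");
static_assert(std::is_same<decltype("hello"), const char (&)[6]>::value,
              "its type is const char[6]: five characters plus the terminator");
static_assert(!std::is_reference<decltype(5)>::value,
              "an integer literal is a prvalue");

// Because the literal is an lvalue array, &"hello" is well-formed and
// yields a pointer to the whole array, not just to its first character.
using HelloArray = const char[6];
HelloArray* literal_address() { return &"hello"; }

// By contrast, this would not compile, because 5 is a prvalue:
// const int* p = &5;
```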

A string literal is a literal with array type, and in C there is no way for an array type to exist in an expression except as an lvalue. String literals could have been specified to have pointer type (rather than an array type that usually decays to a pointer) pointing to the string "contents", but this would make them rather less useful; in particular, sizeof applied to them would yield only the size of a pointer, not the size of the string.
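A minimal C++ sketch of the sizeof point. In C the literal's type is char[6] rather than C++'s const char[6], but the size is the same. literal_size, decayed and pointer_size are illustrative names.

```cpp
#include <cstddef>

// sizeof sees the whole array denoted by the literal, including '\0'.
constexpr std::size_t literal_size = sizeof("hello");

// Initializing a pointer triggers array-to-pointer decay; sizeof then
// reports the size of the pointer, and the length information is gone.
constexpr const char* decayed = "hello";
constexpr std::size_t pointer_size = sizeof(decayed);

static_assert(literal_size == 6, "5 characters + terminator");
static_assert(pointer_size == sizeof(const char*), "just a pointer");
```

A literal specified with pointer type would behave like decayed in every context, which is the loss of usefulness the answer describes.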

Note that C99 introduced compound literals, which are also lvalues, so having a literal be an lvalue is no longer a special exception; it's closer to being the norm.

Braga answered 4/4, 2012 at 3:34 Comment(18)
Isn't puts("hello") an example of an expression with an array type that could be an rvalue? - Sakhuja
puts("hello") is an expression with type int. - Braga
I meant where "hello" is an rvalue. - Sakhuja
"hello" is not an rvalue. It's an lvalue array which decays to an expression of type pointer-to-char. - Braga
Yes, but you said "no way for an array type to exist in an expression except as an lvalue". Wouldn't that code work if the literal were an rvalue? - Sakhuja
In that case it might, but what if you passed a string literal to a function that wanted to store it? You might expect the literal to go "out of scope" after the function call, so a copy would be required. - Inartistic
The literal can't have array type without being an lvalue, because of the way arrays decay to pointers. If it did not have object type, there would be no address of its initial element for it to decay to. As my (slightly revised) answer states, the language could have been designed such that string literals are originally of pointer type, without any decay, and then they would not need to be lvalues. But that would be a lot less useful in practice. - Braga
It is possible to have rvalue array types - for example if you have struct x { int a[2]; }; struct x foo(void); then foo().a is an rvalue array. Also, given struct x bar, quux; then (1 ? bar : quux).a is an rvalue array. - Telemechanics
@caf: C does not define "rvalue", which is probably a good thing, because it's always unclear whether the intended meaning is "non-lvalue" or just "any expression value". Your examples are definitely lvalues per the definition of an lvalue ("an expression with an object type...") and 6.5.2.2, which reads [starting new comment]: - Braga
@R.. Could you comment on my answer below? There seems to be a strong view that I'm incorrect, but I think this may be a place where C and C++ differ. I'd like to check before I delete the answer :) - Clothbound
"If the expression that denotes the called function has type pointer to function returning an object type, the function call expression has the same type as that object type, and has the value determined as specified in 6.8.6.4. Otherwise, the function call has type void. If an attempt is made to modify the result of a function call or to access it after the next sequence point, the behavior is undefined." - Braga
@R.: That definition does not seem complete, because for example the expression +1 has object type (int) but is not ordinarily considered an lvalue. Note that Example 1 in C99 §6.5.2.3 specifically calls out f().x as being "a valid postfix expression but is not an lvalue". - Telemechanics
The (C) standard could have defined string literals as rvalues, and then added a number of special rules to make them work as they do. Defining them as lvalues eliminates the need for most of the special rules. (In C, there's still the special rule that they don't have a const type, but you're not allowed to modify them. In C++, the special rule is that they have a const type, but there is an implicit conversion which will remove the const. In both cases, these special rules only apply to string literals.) - Refractor
@Telemechanics is right that there are array "rvalues" (or just plain values), due to struct return values. The standard is pretty weak in terms of describing what one can do with them, though. The big issue in implementations is that they may (or may not) be stored in registers (for sufficiently small structures) or similar "ephemeral" storage, and array manipulation, even something as simple as subscripting to extract one element, can overwrite this storage; but "normal" array access requires a fairly durable pointer to the base of the array. How long is that pointer valid? Who knows! - Perforate
@torek: If this is correct, then I believe subscripting them is illegal unless there's a special case allowing it. Even if there is, I see no reason the array would need to exist temporarily in memory... - Braga
The conclusions we drew, way back when, were that the only "truly safe" thing to do with a struct-valued function was either: struct_instance = f(args); or (void) f(args);. C99 tries to make it clear that you can also select a struct element and (subsequently) an array element, but not grab hold of a pointer to the entire array. This works right in gcc, but it's probably a good test for other compilers. (I'd guess the Plum-Hall test suite has a test like this by now.) - Perforate
Can you provide a citation where C99 tries to make it clear that this is allowed? - Braga
Also, if it were not an array type, template deduction of the size of a string literal would not be possible. - Wrinkle
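A sketch of that template-deduction point in C++11; literal_length is a hypothetical helper. It relies on the literal having array type const char[N]: with a pointer-typed literal there would be no N to deduce.

```cpp
#include <cstddef>

// N is deduced from the array type of the argument. A reference-to-array
// parameter keeps the bound, which a pointer parameter would lose.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) {
    return N - 1;  // exclude the terminating '\0'
}

static_assert(literal_length("hello") == 5, "");
static_assert(literal_length("") == 0, "");
```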

String literals are arrays: objects of inherently unpredictable size (i.e., of user-defined and possibly large size). In the general case, there is simply no other way to represent such literals than as objects in memory, i.e., as lvalues. In C99 this also applies to compound literals, which are likewise lvalues.

Any attempt to artificially hide, at the language level, the fact that string literals are lvalues would create a considerable number of completely unnecessary difficulties, since both the ability to point to a string literal and the ability to access it as an array rely critically on its lvalue-ness being visible at the language level.
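The two abilities this answer mentions, subscripting a string literal as an array and keeping a pointer into it, can be sketched in C++11 as follows (greeting is an illustrative name). The checks run at compile time, since the standard permits reading the characters of a string literal in constant expressions.

```cpp
// A string literal can be indexed directly, like any named array...
static_assert("hello"[1] == 'e', "");

// ...and a pointer can be kept to its storage, which has static storage
// duration, so the pointer stays valid for the whole program.
constexpr const char* greeting = "hello";
static_assert(greeting[4] == 'o', "");
static_assert(greeting[5] == '\0', "the array includes the terminator");
```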

Meanwhile, literals of scalar types have a fixed compile-time size. Moreover, such literals are very likely to be embedded directly into the machine instructions of the target architecture. For example, when you write something like i = i * 5 + 2, the literal values 5 and 2 become explicit (or even implicit) parts of the generated machine code. They don't exist, and don't need to exist, as standalone locations in data storage. There is simply no point in storing the values 5 and 2 in data memory.

It is also worth noting that on many (if not most) hardware architectures, floating-point literals are actually implemented as "hidden" lvalues (even though the language does not expose them as such). On platforms like x86, the floating-point instructions do not support embedded immediate operands. This means that virtually every floating-point literal has to be stored in (and read from) data memory by the compiler. For example, when you write something like i = i * 5.5 + 2.1, it is translated into something like

const double unnamed_double_5_5 = 5.5;
const double unnamed_double_2_1 = 2.1;
i = i * unnamed_double_5_5 + unnamed_double_2_1;

In other words, floating-point literals often end up becoming "unofficial" lvalues internally. However, it makes perfect sense that the language specification did not attempt to expose this implementation detail. At the language level, arithmetic literals make more sense as rvalues.
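The hand-written translation above can be compared with the language rules in a short C++11 sketch that reuses its illustrative names: the named constant it introduces is a genuine lvalue, while the literal 5.5 remains a prvalue regardless of where the compiler actually keeps the value.

```cpp
#include <type_traits>

// Whatever the compiler does internally, 5.5 is a prvalue of type double.
static_assert(std::is_same<decltype(5.5), double>::value,
              "a floating literal is a prvalue");

// Spelling the hidden constants out by hand creates real lvalues.
const double unnamed_double_5_5 = 5.5;
const double unnamed_double_2_1 = 2.1;
static_assert(std::is_same<decltype((unnamed_double_5_5)), const double&>::value,
              "a named const double is an lvalue");

double scaled(double i) { return i * unnamed_double_5_5 + unnamed_double_2_1; }
```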

Maynord answered 6/12, 2012 at 1:40 Comment(3)
So expressions like 'x' or 5 in the source code are "swallowed" in the executable during the compilation and "become part of it", whereas memory is reserved for "x" and 5.5 at runtime, so that they are created by the executable, stored in memory, but are not part of the executable file itself. Have I completely missed the point? - Peremptory
Fun fact: x * 2.0 will usually compile as x+x. That really emphasizes that the "hidden lvalue" thing is truly just an asm implementation detail, and not fundamental or even related to language rules. More of a fun fact, but yeah interesting to point out. (Although the as-if rule does even allow the compiler to modify string literals, e.g. turn printf("hello\n") into puts("hello").) - Benge
@Peremptory The following thread could be useful: stackoverflow.com/questions/2589949/… Usually, string literals go into the read-only section of the object file. - Trolley

I'd guess that the original motive was mainly a pragmatic one: a string literal must reside in memory and have an address. The type of a string literal is an array type (char[] in C, char const[] in C++), and array types convert to pointers in most contexts. The language could have found other ways to define this (e.g. a string literal could have pointer type to begin with, with special rules concerning what it pointed to), but just making the literal an lvalue is probably the easiest way of defining what is concretely needed.
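The C++ type and the array-to-pointer conversion described in this answer can be checked with standard type traits; LiteralType is an illustrative alias, and std::decay is used only as a model of the conversion that happens in most contexts.

```cpp
#include <type_traits>

// In C++ the literal "abc" has type "array of 4 const char"; decltype
// reports it as an lvalue reference to that array type.
using LiteralType = std::remove_reference<decltype("abc")>::type;
static_assert(std::is_same<LiteralType, const char[4]>::value, "");

// In most contexts the array converts to a pointer to its first element;
// std::decay models that array-to-pointer conversion.
static_assert(std::is_same<std::decay<LiteralType>::type, const char*>::value, "");
```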

Refractor answered 4/4, 2012 at 7:38 Comment(8)
Why the down vote for what is almost certainly the correct answer? - Refractor
Not my downvote. So if I understand your answer correctly, the committee just accepted what was probably suggested without delving into whether it was the best possible approach, simply because it seemed more flexible to choose at the time? - Doti
For whatever it's worth, the C99 standard just took the text from the C89 standard, and in the C89 standardization process, as I recall (from reading minutes, I was never at any actual meetings) there was some minor argument about this but it never went anywhere. The big fiery arguments were about making string literals const. - Perforate
@Als Even before the committee, the specification of C has been strongly motivated by pragmatic considerations, rather than language theory or more abstract considerations. Esthetically, it would be more elegant if all of the literal types were rvalues. Pragmatically, string literals have an array type, array types work differently than other types, and making them lvalues sorts things out with the least number of other special rules. - Refractor
@Perforate IIRC, the distinction was already present in K&R C (1st edition), although my copy isn't handy to check with. Pragmatically, it's easier to say that they're lvalues than it is to write several paragraphs of special rules so that they can be rvalues, but still work as they do. Pragmatically, too, it's easier to say that they are non-const (but cannot be modified), than it is to define special conversion rules (a la C++) to avoid breaking code. K&R and the C committee have always been very pragmatic about things. - Refractor
@JamesKanze: Alas, I lost my original-edition White Book some number of moves ago, so I can't check. The C89 committee had a lot of implementors on it though, hence noalias; Ritchie's "noalias must go" response was grounded in both pragmatics and theory (he demonstrated that "noalias" was self-inconsistent). - Perforate
@Perforate Ritchie is one of those exceptional people who could master both, and understood when each was appropriate. Such people are all too rare. - Refractor
@JamesKanze: alas, "was". dmr migrated to the great 11/45-in-the-sky in October 2011. - Perforate

An lvalue in C++ does not always refer to an object. It can refer to a function too. Moreover, objects do not have to be referred to by lvalues. They may be referred to by rvalues, including for arrays (in C++ and C). However, in old C89, the array-to-pointer conversion did not apply to rvalue arrays.
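A C++11 sketch of the first two claims: a function name is an lvalue even though a function is not an object, and an object can be designated by an expression that is not an lvalue. f, f_ref and counter are illustrative names.

```cpp
#include <type_traits>
#include <utility>

void f() {}

// f designates a function, not an object, yet (f) is an lvalue: decltype
// reports an lvalue reference, and a reference to function binds to it.
static_assert(std::is_lvalue_reference<decltype((f))>::value, "");
void (&f_ref)() = f;

// counter is an object, but std::move(counter) refers to it through an
// xvalue (a kind of rvalue), so decltype reports an rvalue reference.
int counter = 0;
static_assert(std::is_rvalue_reference<decltype(std::move(counter))>::value, "");
```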

Now, an rvalue denotes something with no lifetime, a limited lifetime, or a lifetime that is about to expire. A string literal, however, lives for the entire program.
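A small C++11 sketch of why that lifetime matters in practice; greet is an illustrative function. Returning a pointer to a string literal is safe because the literal has static storage duration, whereas the commented-out local-array version would return a dangling pointer.

```cpp
// The literal's array outlives every call, so returning a pointer into
// it is safe.
constexpr const char* greet() { return "hello"; }

// By contrast, a local array is destroyed when the function returns, so
// this version would return a dangling pointer (left commented out):
// const char* bad_greet() { char buf[] = "hello"; return buf; }

static_assert(greet()[0] == 'h', "");
```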

So string literals being lvalues is exactly right.

Minium answered 4/4, 2012 at 8:0 Comment(2)
What about the lifetime of integer literals? And how would one refer to them anyway if their address can't be taken? - Doti
Integer literals do not refer to an object, so there is no lifetime to consider. - Minium

There is a lot of valuable information in the answers and the comments. A few points are worth highlighting.

Arrays can be rvalues. More information can be found here and here. For example, the following code involves an rvalue array:

template <typename T>
using alias = T;

int main() {
    return alias<int[]>{23, 37, 53}[1];
}
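The comment thread on the first answer gives another way to get an array that is not an lvalue: a member array of a struct returned by value (struct x / foo). A C++11 adaptation follows; second_element is an illustrative helper. Modern compilers classify foo().a as an xvalue, so the check below only asserts that it is not an lvalue while still having array type.

```cpp
#include <type_traits>

// Adapted from the C example in the comments (struct x, foo).
struct x { int a[2]; };

x foo() { return x{{23, 37}}; }

// foo() is a prvalue, so foo().a is not an lvalue, yet it has array type.
static_assert(!std::is_lvalue_reference<decltype((foo().a))>::value, "");
static_assert(std::is_array<std::remove_reference<decltype((foo().a))>::type>::value, "");

// Subscripting the non-lvalue array still works.
int second_element() { return foo().a[1]; }
```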

Thus, the fact that string literals are arrays is not, by itself, a good argument for why they must be lvalues.

It is good to remember that string literals last for the lifetime of the program. Even though value category is not the same thing as lifetime, their lifetime helps explain why string literals are lvalues.

As with many questions about value categories, string literals being lvalues is very much driven by pragmatic considerations: what has happened in the language's development so far, and what is the best that can be done from where the language stands at a given moment.

Trolley answered 14/4, 2023 at 8:51 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.