Why are string literals l-values while all other literals are r-values?

47

C++03 5.1 Primary expressions
§2:

A literal is a primary expression. Its type depends on its form (2.13). A string literal is an lvalue; all other literals are rvalues.

What is the rationale behind this?
As I understand it, string literals are objects, while all other literals are not. And an l-value always refers to an object.

But the question then is: why are string literals objects while all other literals are not?
This rationale seems to me more like a chicken-and-egg problem.

I understand the answer to this may be related to hardware architecture rather than to C/C++ as programming languages; nevertheless, I would like to hear it.

Note: I am tagging this question as both c and c++ because the C99 standard contains a similar statement, specifically §6.5.1.4.

2012-04-04 03:23
by Alok Save
Lvalues are not objects. Lvalues are values which can appear on the left-hand side of an assignment, such as variables, members of structures, and array element lookups. (L = Left.) - duskwuff 2012-04-04 03:26
@duskwuff: The Committee begs to differ. Per 6.3.2.1, "An lvalue is an expression with an object type or an incomplete type other than void; if an lvalue does not designate an object when it is evaluated, the behavior is undefined." Per the footnote (53) referenced in that citation, an lvalue should be thought of as an "object locator value" - R.. 2012-04-04 03:31
@JohnCalsbeek C++11 'fixed' that, e.g. alias<T[N]> {} is possible now. U {}.arr is also an rvalue of array type if arr is declared as such in the class definition for U - Luc Danton 2012-04-04 03:50
BTW, a better approximation of lvalue is "syntactically valid operand of the & operator". I suspect that definition is actually equivalent to the standard's definition, unless I'm missing something.. - R.. 2012-04-04 04:10
Update: It is only approximate. Register-storage-class objects are not valid as operands of &, but are lvalues. Also, I'm rather unclear on why it's (presumably) invalid to apply & to the return value of a function, which is specified to have object type.. - R.. 2012-04-04 04:56
@r.. and in C, function designators are not lvalues - Johannes Schaub - litb 2012-04-04 09:35


28

A string literal is a literal with array type, and in C there is no way for an array type to exist in an expression except as an lvalue. String literals could have been specified to have pointer type (rather than array type that usually decays to a pointer) pointing to the string "contents", but this would make them rather less useful; in particular, the sizeof operator could not be applied to them.
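The difference between "array type that decays" and "pointer type from the start" can be sketched in a few lines. This is only an illustration, assuming a C++11 or later compiler; `size_after_decay` and `check_literal_sizes` are hypothetical names introduced for the example, not anything from the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// The literal has array type, so sizeof sees the whole array,
// including the terminating '\0'.
static_assert(sizeof("hello") == 6, "5 characters plus the terminator");

// Because the literal is an lvalue, its address can be taken; the result
// is a pointer to the whole array, not a pointer to char.
static_assert(std::is_same<decltype(&"hello"), const char (*)[6]>::value,
              "address of a string literal is a pointer to an array");

// Hypothetical helper: once the literal has decayed to a pointer,
// the length is no longer part of the type.
std::size_t size_after_decay() {
    const char* p = "hello";  // array-to-pointer conversion happens here
    return sizeof(p);         // size of a pointer, not of the string
}

// Hypothetical helper that repeats the checks at run time.
void check_literal_sizes() {
    assert(sizeof("hello") == 6);
    assert(size_after_decay() == sizeof(const char*));
}
```

If string literals had been specified as pointers, only the second result would be available.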

Note that C99 introduced compound literals, which are also lvalues, so having a literal be an lvalue is no longer a special exception; it's closer to being the norm.

2012-04-04 03:34
by R..
Isn't puts("hello") an example of an expression with an array type that could be an rvalue - Pubby 2012-04-04 03:38
puts("hello") is an expression with type int - R.. 2012-04-04 03:40
I meant where "hello" is an rvalue - Pubby 2012-04-04 03:41
"hello" is not an rvalue. It's an lvalue array which decays to an expression of type pointer-to-char - R.. 2012-04-04 03:43
Yes, but you said "no way for an array type to exist in an expression except as an lvalue.". Wouldn't that code work if the literal was an rvalue - Pubby 2012-04-04 03:43
In that case it might, but what if you passed a string literal to a function that wanted to store it? You might expect the literal to go "out of scope" after the function call, so a copy would be required - John Calsbeek 2012-04-04 03:47
The literal can't have array type without being an lvalue, because of the way array decay to pointers works. If it did not have object type, there would be no address of its initial element for it to decay to. As my (slightly revised) answer states, the language could have been designed such that string literals are originally of pointer type, without any decay, and then they would not need to be lvalues. But that would be a lot less useful in practice - R.. 2012-04-04 03:48
+1. This is correct - Timothy Jones 2012-04-04 03:56
It is possible to have rvalue array types - for example if you have struct x { int a[2]; }; struct x foo(void); then foo().a is an rvalue array. Also, given struct x bar, quux; then (1 ? bar : quux).a is an rvalue array - caf 2012-04-04 04:21
@caf: C does not define "rvalue", which is probably a good thing, because it's always unclear whether the intended meaning is "non-lvalue" or just "any expression value". Your examples are definitely lvalues per the definition of an lvalue ("an expression with an object type...") and 6.5.2.2, which reads [starting new comment] - R.. 2012-04-04 04:48
@R.. Could you comment on my answer below? There seems to be a strong view that I'm incorrect, but I think this may be a place where C and C++ differ. I'd like to check before I delete the answer : - Timothy Jones 2012-04-04 04:48
"If the expression that denotes the called function has type pointer to function returning an object type, the function call expression has the same type as that object type, and has the value determined as specified in 6.8.6.4. Otherwise, the function call has type void. If an attempt is made to modify the result of a function call or to access it after the next sequence point, the behavior is undefined. - R.. 2012-04-04 04:49
@R.: That definition does not seem complete, because for example the expression +1 has object type (int) but is not ordinarily considered an lvalue. Note that Example 1 in C99 §6.5.2.3 specifically calls out f().x as being "a valid postfix expression but is not an lvalue" - caf 2012-04-04 06:24
The (C) standard could have defined string literals as rvalues, and then added a number of special rules to make them work as they do. Defining them as lvalues eliminates the need for most of the special rules. (In C, there's still the special rule that they don't have a const type, but you're not allowed to modify them. In C++, the special rule is that they have a const type, but there is an implicit conversion which will remove the const. In both cases, these special rules only apply to string literals.) - James Kanze 2012-04-04 07:42
@caf is right that there are array "rvalues" (or just plain values), due to struct return values. The standard is pretty weak in terms of describing what one can do with them, though. The big issue in implementations is that they may (or may not) be stored in registers (for sufficiently small structures) or similar "ephemeral" storage, and array manipulation—even something as simple as subscripting to extract one element—can overwrite this storage; but "normal" array access requires a fairly durable pointer to the base of the array. How long is that pointer valid? Who knows - torek 2012-04-04 07:58
@torek: If this is correct, then I believe subscripting them is illegal unless there's a special case allowing it. Even if there is, I see no reason the array would need to exist temporarily in memory.. - R.. 2012-04-04 11:41
The conclusions we drew, way back when, were that the only "truly safe" thing to do with a struct-valued function was either: struct_instance = f(args); or (void) f(args);. C99 tries to make it clear that you can also select a struct element and (subsequently) an array element, but not grab hold of a pointer to the entire array. This works right in gcc, but it's probably a good test for other compilers. (I'd guess the Plum-Hall test suite has a test like this by now.) - torek 2012-04-04 18:04
Can you provide a citation where C99 tries to make it clear that this is allowed - R.. 2012-04-04 23:56
Also, if it did not have array type, template deduction of the size of a string literal would not be possible - K.K 2013-04-02 05:50
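The last comment's point about template deduction can be sketched as follows, assuming C++11 or later. `literal_size` and `check_deduction` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: N is deduced from the array type of the argument.
// This only works because "hello" has type const char[6], not const char*.
template <std::size_t N>
constexpr std::size_t literal_size(const char (&)[N]) {
    return N;
}

static_assert(literal_size("hello") == 6, "5 characters plus the terminator");
static_assert(literal_size("") == 1, "an empty literal still holds the terminator");

// Hypothetical helper that repeats one check at run time.
void check_deduction() {
    assert(literal_size("ab") == 3);
}
```

Passing a `const char*` variable to `literal_size` would not compile, since a pointer carries no array bound to deduce.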


10

String literals are arrays - objects of inherently unpredictable size (i.e. of user-defined and possibly large size). In the general case, there's simply no other way to represent such literals except as objects in memory, i.e. as lvalues. In C99 this also applies to compound literals, which are also lvalues.

Any attempts to artificially hide the fact that string literals are lvalues at the language level would produce a considerable number of completely unnecessary difficulties, since the ability to point to a string literal with a pointer as well as the ability to access it as an array relies critically on its lvalue-ness being visible at the language level.

Meanwhile, literals of scalar types have a fixed compile-time size. At the same time, such literals are very likely to be embedded directly into the machine instructions on the given hardware architecture. For example, when you write something like i = i * 5 + 2, the literal values 5 and 2 become explicit (or even implicit) parts of the generated machine code. They don't exist and don't need to exist as standalone locations in data storage. There's simply no point in storing the values 5 and 2 in data memory.

It is also worth noting that on many (if not most, or all) hardware architectures floating-point literals are actually implemented as "hidden" lvalues (even though the language does not expose them as such). On platforms like x86, machine instructions from the floating-point group do not support embedded immediate operands. This means that virtually every floating-point literal has to be stored in (and read from) data memory by the compiler. E.g. when you write something like i = i * 5.5 + 2.1, it is translated into something like:

const double unnamed_double_5_5 = 5.5;
const double unnamed_double_2_1 = 2.1;
i = i * unnamed_double_5_5 + unnamed_double_2_1;

In other words, floating-point literals often end up becoming "unofficial" lvalues internally. However, it makes perfect sense that the language specification did not make any attempt to expose this implementation detail. At the language level, arithmetic literals make more sense as rvalues.
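What the language does expose can be checked with `decltype((expr))`, which yields a reference type for an lvalue and a plain type for a prvalue. The following is a sketch assuming C++11 or later; `first_char_via_address` is a hypothetical helper introduced for the example.

```cpp
#include <cassert>
#include <type_traits>

// String literal: an lvalue of array type.
static_assert(std::is_same<decltype(("hello")), const char (&)[6]>::value,
              "string literal is an lvalue of type const char[6]");

// Arithmetic, character and boolean literals: prvalues, whatever the
// compiler does with them internally.
static_assert(!std::is_reference<decltype((5))>::value, "int literal is a prvalue");
static_assert(!std::is_reference<decltype((5.5))>::value, "double literal is a prvalue");
static_assert(!std::is_reference<decltype(('x'))>::value, "char literal is a prvalue");
static_assert(!std::is_reference<decltype((true))>::value, "bool literal is a prvalue");

// Hypothetical helper: taking the address is allowed only for the lvalue.
char first_char_via_address() {
    const char (*whole)[6] = &"hello";  // OK: the literal is an lvalue
    // const double* bad = &5.5;        // would not compile: 5.5 is a prvalue
    assert(whole != nullptr);
    return (*whole)[0];
}
```

So even if 5.5 ends up in data memory on a given target, the language gives no way to name that location directly.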

2012-12-06 01:40
by AnT
So expressions like 'x' or 5 in the source code are "swallowed" in the executable during the compilation and "become part of it", whereas memory is reserved for "x" and 5.5 at runtime, so that they are created by the executable, stored in memory, but are not part of the executable file itself. Have I completely missed the point - Enrico Maria De Angelis 2018-11-27 17:05
Fun fact: x * 2.0 will usually compile as x+x. That really emphasizes that the "hidden lvalue" thing is truly just an asm implementation detail, and not fundamental or even related to language rules. More of a fun fact, but yeah interesting to point out. (Although the as-if rule does even allow the compiler to modify string literals, e.g. turn printf("hello\n") into puts("hello").) - Peter Cordes 2019-02-13 14:13


8

An lvalue in C++ does not always refer to an object. It can refer to a function too. Moreover, objects do not have to be referred to by lvalues. They may be referred to by rvalues, and this includes arrays (in both C++ and C). However, in old C89, the array-to-pointer conversion did not apply to rvalue arrays.

Now, an rvalue denotes something with no lifetime, a limited lifetime, or a lifetime that is about to expire. A string literal, however, lives for the entire program.

So string literals being lvalues is exactly right.
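These three points can be sketched in C++11 or later. All names here (`X`, `make_x`, `some_function`, `greeting`, `check_lifetime`) are hypothetical and exist only for the illustration; `make_x` and `some_function` are declared but never called, since they appear only inside `decltype`.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct X { int a[2]; };   // hypothetical type with an array member
X make_x();               // declaration only; used in unevaluated context
void some_function();     // declaration only; used in unevaluated context

// A function name is an lvalue even though a function is not an object.
static_assert(std::is_lvalue_reference<decltype((some_function))>::value,
              "function designator is an lvalue in C++");

// An array member of a temporary is an array that is not an lvalue.
static_assert(!std::is_lvalue_reference<decltype((make_x().a))>::value,
              "array member of a temporary is an rvalue");

// A string literal has static storage duration, so the pointer returned
// here remains valid after the function returns.
const char* greeting() {
    return "hello";
}

// Hypothetical helper that checks the returned pointer at run time.
void check_lifetime() {
    assert(std::strcmp(greeting(), "hello") == 0);
}
```

Returning a pointer into `make_x().a` in the same way would dangle, which is the lifetime difference the answer describes.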

2012-04-04 08:00
by Johannes Schaub - litb
How about the lifetime of integral literals? And how would one refer them anyways if their address can't be taken - Alok Save 2012-04-04 08:26
integer literals do not refer to an object so there is no lifetime to be considered - Johannes Schaub - litb 2012-04-04 08:44


6

I'd guess that the original motive was mainly a pragmatic one: a string literal must reside in memory and have an address. The type of a string literal is an array type (char[] in C, char const[] in C++), and array types convert to pointers in most contexts. The language could have found other ways to define this (e.g. a string literal could have pointer type to begin with, with special rules concerning what it pointed to), but just making the literal an lvalue is probably the easiest way of defining what is concretely needed.
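For the C++ side of this, the declared type and the decayed type can be compared directly. This is a sketch assuming C++11 or later; `length_via_pointer` and `check_decay` are hypothetical helpers. In C the element type would be plain char rather than const char, which this C++ example does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// The literal itself has array type (const-qualified in C++).
static_assert(std::is_same<std::remove_reference<decltype("hi")>::type,
                           const char[3]>::value,
              "the literal \"hi\" has type const char[3]");

// In most contexts it converts to a pointer to its first element.
static_assert(std::is_same<std::decay<decltype("hi")>::type,
                           const char*>::value,
              "array type decays to const char*");

// Hypothetical helper taking a pointer: the literal decays at the call site.
std::size_t length_via_pointer(const char* s) {
    return std::strlen(s);
}

// Hypothetical helper that checks the decayed call at run time.
void check_decay() {
    assert(length_via_pointer("hi") == 2);
}
```

The decayed pointer needs an address to point at, which is the practical reason given above for treating the literal as an lvalue.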

2012-04-04 07:38
by James Kanze
Why the down vote for what is almost certainly the correct answer - James Kanze 2012-04-04 07:47
Not my downvote. So if I understand your answer correctly, the committee just accepted what was probably suggested without delving into whether it was the best possible approach, simply because it seemed the more flexible choice at the time - Alok Save 2012-04-04 08:25
For whatever it's worth, the C99 standard just took the text from the C89 standard, and in the C89 standardization process, as I recall (from reading minutes, I was never at any actual meetings) there was some minor argument about this but it never went anywhere. The big fiery arguments were about making string literals const - torek 2012-04-04 08:30
@Als Even before the committee, the specification of C has been strongly motivated by pragmatic considerations, rather than language theory or more abstract considerations. Esthetically, it would be more elegant if all of the literal types were rvalues. Pragmatically, string literals have an array type, array types work differently than other types, and making them lvalues sorts things out with the least number of other special rules - James Kanze 2012-04-04 08:51
@torek IIRC, the distinction was already present in K&R C (1st edition), although my copy isn't handy to check with. Pragmatically, it's easier to say that they're lvalues than it is to write several paragraphs of special rules so that they can be rvalues, but still work as they do. Pragmatically, too, it's easier to say that they are non-const (but cannot be modified), than it is to define special conversion rules (a la C++) to avoid breaking code. K&R and the C committee have always been very pragmatic about things - James Kanze 2012-04-04 08:57
@JamesKanze: Alas, I lost my original-edition White Book some number of moves ago, so I can't check. The C89 committee had a lot of implementors on it though, hence noalias; Ritchie's "noalias must go" response was grounded in both pragmatics and theory (he demonstrated that "noalias" was self-inconsistent) - torek 2012-04-04 09:04
@torek Ritchie is one of those exceptional people who could master both, and understood when each was appropriate. Such people are all too rare - James Kanze 2012-04-04 09:32
@JamesKanze: alas, "was". dmr migrated to the great 11/45-in-the-sky in October 2011 - torek 2012-04-04 09:36