Why are string literals &str instead of String in Rust?

2

20

I'm just asking why Rust decided to use &str for string literals instead of String. Isn't it possible for Rust to just automatically convert a string literal to a String and put it on the heap instead of putting it into the stack?

Seymour answered 25/8, 2020 at 4:39 Comment(3)
Theoretically, sure. But it would be way slower, and what would be the advantage? - Outrun
"put it on the heap instead of putting it into the stack": I believe string literals are placed in the .rodata section, not on the stack. - Despicable
Re: the close vote, I don't agree that this question leads to opinion-based answers. There are very clear reasons for the design. - Mediation
23

To understand the reasoning, consider that Rust wants to be a systems programming language. In general, this means that it needs to (among other things) (a) be as efficient as possible and (b) give the programmer full control over allocations and deallocations of heap memory. One use case for Rust is embedded programming, where memory is very limited.

Therefore, Rust does not want to allocate heap memory where this is not strictly necessary. String literals are known at compile time and can be written into the read-only data section of an executable/library (.rodata on ELF targets), so they don't consume stack or heap space.

Now, given that Rust does not want to allocate the values on the heap, it is basically forced to treat string literals as &str: a String owns its value and can be moved and dropped, but how do you drop a value that lives in .rodata? You can't really do that, so &str is the perfect fit.

Furthermore, treating string literals as &str (or, more accurately, &'static str) has all the advantages and none of the disadvantages: they can be used in multiple places, can be shared without using any heap memory, and never have to be freed. They can also be converted to owned Strings at will, so a String is always available, but you only pay the cost when you need to.
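A minimal sketch of the "pay only when you need to" point. The helper names `greeting` and `owned_greeting` are made up for illustration; only the standard library is used.

```rust
// Hypothetical helper: returns a literal. Nothing is allocated here; the
// returned reference points at bytes baked into the binary.
fn greeting() -> &'static str {
    "hello"
}

// Hypothetical helper: the heap allocation happens only here, where an
// owned, growable String is actually needed.
fn owned_greeting(name: &str) -> String {
    let mut s = String::from(greeting()); // allocate + copy
    s.push_str(", ");
    s.push_str(name);
    s
}

fn main() {
    // &str is Copy, so the literal can be shared freely; both bindings
    // refer to the same read-only bytes.
    let a = greeting();
    let b = a;
    assert_eq!(a.as_ptr(), b.as_ptr());

    let owned = owned_greeting("world");
    assert_eq!(owned, "hello, world");
    println!("{} / {}", a, owned);
}
```

Code that only reads the text can keep taking `&str` and never triggers an allocation.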

Parham answered 25/8, 2020 at 7:42 Comment(6)
Why is it a problem that the value in .rodata cannot be dropped? Could Rust just pretend it is dropped and carry on, or would that cause problems? (Edit: I'm actually wondering why str exists at all, and string literals seem to be an important part of the answer.) - Groschen
The problem is that String owns the data and drops that data when it is dropped itself. It might be possible to check whether the data it owns is in .rodata (though that might be difficult to do cross-platform) and then skip the deallocation, but that would make the implementation much more complicated. The types String and str have their equivalents in Vec<T> and [T]. With Rust's model of ownership and shared references you really need something like &str, not just because of string literals. - Parham
Also, consider that String owns and can modify its data, which is not really something you want to do with a string literal or anything else in .rodata. - Parham
Thanks @Paul. I didn't know that even a non-mut String has to assume its data is writable (because it can be made mut by its owner). And string literals have to be read-only for performance reasons (because then they can stay in .rodata). So there have to be two types to represent strings. - Groschen
Hi, I am a newbie. You said string literals have none of the disadvantages. Regarding reverse engineering Rust code, do string literals make reverse engineering easier than String does? I don't know much about reverse engineering either, but I always want my programs to be harder to reverse engineer. - Edora
@Edora I don't know. I'm not a reverse engineering expert. - Parham
11

To create a String, you have to:

  • reserve a place on the heap (allocate), and
  • copy the desired content from a read-only location to the freshly allocated area.

If a string literal like "foo" did both, every string would effectively be stored twice: once inside the executable as the read-only string, and again on the heap. You simply couldn't just refer to the original read-only data stored in the executable.
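The two steps can be observed directly. This is a rough sketch; `has_separate_buffer` is a hypothetical helper that just compares data pointers.

```rust
// Hypothetical helper: true when the two string slices start at different
// addresses, i.e. the owned copy has its own buffer.
fn has_separate_buffer(literal: &str, owned: &str) -> bool {
    literal.as_ptr() != owned.as_ptr()
}

fn main() {
    let literal: &'static str = "foo";

    // to_owned() allocates on the heap and copies the bytes over.
    let owned: String = literal.to_owned();

    // Same contents, but stored in two different places.
    assert!(literal == owned);
    assert!(has_separate_buffer(literal, &owned));

    // The heap buffer is at least as large as the copied text.
    assert!(owned.capacity() >= owned.len());
}
```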

&str literals give you access to the most efficient string data: the data present in the executable image on startup, put there by the compiler along with the instructions that make up the program. The data a literal points to is not stored on the stack; what is stack-allocated is just the pointer/size pair, as is the case with any Rust slice.
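To make the pointer/size pair concrete, here is a small check. It assumes the layout used by current compilers, where a `&str` occupies exactly two machine words; `str_ref_is_pointer_plus_len` is a hypothetical helper name.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical helper: checks that a &str is as large as a data pointer
// plus a length. This holds on current compilers.
fn str_ref_is_pointer_plus_len() -> bool {
    size_of::<&str>() == 2 * size_of::<usize>()
}

fn main() {
    let s: &'static str = "foo";

    // The reference itself: pointer + length.
    assert!(str_ref_is_pointer_plus_len());

    // The data it points to: three bytes stored elsewhere.
    assert_eq!(size_of_val(s), 3);
    assert_eq!(s.len(), 3);
}
```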

Making "foo" desugar into what is now spelled "foo".to_owned() would make it slower and less space-efficient, and would likely require another syntax to get a non-allocating &str. After all, you don't want x == "foo" to allocate a string just to throw it away immediately. Languages like Python alleviate this by making their strings immutable, which allows them to cache strings mentioned in the source code. In Rust, mutating a String is often the whole point of creating it, so that strategy wouldn't work.
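A short illustration of both points: comparing against a literal without allocating, and creating a String precisely in order to mutate it. The function name `is_foo` is made up for the example.

```rust
// Hypothetical helper: compares against a literal. No String is created
// for "foo"; the comparison reads the literal's bytes in place.
fn is_foo(x: &str) -> bool {
    x == "foo"
}

fn main() {
    let owned = String::from("foo");

    // A String can be compared with a literal directly as well.
    assert!(owned == "foo");
    assert!(is_foo(&owned));
    assert!(!is_foo("bar"));

    // The explicit to_owned() is where the allocation happens, and the
    // result is then mutated.
    let mut s = "foo".to_owned();
    s.push_str("bar");
    assert_eq!(s, "foobar");
}
```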

Tattan answered 25/8, 2020 at 6:52 Comment(7)
"You simply couldn't just refer to the original read-only data stored in the executable." Why not? I would assume that to be possible as long as the String is not mut. And if the data is to be mutated, then it has to be copied anyway. So it seems that either way using a String should work without penalty. What am I missing? - Groschen
@Groschen "Why not?" Because String is guaranteed to refer to heap-allocated data. You can even convert it to Box<str> and Vec<u8> without reallocation. "I would assume that to be possible as long as the String is not mut." There is no such thing as a non-mut String: as long as you own it, you can always make it mut. - Tattan
Thank you @user4815162342. I did not know that you can make immutable variables mutable. I guess it makes sense, because an important reason for immutability/consts is to ensure that different parts of a program do not mangle data that other parts rely on. Rust solves that already. - Groschen
let mut s = s reuses the same memory location, so it indeed effectively makes the original String mut (instead of e.g. copying the data). So things would break if a non-mut String could refer to read-only data, something I didn't expect. Thanks again. - Groschen
@Groschen It must necessarily reuse the same memory location because Rust's moves are always bitwise copies of the struct itself: there are no move (or copy) constructors that could duplicate pointed-to data. Given its public API, a String is bound to be implemented as a triple of (pointer, capacity, length), and moving it just copies those three values and marks the old ones as dead, so the compiler doesn't try to Drop them. - Tattan
Yeah, I figured that must be what was going on, so I wanted to prove it (and share the proof here for future readers). - Groschen
I went looking for functions that modify str in place, to see what would happen if they are called on a string literal. Apparently these do exist, e.g. make_ascii_uppercase(), which requires a &mut str. But the only way I've been able to create a &mut str is to copy a &str into a heap-allocated structure like a String or a Box. let mut s = s; does not help with a &str, because the data 'is behind an & reference'. So it does seem impossible to call such modify-in-place functions on a string literal. - Groschen
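For future readers, the observations in these comments can be reproduced with a sketch like the following. Both helper names are hypothetical.

```rust
// Hypothetical helper: rebinds an owned String as mut, mutates it in place,
// and reports whether the heap buffer stayed at the same address.
fn buffer_survives_rebinding(s: String) -> bool {
    let before = s.as_ptr();
    let mut s = s; // a move: only (pointer, capacity, length) are copied
    s.make_ascii_uppercase(); // in-place edit, no reallocation needed
    before == s.as_ptr()
}

// Hypothetical helper: a literal cannot be edited in place, so copy it
// into a String first and uppercase the copy.
fn shout(literal: &str) -> String {
    let mut owned = literal.to_owned();
    owned.make_ascii_uppercase(); // String derefs to &mut str here
    owned
}

fn main() {
    assert!(buffer_survives_rebinding(String::from("foo")));
    assert_eq!(shout("foo"), "FOO");

    // Calling make_ascii_uppercase() directly on the literal "foo" does
    // not compile, because only a shared &str is available.
}
```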
