Why does str primarily exist in its borrowed form? [duplicate]
B

2

10

This is how the str type is used:

let hello = "Hello, world!";

// with an explicit type annotation
let hello: &'static str = "Hello, world!";

Writing let hello: str = "Hello, world!"; leads to the error: expected `str`, found `&str`.

Why is the default type of the text not just str, the way it is for all other primitive types, vectors, and String? Why is it a reference?

Blimp answered 12/4, 2020 at 16:8 Comment(7)
&str means it's a reference (AKA pointer). A typical compiler will put all string constants (multiple bytes) in the binary. What gets passed around are just the pointers to this data. That's the same in C, where you normally don't have a char[32] but a char* as the type of your variable.Cloakroom
I do not have any official source for this, but I think it might be because str is the only primitive that does not implement Copy. So for integer types, bool etc it simply makes no sense to have them be a reference or have a lifetime since they can be owned by the context by copying it from the executable without runtime issues. This is not the case with str however, since it is not cheap to clone, you'll almost always want a reference to it or make it a full-blown string that can be properly modified.Dippy
@MatthiasWimmer Thanks, could you explain what is meant by "in the binary"? Isn't all code binary at machine level?Blimp
@Dippy There are other primitives that don't implement Copy, including slices and mutable references, and arrays and tuples if the elements of those don't implement it.Leonteen
@Frxstrem Well, a reference is kind of already a reference, but I agree that the same applies to arrays and tuples as well, not just strings.Dippy
Does this answer your question? In Rust, can you own a string literal?Misbeliever
@Blimp With “binary” I meant the executable file that the compiler generates. When the program starts, it gets loaded into memory and already contains some values (like the string literals in your code), while the variables you work with and manipulate are created on the stack or on the heap.Cloakroom
W
11

The design decision that strings and slices are only accessible via references has many advantages:

  1. strings can have any length. So a variable of type str is not easily managed on the stack, while &str has a small, fixed size on the stack: a pointer plus a length (the variable-length data resides elsewhere, for example on the heap). Note that, apart from slices, all other primitive types have a fixed size, as does every reference (though not the data it points to) and every struct (which is a composition of such types).
  2. &str is an immutable reference. If you could define variables of type str, you would have to give semantics to let mut s: str = "str";. An immutable string on the stack is hard to manage; a string that could be appended to is even harder.
  3. An owned str would mean that every move has to copy all the bytes, which costs performance. Just copying the reference and leaving the referenced data untouched is cheaper. An owned str would not really be a zero-cost abstraction.
  4. str is not the only type that appears only as reference &str (same holds for slices, like &[i8]) so a change to the handling of strings would make other behavior odd (or it has to be changed accordingly).
  5. Let us assume that a function could manage variables of type str. Now you want to return a &str from this function. This cannot work because a reference lives at most as long as the value it points to (try this with any primitive type). Since the str is a locally created value, it cannot outlive the function. You would have to write additional code to put your owned str into a static variable so that you could return a &str. The convention that a string literal is always a reference to a static string resolves this problem. And since a static reference is the default behavior I need, it is convenient that I can write it with so little overhead.
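
Point 5 can be tried out directly. The following is only a minimal sketch; the function names greeting, dangling and shout are made up for illustration and do not come from the answer above.

```rust
// Returning a string literal works: the literal lives for the whole
// program ('static), so nothing local is borrowed.
fn greeting() -> &'static str {
    "Hello, world!"
}

// This version is rejected by the compiler, because `local` is dropped
// when the function returns and the reference would dangle:
//
// fn dangling<'a>() -> &'a str {
//     let local = String::from("temporary");
//     &local
// }

// When the text is built at run time, the usual way out is to hand
// back an owned String instead of a reference.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let g = greeting();
    let loud = shout(g);
    println!("{} -> {}", g, loud);
}
```

Uncommenting dangling should produce a compile error about returning a reference to a local variable, which is the situation point 5 describes.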
Were answered 12/4, 2020 at 17:18 Comment(4)
First of all, thank you very much for your response. Regarding your 1st point, how does this differ from String? That has the property you mention while being owned. Regarding point 3, I'm confused why this is. If str is owned normally, then passing a reference to it wouldn't copy things, right?Blimp
1. String indeed has most properties of str. Yet it is not zero-cost. If you need mutability or ownership, you can use it at its cost, but you can also use a &str without. 3. Moving a reference (e.g. &str) means copying a pointer (sometimes the compiler can even omit this). Yet if you keep a hypothetical str on the stack, you will have to move/copy the value (all bytes). Of course you could reference this value, but then your type is &str. I will just add another point.Were
String is actually much more like &str than str. A String contains a pointer to its string data, so moving it is cheap. Also, it has a fixed size, like &str. The only difference between String and &str is that String owns its data, and that you can add/remove characters from a String.Notary
I'm not sure how to best put this in your answer, but a &str doesn't always refer to the heap. It might be in the text segment (string literals) or on the stack (e.g. [u8; N] array converted to str).Halhalafian
H
4

I will try to give a different perspective. In Rust there is a general convention: if you have a variable of some type T, it means that you own the data associated with T. If you have a variable of type &T, then you don't own the data.

Now let's consider a heap-allocated string. According to this convention, there should be a non-reference type that represents ownership of the allocation. And indeed such a type exists: String.

There is also a different kind of string: &'static str. These strings are not owned by anyone: exactly one instance of the string is placed inside the compiled binary file, and only pointers are passed around. There is no allocation and no deallocation, hence no ownership. In a sense, static strings are owned by the compiler, not by a programmer. This is why String cannot be used to represent a static string.
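
The difference between the two kinds of string can be observed by comparing addresses. This is only a sketch; same_address is a hypothetical helper written for this example.

```rust
// Hypothetical helper: true when both string slices start at the
// same memory address.
fn same_address(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    // A literal is a &'static str; no allocation happens here.
    let literal: &'static str = "hello, world!";

    // Copying the reference does not copy the text.
    let alias = literal;
    assert!(same_address(literal, alias));

    // A String owns a fresh heap allocation holding a copy of the text.
    let owned: String = literal.to_owned();
    assert!(!same_address(literal, owned.as_str()));
    assert_eq!(literal, owned);

    // `owned` is deallocated when it goes out of scope here;
    // `literal` never needs deallocation.
}
```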

Alright, so why not use &String to represent a static string? Imagine a world where the following code is valid Rust:

let s: &'static String = "hello, world!";

This might look fine, but implementation-wise, this is suboptimal:

  1. String itself has a pointer to the actual data, so &String has to be basically a pointer to a pointer. This violates the zero-cost abstraction principle: why introduce an extra level of indirection when the compiler statically knows the address of "hello, world!"?
  2. Even if the compiler were somehow smart enough to decide that the extra pointer is not needed here (which would lead to a bunch of other problems), String itself still contains three fields, each 8 bytes on a 64-bit target:

    • Data pointer;
    • Data length;
    • Allocation capacity - lets us know how much free space there is after the data.

    However, when we are talking about static strings, capacity makes zero sense: static strings are read-only.
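
The three fields can be checked from ordinary code. This sketch assumes the current standard library layout, where a String occupies three machine words; that size is an observation about today's implementation rather than a documented guarantee. len_and_capacity is a hypothetical helper.

```rust
use std::mem::size_of;

// Hypothetical helper: reports (length, capacity) of a String.
fn len_and_capacity(s: &String) -> (usize, usize) {
    (s.len(), s.capacity())
}

fn main() {
    // Reserve room for at least 32 bytes, then use only 5 of them.
    let mut s = String::with_capacity(32);
    s.push_str("hello");

    let (len, cap) = len_and_capacity(&s);
    assert_eq!(len, 5);
    assert!(cap >= 32);

    // Pointer + length + capacity: three machine words on current
    // implementations (24 bytes on a 64-bit target).
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());

    // A string literal has a length, but no notion of capacity.
    let literal: &str = "hello";
    assert_eq!(literal.len(), 5);
}
```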

So, in the end, when the compiler sees &'static String, we actually want it to store only a data pointer and length; otherwise, we are paying for what we will never use, which is against the zero-cost abstraction principle. This looks like arcane wizardry that we want from the compiler: the variable type is &String but the variable itself is anything but a reference to String.

To make this work, we actually need a different type, not &String, that only holds a data pointer and length. And here it is: &str! It is better than &String in a number of ways:

  1. Does not have an excessive level of indirection - only one pointer;
  2. Does not store capacity, which would be meaningless in many contexts;
  3. No black magic: we define str as a variable-sized type (the data itself), so &str is just a reference to the data.
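
A short sketch of the comparison above: on current compilers &str is two machine words (data pointer and length), while &String is a single pointer to the String. The function count_words is a made-up example of an API that accepts &str.

```rust
use std::mem::size_of;

// Hypothetical function: taking &str lets callers pass string
// literals and borrowed Strings alike.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    // &str: data pointer + length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    // &String: one pointer to the String, which in turn points to the data.
    assert_eq!(size_of::<&String>(), size_of::<usize>());

    let owned = String::from("hello big world");
    // &String converts to &str automatically (deref coercion).
    assert_eq!(count_words(&owned), 3);
    assert_eq!(count_words("hello, world!"), 2);
}
```

Because &String coerces to &str, writing functions against &str costs callers nothing and avoids the extra indirection.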

Now you might wonder: why not introduce str instead of &str? Remember the convention: having str would imply that you own the data, which you don't. Hence &str.

Headforemost answered 13/4, 2020 at 8:40 Comment(0)
