Why does Rust have both String and str? What are the differences between them, and when should one be used over the other? Is one of them getting deprecated?
String is the dynamic heap string type, like Vec: use it when you need to own or modify your string data.
str is an immutable¹ sequence of UTF-8 bytes of dynamic length somewhere in memory. Since the size is unknown, one can only handle it behind a pointer. This means that str most commonly² appears as &str: a reference to some UTF-8 data, normally called a "string slice" or just a "slice". A slice is just a view onto some data, and that data can be anywhere, e.g.
- In static storage: a string literal "foo" is a &'static str. The data is hardcoded into the executable and loaded into memory when the program runs.
- Inside a heap-allocated String: String dereferences to a &str view of the String's data.
- On the stack: e.g. the following creates a stack-allocated byte array, and then gets a view of that data as a &str:
use std::str;
let x: [u8; 3] = [b'a', b'b', b'c'];
let stack_str: &str = str::from_utf8(&x).unwrap();
In summary, use String if you need owned string data (like passing strings to other threads, or building them at runtime), and use &str if you only need a view of a string.
This is identical to the relationship between a vector Vec<T> and a slice &[T], and is similar to the relationship between by-value T and by-reference &T for general types.
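The parallel between the two pairs can be sketched as follows. The helper names count_words and sum are hypothetical; the point is only that a function taking a borrowed view accepts both the owned type and other sources of the same data.

```rust
// Hypothetical helpers: both take borrowed views, so callers keep ownership.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let owned: String = String::from("use a view when you can");
    let literal: &str = "three short words";
    // &String coerces to &str, so one signature serves both callers.
    println!("{}", count_words(&owned)); // 6
    println!("{}", count_words(literal)); // 3

    let vector: Vec<i32> = vec![1, 2, 3];
    let array = [4, 5];
    // &Vec<i32> and &[i32; 2] both coerce to &[i32].
    println!("{}", sum(&vector)); // 6
    println!("{}", sum(&array)); // 9
}
```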
1 A str is fixed-length; you cannot write bytes beyond the end, or leave trailing invalid bytes. Since UTF-8 is a variable-width encoding, this effectively forces all strs to be immutable in many cases. In general, mutation requires writing more or fewer bytes than there were before (e.g. replacing an a (1 byte) with an ä (2+ bytes) would require making more room in the str). There are specific methods that can modify a &mut str in place, mostly those that handle only ASCII characters, like make_ascii_uppercase.
2 Dynamically sized types allow things like Rc<str> for a sequence of reference counted UTF-8 bytes since Rust 1.2. Rust 1.21 allows easily creating these types.
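A minimal sketch of the reference-counted forms mentioned in the second footnote, using the standard From conversions. The function name shared_name is hypothetical.

```rust
use std::rc::Rc;
use std::sync::Arc;

// Hypothetical helper: build one Rc<str> and hand out a second handle to it.
fn shared_name() -> (Rc<str>, Rc<str>) {
    let first: Rc<str> = Rc::from("shared text");
    // Cloning bumps the reference count; the text itself is not copied.
    let second = Rc::clone(&first);
    (first, second)
}

fn main() {
    let (a, b) = shared_name();
    println!("{} / {} (strong count: {})", a, b, Rc::strong_count(&a));

    // Arc<str> is the thread-safe counterpart and can be built from a String.
    let across_threads: Arc<str> = Arc::from(String::from("owned input"));
    println!("{}", across_threads);
}
```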
[u8; N]
. –
Graces Rc<str>
and Arc<str>
are now usable via the standard library. –
Fontes String
gets out of scope, the slice keeps the entire original string value in memory, right? (the question probably makes more sense when thinking of a slice as a substring). –
Ethelyn &str
slice pointing into a String
that goes out of scope and is deallocated. In a garbage collected language the slice can exist after the main owner disappears, but in Rust it cannot: the compiler forces the programmer to explicitly choose how to handle it, e.g. don't share memory (by using .to_owned()
to make a separate String
), or share memory like you say (by using something like kimundi.github.io/owning-ref-rs/owning_ref/… ). –
Graces .len()
method returns it). The opposite of dynamic is "static", which you are right about too: a type like u32
has a statically known size of 4, while the type str
does not (there's no way to know how many bytes are in an arbitrary value of type str
) –
Graces
I have a C++ background and I found it very useful to think about String and &str in C++ terms:
- A Rust String is like a std::string; it owns the memory and does the dirty job of managing memory.
- A Rust &str is like a char* (but a little more sophisticated); it points us to the beginning of a chunk in the same way you can get a pointer to the contents of std::string.
Is either of them going to disappear? I do not think so. They serve two purposes:
String keeps the buffer and is very practical to use. &str is lightweight and should be used to "look" into strings. You can search, split, parse, and even replace chunks without needing to allocate new memory.
&str can look inside of a String, just as it can point to a string literal. The following code needs to copy the literal string into the String-managed memory:
let a: String = "hello rust".into();
The following code lets you use the literal itself without a copy (read-only though):
let a: &str = "hello rust";
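The difference between the two lines above can be observed through the data pointers. This is a small sketch; first_word is a hypothetical helper showing that splitting a &str yields views into the same bytes rather than copies.

```rust
// Hypothetical helper: returns a view into `text`, not a new allocation.
fn first_word(text: &str) -> &str {
    text.split(' ').next().unwrap_or("")
}

fn main() {
    let literal: &str = "hello rust";
    let owned: String = literal.into();

    // The String holds its own heap copy, so the two addresses differ.
    assert_ne!(owned.as_ptr(), literal.as_ptr());

    // The slice returned by first_word starts at the same address as the literal.
    let word = first_word(literal);
    assert_eq!(word, "hello");
    assert_eq!(word.as_ptr(), literal.as_ptr());
    println!("{} starts at {:p}", word, word.as_ptr());
}
```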
string_view
is an abomination. Imagine auto_ptr<T>
level bad. –
Allaallah It is str
that is analogous to String
, not the slice of it.
A str is the type of the text behind a string literal, basically pre-allocated text:
"Hello World"
This text has to be stored somewhere, so it is stored in the data section of the executable file along with the program's machine code, as a sequence of bytes ([u8]).
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ H │ e │ l │ l │ o │ │ W │ o │ r │ l │ d │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ 72 │ 101 │ 108 │ 108 │ 111 │ 32 │ 87 │ 111 │ 114 │ 108 │ 100 │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
Since text can be of any length, it is dynamically sized.
Now that we have stored the text, we need a way to access it, and that is where the slice comes in.
A slice, [T], is a view into a block of memory. Whether mutable or not, a slice always borrows, and that is why it is always behind a pointer, &.
Let's explain the meaning of being dynamically sized.
Some programming languages, like C, append a zero byte (\0) at the end of their strings and keep a record of the starting address. To determine a string's length, the program has to walk through the raw bytes from the starting position until it finds this zero byte.
However, Rust takes a different approach: it uses a slice. A slice stores the address where a str starts and how many bytes it takes. This is better than appending a zero byte because the length is stored alongside the pointer and, for a literal, is computed in advance during compilation.
The text size can be known in advance but still changes with the underlying data, which makes it dynamically sized.
If we go back to the "Hello World" expression, it returns a fat pointer, containing both the address of the actual data and its length. This pointer will be our handle to the actual data and it will also be stored in our program. Now the data is behind a pointer and the compiler knows the pointer's size at compile time.
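The two halves of the fat pointer can be pulled apart and put back together. This is only an illustrative sketch (rebuild is a hypothetical helper); ordinary code never needs to do this by hand.

```rust
// Hypothetical helper: split a &str into (pointer, length) and reassemble it.
fn rebuild(s: &str) -> &str {
    let ptr: *const u8 = s.as_ptr(); // address of the first byte
    let len: usize = s.len(); // number of bytes, read in constant time
    // Safety: ptr and len come from a live &str, so they describe
    // valid UTF-8 bytes that outlive the returned reference.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    std::str::from_utf8(bytes).expect("bytes came from a valid str")
}

fn main() {
    let s: &'static str = "Hello World";
    println!("{}", rebuild(s));
    // A &str is two machine words: the pointer and the length.
    println!(
        "{} == 2 * {}",
        std::mem::size_of::<&str>(),
        std::mem::size_of::<usize>()
    );
}
```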
Since the text is compiled into the executable, it will be valid for the entire lifetime of the running program, hence it will have the static lifetime.
So, the return value of the "Hello World" expression should reflect these two characteristics, and it does:
let s: &'static str = "Hello World";
You may ask why its type is written as str but not as [u8]. It is because the data is always guaranteed to be a valid UTF-8 sequence. Not all UTF-8 characters are a single byte; some take up to 4 bytes. So [u8] would be inaccurate.
If you disassemble a compiled Rust program and inspect the executable file, you will see multiple strs stored adjacent to each other in the data section without any indication of where one starts and the other ends.
The compiler takes this one step further: if identical static text is used at multiple locations in the program, the Rust compiler will optimize the program by creating a single binary block for all duplicate values.
For example, the compiler creates a single contiguous block with the content of "Hello World" for the following code even though we use three different literals with "Hello World":
let x: &'static str = "Hello World";
let y: &'static str = "Hello World";
let z: &'static str = "Hello World";
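One way to observe this is to compare the data addresses of two identical literals. Treat this as an experiment, not a guarantee: whether the addresses match is an optimization detail that the language does not promise, so the sketch prints the result instead of asserting it. same_address is a hypothetical helper.

```rust
// Hypothetical helper: do two string slices start at the same address?
fn same_address(a: &str, b: &str) -> bool {
    std::ptr::eq(a.as_ptr(), b.as_ptr())
}

fn main() {
    let x: &'static str = "Hello World";
    let y: &'static str = "Hello World";
    println!("x at {:p}, y at {:p}", x.as_ptr(), y.as_ptr());
    // Often true in practice, but not guaranteed.
    println!("same address: {}", same_address(x, y));

    // A String copy always lives elsewhere, on the heap.
    let owned = String::from(x);
    println!("literal vs String: {}", same_address(x, &owned));
}
```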
String, on the other hand, is a specialized type that stores its value as a vector of u8. Take a look at how the String type is defined in the source code:
pub struct String {
    vec: Vec<u8>,
}
Being a vector means it is heap allocated and resizable like any other vector value.
However, if you look carefully you will see that the vec field is kept private. Being private means we cannot create a String instance directly, only through the provided methods. It is kept private because not every stream of bytes is valid UTF-8, and direct interaction with the underlying bytes may corrupt the data. Through this controlled access, the type ensures the data is valid and remains valid.
The word specialized in the description refers to this feature: not permitting arbitrary access, but enforcing certain checks on the data through controlled access in order to provide certain guarantees. Other than that, it is just a vector.
In summary, a String is a resizable buffer holding UTF-8 text. This buffer is allocated on the heap, so it can grow as needed or requested. We can fill this buffer or change its content any way we see fit.
There are several methods defined on the String type to create a String instance; new is one of them:
pub const fn new() -> String {
String { vec: Vec::new() }
}
We can use it to create a valid String.
let s = String::new();
println!("{}", s);
Unfortunately, it does not accept an input parameter. So the result will be a valid but empty string, and it will grow like any other vector when its capacity is not enough to hold the assigned value. But application performance will take a hit, as growing requires reallocation.
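When the final size is roughly known up front, the reallocation cost can be avoided by reserving the buffer once with String::with_capacity. A minimal sketch; build_greeting is a hypothetical helper that also reports whether the buffer moved.

```rust
// Hypothetical helper: build a string in a pre-sized buffer and
// report whether the buffer had to move (i.e. was reallocated).
fn build_greeting() -> (String, bool) {
    let mut s = String::with_capacity(11); // room for "Hello World"
    let before = s.as_ptr();
    s.push_str("Hello");
    s.push(' ');
    s.push_str("World");
    let moved = before != s.as_ptr();
    (s, moved)
}

fn main() {
    let (s, moved) = build_greeting();
    println!("{} (len {}, capacity {})", s, s.len(), s.capacity());
    println!("reallocated: {}", moved); // false
}
```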
We can fill the underlying vector with initial values from different sources:
From a string literal
let a = "Hello World";
let s = String::from(a);
Please note that a str is still created and its content is copied to the heap-allocated vector via String::from. If we check the executable binary we will see raw bytes in the data section with the content "Hello World". This is a very important detail some people miss.
From raw parts
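use std::mem::ManuallyDrop;
// s is an existing String; ManuallyDrop stops its destructor from freeing the buffer twice
let mut s = ManuallyDrop::new(s);
let ptr = s.as_mut_ptr();
let len = s.len();
let capacity = s.capacity();
// Safety: ptr, len, and capacity all come from the same valid String
let s = unsafe { String::from_raw_parts(ptr, len, capacity) };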
From a character
let ch = 'c';
let s = ch.to_string();
From a vector of bytes
let hello_world = vec![72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100];
// We know it is a valid sequence, so we can use unwrap
let hello_world = String::from_utf8(hello_world).unwrap();
println!("{}", hello_world); // Hello World
Here we have another important detail. A vector might hold any values, and there is no guarantee its content will be valid UTF-8, so Rust forces us to take this into consideration by returning a Result<String, FromUtf8Error> rather than a String.
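When the bytes are not known to be valid, the Result can be handled instead of unwrapped. The sketch below is one possible policy, not the only one: decode is a hypothetical helper that falls back to a lossy conversion, replacing invalid sequences with U+FFFD.

```rust
// Hypothetical helper: decode bytes as UTF-8, substituting U+FFFD on errors.
fn decode(bytes: Vec<u8>) -> String {
    match String::from_utf8(bytes) {
        Ok(text) => text,
        Err(err) => {
            // The error hands the original buffer back, so nothing is lost.
            let raw = err.into_bytes();
            String::from_utf8_lossy(&raw).into_owned()
        }
    }
}

fn main() {
    println!("{}", decode(vec![72, 105])); // Hi
    // 0xFF can never appear in valid UTF-8.
    println!("{}", decode(vec![72, 0xFF, 105])); // H\u{FFFD}i
}
```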
From input buffer
use std::io::{self, Read};
fn main() -> io::Result<()> {
let mut buffer = String::new();
let stdin = io::stdin();
let mut handle = stdin.lock();
handle.read_to_string(&mut buffer)?;
Ok(())
}
Or from any other type that implements the ToString trait
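ToString is implemented automatically for every type that implements Display, so the usual route is to implement Display. A short sketch with a made-up Point type:

```rust
use std::fmt;

// Hypothetical type used only for illustration.
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    // Built-in types already implement Display, hence ToString.
    let n: String = 42.to_string();
    // Our own type gets to_string() for free from its Display impl.
    let p: String = Point { x: 1, y: -2 }.to_string();
    println!("{} {}", n, p); // 42 (1, -2)
}
```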
Since String
is a vector under the hood, it will exhibit some vector characteristics:
- a pointer: The pointer points to an internal buffer that stores the data.
- length: The length is the number of bytes currently stored in the buffer.
- capacity: The capacity is the size of the buffer in bytes. So, the length will always be less than or equal to the capacity.
And it delegates some properties and methods to vectors:
pub fn capacity(&self) -> usize {
self.vec.capacity()
}
Most examples use String::from, which confuses people, who wonder why a String is being created from another string.
It is a long read, hope it helps.
str
, only used as &str
, is a string slice, a reference to a UTF-8 byte array.
String
is what used to be ~str
, a growable, owned UTF-8 byte array.
~str
is now Box<str>
–
Petes ~str
was growable while Box<str>
is not growable. (That ~str
and ~[T]
were magically growable, unlike any other ~
-object, was exactly why String
and Vec<T>
were introduced, so that the rules were all straightforward and consistent.) –
Induct String
is an owned type... it has exclusive ownership of the contents of the string; and when it passes out of scope, the memory for the contents of the string will be freed immediately. For this reason, any substring can’t be of the type String
... [otherwise]... when one passed out of scope, the other would become invalid... And so instead it is that slices (substrings) use a type which is a reference to the contents that something else owns—&str
..." –
Selfreproach They are actually completely different. First off, a str
is nothing but a type level thing; it can only be reasoned about at the type level because it's a so-called dynamically-sized type (DST). The size the str
takes up cannot be known at compile time and depends on runtime information — it cannot be stored in a variable because the compiler needs to know at compile time what the size of each variable is. A str
is conceptually just a row of u8
bytes with the guarantee that it forms valid UTF-8. How large is the row? No one knows until runtime hence it can't be stored in a variable.
The interesting thing is that a &str
or any other pointer to a str
like Box<str>
does exist at runtime. This is a so-called "fat pointer"; it's a pointer with extra information (in this case the size of the thing it's pointing at) so it's twice as large. In fact, a &str
is quite close to a String
(but not to a &String
). A &str
is two words: one pointer to the first byte of a str
and another number that describes how many bytes long the str
is.
Contrary to what is said, a str
does not need to be immutable. If you can get a &mut str
as an exclusive pointer to the str
, you can mutate it and all the safe functions that mutate it guarantee that the UTF-8 constraint is upheld because if that is violated then we have undefined behaviour as the library assumes this constraint is true and does not check for it.
So what is a String
? That's three words; two are the same as for &str
but it adds a third word which is the capacity of the str
buffer on the heap, always on the heap (a str
is not necessarily on the heap) it manages before it's filled and has to re-allocate. The String
basically owns a str
as they say; it controls it and can resize it and reallocate it when it sees fit. So a String
is as said closer to a &str
than to a str
.
Another thing is a Box<str>
; this also owns a str
and its runtime representation is the same as a &str
but it also owns the str
unlike the &str
but it cannot resize it because it does not know its capacity so basically a Box<str>
can be seen as a fixed-length String
that cannot be resized (you can always convert it into a String
if you want to resize it).
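The round trip between String and Box<str> can be sketched with the standard conversions into_boxed_str and into_string. freeze is a hypothetical helper name.

```rust
// Hypothetical helper: give up the ability to grow, keeping ownership.
fn freeze(s: String) -> Box<str> {
    // Drops any spare capacity; only pointer + length remain.
    s.into_boxed_str()
}

fn main() {
    let mut s = String::with_capacity(64);
    s.push_str("fixed");
    let boxed: Box<str> = freeze(s);

    // Box<str> is a two-word fat pointer, like &str; String adds a capacity word.
    println!("Box<str>: {} bytes", std::mem::size_of::<Box<str>>());
    println!("String:   {} bytes", std::mem::size_of::<String>());

    // Converting back gives a growable String again.
    let mut again: String = boxed.into_string();
    again.push_str(" no more");
    println!("{}", again); // fixed no more
}
```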
A very similar relationship exists between [T]
and Vec<T>
except there is no UTF-8 constraint and it can hold any type whose size is not dynamic.
The use of str
on the type level is mostly to create generic abstractions with &str
; it exists on the type level to be able to conveniently write traits. In theory str
as a type thing didn't need to exist and only &str
but that would mean a lot of extra code would have to be written that can now be generic.
&str
is super useful to be able to have multiple different substrings of a String
without having to copy; as said a String
owns the str
on the heap it manages and if you could only create a substring of a String
with a new String
it would have to be copied because everything in Rust can only have one single owner to deal with memory safety. So for instance you can slice a string:
let string: String = "a string".to_string();
let substring1: &str = &string[1..3];
let substring2: &str = &string[2..4];
We have two different substring str
s of the same string. string
is the one that owns the actual full str
buffer on the heap and the &str
substrings are just fat pointers to that buffer on the heap.
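That the substrings are views into the same buffer, not copies, can be checked by comparing addresses. offset_in is a hypothetical helper and assumes the child slice really was taken from the parent.

```rust
// Hypothetical helper: byte offset of `child` inside `parent`.
// Assumes `child` is a slice borrowed from `parent`.
fn offset_in(parent: &str, child: &str) -> usize {
    child.as_ptr() as usize - parent.as_ptr() as usize
}

fn main() {
    let string: String = "a string".to_string();
    let substring1: &str = &string[1..3];
    let substring2: &str = &string[2..4];

    // Both slices point into the heap buffer owned by `string`.
    println!("{:?} at offset {}", substring1, offset_in(&string, substring1)); // " s" at offset 1
    println!("{:?} at offset {}", substring2, offset_in(&string, substring2)); // "st" at offset 2
}
```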
Rust &str and String
String:
- Rust's owned string type. The string itself lives on the heap, and its owner can mutate it and alter its size and contents.
- Because String is owned, when the variable which owns the string goes out of scope, the memory on the heap will be freed.
- Variables of type String hold a pointer plus associated metadata.
- On a 64-bit target the value is 3 * 8 bytes (word size) long and consists of the following 3 elements:
  - Pointer to the actual data on the heap; it points to the first byte
  - Length of the string (number of bytes)
  - Capacity of the buffer on the heap (in bytes)
&str:
- Rust's non-owned string type; it is immutable by default. The string itself lives somewhere else in memory, usually on the heap or in 'static memory.
- Because &str is non-owned, when &str variables go out of scope the memory of the string will not be freed.
- Variables of type &str are fat pointers (pointer + associated metadata).
- On a 64-bit target the fat pointer is 2 * 8 bytes (word size) long and consists of the following 2 elements:
  - Pointer to the actual data (on the heap or elsewhere); it points to the first byte
  - Length of the string (number of bytes)
Example:
use std::mem;
fn main() {
// on 64 bit architecture:
println!("{}", mem::size_of::<&str>()); // 16
println!("{}", mem::size_of::<String>()); // 24
let string1: &'static str = "abc";
// string1 points to 'static memory, which lives throughout the whole program
let ptr = string1.as_ptr();
let len = string1.len();
println!("{}, {}", unsafe { *ptr as char }, len); // a, 3
// len is 3 because the string is 3 bytes long
// pointer to the first character points to letter a
{
let mut string2: String = "def".to_string();
let ptr = string2.as_ptr();
let len = string2.len();
let capacity = string2.capacity();
println!("{}, {}, {}", unsafe { *ptr as char }, len, capacity); // d, 3, 3
// pointer to the first character points to letter d
// len is 3 because the string is 3 bytes long
// string has now 3 bytes of space on the heap
string2.push_str("ghijk"); // we can mutate String type, capacity and length will also change
println!("{}, {}", string2, string2.capacity()); // defghijk, 8
} // memory of string2 on the heap will be freed here because owner goes out of scope
}
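Note that len() counts bytes, not characters; the two only coincide for ASCII text like "abc" and "def" above. A short sketch (byte_and_char_len is a hypothetical helper):

```rust
// Hypothetical helper: (length in bytes, number of Unicode scalar values).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    println!("{:?}", byte_and_char_len("abc")); // (3, 3)
    println!("{:?}", byte_and_char_len("Löf")); // (4, 3): ö takes 2 bytes
    println!("{:?}", byte_and_char_len("ಠ_ಠ")); // (7, 3): ಠ takes 3 bytes
}
```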
std::string::String is simply a wrapper around a vector of u8. You can find its definition in the source code. It's heap-allocated and growable.
#[derive(PartialOrd, Eq, Ord)]
#[stable(feature = "rust1", since = "1.0.0")]
pub struct String {
vec: Vec<u8>,
}
str
is a primitive type, also called string slice. A string slice has fixed size. A literal string like let test = "hello world"
has &'static str
type. test
is a reference to this statically allocated string.
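&str cannot be modified. For example, neither of the last two lines compiles:
let mut word = "hello world";
word[0] = 's'; // error: str cannot be indexed by an integer
word.push('\n'); // error: &str has no push method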
str
does have mutable slice &mut str
, for example:
pub fn split_at_mut(&mut self, mid: usize) -> (&mut str, &mut str)
let mut s = "Per Martin-Löf".to_string();
{
let (first, last) = s.split_at_mut(3);
first.make_ascii_uppercase();
assert_eq!("PER", first);
assert_eq!(" Martin-Löf", last);
}
assert_eq!("PER Martin-Löf", s);
But a small change to UTF-8 can change its byte length, and a slice cannot reallocate its referent.
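A minimal sketch of that point: replacing a one-byte character with a two-byte one cannot happen inside the existing slice, so str::replace returns a freshly allocated String. accent is a hypothetical helper.

```rust
// Hypothetical helper: swap every 'i' for 'ï', which needs one extra byte each.
fn accent(s: &str) -> String {
    // The result has a different byte length, so a new String is allocated.
    s.replace('i', "ï")
}

fn main() {
    let before: &str = "naive";
    let after: String = accent(before);
    println!("{} ({} bytes) -> {} ({} bytes)", before, before.len(), after, after.len());
    // naive (5 bytes) -> naïve (6 bytes)
}
```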
&mut str
that did not rely on a String
, that is, without to_string()
, because why bother with str if you have String already. This works: let mut s: Box<str> = "Per Martin-Löf".into(); let (first, last) = s.split_at_mut(3); first.make_ascii_uppercase(); assert_eq!("PER Martin-Löf", &*s);
–
Dereism String
sound redundant with Vec<u8>
. The difference between the two is that a String
always contains valid UTF-8 encoded text while a Vec<u8>
can hold any sequence of bytes. –
Kandrakandy
In easy words, String is a datatype stored on the heap (just like Vec), and you have access to that location.
&str is a slice type. That means it is just a reference to string data that already exists somewhere, for example inside a String on the heap or in the program binary.
&str
doesn't do any allocation at runtime. So, for memory reasons, you can use &str
over String
. But, keep in mind that when using &str
you might have to deal with explicit lifetimes.
str
is view
of already present String
in heap. –
Bugs
In these 3 different types
let noodles = "noodles".to_string();
let oodles = &noodles[1..];
let poodles = "ಠ_ಠ"; // this is a string literal
A String has a resizable buffer holding UTF-8 text. The buffer is allocated on the heap, so it can resize its buffer as needed or requested. In the example, noodles is a String that owns an eight-byte buffer, of which seven are in use. You can think of a String as a Vec<u8> that is guaranteed to hold well-formed UTF-8; in fact, this is how String is implemented.
A &str is a reference to a run of UTF-8 text owned by someone else: it "borrows" the text. In the example, oodles is a &str referring to the last six bytes of the text belonging to noodles, so it represents the text "oodles". Like other slice references, a &str is a fat pointer, containing both the address of the actual data and its length. You can think of a &str as being nothing more than a &[u8] that is guaranteed to hold well-formed UTF-8.
A string literal is a &str that refers to preallocated text, typically stored in read-only memory along with the program's machine code. In the preceding example, poodles is a string literal, pointing to seven bytes that are created when the program begins execution and that last until it exits.
This is how they are stored in memory
Reference: Programming Rust, by Jim Blandy, Jason Orendorff, Leonora F. S. Tindall
Some Usages
example_1.rs
fn main() {
    let hello = String::from("hello");
    let any_char = hello[0]; // error: String cannot be indexed by an integer
}
example_2.rs
fn main() {
    let hello = String::from("hello");
    for c in hello.chars() {
        println!("{}", c);
    }
}
example_3.rs
fn main() {
    let hello = String::from("String are cool");
    let any_char = &hello[5..6]; // same as: let any_char: &str = &hello[5..6];
    println!("{:?}", any_char);
}
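Range slicing as in example_3 works on byte offsets and panics if an offset falls in the middle of a multi-byte character. A non-panicking alternative is str::get, sketched here with a hypothetical safe_slice helper:

```rust
// Hypothetical helper: returns None instead of panicking when the range
// is out of bounds or does not fall on character boundaries.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let hello = String::from("String are cool");
    println!("{:?}", safe_slice(&hello, 5, 6)); // Some("g")

    let face = "ಠ_ಠ"; // each ಠ is 3 bytes long
    println!("{:?}", safe_slice(face, 0, 3)); // Some("ಠ")
    println!("{:?}", safe_slice(face, 0, 1)); // None: cuts through ಠ
    println!("{:?}", face.chars().next()); // Some('ಠ'): first character
}
```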
Shadowing
fn main() {
let s: &str = "hello"; // &str
let s: String = s.to_uppercase(); // String
println!("{}", s) // HELLO
}
function
fn say_hello(to_whom: &str) { //type coercion
println!("Hey {}!", to_whom)
}
fn main(){
let string_slice: &'static str = "you";
let string: String = string_slice.into(); // &str => String
say_hello(string_slice);
say_hello(&string);// &String
}
Concat
// A String lives on the heap and can grow or shrink.
// The size of a &str is fixed.
fn main() {
    let a = "Foo";
    let b = "Bar";
    let c = a + b; // error: cannot add &str to &str
    // let c = a.to_string() + b;
}
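A compiling version of the concatenation, showing three common ways to join two &str values into a new String. join_three_ways is a hypothetical helper used only to group them.

```rust
// Hypothetical helper: three equivalent ways to concatenate two slices.
fn join_three_ways(a: &str, b: &str) -> (String, String, String) {
    // `+` needs an owned String on the left-hand side.
    let plus = a.to_string() + b;
    // format! builds a new String from any number of pieces.
    let formatted = format!("{}{}", a, b);
    // concat() joins a slice of string slices.
    let concatenated = [a, b].concat();
    (plus, formatted, concatenated)
}

fn main() {
    let (x, y, z) = join_three_ways("Foo", "Bar");
    println!("{} {} {}", x, y, z); // FooBar FooBar FooBar
}
```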
Note that String and &str are different types, and 99% of the time you should only care about &str.
&str
is like a view (slice) of data", it can sometimes sound confusing; This graphic shows it very well. The only thing that this answer doesn't really help with is that String
is owned and mutable, whereas &str
is immutable and non-owned. –
Excrement
For C# and Java people:
- Rust's String === StringBuilder
- Rust's &str === (immutable) string
I like to think of a &str
as a view on a string, like an interned string in Java / C# where you can't change it, only create a new one.
In Rust, str is a primitive type that represents a sequence of Unicode scalar values encoded as UTF-8, also known as a string slice. It is usually seen in its borrowed form, &str, which is a read-only view into a string and does not own the memory that it points to. On the other hand, String is a growable, mutable, owned string type. This means that when you create a String, it will allocate memory on the heap to store the contents of the string, and it will deallocate this memory when the String goes out of scope. Because String is growable and mutable, you can change the contents of a String after you have created it.
In general, str is used when you want to refer to a string slice that is stored in another data structure, such as a String. String is used when you want to create and own a string value.
String:
- A String is a dynamic, heap-allocated, grow-able sequence of characters.
- Used for storing and manipulating text data that can change in size.
- Represents owned string data.
Example:
let s: String = String::from("Hello");
println!("{}", s);
// Output: Hello
&str (String Slice):
- A &str is an immutable reference to a sequence of characters (a slice) stored in memory.
- Used for working with string data without taking ownership.
- Can be used with string literals or borrow from String.
Example:
fn print_length(s: &str) {
    println!("Length: {}", s.len());
}
fn main() {
    let greeting = "Hello";
    print_length(greeting);
    // Output: Length: 5
}
Keep in mind that String and &str are related, but they serve different purposes and have different ownership characteristics in Rust.
Here is a quick and easy explanation.
String
- A growable, ownable heap-allocated data structure. It can be coerced to a &str
.
str
- is (now, as Rust evolves) a mutable, fixed-length string that lives on the heap or in the binary. You can only interact with str
as a borrowed type via a string slice view, such as &str
.
Usage considerations:
Prefer String
if you want to own or mutate a string - such as passing the string to another thread, etc.
Prefer &str
if you want to have a read-only view of a string.
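Both recommendations in one sketch: an owned String is moved into a spawned thread, while a function that only inspects text borrows a &str. The names shout_in_thread and peek are hypothetical.

```rust
use std::thread;

// Hypothetical helper: the thread needs to own its data, so it takes a String.
fn shout_in_thread(text: String) -> String {
    let handle = thread::spawn(move || text.to_uppercase());
    handle.join().expect("worker thread panicked")
}

// Hypothetical helper: a read-only look at the text only needs a &str.
fn peek(text: &str) -> usize {
    text.len()
}

fn main() {
    let message = String::from("hello");
    println!("{}", peek(&message)); // 5
    // `message` is moved into the function and then into the thread.
    println!("{}", shout_in_thread(message)); // HELLO
}
```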
&str
and &mut str
in all the replies. Two different things. –
Allaallah
&str
is made up of two components: a pointer to some bytes, and a length." – Equestrienne