Under the hood, are vectors in Rust returned by reference or value?

Asked 11/9, 2018 at 1:16 Answered 17/4, 2024 at 19:9

I am trying to learn the ins and outs of memory in Rust. When a vector is created inside a function and then returned, is a reference returned or is the entire vector copied?

Example:

use std::io;

fn line_to_ints() -> Vec<u32> {
    let mut line = String::new();

    io::stdin()
        .read_line(&mut line)
        .expect("Failed to read line");

    return line
        .split(" ")
        .map(|x| x.parse().expect("Not an integer!"))
        .collect();
}

Will the return behavior here also be the same for all other non-primitive data types?

Unlike Is there any way to return a reference to a variable created in a function?, I would like to know a bit more about what is happening under the hood. The answers to that question do not make clear whether the vector is created and then copied to a new location, or whether ownership of the pointer is returned. I understand vectors are created on the heap, so I imagine a pointer is involved.

Consignor answered 11/9, 2018 at 1:16 Comment(3)

Idiomatic Rust does not use the return keyword at the end of a block. Instead, you leave off the return and the semicolon. – Deify 11/9, 2018 at 1:19

"is a reference returned or is the entire vector copied?" That depends on what you mean by "the entire vector". Your function creates a Vec and gives ownership to the caller. It's a move, so don't worry: this should be fine, without any performance problem. – Millsaps 11/9, 2018 at 1:22

Question updated – Consignor 11/9, 2018 at 1:29

is a reference returned

No. It cannot be because there's nothing to reference once the function ends. This is covered in detail in Is there any way to return a reference to a variable created in a function?.

is the entire vector copied

Yes, but probably not how you mean it. A Vec is basically defined as

struct Vec<T> {
    capacity: usize,
    length: usize,
    data: *mut T,
}

Semantically, these 3 pointer-sized fields are moved from the function to the caller. The N elements contained by the vector are not copied.
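One way to see that the elements are not copied is to compare the address of the heap buffer inside the function with the address the caller observes. This is only a minimal sketch: make_vec is a hypothetical helper introduced here for illustration, and it returns the address alongside the vector purely so the caller can compare.

```rust
// Hypothetical helper: builds a vector and records where its heap buffer
// lives just before the vector is moved out to the caller.
fn make_vec() -> (Vec<u32>, usize) {
    let v: Vec<u32> = (0..1_000).collect();
    let addr = v.as_ptr() as usize; // address of the first element on the heap
    (v, addr)
}

fn main() {
    let (v, addr_inside) = make_vec();
    let addr_outside = v.as_ptr() as usize;

    // Only the pointer/capacity/length fields were moved; the buffer itself
    // is still at the same heap address.
    assert_eq!(addr_inside, addr_outside);
    println!("buffer stayed at {:#x}, len = {}", addr_outside, v.len());
}
```

The assertion holds regardless of which of the implementation strategies listed below the compiler picks, because none of them reallocates the buffer.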

Implementation-wise, the compiler/optimizer can pick from a large bag of tricks:

Actually copy all three fields
Pass in a secret mutable reference and have the function write directly to it
Inline the function where it's called
Perform dead-code removal and never call the function in the first place
Probably others...

The only way to know which it picks is to look at the MIR / LLVM IR / assembly.

Will the return behavior here also be the same for all other non-primitive data types?

Yes. Rust's data types are all treated the same. Primitive vs. non-primitive means nothing for the semantics of the language.
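As a rough illustration, what decides whether the source of a return or assignment remains usable is whether the type implements Copy, not whether it is a primitive. The types Point and Label and the functions make_point and make_label below are hypothetical names made up for this sketch.

```rust
// User-defined type that opts in to Copy: behaves like an integer when
// returned or assigned.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// User-defined type that owns a heap allocation: it is moved, like Vec.
#[derive(Debug, PartialEq)]
struct Label {
    text: String,
}

fn make_point() -> Point {
    Point { x: 1, y: 2 }
}

fn make_label() -> Label {
    Label { text: String::from("hi") }
}

fn main() {
    let p = make_point();
    let q = p; // bitwise copy; `p` is still usable because Point is Copy
    assert_eq!(p, q);

    let a = make_label();
    let b = a; // move; using `a` after this line would not compile
    assert_eq!(b.text, "hi");
}
```

In both cases the value is returned "by value"; the only difference is whether the original binding may still be used afterwards.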

Deify answered 11/9, 2018 at 1:47 Comment(1)

Thank you for the explanation. I can make sense of the linked question with this extra information. – Consignor 11/9, 2018 at 1:54

There are two parts to a Vec.

The Vec structure itself. This is a fixed-size structure consisting of a pointer and two pointer-sized integers for length and capacity, so it is three pointers in size (12 bytes on 32-bit targets, 24 bytes on 64-bit targets).
A block of memory on the heap, which stores the elements of the vector.

When a Vec is returned, the Vec structure itself is logically "moved" by value. The block of memory on the heap is untouched.
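The fixed size of the Vec structure can be checked with std::mem::size_of. This is a small sketch, assuming a current rustc and the default allocator; vec_header_size is a hypothetical helper name introduced here.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical helper: size in bytes of the Vec<T> structure itself,
// not counting the heap block that holds the elements.
fn vec_header_size<T>() -> usize {
    size_of::<Vec<T>>()
}

fn main() {
    let word = size_of::<usize>();

    // The structure is three pointer-sized fields, whatever the element type.
    assert_eq!(vec_header_size::<u8>(), 3 * word);
    assert_eq!(vec_header_size::<[u64; 32]>(), 3 * word);

    // The number of elements does not change the size of the structure.
    let big: Vec<u64> = vec![0; 10_000];
    assert_eq!(size_of_val(&big), 3 * word);

    println!("Vec structure: {} bytes", vec_header_size::<u64>());
}
```

This is the amount of data that is moved when a Vec is returned, regardless of how many elements it holds.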

The Rust language does not specify exactly how the Vec structure is moved. However, for the most part, rustc follows the standard calling conventions for the target platform. Both amd64 and aarch64 return small structures in registers, but their normal definition of "small structures" is structures up to two pointers in size. Above that, they switch to their strategy for "large structures", which is to pass a pointer to the structure instead.

To demonstrate this I built the following code on godbolt.

#[inline(never)]
pub fn vecofints(v: Vec<i32>) -> Vec<i32> {
    return v;
}

On x86-64 this resulted in:

example::vecofints::h4b8b50ca2c3019a3:
        mov     rax, rdi
        mov     rcx, qword ptr [rsi + 16]
        mov     qword ptr [rdi + 16], rcx
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        ret

We can see that in this case both the argument and the result are passed as pointers: a pointer to the result is passed in rdi (the first argument-passing register) and a pointer to the argument is passed in rsi (the second argument-passing register). The function copies the 24 bytes of the Vec structure from [rsi] to [rdi] and returns the result pointer in rax.

On aarch64 this resulted in:

 example::vecofints::hee34a085ad28128f:
         ldr     q0, [x0]
         ldr     x9, [x0, #16]
         str     q0, [x8]
         str     x9, [x8, #16]
         ret

This is very similar, except that rather than passing the hidden pointer for the return value in one of the parameter-passing registers, aarch64 has a dedicated register for that purpose (x8 in the listing above).

It's possible to implement a Vec-like type where the structure itself is only one pointer in size. There are a few of them out there, often with names like "thinvec".
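As a rough stand-in for the idea, wrapping a Vec in a Box already gives a handle that is one pointer in size, at the cost of an extra indirection. This is not how any particular thin-vector crate is implemented (such types typically keep the length and capacity in a header next to the elements); BoxedVec is a hypothetical type made up for this sketch.

```rust
use std::mem::size_of;

// Hypothetical type for illustration only; not the API of any real crate.
// The three-field Vec structure lives on the heap, behind a single pointer.
struct BoxedVec<T>(Box<Vec<T>>);

impl<T> BoxedVec<T> {
    fn from_vec(v: Vec<T>) -> Self {
        BoxedVec(Box::new(v))
    }

    fn len(&self) -> usize {
        self.0.len()
    }
}

fn main() {
    let thin = BoxedVec::from_vec(vec![1u32, 2, 3]);
    assert_eq!(thin.len(), 3);

    // The handle is one pointer wide, so it fits the "small structure" case.
    assert_eq!(size_of::<BoxedVec<u32>>(), size_of::<usize>());
    // A plain Vec is three pointers wide.
    assert_eq!(size_of::<Vec<u32>>(), 3 * size_of::<usize>());
}
```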

Tarpley answered 17/4, 2024 at 19:9 Comment(0)
