Why does Rust use two bytes to represent this enum when only one is necessary?

The compiler appears to be smart enough to use only one byte for A, but not smart enough to use one byte for B, even though B has only 8*8=64 possible values. Is there any way to coax Rust into figuring this out, or do I have to implement a more compact layout manually?

Playground link.

#![allow(dead_code)]

enum A {
    L,
    UL,
    U,
    UR,
    R,
    DR,
    D,
    DL,
}

enum B {
    C(A, A),
}

fn main() {
    println!("{:?}", std::mem::size_of::<A>()); // prints 1
    println!("{:?}", std::mem::size_of::<B>()); // prints 2
}
Generosity answered 3/2, 2019 at 1:32 Comment(4)
That's because a Rust enum is as large as its largest variant. In this case, A is the size of a u8, so two bytes are required to fit two As in B, as there are no compile-time micro-optimizations like this. Anyway, what if the packed version were slower to use than the unpacked version?Freberg
one word, implemented behavior.Waneta
@OptimisticPeach: it's certainly possible that it would be worse on some platforms/use-cases, but with memory latencies nowadays usually smaller data structures make up any unpacking time through having fewer cache misses. I am going to have fairly large vectors of these objects I'm going to be accessing semi-randomly, so cache misses are a concern for my use case. I'd be fine with something I have to opt into but that still saves me the work of manually doing the packing logic myself.Generosity
Rust can do enum layout optimizations in some more limited cases, see github.com/rust-lang/rust/pull/45225 for exampleBlockage
A
16

Both bytes are necessary to preserve the ability to borrow struct members.

A type in Rust is not just an abstract set of values: it also has a data layout, which describes how those values are stored. One of the "rules" governing the language is that putting a type inside a struct or enum doesn't change its data layout: it has the same layout inside another type as it does standalone, which allows you to take references to struct members and use them interchangeably with any other reference.*
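To make that rule concrete, here is a minimal sketch reusing the A and B from the question. The helper `is_left` is a hypothetical name introduced only for illustration; it accepts any `&A`, and the same function works for a standalone value and for fields borrowed out of a `B`.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum A { L, UL, U, UR, R, DR, D, DL }

#[allow(dead_code)]
enum B { C(A, A) }

// Hypothetical helper: works on any `&A`, wherever the value happens to live.
fn is_left(a: &A) -> bool {
    matches!(a, A::L | A::UL | A::DL)
}

fn main() {
    let standalone = A::UL;
    let b = B::C(A::R, A::DL);

    // Borrowing the fields of `b` yields ordinary `&A` references.
    let B::C(first, second) = &b;

    assert!(is_left(&standalone));
    assert!(!is_left(first));
    assert!(is_left(second));

    // Each field must be individually addressable, so each occupies a whole byte.
    assert_eq!(size_of::<A>(), 1);
    assert_eq!(size_of::<B>(), 2);
}
```

If the two fields shared a byte, `first` and `second` could not both be plain `&A` values pointing at distinct addresses.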

There's no way to fit two As into one byte while satisfying this constraint, because the size of A is one whole byte -- you can't address a part of a byte, even with repr(packed). The unused bits just remain unused (unless they can be repurposed to store the enum tag by niche-filling).
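If the one-byte representation matters, the packing has to be done by hand. The following is only a sketch of one possible approach, not something the compiler or standard library provides: `PackedB`, `from_bits`, and the 3-bits-per-field layout are assumptions made for this example. The accessors return `A` by value, because a reference to half a byte cannot exist.

```rust
use std::mem::size_of;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum A { L, UL, U, UR, R, DR, D, DL }

impl A {
    // Variants listed in discriminant order (0..=7) for decoding.
    const ALL: [A; 8] = [A::L, A::UL, A::U, A::UR, A::R, A::DR, A::D, A::DL];

    // Decode the low three bits back into a variant.
    fn from_bits(bits: u8) -> A {
        A::ALL[(bits & 0b111) as usize]
    }
}

// Hypothetical hand-packed pair: first value in bits 3..=5, second in bits 0..=2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PackedB(u8);

impl PackedB {
    fn new(first: A, second: A) -> Self {
        PackedB(((first as u8) << 3) | (second as u8))
    }

    // Returned by value: there is no address to borrow for a 3-bit field.
    fn first(self) -> A {
        A::from_bits(self.0 >> 3)
    }

    fn second(self) -> A {
        A::from_bits(self.0)
    }
}

fn main() {
    let p = PackedB::new(A::UR, A::DL);
    assert_eq!(p.first(), A::UR);
    assert_eq!(p.second(), A::DL);
    assert_eq!(size_of::<PackedB>(), 1);

    // Niche-filling: the unused bit patterns of A hold the `None` tag.
    assert_eq!(size_of::<Option<A>>(), 1);
}
```

The trade-off is the one raised in the comments: every access pays for a shift and a mask, in exchange for halving the memory footprint of large vectors.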

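*Well, repr(packed) can actually make this untrue. Taking a reference to a packed field can produce a misaligned reference, which is undefined behavior. Compilers at the time of writing accepted this even in safe code, while newer compilers reject such references.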

Amorete answered 3/2, 2019 at 2:0 Comment(1)
I wonder if it's possible to have some sort of macro that would make a compact representation of B, that would involve generating multiple possible representations of A and implementing conversions for you to get the best of both worlds...Generosity
