Is there a best practice for when a type should be boxed?

In C#, there are structs and classes. Structs are usually (i.e. there are exceptions) stack allocated and classes are always heap allocated. Class instances, therefore, put pressure on the GC and are considered "slower" than structs. Microsoft publishes best-practice guidance on when to use structs over classes. It says to consider a struct if:

It logically represents a single value, similar to primitive types (int, double, etc.).

It has an instance size under 16 bytes.

It is immutable.

It will not have to be boxed frequently.

In C#, using struct instances that are larger than 16 bytes is generally said to perform worse than garbage collected class instances (dynamically allocated).

When does a boxed instance (which is heap-allocated) perform better, in terms of speed, than a non-boxed equivalent instance (which is stack-allocated)? Is there any best practice about when we should dynamically allocate (on the heap) instead of sticking to the default stack allocation?

TL;DR: start with no boxing, then profile.

Stack Allocation vs Boxed Allocation

This is perhaps more clear cut:

Stick to the stack,
Unless the value is big enough that it would blow it up.
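
To make the rule concrete, here is a minimal sketch. The names (BIG, small_on_stack, big_on_heap) and the 16 MiB figure are my own assumptions for illustration; the actual stack limit depends on the platform and on how the thread was created.

```rust
// Hypothetical size, chosen to exceed commonly used default stack sizes.
const BIG: usize = 16 * 1024 * 1024;

// Small values: keep them on the stack, no allocation needed.
fn small_on_stack() -> [u8; 64] {
    [0u8; 64]
}

// Large buffers: allocate on the heap directly. Going through `vec!`
// avoids building a 16 MiB temporary on the stack first, which
// `Box::new([0u8; BIG])` may do in unoptimized builds.
fn big_on_heap() -> Box<[u8]> {
    vec![0u8; BIG].into_boxed_slice()
}

fn main() {
    let small = small_on_stack();
    let big = big_on_heap();
    println!("small: {} bytes inline, big: {} bytes on the heap", small.len(), big.len());
}
```

Only the second function pays for an allocation, and it does so because the alternative risks overflowing the stack, not because the heap is faster.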

While semantically writing fn foo() -> Bar implies moving Bar from the callee frame to the caller frame, in practice you are more likely to end up with the equivalent of a fn foo(__result: *mut Bar) signature, where the caller allocates space on its stack and passes a pointer to the callee.
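
As a rough illustration of that lowering, written by hand: foo_lowered and call_lowered are hypothetical names, and this is only a sketch of the idea, not what the compiler literally emits.

```rust
use std::mem::MaybeUninit;

struct Bar {
    data: [u64; 4],
}

// What the source code says: `Bar` is returned by value.
fn foo() -> Bar {
    Bar { data: [1, 2, 3, 4] }
}

// Roughly what the generated code may look like: the caller owns the
// storage and the callee writes the result into it through a pointer.
//
// SAFETY contract: `result` must be valid for writes and properly aligned.
unsafe fn foo_lowered(result: *mut Bar) {
    unsafe { result.write(Bar { data: [1, 2, 3, 4] }) }
}

fn call_lowered() -> Bar {
    // The caller reserves uninitialized space in its own frame...
    let mut slot = MaybeUninit::<Bar>::uninit();
    // ...and the callee fills it in place.
    unsafe { foo_lowered(slot.as_mut_ptr()) };
    // SAFETY: `foo_lowered` fully initialized the slot.
    unsafe { slot.assume_init() }
}

fn main() {
    assert_eq!(foo().data, call_lowered().data);
}
```

Both versions produce the same value; the second merely makes the return slot explicit.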

This may not always be sufficient to avoid copying, as some patterns may prevent writing directly in the return slot:

fn defeat_copy_elision() -> WithDrop {
    let one = side_effectful();
    if side_effectful_too() {
        one
    } else {
        side_effects_hurt()
    }
}

Here, there is no magic:

if the compiler uses the return slot for one, then when the condition evaluates to false it has to move one out, instantiate the new WithDrop into the slot, and finally destroy one,
if the compiler instantiates one on the current stack and then has to return it, it has to perform a copy.

If the type didn't need Drop, there would be no issue.
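
A self-contained variant of the function above can make the lifetime overlap visible. The assumptions here are mine: the side-effectful calls are replaced by a take_first parameter, and WithDrop carries a u32 and bumps a global counter when dropped.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `WithDrop` values have been destroyed so far.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct WithDrop(u32);

impl Drop for WithDrop {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Stand-in for the original function: `take_first` plays the role of
// the side-effectful condition.
fn defeat_copy_elision(take_first: bool) -> WithDrop {
    let one = WithDrop(1);
    if take_first {
        one
    } else {
        // `one` is still alive while this value is built; it is only
        // dropped when the function scope ends, so both values exist
        // at the same time and cannot share a single return slot.
        WithDrop(2)
    }
}

fn main() {
    let before = DROPS.load(Ordering::SeqCst);
    let second = defeat_copy_elision(false);
    let dropped = DROPS.load(Ordering::SeqCst) - before;
    println!("returned {}, dropped {} value(s) inside the call", second.0, dropped);
}
```

In the false branch, exactly one value (one) is destroyed inside the call, after its replacement has already been created.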

Despite these oddball cases, I advise sticking to the stack if possible unless profiling reveals a place where it'd be beneficial to box.

Inline Member or Boxed Member

This case is much more complicated:

the size of the struct/enum is affected, thus CPU cache behavior is affected:
- less frequently used big variants are a good candidate for boxing (or boxing parts of them),
- less frequently accessed big members are a good candidate for boxing.
at the same time, there are costs for boxing:
- a type with a boxed member cannot be Copy, and it implicitly needs to run drop code (which, as seen above, disables some optimizations),
- allocating/freeing memory has unbounded latency¹,
- accessing boxed memory introduces data-dependency: you cannot know which cache line to request before knowing the address.
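
The size and drop trade-offs listed above can be checked directly. This is a minimal sketch with hypothetical types (Inline, Boxed) and a made-up 1024-byte payload; exact sizes depend on the target, so I only print them.

```rust
use std::mem::{needs_drop, size_of};

// Hypothetical message type: the rare, large variant is stored inline,
// so every value of the enum is as big as the largest variant.
#[allow(dead_code)]
enum Inline {
    Ping(u8),
    Payload([u8; 1024]),
}

// Same type with the large variant boxed: the enum itself shrinks to
// roughly pointer size, at the cost of an allocation and an indirection.
#[allow(dead_code)]
enum Boxed {
    Ping(u8),
    Payload(Box<[u8; 1024]>),
}

fn main() {
    println!(
        "Inline: {} bytes, needs_drop = {}",
        size_of::<Inline>(),
        needs_drop::<Inline>()
    );
    println!(
        "Boxed:  {} bytes, needs_drop = {}",
        size_of::<Boxed>(),
        needs_drop::<Boxed>()
    );
}
```

The boxed version is far smaller, which helps when most values are Ping, but it now needs drop code, and adding #[derive(Clone, Copy)] to it would be rejected by the compiler because Box is not Copy.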

As a result, this is a very fine balancing act. Boxing or unboxing a member may improve the performance of some parts of the codebase while decreasing the performance of others.

There is definitely no one-size-fits-all answer.

Thus, once again, I advise avoiding boxing until profiling reveals a place where it'd be beneficial to box.

¹ Consider that on Linux, any memory allocation for which the process has no spare memory may require a system call, which, if the OS itself has no spare memory, may trigger the OOM killer to kill a process, at which point that process's memory is salvaged and made available. In such a case, a simple malloc(1) may easily require milliseconds.
