Using Box to optimise memory allocation of optional, known length arrays
Asked Answered
P

2

15

I have a question about memory allocation strategies in a specific situation.

I have a large number of structs that are coming from a JSON API via serde. There is a specific field in the JSON response which will, for a small minority of cases, exist and contain an array of exactly 11 elements. In the majority of cases this field will not exist at all in the response. All responses need to be stored.

The majority of responses will be of the format:

{
  id: 1,
  event: "None"
}

A minority of responses will be of the format:

{
  id: 2,
  event: "something",
  event_details : [11 json objects here]
}

If I define my struct for parsing the JSON as:

#[derive(Deserialize, Debug)]
struct Event {
  id: u32,
  event: String,
  // EventDetail is a struct with an u32 and string field
  events: Option<[EventDetail; 11]>
 }

I can see using deepsizeof crate that my Vec<Event> that gets created takes up ~500MB of memory. If I change the events field to be events: Option<Box<[EventDetail; 11]>> the memory usage drops to ~150MB.

My understanding here is that since Box is allocated at runtime instead of compile time, when a JSON response comes in that is lacking the events field, then the 11 item array does not need to be allocated. Am I correct here? And is this a valid use case for Box or is there a better way to do this?

I also tried events: Option<Vec<EventDetail>>, which resulted in slightly higher memory assignment than Box. I assume this is down to padding.

Pump answered 28/12, 2023 at 23:55 Comment(0)
B
15

Yes, this is all correct. Option<[T; N]> always allocates space for the array, even if it's not used, whereas Option<Box<...>> allocates enough space for the box pointer, but not the thing it points to unless needed. Hiding the large data behind a Box indirection is 100% the correct call here.

Vec is a pointer (basically a Box) plus the size of the vector and its capacity, so Option<Vec<...>> will usually be three times the overhead, as it will allocate space for all of the pointer, the size and the capacity, of the (potentially nonexistent) vector. But at that point, we're arguing in units of bytes, so it's silly to debate size differences at that scale.

(Vec also stores the memory allocator, but the default allocator Global is a zero-sized type)

Since you know the size in advance, I'd go with Option<Box<[EventDetail; 11]>> rather than the vector, since the vector is just pointlessly pushing data to runtime that could be known at compile-time.

Burin answered 29/12, 2023 at 0:15 Comment(3)
Thanks! Maybe I missed it but I don't see this use case listed in any of the common rust guides/docs on Box use cases, to me this use case would be more common than things like Recursive structsPump
FWIW you could also just use Vec with no Option layer. This should have basically the same overhead as an Option<Box> (plus length+capacity, though) and you could use #[serde(default)] to deserialize a missing value as an empty Vec (assuming that a present but empty array is not a distinct state that needs to be distinguishable).Natterjack
Since the array is fixed size I would assume an empty array is not currently an option.Hipbone
L
1

Just a partial answer for the "or is there a better way to do this?" part.

Depending on the context of usage of the struct Event, another option is to store the event_details arrays externally to the struct. Depending on your performance constraints and how you resolve identity for Events, you could store them in another Vec or a dictionary (HashMap?), indexing by Event.id.

Latta answered 29/12, 2023 at 13:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.