Should I take `self` by value or mutable reference when using the Builder pattern?

So far, I've seen two builder patterns in official Rust code and other crates:

// Pattern 1: non-consuming, the setters borrow the builder
impl DataBuilder {
    pub fn new() -> DataBuilder { ... }
    pub fn arg1(&mut self, arg1: Arg1Type) -> &mut DataBuilder { ... }
    pub fn arg2(&mut self, arg2: Arg2Type) -> &mut DataBuilder { ... }
    ...
    pub fn build(&self) -> Data { ... }
}

// Pattern 2: consuming, the setters move the builder
impl DataBuilder {
    pub fn new() -> DataBuilder { ... }
    pub fn arg1(self, arg1: Arg1Type) -> DataBuilder { ... }
    pub fn arg2(self, arg2: Arg2Type) -> DataBuilder { ... }
    ...
    pub fn build(self) -> Data { ... }
}

I'm writing a new crate and I'm not sure which pattern to choose. Changing the API later would be painful, so I want to make the decision now.

I understand the semantic difference between them, but which one should we prefer in practical situations? Or how should we choose between them? Why?

Malaco answered 19/12, 2021 at 1:7 Comment(1)
FWIW, the derive_builder crate lays out some pros and cons: docs.rs/derive_builder/latest/derive_builder/#builder-patterns. - Zillion

Is it beneficial to build multiple values from the same builder?

  • If yes, use &mut self
  • If no, use self
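
As a rough sketch of the two options side by side (Data, ReusableBuilder and OneShotBuilder are made-up names for illustration, not from any real crate): the &mut self version can keep producing values but has to clone owned fields in build, while the self version moves its fields out and is gone afterwards.

```rust
// Hypothetical types for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Data {
    name: String,
    retries: u32,
}

// Non-consuming style: setters take `&mut self`, `build` takes `&self`.
#[derive(Default)]
pub struct ReusableBuilder {
    name: String,
    retries: u32,
}

impl ReusableBuilder {
    pub fn new() -> Self {
        Self::default()
    }
    pub fn name(&mut self, name: &str) -> &mut Self {
        self.name = name.to_owned();
        self
    }
    pub fn retries(&mut self, retries: u32) -> &mut Self {
        self.retries = retries;
        self
    }
    // The builder stays usable, so owned fields must be cloned.
    pub fn build(&self) -> Data {
        Data { name: self.name.clone(), retries: self.retries }
    }
}

// Consuming style: every method takes and returns the builder by value.
#[derive(Default)]
pub struct OneShotBuilder {
    name: String,
    retries: u32,
}

impl OneShotBuilder {
    pub fn new() -> Self {
        Self::default()
    }
    pub fn name(mut self, name: &str) -> Self {
        self.name = name.to_owned();
        self
    }
    pub fn retries(mut self, retries: u32) -> Self {
        self.retries = retries;
        self
    }
    // Fields are moved into the result; no clone needed.
    pub fn build(self) -> Data {
        Data { name: self.name, retries: self.retries }
    }
}

fn main() {
    // One builder, two values.
    let mut reusable = ReusableBuilder::new();
    reusable.name("job").retries(3);
    let first = reusable.build();
    let second = reusable.retries(5).build();
    assert_eq!(second.retries, 5);

    // One chain, one value; the builder is consumed by `build`.
    let once = OneShotBuilder::new().name("job").retries(3).build();
    assert_eq!(first, once);
}
```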

Consider std::thread::Builder which is a builder for std::thread::Thread. It uses Option fields internally to configure how to build the thread:

pub struct Builder {
    name: Option<String>,
    stack_size: Option<usize>,
}

It uses self to .spawn() the thread because it needs ownership of the name. It could theoretically use &mut self and .take() the name out of the field, but then subsequent calls to .spawn() wouldn't create identical results, which is kinda bad design. It could choose to .clone() the name, but then there's an additional and often unneeded cost to spawn a thread. Using &mut self would be a detriment.
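
To make the ownership point concrete, here is a small sketch using the real std::thread::Builder API (the helper spawn_named is my own name, just for the example). The builder is moved into .spawn(), so a second call on the same value is rejected at compile time.

```rust
use std::thread;

// Hypothetical helper: spawns a named thread and returns the name the thread sees.
fn spawn_named(name: &str) -> std::io::Result<Option<String>> {
    let builder = thread::Builder::new().name(name.to_owned());

    // `spawn` takes `self`, so `builder` is moved here.
    let handle = builder.spawn(|| thread::current().name().map(str::to_owned))?;

    // Calling `builder.spawn(...)` again at this point would be a
    // "use of moved value" compile error.

    Ok(handle.join().expect("worker thread panicked"))
}

fn main() -> std::io::Result<()> {
    let seen = spawn_named("worker")?;
    assert_eq!(seen.as_deref(), Some("worker"));
    Ok(())
}
```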

Consider std::process::Command which serves as a builder for a std::process::Child. It has fields containing the program, args, environment, and pipe configuration:

pub struct Command {
    program: CString,
    args: Vec<CString>,
    env: CommandEnv,
    stdin: Option<Stdio>,
    stdout: Option<Stdio>,
    stderr: Option<Stdio>,
    // ...
}

It uses &mut self to .spawn() because it does not take ownership of these fields to create the Child. It has to internally copy all that data over to the OS anyway, so there's no reason to consume self. There's also a tangible benefit and use-case to spawning multiple child processes with the same configuration.
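
A minimal sketch of that reuse, assuming a Unix-like system where an echo executable is on the PATH (that, and the helper names echo_command and spawn_twice, are assumptions for the example). The same Command value is spawned twice because .spawn() only borrows it.

```rust
use std::io;
use std::process::{Command, Stdio};

// Hypothetical helper: configure the command once.
// Assumes an `echo` executable exists on the PATH.
fn echo_command() -> Command {
    let mut cmd = Command::new("echo");
    cmd.arg("hello").stdout(Stdio::null());
    cmd
}

// `spawn` takes `&mut self`, so the same configuration can start several children.
fn spawn_twice(cmd: &mut Command) -> io::Result<(bool, bool)> {
    let first = cmd.spawn()?.wait()?;
    let second = cmd.spawn()?.wait()?;
    Ok((first.success(), second.success()))
}

fn main() -> io::Result<()> {
    let mut cmd = echo_command();
    let (first_ok, second_ok) = spawn_twice(&mut cmd)?;
    println!("first ok: {first_ok}, second ok: {second_ok}");
    Ok(())
}
```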

Consider std::fs::OpenOptions which serves as a builder for std::fs::File. It only stores basic configuration:

pub struct OpenOptions {
    read: bool,
    write: bool,
    append: bool,
    truncate: bool,
    create: bool,
    create_new: bool,
    // ...
}

It uses &self to .open() (only the setter methods take &mut self) because it does not need ownership of anything to work. It is somewhat similar to the thread builder, since a file has a path just as a thread has a name. However, the file path is only passed in to .open() and is not stored in the builder. There's a use-case for opening multiple files with the same configuration.
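
For example, one set of options can open several files, since .open() only borrows the builder. This is a sketch; write_logs and the file names are invented for the example.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

// Hypothetical helper: opens two files in `dir` with one shared configuration.
fn write_logs(dir: &Path) -> io::Result<()> {
    let mut options = OpenOptions::new();
    options.create(true).append(true);

    // `open` takes `&self`; the path is an argument, not builder state.
    let mut a = options.open(dir.join("a.log"))?;
    let mut b = options.open(dir.join("b.log"))?;

    a.write_all(b"first\n")?;
    b.write_all(b"second\n")?;
    Ok(())
}

fn main() -> io::Result<()> {
    // Use a fresh scratch directory so the appended content is predictable.
    let dir = std::env::temp_dir().join(format!("open_options_demo_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    write_logs(&dir)?;
    assert_eq!(fs::read_to_string(dir.join("b.log"))?, "second\n");
    fs::remove_dir_all(&dir)
}
```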


The considerations above really only cover the semantics of self in the .build() method, but there's good reason to use the same receiver for the intermediate setter methods as well:

  • API consistency
  • chaining (&mut self) -> &mut Self into build(self) obviously wouldn't compile
  • using (self) -> Self into build(&mut self) would limit the flexibility of the builder to be reused long-term
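
To illustrate the second bullet, here is a sketch of what happens when the two styles are mixed (Config, MixedBuilder and build_config are hypothetical names). The one-line chain is rejected, and the only way to call build is to break the chain and bind the builder first.

```rust
// Hypothetical types for illustration only.
#[derive(Debug, PartialEq)]
pub struct Config {
    label: String,
}

#[derive(Default)]
pub struct MixedBuilder {
    label: String,
}

impl MixedBuilder {
    // Borrowing setter...
    pub fn label(&mut self, label: &str) -> &mut Self {
        self.label = label.to_owned();
        self
    }
    // ...combined with a consuming `build`.
    pub fn build(self) -> Config {
        Config { label: self.label }
    }
}

fn build_config(label: &str) -> Config {
    // Rejected by the compiler: `label` yields `&mut MixedBuilder`, and
    // `build(self)` cannot move the builder out of a mutable reference.
    // let cfg = MixedBuilder::default().label(label).build();

    // It only works with the chain split across statements.
    let mut builder = MixedBuilder::default();
    builder.label(label);
    builder.build()
}

fn main() {
    let cfg = build_config("demo");
    assert_eq!(cfg, Config { label: "demo".to_string() });
}
```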

See also: How to write an idiomatic build pattern with chained method calls in Rust?

Margeret answered 19/12, 2021 at 2:9 Comment(0)
