Computational Complexity of `HashSet::len` in Rust
Asked Answered
C

1

6

I'm currently working with Rust's HashSet and I'm trying to understand the computational complexity of the HashSet::len operation.

The Rust documentation provides information about the computational complexity of get and insert operations for HashMap being O(1) on average, but it doesn't explicitly mention the complexity for HashSet or the len operation.

In general, the len operation for many data structures is O(1), but I couldn't find a specific statement confirming this for HashSet::len in Rust. Here's the link to the relevant Rust documentation: Rust Collections

Could anyone clarify the computational complexity of HashSet::len? Is it O(1) as I would expect, or is there a different complexity associated with this operation in Rust's HashSet?

Colfin answered 2/5, 2022 at 12:55 Comment(3)
It's basically impossible to implement a hash map that does not keep track of the current number of elements, since you need to resize once a certain percentage of elements is filled. I don't think there is any hash map implementation in actual use that has a len() implementation with a runtime complexity of more than O(1). I can't find a statement to that end in the documentation, but it's the only thing that makes sense.Softshoe
Related discussion mentioning that .len() is O(1) for all standard lib collections: github.com/rust-lang/api-guidelines/discussions/149Softshoe
Note that for HashSet operations, the docs you linked state that: "For Sets, all operations have the cost of the equivalent Map operation." But you're right that the complexity of len is undocumented.Berniecebernier
B
13

Had to go down a bit of a rabbit hole for this one. In short, reading the source code from std::collections::HashMap indicates that the standard HashMap inherits its len() functionality from the crate hashbrown, which is a Rust port of the Google HashTable variant, SwissTable (github). Tracking down the implementation of len() in this crate leads down to the underlying class, RawTable, which contains an instance of RawTableInner<A>, generic for the entry type A. This struct contains a slot called items : usize which is returned whenever len() is called. This indicates that the len() function simply returns the value of an internally stored integer count which keeps track of the number of entries.

Overall, this suggests that the time complexity of len() will be O(1), as it isn't doing any iteration or counting, but rather simply return the value of an entry counter which has been maintained over the course of the HashTable's construction.

Busra answered 2/5, 2022 at 13:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.