Why is hashmap lookup O(1), i.e. constant time?

If we look at it from a Java perspective, we can say that a hashmap lookup takes constant time. But what about the internal implementation? It would still have to search through the particular bucket (the one the key's hashcode mapped to) and compare the keys stored there. So why do we say that a hashmap lookup takes constant time? Please explain.

Vacuity asked 18/3, 2013 at 4:31 Comment(1)
possible duplicate of Can hash tables really be O(1)? – Piranha

Under the appropriate assumptions on the hash function being used, we can say that hash table lookups take expected O(1) time (assuming you're using a standard hashing scheme like linear probing or chained hashing). This means that on average, the amount of work that a hash table does to perform a lookup is at most some constant.

Intuitively, if you have a "good" hash function, you would expect that elements would be distributed more or less evenly throughout the hash table, meaning that the number of elements in each bucket would be close to the number of elements divided by the number of buckets. If the hash table implementation keeps this number low (say, by adding more buckets every time the ratio of elements to buckets exceeds some constant), then the expected amount of work that gets done ends up being some baseline amount of work to choose which bucket should be scanned, then doing "not too much" work looking at the elements there, because on expectation there will only be a constant number of elements in that bucket.

This doesn't mean that hash tables have guaranteed O(1) behavior. In fact, in the worst case, the hashing scheme will degenerate and all elements will end up in one bucket, making lookups take time Θ(n) in the worst case. This is why it's important to design good hash functions.
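
To make these two steps concrete, here is a minimal sketch of chained hashing (my own illustration, not the actual java.util.HashMap source): the lookup jumps straight to one bucket, then scans a chain whose expected length stays constant as long as the table is resized appropriately.

```java
import java.util.LinkedList;
import java.util.Objects;

// A minimal sketch of a chained hash table, illustrating why lookups are
// expected O(1): we jump straight to one bucket, then scan the (on average
// constant-sized) chain stored there. Not the real java.util.HashMap code.
class ChainedHashTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    // One linked list ("chain") per bucket.
    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int bucketCount) {
        buckets = new LinkedList[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int bucketIndex(K key) {
        // Map the key's hash code to a bucket index in O(1).
        return Math.floorMod(Objects.hashCode(key), buckets.length);
    }

    public V get(K key) {
        // Step 1: constant-time jump to the right bucket.
        LinkedList<Entry<K, V>> chain = buckets[bucketIndex(key)];
        // Step 2: scan the chain; with a good hash function its expected
        // length is n / bucketCount, which resizing keeps at a constant.
        for (Entry<K, V> e : chain) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null; // Degenerates to Θ(n) only if all keys collide.
    }

    public void put(K key, V value) {
        LinkedList<Entry<K, V>> chain = buckets[bucketIndex(key)];
        for (Entry<K, V> e : chain) {
            if (Objects.equals(e.key, key)) {
                e.value = value;   // Overwrite an existing mapping.
                return;
            }
        }
        chain.add(new Entry<>(key, value)); // Add a new mapping.
    }
}
```

With a good hash function and the element-to-bucket ratio kept below a constant, the loop in get inspects only a handful of entries on average, which is where the expected O(1) comes from.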

For more information, you might want to read an algorithms textbook to see the formal derivation of why hash tables support lookups so efficiently. This is usually included as part of a typical university course on algorithms and data structures, and there are many good resources online.

Fun fact: there are certain types of hash tables (cuckoo hash tables, dynamic perfect hash tables) where the worst case lookup time for an element is O(1). These hash tables work by guaranteeing that each element can only be in one of a few fixed positions, with insertions sometimes scrambling around elements to try to make everything fit.

Hope this helps!

Divisibility answered 18/3, 2013 at 4:37 Comment(3)
Thanks for the simple and clear answer. It answers my question neatly. – Vacuity
You answered correctly for getting the value. But what about locating the bucket? We know that for each key (suppose the hash function is so good that it generates a different hash value for every key), a table is used to locate the bucket, i.e. a linked list (up until Java 7). My question is: if there are 1000K different key-value pairs and 1000K buckets, how does the HashMap make sure the lookup of the bucket location is also O(1)? I hope my question is clear. Thank you. – Unfortunate
@Unfortunate The buckets are typically stored as an array (either a raw one or some wrapper like ArrayList that supports efficient random access). That way, after you've computed the hash function, you can jump in time O(1) (independent of the number of buckets) to the right bucket. – Divisibility
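
To illustrate that comment with a rough sketch (again my own, not the exact JDK code): computing the bucket index is one arithmetic step, and reaching the bucket is one array access, no matter how many buckets exist. Recent java.util.HashMap versions use a similar power-of-two mask, though the exact bit-mixing details vary by release.

```java
// Sketch of the constant-time "jump to a bucket" step: the buckets live in a
// plain array, so after computing the key's hash we land on the right bucket
// with one bit-mask and one array access, regardless of the bucket count.
public class BucketIndexDemo {
    // tableLength is assumed to be a power of two, e.g. 16, 32, 64, ...
    static int bucketIndex(Object key, int tableLength) {
        int h = key.hashCode();
        h ^= (h >>> 16);              // Mix high bits into the low bits.
        return h & (tableLength - 1); // Equivalent to h mod tableLength.
    }

    public static void main(String[] args) {
        // Same cost whether the table has 16 buckets or 1,048,576 buckets.
        System.out.println(bucketIndex("some key", 16));
        System.out.println(bucketIndex("some key", 1 << 20));
    }
}
```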

Hashtables AREN'T O(1).

Via the pigeonhole principle, you cannot be better than O(log(n)) for lookup, because you need log(n) bits per item to uniquely identify n items.

Hashtables seem to be O(1) because they have a small constant factor, combined with the point at which the log(n) term actually starts to grow being pushed so far out that, for many practical applications, the cost is independent of the number of items you are actually using. However, big O notation doesn't care about that fact, and it is a (granted, absurdly common) misuse of the notation to call hashtables O(1).

That's because while you could store a million or a billion items in a hashtable and still get the same lookup time as a single-item hashtable, you lose that ability if you're talking about a nonillion or a googolplex of items. The fact that you will never actually be using a nonillion or a googolplex of items doesn't matter for big O notation.

Practically speaking, hashtable performance can be a constant factor worse than array lookup performance, which, yes, is also O(log(n)), because you CAN'T do better.

Basically, real-world computers make every lookup into an array whose size is addressable within their word size just as expensive as a lookup into the biggest array they could theoretically use, and since hashtables are clever tricks performed on arrays, that's why you seem to get O(1).

Upandcoming answered 14/7, 2021 at 19:39 Comment(2)
You are correct that it takes log n bits to identify one item from n. From both a theoretical and practical perspective, it's typically safe to assume that the computer can operate on Omega(log n) bits in constant time. Otherwise, an operation like "increment an index within an array" doesn't take constant time. There are models of computation in which we don't assume this (the decision tree model is one example), but the models we typically use for reasoning about real computers (say, the transdichotomous model) make the (reasonable) assumption that the machine word size is around log n. – Divisibility
They do make that assumption, but it is still an assumption. If you ever find yourself with datasets larger than what fits in RAM, or even on one machine, you will find out that it is indeed an assumption, not a fact. Even if you pile massive amounts of RAM into one machine, you still can't access it all at the same speed. – Upandcoming

The key is in this statement in the docs:

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table.

and

The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html

The internal bucket structure will actually be rebuilt if the load factor is exceeded, allowing for the amortized cost of get and put to be O(1).

Note that if the internal structure is rebuilt, that introduces a performance penalty that is likely to be O(N), so quite a few get and put operations may be required before the amortized cost approaches O(1) again. For that reason, plan the initial capacity and load factor appropriately, so that you neither waste space nor trigger avoidable rebuilding of the internal structure.
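
For example, if you know the approximate number of entries up front, you can size the map so that it never rehashes while you fill it. A small sketch follows; the sizes and values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the sizing advice above: if you know roughly how many
// mappings you will store, size the map up front so it never has to rehash
// while you fill it.
public class PresizedMapExample {
    public static void main(String[] args) {
        int expectedEntries = 1_000_000;      // Hypothetical expected size.
        float loadFactor = 0.75f;             // The HashMap default.

        // Choose a capacity large enough that expectedEntries stays below
        // capacity * loadFactor, so no automatic rehashing is triggered.
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);
        Map<Integer, String> map = new HashMap<>(initialCapacity, loadFactor);

        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);         // Amortized O(1), no resizes here.
        }
    }
}
```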

Semidome answered 18/3, 2013 at 4:37 Comment(0)

To follow up on templatetypedef's comments as well:

A constant-time hash table lookup can be as simple as a boolean array indexed by the hash, which tells you whether a particular element exists in a bucket. However, if your hashmap's buckets are implemented as linked lists, the worst case requires traversing a bucket's list all the way to its end, which can take linear time when collisions pile all the elements into one bucket.
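
Here is a minimal sketch of the contrast (my reading of the answer: the "boolean array" case is a direct-address table restricted to small integer keys, which gives a worst-case O(1) membership check, unlike a linked-list bucket that may have to be walked to its end).

```java
// A tiny sketch of the "boolean array" idea (a direct-address table): when
// keys are small non-negative integers, a membership check is literally one
// array read, so it is O(1) even in the worst case. The cost is that the
// array must span the whole key range, and the trick does not extend to
// arbitrary keys.
public class DirectAddressSet {
    private final boolean[] present;

    public DirectAddressSet(int maxKeyExclusive) {
        present = new boolean[maxKeyExclusive]; // One slot per possible key.
    }

    public void add(int key)          { present[key] = true;  }
    public void remove(int key)       { present[key] = false; }
    public boolean contains(int key)  { return present[key];  } // One array access.

    public static void main(String[] args) {
        DirectAddressSet set = new DirectAddressSet(1000);
        set.add(42);
        System.out.println(set.contains(42)); // true
        System.out.println(set.contains(7));  // false
    }
}
```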

Rowenarowland answered 16/11, 2016 at 20:54 Comment(0)
