Why is accessing an element of a dictionary by key O(1) even though the hash function may not be O(1)?

Asked 20/5, 2016 at 13:50 Answered 23/8, 2021 at 18:2

I see how you can access your collection by key. However, the hash function itself has a lot of operations behind the scenes, doesn't it?

Assuming you have a nice hash function which is very efficient, it still may take many operations.

Can this be explained?

Broussard answered 20/5, 2016 at 13:50 Comment(10)

O notation is about measuring the growth of complexity with different inputs. It's not about how many operations you have. For example: with 1 value, you have x seconds, with n values, you need roughly x*n seconds => O (n). x could be many operations combined. – Slosh 20/5, 2016 at 13:52

Data structures don't have O notation complexity, operations on them do. – Unbeknown 20/5, 2016 at 13:52

So what operation are we taking about? – Evolve 20/5, 2016 at 13:54

@PatrickHofman It does explain some facts about O(1) complexities on dictionary, maybe related is a better word. – Unbeknown 20/5, 2016 at 14:3

OP clearly says access by key is the operation in question. – Equalitarian 20/5, 2016 at 14:11

For the record, it's not that many instructions, either. Intel processors have hardware instructions for doing SHA hashes, so a dictionary lookup could be done in just a few assembly instructions. – Digged 20/5, 2016 at 16:26

I think this question is not a duplicate, because it concentrates on hash ffunction not being const time. – Florettaflorette 20/5, 2016 at 22:10

"a lot of operations" and O(1) are perfectly compatible - O(1) or constant time means that, as the number of elements approaches infinity, there exists some finite constant that bounds the execution time. That constant can be arbitrary large - using a hash function that is guaranteed to complete within one year wouldn't prevent the system from being O(1). – Urogenous 21/5, 2016 at 9:25

Think of a hash lookup as a single operation. – Irfan 21/5, 2016 at 22:13

Possible duplicate of Can hash tables really be O(1)? – Unipersonal 14/5, 2017 at 17:49

123

the HashFunc itself has a lot of operations behind the scenes

That is certainly true. However, the number of these operations depends on the size of the key, not on the size of the hash table into which the key is inserted: the number of operations to compute hash function is the same for a key in a table with ten or with ten thousand entries.

That is why the call of hash function is often considered O(1). This works fine for fixed-size keys (integral values and fixed-length strings). It also provides a decent approximation for variable-sized keys with a practical upper limit.

Generally, though, access time of a hash table is O(k), where k is the upper limit on the size of the hash key.

Florettaflorette answered 20/5, 2016 at 13:59 Comment(5)

Also consider that it is impossible to have a hash table of n distinct items unless at least one item is represented by at least log(n) bits. – Stonebroke 21/5, 2016 at 3:8

Sadly, all operations are exponential if you don't restrict the inputs' bit size. But that's not a very interesting or useful result, right? – Roughish 21/5, 2016 at 22:4

@Owen: It is also not possible to have more items in an in-memory hashtable than can be unique assigned keys that fit in a pointer-sized variable. – Arteriosclerosis 22/5, 2016 at 15:28

the number of these operations depends on the size of the key and on the size of the data being hashed. – Elaelaborate 24/5, 2016 at 19:7

k does not need to be an upper bound. The lookup time is linear in the key size, so it's indeed O(k) where k is the key size. If k is understood as an upper bound then it's actually O(1). – Dickson 25/5, 2016 at 10:53

148

O(1) doesn't mean instant. O(1) means constant without regard to the size of the data. The hash function takes a certain amount of time, but that amount of time doesn't scale with the size of the collection.

Hireling answered 20/5, 2016 at 13:52 Comment(11)

But it is possible to write a hash function that is dependent on the size of the collection. It'd be stupid, and contrived, but you can do it. The statement that searching a hashset is actually premised on the assumption that computing the hash is O(1), which is virtually always, but not necessarily, the case. – Mailer 21/5, 2016 at 4:39

@Mailer Not even necessarily all that stupid and contrived. A custom list implementation that wants to allow two lists that hold equal items to compare as equal themselves may override GetHashCode() to combine the hash codes of the items in some way. If I were to implement such a class, for an initial implementation, I would implement GetHashCode() exactly like that. I would of course also change that later though. – Clevelandclevenger 21/5, 2016 at 11:1

@hvd That would be an O(m) hash, where m is the size of the inner collections. It still wouldn't be related to the size of the outer collection (the actual hash based structure). You'd need to have the items in the collection looking at all of the items of that same hash based collection that they are currently in for those items to have an O(n) (or any function of n) for their hash code. That would be pretty stupid and contrived. – Mailer 21/5, 2016 at 17:13

@Mailer Oh, that's what you meant. Yeah, that would be stupid. :) I can't come up with any remotely plausible scenario where you might want that. – Clevelandclevenger 21/5, 2016 at 18:13

@Mailer The general point of hashing is to avoid O(n) search time, so creating a hash function that's O(n) would totally defeat the purpose. You could do it, but it would be like implementing addition recursively with Peano numbers: possible, but not really practical. – Yuki 24/5, 2016 at 18:19

@barmar sure, but while you are correct that it's a bad idea, it's possible, so you do need to consider it a possibility when you don't control the hash. – Mailer 24/5, 2016 at 19:30

@Mailer As far as I'm concerned, that would be like considering the possibility that array indexing is O(n). It could happen if arrays are implemented as linked lists, but I'm not going to worry about it. – Yuki 24/5, 2016 at 19:57

@barmar but it's not possible for an array access to be O(N), and it is possible for a GetHashCode to have any arbitrary complexity, it's just preferable for it to be O(1). – Mailer 24/5, 2016 at 20:14

@Mailer GetHashCode() doesn't receive the hash table as a parameter, so how can it be dependent on the table size? – Yuki 24/5, 2016 at 20:20

@barmar it's uncommon, but not unheard of, for an object to have a reference to its container. You could just explicitly pass it in if you wanted, but it could also come up through layers of indirection. – Mailer 24/5, 2016 at 20:54

Please remember, that for a hash to be a valid key for a dictionary, it needs to be stable, i.e. not change after the object has been added to the dictionary. So it works only by hashing over some immutable part of the object. Otherwise you'll break the contract of most dictionary implementations. Also remember that at least for Java the contract is if a.equals(b) then hash(a)==hash(b). Having the collection as part of the hash-function will cause the whole dictionary to stop working, if this causes the hash to change if the collection changes. – Infeld 31/3, 2017 at 12:15

123

the HashFunc itself has a lot of operations behind the scenes

Generally, though, access time of a hash table is O(k), where k is the upper limit on the size of the hash key.

Florettaflorette answered 20/5, 2016 at 13:59 Comment(5)

Also consider that it is impossible to have a hash table of n distinct items unless at least one item is represented by at least log(n) bits. – Stonebroke 21/5, 2016 at 3:8

Sadly, all operations are exponential if you don't restrict the inputs' bit size. But that's not a very interesting or useful result, right? – Roughish 21/5, 2016 at 22:4

@Owen: It is also not possible to have more items in an in-memory hashtable than can be unique assigned keys that fit in a pointer-sized variable. – Arteriosclerosis 22/5, 2016 at 15:28

the number of these operations depends on the size of the key and on the size of the data being hashed. – Elaelaborate 24/5, 2016 at 19:7

It means that no matter what size your collection can be, it will still take almost the same amount of time to retrieve any of its members.

So in other words Dictionary with 5 members will let's say coud take around 0.002 ms to access one of them, as well as dictionary of 25 members should take something similar. Big O means algorithmic complexity over collection size instead of actual statements or functions executed

Dormie answered 20/5, 2016 at 13:55 Comment(4)

But at the same time if your hash function is really bad, you may end up with a lot of values in the bucket, so O(1) won't hold anymore – Relive 20/5, 2016 at 13:58

@klappvisor, not necessary that has function is bad. It may be that the input data is crafted. That's why O(1) here is amortized complexity, not the "true" complexity. – Kalmick 20/5, 2016 at 19:24

It doesn't mean that every member will take the same amount of time, it just (roughly) means that the upper bound on that access time does not grow with the size of the collection. Consider how a hash table handles disambiguating collisions. Similarly, looking up an item for a binary search tree is O(log2 n) because the worst case is log2 with the size of N, but an item near the root will take less time than a leaf item, for example. – Troublous 21/5, 2016 at 4:28

@Kalmick That's not actually what the "amortized" clarification of the O(1) means. The fact that it's an ammortized O(1) is to account for the fact that approximately 1/N of the additions (if you're adding to the set) will require reallocating a new backing array, which is an O(N) operation, so you can perform N additions in O(N) time, for an amortized O(1) addition, while a single addition is actually also O(N) (when not amortized). It's a separate clarification of the asymptotic complexity that it assumes the hashes are sufficiently well distributed. – Mailer 21/5, 2016 at 4:43

If a dictionary/map is implemented as a HashMap, it has a best case complexity of O(1), since i best case it requires exactly the calculation of the hash-code of the key element for retrieval, if there are no key collisions.

A hash-map may have a worst-case runtime complexity of O(n) if you have a lot of key collisions or a very bad hash function, since in this case it degrades to a linear scan of the entire array which holds the data.

Also, O(1) doesn't mean instantly, it means it has a constant amount. So choosing the right implementation for a dictionary may as well depend on the number of elements in the collection, since having a very high constant cost for the function will be much worse if there are only a few entries.

That's why dictionaryies/maps are implemented differently for different scenarios. For Java there are multiple different implementations, C++ uses red/black-trees, etc. You chose them based on the number of data and based on their best/average/worst-case runtime-efficiency.

Infeld answered 20/5, 2016 at 14:2 Comment(2)

It doesn't have to be that way, e.g. Java 8's HashMap resorts to a balanced tree in case it detects several collisions. – Kristlekristo 20/5, 2016 at 14:4

@Kristlekristo may be true, but then it is the classic hash-map any more. There are many different implementations for maps/dictionaries, fur exactly this case. I have amended the answer to point this out. – Infeld 20/5, 2016 at 14:7

Theoretically it is still O(n), because in the worst case all your data may end up having identical hash and be bundled together in which case you have to linearly go through all of it.

Hartman answered 20/5, 2016 at 16:7 Comment(0)

Please see post What does "O(1) access time" mean?

The number of operations in a hash function is irrelevant as long as it takes the same (constant) amount of time for EVERY element in the collection. For example, accessing one element in a collection of 2 elements takes .001 ms, but also accessing one element in a collection of 2,000,000,000 elements takes .001 ms. Although the hash function can contain hundreds of if statements and multiple calculations.

Transport answered 20/5, 2016 at 14:4 Comment(2)

Constant amount of time, not linear. – Snowwhite 20/5, 2016 at 15:54

Wouldn't a hash function need to contain more "if statements and multiple calculations" to produce a long enough hash value to uniquely identify 2 billion elements than it would for 200? – Every 20/5, 2016 at 18:50

from the docs:

Retrieving a value by using its key is very fast, close to O(1), because the T:System.Collections.Generic.Dictionary`2 class is implemented as a hash table.

So it can be O(1) but might be slower. Here you can find another thread regarding hashtable performance: Hash table - why is it faster than arrays?

Romonaromonda answered 20/5, 2016 at 14:4 Comment(0)

Once you allow for the fact that larger and larger dictionaries take up more memory, going further down the cache hierarchy and eventually out to slow swap space on disk, it's hard to argue that it is truly O(1). The performance of the dictionary will get slower as it gets bigger, probably giving O(log N) time complexity. Don't believe me? Try it for yourself with 1, 100, 1000, 10000 and so on dictionary elements, up to say 100 billion, and measure how long it takes in practice to look up an element.

However if you make the simplifying assumption that all memory in your system is random access memory, and can be accessed in constant time, then you can claim that the dictionary is O(1). This assumption is common, even though it's not really true for any machine with disk swap space, and still pretty debatable in any case given the various levels of CPU cache.

Chrysarobin answered 20/5, 2016 at 16:19 Comment(2)

You do have a point, but when we talk about algorithmic complexity it does make sense to assume perfect hardware. The point is to define characteristics of an algorithm, not different real life hardware implementations. Besides, if you have big enough data, the complexity of the algorithm is really what matters most: is it e.g. O(1), (logN), O(n) or O(n^2). – Te 22/5, 2016 at 9:24

There is also the issue of hash key collision with larger size dictionaries. Once you get big enough the majority of new entries will be colliding with an existing entry, causing a linear search through each hash bucket and ending up as O(n). Unless you make the hash keys grow longer as the size increases... but then you don't have O(1) either. I agree that in practice you may treat it as constant time but I would prefer to stay away from the formal O-notation for something that is only a rough approximation for small enough sizes, not a formal proof for any size. – Chrysarobin 23/5, 2016 at 9:50

We know that hash function take O(1) to access value by key...so it doesn't mean that it will take only 1 step to get the value, It means constant time "t" where that "t" does not depend on size of your data structure(eg:-python dict()).

Iridize answered 23/8, 2021 at 18:2 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags