How does linear probing handle deletions without breaking lookups?

Asked 11/3, 2020 at 16:58 Answered 29/5 at 5:7

Solved data-structures hashtable linear-probing

Here is my understanding of linear probing.

For insertion: - We hash to a certain position. If that position already has a value, we linearly increment to the next position, until we encounter an empty position, then we insert there. That makes sense.

My question revolves around lookup. From descriptions I have read, I believe lookup works like this:

We look at the position the item we are looking for hashes to.
If the position is empty, we return Not Found
If the position is full, we move linearly to positions until we either encounter the value we are looking for, or we encounter an empty position (meaning not found)

So how does this work when we delete an item from a hash? Wouldn't that mess up this lookup? Say two items hash to the same position. We add both items, then delete the first one we added. So now, the expected position of the second item (which had to be moved to a different position, since the first item originally occupied it) is empty. Does deleting handle this in some way?

Pinchbeck answered 11/3, 2020 at 16:58 Comment(0)

Great question! You are absolutely right that just removing an item from a linear probing table would cause problems in exactly the circumstance that you are reporting.

There are a couple of solutions to this. One is to use tombstone deletion. In tombstone deletion, to remove an element, you replace the element with a marker called a tombstone that indicates "an element used to be here, but has since been removed." Then, when you do a lookup, you use the same procedure as before: jump to the hash location, then keep stepping forward until you find a blank spot. The idea here is that a tombstone doesn't count as a blank spot, so you'll keep scanning over it to find what you're searching for.

To keep the number of tombstones low, there are nice techniques you can use like overwriting tombstones during insertions or globally rebuilding the table if the number of tombstones becomes too great.

Another option is to use Robin Hood hashing and backward-shift deletion. In Robin Hood hashing, you store the elements in the table in a way that essentially keeps them sorted by their hash code (wraparound at the front of the table makes things a bit more complex than this, but that's the general idea). From there, when doing a deletion, you can shift elements backwards one spot to fill the gap from the removed element until you either hit a blank or an element that's already in the right place and doesn't need to be moved.

For more information on this, check out these lecture slides on linear probing and Robin Hood hashing.

Hope this helps!

Salable answered 11/3, 2020 at 21:24 Comment(0)

Deletion in linear probing (open addressing) is done in such a way that index at which the value is deleted is assigned any marker such as "Deletion". [One can type any value at that index other than None to indicate that value at this index is deleted]. Keep a look at the below code snippet to indicate how i used "Deletion" marker to fill index where value is deleted

        if self.table[index] == value:
          print("key {} is found in the table and hence deletion tag is updated at that position".format(value))
          self.table[index] = "Deletion"

Now, what happens when again search is done, this position is not None and search will continue. See the below snippet how search is implemented in the linear probe

    def search(self, value):
      index = value % self.table_size

      if self.table[index] != value:
        while self.table[index] is not None and self.table[index] != value:
            index = (index + 1) % self.table_size
      if self.table[index] == value:
        print("Key is found in the table")
      else:
        print("key is not found in the table")

One can also look at the github code explaining deletion in linear probing without breaking lookups.

Hereon answered 18/2, 2022 at 13:25 Comment(0)

When a deletion happens under linear probing, there is an algorithm which avoids placing tombstones into the array. The algorithm walks all the entries starting with the deleted entry V until the first empty spot (or all the way around the table, if it is full).

Successive entries E are visited. We find the first such E which has a hash code such that when E is retrieved by a lookup operation V has to be traversed to get to E. When we find such an E, we relocate it over top of V. Then we regard E as being the new victim node V and continue at step 1.
If in (1) we do not find such a node that can be moved over V, it means that all nodes after V are hashed and probed without crossing V. V can therefore be safely obliterated.

A trivial case of 2 occurs when we V is immediately before an empty spot.

A non-trivial case of 2 occurs when the search for E wraps around back to V. That can only happen when the table is full.

For instance, suppose the table is full and perfectly hashed: every entry E is located at (hash(E) mod size(table)), its ideal location. Further suppose that the V we are deleting is the first node at the bottom of the table. For every subsequent entry E, we will find that navigating to that entry does not cross V (since the hashing proceeds directly to that node without any probing), and so we cannot move that entry into the V location. Eventually we visit all of them and wrap back to V. At that point we know that no nodes which follow V in probing order require a traversal of V, and V can be safely obliterated into an empty spot.

The condition in (1) is tricky. This SO answer shows it in Python code, where the xor operator simplifies the condition. If you're working in a language without an xor, I think this might work. Here I'm using the same variable names as that answer:

if (j > i and (k <= i or k > j)) or (j < i and k <= i and k > j):
    slot[i] = slot[j]
    i = j

The j < i case is the wraparound case; j is searching for what I called the node E, and hit the end of the table, so j is now below i. i is the victim node. k is the hash location of the j node being examined: the place where the table is first probed for j.

Indevout answered 20/6, 2023 at 2:47 Comment(0)

Other than tombstones that were mentioned, another method to handle deletions in a linear probing hash table is to remove and reinsert entries following the removed entry until an empty position in the hash table is reached. All entries not separated by an empty position could be in a cluster caused by hash collisions.

A follow up question might be, why not just stop at the first entry where the key is in the exact position calculated by the table's hash function? This unfortunately does not work because it's possible another collision occurred during a subsequent insertion. It would not be possible to detect this later collision without analyzing every entry in the cluster.

Amphictyon answered 26/3 at 19:52 Comment(1)

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Nealneala 1/4 at 12:42

Let's say we have a hash table consisting of 5 slots. To find the position, we will use a simple hash function k mod m where k is the key that is being hashed and m is the size of the hash table. We will also use probing for the collision resolution strategy.

Our initial hash table is as follows: Initial table

Now we want to delete 23.

To find 23, we hash it and find that it hashes to position 1.
We then mark it as deleted. The new hash table looks like the following: after deletion

Search and insertion: If we need to search or insert an item that hashes to position 1, we treat the slot as empty.

Unpaid answered 29/5 at 5:7 Comment(0)

Recommended topics

Hot tags