Why deletion of elements of hash table using doubly-linked list is O(1)?
Asked Answered
D

8

29

On CLRS's textbook "Introduction to Algorithm", there's such paragraph on pg. 258.

We can delete an element in O(1) time if the lists are doubly linked. (Note that CHAINED-HASH-DELETE takes as input an element x and not its key k, so that we don't have to search for x first. If the hash table supports deletion, then its linked list should be doubly linked so that we can delete an item quickly. If the lists were only singly linked, then to delete element x, we would first have to find x in the list so that we could update the next attribute of x's predecessor. With singly linked lists, both deletion and searching would have the same asymptotic running times).

What puzzle me is this big parenthses, I failed to understand its logic. With doubly linked list, one still have to find x in order to delete it, how is this different from singly linked list? Please help me to understand it!

Dogleg answered 12/11, 2011 at 16:45 Comment(0)
M
34

The problem presented here is : consider you have are looking at a particular element of a hashtable. How costly is it to delete it?

Suppose you have a simple linked list :

v ----> w ----> x ----> y ----> z
                |
            you're here

Now if you remove x, you need to connect w to y to keep your list linked. You need to access w and tell it to point to y (you want to have w ----> y). But you can't access w from x because it's simply linked! Thus you have to go through all your list to find w in O(n) operations, and then tell it to link to y. That's bad.

Then, suppose you're doubly-linked :

v <---> w <---> x <---> y <---> z
                |
            you're here

Cool, you can access w and y from here, so you can connect the two (w <---> y) in O(1) operation!

Missus answered 12/11, 2011 at 16:53 Comment(7)
In your explanation, you assume you know the pointer to x, not simply x, but the textbook didn't say that! Or is it implied somewhere in the textbook?Dogleg
Note that CHAINED-HASH-DELETE takes as input an element x and not its key k. Yes, the textbook say you're already there =). It is assumed you know the pointer to x. That's why I re-wrote the problem in the first line of my answer, because I thought you overlooked that point. (This also implies that you're generally right, if you don't know x, it's going to cost you O(n) operation to find x, singly or doubly-linked)Missus
If you don't know x, it will take approximately O(1) to locate it, not O(n). It is a hash table, after all.Dialogue
there's a picture before that paragraph in the book (11.3). the value of a key in the hash collection is actually a pointer in their representation, so x is a pointerPeag
Although I think this answer makes sense. I still think the textbook does not do a good job here. It is not clear in every way, and makes people confused. Think about we have key-value x pairs (key, value x) in the hash table. Elements X could be anything, it is not necessarily the pointer or containing a pointer of the linked list. The textbook assumes that the elements is "an element in the linked list", but did not mention this in any where. It would be good that the textbook actually defines the data structure of element x as a structure containing not only values but also pointers.Dennison
I am not sure how you can get element x with out searching the linked list. The context here is that we are trying to delete an object v that has key k, and the hash table uses chaining as its collision resolution mechanism. If I have element x (which wraps object v and pointers to its previous and next elements), then yes its helpful but in practice we just have v, so deletion still takes O(n) in worst case because you have to find x first. I don't know what I have missed, but I don't see doubly linked list helped.Africander
Thanks for explanation, somehow the book might confused people with too many notations and specific terminology. Glad find this discussion.Hen
R
2

It seems to me that the hash table part of this is mostly a red herring. The real question is: "can we delete the current element from a linked list in constant time, and if so how?"

The answer is: it's a little tricky, but in effect yes, we can -- at least usually. We do not (normally) have to traverse the entire linked list to find the previous element. Instead, we can swap the data between the current element and the next element, then delete the next element.

The one exception to this is when/if we need/want to delete the last item in the list. In this case, there is no next element to swap with. If you really have to do that, there's no real way to avoid finding the previous element. There are, however, ways that will generally work to avoid that -- one is to terminate the list with a sentinel instead of a null pointer. In this case, since we never delete the node with the sentinel value, we never have to deal with deleting the last item in the list. That leaves us with relatively simple code, something like this:

template <class key, class data>
struct node {
    key k;
    data d;
    node *next;
};

void delete_node(node *item) {
    node *temp = item->next;
    swap(item->key, temp->key);
    swap(item->data, temp->data);
    item ->next = temp->next;
    delete temp;
}
Rosner answered 12/11, 2011 at 17:10 Comment(0)
S
1

In general you are correct - the algorithm you posted takes an element itself as input though and not just its key:

Note that CHAINED-HASH-DELETE takes as input an element x and not its key k, so that we don't have to search for x first.

You have the element x - since it is a double linked list you have pointers to predecessor and successor, so you can fix those elements in O(1) - with a single linked list only the successor would be available, so you would have to search for the predecessor in O(n).

Stuccowork answered 12/11, 2011 at 16:50 Comment(0)
A
1

suppose you want to delete an element x , by using doubly link list you can easily connect the previous element of x to next element of x. so no need to go through all the list and it will be in O(1).

Agrestic answered 19/7, 2012 at 5:53 Comment(0)
D
0

Find(x) is, in general, O(1) for a chained hash table -- it is immaterial whether or not you use singly linked lists or doubly linked lists. They are identical in performance.

If, after having run Find(x), you decide that you'd like to delete the object returned, you will find that, internally, a hash table might have to search for your object again. It's still usually going to be O(1) and not a big deal, but you find that you delete an awful lot, you can do a little better. Instead of returning a user's element directly, return a pointer to the underlying hash node. You can then take advantage of some internal structures. So if in this case, you chose a doubly linked list as the way to express your chaining, then during the delete process, there is no need to recompute the hash and search the collection again -- you can omit this step. You have enough information to perform a delete right from where you are sitting. Additional care must be taken if the node you are submitting is the head node, so an integer might be used to mark the location of your node in the original array if it is the head of a linked list.

The trade-off is the guaranteed space taken by the extra pointer versus a possible faster delete (and slightly more complicated code). With modern desktops, space is usually very cheap, so this might be a reasonable trade-off.

Dialogue answered 13/3, 2013 at 18:28 Comment(0)
G
0

Coding point of view: one can use unordered_map in c++ to implement this.

unordered_map<value,node*>mp;

Where node* is a pointer to a structure storing the key, left and right pointers!

How to use:

If you have a value v and you want to delete that node just do:

  1. Access that nodes value like mp[v].

  2. Now just make its left pointer point to the node on its right.

And voila, you are done.

(Just to remind, in C++ unordered_map takes an average O(1) to access a specific value stored.)

Garotte answered 3/6, 2015 at 17:0 Comment(0)
R
0

While going through the textbook, I also got confused on the same topic(whether "x" is a pointer to an element or the element itself) and then eventually landed upon this question. But after going through the above discussion and referring textbook again, I think in the book "x" is implicitly assumed to be a "node" and its possible attributes are "key", "next".

Some lines form the textbook..

1)CHAINED-HASH-INSERT(T,x) insert x at the head of list T[h(x.key)]

2)If the lists were only singly linked, then to delete element x, we would first have to find x in the list T[h(x.key)] so that we could update the next attribute of x’s predecessor.

Hence we can assume that the pointer to the element is given and I think Fezvez has given a good explanation for the asked question.

Rampant answered 6/1, 2019 at 3:41 Comment(0)
C
-3

The textbook is wrong. The first member of a list has no usable "previous" pointer, so additional code is needed to find and unlink the element if it happens to be the first in the chain (typically 30 % of the elements are the head of their chain, if N=M, (when mapping N items into M slots; each slot having a separate chain.))

EDIT:

A better way then using a backlink, is to use a pointer to the link that points to us (typically the ->next link of the previous node in the list)

struct node {
   struct node **pppar;
   struct node *nxt;
   ...
   }

deletion then becomes:

*(p->pppar) = p->nxt;

And a nice feature of this method is that it works equally well for the first node on the chain (whose pppar pointer points to some pointer that is not part of a node.

UPDATE 2011-11-11

Because people fail to see my point, I'll try to illustrate. As an example there is a hashtable table (basically an array of pointers) and a buch of nodes one, two, three one of which has to be deleted.

    struct node *table[123];
    struct node *one, *two,*three;
    /* Initial situation: the chain {one,two,three}
    ** is located at slot#31 of the array */
    table[31] = one, one->next = two , two-next = three, three->next = NULL;
                one->prev = NULL, two->prev = one, three->prev = two;


    /* How to delete element one :*/
    if (one->prev == NULL) {
            table[31] = one->next;
            }
    else    {
            one->prev->next = one->next
            }
    if (one->next) {
            one->next->prev = one->prev;
            }

Now it is obvious that the obove code is O(1), but there is something nasty: it still needs array, and the index 31, so in most cases a node is "self contained", and a pointer to a node is sufficient to delete it from its chain, except when it happens to be the first node in its chain; additional information will then be needed to find table and 31.

Next, consider the equivalent structure with a pointer-to-pointer as a backlink.

    struct node {
            struct node *next;
            struct node **ppp;
            char payload[43];
            };

    struct node *table[123];
    struct node *one, *two,*three;
    /* Initial situation: the chain {one,two,three}
    ** is located at slot#31 of the array */
    table[31] = one, one-next = two , two-next = three, three->next = NULL;
                one->ppp = &table[31], two->ppp = &one->next, three->ppp = &two-next;

    /* How to delete element one */
    *(one->ppp) = one->next;
    if (one->next) one->next->ppp = one->ppp;

Note: no special cases, and no need to know the parent table. (consider the case where there is more than one hashtable, but with the same nodetypes: the delete operation would still need to know from which table the node should be removed).

Often, in the {prev,next} scenario, the special cases are avoided by adding a dummy node at the start of the double linked list; But that needs to be allocated and initialised, too.

Clarino answered 12/11, 2011 at 16:53 Comment(8)
I don't think you thought this one through. Think about how much effort this additional code is in Big-O terms.Stuccowork
You need a some extra code to assign head to the new head, but this is still constant time.Pollinosis
(typically 30 % of the elements are the head of their chain, if N=M) I totally fail to understand what this means... could you explain?Missus
@BrokenGlass: off course finding the head is O(1), but having a special code path for this case only pays of when the chains are long. Storing and maintaining the prev pointers is also a consideration.Clarino
Are we still talking about a doubly linked list here?Stuccowork
We are talking about hash tables and the usability of (double) linked list, IMHO.Clarino
+1 I believe that people did miss your point -- that this is question about hash tables which use doubly linked lists to mitigate collisions. Still, the problems that you outline here are easily correctable. First -- store your linked list as a circular linked list. Second, keep an integer with each node. If the integer is -1, no extra processing should be done. If it is positive, then it marks the index in the array whose head pointer should be updated to your own next pointer. Finally, you will set the index stored in your next pointer to your own. @Pollinosis has it right.Dialogue
@MichaelHays People miss the point completely. Of course the double pointer has to be updated on insertion, but the cost is similar to that of maintaining a p->prev pointer (the storage size is also the same) . BUT: in the p->prev - pointer case, there needs to be a separate code path for the case where the p->prev pointer is NULL (the head of the list). And similar for the p->next->prev pointer, which can only be updated if (p->next != NULL)Clarino

© 2022 - 2024 — McMap. All rights reserved.