Graph Implementations: why not use hashing?

I'm doing interview prep and reviewing graph implementations. The big ones I keep seeing are adjacency lists and adjacency matrices. When we consider the runtime of basic operations, why do I never see hashing-based data structures used?

In Java, for instance, an adjacency list is typically ArrayList<LinkedList<Node>>, but why don't people use HashMap<Node, HashSet<Node>>?
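For concreteness, here is a minimal sketch of that "adjacency set" idea (hypothetical class and method names; undirected and unweighted for simplicity):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the "adjacency set" representation: each vertex maps to
// the hash set of its neighbours.
class AdjacencySetGraph<V> {
    private final Map<V, Set<V>> adj = new HashMap<>();

    void addNode(V v) {
        adj.putIfAbsent(v, new HashSet<>());
    }

    void addEdge(V u, V v) {            // expected amortized O(1)
        addNode(u);
        addNode(v);
        adj.get(u).add(v);
        adj.get(v).add(u);
    }

    void removeEdge(V u, V v) {         // expected O(1), vs an O(n) list scan
        Set<V> su = adj.get(u), sv = adj.get(v);
        if (su != null) su.remove(v);
        if (sv != null) sv.remove(u);
    }

    // Because storage here is symmetric, only v's neighbours need fixing:
    // expected O(deg(v)). A directed graph would have to scan all sets.
    void removeNode(V v) {
        Set<V> neighbours = adj.remove(v);
        if (neighbours == null) return;
        for (V u : neighbours) adj.get(u).remove(v);
    }

    boolean hasEdge(V u, V v) {         // expected O(1)
        Set<V> su = adj.get(u);
        return su != null && su.contains(v);
    }

    Set<V> neighbours(V v) {
        return adj.getOrDefault(v, Set.of());
    }
}
```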

Let n = number of nodes and m = number of edges.

In both implementations, removing a node v involves searching through all of the collections and removing v. In the adjacency list, that's O(n^2), but in the "adjacency set", it's O(n). Likewise, removing an edge involves removing node u from v's list and node v from u's list. In the adjacency list, that's O(n), while in the adjacency set, it's O(1). Other operations, such as finding a node's successors or testing whether a path exists between two nodes, are the same in both implementations. The space complexities are also both O(n + m).

The only downside to the adjacency set I can think of is that adding nodes/edges is amortized O(1), while doing so in the adjacency list is truly O(1).

Perhaps I'm not seeing something or I forgot to consider things when calculating the runtimes, so please let me know.

Grocery answered 20/8, 2013 at 0:31 Comment(2)
My first thought is that iterating (traversing) could be slower. Computers like sequences. – Animate
Related: replacing linked lists with hash tables in adjacency lists. – Chromatid

In the same vein of thought as DavidEisenstat's answer, graph implementations vary a lot. That's one of the things that doesn't come across well in lecture. There are two conceptual designs:

1) Adjacency list
2) Adjacency matrix

But you can easily augment either design to gain properties like faster insertion/removal/searches. The price is often just storing extra data! Try implementing a relatively simple graph algorithm (like... Euler's) and see how your choice of graph implementation has a huge effect on the run-time complexity.

To make my point a bit clearer, I'm saying that an "adjacency list" doesn't really require you to use a LinkedList. For instance, Wikipedia's adjacency list article cites this:

An implementation suggested by Guido van Rossum uses a hash table to associate each vertex in a graph with an array of adjacent vertices. In this representation, a vertex may be represented by any hashable object. There is no explicit representation of edges as objects.
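In Java terms, that suggestion might look roughly like this (hypothetical names; directed edges, with String vertices standing in for "any hashable object"):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the hash-table-of-adjacency-arrays idea: the map associates
// each vertex with a growable array of its neighbours, and edges have
// no explicit object representation.
class HashedAdjacencyLists {
    static Map<String, List<String>> graph = new HashMap<>();

    static void addEdge(String u, String v) {   // directed edge u -> v
        graph.computeIfAbsent(u, k -> new ArrayList<>()).add(v);
        graph.computeIfAbsent(v, k -> new ArrayList<>()); // ensure v exists
    }
}
```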

Ayah answered 20/8, 2013 at 2:23 Comment(0)

We probably don't usually see this representation because checking if an arbitrary edge is in a graph is rarely needed (I can't think of any everyday graph algorithm that relies on that), and where it is needed, we can use just one hash map for the whole graph, storing pairs (v1, v2) to represent the edges. This seems more efficient.
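A sketch of that single-hash-map-of-edges idea (hypothetical names; directed pairs, so (v1, v2) and (v2, v1) are distinct edges):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: store the whole edge set in one hash set of (v1, v2) pairs,
// giving expected O(1) membership tests for arbitrary edges.
class EdgeSetGraph {
    record Edge(int u, int v) {}          // record supplies equals/hashCode

    private final Set<Edge> edges = new HashSet<>();

    void addEdge(int u, int v) { edges.add(new Edge(u, v)); }

    boolean hasEdge(int u, int v) { return edges.contains(new Edge(u, v)); }
}
```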

(Most of the common graph algorithms say something like "for every neighbour of vertex v, do ...", and then an adjacency list is perfect.)

Keefe answered 25/11, 2016 at 10:33 Comment(0)

why don't people use HashMap<Node, HashSet<Node>>?

Unless there are multiple graphs on the same set of nodes, the HashMap can be replaced by a member variable of Node.
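A sketch of what that collapse looks like (hypothetical Node class; undirected edges):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: with only one graph over these nodes, the outer
// HashMap<Node, HashSet<Node>> becomes a field on Node itself.
class Node {
    final String label;                        // illustrative payload
    final Set<Node> neighbours = new HashSet<>();

    Node(String label) { this.label = label; }

    void connect(Node other) {                 // undirected edge
        neighbours.add(other);
        other.neighbours.add(this);
    }
}
```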

The question of HashSet versus LinkedList is more interesting. I would guess that, for sparse graphs, LinkedList would be more efficient both in time (for operations of equivalent asymptotic complexity) and in space. I don't have much experience with either representation, because depending on the algorithm's requirements I usually prefer either (i) to store the adjacency lists as consecutive subarrays, or (ii) to have, for each edge, an explicit object or pair of objects that stores information about the edge (e.g., weight) and participates in two circular doubly linked lists (my own implementation, since the Java and C++ standard libraries do not support intrusive data structures). The latter makes node deletion proportional to the degree of the node and edge deletion O(1).
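A compact sketch of design (ii) — not the answerer's actual implementation — in which each edge object is threaded through a doubly linked neighbour list at both of its endpoints, so edge removal is O(1) pointer surgery (simplifications: plain rather than circular lists, and no self-loops):

```java
// Sketch: one Edge object per edge, linked into a doubly linked
// neighbour list at EACH endpoint; per-edge data (weight) lives on
// the edge, and removing an edge is O(1).
class IntrusiveGraph {
    static class Edge {
        final int[] end = new int[2];            // the two endpoints
        final Edge[] next = new Edge[2];         // next in end[s]'s list
        final Edge[] prev = new Edge[2];         // prev in end[s]'s list
        double weight;
        Edge(int u, int v, double w) { end[0] = u; end[1] = v; weight = w; }
        int side(int x) { return end[0] == x ? 0 : 1; } // which slot is x
    }

    private final Edge[] head;                   // head of each node's list

    IntrusiveGraph(int n) { head = new Edge[n]; }

    Edge addEdge(int u, int v, double w) {
        Edge e = new Edge(u, v, w);
        push(u, e);
        push(v, e);
        return e;
    }

    private void push(int x, Edge e) {           // prepend e to x's list
        int s = e.side(x);
        e.next[s] = head[x];
        if (head[x] != null) head[x].prev[head[x].side(x)] = e;
        head[x] = e;
    }

    void removeEdge(Edge e) {                    // O(1): unlink at both ends
        unlink(e.end[0], e, 0);
        unlink(e.end[1], e, 1);
    }

    private void unlink(int x, Edge e, int s) {
        if (e.prev[s] != null) e.prev[s].next[e.prev[s].side(x)] = e.next[s];
        else head[x] = e.next[s];
        if (e.next[s] != null) e.next[s].prev[e.next[s].side(x)] = e.prev[s];
    }

    int degree(int x) {                          // node deletion would walk
        int d = 0;                               // this list: O(degree)
        for (Edge e = head[x]; e != null; e = e.next[e.side(x)]) d++;
        return d;
    }
}
```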

The running times you quote for the hashes are not worst-case, only high-probability against an oblivious adversary, though they can be unamortized at the cost of further degrading the constant factors.

Concierge answered 20/8, 2013 at 1:27 Comment(0)

Many theory problems involve a fixed set of vertices and edges - there's no removal.

Many / most graph algorithms involve either simply iterating through all edges in the adjacency list or something more complex (for which an additional data structure is required).

Given the above, you get all of the advantages of an array (e.g. O(1) random access, space efficient) to represent vertices with none of the disadvantages (e.g. fixed size, O(n) search / index insert / remove), and all the advantages of a linked-list (e.g. O(1) insert, space efficient for unknown number of elements) to represent edges with none of the disadvantages (O(n) search / random access).
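The combination described above is essentially the textbook array-of-linked-lists, sketched here with vertex numbers as array indices (hypothetical class name):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of the classic adjacency list for a fixed set of n vertices
// numbered 0..n-1: O(1) random access to a vertex's list, O(1) edge insert.
class AdjListGraph {
    final List<LinkedList<Integer>> adj = new ArrayList<>();

    AdjListGraph(int n) {
        for (int i = 0; i < n; i++) adj.add(new LinkedList<>());
    }

    void addEdge(int u, int v) {        // O(1): prepend v to u's list
        adj.get(u).addFirst(v);
    }

    int degree(int v) {                 // "for every neighbour of v, do ..."
        return adj.get(v).size();       // is a plain iteration over adj.get(v)
    }
}
```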

But... what about hashing?

Sure, hashing has comparable efficiency for the required operations, but the constant factors are worse, and there's some unpredictability: performance depends on a good hash function and well-distributed data.

That said, it's not a rule that you shouldn't use hashing: if your problem calls for it, go for it.

Jezabella answered 20/8, 2013 at 7:35 Comment(2)
Given the above, you get all of the advantages of an array ... and all the advantages of a linked-list – what data structure are you referring to? I wasn't sure if you meant the OP's idea of using HashSet or something else. – Pugilist
@Pugilist Array to represent vertices, linked-list to represent edges (so ArrayList<LinkedList>). Sometimes you won't benefit that much from some of the advantages, and the disadvantages will affect you greatly, in which case another data structure is probably preferable. – Jezabella
