Model an undirected graph in Rails?
Asked Answered
G

3

21

Importing the language of graph databases, understand

  1. nodes (represented by circles),
  2. edges (represented by arrows), and
  3. properties (metadata of nodes / edges)

Graph Database Property Graph

The graphic (courtesy of wikipedia) describes a directed graph.

What's the best way to model an undirected graph in Rails?

That is to say, a graph where all edges are reciprocal (as in above graphic), and where the properties of each edge are the same regardless of direction (contrary to above graphic).

Let's assume a default Rails 3 setup using a sql store via ActiveRecord.

A double polymorphic association would create a directed graph, able to model the data described by the above image.

def Edge < ActiveRecord::Base
  belongs_to :head, polymorphic: true
  belongs_to :tail, polymorphic: true
end

class Node < ActiveRecord::Base
  has_many :from, as: :head
  has_many :to, as: :tail
end

class Group < ActiveRecord::Base
  # a Node of Type: Group
  has_many :from, as: :head
  has_many :to, as: :tail
end

Should one extend this model to manage inverse relationships, or is a better model available?


One element of an app may be a graph problem, but it does not mean the app is centered around the problem, that graph transversals must be performed on the data, nor that the dataset is larger than available memory.

Genaro answered 2/11, 2011 at 5:45 Comment(3)
If you need high performance with large graphs, You need to work on your assumptions. This is a bad fit for an (sql) RDBMS.Giule
A bad fit for large graphs? Absolutely. But possible nonetheless. Swapping or modifying a storage layer after an initial prototype once one has an example of the real data one will be dealing with is preferable to initial added complexity in my book. (invoke Knuth "premature optimization...")Genaro
Correct tool and design choices are not the same as premature optimization. You know how to use a hammer really well, and you can drive a screw with a hammer, but that doesn't mean it is the best tool for the job. Switching to a screwdriver at this point isn't a premature optimization. If you intend to take this project seriously, and it is more than a toy, then considerations like this make total sense upfront. If this is merely an experiment to see how well a relational database can store a graph, that's ok too, but let's add it to the question so we know that is main intent.Sphygmoid
S
13

In an undirected graph, the only thing you need to know, is whether a node is connected to another node. And there is no such thing as a direction.

Simple approach:

class Node
  has_many :connected_nodes
  has_many :nodes, :through => :connected_nodes
end

class ConnectedNode
  belongs_to :node
  belongs_to :connected_node, :class_name => 'Node'
end

This is also called an adjacency list: for each node we can easily get the list of adjacent (connected) nodes.

A possible problem with this approach: we store the connections twice. A is connected to B and B is connected to A.

So it seems better normalized to store each connection only once, and then we get really close to your original proposal.

class Connection
  belongs_to :node1, :class_name => 'Node'
  belongs_to :node2, :clasS_name => 'Node'
end

Only we do our very best to not impose any order or direction through the naming.

Retrieving the connected nodes is all the nodes connected to as node1 or as node2, hence effectively disregarding any possible direction.

In this case you also need to express a validation that a connection with (node1, node2) is unique, but that (node2, node1) is actually the same and cannot be inserted twice.

My personal choice would be to use the second schema, although maintaining the first solution might be quicker (see also this question).

I also found a very interesting article where the author explains how graphs can be stored in the database. Very profound, but more database centric.

Hope this helps.

Spies answered 11/11, 2011 at 21:3 Comment(3)
I agree that I only want to store connections/edges once in the database, so I prefer your second example. But how would my Node class look in this example? It seems like the has_many relation of ActiveRecord is always directed, isn't it?Saretta
node1.connections will yield node2. but node2.connections will not yield anything. @SpiesTwinned
I did not show how to implement it (but described it: look for all nodes connected as node1 or as node2). It seems you only look for one kind? Please ask another question, where you can show what you tried and what is going wrong and put the link here and I will have a look.Spies
D
3

Instead of using polymorphic associations, try using has_many, :through

class Group < ActiveRecord::Base
  has_many :memberships
  has_many :persons, :through => :memberships
end

class Membership < ActiveRecord::Base
  belongs_to :group
  belongs_to :person
end

class Person < ActiveRecord::Base
  has_many :memberships
  has_many :groups, :through => :memberships
end

You can store the properties of the edge int the Membership model.

Delhi answered 2/11, 2011 at 17:49 Comment(1)
By my understanding, a has_many through would create an efficient undirected graph with the addition of an add_index :memberships, [:group_id, :person_id], unique: true in the migration at the cost of table sprawl. Attempting to precisely model the diagram, an additional table is needed in your example to handle the self referential knows edge on the Person class.Genaro
Q
3

Why not use Neo4J?

http://wiki.neo4j.org/content/Ruby

https://github.com/andreasronge/neo4j-rails-example

https://github.com/andreasronge/neo4j

Quinquefid answered 7/11, 2011 at 11:7 Comment(3)
Considering graph databases are the first link in the question, let's assume people have read both preexisting posts. This question arose through my own prototyping, and IMHO breaking out a graph database when writing the first lines of code is overkill. If you disagree, an explanation would be greatly appreciated.Genaro
I completely missed the 'using a sql store' point. GDBs are a good solution for these tasks since they provide good link walking performance and queries. If no serious load or long link walks are intended, join table with additional fields is a good solution also.Quinquefid
For a small graph, just keep it in memory and store it as a blob if persistence is needed. For a large graph, just count the number of disk accesses needed. RDBMS joins kill performance.Giule

© 2022 - 2024 — McMap. All rights reserved.