Represent graph data as a key-value object
Asked Answered
O

6

21

I'm starting to dig into graph databases, but i have no idea, how these graphs are stored internally. Let's say i have this graph (taken from Wikipedia):

How do i serialize this graph as a key-value object? (a Python dict, for example)

I imagine two dicts, one for vertices and one for edges:

{'vertices':
 {'1': {'Name': 'Alice', 'Age': 18},
  '2': {'Name': 'Bob', 'Age': 22},
  '3': {'Type': 'Group', 'Name': 'Chess'}},
 'edges':
 {'100': {'Label': 'knows', 'Since': '2001/10/03'},
  '101': {'Label': 'knows', 'Since': '2001/10/04'},
  '102': {'Label': 'is_member', 'Since': '2005/7/01'},
  '103': {'Label': 'Members'},
  '104': {'Label': 'Members'},
  '105': {'Label': 'is_member', 'Since': '2011/02/14'}},
 'connections': [['1', '2', '100'], ['2', '1', '101'],
                 ['1', '3', '102'], ['3', '1', '103'],
                 ['3', '2', '104'], ['2', '3', '105']]}

But i'm not sure, whether this is the most practical implementation. Maybe the "connections" should be inside "vertices" dict. So, what is the best way to implement graph datastore using key-value objects? What and where can i read more about it?

Possibly related, but not a duplicate: How to represent a strange graph in some data structure

Overly answered 5/7, 2012 at 10:47 Comment(1)
Are you requiring that a key-value be used or can any data structure be used? If not then I might post an answer.Pietra
P
12

The normal pattern is to not have a separate connections structure but to put that information in the edges structure. This gives something like:

{
'vertices': {
    '1': {'Name': 'Alice', 'Age': 18},
    '2': {'Name': 'Bob', 'Age': 22},
    '3': {'Type': 'Group', 'Name': 'Chess'} },
'edges': [
    {'from': '1', 'to': '2', 'Label': 'knows', 'Since': '2001/10/03'},
    {'from': '2', 'to': '1', 'Label': 'knows', 'Since': '2001/10/04'},
    {'from': '1', 'to': '3', 'Label': 'is_member', 'Since': '2005/7/01'},
    {'from': '3', 'to': '1', 'Label': 'Members'},
    {'from': '3', 'to': '2', 'Label': 'Members'},
    {'from': '2', 'to': '3', 'Label': 'is_member', 'Since': '2011/02/14'} ] }
Pilpul answered 18/11, 2013 at 7:4 Comment(0)
B
5

seems ok - each object has its it, there is no duplications. it's good for 'read and process purpose'. but there is no 'best' representation. it always depends on your purpose. do you want to be able to quickly find vertices by name? or edges by date? or maybe you want to quickly test if two vertices are connected? or the opposite - you want to quickly modify some parts of the graph? each purpose requires different data structures of database tables

Brilliant answered 15/7, 2012 at 23:2 Comment(0)
W
4

how these graphs are stored internally

how do I serialize this graph as a key-value object

These questions are different and they need different answers.

In the former case, the main requirement is probably to perform complex queries efficiently.
I'd suggest to investigate existing industrial-strength solutions.

In NoSQL terms, these nested key-value objects are documents. Hence, one could look into how graphs are stored in "layered" multi-model databases that:

  • support graph data model, and
  • use underlying document data model.

Examples of such databases are ArangoDB, OrientDB, Azure CosmosDB.

You could also replace "document data model" with "wide column data model", because wide column data model can be conidered as two-dimensional key-value model.

Examples of such databases are DataStax Enterprise Graph and perhaps Grakn.


For instance, in ArangoDB, edges are stored as regular documents, but in special collections.

Obviously, data structures used may be accompanied with additional indexes etc. (or not).


So, what is the best way to implement graph datastore using key-value objects?

What and where can i read more about it?

I'd suggest another one article from ArangoDB:

Withershins answered 29/10, 2018 at 23:33 Comment(0)
C
2

I'd make few changes in Eamonn's answer.

Every vertex and edge has 3 things.. id, Label and Properties

{
'vertices': {
    '1': {'Label' : Person, 'Properties' : { 'Name': 'Alice', 'Age': 18}},
    '2': {'Label' : Person, 'Properties' : {'Name': 'Bob', 'Age': 22}},
    '3': {'Label': 'Group', 'Properties' : { 'Name': 'Chess'} },
'edges': [
    '4' : {'from': '1', 'to': '2', 'Label': 'knows', 'Properties':{'Since': '2001/10/03' , 'Until' : '2001/10/03'}},
    '5' : {'from': '2', 'to': '1', 'Label': 'knows', 'Properties':{'Since': '2001/10/04', 'Until' : '2001/10/05'}}
 ]
}

This way you can do query by vertex/edge, and their Labels and their properties.

Congo answered 31/10, 2018 at 10:5 Comment(0)
R
1

I would serialize it like this, except you should choose the keys based on what you are looking up by. I assumed you are using the id, but perhaps using the name could be better.

{
    'members': {
        '1': {
            'id': '1',
            'name': 'Alice',
            'age': 18,
            'groups': {
                '3': {
                    'path': 'groups.3',
                    'since': '2005-07-01'
                }
            },
            'knows': {
                '2': {
                    'path': 'members.2',
                    'since': '2001-10-03'
                }
            }
        },
        '2': {
            'id': '2',
            'name': 'Bob',
            'age': 22,
            'groups': {
                '3': {
                    'path': 'groups.3',
                    'since': '2011-02-14'
                }
            },
            'knows': {
                '1': {
                    'path': 'members.1',
                    'since': '2001-10-04'
                }
            }
        }
    },
    'groups': {
        '3': {
            'id': '3',
            'name': 'Chess',
            'members': {
                '1': { 'path': 'members.1' },
                '2': { 'path': 'members.2' }
            }
        }
    }
}

You can serialize graphs directly into key-value pairs if you have a way of serializing references to other parts of the graph, which is what I use 'path' for. If I was deserializing it into a dict, I may consider replacing the path values with the actual dictionaries they refer to. Keep in mind that this may cause circular references which could cause problems if you were serializing it into json or something.

Repulse answered 31/10, 2018 at 13:33 Comment(0)
J
1

I would add an adjacency to the structure too. My take would be like this,

{
  'vertices': {
    '1': {'Name': 'Alice', 'Age': 18},
    '2': {'Name': 'Bob', 'Age': 22},
    '3': {'Type': 'Group', 'Name': 'Chess'} 
   },
'edges': {
 '100' : {'from': '1', 'to': '2', 'Label': 'knows', 'Since': '2001/10/03'},
 '101': {'from': '2', 'to': '1', 'Label': 'knows', 'Since': '2001/10/04'},
 ....
  },
'adjacency': {
  '1': ['101', '102'],
  ...
  }
}

This way I can easily find which edges are adjacent to my vertices instead of iterating through all the edges.

January answered 31/10, 2018 at 14:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.