When to use graph databases, ontologies, and knowledge graphs

Asked 15/7, 2021 at 17:6 Answered 30/11, 2023 at 16:7

graph-databases ontology knowledge-graph

I've been struggling to understand when these technologies are useful from a practical standpoint, and how they are different from each other. Could an expert check my understanding?

Graph databases: These are easier to understand and manage than a relational databases when relationships are complex, inherited, inferred with varying degrees of confidence, and likely to change. Some examples: a user doesn't know how much depth they'll need in a hierarchy; is inferring relationships from social media with varying degrees of confidence in ID resolution, topic resolution, and the strength of a relationship; or doesn't know what kinds of call center data they're going to want to store; all of these can be stored in relational databases, but they will need constant updates. They're also more performative for certain tasks.
Ontologies: These formal and standardized representations of knowledge are used to break down data silos. For instance, let's say a B2B sales company derives revenue from several different lines of business, which take one-time payments, subscriptions, sales of IP, and consulting services. The revenue data is stored in many different databases with lots of idiosyncrasies. An ontology allows the user to define a "customer payment" as anything that "creates or refunds revenue," so that subject matter experts can appropriately label the payments in their databases. Ontologies can be used with either graph databases or relational databases, but the emphasis on class inheritance makes them far easier to implement in a graph database, where the taxonomy of classes can be easily modeled.
Knowledge graph: A knowledge graph is a graph database where language (meaning, the entity and node taxonomies) are governed by an ontology. So in our B2B example, "customer payment" edges have one-time payments, subscription, etc subtypes, and connect "customer" classes to "line of business" classes.

Is that basically correct?

Informer answered 15/7, 2021 at 17:6 Comment(1)

Sounds basically correct. – Mab 12/12, 2021 at 14:43

A graph database (GD) is a database that can store graph data, which primarily has three types of elements: nodes, edges, and properties. Two popular types of graph databases are (1) Resource Description Framework (RDF)-based graph databases eg. Blazegraph and (2) Label Propagation Graph (LPG)-based graph databases eg. Neo4j. RDF represents knowledge in the form of subject, verb, and object (S-V-O) triplet, such as John livesIn London, and as its nodes and edges cannot hold properties, additional nodes or literals needs to be added to represent properties. LPG represents knowledge in the form of edges, nodes, and attributes where nodes and edges can hold properties in the form of key:value. Eg., a node can have label Person, and Person can have properties name:Tom Hanks, born:1956.

An ontology is a description of the concepts and their relationships, using instances of concepts, attributes of instances (and classes), restrictions of classes, and rules (if-then statements). These rules describe the logical inferences that can be drawn from the assertions/axioms that comprise the overall theory that the ontology describes. Upper level ontology (eg. DOLCE) describes general concepts and relations, whereas domain ontology (eg. Gene Ontology) describes concepts and relations in a particular domain. A graph database may have an ontology in its schema level for logical consistency checking.

Generally, a knowledge graph (KG) is an organization of a knowledge base as a graph having nodes and links between the nodes. An example of an early KG is Wordnet which captures semantic relationships between words and their meanings. Later Google developed their Google Knowledge Graph (GKG) building on DBpedia and Freebase using RDFa, Microdata and JSON-LD content extracted from indexed web pages, and used schema.org vocabulary to organize the nodes. Google reported that it held around 70 billion facts in GKG.

Graph databases supports queries, but not logical inference which needs an ontology. If the connections within the data are of primary focus (eg. friends of a friend), retrieval more important than storage, and data model changes often, then graph database would be a good fit.

Ontology is used when we need to infer new knowledge from the given knowledge. For eg, if given (1) Socrates is Man, and (2) AllMen are Mortal, then the reasoner or inference engine in an ontology can infer a new knowledge (3) Socrates is Mortal. This is made possible by description logic axioms that the Web Ontology Language (OWL) uses to describe resources. The OWL is serialized using Resource Description Framework (RDF).

Ontology is also used when we need to check consistency in the data model. For eg, if an axiom says Human and Sponge are disjoint classes and we make John (a human) instance of both Human and Sponge classes then it will fail consistency test.

Taxonomy is the IS-A class hierarchy which forms the backbone of an ontology.

Knowledge Graphs are often associated with linked open data (LOD) projects built upon standard Web technologies such as HTTP, RDF, URIs, and SPARQL. KG may use ontologies for reasoning and graph databases to store the knowledge. Several large organizations have introduced their KGs.

Unprecedented answered 12/12, 2021 at 12:50 Comment(0)

-1

Excellent answer!

Adding additional examples:

A taxonomy is used to classify collections of data. Two examples:

Libraries use the Dewey Decimal Classification, first published in 1876, to classify library holdings for rapid retrieval.
The National Library of Medicine (NLM) uses its Medical Subject Headings (MeSH) taxonomy to classify citations for retrieval via PubMed.

An ontology can be used to used to infer gaps in existing taxonomies by linking multiple taxonomies into a meta-taxonomy. For example, the MeSH taxonomy does not contain a classification for stage IIIa non-small-cell lung cancer (NSCLC). But the NCI Thesaurus, a taxonomy developed by the National Cancer Institute, does have a classification for stage IIIa NSCLC. The NLM's Unified Medical Language System Metathesaurus and Semantic Network comprise a meta-ontology -- a superset of MeSH, the NCI Thesaurus, and more than 200 other clinical taxonomies, that allows for the classification and retrieval of citations that discuss the treatment of stage IIIa NSCLC even though MeSH does not include a classifier for stage IIIa NSCLC.

Airing answered 30/11, 2023 at 16:7 Comment(0)

Recommended topics

Hot tags