Has anybody tried Neo4j vs Titan? Pros and cons [closed]

2

64

Can anybody provide or point to a good comparison between Neo4j and Titan? One difference I can see is scale: Titan scales out and requires an underlying scalable datastore such as Cassandra, whereas Neo4j clusters only for HA and has its own embedded database. Are there any other pros and cons, or specific use cases? (Is Titan being used anywhere currently?)

I also have the following link: http://architects.dzone.com/articles/16-graph-databases-compared. It gives an objective comparison of graph databases, but not much on the pros and cons of Neo4j versus Titan.

Gimcrackery answered 24/6, 2013 at 6:34 Comment(1)
You may have a look at [Titan vs Neo4j](groups.google.com/d/msg/aureliusgraphs/vkQkzjN8fo0/9YYgqI4TA0QJ); it may help you.Bases
27

We have a social graph to which we add almost 1 million nodes and twice as many edges per day. We started with Neo4j because, yes, it is very fast, since its storage is on the same machine on which the graph engine runs. But here are the experiences with Neo4j that we would like to share.

  1. Not a good fit for real-time queries. We have a social structure like Twitter's. On a user's timeline we have to show the latest 20 activities (and their associated activities) of all the users that user follows. Some of our users follow more than 1000 users. The Gremlin query that we wrote for this (we can share it if you are interested) produced so much GC that a server with 8 CPUs and 48 GB of RAM used to freeze, and we had to restart the server to get it online again.
  2. We observed network partitions many times.
  3. There is no vertex-centric index, which is very much required in a graph database.
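To make points 1 and 3 concrete, here is a minimal in-memory sketch in Python of the timeline query described above. It is not the Gremlin query from this answer and uses no Neo4j or Titan API; the `follows` and `activities` dictionaries and the function name are hypothetical. It assumes each author's activities are already stored newest-first, which is roughly the ordering a time-sorted vertex-centric index would give you, so only the first 20 entries per followed user ever need to be read.

```python
import heapq
from itertools import islice


def latest_activities(follows, activities, user, limit=20):
    """Return the `limit` newest (timestamp, author, text) tuples for `user`'s timeline.

    Assumption: every list in `activities` is sorted newest-first.
    """
    # Only the first `limit` entries of each followed author can reach the result,
    # so the rest of each (possibly huge) activity list is never touched.
    per_author = (
        islice(activities.get(author, []), limit)
        for author in follows.get(user, [])
    )
    # heapq.merge lazily merges the pre-sorted streams; reverse=True keeps newest first.
    merged = heapq.merge(*per_author, reverse=True)
    return list(islice(merged, limit))


# Hypothetical sample data.
follows = {"alice": ["bob", "carol"]}
activities = {
    "bob": [(105, "bob", "post e"), (101, "bob", "post a")],
    "carol": [(103, "carol", "post c"), (102, "carol", "post b")],
}

timeline = latest_activities(follows, activities, "alice", limit=3)
print(timeline)
# [(105, 'bob', 'post e'), (103, 'carol', 'post c'), (102, 'carol', 'post b')]
```

Without the sorted-per-vertex assumption, the same query has to load and sort every activity of every followed user, which is the kind of work that can generate a lot of garbage on a user who follows 1000+ accounts.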

Ultimately we were so fed up with server performance on the Gremlin query that we had to change the database to Titan.

On Titan we are getting reasonable performance, and scaling is very easy because we use Cassandra as the backend storage. But mind you, using Gremlin here is also not a good idea: a multiget query is very ugly to write, and without multiget the query becomes very slow.
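A rough illustration of why multiget matters on a remote backend, as a Python sketch. `FakeStore` is a hypothetical stand-in that only counts round trips; it is not Titan's or Cassandra's API, and the batch size of 100 is an arbitrary assumption.

```python
class FakeStore:
    """Hypothetical stand-in for a remote storage backend; counts round trips."""

    def __init__(self, adjacency):
        self.adjacency = adjacency
        self.round_trips = 0

    def get(self, vertex):
        # One network round trip per vertex.
        self.round_trips += 1
        return self.adjacency.get(vertex, [])

    def multiget(self, vertices):
        # One network round trip for the whole batch.
        self.round_trips += 1
        return {v: self.adjacency.get(v, []) for v in vertices}


def neighbours_one_by_one(store, vertices):
    """Fetch each vertex's neighbours with its own request."""
    return {v: store.get(v) for v in vertices}


def neighbours_batched(store, vertices, batch_size=100):
    """Fetch neighbours in batches of `batch_size` vertices per request."""
    vertices = list(vertices)
    result = {}
    for start in range(0, len(vertices), batch_size):
        result.update(store.multiget(vertices[start:start + batch_size]))
    return result


# Hypothetical data: a user who follows 1000 accounts, each with one activity.
adjacency = {f"u{i}": [f"a{i}"] for i in range(1000)}

slow = FakeStore(adjacency)
one_by_one = neighbours_one_by_one(slow, adjacency)

fast = FakeStore(adjacency)
batched = neighbours_batched(fast, adjacency)

print(slow.round_trips, fast.round_trips)  # 1000 10
```

Both calls return the same data; the difference is only in how many times the backend is contacted, which is where the latency goes when storage is not on the same machine as the graph engine.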

Capet answered 7/6, 2014 at 18:33 Comment(5)
Hi. I would be really interested in knowing more about your setup. It would be cool if you could write a blog post. If you prefer to talk privately, I'm sorenbs on Twitter or Gmail.Hands
Why Gremlin and not Cypher? And was this on Neo 1.9 or 2? Just curious.Evangelin
Hi, it has now been more than a year since we used it. It was definitely not 2; it was either 1.6 or 1.7, I don't remember exactly. At that time Cypher was not that popular and was still in a nascent form. The main advantage of Titan over Neo4j, as I perceive it now, is its ability to scale and its support for more than one vertex-centric index (VCI), which in our case became very important because we generally have very large sets of children, and without a VCI traversal becomes very sluggish.Capet
At the same time, it takes a while for Titan to become stable, as data corruption happens. We also observed data corruption on Neo4j once, but we had enterprise support and they fixed it with a patch.Capet
@Capet so you're saying Neo4j doesn't scale? Funny, since its ability to scale is the big reason people give for using Neo4j.Bonnette
17

Great to see you exploring graph databases. I will speak to the Neo4j part of your question:

More than 30 of the Global 2000 now use Neo4j in production for a wide range of use cases, many of them surprising, even to us! (And we invented the property graph!)

A partial list of customers can be found below: www.neotechnology.com/customers

Neo4j has been in 24x7 production for 10 years, and while the product has of course evolved significantly since then, it's built on a very solid foundation.

Most of the companies moving to graph databases (speaking for Neo4j, which is what I know about) are doing so because of either a) their RDBMSs being unable to handle the scope & scale of their connected query requirements, and/or b) the immense convenience and speed that come from modeling domains that are a graph (social, network & data center management, fraud, portfolios, identity, etc.) as a graph, not as tables.

For kicks, you can find a number of customer talks here, from the four (soon five) GraphConnect conferences that were held this year in major cities around the world:

http://watch.neo4j.org/

If you're in London, the last one will be held next week: http://www.graphconnect.com

You'll find a summary below of some of the technology behind Neo4j, with some customer examples. To speak very directly to your question about scaling: Neo4j has a unique architecture designed to optimize query response time & query predictability, by allowing horizontal scale-out in such a way that each instance can access the graph without having to hop over the network. (Need more read throughput? Just add instances.) It turns out that this approach works well for 95+% of the graphs out there, including some production customers who have more than half of the Facebook social graph running in a single Neo4j cluster, backing an "always on" 24x7 web site.

www.neotechnology.com/neo4j-scales-for-the-enterprise/
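As a very simplified picture of that read scale-out idea, here is a Python sketch in which every instance holds a full copy of the graph and reads are spread round-robin. The `ReplicatedGraph` class is purely hypothetical and is not how Neo4j's clustering is implemented (a real cluster replicates writes over the network, with its own consistency and failover rules); it only shows why adding instances adds read capacity while each read stays local to one instance.

```python
from itertools import cycle


class ReplicatedGraph:
    """Hypothetical toy cluster: every instance keeps a full copy of the graph."""

    def __init__(self, instance_count):
        self.instances = [dict() for _ in range(instance_count)]
        self.reads_per_instance = [0] * instance_count
        self._next_reader = cycle(range(instance_count))

    def write(self, node, neighbours):
        # Simplification: a write is applied to every copy immediately.
        for copy in self.instances:
            copy[node] = list(neighbours)

    def read(self, node):
        # Each read is served entirely by one instance from its local copy.
        index = next(self._next_reader)
        self.reads_per_instance[index] += 1
        return self.instances[index].get(node, [])


cluster = ReplicatedGraph(instance_count=3)
cluster.write("alice", ["bob", "carol"])

results = [cluster.read("alice") for _ in range(6)]
print(cluster.reads_per_instance)  # [2, 2, 2]
```

The flip side, which this sketch also makes visible, is that every instance must be able to hold the whole graph, which is the trade-off against a partitioned store.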

One of the world's largest postal delivery services does all of its real-time package routing with Neo4j. Railroads are building routing systems on Neo4j. Some of the world's largest companies are using it for HR and data governance, alternate-path routing, network & data center management, real-time fraud detection, bioinformatics, etc.

Neo4j's Cypher query language is the only declarative query language built expressly for property graphs. It takes all of the lessons learned from our 13-year-old native Java API (which was the basis for Blueprints, which some of the other graph databases have since adopted) and rolls them into a next-generation language. Cypher is a great way to learn graphs and to develop applications, and there's always the native Java API if you have special needs or value "bare metal" performance (i.e. sub-millisecond vs. single-digit millisecond) above convenience. Neo4j is built from the ground up to support graphs, and has a storage engine that is built to store graphs, unlike some of the more recent additions to the graph database ecosystem, which are architected as graph libraries on top of non-graph databases and are subject to some of their inherent limitations. (E.g. FlockDB, because it is based on MySQL, will still be very slow for anything greater than one hop.)

Definitely feel free to contact the Neo team if you need anything more specific. We'll be more than happy to help you! http://info.neotechnology.com/ContactUs.html

Good luck!

Weeds answered 11/11, 2013 at 23:38 Comment(2)
Thanks Philip! Neo4j is sufficient for my requirements as of now, so I have already gone ahead with it. I also believe it should be able to scale up nicely and scale out in its own sense. Plus, I saw that Neo4j clustering and backup are free for small startups (fewer than 3 employees or $100k in revenue), which is really great for me. I also realize Neo4j has a much larger footprint and fan following. I was going through Titan and it seems to be really suitable for ultra-huge graphs, so I brought up this post. I believe with huge organizations like....Gimcrackery
....Facebook and LinkedIn, having a proper horizontal scale-out architecture would be imperative. But as I said, I am not as large as the others, and I started working on Neo4j a couple of months back, so I wouldn't dare rule out Neo4j's capabilities in terms of scaling, and I am really loving working on Neo4j.Gimcrackery
