High Level Java Client selection for Apache Cassandra [closed]
A

7

14

There are four high-level APIs for accessing Cassandra, and I do not have time to try them all, so I am hoping somebody can help me choose the right one.

I'll try to write down my findings about them:

Datanucleus-Cassandra-Plugin

pros:

  • supports JPA1, JPA2, JDO1 - JDO3 - as I read in a review, JDO scales better than Hibernate with JPA
  • all the pros as mentioned in kundera?

cons:

  • no experience with JDO up to now (relevant only for me of course ;)
  • documentation not found!

kundera

pros:

  • JPA 1.0 annotations with all their advantages (standards-compliant, no boilerplate code, ...)
  • promises the following features in the near future: JPA listeners (@PrePersist, @PostPersist, etc.), relationships (@OneToMany, @ManyToMany, etc.), and transactional support (@Transactional)

cons:

  • early development stage of the plugin?
  • bugs?
  • no possibility to fix problems in the JDO / JPA framework?

s7 pelops

pros:

  • pure java api --> finer control over persistence?

cons:

  • pure java api --> boilerplate code

hector 0.7

pros:

  • mavenized
  • spring integration --> dependency injection
  • pure java api --> finer control over persistence?
  • jmx monitoring?
  • managing of nodes seems to be easy and flexible

cons:

  • pure java api (no annotations) --> boilerplate code
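The "boilerplate code" trade-off listed for pelops and hector can be illustrated without any Cassandra client at all. The following is only a minimal sketch in plain Java: the `ColumnName` annotation, the `UserRow` class, and the `MappingSketch` helper are hypothetical names invented for this illustration and are not part of JPA, JDO, kundera, pelops, or hector. It contrasts hand-written mapping code with a single generic routine driven by annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation, standing in for what a JPA/JDO-style layer provides.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ColumnName {
    String value();
}

// Hypothetical entity used only for this illustration.
class UserRow {
    @ColumnName("first_name") String firstName;
    @ColumnName("email") String email;

    UserRow(String firstName, String email) {
        this.firstName = firstName;
        this.email = email;
    }
}

class MappingSketch {
    // "Pure Java API" style: every entity needs its own hand-written mapping.
    static Map<String, String> toColumnsByHand(UserRow u) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("first_name", u.firstName);
        columns.put("email", u.email);
        return columns;
    }

    // Annotation style: one generic routine reads the mapping from metadata.
    static Map<String, String> toColumnsByAnnotation(Object entity) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (Field f : entity.getClass().getDeclaredFields()) {
            ColumnName c = f.getAnnotation(ColumnName.class);
            if (c == null) {
                continue; // unannotated fields are not persisted
            }
            try {
                f.setAccessible(true);
                columns.put(c.value(), String.valueOf(f.get(entity)));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return columns;
    }
}
```

The hand-written version gives finer control per entity, at the price of repeating similar code for each class; the reflective version trades that control for less code.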

Conclusion so far

As I am comfortable with RDBMS, Hibernate, JPA, and Spring, and no longer up to date with EJB, my first impression was that kundera would be the right choice. But after reading some posts about JDO and DataNucleus, I am not sure anymore. As the learning curve for DataNucleus should be steep (also for experienced JPA developers?), I am not sure whether I should go for it.

My major concern is the status of the plugin, along with the forum support and help available for JDO and the Datanucleus-Cassandra-Plugin, which is not as widespread, as far as I understand.

Does anybody have experience with some of these frameworks and can give me a hint? Maybe a mixed strategy would make sense as well: in cases (if they exist) where JDO is not flexible or sufficient enough for my needs, could I fall back to one of the simpler APIs, pelops or hector? Is this possible? Is there an approach like in JPA, where you can get an SQL connection and fetch/put data directly?


After reading a bit more, I found the following additional information:

Datanucleus-Cassandra-Plugin is based on pelops, which can also be accessed directly for more flexibility and more performance (?). Pelops should be used for column families with a lot of data; JDO/JPA access should only be used for "administrative" data, where performance is not so important and the amount of data is not overwhelming.

This still leaves open the question of whether to start with hector or pelops:

pelops for its later extensibility with the Datanucleus-Cassandra-Plugin, or hector for its better support for node handling.

Agnew answered 8/3, 2011 at 11:58 Comment(5)
Stick with Hector and you won't be disappointed (pelops looks promising too, but I haven't had time to test it myself yet). - Pulmonate
Hector works with Cassandra like a charm. Alternatively, use Avro. - Obla
You say "Doc not found" for the DN Cassandra plugin, but the doc is JDO or JPA, so just refer to the DataNucleus docs (v2.2) for that. Todd Nine's plugin and Pedro Gomez's plugin should build/work against DN 2.2 (which one are you using? I think Todd's is more complete). The only specific bit for Cassandra is the URL you pass in (and which features are supported). - Assumption
Thanks for your tips. First I will stick with hector and then proceed to DataNucleus when I better understand what I am doing... - Agnew
DataNucleus JDO/JPA has provided its own Cassandra plugin since 2013 (two years ago). It is not based on pelops; it uses the newer CQL instead, and so tracks recent Cassandra developments. - Trooper
U
9

I tried most of these solutions and found hector the best. Even when you have a problem, you can always reach the people who wrote hector in #cassandra on freenode, and the code is more mature as far as I am concerned. In a Cassandra client, the most critical part is connection pool management (all the clients perform mostly the same operations through Thrift, so connection pooling is what sets a high-level client apart). On that count I would vote for hector, since I have been using it in production for over a year now with no visible problems (one reconnect issue was fixed as soon as I discovered it and sent an email about it).

I am still using cassandra 0.6 though.
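To make the connection pooling point concrete, here is a minimal sketch of a fixed-size pool in plain Java. It is an assumption-laden illustration, not hector's implementation: `SimplePool` and its methods are hypothetical names, and a real client would also handle broken connections, reconnects, and per-host pools.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical fixed-size pool; not hector's actual implementation.
class SimplePool<C> {
    private final BlockingQueue<C> idle;

    SimplePool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // open all connections up front
        }
    }

    // Borrow a connection, waiting up to timeoutMillis.
    // Returns null on timeout or if the waiting thread is interrupted.
    C borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return null;
        }
    }

    // Hand a connection back so another caller can reuse it.
    void release(C connection) {
        idle.offer(connection);
    }

    // Number of connections currently available for borrowing.
    int idleCount() {
        return idle.size();
    }
}
```

Callers borrow a connection, use it, and release it in a `finally` block, so the cost of opening a connection to a node is paid once rather than per request.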

Ute answered 8/3, 2011 at 12:7 Comment(2)
Thanks for your answer. It really looks like I will go this way. The example application is also built on hector, and the promised support for DataNucleus with JDO and JPA sounds nice. - Agnew
I also like hector because cassandra-unit uses it already and I can create unit tests more easily that actually work with cassandra itself rather than mocks. - Allista
R
7

The author of the datanucleus plugin, Todd Nine, is working on the next-gen JPA support in Hector now.

Remaremain answered 8/3, 2011 at 16:10 Comment(1)
This sounds nice. Actually, I possibly would have chosen based on the possibility of going down the JPA/JDO path later on. But knowing that this will be possible with hector in the future as well, it seems better to go for hector for its better node management support. Thanks for your hint! - Agnew
S
3

The Hector client was the API that we chose because of the following features:

  • Connection Pooling (huge performance gain when sharing a connection to a node)
  • Complete Custom Configuration using interfaces for most everything.
  • Auto Discovery Hosts
  • Custom Load Balancing Policy definitions (LeastActiveBalancingPolicy or RoundRobinBalancingPolicy or implement LoadBalancingPolicy)
  • Light-weight adapter on top of the Thrift API.
  • Great examples: See hector-examples
  • Built in JMX support.
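The difference between the two load balancing strategies named in the list can be sketched in a few lines. This is only an illustration in plain Java: the `HostPicker` interface and both classes below are hypothetical and do not reproduce Hector's own `LoadBalancingPolicy` interface or its implementations.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface for this sketch; Hector's own LoadBalancingPolicy differs.
interface HostPicker {
    String pick(List<String> hosts, Map<String, Integer> activeConnections);
}

// Round robin: cycle through the hosts in order, ignoring current load.
class RoundRobinPicker implements HostPicker {
    private final AtomicInteger next = new AtomicInteger();

    public String pick(List<String> hosts, Map<String, Integer> activeConnections) {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), hosts.size());
        return hosts.get(index);
    }
}

// Least active: prefer the host with the fewest connections currently in use.
class LeastActivePicker implements HostPicker {
    public String pick(List<String> hosts, Map<String, Integer> activeConnections) {
        String best = null;
        int bestActive = Integer.MAX_VALUE;
        for (String host : hosts) {
            int active = activeConnections.getOrDefault(host, 0);
            if (active < bestActive) {
                best = host;
                bestActive = active;
            }
        }
        return best;
    }
}
```

Round robin spreads requests evenly regardless of how busy each node is, while least active reacts to load at the cost of tracking per-host connection counts.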

Downside of Hector:

  • Documentation not bad, but the Java Docs are lacking a bit. That could easily be a Git fork / pull request by the user community.
  • The ORM support was a bit limited, but that was not urgent in our case. I couldn't get some of the one-to-many associations to work easily, and the documentation does not describe which Cassandra model is used for associated collections (super columns or column families). There is also a lack of Java examples (maybe there are some; please post if you find any).

Also, I tried using kundera with very little success. There are not many examples to use or try, and very little forum support. It appears to be maintained by one person, which makes it even harder to choose a tool like that. Based on the SVN activity, it appears to be migrating to Hadoop instead, or adding support for it as well.

Sistrunk answered 23/3, 2011 at 20:1 Comment(0)
L
2

I suggest you give Kundera-2.0.1 a try. It has undergone major changes since its inception, and I see a lot of new features being added and bugs being fixed. Currently it supports JPA 1.0 and Cassandra 0.7.6, but they are planning to add support for Cassandra 0.8 and JPA 2.0 very soon. There is a pretty good example here: https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes

Laritalariviere answered 14/7, 2011 at 17:46 Comment(0)
A
2

Kundera 2.0.4 released.

Major Changes in this release:

  • Cross-datastore persistence (easy to migrate an existing MySQL app to NoSQL)
  • Support for relational databases (e.g. MySQL)
  • Solandra replaced with Lucene-based indexing.
  • Support added for bi-directional associations.
  • Performance improvement fixes.
Aun answered 12/12, 2011 at 7:47 Comment(0)
K
2

I would also propose Astyanax. I'm working with it and I'm quite happy; only the documentation is not really good.

Astyanax API

Astyanax implements a fluent API which guides the caller to narrow or customize the query via a set of well defined interfaces. We've also included some recipes that will be executed efficiently and as close to the low level RPC layer as possible. The client also makes heavy use of generics and overloading to almost eliminate the need to specify serializers. Some key features of the API include:

  • Key and column types are defined in a ColumnFamily class which eliminates the need to specify serializers.
  • Multiple column family key types in the same keyspace. Annotation based composite column names.
  • Automatic pagination.
  • Parallelized queries that are token aware.
  • Configurable consistency level per operation.
  • Configurable retry policy per operation.
  • Pin operations to specific node.
  • Async operations with a single timeout using Futures.
  • Simple annotation based object mapping.
  • Operation result returns host, latency, attempt count.
  • Tracer interfaces to log custom events for operation failure and success.
  • Optimized batch mutation.
  • Completely hide the clock for the caller, but provide hooks to customize it.
  • Simple CQL support.
  • RangeBuilders to simplify constructing simple as well as composite column ranges.
  • Composite builders to simplify creating composite column names.
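Two of the features above, a configurable retry policy per operation and an operation result that reports latency and attempt count, can be sketched together. The code below is a plain-Java illustration under stated assumptions: `OperationResult` and `RetrySketch` are hypothetical names for this example and are not Astyanax's actual classes or signatures.

```java
import java.util.function.Supplier;

// Hypothetical result type: the value plus metadata about the call.
class OperationResult<T> {
    final T value;
    final int attempts;      // how many tries were needed
    final long latencyNanos; // total time across all tries

    OperationResult(T value, int attempts, long latencyNanos) {
        this.value = value;
        this.attempts = attempts;
        this.latencyNanos = latencyNanos;
    }
}

class RetrySketch {
    // Run the operation up to maxAttempts times; rethrow the last failure
    // if every attempt throws.
    static <T> OperationResult<T> execute(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long start = System.nanoTime();
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T value = operation.get();
                return new OperationResult<>(value, attempt, System.nanoTime() - start);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

Passing the attempt limit per call, rather than configuring it globally, is what makes the policy "per operation": a cheap read can retry several times while an expensive write might be given a single attempt.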

Recipes

Recipes for some common use cases:

  • CSV importer.
  • JSON exporter to convert any query result to JSON with a wide range of customizations.
  • Parallel reverse index search.
  • Key unique constraint validation.

http://techblog.netflix.com/2012/01/announcing-astyanax.html

Kilogram answered 21/5, 2013 at 12:25 Comment(0)
S
2

You can try Achilles, a new Entity Manager I've developed that supports all CQL3 features.

  • Entity mapping
  • JPA style operations
  • Limited support for join
  • Mapping of clustered entities using compound primary key
  • Queries (native, typed, slice)
  • Support for counters
  • Support for Consistency level
  • TTL & timestamp
  • JUnit 4 Rule to start embedded Cassandra server for testing

    And more ...

    There are 2 implementations: Thrift & CQL

    The Thrift version relies on Hector under the hood.

    The CQL version pulls the brand new Java Driver Core from Datastax for all operations

    Quick reference here

Sharondasharos answered 31/8, 2013 at 16:53 Comment(0)
