Understanding the philosophy behind Cassandra

I am trying to get familiar with Apache Cassandra, for a particular PoC work. After going through various articles on the net, trying out various libraries/clients available, a particular question pops up in my mind.

The initial reason why we thought of Cassandra, is because we wanted a 'truly' distributed datastore. From my understanding of 'distribution', it ultimately boils down to some sort of 'key-value' and some sort of 'consistent hashing', if I am able to express myself in a super succinct manner!

So a key-value store like Cassandra is a perfect fit. However, when I look for articles on data modelling in Cassandra, almost all of them explain it through CQL examples. The official position also seems to be that CQL should be the "de jure" way to learn Cassandra. Why such a push to fall in line with SQL?

I do not need a relational model, and that is why I have come to Cassandra. I appreciate its underlying concepts, like partition keys and clustering columns, and I would like to understand how they are implemented under the hood of CQL.

Asking the experts on Cassandra, am I actually a misfit as a Cassandra user? Should I really forget about key value and just try to fit CQL (if possible) in my use case?

CQL is more than "sugar", even though it was initially created to ease the migration of people coming from the SQL world. The world before CQL was a mess: dozens of clients, each written in a different way, all using the Thrift protocol. Unlike the SQL world, Cassandra is improving every day and brings new features in every release. Very often each of these improvements required a new "client version", capable of handling the new kinds of results (think of counters or collections, for instance) or the new syntax needed to use the feature.

I'm glad I had the chance to run in production, for more than 3 years, with a Thrift client (Pelops). It helped me understand a lot about the Cassandra world, its data structures and so on. But now I'd never go back to such a client (even though it was really great!).

At the beginning Cassandra was completely different. In particular, it was/had:

"schema-less": each row of a CF (column family) could contain a different number of columns, and there was no place where these columns had to be declared. This led many projects to disaster: the possibility of adding new columns at "runtime" led to a situation where you didn't know what you could find in a table.
"super-columns": a deprecated data structure, replaced by wide rows.

Now that the data model is stable, CQL syntax brings more readability. Thanks to a single, common syntax, you can move to a project you're not familiar with and still understand how the application talks to the DB. What's more, every new Cassandra release is immediately followed by a new version of the client.

CQL is not a "subset" of SQL, as many people write. In some ways it is a "superset", because it extends the base language to handle additional data structures.

My answer is: think in a key-value way, but use ONLY CQL.
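
To make "think in key-value" a bit more concrete, here is a rough Python sketch of how a CQL table with a partition key and one clustering column can be pictured as a key-value structure. It is only a toy mental model, not how Cassandra is actually implemented: the ToyTable class, the node names and the sensor data are all made up for illustration, MD5 stands in for the real partitioner (Murmur3 is the default in current versions), and replication and virtual nodes are ignored.

```python
import bisect
import hashlib


class ToyTable:
    """Toy model of a table with PRIMARY KEY ((partition_key), clustering_key)."""

    def __init__(self, nodes):
        # Place each (hypothetical) node on a hash ring at a token derived from its name.
        self.ring = sorted((self._token(n), n) for n in nodes)
        # partition key -> list of (clustering key, columns), kept sorted by clustering key
        self.partitions = {}

    @staticmethod
    def _token(key):
        # Stand-in for the partitioner; MD5 is an assumption made for brevity.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def owner(self, partition_key):
        # Consistent hashing: first node at or after the key's token, wrapping around.
        tokens = [t for t, _ in self.ring]
        i = bisect.bisect_left(tokens, self._token(partition_key)) % len(self.ring)
        return self.ring[i][1]

    def insert(self, partition_key, clustering_key, columns):
        rows = self.partitions.setdefault(partition_key, [])
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            # Same primary key: merge the columns (an insert behaves like an upsert).
            rows[i] = (clustering_key, {**rows[i][1], **columns})
        else:
            rows.insert(i, (clustering_key, columns))

    def select(self, partition_key, start=None, end=None):
        # Reads stay inside one partition and return rows in clustering order.
        rows = self.partitions.get(partition_key, [])
        return [(k, v) for k, v in rows
                if (start is None or k >= start) and (end is None or k <= end)]


table = ToyTable(["node-a", "node-b", "node-c"])
table.insert("sensor-1", "2015-01-02", {"temp": 21})
table.insert("sensor-1", "2015-01-01", {"temp": 19})
table.insert("sensor-1", "2015-01-02", {"humidity": 40})

print(table.owner("sensor-1"))                       # which node holds the partition
print(table.select("sensor-1", start="2015-01-02"))  # range scan on the clustering key
```

The partition key decides where the data lives, and the clustering key decides how it is ordered inside the partition. That is the key-value picture you can keep in mind while writing plain CQL.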

HTH, Carlo
