Understanding the philosophy behind Cassandra

I am trying to get familiar with Apache Cassandra, for a particular PoC work. After going through various articles on the net, trying out various libraries/clients available, a particular question pops up in my mind.

The initial reason why we thought of Cassandra, is because we wanted a 'truly' distributed datastore. From my understanding of 'distribution', it ultimately boils down to some sort of 'key-value' and some sort of 'consistent hashing', if I am able to express myself in a super succinct manner!
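To make the 'key-value plus consistent hashing' idea concrete, here is a minimal sketch of a hash ring in Python. It is an illustration only, not Cassandra's implementation: the `HashRing` class, the node names and the `vnodes` parameter are made up for this example, and MD5 is used merely as a convenient stable hash (Cassandra's default partitioner is Murmur3).

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (hypothetical, for illustration only)."""

    def __init__(self, nodes, vnodes=8):
        # Each node is placed on the ring several times ("virtual nodes").
        self._ring = sorted(
            (self._token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(key):
        # Stable hash of the key; an assumption of this sketch, not Cassandra's partitioner.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, partition_key):
        # The owner is the first ring entry whose token is >= the key's token,
        # wrapping around to the start of the ring if there is none.
        idx = bisect.bisect_left(self._tokens, self._token(partition_key))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # always the same node for the same key
```

The useful property is that adding a node only takes keys away from existing nodes and hands them to the new one; keys never shuffle between the old nodes.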

So a key-value store like Cassandra looks like a perfect fit. However, as I dig through articles to understand data modelling in Cassandra, almost all of them explain and exemplify it using CQL. Also, the official position seems to be that CQL should be the "de jure" way to learn Cassandra. Why such a push to fall in line with SQL?

I do not need a relational model, and that is why I have come to Cassandra. I appreciate its underlying concepts, like partition keys, clustering columns etc., and I want to understand how they are implemented under the hood of CQL.

Asking the experts on Cassandra, am I actually a misfit as a Cassandra user? Should I really forget about key value and just try to fit CQL (if possible) in my use case?

Amena answered 13/2, 2015 at 7:46 Comment(1)
Read the Amazon Dynamo paper. - Cesarean

CQL is more than "sugar", even though it was initially created to ease the migration of people coming from the SQL world. The world before CQL was a mess: dozens of clients, written in different ways, all using the Thrift protocol. Unlike the SQL world, Cassandra keeps improving and brings new features in every release, and very often each of these improvements required a new client version capable of handling the new kind of results (think of counters or collections, for instance) or the new syntax needed to use the feature.

I'm glad I had the chance to run a Thrift client (Pelops) in production for more than 3 years. It helped me understand a lot about the Cassandra world, its data structures and so on. But now I would never go back to such a client (even though it was really great!).

At the beginning Cassandra was completely different. In particular, it was or had:

  • "schema-less", meaning that each row of a CF (column family) could contain a different number of columns, and there was no place where these columns had to be declared. This led many projects to disaster: the ability to add new columns at "runtime" meant you no longer knew what you might find in a table.

  • "super-columns", a deprecated data structure, replaced by wide rows

Now that the data model is stable, CQL brings more readability: thanks to a single syntax, you can move to a project you are not familiar with and still understand how the application talks to the DB. What is more, every new Cassandra release is immediately followed by a new version of the client.

CQL is not a "subset" of SQL, as many people write. In some ways it is a "superset", because it extends the base language to handle additional data structures.

My answer is: think in a key-value way, but use ONLY CQL.
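One way to picture "thinking in key-value" while writing CQL is to see a table as a map from partition key to a sorted map of clustering key to row. The Python sketch below is only a mental model under that assumption, not how the storage engine is written; the `readings` table, the `ToyTable` class and its method names are hypothetical.

```python
from collections import defaultdict

# Hypothetical CQL table that this sketch imitates:
#   CREATE TABLE readings (
#       sensor_id text,      -- partition key: decides which nodes hold the data
#       ts        int,       -- clustering column: sort order inside the partition
#       value     double,
#       PRIMARY KEY (sensor_id, ts));


class ToyTable:
    """Mental model only: partition key -> {clustering key -> row}."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def insert(self, sensor_id, ts, value):
        # Writing the same primary key again overwrites the row (an upsert).
        self._partitions[sensor_id][ts] = {"value": value}

    def get_partition(self, sensor_id):
        # Roughly: SELECT * FROM readings WHERE sensor_id = ?
        rows = self._partitions.get(sensor_id, {})
        return [(ts, rows[ts]) for ts in sorted(rows)]

    def slice(self, sensor_id, start, end):
        # Roughly: ... WHERE sensor_id = ? AND ts >= ? AND ts <= ?
        # This is the kind of range read a Thrift slice query used to express.
        return [(ts, row) for ts, row in self.get_partition(sensor_id)
                if start <= ts <= end]


table = ToyTable()
table.insert("s1", 3, 0.3)
table.insert("s1", 1, 0.1)
rows = table.get_partition("s1")  # rows come back ordered by ts
```

Seen this way, the partition key is the "key" of the key-value store and the clustering columns order the columns of the wide row, so the old model is still there underneath the CQL syntax.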

HTH, Carlo

Assured answered 13/2, 2015 at 9:49 Comment(2)
Thanks for your response. I evaluated Pelops and Hector as well, and then I 'had' to use the DataStax driver too. So I was getting really frustrated at not being able to fulfill all my requirements with a single library. For our use case, we need both schema-full and schema-less models. I was under the impression that some library should allow me to do both, or rather whatever I want (after all, the storage engine underneath is the same). For example, I would want to create a column family from cqlsh using composite columns and query it from the application using a SlicePredicate API. - Amena
You can possibly get what you need with an old version of Cassandra (1.2? 1.1?). You can also achieve something very similar to a schema-less model using collections, as I explained in this post: #25098951. But I would not recommend extensive use of this pattern. - Assured
