Is Cassandra a column-oriented or columnar database?

Asked 22/8, 2014 at 7:40 Answered 5/6, 2023 at 19:36

Solved cassandra nosql column-oriented wide-column-store

A columnar database should store groups of columns together, but Cassandra stores data row-wise: an SSTable holds multiple rows of data mapped to their corresponding partition key. So I feel like Cassandra is a row-wise data store like MySQL, with other benefits such as "wide rows", the fact that not every column needs to be present in every row, and of course it's in memory. Please correct me if I'm wrong.

Astraddle answered 22/8, 2014 at 7:40 Comment(0)

If you go to the Apache Cassandra project on GitHub, and scroll down to the "Executive Summary," you will get your answer:

Cassandra is a partitioned row store. Rows are organized into tables with a required primary key.

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added and removed from the cluster.

Row store means that like relational databases, Cassandra organizes data by rows and columns.

"So I feel like Cassandra is a row wise data store"

And that would be correct.

Tenuous answered 22/8, 2014 at 12:47 Comment(3)

Then Cassandra can't be directly compared to HBase, which is ultimately a columnar database. Perhaps data in HBase gets stored as a region, which is nothing but a list of rows of a column family (a vertical split based on column family). – Astraddle 25/8, 2014 at 8:38

Why does AWS mention Cassandra as a column store then? aws.amazon.com/nosql/columnar/… – Sloganeer 10/7, 2020 at 6:14

@RajSaraogi Good find! I cannot speak to the intent behind Amazon's documentation in this case. I can tell you that categorizing Cassandra in this way is a common misconception. The fact that their link to the "Apache Cassandra on AWS Whitepaper" (d0.awsstatic.com/whitepapers/AWS_Cassandra_Whitepaper.pdf) states that Cassandra is not a columnar database adds even more confusion. I can only conclude that their documentation is not terribly accurate. – Tenuous 10/7, 2020 at 13:17

In a column-oriented (columnar) database, data is stored on disk column by column.

For example, take a Bonuses table:

   ID         Last    First   Bonus
   1          Doe     John    8000
   2          Smith   Jane    4000
   3          Beck    Sam     1000

In a row-oriented database management system, the data would be stored like this:
1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;

In a column-oriented database management system, the data would be stored like this:
1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;

Cassandra is basically a column-family store. Cassandra would store the above data as:

Bonuses: { row1: { "ID":1, "Last":"Doe", "First":"John", "Bonus":8000}, row2: { "ID":2, "Last":"Smith", "First":"Jane", "Bonus":4000} ... }
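To make the difference between the two layouts concrete, here is a minimal Python sketch that serializes the Bonuses table both ways. The function names `row_layout` and `column_layout` are made up for illustration, and the comma/semicolon format is just the toy notation used above, not any real on-disk format.

```python
# Toy illustration only: real databases use binary formats, not delimited text.
rows = [
    (1, "Doe", "John", 8000),
    (2, "Smith", "Jane", 4000),
    (3, "Beck", "Sam", 1000),
]


def row_layout(rows):
    """Row-oriented: all values of one row are written next to each other."""
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)


def column_layout(rows):
    """Column-oriented: all values of one column are written next to each other."""
    columns = zip(*rows)  # transpose rows into columns
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)


row_serialized = row_layout(rows)
column_serialized = column_layout(rows)
print(row_serialized)
print(column_serialized)
```

Reading a single column (say, every Bonus) touches one contiguous run in the column layout but every record in the row layout, which is why the distinction matters for analytical queries.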
Vertica, VectorWise, and MonetDB are some column-oriented databases that I've heard of.
Read this for more details.

Hope this helps.

Tepic answered 5/8, 2016 at 16:31 Comment(1)

Is column-family the same as wide-column? In the example, I suppose Bonuses would be a table in Cassandra with row1, row2 forming partition/primary keys. While ID, Last, etc. are columns, are they wide columns, i.e. could more and different columns be added within each of these columns? – Spastic 31/10, 2020 at 10:56

A good way of thinking about Cassandra is as a map of maps, where the inner maps are sorted by key. A partition has many columns, and they are always stored together. They are sorted by clustering keys: first by the first key, then by the next, and so on. Partitions are then replicated across replicas. Data is not necessarily stored as "rows", since different rows are stored on different nodes depending on the replication strategy and the active hashing algorithm. In other words, the partition for ProductId 1 is likely not stored next to the one for ProductId 2 if ProductId is the partition key. However, the columns for ProductId 1 are always stored together.
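The "map of maps" picture can be sketched in a few lines of Python. This is only a mental model, not how Cassandra is implemented: the class `PartitionedStore`, its methods, and the sample keys and columns are all hypothetical, and distribution across nodes and replication are left out entirely.

```python
import bisect
from collections import defaultdict


class PartitionedStore:
    """Toy model: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        # Outer map: partition key -> list of (clustering_key, columns) pairs.
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, columns):
        rows = self._partitions[partition_key]
        keys = [key for key, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            # Same clustering key: merge the columns (a simplified upsert).
            rows[i] = (clustering_key, {**rows[i][1], **columns})
        else:
            rows.insert(i, (clustering_key, columns))

    def read_partition(self, partition_key):
        # Everything for one partition comes back together, already sorted.
        return list(self._partitions.get(partition_key, []))


store = PartitionedStore()
# Clustering keys are tuples, so they sort by the first element, then the next.
store.insert(1, ("2014-08-23", "b"), {"price": 12})
store.insert(1, ("2014-08-22", "z"), {"price": 10})
store.insert(1, ("2014-08-22", "a"), {"price": 11})
store.insert(2, ("2014-08-01", "a"), {"price": 99})

partition_one_keys = [key for key, _ in store.read_partition(1)]
print(partition_one_keys)
```

In this model, the rows of ProductId 1 live in one sorted list, while ProductId 2 is a separate entry in the outer map that, in a real cluster, could sit on a different node.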

As for definitions, most NoSQL stores are blurring the lines one way or the other. They usually span multiple categories. I'll leave it up to you to decide whether this qualifies as a columnar database or not :)

Lambart answered 22/8, 2014 at 8:57 Comment(0)

It is a wide-column database, also known as a column-family database. The definition from Wikipedia helps further:

Wide-column stores such as Bigtable and Apache Cassandra are not column stores in the original sense of the term, since their two-level structures do not use a columnar data layout. In genuine column stores, a columnar data layout is adopted such that each column is stored separately on disk. Wide-column stores do often support the notion of column families that are stored separately. However, each such column family typically contains multiple columns that are used together, similar to traditional relational database tables. Within a given column family, all data is stored in a row-by-row fashion, such that the columns for a given row are stored together, rather than each column being stored separately. Wide-column stores that support column families are also known as column family databases.

Reference: https://en.wikipedia.org/wiki/Wide-column_store

Dorolisa answered 2/1, 2021 at 6:4 Comment(0)

Short Answer:

Cassandra has a concept of column family, but it's NOT column-oriented.

Long Answer:

Quoting part of the best book I've ever read, Designing Data-Intensive Applications by Martin Kleppmann:

Cassandra and HBase have a concept of column families, which they inherited from Bigtable. However, it is very misleading to call them column-oriented: within each column family, they store all columns from a row together, along with a row key, and they do not use column compression. Thus, the Bigtable model is still mostly row-oriented.

Mariannmarianna answered 5/6, 2023 at 19:36 Comment(0)
