What's the difference between creating a table and creating a columnfamily in Cassandra?

I need details from both the performance and the query aspects. I learned from some site that only a key can be given when using a column family. If so, what would you suggest for my keyspace? I need to use GROUP BY, ORDER BY, COUNT, SUM, IFNULL, CONCAT, joins, and sometimes nested queries.

Acetum answered 16/9, 2013 at 9:17 Comment(0)

To answer the original question you posed: a column family and a table are the same thing.

  • The name "column family" was used in the older Thrift API.
  • The name "table" is used in the newer CQL API.

More info on the APIs can be found here: http://wiki.apache.org/cassandra/API

If you need to use GROUP BY, ORDER BY, COUNT, SUM, IFNULL, CONCAT, joins and sometimes nested queries, as you state, then you probably don't want to use Cassandra, since it doesn't support most of those.

CQL supports COUNT, but only up to 10000. It supports ORDER BY, but only on clustering keys. The other things you mention are not supported at all.
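A minimal CQL sketch of those limits, assuming a hypothetical event_log table partitioned by day and clustered by timestamp (the schema and values are illustrative, not from the question):

```cql
-- Hypothetical schema: one partition per day, rows ordered by ts inside it.
CREATE TABLE event_log (
    day     text,
    ts      timestamp,
    message text,
    PRIMARY KEY (day, ts)
) WITH CLUSTERING ORDER BY (ts ASC);

-- Allowed: ORDER BY on the clustering column, with the partition key restricted.
SELECT ts, message FROM event_log
 WHERE day = '2013-09-16'
 ORDER BY ts DESC;

-- Rejected: ORDER BY on a regular (non-clustering) column.
-- SELECT * FROM event_log WHERE day = '2013-09-16' ORDER BY message;

-- In the CQL of this answer's era, COUNT stopped at the query's row limit
-- (10,000 by default); an explicit LIMIT raised that cap.
SELECT COUNT(*) FROM event_log WHERE day = '2013-09-16' LIMIT 50000;
```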

Micra answered 16/9, 2013 at 17:31 Comment(5)
It's not strictly true that count is supported only up to 10,000. It works up to the query limit (which is 10,000 by default, but can be explicitly defined). That being said, you probably shouldn't use it for performance reasons. – Corinthians
Hi, I referred to this link maxgrinev.com/2010/07/12/…, but GROUP BY gives me an error in cqlsh: select count(*) from event_log group by date; I also learned that inserting data in Cassandra is much faster than in MySQL. Is that so? – Acetum
That is because GROUP BY is not valid CQL. You cannot just run random SQL statements and expect them to work. – Corinthians
@Corinthians after a long time spent understanding the Cassandra data model, we have decided to use Elasticsearch (Lucene) as a secondary storage layer for all our aggregate functions, GROUP BY and ORDER BY. Nested queries are still not well supported in ES, but that is acceptable for our production use. – Acetum
Broken links: Thrift API, CQL API, wiki.apache.org/cassandra/API. Some possible new ones: CQL Syntax, Drivers API – Spires

Refer to the document: https://cassandra.apache.org/doc/old/CQL-3.0.html

It specifies that the CQL language reference manual (LRM) accepts the TABLE keyword wherever COLUMNFAMILY is accepted.

This confirms that TABLE and COLUMNFAMILY are synonyms.
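A short sketch of that aliasing in practice. The users tables are hypothetical, and this assumes a Cassandra version whose CQL still accepts the legacy COLUMNFAMILY keyword:

```cql
-- Both statements create the same kind of object (a table).
CREATE TABLE users (id uuid PRIMARY KEY, name text);
CREATE COLUMNFAMILY users_legacy (id uuid PRIMARY KEY, name text);

-- The keywords are interchangeable in ALTER and DROP as well.
ALTER COLUMNFAMILY users ADD email text;
ALTER TABLE users_legacy ADD email text;

DROP COLUMNFAMILY users;
DROP TABLE users_legacy;
```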

Dorothadorothea answered 16/2, 2017 at 21:21 Comment(0)

In Cassandra there is no difference between a table and a column family; they are the same concept.

Asquint answered 6/3, 2018 at 8:21 Comment(0)

For Cassandra 3+ and cqlsh 5.0.1

To verify, enter the following at a cqlsh prompt while using a keyspace (here ksp):

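CREATE COLUMNFAMILY myTable (
    id text PRIMARY KEY,
    name int
);

Then type 'desc myTable'. Because unquoted identifiers are case-insensitive, the table is stored as mytable, and you'll see (default table options omitted):

CREATE TABLE ksp.mytable (
    id text PRIMARY KEY,
    name int
);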

They are synonyms, and Cassandra uses table by default.

Crystallography answered 28/2, 2020 at 16:3 Comment(0)

Here is a small example to illustrate the concept. A keyspace is an object that holds column families (tables) and user-defined types.

Create keyspace University with replication={'class': 'SimpleStrategy', 'replication_factor': 3};

create table University.student(roll int Primary KEY, dept text, name text, semester int);

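With 'create table', the table 'student' is created in the keyspace 'University' with columns roll, dept, name and semester. roll is the primary key, and because it is the only primary key column it is also the partition key. All the data for a given roll number is stored in a single partition, so each student's row is its own partition.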
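To show what "key-based" access means for this table, here is a minimal sketch of writing and reading a row. The student values are hypothetical:

```cql
-- Insert one hypothetical student; roll = 1 becomes its own partition.
INSERT INTO University.student (roll, dept, name, semester)
VALUES (1, 'CSE', 'Asha', 3);

-- Efficient: the partition key tells Cassandra exactly which partition to read.
SELECT name, dept FROM University.student WHERE roll = 1;

-- Filtering on a non-key column is rejected unless ALLOW FILTERING is added
-- (or a secondary index exists), because it has to scan across partitions.
SELECT name FROM University.student WHERE dept = 'CSE' ALLOW FILTERING;
```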

Key aspects of altering a keyspace in Cassandra

Keyspace Name: The keyspace name cannot be altered in Cassandra.

Strategy Name: The strategy name can be altered by specifying a new strategy name.

Replication Factor: The replication factor can be altered by specifying a new replication factor.

DURABLE_WRITES: The DURABLE_WRITES value can be altered by setting it to true or false. By default, it is true. If set to false, updates to the keyspace are not written to the commit log; if true, they are.

Execution: the "Alter Keyspace" command can change the keyspace strategy from 'SimpleStrategy' to 'NetworkTopologyStrategy' and the replication factor from 3 to 1 for DataCenter1.
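A minimal sketch of that command, assuming the University keyspace from above and a data center named DataCenter1. Data center names are case-sensitive and must match what your cluster reports, so treat that name as an assumption:

```cql
-- Switch University to NetworkTopologyStrategy with one replica in DataCenter1.
-- 'DataCenter1' is assumed; use the data center name shown by `nodetool status`.
ALTER KEYSPACE University
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DataCenter1': 1}
  AND durable_writes = true;
```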

Stillas answered 15/2, 2021 at 9:38 Comment(0)

Column families are somewhat similar to a relational database's tables, with differences in how data is distributed and perhaps a more idealistic character.

Imagine you have a user entity that might contain 15 columns. In a relational DB you might divide those columns into smaller groups of related columns, which we all know as tables. In a distributed DB such as Cassandra you can combine all of those tables' entries into a single long row, so in a profiler or DB manager you'll see a single table with 15 columns instead of 2 or 3 tables. Another interesting thing is that the rows of every column family are spread across different nodes and identified by the row key, meaning you'll have a single key for all the columns and won't need to maintain a PK or FK for every table, or the 1-1, 1-n and n-n relationships between them. Easy!
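To make the single-row idea concrete, here is a hypothetical user_profile table in CQL. The table and column names are illustrative only, and the 15 columns are abbreviated to a few:

```cql
-- One denormalized table keyed by user_id, instead of separate
-- users / addresses / preferences tables linked by foreign keys.
CREATE TABLE user_profile (
    user_id  uuid PRIMARY KEY,
    name     text,
    email    text,
    street   text,
    city     text,
    country  text,
    language text,
    timezone text
);

-- A single read by key returns what would otherwise require joins.
SELECT * FROM user_profile WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
```

The trade-off is that the table is shaped around this one lookup; a different access pattern (for example, finding users by email) would typically need its own table.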

Tumor answered 16/2, 2015 at 9:6 Comment(0)
