I recently started trying out some noSQL prototypes for a customer. They got a real-time application which does lots of inserts, but less reads(Currently they are using MySql and would like to try out some noSQL solutions)
Over the weekend I tried Cassandra 2.0, MongoDB 2.4.9 and Redis to be compared to a normal Mysql 5.5 DB. All are running in my Windows i3 core 2.30 Ghz/8GB RAM laptop, so no high-end fancy machines.
The table structure is a simple one as below. Though it's the MySql DESC, Cassandra has the same structure, and in MongoDb it's stored as a JSON/BSON but got the same structure and indexes. It has got two indexes (oneway_id & twoway_id) for all the three db's.
Structure (For all four db's)
+--------------+---------------------+
| Field | Type |
+--------------+---------------------+
| tmstamp | bigint(20) unsigned |
| field_1 | bigint(20) unsigned |
| field_2 | varchar(64) |
| field_3 | varchar(64) |
| field_4 | tinyint(3) unsigned |
| field_5 | bigint(20) unsigned |
| field_6 | varchar(25) |
| field_7 | varchar(15) |
| field_8 | varchar(15) |
| field_9 | varchar(15) |
+--------------+---------------------+
DB/Environment details
- MySql 5.6(64 bit) with mysql java connector 5.1.28
- Apache Cassandra 2.0 with datastax 2.0 Java drivers
- MongoDB 2.4.6 with mongo Java driver 2.12.0
- Redis 2.8.17 running on a linux machine
- Oracle Java 1.6(64 bit)
- Microsoft Windows 7(64 bit)
- Intel i3 core 2.30 Ghz processor
- 8GB RAM
Created a simple java test cases and these are the results I got (Though not consistently the same numbers but latencies are pretty much the same way):
100,000 Records
- MySql 1000,000 - 46 secs
- Cassandra - 54 secs
- MongoDb - 2 secs
500,000 Records
- MySql 1000,000 - 142 secs
- Cassandra - 299 secs
- MongoDb - 41 secs
1,000,000 Records
- MySql 1000,000 - 349 secs
- Cassandra - 699 secs
- MongoDb - 51 secs
- Redis - 34 secs
My question is why does Cassandra takes this long for such a small and simple table inserts?
In Cassandra I tried both inline looped sql inserts & Batch inserts. The funny thing is batch inserts took more time. The document I followed for batch inserts is:
http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
I don't want to use asyncExecute, because it doesn't gives me the exact insert time.
Batch insert which I used is something like this(Which takes ages more than normal insert)
PreparedStatement ps = session.prepare("INSERT INTO some_table (val_1, val_2, val_3, val_4) VALUES (?, ?, ?, ?)");
BatchStatement batch = new BatchStatement();
//for loop start
batch.add(ps.bind(uid, mid1, title1, body1));
//for loop end
session.execute(batch);
inline loop I used insert is something like this
String sqlInsert = "INSERT INTO some_table (val_1, val_2, val_3, val_4) VALUES (";
// for loop start
sqlInsert += uid+", "+", "+mid1+", "+title1+", "+body1+")";
session.execute(sqlInsert);
// for loop end
Now why is Cassandara slower than mysql and more important - why is MongoDB much much faster than Cassandra? I seriously wish I am doing something wrong?
Is there a way I can insert JSON/BSON objects directly to Cassandra like MongoDB does? I guess that might make it fast? Can some experts please help me on this? If there are no answers I'll conclude that MongoDB is better than Cassandra!