First of all, I am relatively new to Big Data and the Hadoop world, and I have just started experimenting with the Hortonworks Sandbox (Pig and Hive so far).
I was wondering in which cases I should use each of the above-mentioned tools: Hadoop, Hive, Pig, HBase and Cassandra.
In my sandbox environment, with a file of just 9 MB, Hive and Pig had response times of seconds to minutes. This is obviously not usable in some situations, for example web applications (unless the cause is something else, such as my virtual machine setup).
My guesses about the correct usage are:
- Hadoop: Just the technological base for the rest, with only very few use cases where it would be used directly
- Hive or Pig: For analytical processes that run once per hour or day
- HBase or Cassandra: For real-time applications (e.g. web applications) where response times of 100 ms or less are required
Additionally, when should I use HBase as opposed to Cassandra?
Thanks!