HDInsight: HBase or Azure Table Storage?

My team is building a solution on HDInsight. We will receive 5 TB of data daily and will need to run map/reduce jobs on it. Would there be any performance or cost difference if we stored the data in Azure Table Storage instead of HBase on HDInsight?

Bigler answered 28/10, 2014 at 12:15 Comment(0)

The main differences will be in both functionality and cost.

Azure Table Storage doesn't come with a MapReduce engine of its own, though of course you could write your own processing using the map/reduce approach.
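As a rough illustration of what "writing your own" could look like, here is a minimal Python sketch of the map/reduce pattern applied to entities grouped by partition key. The entities are plain dictionaries standing in for rows you would read from Table Storage; the `bytes` field and all helper names are made up for the example, and no Azure SDK call is shown.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical rows, standing in for entities read from Table Storage.
entities = [
    {"PartitionKey": "2014-10-27", "RowKey": "a", "bytes": 120},
    {"PartitionKey": "2014-10-27", "RowKey": "b", "bytes": 80},
    {"PartitionKey": "2014-10-28", "RowKey": "c", "bytes": 300},
]

def map_phase(entity):
    """Emit one (key, value) pair for one entity."""
    return entity["PartitionKey"], entity["bytes"]

def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(values):
    """Combine all values for one key into a single result (here, a sum)."""
    return reduce(lambda a, b: a + b, values, 0)

def run_job(rows):
    """Run map, shuffle and reduce in sequence over the given rows."""
    grouped = shuffle(map_phase(row) for row in rows)
    return {key: reduce_phase(values) for key, values in grouped.items()}

totals = run_job(entities)
print(totals)
```

In a real job each partition could be handed to a separate worker, which is why the partition scheme matters so much for this approach.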

You can use Azure HDInsight to run MapReduce against Table Storage. There are a couple of connectors around: one written by me, which is Hive-focused, requires some configuration, and may not suit your partition scheme (http://www.simonellistonball.com/technology/hadoop-hive-inputformat-azure-tables/), and a less performance-focused but more complete version from someone at Microsoft (http://blogs.msdn.com/b/mostlytrue/archive/2014/04/04/analyzing-azure-table-storage-data-with-hdinsight.aspx).

The main advantage of Table Storage is that you aren't constantly paying for processing: you only pay for compute while a cluster is actually running.

If you use HBase, you will need to run a full cluster all the time, so there is a cost disadvantage. However, you will get some functionality and performance gains, plus something a bit more portable should you wish to move to other Hadoop platforms. You would also have access to a much greater range of analytic functionality with the HBase option.
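To make the compute side of that trade-off concrete, here is a back-of-the-envelope sketch in Python. Every number in it (node count, price per node-hour, job duration) is a placeholder made up for the example, not an Azure list price; substitute the figures from your own subscription.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)

def always_on_cost(nodes, price_per_node_hour):
    """Monthly compute cost of a cluster that runs all the time (the HBase case)."""
    return nodes * price_per_node_hour * HOURS_PER_MONTH

def on_demand_cost(nodes, price_per_node_hour, job_hours_per_day, days=30):
    """Monthly compute cost of a cluster created per job and deleted afterwards."""
    return nodes * price_per_node_hour * job_hours_per_day * days

# Placeholder inputs, NOT real prices.
nodes = 8
price = 0.5      # hypothetical currency units per node-hour
job_hours = 4    # hypothetical daily job duration in hours

hbase = always_on_cost(nodes, price)
batch = on_demand_cost(nodes, price, job_hours)
print(f"always-on: {hbase:.2f}, on-demand: {batch:.2f}")
```

Storage charges are left out of the sketch and would need to be added on both sides before comparing totals.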

Tommi answered 28/10, 2014 at 22:56 Comment(0)

HDInsight (HBase/Hadoop) uses Azure Blob storage, not Azure Table Storage (ATS). For data storage you will be charged only the applicable Blob storage cost, based on your subscription.

P.S. Don't forget to delete your cluster once the job has completed, to avoid charges. Your data will persist in Blob storage and can be used by the next cluster you build.

Fovea answered 28/10, 2014 at 12:54 Comment(1)
Blob Storage is the main storage mechanism, but it is certainly possible to write a StorageHandler to allow HDInsight clusters to process Table Storage data. - Tommi
