I have a 4-node HBase v0.90.4-cdh3u3 cluster deployed on Amazon XLarge instances (16 GB RAM, 4 CPU cores) with an 8 GB heap (-Xmx) allocated to the HRegionServers and 2 GB to the DataNodes. The HMaster/ZooKeeper/NameNode run on a separate XLarge instance. The target dataset is 100 million records (each record is 10 fields of 100 bytes each). Benchmarking is performed concurrently from 100 parallel threads.
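(For scale: 100M records × 10 fields × 100 B ≈ 100 GB of raw data, while the aggregate block cache, given the hfile.block.cache.size=0.4 setting below, comes to roughly 4 × 0.4 × 8 GB ≈ 12.8 GB, so the working set is far larger than what can be cached.)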
I'm confused by the read latency I'm getting compared to what the YCSB team achieved and reported in their YCSB paper: a throughput of up to 7000 ops/sec at a latency of 15 ms (page 10, read-latency chart). I can't get throughput higher than 2000 ops/sec on a 90% read / 10% write workload. Writes are really fast with auto-flush disabled (they complete within a few ms), while read latency doesn't drop below 70 ms on average.
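To illustrate what I mean by buffered writes: this is a minimal sketch, not my exact benchmark code, assuming the 0.90 client API, YCSB's default table name `usertable`, and a placeholder column family:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWrites {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "usertable"); // YCSB's default table name

        // With auto-flush off, put() only appends to a client-side buffer;
        // the buffer is shipped to the region servers once it fills up,
        // which is why individual writes appear to finish within a few ms.
        table.setAutoFlush(false);
        table.setWriteBufferSize(2 * 1024 * 1024); // 2 MB; an example value

        Put put = new Put(Bytes.toBytes("user1000"));      // placeholder row key
        put.add(Bytes.toBytes("family"),                   // placeholder column family
                Bytes.toBytes("field0"),
                Bytes.toBytes("a 100-byte field value"));
        table.put(put); // returns immediately; data may still sit in the buffer

        table.flushCommits(); // force out anything still buffered
        table.close();
    }
}
```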
These are some of the HBase settings I used (a sketch of the client-side read path follows the list):
- hbase.regionserver.handler.count=50
- hfile.block.cache.size=0.4
- hbase.hregion.max.filesize=1073741824
- hbase.regionserver.codecs=lzo
- hbase.hregion.memstore.mslab.enabled=true
- hfile.min.blocksize.size=16384
- hbase.hregion.memstore.block.multiplier=4
- hbase.regionserver.global.memstore.upperLimit=0.35
- hbase.zookeeper.property.maxClientCnxns=100
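With those settings in place, the reads themselves are plain Gets and short Scans. A minimal sketch of the client-side read path and the client-side knobs I know of (0.90 API; `usertable` is YCSB's default table name, the row keys and caching values are placeholders):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadPath {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Client-side counterpart to the server settings above: fetch rows
        // in batches of 100 per RPC when scanning, instead of the default 1.
        conf.setInt("hbase.client.scanner.caching", 100);

        HTable table = new HTable(conf, "usertable"); // YCSB's default table name

        // Point read: a Get is always a synchronous round trip and has to
        // touch disk whenever the block is not in the region server's cache.
        Result row = table.get(new Get(Bytes.toBytes("user1000")));

        // Range read: caching can also be overridden per scan.
        Scan scan = new Scan(Bytes.toBytes("user1000"), Bytes.toBytes("user2000"));
        scan.setCaching(100);
        scan.setCacheBlocks(true); // keep scanned blocks in the block cache
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process r
            }
        } finally {
            scanner.close();
        }
        table.close();
    }
}
```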
Which settings do you recommend looking at/tuning to speed up reads with HBase?