Speeding up HBase read response

I have a 4-node HBase v0.90.4-cdh3u3 cluster deployed on Amazon XLarge instances (16 GB RAM, 4 CPU cores) with an 8 GB heap (-Xmx) allocated to the HRegionServers and 2 GB to the DataNodes. HMaster/ZooKeeper/NameNode run on a separate XLarge instance. The target dataset is 100 million records (each record has 10 fields of 100 bytes each). Benchmarking is performed concurrently from 100 parallel threads.

I'm confused by the read latency I'm getting, compared to what the YCSB team achieved and showed in their YCSB paper. They reached a throughput of up to 7,000 ops/sec with a latency of 15 ms (page 10, read latency chart). I can't get throughput higher than 2,000 ops/sec on a 90% read / 10% write workload. Writes are really fast with auto-commit disabled (responses within a few milliseconds), while read latency doesn't drop below 70 ms on average.

These are some HBase settings I used:

  • hbase.regionserver.handler.count=50
  • hfile.block.cache.size=0.4
  • hbase.hregion.max.filesize=1073741824
  • hbase.regionserver.codecs=lzo
  • hbase.hregion.memstore.mslab.enabled=true
  • hfile.min.blocksize.size=16384
  • hbase.hregion.memstore.block.multiplier=4
  • hbase.regionserver.global.memstore.upperLimit=0.35
  • hbase.zookeeper.property.maxClientCnxns=100

Which settings do you recommend looking at or tuning to speed up reads with HBase?

Marketplace answered 6/4, 2012 at 7:33 Comment(0)

Upgrading to a newer stable version will help. Anything 0.92+ includes the newer HFile v2 format, which can make a real difference.

  • 0.94 has been released and has had a few point releases.
  • If you prefer a CDH build, CDH 4.1 ships a 0.92.1-based HBase.

Creating the table pre-split, with bloom filters enabled, can really help. I would also try lowering the number of handlers a little: http://archive.cloudera.com/cdh4/cdh/4/hbase/book.html#perf.handlers
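For what it's worth, here is a minimal sketch of what pre-splitting with a row bloom filter could look like with the 0.92/0.94-era Java admin API; the table name (usertable), family name, key range, and region count are placeholders I picked to match a YCSB-style key space, so adapt them to your own rows:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePreSplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Assumed table/family names -- substitute your own.
        HTableDescriptor table = new HTableDescriptor("usertable");
        HColumnDescriptor family = new HColumnDescriptor("f");

        // Row-level bloom filter: lets a get skip HFiles that cannot
        // contain the requested row key.
        family.setBloomFilterType(StoreFile.BloomType.ROW);
        table.addFamily(family);

        // Pre-split into 40 regions across the expected key range so all
        // region servers take read/write load from the start, instead of
        // everything landing in one initial region.
        byte[] startKey = Bytes.toBytes("user0000000000");
        byte[] endKey   = Bytes.toBytes("user9999999999");
        admin.createTable(table, startKey, endKey, 40);
    }
}
```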

A read latency of 70 ms is really far off from what I would expect. Look into GC tuning, and make sure that all of your RegionServers are running and have regions for the table you are trying to benchmark.

Luxuriant answered 19/11, 2012 at 21:19 Comment(0)

This is not a direct answer, but I suggest setting up Ganglia to monitor the performance of HBase. You can follow the instructions here and here.

Once you have the metrics, you might be able to identify the system's bottleneck and do some tuning against it.

Sissie answered 25/2, 2013 at 16:23 Comment(0)

It is very hard to benchmark HBase correctly. You should also give some information about the queries you are using.

For example, in HBase, a scan query with a RowFilter and a QualifierPrefixFilter may be very slow even if you are retrieving only one row (the one specified in the RowFilter).

However, the same query, done with a Get instead of a Scan, plus the QualifierPrefixFilter, is much faster.
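As a rough illustration (assuming the old HTable client API and ColumnPrefixFilter, which I take to be the qualifier-prefix filter meant above; the table name and keys are made up), the difference looks like this:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class GetVsScan {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "usertable");   // assumed table name
        byte[] rowKey = Bytes.toBytes("user0000000042");
        byte[] prefix = Bytes.toBytes("field1");

        // Slow variant: a full table Scan filtered down to a single row.
        // The filter runs server-side, but every region still gets scanned.
        Scan scan = new Scan();
        scan.setFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
                new BinaryComparator(rowKey)));
        ResultScanner scanner = table.getScanner(scan);
        for (Result r : scanner) {
            System.out.println("scan result: " + r);
        }
        scanner.close();

        // Fast variant: a point Get on the row key, narrowed with a
        // column-prefix filter; only the region holding that row is touched.
        Get get = new Get(rowKey);
        get.setFilter(new ColumnPrefixFilter(prefix));
        Result result = table.get(get);
        System.out.println("get result: " + result);

        table.close();
    }
}
```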

Frodeen answered 9/6, 2015 at 5:47 Comment(0)
