Where does HBase store data?

1

7

I'm new to HBase. Currently I'm using the Hortonworks Sandbox (HDP 2). While studying HBase, I came across some questions.

  1. Where does HBase store its data?

  2. If it stores data on HDFS, how does it perform update operations, given that HDFS is write-once, read-many?

Pacifier answered 24/8, 2015 at 6:20 Comment(0)
20

By default, HBase stores its data in HDFS. It is also possible to run HBase over other distributed file systems, such as Amazon S3 or GFS. Files in HDFS can't be edited in place, but data can be appended to them: HDFS supports an append feature.

HBase uses HFile as the format for storing tables on HDFS. An HFile stores its entries in lexicographic order of their row keys. It is a block-indexed file format for key-value pairs: the data is stored as a sequence of blocks, and a separate index is kept at the end of the file to locate those blocks. When a read request comes in, the index is searched for the block's location, and then the data is read from that block.
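To make the block-index idea concrete, here is a minimal sketch in Python. It is not the real HFile layout; `build_blocks`, `lookup`, and the tiny block size are hypothetical names and values chosen only for illustration. It shows sorted key-value pairs split into blocks, plus an index of each block's first key that is searched before a single block is read.

```python
import bisect

def build_blocks(cells, block_size=2):
    """Sort cells by row key and split them into blocks plus a block index."""
    ordered = sorted(cells)  # lexicographic order by row key
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    index = [block[0][0] for block in blocks]  # first key of every block
    return blocks, index

def lookup(blocks, index, key):
    """Search the index for the one candidate block, then scan only that block."""
    pos = bisect.bisect_right(index, key) - 1
    if pos < 0:
        return None  # key sorts before the first block
    for k, v in blocks[pos]:
        if k == key:
            return v
    return None

cells = [("row3", "c"), ("row1", "a"), ("row5", "e"), ("row2", "b"), ("row4", "d")]
blocks, index = build_blocks(cells)
value = lookup(blocks, index, "row4")
```

Because the keys are sorted, the index only needs one entry per block, so a read touches the index and one block instead of the whole file.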

Each RegionServer keeps an in-memory copy of recent table updates in the MemStore (called memcache in early HBase versions). This in-memory copy is flushed to disk periodically. Updates to an HBase table are also recorded in HLog files, which store redo records. During region recovery, these logs are applied on top of the last committed HFile to reconstruct the in-memory image of the table. After reconstruction, the in-memory copy is flushed to disk so that the on-disk copy is up to date.
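The write path described above can be sketched roughly as follows. This is a toy model in plain Python, not HBase code: `ToyRegion` and its methods are made-up names, and lists stand in for files on HDFS. The point is that every write is an append to the log or a brand-new file, and nothing already written is ever edited.

```python
class ToyRegion:
    def __init__(self, wal=None, hfiles=None):
        self.wal = wal if wal is not None else []           # redo records (stands in for the HLog)
        self.hfiles = hfiles if hfiles is not None else []  # immutable, sorted files
        self.memstore = {}                                  # in-memory copy of recent updates

    def put(self, row, value):
        self.wal.append((row, value))  # log the redo record first (append-only)
        self.memstore[row] = value     # then update the in-memory copy

    def flush(self):
        if self.memstore:
            # Write a new file; older files are left untouched.
            self.hfiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}
            self.wal.clear()  # flushed edits no longer need to be replayed

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):  # newest file wins
            for k, v in hfile:
                if k == row:
                    return v
        return None

    @classmethod
    def recover(cls, wal, hfiles):
        """Rebuild the in-memory image by replaying the log over the existing files."""
        region = cls(wal=[], hfiles=list(hfiles))
        for row, value in wal:
            region.put(row, value)
        return region

region = ToyRegion()
region.put("row1", "v1")
region.flush()
region.put("row1", "v2")  # an update that has not been flushed yet

# Simulated crash: the memstore is lost, but the log and the files survive.
recovered = ToyRegion.recover(region.wal, region.hfiles)
before_flush = recovered.get("row1")
recovered.flush()
```

After recovery and the final flush there are two files: the old one still holds `v1` unchanged, and the newer one holds `v2`, which is what a read returns.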

HBase keeps versions of your updates. When you perform an update, a new copy of the cell is saved rather than the old one being modified, so the earlier versions are preserved along with the latest one. The number of preserved versions is configurable per column family; the default was 3 in older releases and has been 1 since HBase 0.96.
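Here is a small sketch of how versioned updates can work on top of append-only storage. Again, this is an illustration in Python under simplified assumptions, not the HBase API: `put`, `get`, and `compact` are hypothetical helpers, and a list of `(row, column, timestamp, value)` tuples stands in for the stored cells.

```python
def put(store, row, column, timestamp, value):
    """An update appends a new cell; existing cells are never modified."""
    store.append((row, column, timestamp, value))

def get(store, row, column, versions=1):
    """Return up to `versions` values, newest timestamp first."""
    cells = [c for c in store if c[0] == row and c[1] == column]
    cells.sort(key=lambda c: c[2], reverse=True)
    return [c[3] for c in cells[:versions]]

def compact(store, max_versions):
    """Write a new store that keeps only the newest `max_versions` cells per column."""
    kept = []
    seen = {}
    for cell in sorted(store, key=lambda c: c[2], reverse=True):
        key = (cell[0], cell[1])
        seen[key] = seen.get(key, 0) + 1
        if seen[key] <= max_versions:
            kept.append(cell)
    return kept

store = []
for ts, value in enumerate(["a", "b", "c", "d"], start=1):
    put(store, "row1", "cf:q", ts, value)

latest = get(store, "row1", "cf:q")
compacted = compact(store, max_versions=3)
remaining = get(compacted, "row1", "cf:q", versions=10)
```

In this model, going past the version limit does not overwrite anything in place. The oldest version (`"a"`) is simply left out when the data is rewritten into a new file, which is broadly what HBase does during compaction.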

Shool answered 24/8, 2015 at 7:13 Comment(1)
That's great, thanks. You mean it creates a new copy of the data, which we call a version, right? But when it exceeds the version limit, it overrides the latest version's value. So how can we say that it is appending values? Pacifier
