I'm new to HBase. Currently I'm using the Hortonworks Sandbox (HDP 2). While studying HBase, I came across some questions.
Where does HBase store its data?
If it stores data on HDFS, how does it perform update operations, given that HDFS is write-once, read-many?
By default, HBase stores its data in HDFS. It is also possible to run HBase over other file systems, such as Amazon S3. Files in HDFS cannot be edited in place, but HDFS does support appending data to them.
HBase uses HFile as the format for storing tables on HDFS. An HFile stores key-value pairs sorted lexicographically by key, starting with the row key. It is a block-indexed file format: the data is stored in a sequence of blocks, and a separate index is maintained at the end of the file to locate the blocks. When a read request comes in, the index is searched for the block location, and the data is then read from that block.
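To make the block-index idea concrete, here is a minimal sketch in Python. It is a toy model, not the real HFile format: the names `build_blocks`, `lookup` and `BLOCK_SIZE` are made up for illustration, and blocks are sized by entry count here for simplicity.

```python
from bisect import bisect_right

BLOCK_SIZE = 2  # entries per block (hypothetical; chosen only to keep the example small)


def build_blocks(pairs, block_size=BLOCK_SIZE):
    """Sort key-value pairs and split them into blocks plus a block index."""
    ordered = sorted(pairs)  # lexicographic order by key
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    index = [block[0][0] for block in blocks]  # first key of every block
    return blocks, index


def lookup(blocks, index, key):
    """Search the index for the right block, then read only that block."""
    pos = bisect_right(index, key) - 1  # last block whose first key <= key
    if pos < 0:
        return None  # key sorts before the first block
    for k, v in blocks[pos]:
        if k == key:
            return v
    return None


blocks, index = build_blocks(
    [("row3", "c"), ("row1", "a"), ("row5", "e"), ("row2", "b"), ("row4", "d")]
)
print(index)                          # first key of each block
print(lookup(blocks, index, "row4"))  # only the second block is scanned
```

The point of the index is that a read touches one block instead of the whole file.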
The RegionServer keeps an in-memory copy of recent table updates in the MemStore (called memcache in early HBase versions). This in-memory copy is flushed to disk periodically. Updates to an HBase table are also recorded in HLog files (the write-ahead log), which store redo records. During region recovery, these logs are replayed on top of the last committed HFiles to reconstruct the in-memory image of the table. After reconstruction, the in-memory copy is flushed to disk so that the on-disk copy is up to date.
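The write path and recovery described above can be sketched roughly as follows. This is a simplified model under my own assumptions, not HBase code: `ToyRegion` and its methods are hypothetical names, the "files" are plain Python dicts, and the log is truncated right at flush time for brevity.

```python
class ToyRegion:
    """Toy model: a redo log, an in-memory store, and immutable flushed files."""

    def __init__(self):
        self.wal = []       # redo records not yet covered by a flushed file
        self.memstore = {}  # in-memory copy of recent updates
        self.hfiles = []    # flushed, never-modified snapshots (oldest first)

    def put(self, row, value):
        self.wal.append((row, value))  # log first, so the update survives a crash
        self.memstore[row] = value

    def flush(self):
        """Write the in-memory copy out as a new sorted file; old files stay untouched."""
        if self.memstore:
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}
            self.wal = []  # those edits are now persisted in a file

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):  # the newest file wins
            if row in hfile:
                return hfile[row]
        return None

    def crash(self):
        self.memstore = {}  # memory is lost; the log and the files are on disk

    def recover(self):
        for row, value in self.wal:  # replay redo records in order
            self.memstore[row] = value
        self.flush()  # bring the on-disk copy up to date


region = ToyRegion()
region.put("a", "1")
region.flush()
region.put("a", "2")
region.put("b", "3")
region.crash()
before = (region.get("a"), region.get("b"))  # only flushed data is visible
region.recover()
after = (region.get("a"), region.get("b"))   # logged updates are back
print(before, after)
```

Note that the "update" of row `a` never modifies the first file; it lands in a second file that takes precedence on reads, which is how updates coexist with write-once storage.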
HBase keeps versions of your updates: an update does not overwrite the existing value in place but saves a new copy, so earlier versions are preserved alongside the latest one. The number of versions kept is configurable per column family; the default was 3 in older releases and is 1 since HBase 0.96.
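As a rough illustration of versioned updates, the sketch below stores every put as a new timestamped copy and keeps only the newest few. `VersionedCells` is a hypothetical class, not part of any HBase client, and the limit of 3 is simply the figure mentioned above. It also trims old versions immediately, which is a simplification of how a real store would clean them up.

```python
class VersionedCells:
    """Toy cell store: each put adds a new (timestamp, value) copy."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = {}  # (row, column) -> [(timestamp, value), ...], newest first

    def put(self, row, column, value, timestamp):
        versions = self.cells.setdefault((row, column), [])
        versions.append((timestamp, value))                # never edit an old copy
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]                   # drop versions beyond the limit

    def get(self, row, column, versions=1):
        stored = self.cells.get((row, column), [])
        return [value for _, value in stored[:versions]]


store = VersionedCells(max_versions=3)
for ts, value in enumerate(["v1", "v2", "v3", "v4"], start=1):
    store.put("row1", "cf:name", value, timestamp=ts)

latest = store.get("row1", "cf:name")               # newest version only
history = store.get("row1", "cf:name", versions=5)  # at most max_versions come back
print(latest, history)
```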