I used OpenTSDB over HBase (pseudo-distributed Hadoop on virtual box) to send data at very high load (~ 50,000 records / s). The system worked properly for a while but it went down suddenly. I terminated OpenTSDB and HBase. Unfortunately, I could never bring them up again. Every time I tried to run HBase and OpenTSDB, they showed error logs. Here I list the logs:
regionserver:
2015-07-01 18:15:30,752 INFO [sync.3] wal.FSHLog: Slow sync cost: 112 ms, current pipeline: [192.168.56.101:50010]
2015-07-01 18:15:41,277 INFO [regionserver/node1.vmcluster/192.168.56.101:16201.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/node1.vmcluster,16201,1435738612093/node1.vmcluster%2C16201%2C1435738612093.default.1435742101122 with entries=3841, filesize=123.61 MB; new WAL /hbase/WALs/node1.vmcluster,16201,1435738612093/node1.vmcluster%2C16201%2C1435738612093.default.1435742141109
2015-07-01 18:15:41,278 INFO [regionserver/node1.vmcluster/192.168.56.101:16201.logRoller] wal.FSHLog: Archiving hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16201,1435738612093/node1.vmcluster%2C16201%2C1435738612093.default.1435742061805 to hdfs://node1.vmcluster:9000/hbase/oldWALs/node1.vmcluster%2C16201%2C1435738612093.default.1435742061805
2015-07-01 18:15:42,249 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72., current region memstore size 132.20 MB
2015-07-01 18:15:42,381 INFO [MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72., current region memstore size 133.09 MB
2015-07-01 18:15:42,382 WARN [MemStoreFlusher.1] regionserver.DefaultMemStore: Snapshot called again without clearing previous. Doing nothing. Another ongoing flush or did we fail last attempt?
2015-07-01 18:15:42,391 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server node1.vmcluster,16201,1435738612093: Replay of WAL required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72.
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772)
at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NegativeArraySizeException
at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:494)
at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932)
at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121)
at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71)
at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879)
at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955)
... 7 more
2015-07-01 18:15:42,398 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
Master:
this is the same to one on regionserver:
2015-07-01 18:15:42,596 ERROR [B.defaultRpcServer.handler=15,queue=0,port=16020] master.MasterRpcServices: Region server node1.vmcluster,16201,1435738612093 reported a fatal error:
ABORTING region server node1.vmcluster,16201,1435738612093: Replay of WAL required. Forcing server shutdown
Cause:
org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72.
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772)
at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NegativeArraySizeException
at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:494)
at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932)
at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121)
at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71)
at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879)
at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955)
... 7 more
and this:
2015-07-01 18:17:16,971 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-07-01 18:17:21,976 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-07-01 18:17:26,979 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-07-01 18:17:31,983 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-07-01 18:17:36,985 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-07-01 18:17:41,992 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-07-01 18:17:45,283 WARN [CatalogJanitor-node1:16020] master.CatalogJanitor: Failed scan of catalog table
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=351, exceptions:
Wed Jul 01 18:17:45 KST 2015, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68275: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node1.vmcluster,16201,1435738612093, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:264)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:215)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:288)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:267)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:139)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:134)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:823)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:169)
at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:121)
at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:222)
at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:103)
at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68275: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node1.vmcluster,16201,1435738612093, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:310)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:291)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupConnection(RpcClientImpl.java:403)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:709)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:880)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:849)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1173)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:31889)
at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:344)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:188)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
... 6 more
2015-07-01 18:17:46,995 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-07-01 18:17:52,002 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-07-01 18:17:57,004 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-07-01 18:18:02,006 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-07-01 18:18:07,011 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
After that, when I restarted hbase, regionserver showed this error:
2015-07-02 09:17:49,151 INFO [RS_OPEN_REGION-node1:16201-0] regionserver.HRegion: Replaying edits from hdfs://node1.vmcluster:9000/hbase/data/default/tsdb/1a692e2668a2b4a71aaf2805f9b00a72/recovered.edits/0000000000000169657
2015-07-02 09:17:49,343 INFO [RS_OPEN_REGION-node1:16201-0] regionserver.HRegion: Started memstore flush for tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72., current region memstore size 132.20 MB; wal is null, using passed sequenceid=169615
2015-07-02 09:17:49,428 ERROR [RS_OPEN_REGION-node1:16201-0] handler.OpenRegionHandler: Failed open of region=tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72., starting to roll back the global memstore size.
org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72.
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:3694)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3499)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:889)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:769)
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:742)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4921)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4887)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4858)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4814)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4765)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:356)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:126)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NegativeArraySizeException
at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:490)
at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932)
at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121)
at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71)
at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879)
at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955)
... 16 more
2015-07-02 09:17:49,430 INFO [RS_OPEN_REGION-node1:16201-0] coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 1a692e2668a2b4a71aaf2805f9b00a72, NAME => 'tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72.', STARTKEY => '', ENDKEY => '\x00\x15\x08M|kp\x00\x00\x01\x00\x00\x01'} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 1
2015-07-02 09:17:49,443 INFO [PriorityRpcServer.handler=9,queue=1,port=16201] regionserver.RSRpcServices: Open tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72.
2015-07-02 09:17:49,458 INFO [StoreOpener-1a692e2668a2b4a71aaf2805f9b00a72-1] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=3, currentSize=533360, freeSize=509808592, maxSize=510341952, heapSize=533360, minSize=484824864, minFactor=0.95, multiSize=242412432, multiFactor=0.5, singleSize=121206216, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
2015-07-02 09:17:49,458 INFO [StoreOpener-1a692e2668a2b4a71aaf2805f9b00a72-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.000000
2015-07-02 09:17:49,519 INFO [RS_OPEN_REGION-node1:16201-2] regionserver.HRegion: Replaying edits from hdfs://node1.vmcluster:9000/hbase/data/default/tsdb/1a692e2668a2b4a71aaf2805f9b00a72/recovered.edits/0000000000000169567
Update
Today I did the test at very light traffic: 1000 records / sec for 2000 seconds, the problem came again.
export HBASE_HEAPSIZE=2000
. – Woosley