HadoopExam Learning Resources


Example: Pre-split regions to avoid region hotspots

It is well known that a large HBase table should be created with pre-split regions in order to avoid region hotspots. If certain region servers get hammered by very intensive write or read operations, HBase may drop those region servers: their ZooKeeper sessions time out and a “YouAreDeadException” is triggered. A better practice is to estimate how big the table will be, create a fixed number of regions up front, and distribute those regions evenly across all the region servers you have. Of course, you also have to make sure your row keys are well distributed across all the regions.
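One way to keep row keys well distributed, sketched here in Python as an assumption rather than something this example prescribes, is to hash a natural key into a fixed-width hex string. The function names `hashed_row_key` and `region_index` and the `user<N>` keys are hypothetical, and the region count is only a parameter of the sketch.

```python
import hashlib
from collections import Counter


def hashed_row_key(natural_key: str) -> str:
    """Map a natural key to a 32-character hex row key (MD5 hex digest)."""
    return hashlib.md5(natural_key.encode("utf-8")).hexdigest()


def region_index(row_key: str, num_regions: int = 16) -> int:
    """Index of the equal-width slice of the hex key space that holds row_key."""
    key_space = 16 ** len(row_key)
    return int(row_key, 16) * num_regions // key_space


# Count how many of 16,000 made-up keys fall into each of 16 equal slices.
counts = Counter(
    region_index(hashed_row_key(f"user{i}")) for i in range(16000)
)

if __name__ == "__main__":
    for index in sorted(counts):
        print(index, counts[index])
```

Hashing spreads writes evenly, but it gives up meaningful range scans over the natural key, so it suits access patterns built on point lookups.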

Let’s look at an example. Say you have 16 region servers and your table size will be 1 TB. You can set the maximum file size of each region to 4 GB (hbase.hregion.max.filesize = 4294967296), which means the table will hold about 256 regions (1 TB / 4 GB), or roughly 16 regions on each of the 16 region servers. Here is an example, using the HBase shell, of creating 16 pre-split regions with row keys designed as 32-byte hex strings.

create 'users', 'usercf', {SPLITS =>
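The split points themselves are not shown in the command above. As a minimal sketch, assuming row keys are uniformly distributed 32-character hex strings, evenly spaced boundaries can be computed in Python and formatted into a `create` statement. The helper names `estimated_regions`, `hex_split_points`, and `create_statement` are hypothetical, and the generated boundaries are not necessarily the ones the original command used.

```python
def estimated_regions(table_bytes: int, max_region_bytes: int) -> int:
    """Number of regions needed if every region fills to its maximum size."""
    return -(-table_bytes // max_region_bytes)  # ceiling division


def hex_split_points(num_regions: int, key_width: int = 32) -> list:
    """Return num_regions - 1 evenly spaced split keys over the hex key space."""
    if num_regions < 2:
        return []
    key_space = 16 ** key_width
    return [
        format(i * key_space // num_regions, f"0{key_width}x")
        for i in range(1, num_regions)
    ]


def create_statement(table: str, family: str, splits: list) -> str:
    """Format an HBase shell create command that lists the split keys."""
    quoted = ", ".join(f"'{s}'" for s in splits)
    return f"create '{table}', '{family}', {{SPLITS => [{quoted}]}}"


if __name__ == "__main__":
    # 1 TB table with 4 GB regions, as in the sizing example.
    print(estimated_regions(1024 ** 4, 4294967296))
    # 16 regions need 15 split keys.
    print(create_statement("users", "usercf", hex_split_points(16)))
```

Sixteen regions need only 15 split keys, because the first region starts at the empty key and the last one ends at it.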


To see whether all the regions are balanced, log into one of your HBase servers (a region server or the master), switch to the hdfs user, and run the du command:

sudo su hdfs
hadoop fs -du hdfs://hdfs_address:8020/hdfs_directory/users/

This is an example of the “users” table balanced across 16 regions. The first column is the size of each region in bytes, and the second column is the region location.

238875729   hdfs://hadoop-m0/hbase/users/011beb7e35c58aa7735866dea41081ae

238216443   hdfs://hadoop-m0/hbase/users/078b146bfa3b717c05ad1615700862ce 

238181485   hdfs://hadoop-m0/hbase/users/210a2a5cc9ddbb8d92f7134e185b7a49 

237511986   hdfs://hadoop-m0/hbase/users/34f31a8fe15a83ee3b4b06f45231a728 

237598160   hdfs://hadoop-m0/hbase/users/4cf8a1b47d9d03e3e48cd2e3b04f8063 

238185290   hdfs://hadoop-m0/hbase/users/4f91d63b31c1c7750cc895a539c52ede 

237348231   hdfs://hadoop-m0/hbase/users/4f9878ad149f36937de0b473cb47c6d6 

238205770   hdfs://hadoop-m0/hbase/users/5db80e6cea495159e568bebc3a291062 

238184173   hdfs://hadoop-m0/hbase/users/634b4ed1fd9adc12a58326a4c185c113 

238826313   hdfs://hadoop-m0/hbase/users/69863fd689997d775bbdadc3014f6eca 

237634448   hdfs://hadoop-m0/hbase/users/6f53f22d53d8363b87cd394d3b413bfd 

239878227   hdfs://hadoop-m0/hbase/users/7ea4016b64f59fbf36a40101401a27ee 

237730386   hdfs://hadoop-m0/hbase/users/8ff37fca64b53cc07832d3473ce0ab4c 

238726731   hdfs://hadoop-m0/hbase/users/aa9f412c3e3439aeade3f881d4803cee 

238493854   hdfs://hadoop-m0/hbase/users/d00a056921dee34e34809123efd2035c 

237435654   hdfs://hadoop-m0/hbase/users/e6123060147325f591270b99b64df543
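To put a number on how balanced such a listing is, the output can be parsed and each region compared with the average size. The following Python sketch uses the first four lines of the listing above as sample input; the helper names `parse_du` and `max_relative_deviation` are hypothetical, and what counts as an acceptable deviation is left to the reader.

```python
from statistics import mean

# First four lines of the listing above, used as sample input.
DU_OUTPUT = """\
238875729   hdfs://hadoop-m0/hbase/users/011beb7e35c58aa7735866dea41081ae
238216443   hdfs://hadoop-m0/hbase/users/078b146bfa3b717c05ad1615700862ce
238181485   hdfs://hadoop-m0/hbase/users/210a2a5cc9ddbb8d92f7134e185b7a49
237511986   hdfs://hadoop-m0/hbase/users/34f31a8fe15a83ee3b4b06f45231a728
"""


def parse_du(text: str) -> dict:
    """Map each region directory name to its size in bytes."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Size is the first field and the path is the last one, so any
        # extra columns in between are ignored.
        region = fields[-1].rstrip("/").rsplit("/", 1)[-1]
        sizes[region] = int(fields[0])
    return sizes


def max_relative_deviation(sizes: dict) -> float:
    """Largest distance of any region size from the mean, as a fraction of it."""
    average = mean(sizes.values())
    return max(abs(size - average) for size in sizes.values()) / average


if __name__ == "__main__":
    regions = parse_du(DU_OUTPUT)
    print(len(regions), "regions")
    print(f"max deviation from mean: {max_relative_deviation(regions):.2%}")
```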

One thing to keep in mind: if you are writing intensively to HBase, you might want to increase hbase.regionserver.maxlogs. The downside is that if the cluster goes down or needs to be restarted for some reason, it will take longer to bring it back up, since there can be more write-ahead log files to replay. Adjusting hbase.regionserver.global.memstore.upperLimit, hbase.hregion.max.filesize, and hbase.hregion.memstore.flush.size can also be helpful.
