www.HadoopExam.com

HadoopExam Learning Resources

HBase HotSpot Detection and Resolution

The most common cause of hotspotting is inserting rows with monotonically increasing row keys.
In that case only the last region receives the writes, and no amount of splitting will fix that: only one region server hosts the last region of the table, regardless of how small that region is.
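
To see why splitting does not help here, consider a minimal sketch of how a row key maps to a region. The split points and the `row-NNNN` key format are made up for illustration, and `region_for` is a hypothetical helper, not an HBase API; it only mimics the idea that each region owns a contiguous, sorted key range.

```python
import bisect

# Hypothetical split points: region i holds keys in [SPLIT_POINTS[i-1], SPLIT_POINTS[i]).
# Region 0 starts at the empty key, the last region has no upper bound.
SPLIT_POINTS = [b"row-0250", b"row-0500", b"row-0750"]

def region_for(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(SPLIT_POINTS, row_key)

# Monotonically increasing keys, all sorting after the last split point.
new_keys = [b"row-%04d" % n for n in range(1000, 1010)]
regions_hit = {region_for(k) for k in new_keys}
print(regions_hit)
```

Every new key sorts after the last split point, so all of them map to the last region. Adding more split points only moves the boundary; the newest keys still sort at the end of the table.
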
There are ways around this. If you generate the keys yourself, make sure they are not monotonically increasing. For example, if you do not care about the sort order of the keys relative to each other, you could reverse the bytes before using them as the row key. Another option is to prefix the key with a hash of the key (but then you lose the ability to do range scans across keys).
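
Both key transformations can be sketched in a few lines. This is an illustration only: the function names, the choice of SHA-256, and the 4-byte prefix length are assumptions, not something the text prescribes.

```python
import hashlib

def reversed_key(key: bytes) -> bytes:
    """Reverse the bytes so the fastest-changing byte comes first."""
    return key[::-1]

def hash_prefixed_key(key: bytes, prefix_len: int = 4) -> bytes:
    """Prefix the key with the first bytes of its hash (assumed: SHA-256, 4 bytes)."""
    digest = hashlib.sha256(key).digest()
    return digest[:prefix_len] + key

# Two consecutive sequence numbers no longer share a leading byte once reversed.
print(reversed_key(b"1000"), reversed_key(b"1001"))

# The original key is still recoverable after the fixed-length hash prefix.
print(hash_prefixed_key(b"1000")[4:])
```

In both cases a point lookup still works, because the client can recompute the stored key from the original one. What is lost is the original sort order, which is why range scans across keys stop being meaningful.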

If you still need to scan rows according to their sort order, you can "salt" the key (as some call it) by prefixing it with one of a limited number of single-digit values (maybe 5-10 different numbers), chosen at random or computed as a mod of the key. Each logical scan then has to issue multiple scans in parallel, one for each of the possible prefix numbers.
(In fact, that is a pretty effective way to avoid hotspotting and to parallelize your scans, but it needs some client-side logic to reconcile the parallel scans.)
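
The salting scheme and the client-side reconciliation could look roughly like the sketch below. It does not talk to HBase: the table is simulated as a sorted in-memory list, and `NUM_BUCKETS`, the `|` separator, `salted_key`, `scan_bucket` and `salted_scan` are all hypothetical names chosen for the example. The salt is derived from a CRC32 of the key rather than picked at random, which is an assumption made here so that the same key always gets the same prefix.

```python
import heapq
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 8  # assumed number of salt values; must stay below 10 for one digit

def salted_key(key: bytes) -> bytes:
    """Prefix the key with a single-digit bucket number derived from the key."""
    bucket = zlib.crc32(key) % NUM_BUCKETS
    return b"%d|" % bucket + key

def scan_bucket(table, bucket, start, stop):
    """Stand-in for one range scan restricted to a single salt prefix."""
    prefix = b"%d|" % bucket
    lo, hi = prefix + start, prefix + stop
    # table is sorted by salted key, so the result is sorted by original key.
    return [(k[len(prefix):], v) for k, v in table if lo <= k < hi]

def salted_scan(table, start, stop):
    """Run one scan per bucket in parallel and merge the sorted results."""
    with ThreadPoolExecutor(max_workers=NUM_BUCKETS) as pool:
        parts = list(pool.map(lambda b: scan_bucket(table, b, start, stop),
                              range(NUM_BUCKETS)))
    return list(heapq.merge(*parts, key=lambda kv: kv[0]))

# Simulated table: rows stored under salted keys, kept in sorted order.
table = sorted((salted_key(b"%04d" % n), n) for n in range(100))
rows = salted_scan(table, b"0010", b"0020")
print([v for _, v in rows])
```

Each per-bucket scan returns rows already sorted by their original key, so a k-way merge is enough to restore the global order on the client. With a random salt instead of a derived one, a point lookup would have to check every bucket as well.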

Another reason for hotspotting is inserting new versions of a smallish set of row keys. In that case splitting might help, because it reduces the likelihood of all those keys falling into the same region.
