Should I use prefixfilter or rowkey range scan in HBase

I don't know why a scan is so slow when I use a PrefixFilter, while a row key range scan over the same prefix returns almost instantly. Can someone explain which is the best way to query HBase? Thanks.

hbase(main):002:0> scan 'userlib',{FILTER=>org.apache.hadoop.hbase.filter.PrefixFilter.new(org.apache.hadoop.hbase.util.Bytes.toBytes('0000115831F8'))}
ROW               COLUMN+CELL                                                                                                                                
0000115831F8001   column=track:aid, timestamp=1339121507633, value=aaa                                                                                       
1 row(s) in 41.0700 seconds

hbase(main):002:0> scan 'userlib',{STARTROW=>'0000115831F8',ENDROW=>'0000115831F9'}                                                                                        
ROW               COLUMN+CELL                                                                                                                                
0000115831F8001   column=track:aid, timestamp=1339121507633, value=aaa                                                                                       
1 row(s) in 0.1100 seconds
Phraseology answered 8/6, 2012 at 3:14 Comment(1)
The problem with this approach arises when the last character is the maximum byte value, because you can't increase it by 1. If you set it to 0 and then increase the next byte by 1, you instruct HBase to include this end key, which is not the desired outcome. - Impossibility
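One way to handle the maximum-byte case raised in this comment is to drop any trailing 0xFF bytes before incrementing. Below is a minimal sketch in plain Ruby. The helper name `prefix_stop_row` is hypothetical and not part of HBase, and it assumes row keys are compared as raw bytes.

```ruby
# Hypothetical helper (not part of HBase): returns the smallest row key that
# sorts after every key starting with `prefix`, or nil when no such key exists
# (the prefix is empty or consists only of 0xFF bytes).
def prefix_stop_row(prefix)
  bytes = prefix.bytes
  # Trailing 0xFF bytes cannot be incremented, so drop them first.
  bytes.pop while !bytes.empty? && bytes.last == 0xFF
  return nil if bytes.empty?
  # Incrementing the last remaining byte gives an exclusive upper bound.
  bytes[-1] += 1
  bytes.pack('C*')
end

puts prefix_stop_row('0000115831F8')       # prints 0000115831F9, the ENDROW used in the question
puts prefix_stop_row("ab\xFF".b).inspect   # trailing 0xFF is dropped, then "b" becomes "c"
```

A nil result would mean "scan to the end of the table", so the caller would simply omit ENDROW in that case.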

HBase filters - even row filters - are really slow, since in most cases these do a complete table scan, and then filter on those results. Have a look at this discussion: http://grokbase.com/p/hbase/user/115cg0d7jh/very-slow-scan-performance-using-filters

Row key range scans, however, are indeed much faster: they return the same rows as a filtered table scan without reading the rest of the table. This is because row keys are stored in sorted order (one of the basic guarantees of HBase, which is a BigTable-like solution), so a scan can seek straight to the start row and stop at the end row. More explanation here: http://www.quora.com/How-feasible-is-real-time-querying-on-HBase-Can-it-be-achieved-through-a-programming-language-such-as-Python-PHP-or-JSP
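To make the difference concrete, here is a toy model in plain Ruby. It is not HBase code: a sorted array stands in for the table's row keys, and the names `filter_scan`, `range_scan` and `table_keys` are made up for illustration. It only counts how many keys each strategy has to look at.

```ruby
# Full scan with a filter: every key is examined, matching or not.
def filter_scan(keys, prefix)
  examined = 0
  hits = keys.select { |k| examined += 1; k.start_with?(prefix) }
  [hits, examined]
end

# Range scan: binary-search to the start row, stop at the (exclusive) stop row.
def range_scan(keys, start_row, stop_row)
  first = keys.bsearch_index { |k| k >= start_row } || keys.size
  hits = []
  examined = 0
  (first...keys.size).each do |i|
    examined += 1
    break if keys[i] >= stop_row
    hits << keys[i]
  end
  [hits, examined]
end

# 50,000 made-up keys shaped like the one in the question, in sorted order.
table_keys = (0...50_000).map { |i| format('%012X001', 0x11580000 + i) }.sort

_, filter_examined = filter_scan(table_keys, '0000115831F8')
_, range_examined = range_scan(table_keys, '0000115831F8', '0000115831F9')
puts "filter scan examined #{filter_examined} keys"  # 50000
puts "range scan examined #{range_examined} keys"    # 2
```

Both calls return the single row `0000115831F8001`, but the range scan touches only the matching row and the first row past the stop key.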

[UPDATE 1] It turns out that PrefixFilter does do a full table scan until it passes the prefix used in the filter (if it finds it). For fast performance with a PrefixFilter, the recommendation seems to be to specify a start_row parameter in addition to the filter. See the related 2013 discussion on the hbase-user mailing list.
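The behaviour described in this update can be modelled with a short plain-Ruby sketch. Again, this is a toy model rather than HBase code, and `prefix_filter_scan` is a hypothetical name. The scan walks sorted keys from the given start row (or from the beginning of the table), skips rows until the prefix matches, and stops once a row sorts past the prefix.

```ruby
def prefix_filter_scan(keys, prefix, start_row = nil)
  # Without a start row the scan begins at the first key in the table.
  first = start_row ? (keys.bsearch_index { |k| k >= start_row } || keys.size) : 0
  hits = []
  examined = 0
  (first...keys.size).each do |i|
    key = keys[i]
    examined += 1
    if key.start_with?(prefix)
      hits << key
    elsif key > prefix
      break # keys are sorted, so no later row can match the prefix
    end
  end
  [hits, examined]
end

# 50,000 made-up keys shaped like the one in the question, in sorted order.
sample_keys = (0...50_000).map { |i| format('%012X001', 0x11580000 + i) }.sort
prefix = '0000115831F8'

_, without_start = prefix_filter_scan(sample_keys, prefix)
_, with_start = prefix_filter_scan(sample_keys, prefix, prefix)
puts "examined without start row: #{without_start}"
puts "examined with start row:    #{with_start}"
```

In this model, passing the prefix itself as the start row removes all the work spent on rows that sort before the prefix, which is the effect the recommendation is after.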

[UPDATE 2, from @aaa90210] Regarding the update above, there is now an efficient row prefix filter that is much faster than PrefixFilter; see this answer: https://mcmap.net/q/433362/-hbase-easy-how-to-perform-range-prefix-scan-in-hbase-shell

Ramification answered 8/6, 2012 at 19:31 Comment(4)
The link doesn't really add much value here. - Impossibility
Your UPDATE is no longer accurate. There is a ROWPREFIXFILTER that is not the same as PrefixFilter; see this answer: https://mcmap.net/q/433362/-hbase-easy-how-to-perform-range-prefix-scan-in-hbase-shell - Nakitanalani
Thanks, updated the answer and credited you for the update. - Ramification
The link for the first update is broken. Would you mind retrieving it for us? - Routinize

UPDATE: It turns out that PrefixFilter does do a full table scan until it passes the prefix used in the filter (if it finds it). The recommendation for fast performance using a PrefixFilter seems to be to specify a start_row parameter in addition to the PrefixFilter.

Jeu answered 2/6, 2016 at 14:13 Comment(0)
