Scan with filter using HBase shell
Asked Answered
V

7

44

Does anybody know how to scan records based on some scan filter i.e.:

column:something = "somevalue"

Something like this, but from HBase shell?

Vidette answered 31/8, 2011 at 11:10 Comment(0)
T
50

Try this. It's kind of ugly, but it works for me.

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
scan 't1', { COLUMNS => 'family:qualifier', FILTER =>
    SingleColumnValueFilter.new
        (Bytes.toBytes('family'),
         Bytes.toBytes('qualifier'),
         CompareFilter::CompareOp.valueOf('EQUAL'),
         SubstringComparator.new('somevalue'))
}

The HBase shell will include whatever you have in ~/.irbrc, so you can put something like this in there (I'm no Ruby expert, improvements are welcome):

# imports like above
def scan_substr(table,family,qualifier,substr,*cols)
    scan table, { COLUMNS => cols, FILTER =>
        SingleColumnValueFilter.new
            (Bytes.toBytes(family), Bytes.toBytes(qualifier),
             CompareFilter::CompareOp.valueOf('EQUAL'),
             SubstringComparator.new(substr)) }
end

and then you can just say in the shell:

scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'
Televise answered 16/9, 2011 at 16:7 Comment(2)
This is indeed super ugly. Thanks though, couldn't find any examples of this in the HBase docs/book/oreilly book.Proscription
Hello, I saw you mention ruby and I had no idea what was going on, then I looked it up and found that HBase accepts some ruby scripts? Is that so?Diva
L
29
scan 'test', {COLUMNS => ['F'],FILTER => \ 
"(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \
(SingleColumnValueFilter('F','s',=,'binary:2',true,true))"}

More information can be found here. Note that multiple examples reside in the attached Filter Language.docx file.

License answered 28/6, 2012 at 2:13 Comment(2)
I think that this Filter parsing language is only working in later versions of Hbase - in 0.90.6 (cdh 3u6) I couldn't get any variation of this to work.Saltsman
I think it is very useful to look at the javadoc; here's the javadoc for 0.94: hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/filter/…Batch
I
8

Use the FILTER param of scan, as shown in the usage help:

hbase(main):002:0> scan

ERROR: wrong number of arguments (0 for 1)

Here is some help for this command:
Scan a table; pass table name and optionally a dictionary of scanner
specifications.  Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
or COLUMNS. If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family:'.

Some examples:

  hbase> scan '.META.'
  hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}

For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false).  By
default it is enabled.  Examples:

  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
Intort answered 31/8, 2011 at 21:3 Comment(0)
D
5
Scan scan = new Scan();
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

//in case you have multiple SingleColumnValueFilters, 
you would want the row to pass MUST_PASS_ALL conditions
or MUST_PASS_ONE condition.

SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter( 
                   Bytes.toBytes("SOME COLUMN FAMILY" ),
                   Bytes.toBytes("SOME COLUMN NAME"),
                   CompareOp.EQUAL,
                   Bytes.toBytes("SOME VALUE"));

filter_by_name.setFilterIfMissing(true);  
//if you don't want the rows that have the column missing.
Remember that adding the column filter doesn't mean that the 
rows that don't have the column will not be put into the 
result set. They will be, if you don't include this statement. 

list.addFilter(filter_by_name);


scan.setFilter(list);
Define answered 18/2, 2014 at 7:3 Comment(1)
This code is in Java, the question is asking about the HBase shell.Intort
A
5

One of the filter is Valuefilter which can be used to filter all column values.

hbase(main):067:0> scan 'dummytable', {FILTER => "ValueFilter(=,'binary:2016-01-26')"}

binary is one of the comparators used within the filter. You can use different comparators within the filter based on what you want to do.

You can refer following url: http://www.hadooptpoint.com/filters-in-hbase-shell/. It provides good examples on how to use different filters in HBase Shell.

Aberdeen answered 12/2, 2016 at 21:17 Comment(2)
Link only answers are not good questions. Post some code and explain it to help.Siobhansion
The link does not work anymore. Take you to a spam-siteBeaumont
S
0

Add setFilterIfMissing(true) at the end of query

hbase(main):009:0> import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
 import org.apache.hadoop.hbase.filter.BinaryComparator;
 import org.apache.hadoop.hbase.filter.CompareFilter;
 import org.apache.hadoop.hbase.filter. Filter;

 scan 'test:test8', { FILTER => SingleColumnValueFilter.new(Bytes.toBytes('account'),
      Bytes.toBytes('ACCOUNT_NUMBER'), CompareFilter::CompareOp.valueOf('EQUAL'),
      BinaryComparator.new(Bytes.toBytes('0003000587'))).setFilterIfMissing(true)}
Samuelson answered 15/10, 2018 at 6:58 Comment(0)
R
0

You can use the below command in hbase shell to filter the column data:

scan 'tableName',{ COLUMNS => 'columnName', LIMIT => 1, FILTER => "ValueFilter( =, 'regexstring:customer' )" }
Resemblance answered 9/8, 2023 at 5:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.