How to apply several QualifierFilters to a row in HBase

We would like to filter a scan on an HBase table with two QualifierFilters. That is, we only want the rows of the table that have a certain column 'col_A' AND (!) a certain other column 'col_B'.

Our current approach looks like this:

FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
Filter filter1 = new QualifierFilter(CompareOp.EQUAL, new BinaryComparator("col_A".getBytes()));
filterList.addFilter(filter1);
Filter filter2 = new QualifierFilter(CompareOp.EQUAL, new BinaryComparator("col_B".getBytes()));
filterList.addFilter(filter2);

Scan scan = new Scan();
scan.setFilter(filterList);
... 

The ResultScanner does not return any results from this scan although there are several rows in the HBase table which do have both columns 'col_A' and 'col_B'.

If we apply only filter1 to the scan, everything works fine and we get all the rows which have 'col_A'. The same holds if we apply only filter2: we get all rows which have 'col_B'.

Only when we combine the two filters do we get no results.

What would be the right way to get only the rows from the table which do have col_A AND col_B?

Badenpowell answered 14/11, 2012 at 13:2 Comment(0)

You can achieve this by defining the following filters:

List<Filter> filters = new ArrayList<Filter>(2);
byte[] colfam = Bytes.toBytes("c");
byte[] fakeValue = Bytes.toBytes("DOESNOTEXIST");
byte[] colA = Bytes.toBytes("col_A");
byte[] colB = Bytes.toBytes("col_B");

SingleColumnValueFilter filter1 = 
    new SingleColumnValueFilter(colfam, colA , CompareOp.NOT_EQUAL, fakeValue);  
filter1.setFilterIfMissing(true);
filters.add(filter1);

SingleColumnValueFilter filter2 = 
    new SingleColumnValueFilter(colfam, colB, CompareOp.NOT_EQUAL, fakeValue);          
filter2.setFilterIfMissing(true);
filters.add(filter2);

FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, filters);
Scan scan = new Scan();
scan.setFilter(filterList);

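The idea is to define one SingleColumnValueFilter per column you are looking for, each with a fake value and the CompareOp.NOT_EQUAL operator. With setFilterIfMissing(true), each filter drops rows in which its column is missing and passes rows in which the column exists with any value other than the fake one. Combined with MUST_PASS_ALL, only rows that have both columns are returned.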
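
To make the row-level decision concrete, here is a toy model in plain Java. It does not use any HBase classes: `ScvfModel`, `rowPasses`, `hasAll` and the map-based row are hypothetical stand-ins, and the logic reflects my understanding of how SingleColumnValueFilter behaves with NOT_EQUAL and setFilterIfMissing(true). Treat it as a sketch, not as HBase's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ScvfModel {
    // Hypothetical stand-in for a row: qualifier -> cell value.
    // Mimics SingleColumnValueFilter(NOT_EQUAL, fakeValue) with setFilterIfMissing(true).
    static boolean rowPasses(Map<String, byte[]> row, String qualifier, byte[] fakeValue) {
        byte[] value = row.get(qualifier);
        if (value == null) {
            return false; // column missing: the row is filtered out
        }
        return !Arrays.equals(value, fakeValue); // NOT_EQUAL comparison
    }

    // Mimics MUST_PASS_ALL over one such filter per wanted column.
    static boolean hasAll(Map<String, byte[]> row, byte[] fakeValue, String... qualifiers) {
        for (String q : qualifiers) {
            if (!rowPasses(row, q, fakeValue)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] fake = "DOESNOTEXIST".getBytes(StandardCharsets.UTF_8);

        Map<String, byte[]> both = new HashMap<>();
        both.put("col_A", "1".getBytes(StandardCharsets.UTF_8));
        both.put("col_B", "2".getBytes(StandardCharsets.UTF_8));

        Map<String, byte[]> onlyA = new HashMap<>();
        onlyA.put("col_A", "1".getBytes(StandardCharsets.UTF_8));

        System.out.println(hasAll(both, fake, "col_A", "col_B"));  // row has both columns
        System.out.println(hasAll(onlyA, fake, "col_A", "col_B")); // col_B is missing
    }
}
```

One caveat the model makes visible: a row whose col_A or col_B really does contain the fake value would be dropped as well, so the fake value has to be something that never occurs in the data.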

Source: http://mapredit.blogspot.com/2012/05/using-filters-in-hbase-to-match-two.html

Undecagon answered 18/11, 2012 at 21:59 Comment(3)
Thanks for the answer. I just tried it and it works for our case. But I have a question about performance. I assume the filters are evaluated in the order in which they were added to the FilterList. So if many rows have col_A, HBase has to compare against the actual values in that column, which sounds quite expensive. Is there a way to check for the existence of both columns first, before any cell values are compared? - Badenpowell
@Badenpowell I don't know how much data you have, but I'm afraid you are right. Another option would be to implement a custom filter that takes the list of qualifiers you are looking for. - Undecagon
QualifierFilter filters out the unwanted columns, but your solution does not. - Chatoyant

I think this line is the issue -

FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);

You want it to be -

FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE);

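With MUST_PASS_ALL, the filter list looks for a single column whose qualifier matches both col_A and col_B, and no such column exists. Note that MUST_PASS_ONE returns every row that has at least one of the two columns, so rows with only one of them still have to be discarded on the client side.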
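
The per-column evaluation can be illustrated with a small model in plain Java. This is only a sketch under the assumption that each QualifierFilter judges one cell at a time. `FilterListModel`, `qualifierEquals` and `keep` are hypothetical names, not HBase API, and a cell is represented only by its qualifier string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FilterListModel {
    // Hypothetical stand-in for QualifierFilter(EQUAL, name): judges one cell by its qualifier.
    static Predicate<String> qualifierEquals(String name) {
        return name::equals;
    }

    // Returns the cells (given by qualifier) of one row that the combined filter accepts.
    // mustPassAll = true models MUST_PASS_ALL, false models MUST_PASS_ONE.
    static List<String> keep(List<String> rowQualifiers, boolean mustPassAll,
                             List<Predicate<String>> filters) {
        List<String> kept = new ArrayList<>();
        for (String q : rowQualifiers) {
            boolean accepted = mustPassAll
                    ? filters.stream().allMatch(f -> f.test(q))
                    : filters.stream().anyMatch(f -> f.test(q));
            if (accepted) {
                kept.add(q);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("col_A", "col_B", "col_C");
        List<Predicate<String>> filters =
                Arrays.asList(qualifierEquals("col_A"), qualifierEquals("col_B"));

        System.out.println(keep(row, true, filters));  // MUST_PASS_ALL: no cell matches both names
        System.out.println(keep(row, false, filters)); // MUST_PASS_ONE: col_A and col_B survive
    }
}
```

In this model a row qualifies for the original question only if `keep(row, false, filters)` contains both qualifiers, which is the check that remains for the caller.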

Roundelay answered 17/3, 2014 at 21:37 Comment(1)
Welcome to StackOverflow! Your answers will likely be most appreciated when (a) the question has not already been answered to the original poster's satisfaction, or (b) you have an alternate solution to offer to the problem. Also, please check the help link when composing your answer to learn more about how to format your answers for maximum readability. - Realize
