How to filter out rows with a given column (not null)?

3

6

I want to do an HBase scan with filters. For example, my table has column families A, B, and C, and A has a column X. Some rows have column X and some do not. How can I implement a filter that filters out all the rows with column X?

Literature answered 12/10, 2012 at 12:23 Comment(0)
12

I guess you are looking for SingleColumnValueFilter in HBase. As mentioned in the API documentation:

To prevent the entire row from being emitted if the column is not found on a row, use setFilterIfMissing(boolean) on Filter object. Otherwise, if the column is found, the entire row will be emitted only if the value passes. If the value fails, the row will be filtered out.

However, SingleColumnValueFilter needs a value to compare ColumnX against using a CompareOp. For example: return this row if ColumnX == "X", or return this row if ColumnX != "a sentinel value that ColumnX can never take". Combine the second form with setFilterIfMissing(true), so that a row is returned whenever ColumnX has some value and is dropped when the column is missing.
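To make the rule quoted above concrete, here is a minimal plain-Java sketch that models the row-level decision only. It does not use the HBase client; the `emits` helper, the `"family:qualifier"` map representation, and the sentinel string are all hypothetical names chosen for illustration. The comment inside shows roughly how the same idea maps onto the `CompareOp`-based API discussed in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

public class MissingColumnModel {

    // Hypothetical stand-in for a row: "family:qualifier" -> value.
    // Returns true when the row would be emitted, following the rule quoted above.
    static boolean emits(Map<String, String> row, String column,
                         Predicate<String> valuePasses, boolean filterIfMissing) {
        if (!row.containsKey(column)) {
            return !filterIfMissing;              // missing column: dropped only if the flag is set
        }
        return valuePasses.test(row.get(column)); // present column: the comparison decides
    }

    public static void main(String[] args) {
        Map<String, String> withX = new HashMap<>();
        withX.put("A:X", "42");
        Map<String, String> withoutX = new HashMap<>();
        withoutX.put("B:Y", "foo");

        String sentinel = "__never_stored__"; // assumed never to be a real value of X

        // Keep rows that HAVE X: value != sentinel, filterIfMissing = true.
        // Roughly: new SingleColumnValueFilter(family, qualifier, CompareOp.NOT_EQUAL, sentinelBytes)
        //          followed by setFilterIfMissing(true).
        Predicate<String> notSentinel = v -> !v.equals(sentinel);
        System.out.println(emits(withX, "A:X", notSentinel, true));     // true
        System.out.println(emits(withoutX, "A:X", notSentinel, true));  // false

        // Keep rows that LACK X: value == sentinel (never true), filterIfMissing = false.
        Predicate<String> isSentinel = v -> v.equals(sentinel);
        System.out.println(emits(withX, "A:X", isSentinel, false));     // false
        System.out.println(emits(withoutX, "A:X", isSentinel, false));  // true
    }
}
```

The second pair of calls suggests that the same filter can serve the opposite reading of the question (keep only rows without X), provided the sentinel really never occurs as a stored value.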

I hope this nudges you in the right direction.

Desegregate answered 12/10, 2012 at 18:39 Comment(0)
1

You can use a SkipFilter along with a ColumnPrefixFilter. The ColumnPrefixFilter gets keys where the column exists (an HBase row will only have a column if it has a value). The SkipFilter will give you the "not" of the first filter, so the row will be omitted.

Hiltner answered 13/10, 2012 at 16:0 Comment(1)
Note: this will only pass rows where all the columns in the row pass the prefix filter. - Obstreperous
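The comment above can be illustrated with a small plain-Java model. It assumes the documented behaviour that a SkipFilter drops an entire row as soon as the wrapped filter rejects any one of its cells; the `rowSurvives` helper and the list-of-qualifiers representation are hypothetical and do not touch the HBase client.

```java
import java.util.Arrays;
import java.util.List;

public class SkipPrefixModel {

    // Models SkipFilter wrapped around ColumnPrefixFilter(prefix):
    // the row survives only if EVERY column qualifier passes the prefix check.
    static boolean rowSurvives(List<String> qualifiers, String prefix) {
        return qualifiers.stream().allMatch(q -> q.startsWith(prefix));
    }

    public static void main(String[] args) {
        System.out.println(rowSurvives(Arrays.asList("X"), "X"));       // true:  only X present
        System.out.println(rowSurvives(Arrays.asList("X", "Y"), "X"));  // false: Y fails the prefix
        System.out.println(rowSurvives(Arrays.asList("Y"), "X"));       // false: X is absent
    }
}
```

Under this model the combination is not a negation of the prefix filter: a row that lacks X entirely is dropped as well, because its other columns fail the prefix check.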
0

Ankit Arnon user1573269

The only way I could get it to work is shown below.

I have a table with columns rule1, rule2, rule3, and so on. A row can have only the rule1 column, or rule1 and rule2, or rule1, rule2, and rule3, and so on. Say I want to extract the rows that have ONLY rule1 in them. This means I have to skip the rows that have rule2 in them.

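import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.util.Bytes;

Scan getRules = new Scan();
ColumnPrefixFilter rule1Filter = new ColumnPrefixFilter(Bytes.toBytes("rule1"));
SingleColumnValueFilter skipRule2Value = new SingleColumnValueFilter(
        Bytes.toBytes("rules"), Bytes.toBytes("rule2"),
        CompareOp.EQUAL, Bytes.toBytes("0"));
// SkipFilter drops the whole row if rules:rule2 exists with a value other than "0".
SkipFilter skipRule2 = new SkipFilter(skipRule2Value);
getRules.setFilter(rule1Filter);
// Note: Scan.setFilter replaces the previously set filter, so only skipRule2 is in effect.
getRules.setFilter(skipRule2);
ResultScanner scanner = htable.getScanner(getRules);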

Though this worked, I am not very happy with the solution. It takes time for HBase to figure out. I would have thought there would be an easier, more straightforward method that does not have to check the value. Arnon, your method does not work because SkipFilter will skip the rows that do NOT satisfy the condition. Hence constructing it from a ColumnPrefixFilter fails the requirement.

Winterwinterbottom answered 5/8, 2013 at 15:38 Comment(0)
