Spark Cassandra connector filtering with IN clause

I am having trouble filtering with the Spark Cassandra connector from Java. Cassandra allows filtering on the last column of the partition key with an IN clause, e.g.:

create table cf_text
(a varchar,b varchar,c varchar, primary key((a,b),c))

Query : select * from cf_text where a ='asdf' and b in ('af','sd');
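To make the semantics concrete: with (a, b) as the composite partition key, a = 'asdf' AND b IN ('af', 'sd') selects exactly one partition per IN value. A minimal Java sketch of that expansion (PartitionKeyExpansion and expand() are hypothetical names for illustration, not connector or driver APIs):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a connector/driver API): expands
// "a = ? AND b IN (...)" on the composite partition key (a, b)
// into the individual (a, b) partition keys the query selects.
public class PartitionKeyExpansion {

    public static List<List<String>> expand(String a, String... bValues) {
        List<List<String>> keys = new ArrayList<>();
        for (String b : bValues) {
            // Each IN value on the last partition key column yields one full partition key.
            keys.add(List.of(a, b));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Mirrors: select * from cf_text where a = 'asdf' and b in ('af', 'sd');
        System.out.println(expand("asdf", "af", "sd")); // [[asdf, af], [asdf, sd]]
    }
}
```

So the query reads the two partitions (asdf, af) and (asdf, sd).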

sc.cassandraTable("test", "cf_text").where("a = ?", "af").toArray.foreach(println)

How can I specify the IN clause that is used in the CQL query in Spark? And how can range queries be specified?

Seditious answered 25/6, 2015 at 10:45

Just wondering: does your Spark code above actually work? I thought that the connector doesn't allow a WHERE on partition key columns (a and b in your case), since it uses them under the hood (see the last answer to this question: Spark Datastax Java API Select statements).

In any case, the Spark Cassandra connector lets you chain where() calls, and the value for an IN can be passed as a List<String>.

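// Assumes: import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;
// plus java.util.*, org.apache.spark.api.java.JavaPairRDD and
// org.apache.spark.api.java.function.Function; sc is a JavaSparkContext
// and MyCFClass is a JavaBean matching the table's columns.
List<String> valuesList = new ArrayList<String>();
valuesList.add("value2");
valuesList.add("value3");

JavaPairRDD<String, MyCFClass> rdd = javaFunctions(sc)
    .cassandraTable("test", "cf", mapRowTo(MyCFClass.class))
    .where("column1 = ?", "value1")     // chained where() calls are ANDed
    .where("column2 IN ?", valuesList)  // the list is bound to the IN placeholder
    .keyBy(new Function<MyCFClass, String>() {
        public String call(MyCFClass _myCF) throws Exception {
            return _myCF.getId();
        }
    });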
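Under the hood, chained where() calls are combined with AND into one CQL WHERE clause, and the ? placeholders are bound in order. The toy class below only mimics that accumulation; WhereBuilder and its methods are hypothetical, it is not the connector's implementation, and it does no CQL validation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, NOT the connector's implementation: each where() call
// appends one predicate plus its bind values, and the predicates are ANDed together.
public class WhereBuilder {
    private final List<String> predicates = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    public WhereBuilder where(String predicate, Object... args) {
        predicates.add(predicate);
        values.addAll(Arrays.asList(args)); // a List argument stays a single bind value
        return this; // fluent, so calls can be chained
    }

    public String whereClause() {
        return String.join(" AND ", predicates);
    }

    public List<Object> boundValues() {
        return values;
    }

    public static void main(String[] args) {
        WhereBuilder w = new WhereBuilder()
                .where("column1 = ?", "value1")
                .where("column2 IN ?", List.of("value2", "value3"));
        System.out.println(w.whereClause()); // column1 = ? AND column2 IN ?
        System.out.println(w.boundValues()); // [value1, [value2, value3]]
    }
}
```

Note that the whole list is one bind value for the single IN placeholder, which is why a List<String> is passed rather than separate arguments.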

Note that the normal CQL rules for IN still apply here, because the where() predicates are pushed down to Cassandra as part of the CQL query.

Range queries function in a similar manner:

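// Assumes: import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;
// sc is a JavaSparkContext, Person is a JavaBean, and age is assumed to be an
// int clustering column so that Cassandra accepts the range predicate.
JavaPairRDD<String, Person> people = javaFunctions(sc)
    .cassandraTable("test", "person", mapRowTo(Person.class))
    .where("age > ?", 15)
    .where("age < ?", 20)
    .keyBy(new Function<Person, String>() {
        public String call(Person _person) throws Exception {
            return _person.getPersonid();
        }
    });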
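Both bounds in the range example are exclusive, so ages 15 and 20 are not returned. If a range predicate cannot be pushed down to Cassandra, the same condition could instead be applied on the Spark side with JavaRDD.filter(). The plain-Java sketch below just shows that predicate on an in-memory list (AgeRangeFilter, Person and idsInRange() are hypothetical names, not connector APIs):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration of "age > 15 AND age < 20".
// Both bounds are exclusive, so ages 15 and 20 are NOT included.
public class AgeRangeFilter {

    public static final class Person {
        public final String personId;
        public final int age;

        public Person(String personId, int age) {
            this.personId = personId;
            this.age = age;
        }
    }

    public static List<String> idsInRange(List<Person> people, int lowExclusive, int highExclusive) {
        return people.stream()
                .filter(p -> p.age > lowExclusive && p.age < highExclusive)
                .map(p -> p.personId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("p1", 15), new Person("p2", 16),
                new Person("p3", 19), new Person("p4", 20));
        System.out.println(idsInRange(people, 15, 20)); // [p2, p3]
    }
}
```

Filtering in Spark reads every row from Cassandra first, so pushing the predicate down with where() is preferable whenever CQL accepts it.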
Bicycle answered 25/6, 2015 at 14:39
Yes, Spark does work with the partition key in the where clause. I had left out the b column of the partition key (a, b) in the Spark code above; both a and b must be present in the where clause: sc.cassandraTable("test", "cf_text").where("a = ?", "af").where("b = ?", "df").toArray.foreach(println) - Seditious
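The rule from this comment (every partition key column has to be restricted) can be expressed as a small check. This is a hypothetical helper for illustration only; PartitionKeyCheck and missingColumns() are not part of the connector:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical check (not part of the connector): lists the partition key
// columns that the chained where() calls do not restrict.
public class PartitionKeyCheck {

    public static List<String> missingColumns(List<String> partitionKey, Set<String> restricted) {
        List<String> missing = new ArrayList<>();
        for (String column : partitionKey) {
            if (!restricted.contains(column)) {
                missing.add(column);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> partitionKey = List.of("a", "b"); // primary key((a, b), c)
        System.out.println(missingColumns(partitionKey, Set.of("a")));      // [b]
        System.out.println(missingColumns(partitionKey, Set.of("a", "b"))); // []
    }
}
```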