Range query on secondary index in cassandra

Asked 1/3, 2016 at 10:8 Answered 1/3, 2016 at 13:31

I am using Cassandra 2.1.10. First, to be clear, I know that secondary indexes are an anti-pattern in Cassandra. But for testing purposes I was trying the following:

CREATE TABLE test_topology1.tt (
    a text PRIMARY KEY,
    b timestamp
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX idx_tt ON test_topology1.tt (b);

When I run the following query, it gives me an error:

cqlsh:test_topology1> Select * from tt where b>='2016-04-29 18:00:00' ALLOW FILTERING;
InvalidRequest: code=2200 [Invalid query] message="No secondary indexes on the restricted columns support the provided operators: 'b >= <value>'"

However, this blog post says that ALLOW FILTERING can be used to query a secondary index. Cassandra is installed on a Windows machine.

Chui answered 1/3, 2016 at 10:8 Comment(2)

The two answers in this thread that are not yours explain that range queries are not possible on secondary indexes. The post you are referencing also explains that >= restrictions in secondary index queries are only possible for non-indexed columns and only if you allow filtering. – Aftertaste 1/3, 2016 at 11:4

@Aftertaste one of the answers also says ALLOW FILTERING will allow range queries. Also, the Cassandra blog I mentioned in the question and this SO post #34541383 suggest the same – Chui 1/3, 2016 at 11:58

Range queries on secondary index columns are not allowed in Cassandra up to and including 2.2.x. However, as the post "A deep look at the CQL WHERE clause" points out, they are allowed on non-indexed columns if filtering is allowed:

Direct queries on secondary indices support only =, CONTAINS or CONTAINS KEY restrictions.

[..]

Secondary index queries allow you to restrict the returned results using the =, >, >=, <= and <, CONTAINS and CONTAINS KEY restrictions on non-indexed columns using filtering.

So, given the table structure and index

CREATE TABLE test_secondary_index (
     a text PRIMARY KEY,
     b timestamp,
     c timestamp 
);
CREATE INDEX idx_inequality_test ON test_secondary_index (b);

the following query fails because the inequality test is done on the indexed column:

SELECT * FROM  test_secondary_index WHERE b >= '2016-04-29 18:00:00' ALLOW FILTERING ;
InvalidRequest: code=2200 [Invalid query] message="No secondary indexes on the restricted columns support the provided operators: 'b >= <value>'"

But the following works because the inequality test is done on a non-indexed column:

SELECT * FROM  test_secondary_index WHERE b = '2016-04-29 18:00:00' AND c >= '2016-04-29 18:00:00' ALLOW FILTERING ;

 a | b | c
---+---+---

(0 rows)

This still works if you add another index on column c, but it still requires the ALLOW FILTERING clause, which to me suggests that the index on column c is not used in this scenario.
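
A minimal sketch of that last scenario, assuming the test_secondary_index table defined above. The index name idx_c_test is made up for illustration:

```cql
-- Hypothetical second index, this time on column c
CREATE INDEX idx_c_test ON test_secondary_index (c);

-- Equality on indexed column b, inequality on column c.
-- As described above, this still has to carry ALLOW FILTERING
-- even though c is now indexed as well.
SELECT * FROM test_secondary_index
WHERE b = '2016-04-29 18:00:00' AND c >= '2016-04-29 18:00:00'
ALLOW FILTERING;
```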

Aftertaste answered 1/3, 2016 at 13:31 Comment(0)

The range query DOES work with a secondary index using ALLOW FILTERING:

cqlsh:spark_demo> create table tt (
              ...     a text PRIMARY KEY,
              ...     b timestamp
              ... );
cqlsh:spark_demo> CREATE INDEX ON tt(b);
cqlsh:spark_demo> SELECT * FROM tt WHERE b >= '2016-03-01 12:00:00+0000';
InvalidRequest: code=2200 [Invalid query] message="No supported secondary index found for the non primary key columns restrictions"
cqlsh:spark_demo> SELECT * FROM tt WHERE b >= '2016-03-01 12:00:00+0000' ALLOW FILTERING;

 a | b
---+---

(0 rows)
cqlsh:spark_demo>

Ezarra answered 1/3, 2016 at 12:28 Comment(4)

Can you please confirm the Cassandra version, CQL version and OS? It's not working on Cassandra 2.1.10 on Windows – Chui 1/3, 2016 at 12:54

Cassandra 3.3, OS = Mac OS X 10.11.1 El Capitan – Ezarra 1/3, 2016 at 13:0

I will have to check for 3.0 or above because this is not working in 2.1. – Chui 1/3, 2016 at 13:3

@Ezarra how does this work internally? What is the data structure for a secondary index? Is it a B-tree, or is it a hidden table with primary key 'b'? I assume the secondary index is created on every node. If so, does this query perform some kind of scatter-gather operation across all the nodes? – Crotchety 9/4, 2020 at 11:9

This will get you the results you want. Use b as a clustering column:

CREATE TABLE test_topology1.tt ( a text, b timestamp, PRIMARY KEY (a, b) );

SELECT * FROM tt WHERE b >= '2016-04-29 18:00:00' ALLOW FILTERING;
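
When the partition key is known, the range restriction on the clustering column does not need ALLOW FILTERING at all. A minimal sketch, assuming the table above. The key value 'node-1' is made up for illustration:

```cql
-- Range on clustering column b inside a single partition;
-- the partition key a is restricted by equality.
SELECT * FROM test_topology1.tt
WHERE a = 'node-1' AND b >= '2016-04-29 18:00:00';
```

Without the restriction on a, the query has to scan every partition, which is why ALLOW FILTERING is required in the form shown above.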

Truc answered 1/3, 2016 at 12:0 Comment(1)

I know it will. What I am looking for is why the range query does not work on a secondary index with ALLOW FILTERING when the blog says it does – Chui 1/3, 2016 at 12:15
