Validating row at client side better than secondary index with whole primary key?
Asked Answered
P

2

1

In cassandra, it's well known that secondary indexes should be used very sparingly.

If I have a table for example:

User(username, usertype, email, etc..)

Here username is partition key. Now I want to support operation which returns a specific user(username will be given) if and only if usertype is a specific value X.

There are two ways I can do it:

One: Create a secondary index on usertype, possible values ('A', 'B', 'C') and username is partition key.

SELECT * FROM user WHERE username='something' AND usertype='A';

Two:

I can just fetch the row with username to client and then check if usertype is A.

Which approach is better? Please also consider a wide row(not so big, 10s) scenario where not all rows of a partition might have the given value (which requires some client side filtering).

What I'm not clear about secondary indexes is how data is looked up in a particular node.

Ex: SELECT * FROM user WHERE username='something' AND usertype='A'

For example usertype hidden CF has data 'A'-> 'jhon', 'miller', 'chris',...etc, 100 usernames

And the query with partition key is given along with usertype does it scan through all these 100 usernames to match with the username 'something' or does it just fetches by username first and sees the usertype column if it matches with 'A'? How exactly it does that searching? How does the query fares given the index is on low cardinality data and each one is mapped to many rows?

I'm using java as client if that matters.

Update: I understand that I can use clustering (usertype) key for this particular example but I wanted to know the trade off I've asked. My original tables are much more complex.

Pathognomy answered 15/4, 2015 at 19:51 Comment(3)
gender example might have been better than usertypePathognomy
See this: #29693238Dissuade
@Dissuade thanks for the link...I'm still skeptical on the scalability of low cardinality indexes. For example, gender, when there are millions of users and even when we provide partition key in the query.Pathognomy
J
2

For this example, let's say I create a table to keep track of crew members by ship and id:

CREATE TABLE crewByShip (
  ship text,
  id int,
  firstname text,
  lastname text,
  gender text,
  PRIMARY KEY(ship,id));

And I'll create an index on gender:

CREATE INDEX crewByShipG_idx ON crewByShip(gender);

After inserting some data, my table looks like this:

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+-----------
 Serenity |  1 |     Hoban |      M | Washburne
 Serenity |  2 |      Zoey |      F | Washburne
 Serenity |  3 |   Malcolm |      M |  Reynolds
 Serenity |  4 |    Kaylee |      F |      Frye
 Serenity |  5 |  Sheppard |      M |      Book
 Serenity |  6 |     Jayne |      M |      Cobb
 Serenity |  7 |     Simon |      M |       Tam
 Serenity |  8 |     River |      F |       Tam
 Serenity |  9 |     Inara |      F |     Serra

Now I'll turn tracing on, and query a distinct row with the PRIMARY KEY, but also restrict by our index on gender.

aploetz@cqlsh:stackoverflow2> tracing on;
aploetz@cqlsh:stackoverflow2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M';

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+----------
 Serenity |  3 |   Malcolm |      M | Reynolds

(1 rows)

Tracing session: 34ea1840-e8e1-11e4-9cb7-21b264d4c94d

 activity                                                                                                                                                                                                                                                                                                       | timestamp                  | source         | source_elapsed
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
                                                                                                                                                                                                                                                                                             Execute CQL3 query | 2015-04-22 06:17:48.102000 | 192.168.23.129 |              0
                                                                                                                                                                                                          Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M'; [SharedPool-Worker-1] | 2015-04-22 06:17:48.114000 | 192.168.23.129 |           3715
                                                                                                                                                                                                                                                                      Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:48.116000 | 192.168.23.129 |           4846
                                                                                                                                                                                                                                                Executing single-partition query on users [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 |           5730
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 |           5757
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 |           5793
                                                                                                                                                                                                                                                              Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 |           5848
                                                                                                                                                                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 |           5856
                                                                                                                                                                                                                Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 |           7056
                                                                                                                                                                                                                                               Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.121000 | 192.168.23.129 |           7080
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7143
                                                                                                                                                                                                                                                                Computing ranges to query [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7578
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7742
                                                                                                                                                                                              Submitting range requests on 1 ranges with a concurrency of 1 (0.0 rows per range expected) [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7807
                                                                                                                                                                                                                                  Submitted 1 concurrent range requests covering 1 ranges [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7851
                                                                                                                                                                                                                                          Executing indexed scan for [Serenity, Serenity] [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          10848
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          10936
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11007
                                                                                                                                                                                                                           Executing single-partition query on crewbyship.crewbyshipg_idx [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11130
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11139
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11155
                                                                                                                                                                                                                Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11253
                                                                                                                                                                                                                                               Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11262
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.127000 | 192.168.23.129 |          11281
                                                                                                                                                                                                                                           Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:48.130000 | 192.168.23.129 |          11369
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 |          11375
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 |          11383
                                                                                                                                                                                                                Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.133000 | 192.168.23.129 |          11409
                                                                                                                                                                                                                                               Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.134000 | 192.168.23.129 |          11415
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 |          11430
                                                                                                                                                                                                                                                             Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 |          11490
                                                                                                                                                                                                                                                                                               Request complete | 2015-04-22 06:17:48.115679 | 192.168.23.129 |          13679

Now, I'll re-run the same query, but without the superfluous index on gender.

aploetz@cqlsh:stackoverflow2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3;

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+----------
 Serenity |  3 |   Malcolm |      M | Reynolds

(1 rows)

Tracing session: 38d7f440-e8e1-11e4-9cb7-21b264d4c94d

 activity                                                                                        | timestamp                  | source         | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
                                                                              Execute CQL3 query | 2015-04-22 06:17:54.692000 | 192.168.23.129 |              0
          Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3; [SharedPool-Worker-1] | 2015-04-22 06:17:54.695000 | 192.168.23.129 |             87
                                                       Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:54.696000 | 192.168.23.129 |            246
                                 Executing single-partition query on users [SharedPool-Worker-3] | 2015-04-22 06:17:54.697000 | 192.168.23.129 |           1185
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 |           1197
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 |           1215
                                               Key cache hit for sstable 1 [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 |           1249
                               Seeking to partition beginning in data file [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 |           1278
 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 |           3309
                                Merging data from memtables and 1 sstables [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 |           3333
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-04-22 06:17:54.702000 | 192.168.23.129 |           3368
                            Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:54.702000 | 192.168.23.129 |           4607
                                              Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 |           4633
                                               Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 |           4643
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 |           4678
                                Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 |           4683
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:54.706000 | 192.168.23.129 |           4697
                                                                                Request complete | 2015-04-22 06:17:54.697676 | 192.168.23.129 |           5676

As you can see, the "source_elapsed" for the query with the secondary index was more than twice what it was for the same query (which returned the same row) without the index.

I think we can definitely say that using a secondary index on a low-cardinality column in a wide-row table will not perform well. Now while I won't say that filtering client-side is a good idea, in this case, with a small result set, it probably would be the better option.

Jumada answered 22/4, 2015 at 11:36 Comment(7)
Yeah..only other option is to create another table having the (ship, id, gender) right? But that will cause me trouble updating two table consistently using LOGGED BATCHES...which costs me some performance while insertion...and duplication of data...right?Pathognomy
@Pathognomy Correct on both accounts. But that is the use pattern that Cassandra works the best with. And if you can kick-off the batch asynchronously, your application shouldn't notice much performance-wise.Jumada
Can logged batch be run asynchronously?Pathognomy
@Pathognomy Yes. Lyuben posted a good answer describing that: #19203312Jumada
Am I missing something? But that answer is about using batches and using asynch queries which I already know...it's not about executing a LOGGED BATCH asynchronously right?Pathognomy
Actually, that is a logged batch. Batches are "logged" by default, and can be rendered "unlogged" with the UNLOGGED keyword.Jumada
Batches are sent at a time...we don't have to manually run asynchronously if I get it correctly..is it not? In fact I've read some where Unlogged batches are nothing but asynchronous queries...Pathognomy
D
1

The good option here would be to create a composite primary key consisting of username and usertype with username being partition key and usertype a cluster key. You will not even need an index and the query will work.

CREATE TABLE users (
  username text,
  usertype text,
   ....
  PRIMARY KEY ((username), usertype)
)
Dissuade answered 15/4, 2015 at 20:3 Comment(1)
I understand that...I think I have given a bad example...but my original tables are much more complex....so please try to answer the trade off...Pathognomy

© 2022 - 2024 — McMap. All rights reserved.