How to get same rank for same scores in Redis' ZRANK?
Asked Answered
M

5

6

If I have 5 members with scores as follows

a - 1
b - 2
c - 3
d - 3
e - 5

ZRANK of c returns 2, ZRANK of d returns 3

Is there a way to get same rank for same scores?
Example: ZRANK c = 2, d = 2, e = 3
If yes, then how to implement that in spring-data-redis?

Marquardt answered 3/9, 2018 at 15:2 Comment(0)
G
4

Any real solution needs to fit the requirements, which are kind of missing in the original question. My 1st answer had assumed a small dataset, but this approach does not scale as dense ranking is done (e.g. via Lua) in O(N) at least.

So, assuming that there are a lot of users with scores, the direction that for_stack suggested is better, in which multiple data structures are combined. I believe this is the gist of his last remark.

To store users' scores you can use a Hash. While conceptually you can use a single key to store a Hash of all users scores, in practice you'd want to hash the Hash so it will scale. To keep this example simple, I'll ignore Hash scaling.

This is how you'd add (update) a user's score in Lua:

local hscores_key = KEYS[1]
local user = ARGV[1]
local increment = ARGV[2]
local new_score = redis.call('HINCRBY', hscores_key, user, increment)

Next, we want to track the current count of users per discrete score value so we keep another hash for that:

local old_score = new_score - increment
local hcounts_key = KEYS[2]
local old_count = redis.call('HINCRBY', hcounts_key, old_score, -1)
local new_count = redis.call('HINCRBY', hcounts_key, new_score, 1)

Now, the last thing we need to maintain is the per score rank, with a sorted set. Every new score is added as a member in the zset, and scores that have no more users are removed:

local zdranks_key = KEYS[3]
if new_count == 1 then
  redis.call('ZADD', zdranks_key, new_score, new_score)
end
if old_count == 0 then
  redis.call('ZREM', zdranks_key, old_score)
end

This 3-piece-script's complexity is O(logN) due to the use of the Sorted Set, but note that N is the number of discrete score values, not the users in the system. Getting a user's dense ranking is done via another, shorter and simpler script:

local hscores_key = KEYS[1]
local zdranks_key = KEYS[2]
local user = ARGV[1]

local score = redis.call('HGET', hscores_key, user)
return redis.call('ZRANK', zdranks_key, score)
Granite answered 4/9, 2018 at 9:6 Comment(1)
as dense ranking is done (e.g. via Lua) in O(N) at least Why is this so? I can think of an approach using Lua (using ZSCORE and ZCOUNT) where dense ranking is done in O(log(N)).Moen
B
4

You can achieve the goal with two Sorted Set: one for member to score mapping, and one for score to rank mapping.

Add

  1. Add items to member to score mapping: ZADD mem_2_score 1 a 2 b 3 c 3 d 5 e
  2. Add the scores to score to rank mapping: ZADD score_2_rank 1 1 2 2 3 3 5 5

Search

  1. Get score first: ZSCORE mem_2_score c, this should return the score, i.e. 3.
  2. Get the rank for the score: ZRANK score_2_rank 3, this should return the dense ranking, i.e. 2.

In order to run it atomically, wrap the Add, and Search operations into 2 Lua scripts.

Ballista answered 4/9, 2018 at 4:49 Comment(4)
I tried that approach already. It works like charm when my data is static. Problem happens when I update the score of members in first set. For example: if I update score of b to 4 and add 4 in second set, zrank score_2_rank 4 will be 3 instead of 2 because score 2 still exists in the setMarquardt
Good point! In order to delete or update a member, you need to have a HASH to save the refcounting of the scores. When adding a member, incr the refcounting of its score. When deleting or updating a member, decr the refcounting of the old score. If the score is 0, also remove the score from HASH and score_2_rank. That's really complex!Ballista
Had my fun below (twice, not including the 1st) and after mulling it I think that 1 set is better than two (but two hashes instead of one or none) in terms of computational complexity. That said, to scale this bad baby, you'll need to shard the hashes and probably the zset as well, which means more work for you :PGranite
@ItamarHaber Yes, in the case of mem_2_score mapping, hash is better than sorted set, since we don't need the order.Ballista
G
4

Any real solution needs to fit the requirements, which are kind of missing in the original question. My 1st answer had assumed a small dataset, but this approach does not scale as dense ranking is done (e.g. via Lua) in O(N) at least.

So, assuming that there are a lot of users with scores, the direction that for_stack suggested is better, in which multiple data structures are combined. I believe this is the gist of his last remark.

To store users' scores you can use a Hash. While conceptually you can use a single key to store a Hash of all users scores, in practice you'd want to hash the Hash so it will scale. To keep this example simple, I'll ignore Hash scaling.

This is how you'd add (update) a user's score in Lua:

local hscores_key = KEYS[1]
local user = ARGV[1]
local increment = ARGV[2]
local new_score = redis.call('HINCRBY', hscores_key, user, increment)

Next, we want to track the current count of users per discrete score value so we keep another hash for that:

local old_score = new_score - increment
local hcounts_key = KEYS[2]
local old_count = redis.call('HINCRBY', hcounts_key, old_score, -1)
local new_count = redis.call('HINCRBY', hcounts_key, new_score, 1)

Now, the last thing we need to maintain is the per score rank, with a sorted set. Every new score is added as a member in the zset, and scores that have no more users are removed:

local zdranks_key = KEYS[3]
if new_count == 1 then
  redis.call('ZADD', zdranks_key, new_score, new_score)
end
if old_count == 0 then
  redis.call('ZREM', zdranks_key, old_score)
end

This 3-piece-script's complexity is O(logN) due to the use of the Sorted Set, but note that N is the number of discrete score values, not the users in the system. Getting a user's dense ranking is done via another, shorter and simpler script:

local hscores_key = KEYS[1]
local zdranks_key = KEYS[2]
local user = ARGV[1]

local score = redis.call('HGET', hscores_key, user)
return redis.call('ZRANK', zdranks_key, score)
Granite answered 4/9, 2018 at 9:6 Comment(1)
as dense ranking is done (e.g. via Lua) in O(N) at least Why is this so? I can think of an approach using Lua (using ZSCORE and ZCOUNT) where dense ranking is done in O(log(N)).Moen
G
2

Then there's this Pull Request - https://github.com/antirez/redis/pull/2011 - which is dead, but appears to make dense rankings on the fly. The original issue/feature request (https://github.com/antirez/redis/issues/943) got some interest so perhaps it is worth reviving it /cc @antirez :)

Granite answered 4/9, 2018 at 16:55 Comment(1)
It seems this feature is strongly required :)Ballista
G
0

The rank is unique in a sorted set, and elements with the same score are ordered (ranked) lexically.

There is no Redis command that does this "dense ranking"

You could, however, use a Lua script that fetches a range from a sorted set, and reduces it to your requested form. This could work on small data sets, but you'd have to devise something more complex for to scale.

Granite answered 3/9, 2018 at 17:11 Comment(1)
Yep, I actually kept thinking about this question as it just so happens that it was presented by another person at roughly the same time but on another channel... I think I'm gonna add another answer just for funGranite
O
0
unsigned long zslGetRank(zskiplist *zsl, double score, sds ele) {
    zskiplistNode *x;
    unsigned long rank = 0;
    int i;

    x = zsl->header;
    for (i = zsl->level-1; i >= 0; i--) {
        while (x->level[i].forward &&
            (x->level[i].forward->score < score ||
                (x->level[i].forward->score == score &&
                sdscmp(x->level[i].forward->ele,ele) <= 0))) {
            rank += x->level[i].span;
            x = x->level[i].forward;
        }

        /* x might be equal to zsl->header, so test if obj is non-NULL */
        if (x->ele && x->score == score && sdscmp(x->ele,ele) == 0) {
            return rank;
        }
    }
    return 0;
}

https://github.com/redis/redis/blob/b375f5919ea7458ecf453cbe58f05a6085a954f0/src/t_zset.c#L475

This is the piece of code redis uses to compute the rank in sorted sets. Right now ,it just gives rank based on the position in the Skiplist (which is sorted based on scores).

What does the skiplistnode variable "span" mean in redis.h? (what is span ?)

Ozonize answered 27/8, 2021 at 5:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.