Python and Redis. Why is scan_iter() so much slower than keys() with mget()?

I made a script to compare query speeds between methods in the redis library.

As I understand it, I should not use redis.keys() because it's a blocking function. The preferred method is scan_iter(), which doesn't block. That makes sense. But I don't understand why scan_iter is so terribly slow.

This script uses three techniques to query a redis database.

  1. Use redis.keys() to get all of the keys. Use redis.mget() to get all the values.
  2. Use a for loop with redis.scan_iter() and redis.get() to get the values. (Note: I don't need the keys, just the values.)
  3. Use redis.scan_iter() to get all of the keys. Use redis.mget() to get all the values.

Option 1 was by far the fastest. Option 2 was 20 to 30 times slower. Option 3 averaged 4 times slower.

Why is this? Have I written my code incorrectly? Do I have the wrong method? Is the redis.keys() method actually the best choice?

from datetime import datetime
from time import sleep
import redis
r = redis.StrictRedis(host='192.168.3.16', port=6379, decode_responses=True, db=0)

def query_keys():
    start = datetime.now()
    redis_keys = r.keys(pattern='*')
    redis_keys = [x for x in redis_keys if not x.startswith('1_')]
    values = r.mget(redis_keys)
    end = datetime.now()
    print(str(len(values)) + ' values queried in ' + str(end - start) + ' with keys and mget.')
    
def query_scaniter():
    start = datetime.now()
    values = []
    for s in r.scan_iter():
        if not s.startswith('1_'):
            values.append(r.get(s))
    end = datetime.now()
    print(str(len(values)) + ' values queried in ' + str(end - start) + ' with scan_iter.')
    
def query_scaniter_mget():
    start = datetime.now()
    redis_keys = []
    for s in r.scan_iter():
        if not s.startswith('1_'):
            redis_keys.append(s)
    values = r.mget(redis_keys)
    end = datetime.now()
    print(str(len(values)) + ' values queried in ' + str(end - start) + ' with scan_iter and mget.')

for i in range(3):
    query_keys()
    query_scaniter()
    query_scaniter_mget()
    print('\n')
    sleep(5)

Output:
3532 values queried in 0:00:00.046872 with keys and mget.
3532 values queried in 0:00:00.781314 with scan_iter.
3532 values queried in 0:00:00.109385 with scan_iter and mget.


3526 values queried in 0:00:00.031245 with keys and mget.
3522 values queried in 0:00:00.812616 with scan_iter.
3522 values queried in 0:00:00.125007 with scan_iter and mget.


3529 values queried in 0:00:00.031246 with keys and mget.
3531 values queried in 0:00:00.797011 with scan_iter.
3530 values queried in 0:00:00.109357 with scan_iter and mget.
Pollywog answered 1/6, 2021 at 21:14

Those results are as expected.

The reason to avoid KEYS isn't that it's slow, it's that it blocks the server, preventing it from satisfying other client requests. By contrast, SCAN only returns a few results at a time, allowing the server to stay responsive to all clients.
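To make that concrete, here is a minimal sketch of SCAN's cursor protocol, driven by hand against a redis-py-style client (the helper name scan_all is mine; scan_iter() does essentially this internally):

```python
def scan_all(r, count=100):
    """Iterate over all keys by driving the SCAN cursor manually.

    Each r.scan() call is a single round-trip that returns the next
    cursor plus a small batch of keys (roughly `count` of them), so
    the server is free to serve other clients between calls.
    """
    cursor = 0
    while True:
        cursor, keys = r.scan(cursor=cursor, count=count)
        yield from keys
        if cursor == 0:  # a zero cursor means the iteration is complete
            break
```

Every one of those round-trips adds latency that a single KEYS call avoids, which is exactly the slowdown measured above.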

The tradeoff is that doing a lot of SCAN calls imposes extra overhead (including the network overhead of all those round-trips). So the total time for your client will be longer, but you won't cripple the server, which is generally a more important consideration.

Note that you can have the same issue with MGET if you have a huge number of keys. The most scalable solution (which is different from the fastest) would be to get a few keys at a time with SCAN, and then MGET those keys.
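That hybrid could be sketched like this (the helper names batched and query_scan_mget_batched are mine, and batch_size=500 is only an illustrative value, not a recommendation):

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def query_scan_mget_batched(r, batch_size=500):
    """SCAN a few keys at a time and MGET each batch.

    Neither the SCAN pages nor the MGET calls ever involve more than
    about `batch_size` keys, so no single command ties up the server.
    """
    values = []
    keys = (k for k in r.scan_iter(count=batch_size) if not k.startswith('1_'))
    for chunk in batched(keys, batch_size):
        values.extend(r.mget(chunk))
    return values
```

This keeps both commands bounded: SCAN pages through the keyspace, and each MGET touches only one batch instead of every key at once.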

Appling answered 1/6, 2021 at 23:48

For anyone still needing an answer, use the following: redis.scan_iter(match="STRINGMATCHPATTERN", count=10000)

count=10000 allows up to 10k keys to be returned per round-trip, significantly reducing network back-and-forth. You can change this value.

Laverne answered 4/7, 2023 at 2:27
Wouldn't using a high number for the count block the server just as keys() does? – Aisha
