Searching in values of a redis db

I am new to Redis. After reading some of the documentation, looking at examples on the Internet, and browsing stackoverflow.com, I can see that Redis is very fast and scales well, but the price is that we have to think through at design time how our data will be accessed and what operations it will undergo. I understand that, but I am a little confused about searching within the data, which was so easy (if slow) with plain old SQL. I could do it with the KEYS command, but that is an O(N) operation, not O(log(N)), so I would lose one of the advantages of Redis.

What do more experienced colleagues say here?

Let's take an example use case: we need to store personal data for approx. 100,000 people, and the data must be searchable by name and phone number.

For this I would use the following structures:

1. SET for storing all persons' ids {id1, id2, ...} 
2. HASH for each person to store personal data and name it 
like map:<id> e.g. map:id1{name:<name>, phone:<number>, etc...}

Solution 1:

1. HASH for storing all persons' ids, with the phone number as the key
2. Then with the command KEYS 123* all ids could be retrieved that have a phone number 
starting with 123. Based on the ids, the other personal data could also be retrieved.
3. Likewise, a separate HASH would have to be created for each attribute to be searched.

But a major drawback of this solution is that the attribute values must be unique, so that the assignment between phone numbers and ids in the HASH is unambiguous. In addition, O(N) runtime is not ideal.

Moreover, this uses more space than necessary, and the KEYS command degrades access performance. (http://redis.io/commands/keys)

What is the right way to do this? I could also imagine putting the ids in a ZSET with the searchable data as the scores, but that only makes range queries possible, not searches.

Thank you also in advance, regards, Tamas

Answer summary: Both responses state that Redis was not designed for searching in the values of keys. If this use case is necessary, workarounds need to be implemented, either as in my original solution or as in the solution below.

The solution below by Eli performs much better than my original one, because access to the keys can be considered constant-time, i.e. O(const); only the list of ids needs to be iterated over. This data model also allows several people to share the same phone number, name, and so on, so 1-n relationships are possible (in old ERD terminology).

The drawback of this solution is that it consumes much more space than mine, and phone numbers for which only the starting digits are known cannot be searched.

Thanks for both responses.

Meridithmeriel answered 19/6, 2013 at 13:56 Comment(1)
ZeeSQL is by far the best solution for this use case; I use it myself, and it's great. See Siscia's answer. Kondon
A
4

Original Secondary Indices in Redis

The accepted answer here is correct in that the traditional way of handling searching in Redis has been through secondary indices built around Sets and Sorted Sets.

e.g.

HSET Person:1 firstName Bob lastName Marley age 32 phoneNum 8675309

You would maintain secondary indices, so you would have to call

SADD Person:firstName:Bob Person:1
SADD Person:lastName:Marley Person:1
SADD Person:phoneNum:8675309 Person:1
ZADD Person:age 32 Person:1

This allows you to now perform search-like operations

e.g.

SELECT p.age
FROM People AS p
WHERE p.firstName = 'Bob' and p.lastName = 'Marley' and p.phoneNum = '8675309'

Becomes:

ids = SINTER Person:firstName:Bob Person:lastName:Marley Person:phoneNum:8675309

foreach id in ids:
   age = HGET id age
   print(age)

The key challenge with this approach is that, in addition to being relatively complicated to set up (it really forces you to think about your model), it becomes extremely difficult to maintain atomically, particularly in sharded environments, where cross-shard key constraints can become problematic. Consequently, the keys and the index can drift apart, forcing you to periodically loop through the data and rebuild the index.
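To make the drift problem concrete, here is a minimal in-memory sketch in Python. Plain dicts and sets stand in for Redis hashes and sets (no Redis server is involved), and the names `PersonIndex`, `save`, and `find` are hypothetical. It illustrates that every write has to remove the stale index entries before adding the new ones; skipping that step is how the index and the data drift apart.

```python
from collections import defaultdict


class PersonIndex:
    """In-memory model of hashes plus set-based secondary indices.

    Assumption: dicts and sets stand in for Redis HASH and SET keys.
    """

    INDEXED_FIELDS = ("firstName", "lastName", "phoneNum")

    def __init__(self):
        self.hashes = {}               # "Person:1" -> {field: value}
        self.index = defaultdict(set)  # "Person:firstName:Bob" -> {"Person:1"}

    def save(self, key, fields):
        """Write a record and keep the index sets in step with it."""
        old = self.hashes.get(key, {})
        for field in self.INDEXED_FIELDS:
            if field in old:
                # Counterpart of SREM on the stale index key.
                self.index[f"Person:{field}:{old[field]}"].discard(key)
        self.hashes[key] = dict(fields)
        for field in self.INDEXED_FIELDS:
            if field in fields:
                # Counterpart of SADD on the new index key.
                self.index[f"Person:{field}:{fields[field]}"].add(key)

    def find(self, **criteria):
        """Counterpart of SINTER over one index set per criterion."""
        sets = [self.index.get(f"Person:{f}:{v}", set()) for f, v in criteria.items()]
        return set.intersection(*sets) if sets else set()


idx = PersonIndex()
idx.save("Person:1", {"firstName": "Bob", "lastName": "Marley", "phoneNum": "8675309", "age": 32})
# Renaming the person must also move the id between index sets.
idx.save("Person:1", {"firstName": "Robert", "lastName": "Marley", "phoneNum": "8675309", "age": 32})
```

In real Redis the remove-and-add steps in `save` would be separate commands, so they would need to be wrapped in a MULTI/EXEC transaction or a Lua script to stay atomic on a single node.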

Newer Secondary Indices with RediSearch

Caveat: This uses RediSearch, a Redis module available under the Redis Source Available License.

There's a newer module that plugs into Redis, called RediSearch, that can do all this for you. It lets you declare secondary indices and then takes care of indexing everything for you as you insert it. For the above example, you would just need to run:

FT.CREATE person-idx ON HASH PREFIX 1 Person: SCHEMA firstName TAG lastName TAG phoneNumber TEXT age NUMERIC SORTABLE

That would declare the index, and after that all you need to do is insert stuff into Redis, e.g.

HSET Person:1 firstName Bob lastName Marley phoneNumber 8675309 age 32

Then you could run:

FT.SEARCH person-idx "@firstName:{Bob} @lastName:{Marley} @phoneNumber: 8675309 @age:[-inf 33]"

This returns all the items matching the pattern. See the RediSearch query syntax documentation for more details.

Alkaline answered 6/10, 2021 at 14:58 Comment(1)
Is that really Bob's number?Cosper
B
33

Redis is for use cases where you need to access and update data at very high frequency and where you benefit from its data structures (hashes, sets, lists, strings, or sorted sets). It's made to fill very specific use cases. If you have a general use case like very flexible searching, you'd be much better served by something built for that purpose, like Elasticsearch or Solr.

That said, if you must do this in Redis, here's how I'd do it (assuming users can share names and phone numbers):

name:some_name -> set([id1, id2, etc...])
name:some_other_name -> set([id3, id4, etc...])

phone:some_phone -> set([id1, id3, etc...])
phone:some_other_phone -> set([id2, id4, etc...])

id1 -> {'name' : 'bob', 'phone' : '123-456-7891', etc...}
id2 -> {'name' : 'alice', 'phone' : '987-456-7891', etc...}

In this case, we're making a new key for every name (prefixed with "name:") and every phone number (prefixed with "phone:"). Each key points to a set of ids, and each id leads to a hash with all the info you want for a user. When you search by phone, for example, you'll do:

SMEMBERS 'phone:123-456-7891'

and then loop through the results and return whatever info on each (name in our example) in your language of choice (you can do this whole thing in server-side Lua on the Redis box to go even faster and avoid network back-and-forth, if you want):

for id in results:
    HGET id 'name'

Your cost here will be O(m), where m is the number of users with the given phone number, and this will be a very fast operation on Redis because of how optimized it is for speed. It'll be overkill in your case because you probably don't need things to go so fast, and you'd prefer having flexible search, but this is how you would do it.
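The lookup described above can be sketched in plain Python. This is only a model: a dict of sets plays the role of the "phone:" keys, a dict of dicts plays the role of the per-user hashes, and the function name `names_for_phone` and the third user are made up for illustration.

```python
# Assumption: plain Python containers stand in for Redis keys.
phone_index = {
    "phone:123-456-7891": {"id1", "id3"},
    "phone:987-456-7891": {"id2"},
}
users = {
    "id1": {"name": "bob", "phone": "123-456-7891"},
    "id2": {"name": "alice", "phone": "987-456-7891"},
    "id3": {"name": "carol", "phone": "123-456-7891"},  # illustrative extra user
}


def names_for_phone(phone):
    """Return the names of all users sharing a phone number.

    One read of the set stored at the index key, then one field read
    per matching id, so the work grows with m (the number of matches)
    and not with the total number of users.
    """
    ids = phone_index.get(f"phone:{phone}", set())
    return sorted(users[user_id]["name"] for user_id in ids)
```

Calling `names_for_phone("123-456-7891")` touches only the two matching users, which is the O(m) behaviour described above.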

Beatriz answered 19/6, 2013 at 21:4 Comment(4)
Thanks for the response. I really liked this solution only the space it requires scared me a little but I understand that it is necessary for a workaround. Could you also add the functionality to search for starting digits of phone numbers? (I guess the autocomplete #6401694 in the below post is not really feasible for a larger DB).Meridithmeriel
Yeah, you can add the ability to search for whatever you want. Just add an auxiliary hash for it. Something like phone3:xxx if you wanted to search by first 3 digits for example. For small hashes, Redis will zip them up and store them as strings for you, so you don't take up as much space. You can also change when it does that. More info at redis.io/topics/memory-optimization .Beatriz
Thanks for the link. What do you think the max size of a Redis DB could be in GB? Does the physical RAM limit the DB size?Meridithmeriel
Sort of. Once a Redis db grows past RAM size, it starts swapping and slows down. That completely defeats the point of using Redis in the first place. So, even though you can have data bigger than your RAM in Redis, you really don't want to.Beatriz
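The "phone3:xxx" idea from the comments can be sketched the same way. This is a plain-Python model (sets in a dict instead of Redis keys), and `PREFIX_LEN`, `add_user`, and `ids_for_prefix` are hypothetical names. Each user is added to one extra set keyed by the first three digits of the phone number, which is where the additional space goes.

```python
from collections import defaultdict

PREFIX_LEN = 3                   # matches the "phone3:" example
prefix_index = defaultdict(set)  # "phone3:123" -> {"id1", ...}


def add_user(user_id, phone):
    """Index a user under the first PREFIX_LEN digits of the phone number."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    prefix_index[f"phone3:{digits[:PREFIX_LEN]}"].add(user_id)


def ids_for_prefix(prefix):
    """Return the ids whose phone number starts with the given 3 digits."""
    if len(prefix) != PREFIX_LEN:
        raise ValueError(f"this index only answers {PREFIX_LEN}-digit prefixes")
    # Copy the set so callers cannot modify the index by accident.
    return set(prefix_index.get(f"phone3:{prefix}", set()))


# Illustrative data only.
add_user("id1", "123-456-7891")
add_user("id2", "987-456-7891")
add_user("id3", "123-999-0000")
```

A lookup is a single key access, but the index only answers prefixes of the length it was built for; searching by the first four digits would need a separate set of keys.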
B
11

Redis is awesome, but it's not built for searching on anything other than keys. You simply can't query on values without building extra data sets to facilitate such querying, and even then you don't get true search, just more maintenance, inefficient use of memory, yada, yada...

This question has already been addressed, you've got some reading to do :-D

To search strings, build auto-complete in redis and other cool things...
How do I search strings in redis?

Why using MongoDB over redis is smart when searching inside documents... What's the most efficient document-oriented database engine to store thousands of medium sized documents?

Bergmans answered 19/6, 2013 at 18:45 Comment(1)
Thanks for the response. I had already read the first entry, about implementing autocomplete, before writing the original post, but I think that breaking down all words at each character and adding them cumulatively to a collection would cause a data explosion in any bigger DB, so I don't think it would be feasible. The second link was useful, and I also found the following comparison of different NoSQL DBs useful: kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis Meridithmeriel
D
2

zeeSQL is a novel Redis module with SQL and secondary index capabilities, allowing you to search Redis keys by value.

You can set it up in such a way to track the values of all the hashes and put them into a standard SQL table.

For your example of searching people by phone number and name, you could do something like this:

> ZEESQL.CREATE_DB DB
"OK"
> ZEESQL.INDEX DB NEW PREFIX customer:* TABLE customer SCHEMA id INT name STRING phone STRING

At this point zeeSQL will track all the hashes that start with customer and will put them into a SQL table. It will store the fields id as an integer, name as a string and phone as a string.

You can populate the table simply adding hashes to Redis, and zeeSQL will keep everything in sync.

> HMSET customer:1 id 1 name joseph phone 123-345-2345
> HMSET customer:2 id 2 name lukas phone 234-987-4453
> HMSET customer:3 id 3 name mary phone 678-443-2341 

At this point you can look into the customer table and you will find the result you are looking for.

> ZEESQL.EXEC DB COMMAND "select * from customer"
1) 1) RESULT
2) 1) id
2) 2) name
2) 3) phone
3) 1) INT
3) 2) STRING
3) 3) STRING
4) 1) 1
4) 2) joseph
4) 3) 123-345-2345
5) 1) 2
5) 2) lukas
5) 3) 234-987-4453
6) 1) 3
6) 2) mary
6) 3) 678-443-2341

The result lists first the names of the columns, then the types of the columns, and finally the actual result set.

zeeSQL is based on SQLite and it supports all the SQLite syntax for filtering and aggregation.

For instance, you could search for people knowing only the prefix of their phone number.

> ZEESQL.EXEC DB COMMAND "select name from customer where phone like '678%'"
1) 1) RESULT
2) 1) name
3) 1) STRING
4) 1) mary

You can find more examples in the tutorial: https://doc.zeesql.com/tutorial#using-secondary-indexes-or-search-by-values-in-redis
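Since zeeSQL is described above as based on SQLite, the same prefix query can be tried on plain SQLite. The sketch below uses Python's built-in `sqlite3` module and does not involve zeeSQL or Redis at all; it only illustrates the SQL, with the table filled by hand instead of being synced from hashes. The function name `names_with_phone_prefix` is made up for illustration.

```python
import sqlite3

# In-memory SQLite database holding the same three customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        (1, "joseph", "123-345-2345"),
        (2, "lukas", "234-987-4453"),
        (3, "mary", "678-443-2341"),
    ],
)


def names_with_phone_prefix(prefix):
    """Return customer names whose phone number starts with `prefix`.

    The prefix is passed as a bound parameter; LIKE wildcards inside
    the prefix itself ("%" or "_") are not escaped in this sketch.
    """
    rows = conn.execute(
        "SELECT name FROM customer WHERE phone LIKE ? ORDER BY name",
        (prefix + "%",),
    )
    return [name for (name,) in rows]
```

Note that the pattern is a quoted string in SQL; here the quoting is handled by the parameter binding.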

Distinctive answered 20/2, 2021 at 8:59 Comment(0)
