Redis strings vs Redis hashes to represent JSON: efficiency?
I want to store a JSON payload in Redis. There are really two ways I can do this:

  1. Using simple string keys and values.
    key: user, value: payload (the entire JSON blob, which can be 100-200 KB)

    SET user:1 payload

  2. Using hashes

    HSET user:1 username "someone"
    HSET user:1 location "NY"
    HSET user:1 bio "STRING WITH OVER 100 lines"

Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above.

Which is more memory efficient? Using string keys and values, or using a hash?

Froufrou answered 4/5, 2013 at 14:8 Comment(2)
Also keep in mind that you can't (easily) store a nested JSON object in a hash set.Elea
ReJSON can help here as well: redislabs.com/blog/redis-as-a-json-storeStiltner

It depends on how you access the data:

Go for Option 1:

  • If you use most of the fields on most of your accesses.
  • If there is variance in the possible keys.

Go for Option 2:

  • If you use just single fields on most of your accesses.
  • If you always know which fields are available.

P.S.: As a rule of thumb, go for the option that requires fewer queries in most of your use cases.
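To make the rule concrete, here is a minimal sketch in Python; plain dicts stand in for the Redis client (an assumption for illustration only), and the field names come from the question:

```python
import json

# In-memory stand-ins for the two layouts; a real deployment would issue
# GET/SET and HGET against a Redis client instead.
profile = {"username": "someone", "location": "NY", "bio": "..."}

string_store = {"user:1": json.dumps(profile)}  # option 1: one JSON blob per key
hash_store = {"user:1": dict(profile)}          # option 2: one hash field per attribute

# Option 1: even a single field costs a fetch plus a parse of the whole payload.
location_1 = json.loads(string_store["user:1"])["location"]

# Option 2: a single HGET-style lookup returns just that field, no parsing.
location_2 = hash_store["user:1"]["location"]
```

Either way, one access pattern needs one query; the cost difference is in how much data moves and gets parsed per query.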

Squint answered 4/5, 2013 at 14:23 Comment(2)
Option 1 is not a good idea if concurrent modification of the JSON payload is expected (a classic problem of non-atomic read-modify-write).Pinsky
Which is more efficient among the available options: storing the JSON blob as a JSON string or as a byte array in Redis?Noreen

This article can provide a lot of insight here: http://redis.io/topics/memory-optimization

There are many ways to store an array of Objects in Redis (spoiler: I like option 1 for most use cases):

  1. Store the entire object as JSON-encoded string in a single key and keep track of all Objects using a set (or list, if more appropriate). For example:

    INCR id:users
    SET user:{id} '{"name":"Fred","age":25}'
    SADD users {id}
    

    Generally speaking, this is probably the best method in most cases. If there are a lot of fields in the Object, your Objects are not nested with other Objects, and you tend to only access a small subset of fields at a time, it might be better to go with option 2.

    Advantages: considered a "good practice." Each Object is a full-blown Redis key. JSON parsing is fast, especially when you need to access many fields for this Object at once. Disadvantages: slower when you only need to access a single field.

  2. Store each Object's properties in a Redis hash.

    INCR id:users
    HMSET user:{id} name "Fred" age 25
    SADD users {id}
    

    Advantages: considered a "good practice." Each Object is a full-blown Redis key. No need to parse JSON strings. Disadvantages: possibly slower when you need to access all/most of the fields in an Object. Also, nested Objects (Objects within Objects) cannot be easily stored.

  3. Store each Object as a JSON string in a Redis hash.

    INCR id:users
    HMSET users {id} '{"name":"Fred","age":25}'
    

    This allows you to consolidate a bit and only use two keys instead of lots of keys. The obvious disadvantage is that you can't set the TTL (and other stuff) on each user Object, since it is merely a field in the Redis hash and not a full-blown Redis key.

Advantages: JSON parsing is fast, especially when you need to access many fields for this Object at once. Less "polluting" of the main key namespace. Disadvantages: About the same memory usage as #1 when you have a lot of Objects. Slower than #2 when you only need to access a single field. Probably not considered a "good practice."

  4. Store each property of each Object in a dedicated key.

    INCR id:users
    SET user:{id}:name "Fred"
    SET user:{id}:age 25
    SADD users {id}
    

    According to the article above, this option is almost never preferred (unless the property of the Object needs to have specific TTL or something).

    Advantages: Object properties are full-blown Redis keys, which might not be overkill for your app. Disadvantages: slow, uses more memory, and not considered "best practice." Lots of polluting of the main key namespace.

Overall Summary

Option 4 is generally not preferred. Options 1 and 2 are very similar, and they are both pretty common. I prefer option 1 (generally speaking) because it allows you to store more complicated Objects (with multiple layers of nesting, etc.). Option 3 is used when you really care about not polluting the main key namespace (i.e. you don't want there to be a lot of keys in your database and you don't care about things like TTL, key sharding, or whatever).
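As a rough illustration of option 1, here is a minimal Python sketch; the dicts and set are in-memory stand-ins for the Redis client (an assumption for illustration), and the calls mirror the INCR/SET/SADD commands above:

```python
import json

# In-memory stand-ins for Redis structures; a real deployment would issue
# these commands through a Redis client instead.
counters, strings, sets = {}, {}, {}

def create_user(fields):
    counters["id:users"] = counters.get("id:users", 0) + 1   # INCR id:users
    user_id = counters["id:users"]
    strings[f"user:{user_id}"] = json.dumps(fields)          # SET user:{id} '{...}'
    sets.setdefault("users", set()).add(user_id)             # SADD users {id}
    return user_id

def get_user(user_id):
    # GET user:{id} -- one fetch, one JSON parse for the whole object
    return json.loads(strings[f"user:{user_id}"])

uid = create_user({"name": "Fred", "age": 25})
print(get_user(uid))           # the full object back in one round trip
print(sorted(sets["users"]))   # the set is what makes iterating over all users possible
```

The `users` set is the part people often skip: without it, there is no sane way to enumerate all stored users, since `KEYS *` is discouraged in production.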

If I got something wrong here, please consider leaving a comment and allowing me to revise the answer before downvoting. Thanks! :)

Anastice answered 24/9, 2013 at 20:15 Comment(20)
For Option #2 you say "possibly slower when you need to access all/most of the fields in an Object". Has this been tested?Align
@Align - I carefully chose the words "probably slower" because I didn't test. :) But, my theory is that, in most cases, if you are accessing all/most of the fields in an Object, option 1 should be faster than option 2, especially if the Redis server is a remote server.Anastice
hmget is O(n) for n fields; get with option 1 would still be O(1). Theoretically, yes, it's faster.Tsan
How about combining options 1 and 2 with a hash? Use option 1 for infrequently updated data and option 2 for frequently updated data? Say, we are storing articles and we store fields like title, author and url in a JSON string with a generic key like obj and store fields like views, votes and voters with separate keys? This way with a single READ query you get the entire object and can still update dynamic portions of your object quickly? The relatively infrequent updates to fields in the JSON string can be done by reading and writing the entire object back in a transaction.Butane
According to this: (instagram-engineering.tumblr.com/post/12202313862/…) it's recommended to store in multiple hashes in terms of memory consumption. So after arun's optimization, we can do: 1- make multiple hashes storing the json payload as strings for the infrequently updated data, and 2- make multiple hashes storing the json fields for the frequently updated dataCable
In case of option 1, why are we adding it to a set? Why can't we simply use the GET command and check if the return is not nil?Cortege
How do I get the users back based on indexCotta
This section on simple numerical indexes is a really good read. Whether you like to store your data object as a json string with SET or save each tuple with HMSET, you can then create a secondary index with ZADD and query with ZRANGEBYSCORE. This then gives you the keys you need to query with GET or HGETALL. The trick is abstracting out the writing of data in your application to keep all your indexes up to date.Rigel
You can find useful information at this link redislabs.com/ebook/part-2-core-concepts/…Eastwood
@BMiner, very nice post. Can you suggest an updated documentation about the point 1?Dumas
@BMiner, I mean if you know any good site/tutorial that shows a java code walkthrough of this. I have done my tests with spring-data-redis and they do a good job about storing, however for searches is another story. I was wondering if you could provide more links that explains in practice what you suggest for your point 1.Dumas
how about mset, mget with multiple keys and values for json like behaviour?Backspace
One way to optimize what @Anastice mentions for option 1 is to store the string as CSV instead of JSON. I have a JSON array of 1500 items with 20 keys per object. Instead of storing "key": "{\"key1\":\"value1\"}", try "key": "value1,value2"; a lot of those redundant characters such as parentheses, double quotes and backslashes are eliminatedAdalbert
Hi, I am new to redis and just start to learn it today. I am very confused on how to do queries. For both option1 and option2 you recommend, how can I query users whose age is greater than 30? Thanks a lot.Dunkle
One more question, what does SADD users {id} mean? Why do you want to add the id to a Set? I don't really get it. Thanks a lot.Dunkle
@Dunkle - Adding the user ID to a set allows you to keep track of which users are available. Think about it from a querying perspective: you may want a way to iterate through the entire set of users. Using KEYS * or other options are discouraged; it is often better to explicitly keep track in a set. Hope that helps!Anastice
@Dunkle - To query users whose age is > 30, you can either iterate through all users (linear performance), or you can create an explicit index on user ages (probably logarithmic performance). The latter option is a topic of its own, probably for a separate SO question.Anastice
@Anastice use a Sorted Set with age as score?Togoland
@Togoland - My assumption here is that you want fast lookup by user ID. If you need users sorted by age, then yes, a sorted set of users makes a lot of sense!Anastice
Can you not use redis keys as the members of the set? So that you can still look things up by id.Togoland

Some additions to the given set of answers:

First of all, if you are going to use Redis hashes efficiently, you must know the maximum key count and maximum value size; otherwise, if they exceed hash-max-ziplist-value or hash-max-ziplist-entries, Redis converts the hash to practically ordinary key/value pairs under the hood (see hash-max-ziplist-value, hash-max-ziplist-entries). And breaking out of the hash encoding this way IS REALLY BAD, because each ordinary key/value pair inside Redis costs about 90 extra bytes per pair.

This means that if you start with option two and accidentally exceed hash-max-ziplist-value, you will pay about 90 extra bytes for EACH ATTRIBUTE inside your user model! (Actually closer to 70 extra bytes; see the console output below.)

 # you need the me-redis and awesome_print gems to run this exact code
 redis = Redis.include(MeRedis).configure( hash_max_ziplist_value: 64, hash_max_ziplist_entries: 512 ).new
  => #<Redis client v4.0.1 for redis://127.0.0.1:6379/0>
 > redis.flushdb
  => "OK"
 > ap redis.info(:memory)
    {
                "used_memory" => "529512",
          "used_memory_human" => "517.10K",
            ....
    }
  => nil
 # me_set( "t:#{i}", ... ) is the same as hset( "t:#{i / 512}", i % 512, ... )
 # txt is some English fiction book of around 56K length,
 # so we just take a random 63-symbol string from it
 > redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), 63] ) } }; :done
  => :done
 > ap redis.info(:memory)
    {
                "used_memory" => "1251944",
          "used_memory_human" => "1.19M",   # ~72 bytes per key/value
            ....
    }
 > redis.flushdb
  => "OK"
 # setting only one value per hash of 512 values to be +1 byte longer
 # costs the same as setting them all +1 byte longer
 > redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), i % 512 == 0 ? 65 : 63] ) } }; :done
 > ap redis.info(:memory)
    {
                "used_memory" => "1876064",
          "used_memory_human" => "1.79M",   # ~134 bytes per pair
            ....
    }
 > redis.pipelined{ 10000.times{ |i| redis.set( "t:#{i}", txt[rand(50000), 65] ) } }
 > ap redis.info(:memory)
    {
                "used_memory" => "2262312",
          "used_memory_human" => "2.16M",   # ~155 bytes per pair, i.e. +90 bytes
            ....
    }

Regarding TheHippo's answer: the comments on option one are misleading.

hgetall/hmset/hmget come to the rescue if you need all fields or multiple get/set operations.

Regarding BMiner's answer:

The third option is actually really fun: for a dataset with max(id) < hash-max-ziplist-entries, this solution has O(N) lookup complexity, because, surprise, Redis stores small hashes as an array-like container of length/key/value objects!

But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much

But you should not worry: you'll exceed hash-max-ziplist-entries very quickly, and at that point you are effectively back at solution number 1.

The second option will most likely turn into the fourth solution under the hood, because, as the question states:

Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above.

And as already said, the fourth solution is certainly the most expensive: about 70 extra bytes per attribute.

My suggestion for optimizing such a dataset:

You've got two options:

  1. If you cannot guarantee the maximum size of some user attributes, go for the first solution; and if memory is crucial, compress the user JSON before storing it in Redis.

  2. If you can enforce a maximum size for all attributes, then you can set hash-max-ziplist-entries/value and use hashes, either as one hash per user representation OR as the hash memory optimization from this topic of the Redis guide: https://redis.io/topics/memory-optimization, storing each user as a JSON string. Either way, you may also compress long user attributes.
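The me_set bucketing trick from the console session above can be sketched in Python; a dict of dicts stands in for Redis hashes (an assumption for illustration), and the bucket size of 512 matches hash-max-ziplist-entries from the session:

```python
BUCKET = 512  # keep bucket size <= hash-max-ziplist-entries so hashes stay ziplist-encoded

# A dict of dicts standing in for Redis hashes; in production these helpers
# would issue HSET/HGET against a Redis client.
hashes = {}

def me_set(numeric_id, value):
    # HSET t:{id / 512} {id % 512} value
    hashes.setdefault(f"t:{numeric_id // BUCKET}", {})[numeric_id % BUCKET] = value

def me_get(numeric_id):
    # HGET t:{id / 512} {id % 512}
    return hashes[f"t:{numeric_id // BUCKET}"][numeric_id % BUCKET]

me_set(1000, "payload")
print(me_get(1000))  # id 1000 lands in hash "t:1", field 488
```

Because each hash holds at most 512 small entries, every bucket stays under the ziplist thresholds and avoids the ~90-bytes-per-pair overhead of plain keys.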

Selfheal answered 28/5, 2018 at 13:21 Comment(0)

To store JSON in Redis you can use the Redis JSON module.

This gives you:

  • Full support for the JSON standard
  • A JSONPath syntax for selecting/updating elements inside documents
  • Documents stored as binary data in a tree structure, allowing fast access to sub-elements
  • Typed atomic operations for all JSON value types
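For example, storing and querying a document with RedisJSON commands (the key and values here are illustrative):

```
JSON.SET user:1 $ '{"name":"Fred","age":25}'
JSON.GET user:1 $.name
JSON.NUMINCRBY user:1 $.age 1
```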

https://redis.io/docs/stack/json/

https://developer.redis.com/howtos/redisjson/getting-started/

https://redis.com/blog/redisjson-public-preview-performance-benchmarking/

Imminence answered 15/7, 2022 at 20:21 Comment(1)
Great tip, but I wish someone maintained a non-Debian (i.e. secure) container with RedisJSON... but neither Alpine nor Ubuntu (stable) are available...Inhibition

We had a similar issue in our production environment, and we came up with the idea of gzipping the payload if it exceeds some threshold in KB.

I have a repo dedicated to this Redis client lib here

The basic idea is to detect whether the payload size is greater than some threshold; if so, gzip it and base64-encode it, then keep the compressed string as a normal string in Redis. On retrieval, detect whether the string is valid base64 and, if so, decompress it.

The whole compressing and decompressing is transparent, plus you save close to 50% of network traffic.
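The scheme can be sketched with the Python standard library; the 1 KB threshold and the helper names are illustrative assumptions, not the linked library's actual API:

```python
import base64
import gzip
import json

THRESHOLD = 1024  # compress only payloads larger than ~1 KB (illustrative choice)

def serialize(payload: dict) -> str:
    raw = json.dumps(payload)
    if len(raw) <= THRESHOLD:
        return raw
    # gzip, then base64, so the value is still a normal Redis string
    return base64.b64encode(gzip.compress(raw.encode())).decode()

def deserialize(value: str) -> dict:
    try:
        # detect compressed values by attempting a base64 + gzip decode;
        # plain JSON (starting with "{") is never valid base64, so it falls through
        blob = base64.b64decode(value, validate=True)
        return json.loads(gzip.decompress(blob).decode())
    except Exception:
        return json.loads(value)

big = {"bio": "x" * 5000}
stored = serialize(big)            # much shorter than the raw JSON
assert deserialize(stored) == big  # transparent round trip
```

As the comment below points out, attempted-decode detection is fragile; an explicit prefix marker on compressed values is more robust.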

Compression Benchmark Results


BenchmarkDotNet=v0.12.1, OS=macOS 11.3 (20E232) [Darwin 20.4.0]
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.201
  [Host] : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT DEBUG


Method                       Mean        Error     StdDev    Gen 0  Gen 1  Gen 2  Allocated
WithCompressionBenchmark     668.2 ms    13.34 ms  27.24 ms  -      -      -      4.88 MB
WithoutCompressionBenchmark  1,387.1 ms  26.92 ms  37.74 ms  -      -      -      2.39 MB
Frig answered 11/5, 2021 at 8:20 Comment(1)
I remember some exploit on Wiko smartphones where a similar thing existed for SMS: if you sent an SMS looking like a base64 string to someone with an impacted phone, it would try to decode it and display the text, which would be garbage if the data was not "real" base64, and you could even crash the SMS app by sending certain ASCII control characters :D You should really use a prefix, or some other way, to tell if the data is compressed or not!Abdella

You can use the JSON module: https://redis.io/docs/stack/json/ It is fully supported and allows you to use JSON as a data structure in Redis. There are also Redis object mappers for some languages: https://redis.io/docs/stack/get-started/tutorials/

Sensate answered 21/12, 2022 at 16:10 Comment(0)
