Redis 10x more memory usage than data

18

I have a small question.

I am trying to store a wordlist in Redis. The performance is great.

My approach is to make a set called "words" and add each new word via SADD.
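The loading code is essentially the following (a sketch using redis-py; the file name is a placeholder):

import redis

r = redis.Redis()

# "wordlist.txt" stands in for the 15.9 MB file, one word per line
with open("wordlist.txt") as f:
    pipe = r.pipeline(transaction=False)
    for i, word in enumerate(f, 1):
        pipe.sadd("words", word.strip())
        if i % 10000 == 0:   # flush in batches so the pipeline stays small
            pipe.execute()
    pipe.execute()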

Here's the problem: when adding a file that's 15.9 MB and contains about a million words, the redis-server process consumes 160 MB of RAM. Why am I using 10x the memory, and is there a better way of approaching this problem?

Thanks in advance.

2012-04-04 03:33
by cwoebker


80

Well, this is expected of any efficient data storage: the words have to be indexed in memory in a dynamic data structure of cells linked by pointers. The size of the structure metadata, the pointers, and the memory allocator's internal fragmentation is the reason why the data takes much more memory than a corresponding flat file.

A Redis set is implemented as a hash table. This includes:

  • an array of pointers growing geometrically (powers of two)
  • a second array may be required when incremental rehashing is active
  • singly-linked list cells representing the entries in the hash table (3 pointers, 24 bytes per entry)
  • Redis object wrappers (one per value) (16 bytes per entry)
  • the actual data themselves (each item prefixed by 8 bytes for size and capacity)

All the above sizes are given for the 64-bit implementation. Accounting for the memory allocator overhead, it results in Redis taking at least 64 bytes per set item (on top of the data) for a recent version of Redis using the jemalloc allocator (>= 2.4).
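As a rough back-of-the-envelope check against the numbers in the question:

1,000,000 items x 64 bytes of per-item overhead ≈ 64 MB
+ ~16 MB of raw word data
≈ 80 MB as a lower bound

The observed 160 MB sits above that floor, the gap being covered by the "at least" part: string headers, the spare capacity of the hash table arrays, and allocator fragmentation.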

Redis provides memory optimizations for some data types, but they do not cover sets of strings. If you really need to optimize memory consumption of sets, there are tricks you can use though. I would not do this for just 160 MB of RAM, but should you have larger data, here is what you can do.

If you do not need the union, intersection, and difference capabilities of sets, then you may store your words in hash objects. The benefit is that hash objects can be optimized automatically by Redis using zipmap if they are small enough. The zipmap mechanism has been replaced by ziplist in Redis >= 2.6, but the idea is the same: using a serialized data structure which can fit in the CPU caches to get both performance and a compact memory footprint.

To guarantee the hash objects are small enough, the data could be distributed according to some hashing mechanism. Assuming you need to store 1M items, adding a word could be implemented in the following way:

  • hash it modulo 10000 (done on client side)
  • HMSET words:[hashnum] [word] 1

Instead of storing:

words => set{ hi, hello, greetings, howdy, bonjour, salut, ... }

you can store:

words:H1 => map{ hi:1, greetings:1, bonjour:1, ... }
words:H2 => map{ hello:1, howdy:1, salut:1, ... }
...

To retrieve or check the existence of a word, do the same thing (hash it and use HGET or HEXISTS), as sketched below.
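A minimal sketch of both operations, assuming a Python client (redis-py); the CRC32 hash, the bucket count of 10000, and the function names are just illustrative choices:

import zlib

import redis

r = redis.Redis()
NUM_BUCKETS = 10000   # ~100 words per bucket for 1M items, well under the limits below

def bucket_key(word):
    # client-side hash, modulo the number of buckets
    return "words:%d" % (zlib.crc32(word.encode()) % NUM_BUCKETS)

def add_word(word):
    # HSET words:<bucket> <word> 1
    r.hset(bucket_key(word), word, 1)

def has_word(word):
    # HEXISTS words:<bucket> <word>
    return r.hexists(bucket_key(word), word)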

With this strategy, significant memory savings can be achieved, provided the modulo of the hash is chosen according to the zipmap configuration (or ziplist for Redis >= 2.6):

# Hashes are encoded in a special way (much more memory efficient) when they
# have at max a given number of elements, and the biggest element does not
# exceed a given threshold. You can configure this limits with the following
# configuration directives.
hash-max-zipmap-entries 512
hash-max-zipmap-value 64

Beware: the names of these parameters have changed with Redis >= 2.6 (hash-max-zipmap-entries and hash-max-zipmap-value became hash-max-ziplist-entries and hash-max-ziplist-value).

Here, modulo 10000 for 1M items means 100 items per hash object, which will guarantee that all of them are stored as zipmaps/ziplists.
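To check that the optimization is effective, you can ask Redis for the internal encoding of one of the buckets (the key name is illustrative):

redis-cli OBJECT ENCODING words:1234

A reply of "zipmap" (Redis 2.4) or "ziplist" (Redis >= 2.6) means the compact encoding is in use; "hashtable" means one of the limits above was exceeded.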

2012-04-04 09:15
by Didier Spezia
Fascinating and detailed answer; I didn't know that. Thanks @Didier - Ofer Zelig 2012-04-04 12:25
Alright, thanks a lot, I am pretty positive that this will solve my problems. And yeah, for 160 MB it's fine, but I am expecting to work with up to 1 GB of plain word data and didn't want that to spike to 10 GB. Thanks a lot again, appreciate the detailed answer - cwoebker 2012-04-04 15:35
@Didier - Great answer! Couple of corrections though a) Hashtable entries are a single linked list, not double, 24 bytes overhead is correct though b) Redis object wrapper does not apply to each set/hash entries. It only applies to the top level key/value pair - so that overhead is constant c) You may want to indicate that zipmap is deprecated in 2.6/unstable, and that ziplist does the equivalent thing - Sripathi Krishnan 2012-04-05 06:58
@SripathiKrishnan - thanks, I have updated my answer. I still think that robj usage applies to all set keys though. I refer to the setDictType structure in redis.c and the corresponding functions, which define this behavior - Didier Spezia 2012-04-05 07:41
@DidierSpezia - re. robj usage : yes, you are right. Dunno how I overlooked that wrapper - Sripathi Krishnan 2012-04-07 19:18
Very nice & detailed answer.. Thanks. + - Amol M Kulkarni 2014-12-04 10:38
One caveat/correction: "Here, modulo 10000 for 1M items means 100 items per hash object, which will guarantee that all of them are stored as zipmaps/ziplists." ... Yet above that you give an example of hashing the words, then taking them modulo 10000. It won't evenly generate 100 items per hash bucket. Basically, some hashes, under some keys, will easily have more than 100 entries in them, due to how random hash codes distribute. Best to set hash-max-zipmap-entries well above 100 - Zombies 2016-10-08 21:10
Regarding "Instead of storing: words => set{ hi, hello, greetings, howdy, bonjour, salut, ... } you can store: words:H1 => map{ hi:1, greetings:1, bonjour:1, ... } words:H2 => map{ hello:1, howdy:1, salut:1, ... }" - could you explain in more detail how you do that? Thank you so much @DidierSpezia - Sơn Lâm 2018-04-20 20:36



5

In my experiments, it is better to store your data inside a hash table/dictionary. The best case I reached after a lot of benchmarking was to store data in hashes that do not exceed 500 keys each.

I tried standard string SET/GET: for 1 million keys/values, the size was 79 MB. That becomes huge if you have big numbers, like 100 million keys, which would use around 8 GB.

I tried hashes to store the same data: for the same million keys/values, the size was a much smaller 16 MB.

Give it a try. If anybody needs the benchmarking code, drop me a mail.
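For reference, a simple way to take such measurements is Redis' own memory reporting (a sketch; it assumes a dedicated test instance, since FLUSHALL wipes the database):

redis-cli FLUSHALL                                 # destructive: start from an empty instance
# ... load the test data set (plain strings or bucketed hashes) ...
redis-cli INFO memory | grep used_memory_human     # total memory reported by Redis after loading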

2014-02-26 18:23
by msoliman
How did you perform those measurements? Thanks - Jorge Lavín 2018-11-16 06:48


2

Did you try persisting the database (BGSAVE for example), shutting the server down and getting it back up? Due to fragmentation behavior, when it comes back up and populates its data from the saved RDB file, it might take less memory.

Also: what version of Redis do you work with? Have a look at this blog post - it says that fragmentation has been partially solved as of version 2.4.
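A sketch of that sequence from the shell; the configuration file path is just an example for a typical install:

redis-cli BGSAVE                                   # persist the dataset to dump.rdb in the background
redis-cli SHUTDOWN                                 # stop the server (it saves on exit if persistence is configured)
redis-server /etc/redis/redis.conf                 # restart; the dataset is reloaded from the RDB file
redis-cli INFO memory | grep used_memory_human     # compare with the value before the restart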

2012-04-04 08:22
by Ofer Zelig