Hacker News

Memory fragmentation is largely determined by the allocator you're using (e.g. glibc malloc, jemalloc, tcmalloc), and previously it was up to the allocator and OS to manage it (i.e. freeing up unused memory).

Now with active memory defragmentation things are a bit more pleasant, specifically with high delete load actually freeing up unused memory in a timely manner without impacting performance too much.

Previously, to fully recover unused memory, you would have to restart the Redis server. Obviously this is not ideal, but when Redis is using >50% more memory than it should on a 120GB machine, you have to consider it as an occasional housekeeping option -- now, as mentioned, that ridiculous task is no longer necessary.
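A toy sketch of why the memory can't be handed back (a deliberately simplified arena model, not how glibc actually works): after a heavy delete load frees every other allocation, the free holes sit in the middle of the arena, so the process still holds all the pages even though half the bytes are dead.

```python
# Toy model of external fragmentation: an arena that can only shrink
# from the top, like an allocator that returns memory to the OS in
# contiguous chunks.
def simulate(block_size=16, count=1000):
    arena = [block_size] * count      # allocate count blocks back to back
    for i in range(0, count, 2):
        arena[i] = None               # free every other block ("high delete load")
    # trim only the free tail -- holes in the middle stay pinned
    while arena and arena[-1] is None:
        arena.pop()
    live = sum(b for b in arena if b is not None)
    resident = len(arena) * block_size
    return live, resident

live, resident = simulate()
print(resident / live)  # → 2.0: the process holds twice the memory it uses
```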




>Now with active memory defragmentation things are a bit more pleasant, specifically with high delete load actually freeing up unused memory in a timely manner without impacting performance too much.

Is Redis managing this itself using simple calls to jemalloc, or is jemalloc doing it on its own because it has better algorithms than the OS?


With regard to better algorithms than the OS -- jemalloc is not quite there yet.

Here is an illuminating discussion between the Redis and jemalloc devs on this:

https://github.com/jemalloc/jemalloc/issues/566

Redis performs its own housekeeping, hence the term 'active.' AFAIK, rather than storing metadata for this, it runs a periodic active scan and measures fragmentation manually.

See more here:

https://github.com/antirez/redis/pull/3720
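The gist of the approach, as a hedged toy sketch (a list standing in for the heap, not the actual Redis code): scan periodically, measure fragmentation, and when it's too high, copy live allocations into densely packed storage so whole regions become free again.

```python
def fragmentation(arena, block=16):
    # resident-to-live ratio; 1.0 means no wasted space
    live = sum(1 for b in arena if b is not None) * block
    return (len(arena) * block) / live

def active_defrag(arena):
    # Move each live block into a fresh compact arena; the old,
    # hole-riddled region is now entirely free and can be released.
    return [b for b in arena if b is not None]

arena = [16 if i % 2 else None for i in range(1000)]  # half deleted
print(fragmentation(arena))                 # → 2.0
print(fragmentation(active_defrag(arena)))  # → 1.0
```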


120GB on a single thread, yeah. Redis has been abused for quite some time, but 120GB is way out of reasonable reach for current CPU architectures if you use only a single core (even with hyper-threading switched off).


Call me a slowpoke, but I wasn't aware that Redis was single core!


That makes it great for storing distributed locks and semaphores, and thanks to Lua scripting it can store/update cache invalidation lists -- e.g. https://github.com/Suor/django-cacheops


just randomly curious here - why? hash time gets absurd?


Presumably, the time complexity of most operations corresponds directly with CPU usage.

See here for an example with RPUSH: https://redis.io/commands/rpush

RPUSH is O(1), which is the best you can possibly get.

ZADD on the other hand is O(log(n)), which is quite good, but at large scale and high write rates it becomes easier to run into performance limitations. Performing O(n) operations on large data sets is out of the question unless you're comfortable with Redis being unavailable for multiple seconds/minutes.
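Back-of-the-envelope, assuming a hypothetical 50ns per step and 100M keys (both numbers made up for illustration), the single thread is blocked for roughly:

```python
import math

n = 100_000_000       # hypothetical number of elements
step_ns = 50          # hypothetical cost of one comparison/step

rpush_ns = 1 * step_ns            # O(1): ~50 ns
zadd_ns = math.log2(n) * step_ns  # O(log n): ~27 steps, ~1.3 microseconds
scan_s = n * step_ns / 1e9        # O(n): blocks the whole server for seconds

print(rpush_ns, round(zadd_ns), scan_s)  # → 50 1329 5.0
```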

In practice, you will have to limit yourself to operations that are computationally cheap (e.g. O(1) or O(log(log(n)))), or pre-shard across multiple instances (or run a Redis cluster).
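Client-side pre-sharding can be as simple as hashing the key to pick an instance (a minimal sketch; `SHARDS` and `shard_for` are made-up names, and real Redis Cluster instead maps keys to 16384 hash slots via CRC16):

```python
from zlib import crc32  # any stable hash works for client-side sharding

SHARDS = 4  # hypothetical number of Redis instances

def shard_for(key: str) -> int:
    # The same key always routes to the same instance, so single-key
    # operations never need cross-instance coordination.
    return crc32(key.encode()) % SHARDS

# e.g. send commands for "user:42" to connections[shard_for("user:42")]
```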


Redis on MOSIX might be an interesting experiment.


That's only part of the "OS" if you count the installed libc as part of the OS, as opposed to the kernel. Heap management is done in userspace.



