Because everyone insists on insanely distributed architectures, most people will never really see the point of Redis, which is that if it is running on the same machine as the application, it can respond in much less than a millisecond. That lets you do stuff in the application that you just can't do with Postgres. Postgres kicks ass obviously but it is not running in memory on the same machine as the application.
If all you want to do is queues and whatnot, then sure, you don't need an in-memory KV store.
The point of an in-memory KV store is to do stuff that needs the performance characteristics of RAM. You obviously can't get the performance characteristics of RAM over a network connection. This is like, a tautology.
If the setup is that only one local process will use on-machine Redis as an in-memory cache, you're better off using the data structures available in your programming language.
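A minimal sketch of what "use your language's data structures" means for the single-process cache case (Python just for illustration; all names here are invented): a plain dict plus expiry timestamps, with no serialization, no IPC hop, and no second process to operate.

```python
import time

class TTLCache:
    """Minimal in-process cache: a plain dict plus expiry timestamps.
    For one local process this covers what an on-machine Redis used
    purely as a cache would do (no IPC, no persistence)."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries on read
            return default
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.set("user:42", {"name": "Ada"})
print(cache.get("user:42"))  # hit while fresh
time.sleep(0.06)
print(cache.get("user:42"))  # expired -> None
```

Every lookup here is a dict access in the same address space, which is the "performance characteristics of RAM" point made above.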
Yes, that's assuming a single local process. Redis bills itself as an in-memory data structure store, and at least in my memory, it was mainly pitched for doing interprocess communication with real data structures.
Because it's really, really slow compared to in-language variables? I've never run out of memory because of program variables, and speed shouldn't be traded away for a lower memory footprint when Redis is, at best, a tenth as fast.
Redis locally is good for IPC, though. There are many scenarios where multiple local processes have to exchange data. You have alternatives, but Redis works and can scale up.
Unless you need persistence across process respawning, IPC across lots of subprocesses, or want to trivially inspect and namespace your entire in-memory data.
But yeah redis is not an answer for most single machine questions.
All the other reasons aside - because at some point I might want to move the data cache off of the single machine, and then I can do that with a couple of minutes' work. Yes, then I am back to "do I need Redis, why not use Postgres" - but I got months of better performance out of Redis than I would have with Postgres, and if I need to move to a separate machine, I am probably doing well enough that I can afford it.
Redis helps you get to the point (need cache on separate machines) where you might want to consider moving to postgres, or not because you don't feel it warrants the investment of programmer hours at that point.
on edit: got downvoted for ..? I guess what I said was so supremely stupid I deserved it. I do admit it was perhaps slightly obvious, but not so obvious that it warranted downvoting
The comment was quite clear that you are building today what you need today, and when you need another solution tomorrow you don't need to rebuild everything - you just move it and update some connections. But if you build the other solution - keeping the cache in the application - then when you need to move off the machine you will need to rebuild everything.
How could that very clear statement be misunderstood as "well, I am building a bunch of stuff I might need tomorrow"? It is the exact opposite of that!
I guess it's my fault, nobody on HN seems to understand a single thing I say even though the words seem reasonably clear to me.
If you need that, you can use an embedded data store like leveldb/rocksdb or sqlite. Why bring another application running as its own process into the equation?
There's no universe where a single-threaded embedded persistence implementation is slower than a single-threaded application synchronously talking to a single-threaded database over the network stack.
As far as isolation goes, if you are worried about the properties of reading and writing data to the disk then I simply don't know what to tell you. Isolation from what?
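For the embedded route, Python's stdlib `sqlite3` is enough to sketch the idea: a persistent KV table living in the application's own process, with no second server to run. (`:memory:` is used below only so the example is self-contained; a real file path gives on-disk persistence.)

```python
import sqlite3

# Embedded KV store in-process: durability comes from SQLite's own
# journaling, and there is no separate daemon to deploy or monitor.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")

def kv_set(key, value):
    # INSERT OR REPLACE gives simple upsert semantics for a KV table.
    db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
    db.commit()

def kv_get(key):
    row = db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None

kv_set("session:1", b"payload")
print(kv_get("session:1"))
```

The same shape works with leveldb/rocksdb bindings; SQLite is shown only because it ships with the language.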
Why network stack? On same host IPC over shared memory is a normal thing.
Performance-wise, I do not know of a nice portable way of flushing changes to disk securely that does not block (like, e.g., fsync does).
If you own the whole system and can tune the whole kernel and userspace to run a single application, sure, why overengineer.
Otherwise, software faults (bugs, a crash due to memory overcommit, the OOM killer, etc.) take down a single process, and that can be less disruptive than a full stop/start.
> On same host IPC over shared memory is a normal thing.
Not for Redis.
> I do not know of a nice portable way of flushing changes to disk securely that does not block (like, e.g. fsync does).
If you use Redis, you're either not waiting for writes to be acknowledged or you're waiting on fsync. You always fsync, whether it's in-process or not, or you're risking losing data.
Which process blocks doesn't affect performance; the cost is getting the data to disk in the first place.
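To make the fsync point concrete, a small sketch (the file name is invented for the example): a write is only durable against power loss once the kernel confirms it has reached the device, and `fsync` is the blocking call that gets you that, whichever process issues it.

```python
import os, tempfile

# Durable write: flushing Python's buffer is not enough; the kernel's
# page cache must also be forced to the device before acknowledging.
path = os.path.join(tempfile.mkdtemp(), "journal.log")
with open(path, "ab") as f:
    f.write(b"SET key value\n")
    f.flush()             # push the userspace buffer into the kernel
    os.fsync(f.fileno())  # block until the kernel flushes to the device

with open(path, "rb") as f:
    print(f.read())
```

Skipping the `os.fsync` line is exactly the "not waiting for writes to be acknowledged" trade-off described above.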
> Otherwise software faults (bugs, crash due to memory overcommit, oom killer etc.) take down single process, and that can be less disruptive than full stop/start.
Even worse: Redis crashes and now your application (which hasn't crashed) can't read or write data, perhaps in the middle of ongoing operations. You have a whole new class of failure modes.
Running in its own process, and, better yet, in its own cgroup (container) makes potential bugs in it, including RCEs, harder to exploit. It also makes it easier to limit resources it consumes, monitor its functioning, etc. Upgrading it does not require you to rebuild and redeploy your app, which may be important if a bug or performance regression is triggered and you need a quick upgrade or downgrade with (near) zero downtime.
Ideally every significant part should live in its own universe, only interacting with other parts via well-defined interfaces. Sadly, it's either more expensive (even as Unix processes, to say nothing of Windows), slower (Erlang / Elixir), or both (microservices).
Either fork the process so the forked copy can dump its data (I think Redis itself does something like this), or launch a new process (with updated code if desired), then migrate the data and open sockets to it through some combination of unix domain sockets and maybe mmap. Or if we had OSes that really used the x86 segmentation capability as it was designed (this is one thing the 386 and later did cleverly), it could all be done with segments.
Redis is nice but you take a huge speed hit depending on what you're doing, compared to using in-memory structures. Note that Redis can also persist to disk, either by forking or by writing an append-only op log that can be replayed for reloading or replication. Anyway, you've forgotten the case where you want not only the data structures, but also the open network connections to persist across process restarts. That's what passing file descriptors through unix domain sockets lets you do. It gets you the equivalent of Erlang's hot upgrades.
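A sketch of the fork-to-snapshot trick described above, the same copy-on-write idea behind Redis's background save (Unix-only, simplified here to a dict dumped as JSON; paths and keys are invented): the child sees memory frozen at fork time while the parent keeps mutating.

```python
import json, os, tempfile

# fork() gives the child a copy-on-write view of memory: the child can
# dump a consistent snapshot while the parent serves writes without
# blocking on disk I/O.
data = {"counter": 1}
snapshot_path = os.path.join(tempfile.mkdtemp(), "dump.json")

pid = os.fork()
if pid == 0:
    # Child: sees the data exactly as it was at fork time.
    with open(snapshot_path, "w") as f:
        json.dump(data, f)
    os._exit(0)

data["counter"] = 2    # parent mutates immediately after the fork
os.waitpid(pid, 0)     # wait for the snapshot to finish

with open(snapshot_path) as f:
    snapshot = json.load(f)
print(snapshot["counter"], data["counter"])
```

The snapshot keeps the pre-fork value even though the parent changed it before the child finished, which is the consistency property that makes this technique useful.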
Do you mean fork is a blocking operation? That's surprising but hmm ok. I didn't realize that. I thought that the process's pages all became copy-on-write, so that page updates could trigger faults handled by in-memory operations. Maybe those could be considered blocking (I hadn't thought of it like that) or maybe you mean something different?
> but it is not running in memory on the same machine as the application.
What's the overhead of postgres vs redis, when run locally? Why do you think postgres isn't run locally?
There's nothing special about postgres. It's just a program that runs in another process, just like redis. For local connections, it uses fast pipes, to reduce latency, and you get access to some faster data bulk transfer methods. I've used it in this way on many occasions.
Postgres has a shared memory cache, which can be sized the same as Redis's, so your operations will all happen in memory, with some background stuff putting it onto disk for you in case your computer shuts off. Storage won't be involved.
BUT, postgres still has ~6x the latency [1], even when run from memory.
Pretty pointless benchmark since it does not use prepared statements (that much is obvious from the low TPS, plus I confirmed it with how he ran the benchmark). You need to pass "-M prepared" to pgbench. And it is very possible that the Redis benchmark is equally flawed.
If you have to parse and plan queries every time PostgreSQL is obviously much slower than Redis. It is much more interesting to see what happens if prepared statements are used.
Thanks, I spotted the lack of prepared statements and stopped looking after that but you spotted the rest of the issues too. And, agreed, it is likely that the Redis benchmark is flawed too but I do not know Redis well enough.
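For reference, the flag in question (the database name and duration below are examples, not from the benchmark being discussed):

```shell
# -M prepared makes the server parse and plan each statement once,
# then reuse the plan for every subsequent execution.
pgbench -M simple   -c 1 -T 30 benchdb   # default: re-parse and re-plan every query
pgbench -M prepared -c 1 -T 30 benchdb   # plan once, execute repeatedly
```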
But the whole point here is that PostgreSQL will be used for other tasks e.g. storing all of your business data. So it will be fighting for the shared cache as well as the disk. And of course storage will still be involved as again you can't have in-memory only tables.
And having a locking system fluctuate in latency between milliseconds and seconds would cause all sorts of headaches.
If you are both small enough that you’re considering cohosting the app and DB, the odds are good that your working set is small enough to comfortably fit into RAM on any decently-sized instance.
> And having a locking system fluctuate in latency between milliseconds and seconds would cause all sorts of headaches.
With the frequency that a locking system is likely to be used, it’s highly unlikely that those pages would ever get purged from the buffer pool.
As pointed out by blackenedgem above: PostgreSQL has tablespaces, and one may simply declare tables which should stay in RAM in a tablespace built upon a tmpfs with enough reserved RAM to store them all. There is only a small associated burden (restoring the tablespace while starting PG).
No but it does have the concept of tablespaces. If you want you can map RAM to a disk location, set that up as a tablespace, then tell postgres to use that tablespace for your given table. Also set the table as UNLOGGED while you're at it.
A bit more work, yes, and it could be simplified, but it's fully supported if you control the stack.
Yes. Putting a Postgres tablespace on a RAM disk (tmpfs) does wonders. Even if NVMe may be comparable to RAM by bandwidth, whatever Postgres has to do to ensure that data are written to the durable store as a part of a commit is still significantly slower compared to RAM.
Highly recommended for running tests, especially in CI/CD pipelines. Doing this simple change can speed up DB-heavy tests 30-50%.
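A hedged sketch of the tmpfs-tablespace setup described above (paths, sizes, and names are invented; as noted, the tmpfs mount and tablespace directory must be restored before Postgres starts after a reboot, since their contents vanish on power-off):

```shell
# Back a tablespace with tmpfs so a hot table lives in RAM; combine
# with UNLOGGED to also skip WAL writes for that table.
sudo mount -t tmpfs -o size=2g,uid=postgres,gid=postgres tmpfs /mnt/pg_ram
psql -c "CREATE TABLESPACE ram_space LOCATION '/mnt/pg_ram'"
psql -c "CREATE UNLOGGED TABLE hot_cache (k text PRIMARY KEY, v jsonb) TABLESPACE ram_space"
```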
> PostgreSQL does not have the concept of in-memory tables.
Is that relevant though? Some benchmarks on the web show Postgres outperforming Redis in reads as well as low-volume writes (less than 1k key-value pairs), and Redis only beating Postgres for high volume key-value writes.
Django has caching built in with support for Redis, and it also has an in-memory caching option which they label as “not for production” (because if you have multiple instances of Django serving requests, their in-memory caches will diverge which is...bad).
But for lots of cases, especially internal business tools, we can scale up a single instance for a long time, and this in-memory caching makes things super fast.
There’s a library, django-cachalot [1], that handles cache invalidation automatically any time a write happens to a table. That’s a rather blunt way to handle cache invalidation, but it’s wonderful because it will give you a boost for free with virtually no effort on your part, and if your internal business app has infrequent updates it basically runs entirely in RAM, and falls back to regular database queries if the data isn’t in the cache.
And I think this is the main use case they were looking for. If you have a web app where each request is a separate process/call (not uncommon), and you don’t have a good shared global state, Redis is a great tool. It is an in-memory data structure store that can respond to requests from different processes. I always considered it an evolution from memcached.
If you only have one long-lived process or good global variable control, then it is much less appealing in the single-server scenario. Similarly, if you require access from multiple hosts, it becomes a less obvious choice (especially if you already have a database in the loop). And Redis is also overkill if you're using it only as a cache.
As in performance improvement - a cache should never be considered a datastore, i.e. you can pull the plug and nothing else happens (aside from losing performance). It'd be a lot more beneficial for all the processes to have a local cache themselves. The latter is at least 4 orders of magnitude faster than Redis. Now you may like some partitioning, too.
Perhaps you could elaborate? It would be helpful to understand what Redis can do that cannot be done easily with local memory.
Acting as shared memory for an inherently single-CPU language like JS is one I can think of. However, I don't use Redis, so you'd be better placed to drive the discussion forward with examples.
Redis provides low-level persistent data structures which can be used to implement business logic distributed safely across a number of machines. That’s a LOT harder than in-memory in-process.
My Sidekiq background job system runs entirely on top of Redis. Structures like Sorted Sets become the basis for indexes. Lists provide extremely fast queue behavior and Hashes map easily to persistent objects. Databases, traditionally, have not performed well when used as queues.
Those are the big 3 structures necessary to implement anything: trees, lists and maps.
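To make those three shapes concrete, here are rough in-process analogues in Python (keys and scores invented); what Redis adds on top is persistence and cross-process access to the same structures via commands like ZRANGE, RPUSH/LPOP, and HSET/HGETALL:

```python
from collections import deque

# Sorted set: members ranked by a score -> usable as an index.
scores = {"job:1": 5.0, "job:2": 1.0, "job:3": 3.0}
by_score = sorted(scores, key=scores.get)   # like ZRANGE
print(by_score)                             # ['job:2', 'job:3', 'job:1']

# List used as a queue: push one end, pop the other.
queue = deque()
queue.append("task-a")                      # like RPUSH
queue.append("task-b")
print(queue.popleft())                      # like LPOP -> 'task-a'

# Hash: a field/value map that maps naturally onto a persistent object.
user = {"name": "Ada", "logins": "3"}       # like HSET / HGETALL
print(user["name"])
```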
I'm a bit confused here, because the original comment was that "most people will never really see the point of Redis, which is that if it is running on the same machine as the application, it can respond in much less than a millisecond", to which the response was "there's more to redis than just being a K/V store".
I do see the point of Redis if you have multiple hosts, but I was unsure why someone would use it on just one host.
Right, it's still useful as shared data for multiple processes on the same machine. A SQL database mostly forces your data into one structure: the table. Redis instead provides commands which operate directly on those different data structures I mentioned.
I have Docker running in swarm mode where it will spin up multiple load balanced instances of my web app (where requests can get routed randomly to any instance). So I use Redis to store "User Session Information" so the session can be read/written from any instance, supposedly faster than using a DB.
When I've used redis to store web sessions, it is in fact acting as a k/v store. The session ID is the key, and the (serialized) session state is the value.
Just to add -- sometimes I use Redis when I don't have a trusted impl of HyperLogLog or sorted set (and I have a vague suspicion that I am going to do IPC later -- so it's not worth it to roll my own HLL).
Overengineering/premature distribution is a real problem, but Redis stands for "Remote Dictionary Server." The purpose is very much not to run it locally! (Though that's a legitimate design choice, especially if your language's native dictionary doesn't support range queries.)
It very well could be running on the same machine, and communicating with the app using unix sockets, which is a hell of a lot faster than TCP. But no one seems to be doing that much either.
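A sketch of that setup (the socket path is an example): point Redis at a unix domain socket in redis.conf, then compare transports with redis-benchmark.

```shell
# redis.conf fragment:
#   unixsocket /var/run/redis/redis.sock
#   port 0              # optionally disable TCP entirely
# Then benchmark both transports:
redis-benchmark -s /var/run/redis/redis.sock -t set,get -q   # unix domain socket
redis-benchmark -h 127.0.0.1 -p 6379 -t set,get -q           # TCP over loopback
```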
I feel that the virtualize-and-distribute-everything-to-hell-and-back trend might actually be about to break - there are signs, and G knows it's about time. The amounts wasted on cloud providers for apps that would run everything just fine on a single server, and the effort wasted configuring their offerings - surreal.
> Because everyone insists on insanely distributed architectures, most people will never really see the point of Redis, which is that if it is running on the same machine as the application, it can respond in much less than a millisecond.
I don't think this is a realistic scenario at all.
If you need a KV store, want it to be as fast as possible by keeping it in memory, want it to run on each node, and don't care how much it costs to scale vertically, then you do not run a separate process on the same node. You just keep a plain old dictionary data structure. You do not waste cycles deserializing and serializing queries.
You only adopt something like Redis when you need far more than that, namely you need to have more than one process access shared data. That's at the core of all Redis usecases.
I develop and maintain multiple applications that use a worker pool and are small enough to run on a single host. We used Pg for the user sessions, which get read and written on every single page request. Some of our apps are Internet-facing, and web crawlers can create sessions that get read and written (recording recent pages) as they browse the site. We switched to a Redis service on the same host as the app and saw 3 main benefits: faster session loading and saving, less disk activity on the Pg server (so all other queries run faster), and fewer writes to the Pg WAL, so our backups require drastically fewer GB per day of retention.
After the significant success of the first conversion, we've been working to convert all the rest of our apps.
And no, host language data structures aren't useful because they aren't in shared memory between all the worker processes, and even if we found a module that implemented them in shared memory, we like to be able to preserve the sessions across a host restart, and then we'd need a process to save the data structures to disk and load them back, and by the time we did that we'd have just reinvented redis.
This is the best response so far. Session churn creates lots of db activity but lots of it is of low business value. Better to offload to a separate process.
Also, session data is often blobs, which DBs don't process as efficiently as columnar data.
Going over the network became feasible when HDDs got markedly slower than NICs. It's a nearer thing with NVMe.
I want a “redis” with something akin to the Consul client - which is a sidecar that participates in the Raft cluster and keeps up to date, cheaper lookups for all of the processes running on the same box.
The few bits of data we needed to invalidate on infrequent writes went into Consul, and the rest went into the dumbest (as in boring, not foolish) memcached cluster you can imagine.
But as you say there was the network overhead, and what would be lovely is a 2 tier KV store that cached recent reads locally and distributed cache invalidation over Raft. Consistent hashing for the systems of record, broadcast data on put or delete so the local cache stays coherent.
> [Redis] can respond in much less than a millisecond.
I have no idea how fast Redis can get, but it is entirely possible for an RDBMS to execute a query in well under a millisecond. I have instrumentation proving it. If everything is on the same machine, I would wager that IPC would ultimately be the bottleneck for both cases.