
Maybe if antirez doesn't want to read others' simplistic responses on Twitter, he shouldn't make simplistic statements on Twitter. It's clearly a medium that doesn't convey nuance very well. There was a great deal of misinterpretation and talking past one another on all sides, including his.

This was actually a continuation of a conversation that has been going on for several days, starting with an issue even more serious than 99th percentile latency - data loss due to a broken replica-repair strategy. Some very clueful people have pointed antirez at useful literature on the topic, and made specific suggestions to avoid the problem, only to be met with silence or excuses for why throwing away data was actually OK. Is it any wonder that they're frustrated with him, and ready to interpret an ambiguous statement like "the 99% percentile is bad" in a more negative way than he intended? For him to cherry-pick one exchange and present only one side merely continues his own pattern of making constructive conversation almost impossible.

It's not a Twitter problem. It's a people problem.




Did you read the post? "the 99% percentile is bad" is something I never wrote; it was part of a larger sentence where it was obvious that it referred to the 99th percentile figure being too big.

About pointing to relevant papers: things like pointing to the timestamped replication paper, which describes a CP system for replicated state machines, in response to the fact that Redis does not support slaves when the master's disk persistence is turned off, are actually just another instance of why it is not possible to have decent tech conversations on Twitter.

The problem experienced at Stripe is a result of Redis replication's documented behavior, regardless of whether Sentinel is used or not. Redis replication is a very simple system where replicas try to exactly mimic the master, and if the link breaks, they will try to reconnect to it again and again, forever. There is no built-in HA, failover, or the like.
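For people who haven't run this setup, a minimal sketch of what that static topology looks like in the config files (addresses and ports are made up, and the exact directives may vary a bit between versions):

  # master redis.conf - persistence turned off
  save ""
  appendonly no

  # slave redis.conf - statically pointed at the master
  slaveof 192.0.2.10 6379

If the master at 192.0.2.10 comes back empty, the slave will reconnect and resync to that empty dataset, because "mimic the master" is all it knows how to do.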

So before considering failover (and even there, IMHO, pointing to timestamped replication is not very informative, since in distributed systems the details matter, and you can't just give a random reference to a completely different system which happens to have superficially similar issues to fix), there were more useful observations to make.

Like: Hey @antirez, what about supporting master-slave setups where the master can be configured without persistence at all, and yet, when it reboots, it will not be considered viable for reconnections? That is the fundamental problem: stopping the basic Redis replication behavior from working as it does, with slaves that always want to replicate the current master, which is not ok if we want to support a master restarted without persistence (which wipes the dataset on restart).

However, the whole problem with that is that there are a lot of people like you who will regard linking to a paper as a great way to help, and as a very smart thing, while there are others who are instead trying to work to really make stuff better. Before commenting you should make the effort to understand exactly the problem domain and its subtleties: without the exact details, what you say is not relevant in a discussion which is all about details.


See my response to seiji. Exact mimicry might be a defined behavior, but where is it mandated who should mimic whom? Where had you said, before this, that the node with no data must be the one whose (empty) view should prevail?

As for your false dichotomy between reading the literature and getting stuff done, or your "make the effort to understand" ad hominem - go to hell. I've been doing this longer than you, I've been doing it better than you, and I've been writing about it as well. You make the effort to understand the problem before you shoot yourself in the foot yet again.


"but where is it mandated who should mimic whom"

In Redis, if you don't use any HA system like Sentinel, the map is fixed: it is an old-style replication system where the master's IP address is written in the configuration file.

Since the system is not supposed to lose data on restarts, this is fine; but as soon as you want to support a different mode of operation with persistence-less masters, this must be modified, whether Redis is used with HA or not.

Now, if you want to put Sentinel in the mix, the problem with this setup is not that the returning master gets promoted with a broken data set, but that if the restart is fast enough, Sentinel's failure detection is not triggered at all, so the configuration remains the same: what is still, and was before the reboot, the current master of the system has simply been restarted with a wiped data set.
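To make the quick-restart case concrete, here is a sketch of the relevant part of a sentinel.conf (the numbers are made up; the directive names are the standard ones, to the best of my knowledge):

  # sentinel.conf (sketch)
  sentinel monitor mymaster 192.0.2.10 6379 2
  # the master must be unreachable for this long before Sentinel flags it as down
  sentinel down-after-milliseconds mymaster 30000

A master that crashes and comes back, empty, within a couple of seconds never trips that 30-second detector, so no failover is attempted and the slaves simply resync with the same, now empty, master.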

In distributed systems terms this is a process not acting as specified, since Redis processes must reload their dataset on restart, otherwise they break everything, by design.

Now, is it a good idea to change this design? In the future, yes. But so far we did not have diskless replication, so for a replica to synchronize the master needed to write to disk anyway; so why turn off persistence, and why support doing so, if the disk is needed anyway?

See? Arguments instead of random babbling, and we can construct a reality or a model we both agree on. Then we can debate about what we think is right or not.


"it is an old-style replication system where there is the master IP address written in the configuration file."

"Old style" doesn't explain it. Even if the original master has unconditional priority when it returns, that does not preclude it gathering whatever data might still exist from the others. I've been working on replication systems since '92, and I can't recall seeing any that would make such a poor choice. Can you point to any, or is this really a "new style" idea?

"we can construct a reality or a model we both agree about."

I will concede that the behavior might be compliant with how the system was specified. I'm not 100% convinced yet, but at least - now - you've made a credible case for that.

"Then we can debate about what we think is right or not."

Not. The data-preserving mechanisms (e.g. view/epoch IDs) are so easy to implement in this case that leaving them out is unjustifiable. You even seem to be coming around to that view yourself when you say "in the future yes" but apparently you can't bring yourself to admit that it was always the right choice.
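To make "easy to implement" concrete, here is a rough Python sketch of the kind of check I mean. This is not Redis code and all the names are hypothetical; it only illustrates how a single monotonic epoch lets a replica refuse to wipe its data for a reborn, amnesiac master:

  # Hypothetical illustration, not Redis internals: a replica remembers the
  # highest replication epoch it has accepted, and refuses a full resync
  # from a "master" whose epoch is older than its own.
  class Replica:
      def __init__(self):
          self.epoch = 0       # last epoch accepted from a master
          self.dataset = {}    # the replicated data

      def on_full_resync(self, master_epoch, master_dataset):
          if master_epoch < self.epoch:
              # The master knows less than we do: keep our data and surface
              # the problem instead of silently discarding everything.
              raise RuntimeError("stale master epoch; refusing to wipe data")
          self.epoch = master_epoch
          self.dataset = dict(master_dataset)

A master restarted without persistence would come back at epoch zero, fail that check, and the already-replicated data would survive.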


> Not. The data-preserving mechanisms (e.g. view/epoch IDs) are so easy to implement in this case that leaving them out is unjustifiable. You even seem to be coming around to that view yourself when you say "in the future yes" but apparently you can't bring yourself to admit that it was always the right choice.

It's up to you to form your opinion, but here are the facts:

1) Before diskless replication: even a master with persistence turned off had to persist on disk in order to support slaves.

2) Because of "1", it looked futile to support this mode of operation.

3) "1" is no longer true, I merged the diskless replication stuff just this morning.

Still I don't think you can form a fully informed idea unless you consider this: if you have persistence turned on in a setup which uses replication, like in most deployments using replication, you absolutely want the old behavior of Redis, of slaves reconnecting and replicating again on master restarts.

So when I say that in the future this could change, I mean it just as an opt-in option in order to support this new use case, not to say the old behavior was crazy.
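For reference, the diskless replication mentioned in "3" is exposed as a config switch along these lines (a sketch - treat the exact names and defaults as approximate):

  # redis.conf (sketch)
  # send the RDB to slaves over the socket instead of writing it to disk first
  repl-diskless-sync yes
  # wait a few seconds so several slaves can share the same transfer
  repl-diskless-sync-delay 5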


does not preclude it gathering whatever data might still exist from the others.

It absolutely does preclude that because it's not how the system works. You've described a multi-master system while comparing it against a replication-only system. Apples and racecars.

leaving them out is unjustifiable.

Feel free to submit a pull request fixing any and all deficiencies you've found. :)


"You've described a multi-master system"

No, I haven't, unless you'd say it was already a multi-master system because it allows non-masters to continue without the master being present. What I'm suggesting is just a master being smart enough to recover its own state from where it had been replicated before. How is that even controversial? In what possible use case is it preferable to discard readily available data without an explicit user request to do so?

"Feel free to submit a pull request fixing any and all deficiencies you've found."

Give me some reason to believe it won't be torpedoed by the next bad decision or failure of diligence that comes to light, and I might. Acknowledging that this needs to be fixed would help.


This whole little teapot tempest may be a people problem, but many of the issues brought up in the article - removal of context, selective re-tweets, general nastiness - are Twitter problems.


I think they are people problems.

Everyone likes being right + feeling important. Everyone is lazy. No one wants to be called out on it.

If you are forced to keep it <140 char then no one can fault you for being lazy with your replies and critiques. Of course people are going to jump in and offer their brilliant opinions. Think about the low-cost high they can get.


This is a bit of a false dichotomy, I think. Of course there are good (and valid!) psychological explanations for the behavior patterns exhibited on Twitter.

When I say something is a "Twitter problem," I mean that these patterns are less prevalent on other platforms. I mean that Twitter has created a scaffold for communication that encourages certain bad behaviors.

One example: If I were to pluck the middle sentence from your response and criticize you for saying that "Everyone is lazy" here on HN, I'd be downvoted into oblivion because I am obviously being a jackass. Twitter's structure can make it very hard to see when someone is being misquoted or their views misrepresented. It's the Fox News soundbite version of online discussions.


I don't deny that Twitter constrains conversation in a way that can be negative, but is that what's going on in this case? Consider: antirez's blog post represents exactly the kind of one-sided un-nuanced polemic that he blames on Twitter, even though he has infinite space in his chosen medium to do better. What more could he have done to prove that the medium isn't the problem?

The problem is desire for control of the message. People who want that sort of control should just issue press releases. People who try to use the Twitter megaphone to promote their ideas, their projects, or themselves have to understand that others are doing exactly the same thing and sometimes the messages will conflict. The community into which antirez dropped this particular comment is one full of people running their own data-storage projects, academics promoting their own ideas, and others with more abstract (but no less passionate) beliefs about things like data protection or 99th percentile latency. I'm part of that community, and I've certainly had to endure pot shots against me or my project because of my presence on Twitter. It's part of the territory - just as it is on sites like this, or has been since forever on Usenet and BBSes and all the way back to the first town square. Among a thousand competing voices, yours might not be heard or understood perfectly.


Yeah, I can see your point - it's a "Twitter problem" in the sense that Twitter leaves that door open.


data loss due to a broken replica-repair strategy.

That's the thing: Redis has no read repair strategy. Redis has a "be an exact copy of your master" strategy. It's not a secret. It's the exact design.

The complaints are like yelling at Linus when you rm -rf / your entire machine. Sure, it sucks, but it's a repercussion of your own actions, not a fault in the system. If you don't want to rm -rf / your machine, go use an OS designed for babies (cough ubuntu cough).

(Plus, there are already Redis improvements (designed within a day or two of the original problem being reported) to provide workarounds to users who _do_ want to run that exact use case. The answer to problems is solutions—not complaining and blaming endlessly.)

"the 99% percentile is bad" in a more negative way than he intended?

Text. It's only text. You can't read the intonation and people want to read absolutes. People want to read anger. Always anger. Always confrontation. It's possible the author of the text didn't mean to insult your mother. Breathe. It'll be okay.

The entire goal of the Internet is to get ALL THE ATTENTION YOURSELF. If anybody hates you online, it's because you got attention and they didn't. Nobody hates insignificant people. So, often times people with lower profiles/attention in conversations will try to increase their attention profile by arguing/hating the high-attention people.

If people hate you, you've already won.


When you say "be an exact copy of your master" is the strategy, you're kind of missing the problem here - who should be the master? In this case, Redis made a really dumb choice of which node should be master, and brought everything in sync by replicating emptiness instead of replicating data. There are very few use cases where that would be preferable. People have pointed out well known and fairly simple solutions to the selection problem, which are leading to the fixes and workarounds you've mentioned (though without the courtesy of acknowledging where the ideas came from).

Your analogy to "rm -rf /" is invalid, because that's well defined, well documented, and well known behavior. That's not true of Redis's autonomous and non-deterministic response to a failure (not to a user action). No spec or doc precluded choosing a different master and preserving data instead of discarding it. In the absence of such explicit guidance, preserving data should always be the default. How can it be user error when the user did nothing? Redis did the wrong thing because something was missed in its implementation, not because of any rational or deliberate choice.


you're kind of missing the problem here - who should be the master?

The choice of master is a static configuration set by the user.

Redis itself has no failover or promotion ability. There's an additional thing called Sentinel that can fail over and promote individual Redis instances, but it is designed to recover from complete instance failures (ones where the instance doesn't immediately restart), so a quick restart means no failover happens [an improvement to the "quick restart" scenario is showing up soon].

Redis made a really dumb choice of which node should be master,

(see previously; master is static, defined by the user)

pointed out well known and fairly simple solutions to the selection problem

(redis doesn't select things)

Also, this issue showed up last week. Last week. People are making it sound like this issue has been ignored for years. Nobody ran into this (and reported it) until recently. This use case is already being adapted into standard Redis capabilities, coming soon.

Try running into a big problem with any other DB and getting both attention and a concrete fix within two weeks. For free. The entire progress of the project has paused to address these immediate user issues.

because that's well defined, well documented, and well known behavior.

The Redis behavior is: always be a copy of a statically configured master. When the master has an empty dataset, all the replicas replicate an empty dataset. Pretty simple. :)

No spec or doc precluded choosing a different master and preserving data instead of discarding it.

Yup, specs and documentation did exactly that. Redis has no failover capability on its own.

preserving data should always be the default.

Oops, you just re-invented the Mac trashcan.

How can it be user error when the user did nothing?

The user disabled persistence, enabled replication, restarted the process with zero data, then the replication recovered and stayed in sync with the newly zero-data master.
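Spelled out as a rough reproduction (ports, key names and exact flags are approximate), the sequence looks like this:

  # master, persistence disabled
  redis-server --port 6379 --save "" --appendonly no
  # slave, statically pointed at the master
  redis-server --port 6380 --slaveof 127.0.0.1 6379

  redis-cli -p 6379 set mykey somevalue   # replicated to the slave
  kill -9 <master pid>                    # hard kill; nothing on disk
  redis-server --port 6379 --save "" --appendonly no   # master returns, empty

  # the slave reconnects to the same address, resyncs, and is now empty too
  redis-cli -p 6380 get mykey             # -> (nil)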

something was missed in its implementation, not because of any rational or deliberate choice.

nopers. more a lack of thinking it through from the user's point of view. an exact copy of nothing ends up being nothing.


"People are making it sound like this issue has been ignored for years."

Hasn't it been? How does leaving that latent in the system for years make things better? I rather think it reflects an inability to reason about failure modes (including user failure modes), and to deal with them proactively instead of after data was lost.


Arguing about developers not being omnipotent isn't very stable.

The users intentionally configured their options and the system responded exactly as it should have, given what it was asked to do.


This isn't about omniscience (not omnipotence, BTW). This is about a far lower standard of basic diligence, expected and met by most people who work on data-storage systems. If you're given some data to store, and there's an obvious way to retain/recover that data despite an intervening failure, then failing to do that is a betrayal of the most basic trust people put in data-storage systems. Congratulations, you've implemented the distributed-system equivalent of linking fsck to mkfs. Well done. Go pat yourself on the back for conforming to your specification.


I don't think you're understanding redis or this problem correctly.

Redis lets you have slaves which mirror the master. Hundreds of thousands of redis installations use this pattern to provide read scaling and offline master-loss persistence, and in the normal case, this works great. I myself have implemented systems with hundreds of redis instances which have gracefully survived the loss of the primary.

In this particular instance, the user turned off persistence, didn't understand the ramifications, and then brought the master back up with an empty database after a hard kill without thinking things through.

Fortunately, the user was savvy enough to have kept backups off the slaves, as is the usual pattern, and so was able to continue service.

This is not a normal pattern and goes against the general practice.

Does that help?


I understand what you're saying, but I don't think it's a sufficient reason to throw away data. I've seen hundreds of cases where a GlusterFS user went against our advice and did something that ended up making things worse. Sometimes they even lost data. Of course, they always blame us. I'm pretty sure people who have worked on every single data-storage system ever have had similar experiences. Sometimes the user is just wrong and it's their own fault. Sometimes they're right because we made it far easier for them to make things worse than to make things better. In those cases we have to stop making excuses like "user error" or "RTFM" or "against general practice" or whatever. We need to help the user by not handing them bags of explosives. Which do you think is a better choice here?

* Default to preserving already-replicated data, provide "clean start" as an option.

* Default to throwing away data, maybe-someday implement an option to use data that's already present in the system.

Blaming the user won't prevent another user from making the same mistake with the same result. Saner defaults, and an implementation to support them, will. Who's going to complain that you saved too much of their data?


The defaults are sane, and in fact the user here had to explicitly turn them off in order to do the thing they wanted to do. Once you reach into a configuration file and change a setting, I can't think of a software system in the world that protects you from your choice. Could you maybe name a few?


The user turned off persistence. There's no reason for a normal person to suppose that also means ignoring data that's in the system when the master comes up. The fact that the two are inextricably tied to one another in the Redis implementation is not the user's mistake.


What do you think a slave should do if it is told to replace its state with empty state? How about half-empty state? There's really no answer that's satisfying for every possible use case (certainly I don't want my slaves to refuse if I tell them to clear the database completely on purpose). And indeed you haven't given any examples of databases that try to do 'better'. I think that's because there aren't any.



