Move Fast and Break Nothing (zachholman.com)
459 points by bpierre on Oct 8, 2014 | 83 comments



The problem with the new motto is that it is uncontroversial. As Zuck himself once mentioned at Startup School, corporate mottos are valuable only if they are controversial, that is, "be honest" is not a valuable motto, because it is obvious and uncontroversial, whereas "Move Fast and Break Things" is not so: it says something about prioritization that not everyone agrees with, and thereby communicates where the company stands on this issue. Everyone wants to move fast as long as nothing gets broken. I agree that it is possible to leverage technology to move faster while keeping things less broken, but it does not change the fundamental dynamic that at some point, you will have to trade _some_ stability for speed, and the new motto does not take a stance on which one it prefers at all.


Move Fast (and Break Things) is the unwritten motto of every mediocre dev shop out there (breaking things being an unintended side effect of moving fast). It's not controversial.


> ...is the unwritten motto...

Exactly, it's unwritten because no dev shop would explicitly market themselves as being willing to break things in the name of speed. It's an unfortunate side effect that they don't want to call attention to, whereas Facebook wears it as a badge of honor.


Facebook, where the users are the QA department


This is the most important point here. Facebook can afford to do this because their users don't care. No one really needs Facebook or depends on it to work at critical times. Most products are not like that. Imagine what would happen if Gmail started to move fast and break things. Oops, some of your email vanished.


On the contrary, I have to believe they do care. Many here might not remember or know, but early on, Facebook was very unstable due to the sheer server load. Performance is still an important metric for them. It is why they created HHVM, Hack, and React.js.

How often do you hear of Facebook going down or breaking completely?


I agree, remember Twitter? It started getting popular quite a while ago, but the servers couldn't handle the load, which led to a lot of downtime.

Eventually, users just left twitter and they shut down the site a year later.


You say it like it's a bad thing.


Sure, break things when you're trying out a lot of new features. I get that, and for the user-facing side of Facebook, it makes sense. It doesn't make sense (and is extremely frustrating) when you apply the same principle to your third-party APIs.


It's also the motto of many good companies. At least in my mind, "move fast and break things" means that you iterate quickly and move into testing fast, to find the errors there.

Of course, this works better with experimental APIs than with things you have to put on a CD and ship.


This is the problem with such mottos. "Move fast and break things" means exactly that, i.e. move fast and break things. It's just a catchy soundbite, nothing else. It's up to every one of us to fill in the missing meaning; therefore everyone arrives at an interpretation he/she likes, which might not be the same interpretation as anyone else's.


And in the "ships CDs" camp, we have Adobe and Microsoft, long held as bastions of robust, stable, well-tested software.


There are other camps too. Like software consulting firms building software for large enterprises like banks, insurance companies etc. The consequences of breaking things are all written down in very exact legal terms. It is not at all fun when a regression bug shows up.


Perhaps, but when it's applied companywide, it's a bad idea.


It is perhaps controversial for people at the top, not for a dev.


The irony is that moving fast is itself a side effect of breaking things. I've noticed that if you prioritize speed at the expense of quality, you get neither, but when you prioritize quality you tend to get both.


As is Move Slow (and Break Nothing)


Well, given that "move fast and break nothing" does have that catchy je ne sais quoi (at least to me) that makes a motto resonate, I'm assuming it is controversial in some way. Maybe not in whether it's a good idea but in whether it's even possible. I think it is a pretty controversial/shocking claim to say that you can move with startup efficiency without sacrificing the reliability that slow-moving large corporations with pointy-haired bosses seek. If the industry readily accepted that claim, the industry would look a lot different from how it does now.


The new motto just sounds like "Pay My Rent and Keep My Money"; sure, I'd like both of those things and it's certainly going to cause a stir that I'm claiming I'm going to do both, but it's not like it's actually a thing that can be done, and the attention people pay to the claim is largely confusion about what I even mean when I say that.

By contrast, "Pay My Rent and Empty the Account" actually tells you something about the direction I'm going in, and what I prioritize when picking my actions.


These things are what can be made of them. "Just works" isn't controversial either. But it can mean something in a context where you demonstrate that (1) this thing is possible, (2) we can do it, here are examples, and (3) it's a priority to do this thing that other people will just pay lip service to.

Controversial or not, mottos become cliche, doublethink lip service unless they don't become those things.

All the business school bullshit: mission statements, codes of ethics, mottos, synergies, teamwork, a culture of innovation, etc. All that stuff is actually real, it exists and it can be very helpful. It's also uncontroversial that it sounds like bullshit, because the people saying it most often and loudest are bullshitters. It's like listening to a priest who read 'Become a Spiritual Guru in 30 Days or Less' and went at it.


It is controversial, because half a decade of software development has proven it to be an oxymoron.

If the solution for correctness is 100% automated test coverage, then your forward progress will have significant drag (every new feature should break many tests).


Any discussion w.r.t. automated tests should not start with an extreme example of 100% coverage, because everybody already knows that's not healthy.


100% code coverage still doesn't give you correctness.


Breaking things is time expensive too, plus it erodes trust.

Not breaking things can be time expensive if you do it late, but it doesn't have to if you do it early. If making sure you don't break something is time expensive then you're probably not building on a solid foundation and solid processes.

Be foundation first. Incurring the time expense to build a foundation (process, infrastructure, software) that will allow you to move fast with stability IS controversial, because it's not the "hack" approach.


So I think your take on it is Break Nothing and Move Slow, that's how you would make that choice given what mehrdada said.


"Slow is smooth, smooth is fast"


Maybe the controversy here is the notion that it is possible to move fast without breaking anything.


So "don't be evil" is controversial?


I thoroughly enjoyed the visual design and layout of this site.

That he took the deck and turned it into vibrant and unique feeling content made it feel more special than a normal blog post.

Not sure where I'm going with this but I liked the balance of medium.com-like-focus + the design and style content of a NYT special feature.

I'm assessing my reaction from a web analyst's perspective; it's just interesting how much more engaged I was w/this from the first second than normal.


It felt similar to _why's stuff, like this 'Nobody Knows Shoes' book: https://cloud.github.com/downloads/shoes/shoes/nks.pdf


I'm surprised that few seem to understand that Zuckerberg chose a motto with a deeper meaning than 'breaking products and code'. He meant break the systems, calcified behaviors, and system-think that ends up killing a company over the long term.

Move fast, and don't go in predictable directions based on existing corporate momentum. The only way to disrupt yourself is to 'break things' i.e. architectures, regular income, etc that may be holding you back from a better opportunity.


This is some deeply Talmudic reading of a fairly straightforward description of engineering deploy cycles. But why stop after just putting those words in his mouth, when he could be retroactively understood to mean so much more!

Move fast! Run to all meetings! Take typing lessons so you move faster while coding! Break things! Break your laptop to get a faster one from the company! Move your laptop fast at the hard surface to break it faster and while moving faster!

Some may say, if you keep running around with your laptop, you might actually break it. I call that synergy.


I agree. From the article:

> I can take a pretty obvious guess as of the external manifestations of their new motto: it means they break fewer APIs on their platform.

I found this absurd. The idea that "move fast and break things" referred to a reduced level of API compatibility seems much too narrow. Yes, it's understood as a license to break parts of the site in the interest of advancing the product.

Facebook did much more than that: it broke/disrupted/redefined our social fabric, the way we understand each other. Along the way it broke promises, expectations, trust, fears, introversion, and boundaries. And it did this very, very quickly--ten years of Facebook and we are all different.

Imagine if the Manhattan Project used the same slogan. It's good for some, bad for others, but developed quickly, taken to market, and damn the consequences we're going to figure out what this thing can do.


Tangential, but does anyone know what Github uses to match their css and html classes in the integration suite? I often find that I don't clean up my css as freely as I should because I'm scared of breaking another page by mistake, so something that tells me "you're really not using this anywhere anymore" would be great.


I don't know what Github uses, but where I work, we do screenshot testing. We have PhantomCSS[0], which is a helper on top of CasperJs that visits pages, takes screenshots and compares them. There are a lot of other tools, Huxley and Selenium come to my mind, that do the same stuff. Even Facebook[1] tests this way.

[0]: https://github.com/Huddle/PhantomCSS [1]: https://www.youtube.com/watch?v=VkTCL6Nqm6Y
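
For anyone who hasn't seen screenshot testing before, here is a rough sketch of the comparison step in Python. It is not how PhantomCSS, Huxley or Selenium work internally; it just assumes two screenshots have already been decoded into rows of RGB tuples, and `diff_ratio`, `page_regressed` and the 1% threshold are made-up names and values for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Image = List[List[Pixel]]      # rows of pixels; hypothetical in-memory format


def diff_ratio(baseline: Image, candidate: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels whose channels differ by more than `tolerance`."""
    same_shape = len(baseline) == len(candidate) and all(
        len(a) == len(b) for a, b in zip(baseline, candidate)
    )
    if not same_shape:
        return 1.0  # treat a size change as a full mismatch

    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0

    changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for px_a, px_b in zip(row_a, row_b):
            if any(abs(x - y) > tolerance for x, y in zip(px_a, px_b)):
                changed += 1
    return changed / total


def page_regressed(baseline: Image, candidate: Image, max_ratio: float = 0.01) -> bool:
    """Flag the page when more than `max_ratio` of its pixels changed (assumed threshold)."""
    return diff_ratio(baseline, candidate) > max_ratio
```

In practice you would keep one baseline screenshot per page and fail the build whenever `page_regressed` returns True for a page you did not mean to touch, which is roughly the "am I breaking another page by mistake" check asked about above.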


This post just touches on so many of my personal "yes dammit yes" buttons I am purring like the kitten in the gif (go have a look)

My new boss just gave a piece of advice that marries up very well with this - clients want Stability, Performance and Features - in that order. Consumers want Features, then Stability and Performance if they like it.

So yes - hurry up and put the talk online please Zach - Itching to listen.


Woah, I love your boss' advice! It is necessary to know your audience before you can prioritize.

Also, I don't think consumers want features in the sense of "more", but in the sense of "key". Maybe it is Key Features, Stability, Performance, More Features, in that order.


In my experience clients and consumers really just care about Feature Lists; power users maybe lean more towards Key Features. Performance and Stability are only a concern when they've already become a problem.


Really? I don't think consumers have shown at all that they care about (or are even aware of) feature lists. Even Facebook is shifting toward a strategy of cobbling together single-feature apps instead of building one big feature-heavy platform.


As always, I enjoyed reading Zach's thoughts.

Spotted one tiny error:

http://zachholman.com/images/talks/break-nothing/anpp.jpg

annp should be anpp in this image.


Actually caught this last night and fixed it literally everywhere but the important place: this image. :) Fixed! Thanks for pointing it out.


... Two code paths (old & new) - I've been doing this to some extent without trying to get any insight from it or make it a practice (of my own first), but often the old code just stays for a day or two before the newer code is deployed or submitted, and then it's killed (or #ifdef'd out).

I liked the "empathy" bit though. Recently I did the UCLA Extension TMP courses, and one of the lecturers, Jorge Cherbosque, PhD, talked about exactly this and much more - here is the lecture (highly recommended)

https://www.uclaextension.edu/tmp/Pages/79th/D1.aspx

Is there a video behind the talk?


Reading the title, I thought it would be a post about TDD. I don't do TDD, I wish I had the time and resources and I am moving toward adding a more test-driven approach. Adding unit tests to my key libs allows me to move fast without breaking anything (for the key libs, our test suite is stupendously comprehensive).

Of course, during new feature dev this luxury isn't always available. The next best thing is to do a phased rollout where our users become our beta testers. This works really well because users can switch to the non-beta version through the menu bar very easily.

There are other strategies for moving fast without breaking stuff, but these two provide easy, big wins.
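
A minimal sketch of the phased rollout idea in Python, for illustration only. It assumes users are identified by a string id and that opting out is stored as a set of ids; `rollout_bucket`, `sees_beta` and the feature name are hypothetical, not taken from any real product.

```python
import hashlib
from typing import AbstractSet


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def sees_beta(
    user_id: str,
    feature: str,
    percent: int,
    opted_out: AbstractSet[str] = frozenset(),
) -> bool:
    """Return True if this user should get the beta version of `feature`.

    Users who switched back to the non-beta version are always excluded.
    """
    if user_id in opted_out:
        return False
    return rollout_bucket(user_id, feature) < percent
```

Because the bucket comes from a hash rather than a random draw, a user stays in the same group between requests, and raising `percent` from 10 to 50 only adds users to the beta without removing anyone already in it.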


> What do you consider good feedback? How can you promote understanding and positive approaches in your criticism of the code? How can you help the submitter learn and grow from this scenario? Unfortunately these questions don't get asked enough, which creates a self-perpetuating cycle of cynics and aggressive discussion.

I'd be really interested to hear how teams codify this. To the extent that it's possible, I think feedback should be rooted in objective measures, e.g. styleguide/lint violations, test failures, etc. All too often, though, there are subjective things that come up: organization of code, interface design, using framework X instead of framework Y. These are the criticisms that most often lead to aggressive back and forths.


But those are the most important and interesting discussions to have, so I’m not convinced we should shy away from that sort of feedback.

That said, they’re also the sort of discussions which should often happen before code is written. Decisions can be changed much more quickly at that point, and people take it much less personally when you talk about how the design of something can be improved when they haven’t spent the time to implement it. :-)


Twilio follows a similar model for large scale changes, but at a distributed service level using a shadow proxy (https://github.com/twilio/shadow) rather than having parallel code paths.


What interests me in companies similar to GitHub is how the organisational politics reflects the code (or coding process)

I assume if I knew more about fluid dynamics one could make some decent correlations between blockages, eddies, and viscosity and middle management sign offs, manual testing and IM channels.

I guess I am looking for justification that a flat collegiate structure produces higher quality long term code. And some idea how to persuade people to adopt such a format - open, free flowing discussion about code and a data driven "prove it, don't say it with authority" approach seems to conflict with hierarchies.

Or maybe I am just fed up with organisational politics. :-)

If a GitHubber is on, could they comment on how design decisions and business decisions are taken?


"Move fast with stability"

That's why small companies have a chance.

The big ones get slow and focus on different things. They have to change their maxims and their customers don't necessarily like that.


Really distasteful slogan. I think every engineer who has worked in corp. America is used to management asking for mutually exclusive things due to ignorance.


Comparing the results of requests processed against both the new and old system is a pattern. http://www.alwaysagileconsulting.com/application-pattern-ver...

It was also done in mainframes at the processor instruction level.
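
A minimal sketch of that pattern in Python, assuming the old path stays authoritative and the new path is only observed. `run_both`, `old_slug` and `new_slug` are hypothetical names made up for this example; none of them come from the article or the linked page.

```python
from typing import Any, Callable, Dict, List


def run_both(
    old: Callable[..., Any],
    new: Callable[..., Any],
    mismatches: List[Dict[str, Any]],
    *args: Any,
) -> Any:
    """Call both code paths, record any disagreement, and return the old result."""
    expected = old(*args)
    try:
        candidate = new(*args)
    except Exception as exc:  # the new path must never break callers
        mismatches.append({"args": args, "old": expected, "error": repr(exc)})
        return expected
    if candidate != expected:
        mismatches.append({"args": args, "old": expected, "new": candidate})
    return expected


def old_slug(title: str) -> str:
    """Existing behaviour: every single space becomes a dash."""
    return title.lower().replace(" ", "-")


def new_slug(title: str) -> str:
    """Rewrite under test: runs of whitespace collapse into one dash."""
    return "-".join(title.lower().split())
```

Calling `run_both(old_slug, new_slug, log, "Move  Fast")` (two spaces) still returns the old answer to the caller, but leaves an entry in `log` showing where the rewrite disagrees, which is the information you want before switching over.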


I think of "move fast and break things" in a different context than most people (maybe). It doesn't say move fast and SHIP broken things.

To me it speaks to the fact that you shouldn't be afraid to completely change something if it's to improve the product. Eliminate technical debt. Don't code around a workaround that doesn't always work.


For me the best line was "You can have the best, most comprehensive test suite in the world, but tests are still different from production.". Getting the business logic working is often the easy part of distributed apps in production environments with real users and unexpected data.


While it is true that most tests are different from production, having a solid test suite can go a long way in moving with startup speed but breaking stuff as little as possible. We (as in me and a couple of others from our startup) have written a howto on using Docker and Ansible to build your test suite a few months ago. Strangely enough, it has almost the same title:

http://blog.mist.io/post/82383668190/move-fast-and-dont-brea...


Personally, I think this is the most important piece of the article:

"...I think the best way to get things done in a company isn't to bash it over your employee's heads every few hours, but to instead build an environment that helps foster those effects."



I agree that breaking a production user's experience is as verboten as it gets for agile software development. That's why we have massive test suites and CI infrastructure.

However, you do want to avoid "CI handcuffs", where, either because CI is too brittle or because you're simply integrating too many independent projects, pushing a change that requires coordinated changes elsewhere takes massively outsized developer effort to get through with zero CI breakage. This is more of a problem in CI systems with binary metrics like PASS/FAIL where every change that doesn't PASS is rejected.

It's too easy to "wedge" a system like this, where project X's change can't go through without project Y being able to handle the change, and you end up introducing multiple code paths in both projects on a solely temporary basis just in order to keep CI green and happy. Rather than having a zero-tolerance CI failure policy, developers should be allowed to break CI temporarily, so long as they fix it in a timely manner (within an hour or two). Per-developer breakage metrics, to the extent they are needed, should not be in terms of breakage counts but instead breakage durations.

That is, outside of production, it's fine to break stuff and quickly fix it, so long as you don't leave it broken. The big problems are where domain-siloed developers break a zero-tolerance policy because it was necessary to relax it "temporarily", and things stay broken because the policy stays relaxed and cannot be reinstated without sirens going off. Then restoring the CI policy is blocked for everyone by the one guy who knows the AIX quirks for debootstrap or whatever.

Instead, breaking changes should be "allowed" where there is a window to fix the error and still move forward. Only when the window closes without a fix should the breaking change be rolled back (automatically). This line of thinking lends itself to formal, automated policy, but this depends first on judgment and cultural approval.
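
To make the policy concrete, here is a rough sketch in Python. It is not tied to any real CI system; `ci_action`, `total_breakage` and the two-hour default (taken from "within an hour or two" above) are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def ci_action(
    broken_since: Optional[datetime],
    now: datetime,
    window: timedelta = timedelta(hours=2),  # assumed grace period
) -> str:
    """Decide what to do about a breaking change under a fix-it window policy."""
    if broken_since is None:
        return "ok"        # nothing is broken
    if now - broken_since <= window:
        return "wait"      # the author still has time to push a fix
    return "rollback"      # window closed without a fix: revert automatically


def total_breakage(intervals: List[Tuple[datetime, datetime]]) -> timedelta:
    """Sum breakage durations, the per-developer metric suggested above."""
    return sum((end - start for start, end in intervals), timedelta())
```

Measuring `total_breakage` rather than counting incidents means two breaks fixed in ten minutes each look better than one break left red all afternoon, which is the behaviour the policy is trying to encourage.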

As it happens, the best CI system I'm aware of for truly distributed, multiple project integration is OpenStack's Zuul: https://github.com/openstack-infra/zuul I'm not sure if it accommodates my prescription above, but if not, they probably have a better idea.


I, uh, don't understand. No sane environment commits straight to prod. You stage to an integration environment. The integration environment can break, no problems. But you only migrate staged dev from the integration environment to prod when all the tests pass. Is that not how everyone does things???


Github does it differently. They have "prod" stuff that everyone sees, and "candidate" things which are ALSO deployed from the same codebase (with different enabling flags), and then can do long-term tests (and smaller pull requests) to verify that the New Way (candidate) behaves identically to the Old Way.

Slide 35 [0] of this presentation actually starts the discussion of this exact thing, though Zach has talked about it before. Later, he shows a chart showing the differences as the code in the parallel branch changed over about five hours. (Wow, that's some fast iteration.)

0: https://speakerdeck.com/holman/move-fast-and-break-nothing?s...


Yes, what you describe is not at all at odds with what I described. If it wasn't clear, I was describing only developer interaction with integration testing per a CI system, with nothing to say about deploying to production other than the token "don't break users" in the first sentence.


I know at least one person working somewhere that has committing to TRUNK allegedly going straight to prod. Thankfully I've managed to avoid such mad places (more by luck than judgement, mind.)


If both projects share a source control system, you can commit a change and update its downstream usages all at once, with no breakage even temporarily. However, this requires source control that scales company-wide. (Git doesn't qualify.)


Not sure why you're bashing on Git here, it certainly does scale to and handle this use case pretty effectively.

This is precisely the problem that submodules solve, much maligned though they are. It's essentially a dependency problem, and we have good answers for those.

Edit: Also I am fairly certain that the kernel experiences these kinds of issues, and is the flagship user of Git.


The Linux kernel is tiny compared to a large company's need for source control. It easily fits on one machine. Think about what you'd need to put all the projects in an entire Linux distro in one source control system; that's more like the scale I'm talking about.

Linus built a tool that suits his needs well but he's not working at the same scale.


Yeah, git-repo-per-project was an unspoken assumption. Also, "upstream and downstream" may not suffice analytically, as many interdependencies in this model do not satisfy DAG, even if they ~should~. Agile projects aren't often built so much as grown.


I think the problem you're assuming here is that dependencies are automatically updated to use the latest version. That sounds like pretty strong coupling in a way that means other problems are lurking somewhere.

Team A should be able to develop without having to consult Team B constantly. That means you have to be mature about deprecating things before you just remove them, but I think that's what the whole article is about.


Yes, there are all the usual ways of deprecating things, migrating the callers, and then finally removing the deprecated methods. This is rather tedious and people tend not to do it for small cleanups like renames. If you can do it in one commit, you can work faster and hopefully get cleaner code.

Of course even if you have atomic source control commits, you don't have atomic deployment so there's still migration to take care of, but it works fine for in-process API's.
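
For the in-process case, the "usual way" of deprecating a renamed function might look like this in Python. This is only a sketch: `get_user` and `fetch_user` are hypothetical names for the old and new spelling of the same call.

```python
import warnings


def fetch_user(user_id: int) -> dict:
    """New name; callers should migrate to this one."""
    return {"id": user_id}


def get_user(user_id: int) -> dict:
    """Old name, kept as a thin shim until every caller has moved over."""
    warnings.warn(
        "get_user() is deprecated; call fetch_user() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this shim
    )
    return fetch_user(user_id)
```

With atomic commits across all callers you could skip the shim and rename everything in one change; the shim is the tedious part you pay for when callers live in repositories you cannot update in the same commit.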


Of course, Facebook is supposedly one 50GB repo.


You don't need to break things with good regression testing.


Nicely done


Sorry to be That Guy but this is one of the reasons I'm really getting into Haskell, the more I study it. Clojure's still stronger for exploratory work, but Haskell's the only language where I feel comfortable that I'm breaking nothing when I "move fast" (at least, on code I haven't touched in a while).

Ok, I'm totally That Guy so I'm just going to shut up.


I think you must have missed the point somewhere. Test coverage and metrics are important not just to catch "programming" errors but also logical and semantic errors. Haskell only partially helps with those other categories.


Absolutely true.

I think that if you know how to use it, Haskell's type system can reduce your bug count substantially. It makes a difference, spending 30% of your time debugging as opposed to 70%. (These numbers are approximate and vary depending on the problem you're working on.) You still need to test, though. That is true. QuickCheck does a lot for that.


How do you evaluate Clojure's core.typed and Prismatic Schema? (If you've had a chance.)


I haven't. I tried core.typed a year ago and it was still not very mature, but I'm glad it exists and I'm sure that it has advanced a long way. It's an ambitious project and I'm really glad it got funded (I donated).


Quite a bit OT, but I've never seen the transform-rotate property used as a styling in text...probably something that can/will be used to death, but it looks great here...gives the piece a very magazine feel (that, plus the handplaced-layout of floating embed images)


I was about to say the exact opposite. Every other slide had text that was Just a Little Bit Crooked, and it is driving me __batty__. If the angle were larger, and it weren't used on every other slide, I might think it was cooler (and it IS neat that you can do that with transformations!). As it is, it's really distracting from the message.

edit 2: Now that I've read farther, it bothers me less. :-) If I were part of the live audience, I would likely not have noticed. Please don't let the text presentation distract you from the excellent message. :)


Unfortunately, it looks awful in Firefox.


What version? I'm on 32 on Ubuntu and it looks clean and crisp.


All this crap is easier said than done.


Does the article get any better after the cat video and the mention of Cheetos™? I bailed at that point.


It doesn't. The author goes on a long, ambiguous tirade about Skittles™ and how they aren't as aerodynamic as once initially thought.

You did good by refusing to read the rest of the post and informing everyone of that fact on Hacker News. Cheers!


Fantastic response


Wow, no fun not ever.

To author: thanks for the reference to Coda Hale's Metrics talk.

https://www.youtube.com/watch?v=czes-oa0yik



