"With Storm, I distilled the realtime computation problem domain into a small set of abstractions: streams, spouts, bolts, and topologies. I devised a new algorithm for guaranteeing data processing that eliminated the need for intermediate message brokers, the part of our system that caused the most complexity and suffering."
As someone who has gone through the Storm source in very fine detail, let me tell you how he did this. He hashed each tuple that needed to be processed then XOR'd it into a variable that started at a value of zero.
When the piece that needed to be processed was complete it would get XOR'd back into the variable. Once the variable hit 0 he knew everything was done! Pretty neat if you ask me.
Not quite. Every edge in the dependency tree is assigned a random 64-bit id, and when a tuple is acked it sends the xor of all the incoming and outgoing edges to the acker.
Random 64-bit ids are used in the process. So the probability of accidentally completing a tuple is very, very small (1 / 2^64 for every ack).
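To make that concrete, here's a rough Python sketch of the edge-xor idea (toy code, not Storm's actual implementation; the Acker class and names are made up for illustration):

    import os

    # Toy sketch of the edge-xor idea (not Storm's actual code). Each edge in
    # the tuple tree gets a random 64-bit id; the acker keeps one value per
    # root tuple. An id xor'd in an even number of times cancels out, so the
    # value returns to 0 only when every emitted edge has also been acked.

    def random_edge_id():
        return int.from_bytes(os.urandom(8), "big")   # random 64-bit id

    class Acker:
        def __init__(self):
            self.val = 0

        def update(self, xor_of_edge_ids):
            self.val ^= xor_of_edge_ids
            return self.val == 0   # 0 => tree is (almost certainly) complete

    acker = Acker()
    root = random_edge_id()
    acker.update(root)                 # spout emits the root tuple along edge `root`

    c1, c2 = random_edge_id(), random_edge_id()
    acker.update(root ^ c1 ^ c2)       # a bolt acks root and emits two children

    acker.update(c1)                   # first child fully processed, no emits
    print(acker.update(c2))            # True: value is back to 0, tree complete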
Counters don't work because of the asynchronous nature of Storm. For example, consider a topology that looks like this:
A -> B -> C
      \-> D
Let's say A emits 2 tuples (+2 differential), B processes those and emits 2 to C and 3 to D (+3 differential), C processes 2 tuples (-2 differential), and D processes 3 tuples (-3 differential).
Everything's asynchronous, so the acker could receive the acks in this order: A, C, B, D
The counter would then look like this: 2, 0, 3, 0
So it would think that the tuple was complete before it actually was, which means the counter algorithm doesn't work.
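Here's a toy sketch of that, with made-up random edge ids, comparing the naive counter against the edge-xor scheme under the same A, C, B, D arrival order (just the arithmetic, not Storm code):

    import os
    from functools import reduce

    # Toy model of the example above, with made-up random edge ids.
    # Edges: A emits two tuples (e1, e2); B acks those and emits two to C
    # (c1, c2) and three to D (d1, d2, d3); C acks c1, c2; D acks d1-d3.

    def rand_id():
        return int.from_bytes(os.urandom(8), "big")

    def xor_all(ids):
        return reduce(lambda a, b: a ^ b, ids)

    e = [rand_id() for _ in range(2)]
    c = [rand_id() for _ in range(2)]
    d = [rand_id() for _ in range(3)]

    # Naive counter, acks arriving in the order A, C, B, D:
    count, trace = 0, []
    for diff in (+2, -2, +3, -3):      # A, C, B, D differentials
        count += diff
        trace.append(count)
    print(trace)                       # [2, 0, 3, 0] -- hits 0 after C: a false "done"

    # Edge-xor scheme, same arrival order:
    acks = [
        xor_all(e),          # A: emits e1, e2
        xor_all(c),          # C: acks c1, c2
        xor_all(e + c + d),  # B: acks e1, e2 and emits c1, c2, d1, d2, d3
        xor_all(d),          # D: acks d1, d2, d3
    ]
    val, done = 0, []
    for a in acks:
        val ^= a
        done.append(val == 0)
    print(done)                        # [False, False, False, True] -- 0 only at the very end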
"So the probability of accidentally completing a tuple is very, very small (1 / 2^64 for every ack)."
Which means, due to the birthday paradox, that you should expect your first collision around 2^32 tuples. Process a few million tuples a day (that's only about 25 per second) and you should expect your first error within your first year.
The birthday paradox doesn't apply. All that matters is the value at the time the ack is applied, so it's always 1/2^64 (because the xor of any number of random numbers is still random).
No, the birthday paradox very much applies. The chance of success for your first ack is (2^64-1)/2^64. The chance of success for your first and second ack is the chance of success for your first ack times the chance of success for your second ack, ((2^64-1)/2^64)^2. And so on for your third, fourth, and every later ack.
By the time you reach your 2^32 ack, your chance of having all successes is ((2^64-1)/(2^64))^2^32, which is < 0.5.
EDIT: My test program seems to indicate that I'm wrong, but I can't see the flaw either in it or in my reasoning. Can you explain why the birthday paradox doesn't apply here?
EDIT': In the birthday paradox, it would be ((2^64-1)/2^64) * ((2^64-2)/2^64) * ((2^64-3)/2^64), etc.
The birthday paradox concerns the chance of 2 random elements in an entire set being the same. There's no set here, so there's no birthday paradox. http://en.wikipedia.org/wiki/Birthday_problem
So the chance of success after 2^32 acks is >0.999999999, not <0.5.
It takes about 2^60 acks before there's a significant chance of a mistake. That's a lot of acks, so it will take an insanely long time for a mistake to be made.
The birthday paradox kicks in when you have a set of objects, and a collision between any pair is interesting. Here, only a collision with 0 is interesting.
Suppose that the sequence of values you have after each ack is A, B, C, D, E (five acks total). So long as A, B, C, and D are all nonzero, we're OK. With the birthday paradox, we'd be looking at A=B, A=C, A=D, A=E, B=C, B=D, etc. -- many more combinations.
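If you want to check the arithmetic, the only event that matters is an intermediate value landing exactly on 0. Assuming each intermediate value is effectively a uniform random 64-bit number, a quick sketch:

    import math

    # Chance that at least one of n intermediate values is accidentally 0:
    #   p = 1 - (1 - 2**-64)**n  ~=  1 - exp(-n / 2**64)
    # (computed via expm1 because 1 - 2**-64 rounds to exactly 1.0 in a double)

    def p_false_completion(n_acks):
        return -math.expm1(-n_acks / 2.0 ** 64)

    for n in (2 ** 32, 2 ** 48, 2 ** 60):
        print(f"{n:>22,} acks: {p_false_completion(n):.2e}")
    # ~2.3e-10 after 2^32 acks, ~1.5e-05 after 2^48, ~6.1e-02 after 2^60

The probability grows linearly with the number of acks rather than quadratically with the number of pairs, which is why the break-even point is around 2^60 acks and not 2^32.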
The biggest problem I have with this approach (and believe me, I love this approach), is that it makes it hard to finish things.
For example, over Christmas, I built a small pretend-natural-language CLI controller for iTunes. I made a working version in something like four hours, spent a few days adding in crazy half-thought-out features like speech recognition and a web interface - and then I basically stopped development.
I didn't stop development because it got boring - I stopped development because I'd solved my own problem. Not beautifully (certainly not from a coding perspective), not efficiently, but the problem I had was solved.
The problem, then, is that once the "suffering" is gone, or sufficiently lessened, there is no real reason to keep building.
(oddly, my password for my old account no longer seems to work. I was hebejebelus)
"The problem, then, is that once the "suffering" is gone, or sufficiently lessened, there is no real reason to keep building."
Then how is the project incomplete? If it's not a product that you're planning to sell, put your code on Github or the like and others will add any features that you're missing.
WiB (worse is better) is more about appreciating bare-bones, low-level tools, like C/Unix in the posted example. These tools are crude but technically polished, otherwise no-one would bother to use them.
It depends on what you mean by "technically polished." Unix handles system calls that get interrupted by signals by returning an error code that means "I was interrupted." This technique bunts on the hard problem of rolling back OS operations. It is not "technically polished" in that it does not solve all of the hard problems in front of it. But it's still useful, and it caught on because people used it.
Gabriel's insight is not about low-level tools. It's about at what point can you bunt on the hard problems, have an ugly work-around, but still be useful enough that no one will use "the right thing" when it eventually comes about?
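To make the EINTR example concrete, this is roughly the retry dance Unix pushes onto callers (a Python sketch; since Python 3.5 / PEP 475 the interpreter retries most of these calls for you, but the underlying Unix contract is the same):

    import os

    # The classic Unix "bunt": a slow system call interrupted by a signal just
    # fails with EINTR and leaves the retry to the caller -- the kernel doesn't
    # roll anything back or resume the operation itself.

    def read_retrying(fd, n):
        while True:
            try:
                return os.read(fd, n)
            except InterruptedError:   # errno EINTR: "I was interrupted", so try again
                continue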
> No they won't, because he hadn't even started on the "make it beautiful".
At the risk of sparking a language war--the iTunes control project is written in Python so it has a certain level of consistency to start with.
> When I want to solve a new-ish problem, I can't imagine grabbing some barely working cowdung from some guy's github repo.
One developer's cowdung is another developer's works-for-me code. :)
For me part of the standard development process is the "literature review" which consists of finding all the prior art (polished or not) and evaluating it. I'd much rather someone uploaded their code in any state than not at all. YMMV. :)
I think in this case the approach worked fine for you. Most of the times a hacky solution will do. Not everything needs to be cleanly architected and bug free, especially a weekend project.
It's a mistake to try to anticipate use cases you don't actually have or else you'll end up overengineering your solution.
I wish more people would think like this before sabotaging their perfectly good APIs with noise. I'm a huge fan of the Pareto principle in that regard. First and foremost, expose the 20% that allows me to be 80% effective. The rest can be figured out as we go, but at least, what you'll teach me today, I'll learn fast and will know well.
Especially if you want people to adopt your APIs. If I'm looking at using a service and I have to choose between a complicated or a simple API, it's an easy choice. Starting off, you want to solve the most annoying problems first.
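A made-up illustration of the "expose the 20% first" idea (hypothetical names, not from any real library):

    # Hypothetical sketch of a "20% first" API surface: one obvious entry
    # point with sensible defaults, rather than a dozen knobs on day one.

    class QueueClient:
        def __init__(self, url, timeout=5.0):   # the only knob most users need at first
            self.url = url
            self.timeout = timeout

        def send(self, message: bytes) -> None:
            """The 20%: send a message and be 80% effective. Batching, retries,
            compression, etc. can be exposed later as real needs appear."""
            ...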
> The most important characteristic of a suffering-oriented programmer is a relentless focus on refactoring. This is critical to prevent accidental complexity from sabotaging the codebase.
I can't tell you how many times I've seen accidental complexity creep in because someone adds a new feature without taking the time to refactor to the simplest set of abstractions. But - and here's the flaw in the approach - you have to be an expert in the code to produce such a set of abstractions, which is a potential bottleneck when your codebase is big enough to require multiple developers. Not everyone has the time/capability to be an expert.
"Flaw" seems strong. If you have a small enough and strong enough team, the approach makes a lot of sense to me. I wonder how big the original Storm team was.
This is really an extension of the advice to make sure you build a product that scratches an itch you have. Extend it to developing frameworks, and you have “suffering-oriented programming”.
Of course, the real problem is that there are certain spaces that wouldn't really be serviced if that's all we did. Education is an excellent example: the people who really feel the pains of education, students, won't really start having the ability to help solve those pains until they're further along in their education, at which point the earlier pains don't necessarily apply as well. In a similar vein, learning to program is a problem that's been on the map for a long time, and, while the situation is constantly improving, we're dealing with the problem of, by the time we've got the skills to really help solve it, we've already learned to program, and have lost some amount of sight of how learning could be improved, because we're no longer in the proverbial trenches.
So while “scratch a personal itch” is awesome advice, and almost certainly helps in product development and framework development, sometimes it isn't enough. Sometimes just following your curiosity or intuition and exploring a space that doesn't produce regular hurt for you can lead to an equally good result. At that point, what you need to make sure you have is external feedback from the people whose itch you're trying to scratch. Maybe the chances of failure are greater in these cases, but the opportunities for success are also probably greater.
I see this as more than just an extension. The basic premise is that you cannot build an elegant solution to a problem that you do not have a deep understanding of. When working in a domain that you do not know, create an inelegant solution that works. In the process you gain a better understanding of the problem, and you can use this knowledge at a later stage to build a better solution.
I'm not sure I (fully) agree with your example of education.
In education the scratchable "itch" is felt by educators and parents in their offering of a service (education). The extent to which we as a society expend resources providing tools/resources for educators and allowing them to build/purchase solutions demonstrates the extent of (or lack of) our societal valuation of education. The itch in this is scratched by the service providers, not the consumers.
The trouble is that the scratchable itches educators and parents feel are not the same ones that the students feel. You need to be able to address all three itches in many educational products, but it's fairly difficult to be in all three roles to gain the understanding you need.
I found this insightful on the make it fast stage: "you might worry about things like asymptotic complexity in the "make it beautiful" phase and focus on the constant-time factors in the "make it fast" phase."
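A small, hypothetical example of that split, to make the distinction concrete:

    from collections import Counter

    # "Make it beautiful": get the asymptotics right -- set membership is O(1),
    # so this is O(n) instead of an O(n^2) scan over a list.
    def find_duplicates(items):
        seen, dups = set(), set()
        for x in items:
            if x in seen:
                dups.add(x)
            seen.add(x)
        return dups

    # "Make it fast": same asymptotic behaviour, now shaving constant factors
    # by letting Counter do the counting loop (implemented in C in CPython).
    def find_duplicates_fast(items):
        return {x for x, n in Counter(items).items() if n > 1}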
This. I've yet to see a client development project that made it past step 1 on anything other than the most trivial features. Sucks, but I got paid. What can you do?
I think the advice is intended for products you build for users, not products you build for clients. The right strategy depends on who your customer is. When you find a ripe market, you'll discover that your customers—i.e., the users—just want a solution, even if it isn't pretty. When your customer is the client, then you're selling to him, not to his users, which means you rationally should care less about the product. This can be painful, which is one reason why many startup founders prefer to avoid client work.
If you're good, you can trick the client a bit, deliberately don't completely finish part of 1 till you have most of 2 working.
Don't do this just for your own gratification, but only when it will genuinely help the client.
It helps when you get a sense of how long they expect the project to take - if you do it fast, but deliver a poor solution you are not doing your job.
If they want it fast, and they know it will be poor then fine. But a lot of the time they are not expecting fast, but then you deliver fast and they think "cool", not realizing you didn't really finish things properly.
Well written. I especially loved the definition: "Suffering-oriented programming can be summarized like so: don't build technology unless you feel the pain of not having it."
Having said that, it would be cool to hear from a "devil's advocate". I am particularly thinking of cases where "making it possible" gives an 80% solution but a fundamental flaw or limitation makes the remaining 20% prohibitive, basically requiring a complete rewrite to move forward. Is this a problem in practice or are 80% solutions usually "good enough"?
If your spec is good, it's rare, because you're most likely to see a showstopper appear 10% or 20% of the way into making the 80% solution - and that's early enough that you can rearchitect and continue.
The cases where it becomes almost impossible tend to come from the "reuse something written to a different spec" projects where you're forcing things through the wrong architecture. The warning against over-focus on genericity at the end of the article is directed at exactly this - when you aim for generic you can often find yourself building to a spec without concrete goals, thus the actual problems are poorly dealt with.
I would very much love to find out what kind of techniques you folks used to ensure some degree of quality (or sanity) in the "make it possible" phase.
I'm basically wondering if you used anything like CI / TDD / code coverage or any other quality-assurance techniques at such an early stage. One might argue that it's counter-productive, since you're likely going to rip out most of that code anyway during the "make it pretty" phase. Others will say that keeping code quality high even while prototyping means you waste a lot less time on debugging, and therefore makes you faster overall in the long run.
To me it feels like it'd have to be a very delicate balancing act.
A few other commenters have suggested that they don't get a chance to get past "First, make it possible." to making things beautiful, then fast. The person writing the checks doesn't care if the implementation is horrible, as long as it works, etc.
Thing is, both points of view are correct, in different ways.
The kind of long development cycle Nathan alludes to here is not something a lot of people can do, but is one of the risks you can take / benefits you gain if you're your own boss. I wonder if it's more common in larger companies, who can afford to polish some projects before putting them into production? With a startup, it's more likely that there's one or two projects, and both are P1-CRITICAL.
For people wanting to explain it to a boss, I think it's best put in terms of technical debt[1]. This project will be expensive, but by spending more time/resources up front, there is a much lower cost to adding features or fixing bugs in the future; this includes getting new coders up to speed on the codebase, turnaround of critical features, etc. The cost is initial time to market - it will be longer before you see version 1 going out to people who can use it.
Taking on technical debt is a perfectly valid business decision under the right circumstances. If time-to-market is critical and you're planning on having enough money when it's successful to pay off that debt in the future, then maybe you want to throw out making it beautiful / performant just to get something in front of people.
Planned technical debt is a business decision. It's the folks who borrow up front and ignore their debt that end up with problems, but that much is something we're all familiar with. One way or the other, that debt gets paid eventually, whether with cash or time.
Even if you're in the unfortunate position of turning out software that sucks for people who don't care though, there's light at the end of the tunnel! The best thing about programming is that the more you do it the better you get at it. The faster you get at it. This means you can build refactoring into your projects and estimations without making a big hoo-hah (as it were) about it.
Open source projects are another outlet people have to scratch their perfectionist itch; if you start a project, you decide the timelines, features, level of polish, etc.
The key here is "an unfamiliar domain". The "make it possible"-part is empirical research. So, you could make it beautiful and fast the first time around, but some would argue that it'd be better to first make it beautiful and then optimize where necessary.
In software engineering the "thinking beforehand" approach is probably riskier than the "think as you go" approach. For example, often the requirements aren't well-defined when the project starts.
Contrast this with physical-stuff engineering (which I think is what you're referring to by saying "engineering background"), where the requirements tend to be better-defined and the cost of experimentation/refactoring is a lot higher.
"In theory, there is no difference between theory and practice. But, in practice, there is."
IMO it's best to think about a problem for a little while until you come up with some reasonable solution, then hack it up, then revisit your original decisions and see if you can do it better.
One example where "engineering" versus "agile" paid off:
In 04/05, I single-handedly wrote a large Windows Mobile (.Net CF) and SOA platform for managing proof of delivery across 5000 devices which used server-push messaging over semi-persistent GPRS links. This required about 6 months of work after 3 months of R&D. It was delivered on time, with zero defects, worked perfectly on low bandwidth connections and required no training for the user (less than 5% of the userbase had problems using it with no training).
It was fast, even over low bandwidth connections. Virtually real time when a link was up.
It was beautiful because it was easy to use and abstracted the messaging system and connection availability entirely.
It was possible because it worked first time with no regressions or defects in the field reported in over 6 years. It had to work first time as there was no upgrade channel for the users.
Where's the payoff? All I see is a system that had to work first time, so "agile" wasn't even an option. If there were an auto-update mechanism to get everyone on the latest version, it seems to me that you might have delivered months earlier and polished it over time.
It didn't have to work first time. People expected it to work first time and the expectation was fulfilled. Would you deliver a stinking turd first time and spend 6 months fixing it? That appears to be the way people want to work these days but believe me the customers are fucked off with it. The entire industry is getting a bad rep due to this laziness.
You're making bad assumptions. You can either deliver a "stinking turd" in a month and spend 5 months fixing it, or spend 6 months developing and deliver the whole thing in one go.
Suffering-oriented programming rejects that you can effectively anticipate needs you don't currently have.
I do anticipate; however, I never do it in code. Rather, I find that just thinking through some scenarios helps me see where the code might start evolving, and to make sure that part of the code is isolated enough to change without having to change everything else, if the time ever comes.
(Perhaps Nathan includes this in his "make it beautiful" phase, but I felt it deserved to be made explicit. I've too often heard YAGNI used as an excuse to not even think about a problem, let alone code for it.)
I agree. As some agile guru said, a plan is useless but planning is essential. Along with avoiding useless features, it's also important to avoid building a design that is hostile to the features we do need to add. Building a good design without any anticipation of what will come is, in my view, impossible. The tricky part is keeping the anticipations and assumptions to a reasonable level.
Good point. Breaking down the code into "isolated enough" units as you write it shouldn't add much complexity or effort. It actually makes testing / debugging and refactoring easier. You cannot anticipate what will need to be added, expanded or improved, but you can be sure something will.
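One lightweight way to get that "isolated enough" seam without writing any speculative features (hypothetical names, just a sketch):

    # Hypothetical sketch: the part we suspect will change (where results get
    # stored) sits behind one small seam, so swapping it later touches one
    # class rather than the whole program. Nothing speculative is built yet.

    class ResultStore:
        """The only thing the rest of the code knows about storage."""
        def save(self, key: str, value: bytes) -> None:
            raise NotImplementedError

    class FileStore(ResultStore):
        def __init__(self, directory):
            self.directory = directory

        def save(self, key, value):
            with open(f"{self.directory}/{key}", "wb") as f:
                f.write(value)

    # If a database (or S3, or whatever) is ever actually needed, only a new
    # ResultStore subclass gets written -- the callers never change.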
Awesome. I have this exact same methodology... Now can you write an article about how to explain this to business people/non-tech people, who have no experience with "iterative program development", or why the programming process MUST be iterative?
Every time I mention code refactoring to a non-tech person the response is something like "Why would you program the same thing again?" And your article is the exact reason why. You can't know everything you need to know at the beginning of developing some brand new technology. If that were the case, every startup would be successful.
So in agile terms, besides splitting the work by user stories, it can also be split by function (make it work), design (make it beautiful), and finally performance (make it fast). It's quite natural to prioritize in that order depending on what stage the project is at: early, mid, or late. It requires continuous refactoring, which from the businessman's point of view is rework. But the end result will probably look more like a second-generation product than a first-gen one with mediocre function, design, and performance.