Automation Should Be Like Iron Man, Not Ultron

erikb · on May 21, 2016

I remember around 2010 people advised the same principle but in different words. At that time Go was still a game that couldn't be beaten by AI alone, so it was taken as the example. People found that if you let a person play, he can only achieve so much, mostly disappointing compared to perfection (with some exceptions), and if you write an AI, it won't be able to solve all the problems by itself. But if you combine them, using a human player who gets calculation backup from an AI that draws diagrams (heat maps in the case of Go) and gathers data, that is what enables the human to perform at a professional level (about 3 dan pro, if I remember correctly) despite being a total beginner (15 kyu on Kiseido Go Server).

I completely agree with that and believe that the professional and scientific world is actually also developing in that direction.

jfoutz · on May 21, 2016

In 1968 Douglas Engelbart used the wonderfully space aged term "brain amplification" in his mother of all demos.

In practice, I'm not sure how it actually works. Systems work can be automated away, and then it becomes brittle. Something like emacs or vim absolutely feels like Iron Man armor. I can't really see what vim for systems work would look like.

My vague thoughts are, maybe build systems that give lots of information about what's happening right now, and how it's different from the past. Lots of commands for introspection. Some commands for intervention. At scale, picking a specific machine to remove from the load balancing pool, or to restart, or to destroy and reallocate, at a keystroke might be handy.

It's an interesting perspective, but the difference seems subtle in most practical contexts.

falcolas · on May 21, 2016

Here's one example from my experience:

There are many MySQL tools which are able to automate the process of failing over between a master and a slave. GitHub initially chose a solution which would trigger this failover automatically when a master was unresponsive for a particular amount of time. One bad query came in, and the tool did its job by failing over from the master to the slave. Here's the problem: the query was automatically retried against the slave, causing it to fail over back to the master. Repeatedly.

The solution? Page when MySQL became unresponsive, let a human initiate the failover, and let the tools handle the failover itself (which is a very complex dance to perform manually). I think they eventually moved to a model where it would fail over once per few hours, and alert if the condition occurred again.
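The "fail over once per few hours, then alert" model can be sketched as a small guard in front of the failover tool. This is only an illustration under assumptions: the class name, the method name, and the four-hour default are hypothetical and are not taken from the tool described above.

```python
import time


class FailoverGuard:
    """Allows at most one automatic failover per cooldown window.

    Hypothetical sketch: names and the four-hour default are assumptions.
    """

    def __init__(self, cooldown_seconds=4 * 60 * 60, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the guard can be tested without waiting
        self.last_failover = None  # time of the last automatic failover, if any

    def decide(self):
        """Return "failover" or "page_human" when the master looks unresponsive."""
        now = self.clock()
        recently_failed_over = (
            self.last_failover is not None
            and now - self.last_failover < self.cooldown_seconds
        )
        if recently_failed_over:
            # A repeat inside the window suggests failing over is not fixing
            # the problem (e.g. a bad query being retried), so ask a person.
            return "page_human"
        self.last_failover = now
        return "failover"
```

The guard only decides; the complex failover dance stays in the existing tool, and a repeat incident inside the window goes to a human instead of flapping back and forth.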

MikeTLive · on May 21, 2016

The same questions a human will ask can be encoded into your admin automation. It is only as powerful or accurate as the process allows.

If you don't already have a procedural flowchart build one. Have the human document the questions asked and answers followed.

falcolas · on May 21, 2016

Only if you can foresee the problems and solutions to encode into the system. If you miss one, the system will fall down or make the wrong decision.

XorNot · on May 22, 2016

That's a problem of historical data. What are you really doing when something "seems unusual"?

virgilp · on May 23, 2016

You're using your best judgment?

sampo · on May 21, 2016

> In 1968 Douglas Engelbart used the wonderfully space aged term "brain amplification"

And Steve Jobs in 1990, "And that’s what a computer is to me ... it’s the equivalent of a bicycle for our minds."

https://www.youtube.com/watch?v=ob_GX50Za6c#t=25s

danudey · on May 22, 2016

Here's how I built our deployment system at work for our various interdependent packages and services, each of which is a git repository (and where libraries are symlinked into those git repositories as needed).

First, I built a tool called checkout_package which clones/updates a package and then recursively installs and symlinks all of its dependencies. Devs used this for local installations as well as on production servers.

After that had its problems ironed out and everything was working, I built a system called command_runner, which was a simple JSON-based RPC daemon. This had the ability, among other things, to call out to checkout_package and return the result back to the caller. Now our developers could run checkout_package, which we knew worked (and knew how it worked), on remote machines without having to log in. A simple shell script was all that was required to loop over every machine to update them all.

After that, we built a web interface that could let you choose which packages you needed to update, and it would run command_runner in X number of threads to do the updates concurrently. It also returns the output from command_runner (which is extremely terse if everything went fine) back to the user of the web interface so they could tell when something went wrong, and retry if necessary.
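The fan-out step described above (run the same update on many machines in a fixed number of threads, then show each machine's output) might look roughly like this. It is a sketch, not the actual system: `update_hosts` and the `run_update` callable are hypothetical stand-ins for whatever makes the RPC call to a given machine.

```python
from concurrent.futures import ThreadPoolExecutor


def update_hosts(hosts, run_update, max_workers=4):
    """Call run_update(host) for every host concurrently.

    Returns {host: ("ok", output)} or {host: ("failed", error text)} so the
    person watching can see what went wrong and retry just those hosts.
    run_update is a hypothetical callable standing in for the remote call.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(run_update, host) for host in hosts}
        for host, future in futures.items():
            try:
                results[host] = ("ok", future.result())
            except Exception as exc:
                # Keep going: one bad host should not hide the others' output.
                results[host] = ("failed", str(exc))
    return results
```

Nothing here hides the underlying command; it only repeats it per host and collects the output, which is the point of the approach.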

We also use monit. I added functionality to command_runner to run and parse 'monit summary' to see what was installed on and running/not running on a given server, and to run monit commands (start, stop, or 'restart', i.e. stop and then start) on a given server. From there, we followed the same pattern: shell script to iterate, integrate into web interface.

Eventually, we tied the two together. After you deploy, it takes you to a version of the services management page that only shows (and already has selected) the services which are, or depend on, the packages you deployed.

None of our system abstracts away any of the details of what's happening; it merely provides an easier way to do what you would be doing anyway, and provides more information than you would normally have (for example, the package deployment page can show you the git diff between what's live and what's HEAD, so that you know whose changes you're about to deploy). Because the devs still use checkout_package at the command line in their local environments, it requires that tool to remain both usable and sufficiently user-friendly, and for nothing to be wrapped up into any other layers of automation that is necessary for doing the fundamentals.

This way, we have managed to prevent our deployment process from becoming impossible to reproduce by hand if necessary (e.g. when deploying/rolling back changes to our deployment system).

The system provides extra information that you wouldn't have had, does some tasks for you that you don't have to do manually anymore, and does stuff for you all at once instead of having to do the same task over and over again per server, but fundamentally it's more like cruise control than a self-driving car.

Houshalter · on May 22, 2016

That works because computer game playing was totally different than how humans play. Humans can recognize common patterns, and computers can brute force many possible positions and evaluate them. The two complement each other.

But now AlphaGo beat humans using neural networks that can learn to recognize common patterns similar to humans. And it was combined with tree search to evaluate many more possible positions than a human could.

So I doubt human and AlphaGo would have any advantage over just AlphaGo. In the future the same will become true in many other domains.

donlzx · on May 22, 2016

I actually think AlphaGo is a perfect example of human-machine cooperation. The deep neural network is the easier part; the intuition and architecture design by humans are the most critical part.

barkingdog · on May 22, 2016

Here's where your analogy breaks down: what if the state of AlphaGo enhancers was as good as AlphaGo AIs? Could the current state of the art AI beat a human working with a state of the art augmented AI based assistant?

The lack of an answer is an interesting question, and I wonder how much it has to do with cultural concerns. Obviously, "machine beats man at intelligent task" is a lot sexier in some ways.

rqebmm · on May 22, 2016

> I doubt human and AlphaGo would have any advantage over just AlphaGo

I know what you're saying, but I think it's important to remember that AlphaGo was created by humans. The nature of Go pattern recognition was almost assuredly factored into the creation of the algorithm.

srtjstjsj · on May 22, 2016

No. AlphaGo used a general-purpose algorithm with basic Go rules and novice-strategy optimizations.

danblick · on May 21, 2016

This reminds me of arguments in David Mindell's book, "Our Robots, Our Selves", particularly what he calls "the myth of full autonomy":

""" The final myth is the myth of full autonomy. Engineers and roboticists tend to believe that technology is inevitably evolving toward machines that act completely on their own, and that full autonomy is somehow the highest expression of robotic technology. It isn’t. Full autonomy is a great problem to work on in the laboratory. But solving the problem in an abstracted world, difficult as it may be, is not as challenging, nor as worthwhile, as autonomy in real human environments. Every robot has to work within some human setting to be useful and valuable. """ - http://www.davidmindell.com/qa/

As a case study, Mindell talks about commercial airline pilots and the use of auto-land vs. heads up displays. (HUDs keep pilots' skills fresh and keep them engaged and informed during landings.)

I like that the article suggests a technique for building systems, where Mindell's book isn't really useful in that way.

akkartik · on May 21, 2016

By a strange coincidence I just happened to be reading http://www.vpri.org/pdf/sci_amer_article.pdf when I spotted this thread on another tab. I found it after reading this quote from "A small matter of programming":

Kay's colleagues have created a simulation construction kit for children so they can build their own simulations. The construction kit lets children write simple scripts that model animal behavior. Using the kit they can change parameters to see how changing conditions affect the animal. With the kit, the children have tools that give them tremendous scope for intellectual exploration and personal empowerment. Kay reported that the children have, with much enthusiasm, simulated an unusual animal, the clown fish, "producing simulations that reflect how the fish acts when it gets hungry, seeks food, acclimates to an anemone, and escapes from predators."

http://www.amazon.com/Small-Matter-Programming-Perspectives-...

hyperpallium · on May 21, 2016

aka intelligence amplification, not artificial intelligence (automation).

"how people's behavior will change as a result of automation [with examples of learning]" is learning-focussed - a similar focus to experimental MVPs that teach you technology and market needs.

Literal, linear amplification is easy, but amplifying intricate tasks, reducing work while retaining control is difficult. You need to understand them to some extent, which requires the data of experience. There's no simple formula to "automate" this...

Abstractions like libraries, modules and data types are one way. Going to fundamentals, number is a great abstraction: automating addition and multiplication doesn't lose important control. But if you create a bad abstraction, you'll have to change the interface semantics, which may easily have far-reaching effects.

It's incredibly, almost impossibly hard to come up with great amplifications - even Euclid didn't come up with a positional number notation.

Instead of perfection, we should try for some improvement. And this article's suggestion of how behaviour will change - especially learning - seems a good start.

iofj · on May 22, 2016

I write automation tools for a living. The problem with this approach is that it's no different from full automation. Things would collapse if the automation fails just like it would with full automation.

In this case, management notices the increased efficiency and cuts your 20-person systems management team to a 10-person team. Or they put new tasks on you to justify the size of your team.

Then your amplification automation breaks ... and your productivity drops by 80%-90%. And suddenly your team cannot handle even running at 10% capacity. Calling everyone in and working 16 hours straight, you "heroically" get this to 50%, but the automation failure still breaks the business.

kemiller · on May 21, 2016

Gosh I've worked this way for years, but never seen it articulated so well. We used it for content moderation. Let the algorithm comb through thousands of messages looking for patterns and then present prospects to humans for final judgment. Great article.
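A minimal sketch of that split, assuming a scoring function already exists: the algorithm only ranks and queues, and the final judgment stays with a person. The `triage` function, the `score` callable, and the 0.5 threshold are all hypothetical.

```python
def triage(messages, score, threshold=0.5):
    """Split messages into (review_queue, passed).

    score is a hypothetical callable returning a suspicion value from 0 to 1.
    Nothing is removed automatically; suspicious messages are only queued,
    most suspicious first, for a human moderator to judge.
    """
    flagged = []
    passed = []
    for message in messages:
        suspicion = score(message)
        if suspicion >= threshold:
            flagged.append((suspicion, message))
        else:
            passed.append(message)
    flagged.sort(key=lambda pair: pair[0], reverse=True)
    review_queue = [message for _, message in flagged]
    return review_queue, passed
```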

mattip · on May 21, 2016

Add visibility and transparency to complex automatic systems: a good monitoring system to reflect the internal state, and interfaces that can be simply understood between small components that each have a specific task. Then the automation becomes less magical, and the knowledge of how it works can be passed on by teaching the meaning of each state variable in the monitoring system.

solipsism · on May 21, 2016

I think Iron Man vs Ultron is a false dichotomy. Both are appropriate in certain use-cases. And often Iron Man is a stepping stone on the path to Ultron. Driver assistance vs self driving cars is a perfect example.

The idea that fully automated systems are by their nature "impossible to debug" is just false, demonstrably. Fully automated systems don't need to be a ball of mud. Not participating in the processes may make it harder for a user to know what might have gone wrong, but this is a natural trade-off and is often one which should be made.

Better advice would be that systems shouldn't be fully automated until they're very robust. The author actually covers this. It's unfortunate the title and main point of the article seems like it's saying "nothing should be 100% automated". There is a place for J.A.R.V.I.S., after all.

naasking · on May 21, 2016

> Driver assistance vs self driving cars is a perfect example

Not entirely. Drivers overriding automated systems introduce more errors than human alone or computer alone.

pm90 · on May 21, 2016

> Another leftover process was building new clusters or machines. It happened infrequently enough that it was not worthwhile to fully automate. However, we found we could Tom Sawyer the automation into building the cluster for us if we created the right metadata to make it think that all the machines had just returned from repairs. Soon the cluster was built for us.

What does "Tom Sawyering" mean? This is the first time I've seen it used as a verb.

bananabiscuit · on May 21, 2016

Tom Sawyer tricked his friends into painting a fence for him by making it sound like it was fun. And these guys similarly tricked their process into setting up new machines by making them appear to be machines coming back from repair.

brador · on May 21, 2016

Good code comments can help this. But as we all know, the more you work on code, the more useless comment blocks you're left with, until in the end the comments are meaningless and redundant in your spaghetti-coded mess.

Solution: Descriptive Variable Names. You'll always have variables, they're a foundation of any code base, and by naming them well they flow through the code describing what's happening with 0 extra work needed.

With good variable names you can then pepper the code with good meaningful one line comments when needed.

YesThatTom2 · on May 21, 2016

Yes. The solution to bad automation is good variable names. I can't agree more.

Coincoin · on May 21, 2016

I strongly agree that naming variables is an effort worth the benefits, but naming stuff correctly is not 0 extra work. As a matter of fact, judging by how many people give crap names to their stuff, I'd say it's one of the most difficult tasks of designing something.

Finding a name that is both descriptive enough and not overly specific, that describes the 'what', not the 'how', doesn't conflict with other concepts and is not a 20 words Java style keyboard breaker, requires quite a lot of effort.

gnaritas · on May 21, 2016

> As a matter of fact, judging by how many people give crap names to their stuff, I'd say it's one of the most difficult tasks of designing something.

You're absolutely right, the two hardest things in programming are naming things, cache invalidation, and off by one errors.

50CNT · on May 21, 2016

I swear I have to start coding with a thesaurus. I mean even basic variable names are sometimes hard to find words for. Say I define a list I aggregate output into, which is returned at the end of the function. Do I call it output, output_list, result, result_list, result_l, r_list, o_l, output_l, o_list, aggregate, return_value, file_list, path_list, directory_list, dir_list, dir_l....

There are so many possibilities, some of them utterly rank, some of them decent, and a fair number of good descriptors, but which one is best?

And that's just the naming of one variable. Then there's function names, module names, package names.

Are there any guides on this or something?

gnaritas · on May 22, 2016

> output_list, result, result_list, result_l, r_list, o_l, output_l, o_list, aggregate, return_value, file_list, path_list, directory_list, dir_list, dir_l....

You're making it too hard, use normal whole words, call it "result" and be done with it; it's a temporary local variable, its name simply isn't that important. Class and method names are far more important because they have much larger scope.

Here's the trick to naming things well: name them and move on, and then when you have to come back later (which you always do), guess what the name is. If you named it well, you guessed right. If you guessed wrong, rename it to match what you guessed it would be. Over time your naming will get better.

extrapickles · on May 21, 2016

I prefer to name things after the major operation that is done with/to them, and after what they are if that is not conveyed by the type. This is very important for strings (or arrays of strings, etc.) as the type tells you little about what that string is or is for.

Using your style, variables would be named something like directory_aggregate, filtered_names, etc.

YesThatTom2 · on May 22, 2016

As the author of the piece I'm disappointed that after reading a 6 page article all you got was "use better variable names".

I have to ask: have you never worked in operations or did you not read the article or were you trolling HN?

0xbadcafebee · on May 22, 2016

For those who don't want to read all 15 pages, this can be summed up as "automate using the Unix philosophy." If you are not familiar with the Unix philosophy, here you go: https://en.wikipedia.org/wiki/Unix_philosophy

Allow me to answer the original question in a simpler way.

Q: Dear Peter: A few years ago we automated a major process in our system administration team. Now the system is impossible to debug. Nobody remembers the old manual process and the automation is beyond what any of us can understand. We feel like we've painted ourselves into a corner. Is all operations automation doomed to be this way?

A: So you wanted to go to the moon, and you hired a blind retiring man to build a rocket ship with no blueprints using parts from a junkyard.

Don't do that. Instead, build lots of stairs. Lots and lots of stairs.

Operations automation is different than typical software engineering because it isn't a finished product. You could kind of compare it to a constantly evolving organism. It needs constant care and attention and special handling, or it will get sick and fall over. This is due to non-idempotent operations, entropy, and changing requirements.

An operations automation tool is never really complete. It has probably been modified over time by multiple people, most of whom no longer work at your company; the changes were never documented, and they were probably mostly edge-case exceptions to allow some team to get something done for a weekend deadline. And now you need to change or debug it without breaking anything.

When you write automation tools or processes, they need to be so dead freaking simple that just looking at the source code explains everything about it. It needs to be incredibly simple to tie together with other tools, so when it becomes too horrible to maintain any longer, it can be rewritten in a weekend. And it needs to be fault-tolerant, in the sense that it can fail and continue working anyway.

If you have to write a last-minute exception to fix something that will only last a week, you can literally fork the original automation task and make a new one called "THIS-IS-A-TEMP-HACK-fork-of-automation-task.pl", just to make sure nobody confuses it with the normal, not-hacked-up task.

--

I once had to overhaul an old Ops tool that would simply generate a config file used to control the kickstart, post-install setup, and configuration management of thousands of servers. It was a 5,000 line Perl program with no subroutines. I re-wrote it in Perl in a modular fashion, using advanced OO techniques and all the fanciness I could cram into it to cover every corner case of how it would be used in the future.

Then I rewrote it as lots of tiny shell scripts. Guess which one ended up working out better?

At the end of the day, all you need is to use the simplest and most direct method to automate little things, and use the same approach for everything you touch. It will grow and change over time, but as long as you are consistent with making small, S I M P L E, modular things that do two things well (the original task, and compatibility with other programs), you should be fine.

Oh - and if a given automated task takes longer than 20 seconds to explain to someone, it's too fucking complicated.

ihsw · on May 21, 2016

I am guaranteed to be in a minority here and everywhere else -- I think free will should be opt in.

The progress of humanity has been underscored by the goal of freeing up physical and mental effort, and we have never hesitated on relieving ourselves of the former so why stop at the latter?

Zekio · on May 21, 2016

Won't you then have the issue of your ironman making ultron?

earlyresort · on May 21, 2016

No, since Ultron was made by Hank Pym.

Zekio · on May 22, 2016

Actually Ultron is made by Hank Pym and Iron Man, as a prison guard. But Hank is the project leader-ish.

ambirex · on May 21, 2016

... says the human. ;)

sandworm101 · on May 21, 2016

Ironman? Really? Should technology be accessible only by billionaires and used primarily to maintain their status as final arbiter of its use? Should governments, representatives of people, constantly have to beg/borrow/steal technology from these billionaires whenever they want to actually get something done?

Ironman isn't how technology should be managed. In modern times he is the embodiment of closed-source IP law channeling profits to the already vastly wealthy.

kemiller · on May 21, 2016

I think maybe you are latching onto the wrong part of the analogy.

sandworm101 · on May 21, 2016

Give me a minute. I'm working on Ultron as a metaphor for the f/oss movement disrupting the status quo.

kemiller · on May 21, 2016

OK, if you can pull that off, I'll give you a pass.

PhasmaFelis · on May 21, 2016

I agree. Also, it's very irresponsible to imply that all automation systems should be equipped with anti-tank missiles and deadly repulsor beams. The author really should have known better.

Additionally--and I realize this is a minor point--there's no reason why all automation systems should be liveried in red and gold. We don't need corporate fashion police!