I've worked with multiple services on multiple teams where upstream fixes take a while, and meanwhile devs and ops people get paged like crazy for a diagnosed and remediable problem. Agreed that the logrotate config needs to be fixed in this case, but it is only a simple demo of auto-remediation. For years, Cassandra dead node replacement has been a 6-step manual process. You'd think upstream would have fixed it, but unfortunately not. So StackStorm fills the gap between what is ideal and what is running in production. Usually, there is a gap. See http://docs.datastax.com/en/cassandra/2.0/cassandra/operatio... vs https://stackstorm.com/2015/09/22/auto-remediating-bad-hosts.... That is just another example.
It's not only about that; cleaning logs is just a simple example. The main thing is IF-Then-Else, and it's up to you to choose what you put after that IF.
Things like:
* Building fully automated and really complex CI/CD workflows from several tools
* Doing something with your AWS or RackSpace clusters based on a monitoring event from NewRelic, Sensu, Nagios
* Automatic node replacement in a cluster, migrating a MySQL master (sleep well!)
* Security automation: detecting erroneous events, automatically freezing the account/activity, and then notifying a human about the incident
* Creating a JIRA issue as part of a workflow, as a kind of detailed report after some action is done
* Listening for new events/changes in Trello/Kafka/GitHub/RabbitMQ/anything, even Twitter, and triggering an action
* Folks are even using it for Smart Home Automation
* ChatOps: getting info about your infrastructure from chat, or triggering your favorite CM tool: Puppet, Chef, Ansible, Salt.
Most likely anyone can imagine lots of use cases for tying their favorite DevOps tools together.
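To make the IF-Then-Else idea above concrete, here is a minimal Python sketch of a rule: an event type to listen for, a criteria check (the IF), and an action (the THEN). This is not StackStorm's API or rule format; `Rule`, `dispatch`, the `disk.usage` trigger name and the event fields are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class Rule:
    """IF an event of type `trigger` satisfies `criteria`, THEN run `action`."""
    name: str
    trigger: str                        # event type to listen for
    criteria: Callable[[Event], bool]   # the IF part
    action: Callable[[Event], Any]      # the THEN part


def dispatch(rules: List[Rule], trigger: str, event: Event) -> List[str]:
    """Run every rule whose trigger and criteria match; return the names that fired."""
    fired = []
    for rule in rules:
        if rule.trigger == trigger and rule.criteria(event):
            rule.action(event)
            fired.append(rule.name)
    return fired


# Hypothetical example: a disk-usage event coming from a monitoring tool.
cleaned: List[str] = []

rules = [
    Rule(
        name="clean_logs_on_full_disk",
        trigger="disk.usage",
        criteria=lambda e: e["percent_used"] > 90,
        # Stand-in for a real remediation; here we only record the host.
        action=lambda e: cleaned.append(e["host"]),
    ),
]

fired = dispatch(rules, "disk.usage", {"host": "web01", "percent_used": 95})
```

What goes into `action` is the part that is up to you: deleting logs, replacing a node, opening a ticket, or posting to chat.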
Let's scrutinize. And please do challenge and point out what still feels wrong.
* First, a shared library of scripts (actions). Each action is atomic, Linux style, doing one thing well. A common pattern in ops, now with a CLI, API and UI. Feels right so far?
* Second, combine these actions, the building blocks, into workflows (a workflow is an action comprising actions). Why not a script? a) transparency of state (it ran 3 steps and failed on the 4th), b) reliability, like 'restart the workflow from the point of failure', c) carrying data: scripts pipe strings, workflows pipe JSON.
* Add ChatOps. Any of these actions or workflows can be exposed in any chat with a couple of lines of metadata, and any events can be sent to chat with rules.
Good things begin to happen here, even before wiring events with actions. Shared context, integrations, quickly building more actions from existing actions, full audit...
* Now, add IFTTT: firing these actions on events. Quite a lot of cases fall into this.
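The "why not a script" points above (visible state, restart from the point of failure, structured data between steps) can be sketched in a few lines of Python. This is only an illustration under assumptions, not how StackStorm implements workflows; `run_workflow`, the step names, and the IP addresses are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], Context]]


def run_workflow(steps: List[Step], state: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in order, recording progress in `state` so a failed run can resume."""
    start = state.get("next_step", 0)
    for index in range(start, len(steps)):
        name, action = steps[index]
        try:
            # Each action takes the JSON-like context and returns an updated copy.
            state["context"] = action(state["context"])
        except Exception as exc:
            state["status"] = "failed"
            state["failed_step"] = name
            state["error"] = str(exc)
            state["next_step"] = index  # a later call resumes from here
            return state
        state["next_step"] = index + 1
    state["status"] = "succeeded"
    state.pop("failed_step", None)
    state.pop("error", None)
    return state


attempts = {"count": 0}


def find_dead_node(ctx: Context) -> Context:
    return {**ctx, "dead_node": "10.0.0.5"}


def provision_replacement(ctx: Context) -> Context:
    # Simulate a step that fails on the first attempt only.
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("provisioning timed out")
    return {**ctx, "new_node": "10.0.0.9"}


def notify(ctx: Context) -> Context:
    return {**ctx, "message": f"replaced {ctx['dead_node']} with {ctx['new_node']}"}


steps: List[Step] = [
    ("find_dead_node", find_dead_node),
    ("provision_replacement", provision_replacement),
    ("notify", notify),
]

state: Dict[str, Any] = {"context": {"cluster": "cassandra-prod"}}
state = run_workflow(steps, state)
first_status = (state["status"], state["failed_step"])

# Resume: the first step is not repeated, the run continues at the failed step.
state = run_workflow(steps, state)
print(json.dumps(state, indent=2))
```

A plain shell script would have to be rerun from the top after the failure; here the recorded `next_step` and the carried context are what make the resume possible.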
It's a challenge to single out one use case. A trivialized example, like deleting log files, is dismissed as a "bandaid". Complex examples are domain specific and harder to grasp. We think we are onto something here. We think it's not a bandaid, it's glue, and it's needed in many cases.
So what you do is write a YAML file that goes and deletes some files when this happens...
...instead of... fixing the logrotate config.
I don't know, it feels wrong: just as much work, except it also takes setup, new machines, and new stuff that can fail, be misconfigured, etc.