This was definitely an interesting comparison but to correct a few misconceptions:
Ansible has 810 contributors at this point. I'd love to say I wrote everything but it's a huge shared effort.
We also have a lot of modules other projects don't, so some comparison aspects were not even.
We do say no when we disagree. I think that's important.
Filtering and testing make a project what it is, to a degree. There are always the project and development mailing lists to discuss things, and they are really big lists. All that being said, not transferring a file verbatim is, for example, still the right call for us.
Try what you like by all means! But I would suggest that it not be inferred I eat children :). Only sometimes!
"The only complaint that I have here [about Salt] is that they are sometimes less rigorous than they should be when it comes to accepting code (I’d like to see more code review)."
Keeping high quality across a project requires discipline. And that discipline can sometimes seem cold.
"pull request welcome" is at the warm end of the spectrum.
Yeah I don't think we've ever meant "pull requests are welcome" as a "screw you guys!". We actually mean it's welcome.
When we don't want something, it's more like "I don't think we are interested in that feature".
The big green web merge button on Github is a scary beast, and if we risk a few users for stability and taking our time, I'm cool with that. I think a lot of running a successful project is working with a contributor and helping them get the pull request into good shape.
Those that can deal with the process and power through it become better contributors for later.
We want to very much avoid being Wikipedia, while still being a canvas for massively widescale contributions.
Anyway, stability to us is very important. Security and usability (and docs) are important. Those things come first before we take on new features.
Will and I disagree from time to time, but in the end, we're both way better for it, and he keeps me honest.
Anyway, for those reading the article - read all commentary, and try both. Try Puppet and Chef too. If you like Ruby, you might really dig Chef even, and we're ok with that. It's all good and there's plenty of users to go around :)
And in my experience, submitting a well-reasoned, simple pull request to add a small change or fix a bug always results in a merge.
As someone who also maintains a few (much, much smaller) OSS projects on GitHub, I really understand the 'no' mentality. It's often much harder to say no, but usually I try to put it in a positive way (yes, this is a worthwhile idea, yes, it looks like it could help in this situation, but no, I won't be merging it because I don't think most of the project's users would benefit from its inclusion).
Part of the difficulty comes from how GitHub has changed the dynamics. Ten years ago, folks would usually discuss a change prior to submitting code.
Now, it's more common for someone to assume code is wanted, and then it's easy to be a little disappointed when you find the upstream would want it implemented differently.
In all though, GitHub has done wonders for standardizing contribution processes.
It's unfortunate that this article focuses on running the playbooks/salt states locally. The use of ssh by ansible was the killer feature for me. Configuring a remote cluster without requiring a persistent master. There are valid arguments for maintaining a persistent master, but it's just not in the cards sometimes.
I know salt-ssh exists but it's still alpha, I look forward to seeing how it pans out and whether it can avoid being a second-class citizen to the persistent, non-standard zeromq sockets.
That being said, ansible configuration files are fairly hacky and conceptually just don't quite fit. Some modules support a full yaml-dict whereas others need the string with key=value parts. Sometimes you need to wrap your jinja2 syntax in a yaml string to avoid it being parsed as a yaml dict. There are just some things that don't quite add up, so there's definitely room for improvement.
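To make that concrete, here's roughly the kind of inconsistency I mean (the module and paths are just examples):

# key=value form, which some modules expect:
- file: path=/etc/myapp.conf state=touch

# full YAML dict form, which others prefer:
- file:
    path: /etc/myapp.conf
    state: touch

# Jinja2 at the start of a value has to be wrapped in a YAML string,
# otherwise the braces get parsed as a YAML dict:
- debug:
    msg: "{{ inventory_hostname }}"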
I think I'll live with it until I gain confidence with nix though!
Salt devs don't have any reason to make salt-ssh a second-class citizen, because they're working on a third transport. Everything's going to be (mostly, already is) abstracted from the transport so that salt-ssh, zeromq, and raet (the new transport, a kind of hierarchical distribution of messages to deal with massive deployments where the zeromq one-master-to-all-minions setup has scaling problems) are interchangeable. Also, raet uses CurveCP rather than rolling their own crypto, minimizing area where they can screw up enc/auth.
This is good in theory, but in practice there are known bugs against salt-ssh where certain operations and states don't seem to work properly (at least one of which I believe I pushed). In hindsight (the problems I ran into with it were rather early in my multi-year salt experience), it's highly possible that in my naivety I was trying to do something that's simply not supported, like tying some ext pillar in or something, but I have strong memories of bigger problems... (wish I had a better recollection, but it's been a while)
The long and the short of what this rambling was meant to convey: Salt is still very much in development. There are multiple open bugs on multiple core features (win repo comes to mind) which simply do not work as documented, period. That being said, when I went through the same decision process the author is considering, for the company I was sysadminning for at the time (with much the same background knowledge), I went with salt, and even knowing what I do post factum, I don't think I would change that decision. (I can give more justification as someone who had to live with their choice if anyone is curious, but I feel like I'm already rambling a bit.)
I'll follow-up with a post about how we're working without a master. We need neither SSH, nor a master.
We're very heavily using autoscaling, which makes SSH a no-go. Ansible has Tower for this, but it's proprietary. We /could/ use a salt master for autoscaling, but we prefer masterless in this situation because it scales better.
ansible-pull is available for those that need to invert the architecture, though we're finding most users in companies who need autoscaling can afford Tower. Price points are definitely important in that regard, but ansible-pull does exist for those that would rather go the pure OSS route.
Tower is also free for up to 10 nodes. See my comments above about why we went that route - being able to build products versus having to become a consultancy or support outfit makes it easy to keep Ansible easy to understand and rock solid, and most people are quite happy with that split.
Open core also has its set of issues. For most open core products I've used over time the community starts creating alternatives to the proprietary products and the upstream slows its acceptance of open code. The upstream will also tend to spend most of its time working on proprietary features.
Whether or not tower costs money, it's still a worry of being a single point of failure for autoscaling, which is part of why we avoided masters.
We've never held back anything from Ansible, really. Rather, Tower is more of a product on top that provides some extra enterprise features that most of our user base doesn't need (but they should try it, because they might!).
I think if you see things like Windows being part of Ansible proper, it's clear we're not holding that back. But there are also tools the OSS community can't build easily, things that involve coordination around database schemas and (ick!) status meetings and UX mockups.
Yes, communities can build them, but occasionally, just occasionally, companies can build them better. And this is one of those cases. Our business model basically funds Ansible and also makes Tower significantly more capable that way, and it only becomes something you need when you can afford it. And it's not so much because we're a company, but because I've got tons of awesome folks working on it 100% full time, and that's a lot of power to build good stuff. Most likely your company employs a few folks as well :)
So on the "open core" comment, Ansible won't, for instance, ever have proprietary modules. That's something we said we don't do. Ever.
As for Tower, the small guy isn't going to need it yet. He's probably ok with pure Jenkins fronting the show. The big guy probably needs it and a super-well-tested environment and a guy to call when it has issues.
I don't know anything about other communities you've been a part of, but I think our track record shows what goes where
and people are comfortable with it. Ansible isn't open core. It's the real deal. We take that seriously.
Yet, I think the general assumption that all software has to be purely 100% open is flawed. In general, open source communities can build some things in GREAT, fantastic ways, and certain layers do benefit from being free software. But companies need to exist. Including yours! (Though I do love me some Uber).
Anyway, ansible-pull is indeed an option if you wanted to go that route, or even doing image builds with Packer. Both are popular options for immutable systems and/or autoscaling, sans commercial bits.
But is commercial software dirty? Heck no. Ask any SaaS company :)
> But is commercial software dirty? Heck no. Ask any SaaS company :)
The pedant in me feels compelled to point out that commercial SaaS doesn't have to mean closed; the company I work for is a good example, where our products and service are based on a third-party AGPL-licensed software called Odoo[1] (and are therefore AGPL licensed themselves).
In any case, we do use Ansible here, and are happy with it :)
We're using Ansible and building the AMIs on a dedicated ec2 instance (started for a build and shut down afterwards). The AMIs are fully baked and environment information is configured via user_data in the launch configuration.
We use SSH to communicate with the build instance as a result, but I'd rather spend time during the build than during start-up of a new instance.
Not the person you asked the question of, but we’re building AMIs (and VMware images) using packer.io (via the masterless puppet provisioner). It works nicely and with a minimum of fuss.
While I have been a happy Ansible user for some time, the criticisms that the author pointed out that really resonated with me were:
- Ansible is slow even when it doesn't have anything to do. This is true. For example, we manage lists of former users that should not exist on systems, and this gets quite slow. I think that the slowness is mostly due to SSH, but it could be smarter about bulk operations, I suppose.
- Custom DSL looping and conditionals. This was intended to make the system simpler and easier, but I agree with the author that I also have to revisit the documentation since looping in a template (jinja) is different than looping in a task (with_ directives).
- Task variable registration opacity. Yup, lots of debug: actions (a quick sketch of what I mean follows this list).
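The usual pattern I end up with is a registered variable plus a throwaway debug task just to see what's actually inside it (task and variable names here are made up):

- name: check whether the app is already deployed
  command: test -d /srv/myapp
  register: app_check
  ignore_errors: yes

- name: peek at what actually got registered
  debug: var=app_check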
People in IRC are pretty friendly, but I did get a tone of "you're doing it wrong." This is exemplified, I think, by the author's "global ignore_errors" feature request. I made a suggestion that ansible-playbook should be able to run a role without having to create a stub playbook that calls the role. I ended up creating a bash script for it, but the response on IRC was in the vein of: I don't use it that way, you are doing it wrong. To me, Ansible is another tool in my sysadmin chest, I am going to use it in the way that works best for me. It's nice if the tool supports my workflow.
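For reference, the stub playbook in question is only something like this (role name is a placeholder), but it's still an extra file I'd rather not have to maintain per role:

# run-role.yml
- hosts: all
  roles:
    - myrole

Then ansible-playbook run-role.yml -l somehost does the job; it works, it's just boilerplate.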
The remarks about the friendliness of the Salt community are enough to get me to take another look... Oh, and also that Salt released its webUI (Halite) to the community, but Ansible's AnsibleWorks is closed. A UI can go a long way towards increasing usage.
It's lightyears better than Puppet/Chef, and I am glad both exist. :)
So I think we do believe in teaching users the way to use the tool, rather than implementing every possible request in cases where things aren't clear. And that usually means making the docs convey what those ways are.
When you get to a project of Ansible's size, yes, we do have to be pragmatic about what we spend time on, so we like to look for patterns. If something gets heard 15 times, it's definitely a thing. If something gets heard once, we're most likely going to show the idiomatic way to do something in Ansible.
AnsibleWorks is actually not our company name, it's just "Ansible, Inc", and yes, our UI is closed source. But that allows us to hire a ton of people to work on it too, and I think we've made the right choice. I wanted our company to not become a support firm or a consultancy, and focus on products, such that we would always be motivated to keep the tool as easy to use as possible. The product thing is the natural place to take it in that case.
It's still free to use for 10 servers forever, and I think most of our users think we made the right choice there.
As for slowness, do check out the blogpost linked below, though upgrades for particular modules are always welcome.
I do think the custom DSL was 100% the right choice, as Ansible content is a 100% valid data format, machine parseable, rather than something that merely resembles YAML and is not parseable as data.
Try "-v" if you'd like to see output without the debug, though the idea about having a "verbose: True" on the task might save some output. I'll think about that one.
If you think ssh negotiation is the slow point with ssh, have a look into 'persistent connections'
The below is an example setup of a session that persists for ten minutes after last logout. Subsequent ssh attempts (or new parallel ssh attempts) will piggyback onto the session and avoid the renegotiation delay.
Host *
    ControlPersist 10m
    ControlPath ~/.ssh/master-%r@%n:%p
    ControlMaster auto
I would add that the default output is a bit kludgy.
Rendering output to JSON is super annoying when your commands have lots of '\n....\n....\n....\n....' in them and you're trying to find the "line" where the relevant error message appeared from the command called by your state.
Also, there are some messages (e.g. ssh key verification failed) that require a higher level of verbosity than they should for the correct error to appear.
Still light-years better than puppet or chef though.
> It's lightyears better than Puppet/Chef, and I am glad both exist. :)
I'm pretty new to the world of CM, and have just started playing around with Chef and Vagrant. I've been pleasantly surprised by the utility of Chef (i.e. miles better than setting up machines by hand or shell script and worth the learning curve).
Are there any particular areas of weakness when compared to Ansible/Salt/etc.? I've read a few Chef vs. Ansible vs. Puppet style blogposts, but they never seem to come to particularly strong conclusions.
From experience, here are some of the CM tool downsides that might help you:
Chef - Ruby DSL is hard if you don't know Ruby. Lots of infrastructure to manage (if not using hosted Chef). On the fly orchestration requires 3rd party tools or Enterprise License.
Puppet - Custom DSL is json-y which for some is easier than Ruby. Scaling problems because puppetmaster compiles the manifests (instead of having nodes compile). 2 tools/interfaces for config vs orchestration (mcollective) gets confusing and not very consistent with features.
Ansible - pretty much a lot of what the article said. A bit slow and custom loops/dsl sometimes gets confusing. Managing hosts file is mostly the only "infrastructure" you need, but still is annoying. No Windows support (yet)
Salt - Not as mature so it can't do some advanced stuff Puppet/Chef can do. Last I looked at web UI (Halite) it was not much to look at. Hardly any integration into 3rd party tools (most favor Puppet)
Don't get me wrong, I love CM tools and the pros list would be 1000x longer than cons. But they all have some big downsides that hopefully will get better in the future.
Everybody seems to be talking about moving away from Puppet lately. Maybe I just don't do anything sufficiently complex with it, but I've never had any problems or gripes with Puppet. 99% of the time it seems like the thing I want to do has already been done in a well-written module on the Forge.
The author seems to cite two main reasons for wanting to move away from Puppet: their codebase was large and badly structured, and their techops team didn't know Puppet well enough to manage it. Neither of these sound like problems with Puppet itself -- they're certainly not unique to Puppet. I'm not convinced that moving to a newer, less mature technology (which I assume techops don't know well either) will solve these problems.
There's definitely more reasons. I didn't want to detract too much from the topic of the blog post when I wrote it, since the post is already obscenely long.
Puppet doesn't have native support for a lot of things, which require us to either implement it in puppet's DSL, or in custom ruby, which the upstream won't take. For instance: git, gems, pip, virtualenv, npm, etc. etc..
Puppet doesn't have looping. I'm always told: "Iteration is evil. Puppet is a declarative language and if you're needing to loop you're doing something wrong." But it's simply not true. Looping makes things insanely simpler.
Puppet isn't executed in order, even for the same service in the same environment across systems. You have to very diligently manage every require for ordering, and no one does it right. This has led, really often, to systems being unable to complete first runs, which causes problems with autoscaling. I don't enjoy spending my time cleaning this up often.
Puppet's DSL is full of little gotchas that constantly cause issues for developers who aren't very familiar with Puppet.
Half of our team was very familiar with Puppet. If you look at my blog, quite a few of the older posts are about Puppet. I worked on the puppet infrastructure at Wikimedia Foundation for a long time, and released all of the puppet code as open source (they have 60k+ lines of puppet).
I'm a little sad because most of these issues (as I understand your description of them) are already fixed or well underway :( It's probably too late for your specific case but I'd like to reply anyway since a lot of this is "conventional wisdom" based on old information. Full disclosure: I'm the product owner for Puppet and before I worked here, I ran it in large-scale production since 2008.
Not quite sure what you mean by 'native support', but gem and pip package providers are built in. There are high-quality modules for git (puppetlabs-vcsrepo), virtualenv (stankevich-python), npm (puppetlabs-nodejs), etc. -- it's a design decision to move much of this into modules and out of core so they can iterate faster.
While the model definitely wants you to describe relationships between resources if you need to send subscribe/refresh messages, there are toggle-able ordering algorithms that will let you run them in manifest order -- I blogged about it here: http://puppetlabs.com/blog/introducing-manifest-ordered-reso...
The parser and evaluator are undergoing a total rewrite to be an expression based grammar, which is explicitly to make better definition around the language and eliminate the gotchas -- https://docs.puppetlabs.com/puppet/3.6/reference/experiments... (this will also be the default on the next semver major)
Native support for things is irrelevant because you can use modules from the Forge, and the community is the largest of all the CM tools around, so I find it hard to believe you're lacking something there.
Actually, you can circumvent the lack of looping with defined types and calling them with an array. In my opinion, if you need loops in your infrastructure code you're doing something wrong.
The saddest thing is that of all the people who brag online about migrating away from Puppet, nobody actually mentions some of the drawbacks that are REAL and present - and not even discussed in the Puppet community - like the lack of a simple search function versus the complexity of exported resources... which means that people are moving away for reasons other than functionality alone...
Another real issue is the slowness of compile process, which happens on the master. But it's OK for "smaller" deployments - like if you don't go above 10-20k nodes.
Had the same thought. Puppet code was bad and no one knew Puppet. Seems like a fine reason to move. But could be the other way around. Could be Ansible code is bad, no one knows it, lets move to Puppet!
The cool kids have a new fad so you're not cool unless you dump puppet. No technical reason at all as near as I can see. It's pretty much the same as "Perl hate": why do we hate Perl? No reason at all, other than being cool means hating Perl! Very middle school social dynamic.
My puppet manifests are 16K. My modules are larger, but I've got some large files stuck in there (long story).
There are meta questions like:
What are you doing with 15000 lines of puppet? I have a couple thousand and feel a bit over extended, like why am I doing this.
How are you replacing ten lines of puppet with 1 line of the alternative when all I'm seeing in the examples is replacing 3 lines of puppet with 3 lines of something else?
Like, where is the big win where those 3 lines of puppet are being turned into 0.3 lines of Ansible?
There is also the question of why I'd configure individual groups on individual machines instead of just tossing it in the LDAP once, probably by hand. Or distributing a system wide /etc/groups much as I used to share a division wide emergency /etc/hosts (like, this is the minimum /etc/hosts required to conveniently fix DNS if DNS breaks).
(edited to add actual numbers. I have ldap and getent group | wc -l reports 76 groups. I could replace that with 76 groups * 3 lines per group plus a blank line between entries = 304 lines of hand maintained code. But in 3 lines I could distribute a golden /etc/group to all machines. Or in a few more lines I could make all my machines use LDAP and get passwd and some other stuff centrally controlled for free (and yes I use ldap for passwd and no I use kerberos for auth, so passwd just holds home dirs and stuff like that). So I could write hundreds of lines of puppet to get out of editing one golden group file or get out of running ldap, but the alternatives are so much easier...)
There exists a meta question of allocation of resources. You can do "everything sysadmin" in puppet. Or make a universal does it all gold image that is well backed up and enables or disables parts of itself based on role and never automate its configuration at all, just spin up images and give them "special" hostnames and they sort themselves out. Or not automate trivial parts. Or place some weirder config stuff in a shell script technically not part of puppet other than being distributed, run, and tested for error free operation. Or a mix across all. So I could see a "gentoo-like" start with an official distro image and use nothing but puppet to do everything taking 15000 lines of code, maybe. But that sounds hard... do it a different way, no need for different tools.
I have a category on my blog dedicated to LDAP: http://ryandlane.com/blog/category/ldap/ I used it very heavily at Wikimedia and had very nice integration with Puppet. In general I think it's good to avoid LDAP if possible. It adds a point of failure and assuming you're not managing thousands of users (we were handling about 5k users in Wikimedia Labs), it's generally more work than managing users in Salt/Ansible/Puppet.
We didn't save a lot of lines of code replacing the user/group code with Salt. We saved a lot of lines of code by using native support for git/pip/virtualenv/npm/etc, which were implemented as a mix of custom puppet DSL and ruby.
We could have likely saved 3-5k lines of code from a puppet rewrite from scratch, but it still wouldn't have been as simple as the Salt or Ansible code.
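For a rough sense of what that native support looks like on the Salt side (the repo URL and paths here are made up), a git checkout plus a virtualenv with requirements is just:

https://github.com/example/myapp.git:
  git.latest:
    - target: /srv/myapp/src

/srv/myapp/venv:
  virtualenv.managed:
    - requirements: /srv/myapp/src/requirements.txt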
"Its pretty much the same as "Perl hate", why do we hate Perl? No reason at all, other then being cool means hating Perl!"
Ummm, no. Believe me, having worked in a PERL shop for nearly 10 years, I was ecstatic to start working in Java (!). And I have zero regrets. It's nothing to do with being "cool" or any other pointless patronising insults to other developers. It's that badly written PERL is the worst excrement ever to have been smeared on a computer screen. Yes it can be clean, clear and readable, but only in the hands of an experienced expert (using "modern PERL", which only started to exist ~6-8 years ago) backed by stringent code review and consistent team practice. Without the infrastructure in place, you end up with reams of unmaintainable, slow, buggy, eye bleed code. I've seen plenty of PERL from plenty of projects and plenty of different developers in different institutions and companies and the vast, vast majority is crap.
The last bit of code I had to rewrite (5 years effort by a weak PERL coder, replaced in 3 months in Java, maintained by someone with no previous Java experience quite happily) would have made you cry. I have never seen anything like it in any language (note, I don't work with PHP either), and I don't believe such a steaming mess would be possible in any other modern programming language, let alone from someone who had been a professional programmer for 10 years.
To me, your attitude is the problem the PERL community has. The language deserved the bad reputation. Until modern PERL appeared, it was almost impossible for a beginner to produce anything like good code. It is possible now (I have seen beautiful clean PERL), but instead of trying to educate and bring people back to the fold, the community has a massive chip on its shoulder, refuses to admit the problems PERL has (still), or that it ever had any, and proceed to insult everyone else. If PERL hadn't learnt from the trends in other languages (esp a decent OO system, Moose), it would be literally dead by now.
I haven't looked at Salt, but I had a love/hate relationship with Ansible so far.
To be clear: Starting with Ansible was amazing, the first couple steps were easy and enlightening. Maybe I'm expecting too much now and act entitled or something? That said, it broke down rather quickly.
- My first issue was documentation. This article is correct about the current state of the documentation, but the site was in a really bad state in limbo (between redesigns or something) for quite some time. Offers on the mailing list (Not by me) to restructure the website, as a community effort, were declined. Basically the documentation was, from this point of view, unusable before the current design went live. Broken links, no easy structure.. It was 'an adventure'.
- The bigger/biggest gripe: Everything I try to do in Ansible seems to turn into a shell script. Limitations in Ansible and the "Use a template for bug reports"/laggy response on GitHub led to workarounds all over the place, where I had to resort to 'raw:' and/or 'shell:' where there should be a reasonable way to do things. One of quite a few examples would be [1]: for starting random services (postgresql, dovecot in my case) Ansible just breaks and hangs forever in my environment. Ah well, let's resort to shell: service postgresql start (which .. doesn't do change tracking, isn't the same thing .. but works).
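Concretely, the workaround looks something like this, and I'd much rather be able to keep the first form:

# what I'd like to write, but which hangs forever in my environment:
- name: ensure postgresql is running
  service: name=postgresql state=started

# what I actually ended up shipping - works, but no change tracking:
- name: start postgresql the blunt way
  shell: service postgresql start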
I'm really happy with what Ansible allowed me to do. I'm not satisfied with the result I have here and still look for a way to drop all my (necessary!) debug: and shell: modules for a different solution.
The docs reorg you mentioned happened Christmas of last year and most people are really really really happy with it now. We haven't done a major reorg since or needed to, but the company was only a year old at that time, and it got to the point it needed to be done. Definitely took a while to appreciate all the different learning styles of people using the docs to find something that works for everyone, and it took some wrangling with Sphinx too!
I don't think it's fair to say we declined community help because one of the most amazing things we have in docs - the module docs generator that builds half the website, is a community addition. There were also various attempts to build Angular JS versions that looked crazy awesome, but the search engine problem wasn't solved at the time, so we were unable to use them.
I'm not sure why people don't like the template, but it's a common feature in Bugzilla - frankly, we spent so much of our time asking what the Ansible version was that this allows us to service everyone's GitHub issues a LOT faster, and gives us the ability to work through everything so much faster and ensure better quality.
The bug template is important. As for lag in GitHub response, there's a priority system for tagging tickets, where we hit P2 items first, and then some others. Ultimately, we're devoted to stability and hitting the biggest things first, and have to avoid "hey look, a squirrel" syndrome. Part of the cost of having one of the most contributed-to projects on GitHub in terms of users is that it does take a while to review everything, and we spend a lot of time on triage.
Hey. I think you misunderstood what I was trying to say.
The docs: Well, they were in a mess for a while when I started and I agree that they're really neat now.
Declining community help: I was referring to a specific ml thread that I stumbled upon when I was unhappy with the (previous!) state of documentation, wherein someone asked whether you (both the company and you as a person) would consider putting the site in git / opening it for community improvements. You declined. That doesn't mean that I judge you for that decision, it just seemed like a wasted potential at that time to me (Given: "Site in disarray" and "Free help offered"). Nowhere did I state that you don't accept community support per se.
Template: Well, the big problem might be Github's support for this 'feature'. If I want to file a new ticket [1] there's nothing helpful here. Yes, there's a rather bland "Review the guidelines.." link, but frankly I didn't click that. Why? I know how to use Github to file tickets. It doesn't say "Please read this or your tickets will be closed" or even better, just embeds the template you require in the new ticket form. While I certainly understand that you want/need some structure, the user experience is currently Not That Good.
Lag in GH response: That .. wasn't actually my point. My (random, sample) ticket was promptly active, nice people discussed it. I don't even care too much about the fact that it isn't solved after six months. I was mostly trying to point out that Ansible, for me and in my personal use cases, seemed a little unreliable and incomplete. This is one of the reasons I _need_ to use shell: or I cannot have a playbook that starts postgresql or dovecot, period. Is it important for Ansible Inc or the world? Probably not, but workarounds like these are the reason I don't like looking at my playbook anymore.
I rejected Dockerfiles because a random list of shell commands isn't what I wanted. My Ansible files are now a mix of clean/official modules and some of the very same random shell commands, and not by choice.
Let's close with:
- I appreciate your project/product. It helped me a lot (see first sentence in the gp post)
- I'm sure Ansible works great for scenarios of various sizes. I don't claim my experience is to be expected for everyone (but note that some people at least have expressed similar feelings about the 'yml files turn to shell scripts' idea)
Ah, the site in git. Yeah, ansible.com (our corporate presence) being in git is unlikely to be a thing :) Nobody does that of course, but we do have the entirety of docs.ansible.com in git and that's been that way for a while - and there are github contribution links on most docs pages that aren't code generated. For the ones that are, you can edit the module source directly; the DOCUMENTATION blocks are embedded in there.
I really wish GitHub did have template support and have asked a few times :) We've actually never auto-closed a ticket so I'll smite that comment, we never implemented it. However the template is still helpful and all that. The new GitHub issue reorg is a step in the right direction and I think they'll continue to improve it over time. We definitely could be in something like JIRA, but, ick, that's not where the users are and the barrier to entry to tickets there is high. So we're left with whatever workarounds :)
Anyway, comments are all good, hope that clears things up a bit on our end too.
"Everything I try to do in Ansible seems to turn into a shell script"
This was my disappointment with Ansible (and other CM tools) - so why not treat the shell as the basic unit of action? See my post elsewhere on this page for more: https://news.ycombinator.com/item?id=8135823
This comment is disturbing because it assumes there is a wrong way to do things. In fact, the point of managing state is to react to the changing state of different resources (ie. services in a service-oriented architecture, the physical or virtual systems they run on, the networks that connect them, etc.) and to automatically resolve failures through known and tested state-migrations. If you missed that, you're in no position to be calling people wrong. Further, anyone wrapping bash in python and calling it elegant is insane.
"In fact, the point of managing state is to react to the changing state of different resources (ie. services in a service-oriented architecture, the physical or virtual systems they run on, the networks that connect them, etc.) and to automatically resolve failures through known and tested state-migrations."
They should be part of the definition of your system (ie the state), not changed on the fly.
If I said ShutIt was elegant, I was wrong (not sure where I did). It's not elegant, just as the real world is not.
Anyone trying to make config management look elegant is selling you a pup.
However if you want to write a deployment script, it also lets you, rather than fighting it kicking and screaming :)
A thousand times this! I, personally, find YAML easier to grok than whatever Puppet was using (see, post-puppet PTSD selective amnesia). And, anything that doesn't work, on a deadline, can be shell scripted now and modularized later.
The term CMS has been used to mean configuration management system for longer than it has been used to mean content management system. One can find articles from 1990 using it in the former context, while the latter appears to have been used since the late 1990s.
Ansible is nice, but I share the same gripes as darklajid. Plus, with Docker taking off, I question how valuable Ansible will be going forward. I see it as a "nice Chef" or "usable Puppet". Not revolutionary.
The overlap of Ansible and Docker is pretty stratospheric in adoption levels. As more fleet management services exist, to us, it looks like another VM type, and all those cloud modules will also help orchestrate it.
But now, people are using it for both image builds and placement in great number.
I read that for some of you the Chef experience was painful. I'm using chef-solo with the chef-solo-search cookbook and everything is working pretty smoothly. Each of my nodes owns the entire repository and applies chef-solo to itself. With a cron to periodically update the chef repository, it is really comfortable.
I agree that using chef-server is a bit painful (that's why I don't), but otherwise there are a lot of cookbooks and it works well. What kind of bad experience did you get?
I wish that Ansible would work with orchestrating Docker containers.
Here's my thought - Docker is replacing the use case for using Ansible/Chef/Puppet for a lot of people. It is far too easy to build portable docker machines and deploy them on bare metal. For me, the use case of provisioning a softlayer server and then setting it up using Ansible/Chef is no longer present.
However, the problem of orchestrating a bunch of Docker machines is still unsolved. I was hoping that Fig would solve it, but by their own admission [1], Fig is going to be closely tied to Orchardup and not intended for general use.
So, if I want to launch a hadoop cluster over 20 Docker VMs, physically hosted in 5 different servers... I really have no way today. Notice, that the complexity includes setting up bind-volume mapping, logging, passing of variables from one Docker VM to another, etc.
I'm not sure if Chef is more suited to this, given that Octohost moved from Ansible to chef for a Docker PAAS [2], but I would definitely love for Ansible to do this part really well !
True, but they are unviable for most startups.
Most of the solutions outlined here are very, very heavy. I'm a 2-man startup and really cannot invest in Mesos to deploy a 4-VM cluster.
But the news that Kubernetes is leveraging SaltStack is hopeful.
You might be interested in the Openstack deployment tooling called 'tripleo'[1] which has similar questions and has avoided all the current config management tools. The general gist is that what you're describing can be done using tools like Cloudformation/Heat or the newly minted Terraform, since they can both orchestrate the hardware/cloud resources and pass data in/out of the guests.
Thanks for this. This is great, but could you build a more sophisticated example with port mapping, volume bind mapping and passing of variables to containers?
If you look at a fairly trivial fig.yml, you'll know what I mean. This is what enables a fairly common use case (e.g. wordpress docker -> mysql docker) to be set up fairly quickly.
You might want to look into Apache Mesos. I don't know if it works with Docker specifically, but it does manage Linux containers (which Docker is based upon).
Fleet, part of CoreOS, does orchestration by leveraging systemd.
I am actually working on stateless deployment of interdependent docker containers by pushing docker state to VMs much like Ansible.
I probably am dating myself here. But with cloud infrastructure, what is the point of these configuration management tools? To get the configuration of an instance, just fire up a copy of the instance. To install something new, have a script install it on one machine - monitor it, then start deploying it. You have versions of instances, backups, and exact copies. If you want to push, it is 10 lines of bash with a git pull and ssh public keys. AWS has an amazing API. I have seen guys try these 9k+ lines of complex-syntax Salt systems only to break things, misconfigure them, and leave the system totally dependent on the author (aka the genius). We have run systems of 100+ machines with a few lines of bash - so I am blown away at this new complexity. PLEASE help me out.
This is also more of the 'containerized'/Docker-like infrastructure development workflow.
Tools like Ansible and SaltStack also provide pretty robust infrastructure orchestration/management tools that are conveniently provider-agnostic. I save a ton of money by spreading out servers for one particular service over a bunch of lower-cost providers (rather than AWS), and use Ansible to manage them all.
If you play in one particular cloud infrastructure, image-based configuration and provisioning may work fine, but if you need to support the movement of images from developer workstations through to different hosting providers (whether using Docker, CM, or bash scripts), Ansible can help with that (as can Packer, Terraform, etc.).
There is certainly a cost/benefit to these tools you have to consider.
Have a few machines you only do basic admin on occasionally? A CFM is probably too complex and a waste. Have a huge infrastructure that scales rapidly, and you have daily changing requirements, or repetitive tasks? It's a life saver.
If you can happily and efficiently manage 100+ machines with a few lines of Bash.. you probably shouldn't change that.
Just copying is not enough sometimes. If you want to clone some production images to your dev/test environment you need to change some params in production image to make it work.
For example, if you have nginx in production environment that proxies queries to set of upstreams it's necessary to change server addresses in that upstream to local dev servers.
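A small sketch of how that can look with a template-driven approach (file names and the comment's assumptions are illustrative):

- name: render nginx upstream config for this environment
  template: src=upstreams.conf.j2 dest=/etc/nginx/conf.d/upstreams.conf
  # upstreams.conf.j2 loops over a per-environment list of backend hosts,
  # so dev/test renders local upstreams and production renders the real ones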
>I have seen guys try these 9k+ lines of complex syntax Salt systems only to break things, misconfigure them, and leave the system totally dependent on the author (aka the genius).
A lot of it is job security, even if that job doesn't pay them anything.
> I did get a “pull request welcome” response on a legitimate bug, which is an anti-pattern in the open source world.
Can someone explain why this is an anti-pattern? Is there some sarcasm I'm missing? Seems like exactly the kind of response I appreciate when I submit issues in open source projects.
"Pull request welcome" usually means "This is a legitimate bug, but I don't care enough to fix this for you."
Some people believe that maintainers should fix all bugs that are reported to them. Other people believe that the open-source nature of the software should cause people to fix their own bugs and contribute the fixes back to the project, and both camps often believe that demands on their own time and effort are unreasonable.
In our case, one of the things I want to do is run it as a fully legitimate open source project.
In this case, we're going to be open and say when we can't work on something, or when we're unlikely to work on something, because we've got those 800+ contributors at our door asking for things.
There's a lot of triage.
In the past I've seen other projects take a few alternate routes - leave everyone hanging (unfair) or auto-merge everything (unstable). So that's kind of where we're at.
We do recognize we don't have /limitless/ resources, but this is kind of what you get for having a project on GitHub with so many stars and forks.
The user and testing community is absolutely awesome, but when we say we aren't going to do something, it's because we want to be clear where we stand, or have a conversation, or encourage people to contribute.
As Spock said, "the needs of the many outweigh the needs of the few, or the one". Triage!
This is not a problem of entitlement where people expect that you fix their bug for them. This is an anti-pattern because many people consider this type of answer rude, and it doesn't create a welcoming community.
Saying "This is a legitimate bug, but I don't care enough to fix this for you." is already an order of magnitude more polite than "Pull request welcome" or the older "Patch welcome" , explaining in details why and if necessary how open source work even more so. You have to remember than not every one know the Open Source community speak. If you can guide the reporter on how to create said pull request, even better.
Yes its take more works and it's less fun than hacking at code, but building a great community is a lot of works. It's also, for me at least, what separate good projects from great ones
"This is a legit bug" results in the bug staying open.
"Pull requests welcome" is "I feel this is a feature, but we'd be open to you working on it".
I think one of the great tragedies of the internet is people assuming people say things they don't mean.
And yes, building a great community is a lot of work, and it's something we spend a TON of time on. And it's why we have one of the most contributed to projects on Github.
Getting to 810 contributors is really hard, and you don't do it easily :)
In this specific case I submitted a bug and was told the bug wasn't valid and it was closed. After I pointed out why this was in fact a valid bug, the bug wasn't reopened, but instead left closed while I was told "you're welcome to submit a PR". Basically I'm being told the bug isn't important enough for the upstream to fix and that they care so little about the bug that they won't even leave it open for someone other than me to fix.
It's generally considered a rude response in the open source world because it's telling users they aren't worth your time. It's a warning sign of an unfriendly upstream.
I'm sorry you feel that way. In our case, we get a TON of bug report traffic - many are just user questions which we'll direct to the list, some are just nice to haves, we file most of the good ones, but not always.
Though I would consider performance tuning of the user module not a bug, and I do not think the newline behavior of copying the file on the filesystem was a bug either.
A discussion on ansible-project would have been welcome after you felt we had taken the wrong track, but when we feel some requests aren't worth our time, it's because we have a huge audience to serve and are triaging everything.
We feel it would have been unfair to you to let it sit infinitely when we were unlikely to spend time on it.
Yep. I understand that, but part of having an open source project is that others may find open bugs and decide to fix them because they're also having the same issue. Closing legitimate bugs hides them from the world and also gives people the impression that it's not something to fix.
The performance issue was very likely one of the more major deciding factors. Managing users was so slow that it was painful to do small iterative development. Slow performance is definitely a bug.
I think it's something that can be improved, yes. I'm not sure it's a bug, and I'm not sure it's really all that slow. We're talking about 0.5 seconds and maybe it could get down to 0.4? If you dig into the module I'm not sure what you would change. (Again, a fine discussion for ansible-devel probably? How would you solve it?)
In your case, managing a list of 80 users to be sure they're there or not, I might have suggested perhaps tagging that action and only running it every so often, but I do think that, in general, it wasn't a pressing thing for us.
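Something along these lines is what I have in mind (names made up), so the expensive check only runs when you explicitly ask for it:

- name: make sure former employees are absent
  user: name={{ item }} state=absent remove=yes
  with_items: former_users
  tags:
    - user_audit

Then ansible-playbook site.yml --tags user_audit runs just the audit on demand, and --skip-tags user_audit keeps it out of the fast deployment path.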
There are going to be occasional tradeoffs to the way the task system does work (the ability to be split declarative/imperative), but those are some of the prices paid for the flexibility that buys (like "register:" versus the limitations of a server-side compile up front).
I think I'm ok with that, all being said. It's how ansible came to be.
There are choices that you make building things one way versus another, and whether we're down for time for a coffee and three spins around the office chair, or time for a coffee and two spins around the office chair, it's still in statistical noise territory.
We have spent a lot of time optimizing the HECK out of the SSH transport, but no matter what, almost all deployments in any config tool, the majority of the time comes down to waiting on yum and apt. And yum and apt are brilliant and I love them, it's just where things lurk :)
As a comparison, Salt checks the users, groups, ssh keys, etc. in under 1 second. Ansible to do the same set of actions was taking nearly 2 minutes. This was just to check, not even to take action.
So, yeah, the majority of time in an initial run is waiting on apt/yum, when nothing is changing the majority of the time is spent on checking things.
When you're making config management a part of your application's deployment process, waiting on checking is painful. This would have added 2 minutes to deployment time. When we doubled the number of users managed it would add 4. That's a really, really long time.
Ansible seems like a cool project... thanks for stopping by here.
One question, though: 0.4 seconds seems like a very long time to query one user/group. You should be able to query thousands of UIDs a second, minimum... unless you're using LDAP and you have a slow network or something. You could write a simple C program to query a bunch of UIDs and I bet it wouldn't even take a millisecond to run. So where is the overhead here for ansible? I apologize if this is a dumb question... I am not very familiar with the architecture.
It's definitely true that most of our users are deploying configs and applications, so there's not a lot of user management, but the user management is definitely robust.
We're using GNU user tools in many cases for correctness and efficiency of not reinventing the wheel so you might wait a little more for them.
We're open to tuning but it's really not been a problem.
It's still statistical noise in the end, and, yeah, like we said, we've got more important things to work on first. It would be nice if we had time for everything, but this just doesn't rate in the grand scheme of things right now, still.
Someday, perhaps! Meanwhile, try things out, I don't think this will matter in practice for most folks :)
> In this specific case I submitted a bug and was told the bug wasn't valid and it was closed.
Maybe that is what you might have said in your blog post instead of a snarky comment. Remember, many people in open source are not from the upper middle class United States / West Coast and will likely not pick up on clever passive aggressive jabs. I completely understand that they are great for being able to deny any accountability for your attacks, but it generally leads to a lot of misunderstanding. Especially from those who do not speak English as their first language.
I really don't mean this to jump on you, but hopefully you might take it as some advice when dealing with large distributed projects that passive aggressive snark, I would guess, ends up actually going over the head of 50% or more of the people.
It wasn't meant as snark or passive aggressive. It's a common topic in open source projects, but it's possible that I was speaking towards an audience that already knows the topic. Sorry if I hadn't explained well, that's my failure.
> The DevOps team felt that the Puppet infrastructure was too difficult to pick up quickly
Uh. I hate to break it to you, but rewriting your infrastructure from scratch isn't quick either.
> Code should be as simple as possible. Configuration management abstractions generally lead to complicated, convoluted and difficult to understand code.
All code becomes complex over time if you do something different with it. Refine your abstractions instead of throwing out code. Or use more composable components instead of writing new code.
Finally, I'd add that before you throw out a thing, your main concern should be "is there something we cannot do with the existing thing?" There will always be a better wheel, but if your existing wheel works, you should probably stick with it.
Having deployed salt to a medium sized cluster ~1500 farm machines, and around 1500 desktops, the one thing that salt won't do is scale.
Salt has a lovely system where clients attach themselves to a zeromq and listen for commands. However after about 500 clients it starts to fail silently and not all clients update properly.
The way we get round it is to run salt-call on the client at specific intervals. The other annoyance is that it is horribly slow (60 seconds plus to run 100 ops, excluding yum operations).
Having said that, the YAML syntax with optional python extensions is grand. Whether it's quite ready for mainstream adoption is another matter. It sort of works for us.
Personally a fan of Ansible, but I've also been pretty impressed by SaltStack as well. Either is much simpler and easier to use than the older generations of configuration management tools (cfengine, Puppet, Chef.)
I've never heard of ansible being referred to as a new generation. What do you think defines this generation? I use puppet and chef a fair bit so I'm just curious on the new features offered.
My take is that this "newer generation" of tools seems to focus on combining configuration management with orchestration. Chef and Puppet let you define the static state of the world but leave it up to you to figure out how to transition when something needs to change.
On the other hand, Ansible works well as simply a remote task runner (like Fabric). Salt is the one I have least experience with, but I had a conversation with the creator once and he seemed excited about the orchestration possibilities with Salt. If I understand correctly you can react to events that get triggered either manually or based on a condition on some other server you're managing. So both of these tools make it easy/natural to do something like run a rolling restart of a group of servers.
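To make the rolling-restart point concrete, a rough sketch of how that looks in Ansible (group, file, and service names are placeholders):

- hosts: webservers
  serial: 5            # update 5 hosts at a time
  tasks:
    - name: push the new application config
      template: src=app.conf.j2 dest=/etc/myapp/app.conf
      notify: restart myapp
  handlers:
    - name: restart myapp
      service: name=myapp state=restarted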
I'm not finding the new generation term particularly meaningful.
One thing that was somewhat unique about Ansible was it was designed for rolling updates as the initial use case, and the desire to solve deployment problems rather than just CM problems.
Everybody tends to view orchestration differently, so see our take:
>I've never heard of ansible being referred to as a new generation. What do you think defines this generation?
IMO, it's three things:
* A push-by-default model rather than pull-by-default (that never made sense to me: option, maybe - default, HELL no).
* A focus on minimizing the dependencies (puppet has a ton of annoying, unnecessary, attack-surface-increasing/RAM-gobbling dependencies, from the agent to the SSL authentication).
* Not using a DSL - just using YAML and an intentionally dumb templating language - helping to enforce a far cleaner separation between configuration and code (the divide can get muddied with puppet because its DSL is too powerful).
We've been very happily trucking along with Ansible the past year or so over here at Front Row. Tried Chef for a few weeks, hated every moment of it, switched to Ansible and it all made complete sense.
For us Ansible takes care of configuring the various types of machines we have in AWS, of building, testing and deploying binaries, of configuring and keeping our development environments in sync and more.
It's pretty exciting that the project keeps getting better with every version.
I've been using SaltStack + SaltCloud in a production environment for the past six months or so -- it's been a total joy compared to my experiences with Puppet / Chef.
We are moving from puppet to salt and I'm halfway through; so far my git commits look like this over the past month:
puppet repo -14000 lines
salt repo +1600 lines
What it really comes down to is that salt has a ton of built-in modules, while with puppet the old way to do it was to add each one as a module in your main module search path, which for us was in our repo.
Too many of the other alternatives seem to be focused on the easy part of the problem (running commands on lots of nodes) without putting enough effort into the hard part of the problem (automatically deciding which commands to run to get to the desired state).
It would have been interesting to see them add Puppet to the list of tools to evaluate (while doing their best to do so objectively as 'new users'). It seemed to me like most of the issues they'd encountered were self-inflicted, rather than the result of using Puppet specifically?
Interesting to see that Salt seems to have a slightly higher following here compared to Ansible.
I'm managing around 10-15 servers only but after having it all set up with Salt for the last year, I am now migrating it to Ansible despite it being a big hassle. I find it much more straight forward and am happy with the documentation so far.
Salt has bitten me twice in that after (non-master) server updates commands would fail with non-descriptive error message. I reported it as bugs but got too frustrated in the end and decided that with a new server I will start a migration to Ansible.
Very happy so far even though I do see the problems of speed (haven't investigated tuning it) and that it seems to require too many shell work arounds. But conceptually it seems much cleaner to me.
Definitely investigate the tuning options. ControlPersist + pipelining does awesome wonders. We have pipelining off by default for max compatibility just so nobody gets stuck on an initial install, but feel free to stop by the list if you have questions.
Using "with_items" on yum/apt transactions also saves giant loads of time keeping things in single transactions.
I used Ansible 6 months ago and it felt slow.
The biggest issue, which is intrinsic to the model, is that each task is executed sequentially across all the target hosts. It makes its behaviour easy to understand, but it also makes each step as slow as the slowest host.
Another issue that might be fixed now is that each task is essentially a script uploaded to the target and then executed locally. Unfortunately at the time the scripts weren't cached properly so N invocation of the same task would mean N uploads of the same script.
That being said it's really simple to use and I recommend it if you don't have an existing infrastructure management system. Ansible fits in well as an orchestration tool.
Please read the tuning article on the blog for sure. It's definitely not slow and we have folks updating 5000 servers in 5 minutes. (Yes, really!) ControlPersist and the like are key, and we'd be happy to help discuss options for you.
As for sequential execution, set --forks to control parallelism. Steps are executed in order, but that's true of all CMS systems.
> It's definitely not slow and we have folks updating 5000 servers in 5 minutes. (Yes, really!) ControlPersist and the like are key, and we'd be happy to help discuss options for you.
It's good, but I wouldn't describe this as fast; it should be possible to increase the performance by another order of magnitude with some optimisation. Web servers can easily serve 5000 requests per second even when SSL is involved, so why couldn't Ansible do the same?
After enabling ControlPersist, the next optimisation is to run Ansible in the same datacenter. Latency is a killer when deploying to us-east-1 from Europe.
> Steps are executed in order, but that's true of all CMS systems.
It's true on a single host (although puppet's and salt's ordering is not guaranteed). Ansible also orders across all the hosts. If you have tasks A->B->C, ansible will first run A on all the hosts and collect the results before moving to the next step. Each step is thus as slow as the slowest execution.
I remember initially looking at the saltstack docs and deciding, like the author of the post, that they were extremely dense at first glance. It's interesting to read that after he'd used salt for a while the dense documentation was useful.
Does anyone have experience using Configuration Management software in a heterogeneous environment? For example, I've seen large environments running Windows 2008/2008R2/2012/2012R2, various flavors and versions of Linux including Ubuntu Server, CentOS, SUSE, etc... What's the pretty? What's the ugly?
I understand consolidation and standardization of operating systems is usually the best state to be in, but in a lot of larger companies running legacy software it's not economically feasible to do.
We are very heterogeneous--something like a 60/40 Windows/Linux split.
Traditional Windows folks don't really use configuration management or even have any clue about it. Or at least that's my impression. I'm a Linux guy and have been fighting a one-man battle to CM-ize our infrastructure. I have no interest in using Microsoft's DSC on the Windows side (their brand-new CM-like solution in PowerShell) and something else on the Linux side, and since I'm a Python developer I gravitated to Salt.
I love SaltStack (no real experience with Ansible). Although it supports Windows in a sense, it's very rough around the edges. Many modules will fail or have weird edge cases on Windows. I've gotten to the point where the only module I really trust to work 100% of the time is cmd.run (which executes arbitrary shell commands). That said, it's been a total win so far. I've almost completely replaced ad hoc Windows server provisioning with version controlled, documented Salt states. It's glorious.
Mm, I'd say you're right on about some things, but slightly off the mark on others. Traditional Windows folks certainly know at least some things about CMS, or rather, CM-like functionality. WMI/WDS and friends are surprisingly robust when it comes to things like provisioning and patching, and PowerShell has been (and I say this as a primarily Linux weenie) a breath of fresh air in the Windows ecosystem, although I can't speak for its capability specifically as a CM utility. What I'd say is true is that Windows folks don't typically know about Linux CM, and vice versa. (At least, I certainly didn't know squat about Windows CM when I started working in a heterogeneous system.)
We made a similar choice to yours, going with Salt for certain functionality (because, as you found, of the weird edge cases/fragility of Salt on Windows), but at the root of things, you use the tool that works well for the system. In some situations that means living in a bipartisan world (WDS for Windows deployment, Spacewalk for Linux), or looking for a solution that plays well in the sandbox with both, which is a bit rarer, à la Salt.
I'm sure there are people who solved this problem way more elegantly, but for being pretty damn understaffed and new to devops when we started, it worked surprisingly well by the end of things :)
>Traditional Windows folks don't really use configuration management or even have any clue about it.
That's a tad unfair; I could just as easily say the same thing about some of the Linux admins I've worked with (and interviewed), but that's not taking the discussion down a constructive road.
CM/DSC methodology is about awareness of the technologies available. There are a lot of admins out there, regardless of OS expertise, who've never heard of it full stop. I learned about it whilst working as a developer in the banking sector 12 years ago but using eye-wateringly expensive tooling from the likes of IBM and CA.
We have a 65/35 Windows/Linux environment. I have for years wanted to "CM-ize" our environments, but we have two different silos of scripts and tomfoolery that get stuff done, and we have a lot of friction points because of this. One of the problems with CM tooling such as Chef, Puppet, Ansible and Salt has been the lack of sane support for Windows. Puppet seems to be getting better at it compared to the other three contenders, for example handling reboots sensibly [0] (and you know how Windows loves its reboots, and in the right order after some MSI or MSU has executed).
There is also a somewhat blinkered world view with regards to Windows i.e. "yuk, windows, not touching that", and at the risk of offending some, it's snobbery and cargo-cultism. A lot of the young folks around here have probably never tried modern Windows server management, it ain't that bad these days. If you can be bothered to learn bash and all this clever stuff on Unix, you can get a handle on learning Windows config management with Powershell which is very bloody good now.
The result is that we have silos of C/VBScript and PowerShell code that go and build Windows environments in their own special Windows way, because tools such as Chef, Ansible et al. and their respective development teams previously didn't (rightly, but mostly wrongly) see any value in Windows support.
I speak as a platform agnostic devops person who has to live in both worlds and has supported Windows and Linux/Unix for longer than most of you have been alive :)
I work for a cloud service provider, and we use Chef in a heterogeneous environment. Several flavors of Linux, and Windows 2003-2012 (both 32 and 64 bit). The pretty is that Chef supports Windows very well, and the mature community cookbooks have good support for Windows as well. The ugly is that it makes testing more complex, but things like ChefSpec and ServerSpec + TestKitchen and Jenkins make it possible to release robust code.
The other CM software may have good Windows support as well, but I don't have any direct experience with it. Either way, the testing is the more critical component here, no matter what CM platform you choose.
ChefSpec and Test-Kitchen are really awesome. I tend to see Chef as a framework for automating infrastructure, not as a scripting language/environment to define resources.
Chef pays off in large-scale infra or highly dynamic environments, but chef-solo is still a bit lame (I use knife-solo for that [1]). So most people seem to start with no-devops, shell scripts, puppet/ansible… later they will understand why there are more complex/flexible solutions out there.
It also depends on the background of the DevOps people: coming from software engineering, you're probably familiar with concepts like DRY, YAGNI and the principles of clean and robust code. However, when your team consists of people with an admin background, they probably have no experience and will write very bad code, especially in less strict scripting languages. They are probably happier and more productive with strict configuration files (e.g. YAML), but in the end, they need to start programming…
This is the sort of question that needs a blog post to answer, IMO.
I have not had enough time with any of these tools to speak to the pretty, but I can speak to the ugly. The chief issues with these tools on Windows are package management, overall speed, and community focus on not-Windows.
Package management is the worst, IMO, and it stems from Windows and the majority of its 'software universe' being commercial. Software is expected to install on many editions of Windows; it is not common to see edition-specific packages for anything not otherwise edition-specific. Software can be packaged and installed many different ways, some of which do not support unattended installation. It's not always clear whether a package is installed at all. It's usually difficult to repackage software that doesn't work the way you want it to, and even if it's easy to do you probably can't redistribute the result.
So yeah, in general, package management is the ugly.
In theory Puppet would be good for the Linux servers at least because it lets you declare things in an abstract way that can hinge on variables like distro, release, etc.
In practice the Puppet language is only tolerable to the extent that it provides (or helps you create) abstractions for everything, and now you have two problems as they say.
Interesting article, but the design of the site gave me a headache while reading it. I also recently made a similar move, from Chef to Ansible, and I am really happy I did. Chef was a pain.
My major pain point was the complexity of Chef cookbooks compared to Ansible playbooks. I could hardly wrap my mind around how to write my own cookbooks after exploring some of the ones I used (Ruby, rbenv, Git, Nginx). Another major thing, for me at least, was the documentation: Ansible's seemed better to me than Chef's. Finally, there were the product offerings of Chef vs Ansible: Chef has paid versions that offer more features, whereas Ansible is feature-complete for free, with a GUI and support offered separately instead, which I preferred.
I look for two things when considering configuration tools.
1. How does it handle cross-cutting concerns?
2. How does it handle complex configuration files?
For the cross-cutting concerns I use the firewall as an example. I look to see how multiple projects and modules (that are going to be installed on a machine) can declare their firewall rules.
On the complex configuration files, I usually consider Nginx and how to define multiple SSL certificates, SSL ciphers, load balancer backends, multiple web sites, and rules for locations on those websites.
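To make the firewall case concrete, here is a hypothetical Ansible-flavoured sketch (assuming something like the ufw module is available) where each role declares its own rules, so a machine's policy ends up being the union of the roles applied to it:

    # roles/webserver/tasks/main.yml -- the web role opens its own ports
    - name: allow http traffic
      ufw: rule=allow port=80 proto=tcp

    # roles/database/tasks/main.yml -- the db role opens its own ports
    - name: allow postgres traffic
      ufw: rule=allow port=5432 proto=tcp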
On Nginx... perhaps I'm lost in the docs but beyond simple installation I don't see either attempting to handle the config files. Is it the case that one should deploy their own config or write something to define the config from templates? I must be wrong on that, but lack of clear and deep documentation on how to configure Nginx would mean I touch neither and stay with Puppet.
Any (configuration) file can be installed and/or templated with both Ansible and Salt. This includes whatever Nginx has for configuration.
I'm not 100% sure for either, but I guess you have nginx installed in some dedicated pillar/playbook, and your application pillar/playbook can include templated configuration files to be dropped into /etc/nginx/conf.d, notifying the service to reload somehow.
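Roughly, a minimal Ansible version of that looks like the following; paths and names are made up:

    # site.yml -- install nginx, render a vhost from a template, reload on change
    - hosts: webservers
      tasks:
        - name: install nginx
          apt: name=nginx state=present

        - name: deploy the vhost config from a template
          template: src=templates/myapp.conf.j2 dest=/etc/nginx/conf.d/myapp.conf
          notify: reload nginx

      handlers:
        - name: reload nginx
          service: name=nginx state=reloaded

The handler only fires when the template task actually changes the file, which is what makes the reload safe to leave in place.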
But when it's clearly a scenario that everyone using Nginx will be writing these templates, surely it's better to have a well maintained master copy of them.
The complexity usually comes in having multiple projects wanting to modify the template(s) to wire themselves up.
A good sign for config tools is a feature rich and well maintained recipe/playbook (whatever you want to call it) that is able to do the non-trivial things (most deploy scripts for nginx don't seem to deal with SSL particularly elegantly with all of the options involved).
What I look for in a config tool is such good defaults for handling these complex (but commonplace) scenarios, that the recipes/modules/playbooks are mature, dependency-free and well-maintained.
I guess I'm spoiled by programming in Go, I've got used to the idea that the language includes a stdlib comprehensive enough that 90% of what you need (even with those complex things like "give me a web server") is all built in.
That's the problem I'm trying to solve whenever I consider abandoning Puppet... dependency hell.
But I also remember the pains when I first used Puppet: cross-cutting concerns and complex configurations.
I find it works well: each node pulls configuration from github, an rsync share, or similar, and executes locally. So there's no master in the traditional sense.
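For the git-pull variant specifically, ansible-pull wraps this pattern up; roughly (the repo URL is a placeholder):

    # each node clones/updates the repo and runs local.yml against itself
    ansible-pull -U https://github.com/example/ansible-config.git local.yml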
With the history of some very serious issues with the Salt crypto, I'm a little concerned that there doesn't seem to be any good documentation from the Salt project on the past and current state of the protocol's security.
As I said up-thread -- perhaps I'm not being fair, perhaps I'm just not aware of where to look -- but I've yet to see anything that puts me entirely at ease: have new members been added to the team? Has there been a successful audit? Did the attacks turn out to not be practical?
While I might not have the same confidence in paramiko as I do in OpenSSH, it at least works with a well-tested -- and, more importantly, a rather well-known -- protocol, so it's easier to evaluate. If someone can get root access via ssh, that is bad. If the risk is limited to someone stealing a private key, then that is at least something to plan around (and make decisions around).
I feel like every discussion of configuration management should start with what scale you are talking about. Managing 100 servers is quite a bit different than our environment which is about 7500 logical hosts on 5000+ physical servers across 5 datacenters (real datacenters, not cloud).
We looked at Salt versus Ansible and chose Salt mainly due to scaling concerns with Ansible. I believe Ansible has been addressing this, but at the time we did our evals last year it was a concern. We skipped Puppet due to DSL and Chef because we didn't want to delve into Ruby (I love Ruby, but it's not a tier-1 language for us like Python).
So far in our largest datacenter, which has 2700+ hosts, we are able to manage it with a single Salt master. That took some tuning, but it works. We have tested bringing it offline to make sure the "thundering herd" problem is mitigated.
Just wanted to say I love ansible!!! After the nightmare that was Puppet/Chef, ansible has been just what the doctor ordered. I keep all my playbooks under version control (git) and deploy via ansible-playbook. KISS philosophy; it has worked out better than anything we used in the past.
Salt is also OK, I think. I don't quite understand custom DSLs for configuration management, though. Giving users a library of idempotent code components like Chef does is, I think, way better than a custom language that is almost but not quite (or maybe) Turing-complete. At some point you are going to want to iterate and loop over stuff, and if there is anything that Ant has taught us, it's that imperative things are better handled with imperative language constructs. Trying to shoehorn everything into a declarative format is the wrong approach.
The benefit with Chef is that it is always Ruby. There is no dropping in/out of anything other than Ruby. As a Ruby programmer I quite like that. It doesn't fight the language it is embedded in and uses all the language idioms to great effect.
> No masters. For Ansible this meant using ansible-playbook locally, and for Salt this meant using salt-call locally. Using a master for configuration management adds an unnecessary point of failure and sacrifices performance.
There are two models for delivering state to your infrastructure nodes: pulling and pushing configuration. Ansible pushes code from the controller to your nodes, while Salt, Puppet and Chef all pull state from a master somewhere.
Like twic says, Ansible does not have a master.
The original author says no masters means faster performance. What he means is that pulling configuration from a remote checkout equals faster performance, which is true because it can be load-balanced, etc.
A chef/puppet master can have features such as search and service discovery that should be a large red flag for SPOF problems.
But Ansible doesn't have masters! It has a machine where you run Ansible. But that can be any machine, as long as it has Ansible installed, the Ansible code checked out, and an authorised SSH key. If your usual machine goes down, just check out the code and run from a different machine. The idea that you need to use local playbooks to use Ansible masterlessly just seems mistaken to me.
Moreover, any scheme which involves running local configuration (whether in Ansible, Chef, or Puppet) involves either pushing configuration updates to machines, or having the machines poll for configuration updates, in which case it's no different to running remote configuration or having a master, respectively.
I don't get the point about open ports. Are you running machines without SSH? If you are, well done. But if, like most people, you're not, then you already have all the ports you need open.
I haven't mentioned an Ansible Master? I referred to the user running ansible as the controller.
Running local configuration and checking out local state can indeed be very different from having a master. Like I said, masters often include features such as search and service discovery. Checking out state from version control does not have those features, so the user implements them on his own with stateless cookbooks/pillars/modules/whatever. The remote checkout is not a SPOF and a master is.
You are right in regard to the open ports, it is uncommon, though I have seen it with workstations. I edited the post!
I'm currently working on a whole bunch of Ansible stuff, and I'm loving it. I definitely agree with the author that the docs for beginners are excellent. No experience with Salt as of yet, but I'll probably spin up a VM and screw around with it at some point.
We're in the process of switching from Puppet to SaltStack. It's a change measured in light-years. We didn't evaluate Ansible, so I can't speak to it -- but we are extremely happy with Salt's speed, flexibility + extensibility.
I feel like the entire configuration management movement has passed me by. I still don't understand what value there is in chef/puppet/salt/ansible/docker vs bash or even Perl for that matter. Someone care to set me straight?
When you're managing more than a handful of servers, you very quickly start wanting to be able to run the same command on multiple machines - "upgrade all my API boxes to the not-vulnerable nginx", for instance, or "push this binary out to all my database servers". These sorts of services make that straightforward, and generally provide a large library of prewritten modules to do moderately-complicated things without having to write a lot of boilerplate or read somebody else's Bash scripts or Perl.
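For instance, the "upgrade nginx on all the API boxes" case is a one-liner with Ansible's ad-hoc mode (the group name is hypothetical):

    ansible api-servers -m apt -a "name=nginx state=latest" --sudo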
Well, I've done that just using remote shell commands. And I'd have an easier time reading someone else's bash than I would their ansible whatevers. Is it actually more concise?
When written correctly, it's idempotent. I've done a lot of server management with bash and it's a lot easier to achieve idempotency with something like Chef.
It's usually more concise because idempotency comes by default. Instead of saying "create this file" (which might throw a 'file already exists' error the 2nd time you run the setup script), you say "ensure this file exists".
Some bash commands are idempotent too (e.g. apt-get install), but it's not something you can rely upon, and you often have to code the idempotency in yourself.
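A tiny illustration of the difference in Ansible terms; the path and owner are made up, and re-running it converges to the same state instead of erroring:

    - name: ensure the deploy directory exists with the right ownership
      file: path=/var/www/myapp state=directory owner=deploy mode=0755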
What if you have to upgrade a software package and add a new config file that is different on every server?
I guess you can do that via a horrible sed command, but having native template support with variables is pretty nice.
Same for things like "tune the number of worker processes depending on the number of CPU cores the machine in question has".
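As a sketch of both points, a Jinja2 template can interpolate per-host variables and gathered facts directly (filenames are hypothetical):

    # templates/nginx.conf.j2 -- worker count follows the host's core count
    worker_processes {{ ansible_processor_vcpus }};

    # task that renders a different file on every server
    - name: render nginx.conf per host
      template: src=templates/nginx.conf.j2 dest=/etc/nginx/nginx.conf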
Things like Ansible/Salt and similar tools cut out a lot of the boilerplate. There are also plenty of modules you can reuse without having to roll your own. You can achieve the same results using Bash/Perl/Python but a lot more effort is needed.
* Your servers are all kept in a known state that is described by your code. If you don't have configuration management, you have to remember that postgres is installed on X server and has Y configuration. Which is ok for simple configurations but quickly becomes a total mess for semi-complex configurations.
* You can go from a clean server to a fully fledged working server (or set of servers) in a single, repeatable step.
* You can remove repetition from your code base and decouple your server configurations. Database password needs to be used in six different places? No problem. Specify it in one centralized configuration file and then just use template variables to make it appear wherever it needs to be upon deployment.
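For example, in Ansible that might look like a single group_vars entry referenced from whichever templates need it (names are invented, and in practice you'd want that file encrypted or otherwise protected):

    # group_vars/all.yml -- the one place the password lives
    db_password: changeme

    # templates/app.conf.j2 -- one of the six places it gets used
    database_password = {{ db_password }}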
According to the chart Ansible has been around since 2005? Salt is barely 2 years old?
Look at the uptick of the last few months, and SaltStack's seems ever so slightly steeper than Ansible's.
I am a SaltStack guy, albeit a newbie. I barely got done installing it (much, much easier than Puppet) and trying out a few commands. I was hooked on SaltStack when I was able to run the following command once and get results from multiple machines near-simultaneously.
> salt "*" cmd.run "df -h"
I got results from all the machines almost simultaneously.
The above command is the same as logging into each machine (say hundreds or thousands of them) and running 'df -h' to see the status of your storage space. You could write/test/deploy a shell script and push it out to all those machines. Or set up some monitoring system. Or install SaltStack across your network (very simple to do), run the above command once on your SaltStack server, and get immediate feedback.
I tried working with Puppet a long time ago. The idea of having a 20-minute window for pushing out changes never seemed attractive to me.
Salt all the way. We moved digedu's highly distributed infrastructure from puppet to salt and couldn't be happier with salt-cloud, salt-master, salt states, pillar and jinja.
It layers a custom DSL on top of a perfectly adequate language, uses standard terms like classes in non-standard ways, takes away the linear top/down flow that most programmers are used to, forces sequencing through notification chains, steamrolls over error messages willy-nilly, etc. Although I'm a bit biased so a few more data points would be helpful.
I notice none of these problems on a 3.2 deployment with about 500 nodes.
I find that Ansible is roughly identical, time-wise, doing the same things on the same machines; I'm not going to get into the subjective argument about the language; the parser syntax is backwards-compatible between major releases (and they go well out of their way to warn you what you'll need to change before something actually does break); and I don't see how it's any harder to test locally than any other config management tool.
This isn't true as of more recent releases (since around ~3.0), they appear to have finally gotten their act together.
> really hard to test locally
Tools like Beaker are finally the norm, so I've high hopes for this improving over the coming year.
But yes, crazy slow alone destroys everything. Typical "fixes" include going masterless, yet there's no standardised distribution method, so you need to invent that yourself; or embedding all files into catalogs, thus turning network overhead into CPU overhead, etc.
Not to mention the insane memory usage client-side, i.e. on every single box.
I posted this on the site, thought some might be interested here (disclaimer: I'm the creator of ShutIt):
We had similar requirements in our company and ended up building our own tool for building containers in docker and shipping those. So far it's working out really well, particularly in the "ease of learning" department.
- No optimizations that would make the code read in an illogical order.
ShutIt is "pure ordered". Each module has an ordering and code is strictly sequential. It even outputs the commands into a "black box" recorder on the container which can then be used to port to other CM tools if desired.
- Code must be split into two parts: base and service-specific, where each would reside in separate repositories.
Please note that I have only had experience with Salt and fabric.
Salt falls short of what you want in the corner cases:
- We've found it's darn hard to upgrade. (To be clear, we'd like to upgrade by transitioning the master to a new VM; for one, this means things are clean (we can provision our salt-master through a fabric script), but it also allows us to change the amount of memory available.) The minions, when disconnected, do not reconnect to the hostname in their config: instead, they endlessly reconnect to the IP that the DNS resolved to when they were started. You can't simply change a DNS record and have the minions move. Please note that we're a bit behind in releases (we're using 0.17.2, IIRC) because of the difficulty of upgrading.
- YAML was a terrible choice for "state" files, in my opinion. State files contain lists of commands to execute on a remote host being configured: trying to specify args to functions in YAML is awkward (see the sketch after this list for what that looks like).
- I'm of the opinion that the master-minion relationship is backwards. I'd be much more interested in something that connected to the minion. In particular, this would help with upgrading (the minion is controlled by two masters for a short period).
- The command line utilities are prone to user error: they return success during failure, they return no output and success because your states took too long to run, and it got bored. You can look up the job ID, but it's painful.
- The errors are utterly useless. In particular, Jinja rendering errors tend to reference incorrect locations in files, returning nonsense such as use of an undefined variable on a blank line.
- The output is useless too: you get a (very) verbose listing of everything that succeeded or failed. Telling if anything failed is the trick: it's buried in all the successes. (Terminal find is my friend here, but still, you have to be careful to watch out for boundaries between runs and not read an old run's output.) As discussed, the return code won't help you here.
- AFAICT, you need to be a particular user, and there are really no ACLs to speak of. All of our Salt stuff currently runs as a single user. People inevitably step on each others' toes.
- Non-responsive nodes are not mentioned in the output: they're the same as if they didn't exist! This results in some really wacky stuff happening. If you have variables that are lists of machines, the machine simply won't be in the list. This means if you need N of some type of machine, that list will be empty. (This often then triggers the aforementioned unreadable jinja error output, if you assume the list to be non-empty.)
- There is little capability for actual processing on the master itself. Sometimes, you need to coordinate the actions of several nodes together, such as generating keys for each node, and then distributing all keys to all nodes.
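For readers who haven't seen one, the YAML-as-arguments style referred to above looks roughly like this (paths and values are placeholders):

    # a Salt state: the ID is the target path, arguments are a list of one-key mappings
    /etc/myapp/app.conf:
      file.managed:
        - source: salt://myapp/app.conf
        - user: root
        - group: root
        - mode: 644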
I've had experience with Chef, Puppet and Ansible. Ansible is the least complex, and we're using it daily. Re: Ansible community dynamic - I've gotten unfriendly feedback a few times and agree with the negative reputation. Aside from community, Ansible is a big step up, and I suspect Salt would be as well.
Don't read too much into responses if we don't go out of our way to say "Hi yall", but we do try to say thank you a giant ton.
We're pushing an IRC channel of about 800 people now, and I think we're mostly just trying to be concise in the waves of giant teeming hordes of Ansible users :)
If you don't let that get under your skin, you'll be fine! We're happy everyone is here, usually. Though we'll also share when the design decision of something is that way for a reason.
As it is said, "go not to the elves for counsel, for they shall say both no and yes" :)
Mostly we're just trying to get you on your feet as quickly as possible.
Try the tool, by all means; if we're ever short, it's because we're so incredibly busy, and we're thankful for every user we have.
I usually look at Salt as more advanced and perhaps a replacement of Puppet and other such declarative configuration system. I see Ansible as more of a replacement of a bunch of SSH + scripts.