bcoca's comments

bcoca · on Jan 20, 2017

yes, many times we need to work 'around' the tools as they do not do what we want and cannot disable to avoid confusion:

https://github.com/ansible/ansible/releases

bcoca · on Jan 20, 2017

As an OSS maintainer that is currently struggling with this I can tell you the solutions are not that simple:

- We use several mailing lists (like a forum its good for async communication)

- We have several IRC channels for immediate feedback

- We don't have glitters/slack/discord/etc because more avenues of communication actually makes us scale less

- reviews are not rubber stamps

- we use github to host our repos and as a ticketing system

- we have developer docs (sometimes they do lag or are incomplete)

We still have much larger backlog than we want and plenty of people mad/exasperated on our scale and turnaround for versions and features.

I don't think UI is an issue as much as other factors, many other things will affect the project.

As for the communication channels, a good community actually offloads work from maintainers. As users get to know/use the project and pay attention to the channels, they end up doing a lot of support and freeing the maintainers to work on the code. To get to this point the maintainers have to have put in the work and have helped the community gain the knowledge needed. That said, it is not a license for maintainers to check out, just a good way to share the burden.

Github has a huge community around it, but it's ticketing system is limited and not great for big projects, they are working hard to fix this, releasing many updates and features this last year that have made our lives easier, but it is still an issue (see "open letters" from last year for details).

Much time is spent dealing with insufficient data from users/contributors or just plain unreasonable requests: "soo this does not work in AIX 4.1 (current is 7.2?) with custom built python 2.4 from which you removed core libraries and a non POSIX custom built grep? no .. we don't have tests for this". A good ticketing system can help, but it will not solve the issue (skynet might, but that might be too drastic).

A factor is maintainer 'control', it is hard to let others 'raise your baby', specially as their vision might diverge from yours. Even w/o sparking flamewars there is a constant time sink in discussion of ideas, feature set and roadmaps. Design by comity or popularity does not normally lead to good results either and IME sparks more flamewars than it avoids.

Another is code quality, most projects start w/o a big test infrastructure (one guy really doesn't need much), once you have hundreds of contributors you struggle to get proper linting, testing, access to resources, etc.

No one writes perfect code, no one can perfectly review code, more code == more bugs == slower turnaround.

Reviews are a bottleneck .. yet they are the primary guarantee for code and interface quality. Rubber stamps will create more work for you in the end as you'll spend multiple cycles fixing bad code , tests can only help you so much so much of the burden falls on the maintainers.

Tests are code too! People forget this and put little/no review into tests or have many trivial tests (check that setters/getters actually work...) to get 100% code coverage. Code coverage is not a goal, code quality is. The point here is that test are also a lot of work, specially getting them 'right' for your project, what this means is also fodder for flame wars. Not having good testing, also creates more work as more bugs will get into the code.

Most contributors won't maintain their code, many in OSS will contribute code and then disappear, any problem with it has to be dealt with by either the maintainers or someone else in the community that might also 'contribute and evaporate'. As code bases grow and features are added, the maintainers increase their burden. This is alleviated by increasing maintainer numbers, but that is always trailing the burden by a lot.

Adding too many maintainers, it gets to a point that adding maintainers creates more burden, coordination, design decisions, knowledge sharing, etc ... team management in general.

...

I'm going to stop now, I could probably write many pages on this but I think the above is enough to show that this is not easily solved by using a few new tools or UI, docs do help, but that is another maintainer burden.

Maintainers normally just use what 'works for them' and most will look to improve the process. We all get many suggestions, it is impractical to try them all and the 'latest trend' dies down faster than the time it takes one to shift the workflow to try it. The few times I've been able to do so, it is either not an improvement or just not enough of one to justify the effort of switching the workflow.

If someone has a way to make managing big|popular OSS projects simple and seamless ... LET ME KNOW!

fourthark · on Jan 21, 2017

This is the correct response IMHO. It's a social problem and all this yakking about tools and web interfaces versus email must be coming from armchair experts who have never tried to be a maintainer.

I encourage them to try it and learn what the real difficulties are. More maintainers are needed almost everywhere, just choose a project and dive in. And please, remember your health and the real world too! They will be easy to forget.

Lowering the cost of entry for contributors is no longer a problem. I doubt it was ever a problem.

I also doubt computers have changed the organizational problems very much. The same complaints (with some translation) could be made for any nonprofit, and many businesses too.

ZenoArrow · on Jan 20, 2017

>"If someone has a way to make managing big|popular OSS projects simple and seamless ... LET ME KNOW!"

I've had ideas about this, but just to be upfront I've not tried them in the real world.

Basically I think if code is designed around specifications it's possible to automate a lot of what goes into software maintainance.

Let's imagine a scenario. I'm designing a piece of software. The first thing I do is to define specifications on how the system sits together (i.e. the function interfaces, the general architecture). I then put into place tests that can validate that these requirements have been met. If you're familiar with functional languages essentially you're looking at a bunch of type signatures and a map of how they can interact.

Next, I put in place a CI system. This will run tests every time someone submits code to the main repository, as well as run linters, code style checks and performance checks. Commit access is completely open. If the code matches the expected style, doesn't cause the related tests to fail, and doesn't cause performance regressions it's accepted into the codebase. If it doesn't it's removed.

With this approach, development discussion between people can be focused on altering the specs to permit refactoring and/or new features to be added.

Any thoughts?

Ajedi32 · on Jan 20, 2017

This is similar to the typical workflow I see in most GitHub projects I've worked on. Once you submit a PR, various tools will come along, lint your code, run the tests, check your test coverage, etc and give you a report. Then a project maintainer comes along and manually reviews your code to check for various style issues and things that can't be checked automatically (like whether they even want this feature in the first place) and if everything looks good they click merge.

> Commit access is completely open. [...] With this approach, development discussion between people can be focused on altering the specs to permit refactoring and/or new features to be added.

Not gonna happen. Why? Because: http://www.commitstrip.com/en/2016/08/25/a-very-comprehensiv...

ZenoArrow · on Jan 20, 2017

>"Not gonna happen. Why? Because: http://www.commitstrip.com/en/2016/08/25/a-very-comprehensiv...

That's a misunderstanding. Most specs aren't fully fleshed out code, they're the bare bones of describing what something needs to do.

Think of it as if you're designing electronics out of integrated circuits. You can design a schematic just by knowing the inputs and outputs that the integrated circuits need to provide. The actual implementation of the integrated circuits is abstracted away.

It's the same with the relationship with specs and code. Specs are meant to be at a higher abstraction level than the code they describe. You can use code to write specs (and there are advantages to doing so), but the idea with the specs is to define something which is universally true, rather than get into the detail of the work done to meet this specification.

An obvious example of this is design by contract:

https://www.eiffel.com/values/design-by-contract/introductio...

Whilst code contracts aren't necessary in all languages (this is a good starting point for looking into why: http://blog.ploeh.dk/2016/02/10/types-properties-software-de... ), I believe they do offer benefits to software written in most imperative-style languages.

Ajedi32 · on Jan 21, 2017

> That's a misunderstanding. Most specs aren't fully fleshed out code, they're the bare bones of describing what something needs to do.

Right, the problem though is that you seemed to be suggesting that the spec would be so detailed and complete that a script checking that the code submitted in the PR matches the spec would be good enough to decide all on its own, with no human intervention, whether or not the code can be merged:

> Commit access is completely open. If the code matches the expected style, doesn't cause the related tests to fail, and doesn't cause performance regressions it's accepted into the codebase. If it doesn't it's removed.

A spec that detailed would basically have to _be_ the code itself.

ZenoArrow · on Jan 21, 2017

>"A spec that detailed would basically have to _be_ the code itself."

No it wouldn't. The specs are applied as part of the test suite. Please read up on code contracts to see how this is possible.

If you didn't like the link I shared before, here's a video:

https://m.youtube.com/watch?v=YQJMe0Eahyg

stordoff · on Jan 21, 2017

> Commit access is completely open. If the code matches the expected style, doesn't cause the related tests to fail, and doesn't cause performance regressions it's accepted into the codebase. If it doesn't it's removed.

It seems to me that any spec. that went into sufficient detail to allow this would be more or less writing the project in another language, meaning there's no actual benefit. I can't imagine that a spec. that doesn't get into that level of detail would be sufficient to prevent all malicious/unwanted commits (say, subtly weakening cryptofunctions or leaking user data over side-channels).

ZenoArrow · on Jan 21, 2017

>"It seems to me that any spec. that went into sufficient detail to allow this would be more or less writing the project in another language, meaning there's no actual benefit."

The benefit is in designing code at a higher level of abstraction, one that can be easily reasoned about. It is possible to design code at a high enough level where the specs and the code are one and the same, but most languages haven't got the type system sophistication of something like Idris or Haskell, which is a key component of pulling off this feat. The vast majority of code is written in languages that do not lend themselves to code as specification. Code contracts and other complimentary techniques (such as automated test generation) can go a long way to counteract those shortcomings.

Crypto functions are a special case. In this instance you won't save time by defining specifications as the requirements on algorithmic correctness are much higher than average. However, in this case the ideal would be formal verification, and that still requires a specification, it's just likely to be more verbose than you'll want for day to day code checking.

Lastly, consider the alternative if you don't use specifications. At this point the burden of performing code checks is with humans. With a large, fast-moving codebase it's unreasonable to expect any one individual to understand all the parts that constitute the whole at a sufficient level to stop new bugs creeping into code. It happens on every project, no known exceptions. With that in mind, why wouldn't you want to put in a framework to help catch bugs automatically? New bugs will still occur, but with a well specced program this should be at a vastly reduced rate.

fourthark · on Jan 21, 2017

This doesn't deal with the design of code and the long term effects of choosing the wrong API.

It also doesn't handle new features, at all, because the tests have not been written. Even if you enforced a requirement for tests with coverage, the tests could (and usually would) still be wrong.

Trying to take conscious, adaptive, intelligent response out of the loop is a mistake.

ZenoArrow · on Jan 21, 2017

>"This doesn't deal with the design of code and the long term effects of choosing the wrong API."

Yes it does. With this approach, any change to the design requires a change to the specs to be made first. I already indicated the specs could evolve over time to allow for refactoring and new features.

>"It also doesn't handle new features, at all, because the tests have not been written. Even if you enforced a requirement for tests with coverage, the tests could (and usually would) still be wrong.

Trying to take conscious, adaptive, intelligent response out of the loop is a mistake."

You've misunderstood, as I stated before, discussions still happen, they're just focused on the specs.

fourthark · on Jan 21, 2017

I agree with Ajedi32 on this. Sufficiently detailed specs will be indistinguishable from code. So reviewing code and its behavior is more efficient than having another language to review.

It may depend on the domain, however. I find that a lot of misunderstandings arise in these discussions because people assume the problems are the same in all kinds of computing. That's not so.

I work in viz/UI (interactive charting) so there aren't great libraries for testing or writing specs. All the features interact in unpredictable ways.

Your process may work better where the API is purely functional, where inputs and outputs and side effects are better defined.

I'm happy for you if you've applied this successfully!

throwawayish · on Jan 20, 2017

I very much agree.

Much of this is in the area of management (and thus social - building a culture around a project), and has relatively little to do with tooling per se.

nodivbyzero · on Jan 20, 2017

Totally agreed!!!

bcoca · on Jan 13, 2017

because it uses existing interpreters on the target host, fixed locations only happen in a homogeneous environment. Most IT shops commonly have to deal different OS/Distrbutions/Versions so the same way you cannot have just 1 tshirt size for everyone you cannot have 1 interpreter path.

I'm saying 'interpreter' instead of Python because you can create modules in any language, Ansible only ships with Python ones, but Perl, Ruby, etc modules exist also and usable by Ansible.

bcoca · on Jan 13, 2017

actually, the 'controller' specifies the interpreter to be used at the client, there can be more than 1 and the '1st one in path' is not always the correct one.

bcoca · on Jan 13, 2017

it is not a host definable fact, it is a 'host variable' which you normally define in inventory.

ensignavenger · on Jan 13, 2017

From the advisory:

When Ansible runs on a host, a JSON object with Facts is returned to the Controller. The Controller uses these facts for various housekeeping purposes. Some facts have special meaning, like the fact "ansible_python_interpreter" and "ansible_connection".

So while you may normally define it in the inventory, it sounds like, from the advisory, it can also be defined by the host.

bcoca · on Jan 13, 2017

disclaimer: I'm one of the maintainers

I'll try to answer several questions so this might get long.

First, the procedure for disclosing the CVE is something we discussed internally (including security professionals), as all over the internet there are 2 or more views on this. The decision arrived at doesn't please everyone (I doubt one that would exists), but it is a recognized way of dealing with security issues, so it is what we followed.

The CVE can be dense explaining how the exploits work. The simple version: it is a rehash of an old problem which we had thought solved, the researchers proved us wrong by finding ways around our filtering. The vulnerability is pretty hard to trigger and requires both a compromised system to intercept the Ansible calls and return faulty data and intimate knowledge of the existing playbooks and systems used to force the arbitrary execution.

All software has flaws, this is not an excuse, it is a fact. Not all software or flaws have the same scope though. As a automation tool that is often used to manage things with high levels of privilege, we take these things very seriously and we do our best to prevent it in the first place or remediate it as soon as possible. As an OSS project we appreciate the eyes and efforts many put into finding these flaws which end up making the software better (and me a better programmer).

Ansible is not idempotent, it is declarative, which does help the user create idempotent plays. True idempotency depends on many things, the modules used, the problem addressed and how the playbooks are written, etc. Both of the following are valid ansible tasks, but the implications are very different even if the result ends up being the same.

- shell: usermod -G user bcoca

- user: state=present name=bcoca groups=user

I hope this helps,

rarrrrrr · on Jan 20, 2017

Odd. I reported this very same form of vulnerability to the Ansible team in the 1.5.4 series in 2014, where the code basically eval'd the "facts" discovered from a system under management.

There was this "safe_eval" function which filtered input in a way quite inconsistent with its name. The Ansible team was responsive and pleasant to work with!

https://groups.google.com/forum/#!topic/ansible-project/MUQx...

But I suspect lots of remote control and monitoring software products might have security bugs like this where they assume that the returned information from systems under management are trustworthy.

Edit to add: Here's the patch made to safe_eval in 2014. I had suggested using literal_eval instead but I guess a Python 2.6+ requirement wouldn't work. https://github.com/ansible/ansible/commit/998793fd0ab55705d5...

Edit again: Ansible is a pretty great product, and IMO one of the first of such tools to seriously improve the UX for sysadmins. Thanks for maintaining it!

bcoca · on May 26, 2016

speed has been both better and worse, so depends a lot on what your playbook is doing, size of your inventory, vars, etc

the only answer i can give you is: test.

We do try to keep decent performance, but this is not our main focus.

adamors · on May 26, 2016

I use around 1-3 variables per module and 4-5 modules per playbook. Other than that, the actual tasks are trivial (install package x etc.)

bcoca · on May 26, 2016

we actually avoid the requests lib for this and many other reasons, we don't use libcurl as we try to avoid extra dependencies when possible.

The issue is more basic than requests, the actual python http/url and ssl implementations have these issues, we have patched and added warnings to indicate which minimal python versions you can use and have SNI work.

therealmarv · on May 26, 2016

thanks for clarifying!

bcoca · on May 26, 2016

Support for python3 has been on the roadmap, sadly most of the installed base of servers out there uses 2.x and in many cases 2.4 (centos/rhel5).

This is NOT a switch from py2 to py3 we are aiming to support BOTH at the same time, this is not a trivial task (specially with 2.4) and will probably take us several versions to implement.

Kudos · on May 26, 2016

Several majors or minors?

bcoca · on May 26, 2016

minors we release every 3-4 months (or plan to)

bcoca · on May 26, 2016

disclaimer: i'm an ansible dev.

As the ticket shows, this was added in the 2.x release of Ansible, that is why the ticket was closed.

Adding this info required a major revamp of the parser, which we did in 2.x, for this and many other reasons. This is not a simple change in 1.x and we decided not to backport it.

bpchaps · on May 26, 2016

Could you? Tons and tons of companies are forced into using outdated versions (for tons of reasons) where point release updates are still possible.

I'm sure it's a lot of work, but a lot of your core and original users would appreciate it. Not implementing something as useful as that just has a "we got 'em, no need to do anything else for them" vibe.

devy · on May 26, 2016

> Tons and tons of companies are forced into using outdated versions (for tons of reasons) where point release updates are still possible.

Exactly my point! Ansible devs should own up to it for admitting their BAD design choice (I heard someone from here saying intentionally not use a parser but use YAML.) of not being able to get syntax error file name + line number!

My story was that debugging ansible playbook with loads of nonsense cryptic error messages due to syntax errors that I have to spent hours to figure out what went wrong was a complete frustration. It was so bad that we eventually re-designed our infrastructure to cut out Ansible and never look back again. So much for Ansible 2.x that it doesn't even matter. Ansible lost its appeal in 1.x, and that's it. No more 2.x upgrades.

Lesson learned here is that if a young software tools got adopted but early version caused so many frustrations that the authors don't care about back-porting bugfixes, all future version becomes irrelevant.

bpchaps · on May 26, 2016

It's gotten so bad with these sorts of tools with their incredibly annoying design flaws that I've been debating creating my own using rpm and ssh on top of bash with some python.

I get why it's important to have a lot of the features, but when the devops crowd thinks [0] is acceptable (or at the very least, doesn't scream about it), then it might be time for something new.. with [1] in mind.

  [0] $ man salt 2>/dev/null | wc -l
  140905

[1] https://xkcd.com/927/

mbakke · on May 26, 2016

Which design flaws are you referring to? As a recent Salt convert (and Puppet expert), I was perplexed as to why such a nice tool would have a man page 40x longer than bash.

Turns out it includes extensive documentation for all states supported by Salt, generated from the online documentation. Compare this to the Puppet manual:

    $ man puppet
    PUPPET(8)                Puppet manual                         PUPPET(8)

    NAME
       puppet

       See ´puppet help´ for help on available puppet subcommands

Obviously no-one will be using everything that Salt supports, so it would be nice if it was broken into sub-sections. But I much prefer having all documentation available in a manual to looping through every module and run "rdoc" as in the Puppet case.

Any sufficiently popular configuration management tool will have equally long documentation.

There are a couple of simpler options available:

http://www.nico.schottelius.org/software/cdist/

https://github.com/brandonhilkert/fucking_shell_scripts

bpchaps · on May 26, 2016

"Design flaw" might not be right description. Maybe, "Naive implementations with nearly useless documentation" is better.

When I first started using puppet, I would go home absolutely exhausted every night. Their design choices, along with a lack of documentation, would turn the equivalent of a 20 minute bash script into something that would take days. Best example I can think of is the arbitrary ordering of module execution. I understand why they do it this way, but if they documented it in plain sight, then maybe my desk wouldn't have a forehead shaped dent in it. Similar goes for the other tools.

The only thing I can think of that prevents companies from releasing proper documentation is because of their expensive support contracts. There's a lot of incentive to make a standard incredibly difficult.

I just want accessible tools :(

edit: fss looks pretty neat! Kinda rpm-y, but it fits a nice middle ground.

mbakke · on May 26, 2016

I get your sentiment. I only started using Salt some months ago, but it was really a breath of fresh air compared to Puppet. However it took me a while to realize its true strengths since the documentation is very...dry.

IIRC Ansible, like Chef, does serial execution. That is, states are applied in the order they are written. I think that's part of the reason it has gotten so popular, as you'll have very few surprises in the vein of "what do you mean there is no ntp_service -- it's right there next to the config declaration!".

What Salt got right is the pillar, for which the Puppet equivalent (Hiera) was an afterthought. The Salt engine allows you to generate states from the pillar, rather than making poor clones of Puppet modules (as most of the formulas I've seen online).

However that flexibility is not documented anywhere, nor part of best practices. Nevertheless I'm about to release a set of formulas that are truly pillar-driven with no hard-coded stuff. Keep an eye out for the accompanying blog post :)

Of course, if you don't care about data/code separation and just want to get stuff done, there are better tools. And if you have full control of your environment, I would strongly suggest using Guix and/or Nix rather than these legacy configuration management tools.

bpchaps · on May 27, 2016

I actually really like salt, just not its documentation. Like you said - it was a breath of fresh air. :)

Last time I used it a few months ago, it was while helping out another admin. He had a lot of confusion on how to do certain things, and the documentation wasn't very helpful for either of us. The solution came from github, which has become my go-to for these sorts of tools. Though, that could easily be supplemented with some better examples on their site, or in their, erm, man page.

I doubt we'll have some great tools with great documentation anytime soon (or monitoring tools!), but a sysadmin can dream... :)

technofiend · on May 26, 2016

Although it's cool you guys fixed it - thank you for that - those of us sitting out in RedHat land may not see 2.1 sitting on EPEL for a very long time. So it's a legit request.

wyaeld · on May 26, 2016

its in 2.0, which is available on RHEL

technofiend · on May 26, 2016

Ah ok if it's in 2.0 then you're right! I thought it was exclusive to 2.1.