How we might abolish Cabal Hell, part 1 (well-typed.com)
75 points by tux1968 on Sept 30, 2014 | 81 comments



Why do we keep running into this problem over and over again? Why do we need one package manager per programming language? Why is software distribution so damn hard?


Because general purpose package managers suck at dealing with the concerns of individual programming languages. E.g. installation paths, virtual environments, different compilation methods, etc.

There is also no single cross-platform package manager. Mac has MacPorts and Homebrew, Windows barely got one, Linux has yum, apt and others. As a language designer, you don't want any part in this; you want your code to work. So you make your own.

edit: To clarify, I'm suggesting that we're stuck in a local maximum - now that all these systems already exist and have widespread adoption and tool dependency in their own environments, there isn't much of an impetus to use Nix or something similar even though in the grander scheme of things it'd be way nicer.


I think the solution is to separate the problem into two parts:

- The "puts things on disk" part (posix "install" or a low level tool like dpkg or rpm is most like this)

- The "determine what needs to be installed" part (aptitude, yum)

I'm all for a per-language or system version of the latter, but once that's done, you should generate packages installed by the former.

Heck, wrap all the commands (cp/mv/install/chmod/chown/etc.) that write stuff to permanent places on disk to actually do "add to a package", give it a basic name/version number, and have the low level tool handle adding/removing it from the system (or multiple systems, or deploy it, etc.). All the dependencies, compatibility, etc. are handled by the higher level system.
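A minimal sketch of that split, assuming a made-up dependency table and package names (nothing here is dpkg's, rpm's or FPM's real interface): the resolver only decides what is needed and in which order, and the low-level installer only records which files belong to which package so they can be removed again.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency metadata: package -> direct dependencies.
DEPS = {"app": {"web", "json"}, "web": {"json"}, "json": set()}

def resolve(target, deps):
    """High-level layer: dependency closure of target, in install order."""
    needed, stack = set(), [target]
    while stack:
        pkg = stack.pop()
        if pkg not in needed:
            needed.add(pkg)
            stack.extend(deps[pkg])
    # Dependencies come out before the packages that need them.
    return list(TopologicalSorter({p: deps[p] for p in needed}).static_order())

class Installer:
    """Low-level layer: knows nothing about dependencies, only about files."""
    def __init__(self):
        self.manifest = {}

    def install(self, name, version, files):
        self.manifest[name] = {"version": version, "files": list(files)}

    def remove(self, name):
        # Returns the files that would have to be deleted from disk.
        return self.manifest.pop(name)["files"]

plan = resolve("app", DEPS)
```

Here `plan` lists `json` before `web` before `app`, and any per-language resolver could feed the same `Installer`.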

This gets you the best of both worlds - system level packages, and the ability to install whatever you need. FPM (https://github.com/jordansissel/fpm) is a pretty good example of this philosophy.

But, instead, we get every punk ass CPAN descendant spraying its crap all over the filesystem, needing the whole build environment installed, touching the internet in weird ways that don't guarantee repeatable behavior, etc. sigh


To add to that, the concerns that operating system package managers have are very different from the ones that programming language package managers have.

With an operating system package manager, the focus is on shipping working, uber stable code, which is unlikely to break someone's system. This (imho) is due to two different factors: 1) Operating systems have to work, no two ways about it; if your OS is broken, everything else is broken. 2) Users of operating systems are not necessarily experts. If their package manager ships experimental code which breaks their particular system, they don't know how to report the bug to the central OS maintainers. I think this is handled somewhat by having different repositories with different levels of stability, a la Arch Linux's [core/extra/community/testing/AUR]; however, at the end of the day, the main repository must avoid having breaking code.

With a programming language package manager, the focus is on having up to date features, and catering towards power users who, if things go wrong, can generally fix them, or at least know how to contact, and how to phrase their requests for assistance. This means that it's generally more acceptable to expect users of a programming language repository to occasionally be served broken code, as defects will be reported more rapidly, and more clearly.

I find it interesting comparing the two package managers that I use most often in my day to day computing experience: Cabal, and pacman (Arch Linux). Pacman offers a single version of each library per repository, and that version is as stable and tested as the repository requires. This is in keeping with the spirit of an operating system package manager, as it allows power users to install unstable packages from more experimental repositories, but tries to serve as stable code as possible to general users. Contrast that with Cabal, which basically offers no stability guarantees, but allows for much more fine-grained control over library versions, sandboxing, etc.

TL;DR

In my opinion, OS and Language package managers have goals which are at odds with each other: OS package managers want to maintain stability, Language package managers want to allow for bleeding edge code. The concessions that cabal makes towards stability (versions, sandboxes etc) often cause other problems as well.


Haskell folks seem to be gravitating towards the Nix package manager, which is general-purpose. We need to stop the proliferation of a package manager per programming language and make systems package managers capable of handling the use-cases of pip, bundler, npm, cabal, etc. Nix and Guix are the only 2 general-purpose package managers that I know of that can do it.


I wouldn't say that the community is gravitating toward Nix. It's kind of a false split, since Cabal and Nix can only be used together and Nix can't replace Cabal. Nix can, though, be used to manage a wider package environment than cabal-install can, and then Cabal is left to the role of a Haskell build system, which it excels at.


Thanks for the explanation. I'm not a Haskell programmer so I was basing my observation off of some Haskell hackers' blog entries.


> Nix and Guix are the only 2 general-purpose package managers that I know of that can do it.

For anyone constrained to GNU/Linux, right?


Nix works on OSX (although the packages themselves may not)


> For anyone constrained to GNU/Linux, right?

I don't feel constrained by a free software operating system. Besides that, a bunch of people use Nix on OS X. On Windows I suppose you're stuck with the inferior package managers.


"I don't feel constrained by a free software operating system."

I appreciate the sentiment, but there's a substantial difference between "constrained to" and "constrained by".


There are many other OS out there.

Why do people only think of the triad?


I don't think it's gravitating "toward" Nix. Many members of the community are trying to use Nix personally and maybe Cabal proposals are inspired by Nix. I don't think the idea today would be that the entire community should move to Nix, however, not by a long shot.


> Because general purpose package managers suck at dealing with the concerns of individual programming languages. E.g. installation paths, virtual environments, different compilation methods, etc.

And yet all these nifty package managers can't even seem to get things as simple and long solved as permissions straight. Even autoconf sets permissions better than, say, pip, where I have to remember to check my umask before I "sudo pip install something".
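For anyone who hasn't been bitten by this: a minimal sketch of how the process umask decides the permissions of newly created files on a POSIX system. It says nothing about what pip itself does internally; `mode_of_new_file` and the file name are made up for illustration.

```python
import os
import stat
import tempfile

def mode_of_new_file(umask):
    """Create a file under the given umask and return its permission bits."""
    old = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "module.py")
            # Ask for rw-rw-rw-; the kernel clears whatever bits the umask names.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # always restore the caller's umask
```

With a umask of 0o022 the file is readable by other users; with 0o077 it is not, which is the kind of difference that leaves a system-wide install unreadable for everyone but root.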


In my local group of haskellers, NixOS is taking off as a system level package manager that does virtual envs, and multiple package versions pretty well. I haven't had a chance to play with it, but perhaps a sufficiently smart package manager can do both?


The problem Cabal is trying to solve here is supposedly more difficult than that solved by Gem, Pip, and other package managers. GHC does some kind of cross-library optimization that causes dependency problems in routine situations.

I agree we shouldn't have one package manager per language, but I think so far no one package manager has proved sufficiently general and robust to serve all use cases. For example, I think something like Nix would be wonderful for all language communities, but it doesn't run on Windows. That immediately takes it out of the running. And as far as I'm aware, all other package managers would have the same "Hell" problems of cabal because of the GHC optimization mentioned above.


It is more difficult. By design Haskell shifts an enormous number of failure modes to compile-time and as such cabal gets an unnecessary amount of blame for library bugs. If you ``pip install`` a package that transitively depends on another library where the author has accidentally introduced a backwards incompatibility in the API that occurs in 5% of use-cases, pip won't detect the change and will happily install the package and 95% of people will assume no problem exists until they hit a runtime failure. The job that pip has to do is trivial actually. In Haskell if there is an incompatibility that breaks the interface, and in the presence of "wild west" packages there will be, then it will manifest as a compile time build error instead of proceeding.
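A minimal sketch of that failure mode, with made-up names (`load_v2` stands in for a hypothetical library whose second release dropped an `encoding` keyword): the package installs and imports fine, and the break only shows up on the rarely taken branch.

```python
def load_v2(text):
    """Hypothetical library function, version 2: the `encoding` keyword is gone."""
    return text.split(",")

def caller(text, legacy=False):
    """Downstream code written against version 1 of the library."""
    if legacy:
        # Rarely taken path, still using the old keyword.
        # Nothing flags this at install time; it raises TypeError when it runs.
        return load_v2(text, encoding="latin-1")
    return load_v2(text)
```

Calling `caller("a,b")` works, so most users never notice; `caller("a,b", legacy=True)` fails at runtime. A compiler that checks the call against the new signature would reject the build instead.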


Not only is the package management problem more difficult, it's not the problem that Cabal (the Common Architecture for Building Applications and Libraries) was intended to solve! cabal-install is not a package manager; it can't uninstall packages or even record which packages it has installed.


Let's not forget they are all garbage from somebody's perspective.

I know Python. Would I prefer to write my package manifests in some bastardized Ruby DSL or Python? Of course I'd prefer Python and the Ruby guys would prefer their bastardized DSL. I don't blame them.

How do we bridge that gap? Python => Ruby => Python is bad enough. What about M4 or AWK or Makefiles or Haskell or god help us all some shitty custom format ala Puppet?

You might as well demand everybody on Earth speak English.


> You might as well demand everybody on Earth speak English.

That's working out quite well so far.


>> You might as well demand everybody on Earth speak English.

> That's working out quite well so far.

Not really. English is a minority language, spoken as a primary language by only about 5.5% of the world's population.

http://en.wikipedia.org/wiki/List_of_languages_by_number_of_...

One more juicy anecdote: there are more Chinese people learning English right now than there are English speakers in the U.S.


Why are you considering native speakers to be the relevant statistic here, rather than total speakers?


Secondary language is good enough. As long as I am likely to be able to communicate with a random person on Earth, I'd say the goal is achieved.


Your postscript seems to confirm GP's point. b^)


Not if the students' plan is only to remain competitive with native English speakers. Knowing how to speak English as a second language, for pragmatic reasons, isn't the same as being an English-speaker in the sense of the parent post.


Why does the reason matter? It's the ability to speak it that matters.

You don't need to be an expert on the language used in your package management. 'second language' is a rather apt analogy.


> Why does the reason matter? It's the ability to speak it that matters.

Yes, I basically agree. I think there's a difference between a native speaker and someone who can order food in a French restaurant -- I mean, without getting snails when what he really wanted was escargot. But I agree, it's not a very important distinction.


I don't know that that's a sign of it working well...


I think a lot of it is that people are scared of automake, so they reinvent their own build systems, with varying degrees of badness.

Then they claim that theirs is "easy" but they've only done the first half of "simple things should be easy, hard things should be possible".

A properly autotooled package offers an amazing level of control about where to put things.


Some package managers are part of build systems too, e.g. the Maven ecosystem. Now in what language should you write extensions or plugins? There will be jihad.


Cabal is by far the worst thing about Haskell. With no other programming language have I ever had to deal with these difficulties. It made me stop using Haskell.


Mostly because these difficulties are front-loaded by the Cabal system. Version/implementation conflicts are faced by all other package managers, but most solve the problem by (a) waiting until you notice your program has broken and then (b) suggesting you nuke your package sandbox.

Cabal just tells you up front when this will happen which can be annoying. That said, the prior mechanism requires that you notice when version mixes are causing issues.
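To make "up front" concrete, here is a toy sketch (not Cabal's actual solver; the packages, versions and bounds are invented) that refuses to produce an install plan when no single version of a shared dependency satisfies everyone.

```python
# Hypothetical index: which versions of each package exist.
AVAILABLE = {"C": [(1, 0), (1, 5), (2, 0)]}

# Hypothetical constraints: package -> {dependency: predicate on version}.
REQUIRES = {
    "A": {"C": lambda v: v < (2, 0)},
    "B": {"C": lambda v: v >= (2, 0)},
}

def pick_versions(packages):
    """Choose one version per dependency, or fail before installing anything."""
    chosen = {}
    for dep, versions in AVAILABLE.items():
        preds = [REQUIRES[p][dep] for p in packages if dep in REQUIRES[p]]
        ok = [v for v in versions if all(pred(v) for pred in preds)]
        if not ok:
            raise ValueError(f"no version of {dep} satisfies {sorted(packages)}")
        chosen[dep] = max(ok)  # prefer the newest acceptable version
    return chosen
```

`pick_versions(["A"])` and `pick_versions(["B"])` both succeed on their own; `pick_versions(["A", "B"])` raises immediately, which is the annoying-but-honest behaviour.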


I understand that it's a very hard problem to solve, but there has to be something better than the status quo.

A couple of months ago, I wanted to write a simple Web App using Yesod. The only way to get a working installation of Yesod and all the dependencies was by using the sandbox thing. This, however, meant that after every code change, you had to wait ~10 secs for the project to rebuild. That doesn't sound like much, but it gets annoying very quickly.


Yesod is a very challenging project to build. I'm not a user of it so I can't speak from personal experience, but the common understanding seems to be that the particular constellation of Yesod packages together hover very close to unbuildable. They are also all together quite large and rely on a lot of compilation-time tricks (Template Haskell) which can increase build time.

I think it's unfortunate that Yesod gets billed as the premier web framework for Haskell because it does require some extra work to get compiled.

In fact, the entire Stackage ("stable hackage") project, I believe, was originally developed in order to get some tooling which would allow Yesod to be consistently built. If you think you're going to be primarily using Yesod then I suggest you switch to Stackage [0]. Stackage prevents install issues by building the entire Stackage group together periodically to ensure that there are no build problems. The downside is that Stackage is partial and slow to track Hackage.

You may want to try Happstack: http://happstack.com/page/view-page-slug/9/happstack-lite-tu...

[0] http://www.haskell.org/haskellwiki/Stackage


"which can increase build time."

Yeah, that's the biggest hit to my productivity, when working on a moderate sized Yesod project. Otherwise, the experience has been great.


It should only be a significant problem when you edit modules with TH. I don't know how often that occurs within a Yesod project. I imagine you can accelerate builds by isolating the TH modules.


Well, a good portion of meaningful additions touch at least one of config/routes and config/models, which means rebuilding a whole bunch. There probably is some way to better isolate things, but I'm not immediately sure what it looks like while still getting the DRY and static checks (which are certainly valuable!).


Yesod is infamous for being breakage prone. In fact the author now only supports users installing Yesod via Stackage rather than Hackage, because Stackage forces globally coherent fixed versions for every package in Stackage (basically, every package should be buildable together). If you're doing application dev (rather than lib dev), Stackage will help.
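A rough sketch of what "globally coherent fixed versions" means, with invented version numbers and bounds (this is not how Stackage is implemented, which actually builds everything together): a snapshot pins exactly one version per package, and it is coherent only if every package's bounds accept the pinned versions of its dependencies.

```python
# Hypothetical snapshot: exactly one pinned version per package.
SNAPSHOT = {"yesod": (1, 4), "text": (1, 1), "bytestring": (0, 10)}

# Hypothetical bounds: package -> {dependency: (lower inclusive, upper exclusive)}.
BOUNDS = {
    "yesod": {"text": ((1, 0), (1, 2)), "bytestring": ((0, 10), (0, 11))},
    "text": {"bytestring": ((0, 9), (0, 11))},
    "bytestring": {},
}

def incoherent(snapshot, bounds):
    """Return (package, dependency) pairs whose bounds reject the pinned version."""
    problems = []
    for pkg in sorted(snapshot):
        for dep, (lo, hi) in sorted(bounds[pkg].items()):
            pinned = snapshot.get(dep)
            if pinned is None or not (lo <= pinned < hi):
                problems.append((pkg, dep))
    return problems
```

An empty result means the whole set can, at least by its declared bounds, be installed together; bumping one pin shows you every package that would have to move with it.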

Or use Snap, I've never had build hell with Snap. But if Yesod works for you, awesome! :-)


Yeah. I will probably never touch yesod again after what it put me through.


Perhaps someone more knowledgeable will chime in, but I don't think there should be any increase in incremental build times just because you're using a sandbox.


That sounds correct.


That's not a sandbox problem, that's the cost of developing in an AOT-compiled language. Sandboxes do require you to recompile Yesod and all its dependencies (> 10 minutes) every time you start a new project though, which is pretty annoying.


I agree that it's not a sandbox problem, but it's not inherent to "an AOT-compiled language" - builds of my yesod projects take disproportionately long compared to my other development, almost all of which takes place in compiled languages.


The C# compiler is insanely fast, so it's not for all AOT-compiled languages, just ones that require the compiler to do a lot of work.


[AOT = Ahead Of Time, for anyone, like me, who didn't already know]


I think that's a part of it, but I don't think it's all of it. As I speculated in a previous comment (https://news.ycombinator.com/item?id=8207860), I think the Haskell ecosystem may well wind up with more breaking changes at the root of diamonds in the dependency graph than other ecosystems, for a few reasons.

I've been meaning to try and gather some empirical data as to whether this is actually the case.


I was wondering if someone could explain something to me. I understand the issues involving inconsistent environments, but bad constraints don't really seem like a particularly challenging issue to me.

If you have libraries A and B which each require mutually exclusive versions of library C, what is stopping the compiler from adding both versions of library C to the compiled code for A and B to call separately?

Obviously this would result in larger binaries and should generate a build/install warning, but it seems like this would solve the overwhelming majority of Cabal's dependency resolution issues.

There's clearly something I'm missing since this has been an outstanding issue in Cabal for quite some time. Would someone with a better understanding of GHC/Cabal's compilation/linking/installation steps mind shedding some light on this?


I maintain the package manager for the Dart language. This question comes up all of the time. The problem is that these libraries may interact at runtime:

    * A gets a value from its version of C.
    * It passes that to your app.
    * Your app gives it to B.
    * B gives it to its incompatible version of C.

Haskell may handle this scenario differently because of its type system, but in Dart, this would cause lots of user problems.
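The four steps above can be sketched in a dynamically typed language, using two nested classes as stand-ins for two versions of library C (all names are invented for illustration):

```python
class C1:
    """Stand-in for version 1 of library C."""
    class Doc:
        def __init__(self, text):
            self.text = text

class C2:
    """Stand-in for version 2 of library C: the field was renamed."""
    class Doc:
        def __init__(self, text):
            self.body = text

    @staticmethod
    def render(doc):
        return doc.body.upper()

def a_make():
    """Library A, built against C1: hands a C1 value to the app."""
    return C1.Doc("hello")

def b_show(doc):
    """Library B, built against C2: passes whatever it gets to C2."""
    return C2.render(doc)
```

`b_show(a_make())` raises AttributeError deep inside C2, far away from the place where the two versions were mixed. In a statically typed setting the same mix-up would surface as "expected Doc, got Doc", because `C1.Doc` and `C2.Doc` are different types that share a name.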


That's exactly what can happen. Cabal endeavors to prevent multiple versions of a package from being installed simultaneously, but as this post mentioned Cabal does not maintain global consistency so if you do multiple independent installs in the same database then you can wind up with this problem.

In particular this is, for whatever reason, a frequently occurring problem with the `bytestring` library. There are a lot of SO questions related to figuring out why the compiler is rejecting the use of a ByteString type even though it appears it should work... but that's because the compiler error is eliding the package version difference.


This shouldn't really be a problem in a statically typed language. If A is using C, but all the uses are hidden, then the type system guarantees that the values from C can never be inspected by anyone outside of A. So B can use its own, different version of C.

On the other hand, if C is present in A's interface, then its version has to agree with all other versions of C visible in the same unit of code (package/module/namespace).


That fails in the presence of subtyping, unfortunately. A could expose a C in terms of some supertype not defined in C. Then that object gets cast back down to the more specific type and handed off to (the wrong version of) C.


Well, in that case, the same class from different versions of C should be considered distinct (i.e. when the compiler sees `a instanceof C.A`, it actually translates it into `a instanceof C_V1_53.A`, where the version is appropriate to the current module (and so is different if accessing `C.A` in modules `A`, `B`, or potentially the main program)).
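A small sketch of that idea, assuming a made-up `TypeId` record rather than any real compiler's representation: if the version is part of a type's identity, the two `C.A`s compare unequal and the error message can say which versions were involved.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TypeId:
    """Hypothetical type identity: package, version and type name together."""
    package: str
    version: str
    name: str

    def __str__(self):
        return f"{self.package}-{self.version}:{self.name}"

def check_same_type(expected, actual):
    """Reject a value whose type comes from a different package version."""
    if expected != actual:
        raise TypeError(f"expected {expected}, got {actual}")

seen_by_a = TypeId("C", "1.53", "A")  # what module A means by C.A
seen_by_b = TypeId("C", "2.0", "A")   # what module B means by C.A
```

`check_same_type(seen_by_b, seen_by_a)` fails with both versions spelled out, which is friendlier than an error that prints the same bare type name twice.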


Hmm, npm with Node.js allows that to happen. I don't recall that causing any problems, though. The "solution" seems to be that B is tested with both A and C, and when you're developing B, you know what kind of values you get from A, and you know what kind of values C accepts.


It doesn't cause problems most of the time since most dependencies do tend to be encapsulated. When they aren't, JavaScript's dynamically-typed nature can also give you some slippage: you may get an object from a different version of yourself, but as long as it has the same properties you expect, it might, mostly, do the right thing.

This is also a better fit in JS where the community seems to prefer packages that don't expose many real "objects" with methods and stuff. Packages tend to expose bare data-bags and functions from what I've seen. That makes "I got an object without the methods I expect" errors less frequent.

I personally feel that approach is too unsafe and definitely wouldn't have been a good fit for Dart, but it does kind of sort of work for npm.

Of course, it's totally broken in the presence of dependency cycles, which is why npm now has "peer dependencies" and is right back to having to resolve shared constraints.


Anyone know if there's a formal name for this situation/problem?

Thanks for the clear explanation BTW.


You probably would have good vocabulary for this in the ML family languages. You're essentially talking about existential type mismatching—a core component of how modules work in ML family languages.

It's just hard to read that into Haskell since "modules" are not as well represented in-language and exist somewhere between Haskell-the-language and Cabal-the-package-manager/ecosystem.


I don't think there's a well-known name for it. I'd call it something like a "non-encapsulated dependency".


I think that would only work if neither A nor B exposes anything about C. I'm not a Haskell user, but certainly in Scala and Ruby there are plenty of libraries where library consumers see objects from underlying libraries.

Suppose C is a JSON library, for example. In my code I fetch a JSON object from A, with library version C1. What happens when I pass that object to library B, which uses C2? Is that an immediate type violation because C1 and C2 are treated as different types? Do you live with runtime failures when, say, B calls a C2 method not in C1? Do you try to create some sort of magic object conversion and hope that the data isn't too different? Or do you try to require all libraries to describe how to convert data from every version to every other version?

Even if the two libraries never interacted and data could be proven never to flow between them, there's a real question as to what version of the library is available from the main code.

Despite my concerns, I suspect that you could make this mostly work if you were cavalier about possible runtime errors, because actual conflicts would be rare. But from what I understand of the Haskell community, they're not big on "cross our fingers and hope it works at runtime" approaches.


The first thing that needs to be done is to acknowledge its existence. Often Haskell zealots pretend it's not a Cabal problem but a user problem. So this article is indeed a very good start.


I think it's not "a cabal problem", in the sense of cabal being somehow awful and "doing it wrong" where other systems "do it right".

The symptoms are a result of the confluence of a few things (mentioned elsewhere in this discussion), some of which are quite positive, which make the problem substantially harder. Which isn't to say there's no problem, or that nothing could or should be better - I'm glad it's getting attention and I'm glad people have ideas for improvements, though both of those have been true for a while.


> The feeling of powerlessness one has when Cabal does not do what one wanted and one does not know how to fix it.

I experienced this immediately after I read Learn You A Haskell and it made me give up on the language. I develop on Windows (currently?) and I was passionate about creating my first hobby project in Haskell. But every direction I turned, I ran into an issue where a dependency or transitive dependency expected some linux library to be there and I couldn't install.

Maybe I'm just spoiled and need to be more open minded. I come from Java-land where I take "write once run anywhere" for granted. I eventually switched to Clojure but that was unsatisfying for different reasons. I wish there was a language with Haskell's purity and type system but Java's ability to "write once and run anywhere".

I think Frege is very close, but I feel uncomfortable learning something with such a small community (damn... I am spoiled).


>I develop on Windows

In my experience, this sucks in basically any language.


Then your experience is rather limited.

As a counter-anecdote, my experience is that Windows is frictionless for developing PHP, Node.JS, C#, Java, Scala, Elixir/Erlang. It's also close-to-first-class for Python. Ruby to a lesser extent, with a lot of the community just assuming you're on OSX, period.

I've often been impressed by how, despite the hacky nature of the Node.JS community, a lot of stuff simply just works there. For example, getting PhantomJS, a full-fledged scriptable headless browser, available for a project is a single "npm install" away on all three major OSes.


That is definitely impressive of the Node.js community! Good on them. Also great to know Elixir is working well on Windows, yet another feather in its increasingly-well-plumed hat!

I remember a get-Windows-working-better rallying call at a Ruby conference I was at 4 years ago, and as far as I can tell, things have not changed much since then. The OSX thing isn't quite fair though - everything works on linux too, because that's what people deploy on. In fact the trend seems to be toward developing against containers running your production version of linux, rather than OSX directly.


Funnily, one of the main reasons I switched to Linux was because writing Python on Windows was like hell. I couldn't get pip installed, and packages were an absolute pain to install.


What kind of development environment do you use (I'm particularly interested about Node and Elixir)?


Well, I think I agree with you, but that wasn't the GP's point. The point was not that developing on Windows was bad, but that Haskell package management was bad on Windows. The GP was not yet running into the problems that you're referring to.


I had no idea people were having problems with Cabal! It's always worked very well for me. It seemed like one of the better build systems out there.

I suppose a lot of the issue comes from the fact that Cabal is very careful about everything. It seems like most package managers I use have a "try and maybe fail" attitude, while Cabal seems to have more of a "guarantee success or fail" attitude.


Nix is getting mentioned a lot in this thread. If anyone's interested in NixOS feel free to email me. I switched to it on my main computer about a month ago and might be able to give some perspective on what it's like.

My config is here: https://github.com/seagreen/vivaine


Could Nix be used as a language package manager? Does it allow installing cutting-edge code from random repositories? Can you only download something, without installing it? Can you access multiple versions of the same library from a single "project" (e.g. by using different aliases for different versions)?


Yes and with ease for the first two. Yes, but you may have to write some Nix code for the last.


As someone who has tried to compile pandoc from source, I welcome this.


You know you play too much Destiny when...

But seriously. It's so difficult to realize how much of a game changer a great package manager can be. I've never tried Haskell, but is this really THAT much of a problem? I hear so much praise and worship about Haskell...


I think the Haskell folks would do well to consider a complete rewrite of Cabal and look at something like Leiningen, Npm or even Opam as a base from which they can start with their new version.


That doesn't solve the problem. The problem is that the problems Cabal faces are real, which is to say, faced by every language that allows you to use packages from external entities, and the solution almost every other package manager uses is to sweep the problems under the rug and hope for the best. Haskell the language can't do that very well (because it is already careful under the hood), and Haskell the community won't do that.

I'm actually intrigued to see what comes out of this. When the Haskell community turns its mind to something like this, something quite interesting tends to come out.


Like many people have pointed out, cabal and npm are not solving the same problem. npm appears to "work" more often because a whole class of problems in JavaScript programs are deferred to runtime while in Haskell all interfaces have to be compatible, link properly and compile. That's why "cabal hell" is such an imprecise term because on top of the version bound issues it's also used for any and every failure that happens when using cabal, even if it's the library in question that's to blame and not cabal. If you want to consider a similar problem, take 20 interdependent C++ libraries and try to link them into a single application using just version information and see how often that works out of the box.


You can (and I do) use npm and Node.js with type checking at compile time using TypeScript. Although to be honest it doesn't play that well with different versions of the same type definition, but in principle you could have two header files of the same library and the compiler will create an error on a type mismatch.


I think that's only a part of it. I think that it's also the case that the Haskell ecosystem has more breaking changes at the root of diamonds in the dependency graph - which isn't something a build tool can solve (or at least isn't something any build tool does solve).


In my experience the dependency graph often has many more nodes as well. I'm not sure if this is because Haskell code is especially reusable or Haskell developers are especially lazy.


There is an ambient culture of code reuse at a level I don't see elsewhere; that's one of Haskell's strengths in my opinion. Though it does make certain packaging problems more difficult.


High levels of abstraction and well defined edges mean more opportunities for reuse, which probably contributes to both.



