The golden rule of software distributions (haskellforall.com)
56 points by todsacerdoti on May 9, 2022 | 21 comments



> Stackage works by publishing a set of blessed package versions for all of the packages vetted by Stackage and these packages are guaranteed to all build together. Periodically, Stackage publishes an updated set of blessed package versions.

All the packages being intercompatible sounds extremely valuable, but how is it actually done? For example, how does Stackage allow package A to continue updating despite some never-updating package B requiring an old version of package A? Drop B? Don't let A change? Ignore B's declared dependency constraints? Take over development of B? Require every package to have a contactable maintainer who responds to "you gotta update B" emails or else the package gets dropped within a month?


All of the above to some degree. I can name some specific examples of each of these occurring:

* For an example of "Drop B", at one point the haskell-lsp package was deprecated in favor of the lsp package, so all downstream packages had to drop the haskell-lsp package as a dependency and migrate to the lsp package (I personally had to do this)

* For an example of "Don't let A change", that might happen for some period of time, although not indefinitely. The most obvious example is holding back the compiler version. For example, Stackage was on GHC 8.10 for a while, even after GHC 9.2 was released, due to breakage introduced in GHC 9.0.

* For an example of "Ignored B's declared dependency constraints", this is extremely common, especially when the `base` package is upgraded (since many packages have conservative upper bounds on their `base` dependency which can often be trivially bumped without issues).

* For an example of "Take over development of B" the `aeson` package for JSON support is one example of this. More generally, this happens when abandoned packages get adopted by the Haskell organization.

And Stackage does require contactable maintainers for supported packages. There are some exceptions to this rule, though. For example, sometimes a package gets added where the maintainer is available, but a dependency for that package was not yet on Stackage. I believe you can either add that maintainer to be the contact for that dependency, too, or it can be an orphan package. There are a bunch of people who fill in the maintenance gaps in the ecosystem by fixing these packages that don't have official contacts or active maintainers.


I do have to point out that what Stackage does here is not super novel; it is essentially the same work that traditional distros like Debian have been doing since the dawn of time, especially for C libraries.


> If a package manager only permits installing or depending on one version of each of package then ...

So just allow installing more than one version of a given package? Problem solved :)

> A locally coherent package manager requires a globally coherent software distribution.

Ah, so the implication is that "local incoherency" is bad? There are certainly some specific packages where you don't want multiple versions in use at the same time, but for most packages it's fine. For the sake of discussion, let's call those packages where having multiple versions is a problem "key" packages.

The problem with the solution in the article is that it requires some third party (i.e. the Stackage maintainers) to do a ton of extra curation work, plus every package that might depend on a "key" package needs to do work to make sure it's using the same version as everyone else if it wants to be included in the next curated set.

An ideal solution would only require extra work for the maintainers of those "key" packages. This can be achieved by splitting the "key" package into two parts:

1) The "key" part of the package - only one of these can be in use, but has no public API. 2) The adapter package - depends on the "key" package and exposes a public API.

Since the adapter package is not a "key" package, it is fine to have multiple versions, and since no other packages will directly depend on the "key" package, there's no extra work for anyone else.

The maintainers of the "key" package will continue to release patches to older versions of the adapter package to ensure that all adapter versions are compatible with the latest version of the "key" package.


Hum... Let's say I install the package A, which has the type X, so I can use A.X in my code. But I'm using its version 1, so let's call it A1.X.

Now, all of my dependencies expect values of A1.X. Except for one, which upgraded recently and expects A2.X. Now I have no way at all of taking a value from one library and sending it to that new one. It's only useful if it's some stand-alone functionality that I call directly from my code (like the vast majority of libraries in OOP, so maybe you are biased?) and don't integrate with anything.
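
To make the mismatch concrete, here's a small Haskell simulation, with the modules A1 and A2 standing in for the same module as shipped by two different versions of package A (all names made up):

```haskell
-- A1.hs: standing in for A's module as shipped by version 1.
module A1 (X (..)) where
newtype X = X String

-- A2.hs: the "same" module as shipped by version 2.
module A2 (X (..)) where
newtype X = X String

-- Main.hs: one dependency still produces A1.X, another now expects A2.X.
module Main where

import qualified A1
import qualified A2

fromOldDep :: A1.X            -- value coming out of a not-yet-upgraded library
fromOldDep = A1.X "some id"

intoNewDep :: A2.X -> String  -- function from a library that already upgraded
intoNewDep (A2.X s) = s

main :: IO ()
main = putStrLn (intoNewDep fromOldDep)
-- GHC rejects that last line: it cannot match the expected type A2.X with the
-- actual type A1.X, even though both are "the same" type to a human reader.
```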


That's certainly an interesting case. A "UUID" package is a good example of this, since UUIDs are often not in the standard library, but are often passed around between other packages.

There are two approaches to solve this that I know of:

1) When releasing UUID 2.0, also release a new UUID 1.X which adds a dependency on the new UUID 2.0 package.

Then either: a) Re-export the UUID 2.0 type from the UUID 1.X package.

b) Add conversion functions/operators to the UUID 1.X package.

This approach is nice because there is no extra overhead for the 2.0 dependency; all of the extra compatibility code exists only in the 1.X package.
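
Roughly, in Haskell - where Data.UUID.V2 stands for the module shipped by the 2.0 package, and the fromWords64/toWords64 helpers are assumed to exist there - the 1.X compatibility release could contain something like:

```haskell
-- Option (a): re-export the 2.0 type so that old and new consumers end up
-- agreeing on a single concrete type.
module Data.UUID.Compat (module Data.UUID.V2) where
import Data.UUID.V2

-- Option (b): keep the old representation but provide explicit conversions.
module Data.UUID.Legacy (UUID (..), toV2, fromV2) where

import Data.Word (Word64)
import qualified Data.UUID.V2 as V2   -- the type from the 2.0 package

data UUID = UUID !Word64 !Word64      -- the old 1.X representation

toV2 :: UUID -> V2.UUID
toV2 (UUID hi lo) = V2.fromWords64 hi lo                 -- assumed 2.0 API

fromV2 :: V2.UUID -> UUID
fromV2 u = let (hi, lo) = V2.toWords64 u in UUID hi lo   -- assumed 2.0 API
```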

2) Discourage passing around UUIDs directly. Instead have a trait/interface/etc. called "UuidLike" or similar, that is implemented for UUID-like things, and have downstream code be generic. This trait would allow converting to/from any version of the UUID type.

An approach similar to (1) can be used to share this trait across multiple versions of the package.
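
Sketched in Haskell, with an entirely hypothetical class (the real uuid packages don't ship anything called UuidLike):

```haskell
-- This class would live in a tiny, rarely-changing package that every major
-- version of the UUID package provides an instance for.
module Data.UUID.Like where

import Data.Word (Word64)

class UuidLike a where
  toWords64   :: a -> (Word64, Word64)
  fromWords64 :: Word64 -> Word64 -> a

-- Any two UUID representations can be bridged through the class ...
convert :: (UuidLike a, UuidLike b) => a -> b
convert = uncurry fromWords64 . toWords64

-- ... and downstream code stays generic in the representation, so it works
-- with whichever uuid version its other dependencies happened to pick.
logKey :: UuidLike a => a -> IO ()
logKey = print . toWords64
```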

For this example of a UUID, approach (1) makes more sense to me, but in practice there may be cases where approach (2) makes more sense.

There may also be cases where it is preferable to just break compatibility. An example of this would be if the UUID 1.0 package had a sufficiently serious design flaw (e.g. it always assumed version 1 UUIDs or something). In that case the ecosystem churn may be preferable to trying to support something inherently broken.


In the case of the Haskell ecosystem, I think that achieving this golden rule is hard specifically because not every package conforms to semantic versioning.

For example, the newly released `mtl`, which is a dependency of many other packages, breaks backwards compatibility from version 2.2 to 2.3. Personally, I'd expect to be able to have a dependency requirement like `mtl = 2.*` and forget about it forever.

If package versions were 'updated' to follow semantic versioning (e.g. the new `mtl` would then be an explicit version 3 rather than an implicit one), we might have a better view of the current level of coherence of the Haskell ecosystem.


I don't know how controversial this will be - but I'm wondering whether packages which break semantic versioning should be re-versioned by distributions to 0.X.Y versions (perhaps with a X.Y.? -> 0.X.Y transformation), until they prove themselves willing to maintain API stability enough for an official, community-accepted, non-zero major version.


Specifically for Haskell packages, they are not expected to change their interface even on 0.x versions.

Which is perfectly reasonable, by the way, because that convention makes the scheme basically useless: if people followed it, very soon all the packages would be free-form 0.X versions, if not through their own fault, then because of a dependency.

It makes much more sense to forcefully increase the version, instead of decreasing it.

(Also, notice the sibling comment that explains that mtl strictly followed the numbering convention.)


That's actually correct versioning in Haskell. The Haskell ecosystem uses a different versioning convention from other ecosystems where the first two components of the version number signal breaking changes. For more details, see: https://pvp.haskell.org/
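
For example, a PVP-style bound in a .cabal file pins the first two components, which is what actually guards against a breaking mtl 2.3 release; a semver-style wildcard would not (a hypothetical fragment, not a recommendation of specific bounds):

```
  build-depends:
      base >= 4.14 && < 4.17
    , mtl  >= 2.2  && < 2.3   -- PVP: the 2.3 release is allowed to break the API
    -- a semver-style "mtl == 2.*" (i.e. >= 2 && < 3) would have admitted
    -- the breaking 2.3 release
```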


Uh, I stand corrected. I've been developing Haskell for nearly a decade and somehow never saw this.

Thank you for bringing this to my attention!


You're welcome!


How would we easily quantify the coherence of a software platform, pre-release, so we can state that some subsystem is completely coherent or that the overall platform is slightly incoherent?

In that terminology, Linux distributions are "mostly coherent" in their rolling beta release, and they strive to become globally coherent for their release forks, right?


In an ideal universe, Debian Stable is globally coherent minus declared Conflicts. That is, there are packages that are mutually exclusive, but they are expected to perform similar functions such that they are reasonable alternatives for each other.

What do we call NixOS, where different users can have incompatible package versions but the whole system allows for that? 'Tolerance' rather than 'coherence', perhaps?


NixOS is globally coherent: the packages are specified using global identifiers like "python3", "vlc", etc. and each of these identifiers corresponds to a "blessed" version. But it is not locally coherent, because the hashes allow installing multiple versions of a package at once. So NixOS would be "locally tolerant", to use your terminology.


And namespaces are used to make them runnable simultaneously? So "dependency hell" is escaped with sufficient RAM?


"Joyfully incoherent" comes to mind. :)


A rolling release is not a "pre release" but rather a continuous succession of releases, one for every package update. To the extent that the rolling release has no broken packages, it is completely globally coherent. And I don't think a failing package is necessarily a failure in coherence either; it just means the blessed version is broken. Incoherence is when you have A depending on C-1.0 and B depending on C-2.0, and the two versions of C are not simultaneously installable (local coherence of C) - only the complex constraint-solving package managers would be able to function with such a dependency tree. Such a situation is unlikely to occur in a rolling release; it is more likely to have two C packages, C1-1.0 and C2-2.0, which do not conflict.


Good point. We have to distinguish whether it has been built coherently or whether it actually works. I was obviously thinking of the latter; it's the one that's uaefzl. In that sense a rolling release may _claim_ coherence, build coherence that is, while it is still locally incoherent (some packages don't work).


Oh. I have no idea how autocorrect transmogrified "the one that counts" into "the one that uaefzl".


Rejecting all but one version of a package does not scale.

This is why (thank you NPM and NixOS) we now have a strong trend of project-relative package trees.



