kvnhn · 2024-02-10T05:44:00.000000Z

This is a variation on one of my favorite software design principles: Make illegal states unrepresentable. I first learned about it through Scott Wlaschin[1].

[1]: https://fsharpforfunandprofit.com/posts/designing-with-types...
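To make the principle concrete, here is a minimal Python sketch. It is not taken from the linked post; the `Unverified`/`Verified` types and the `can_send_password_reset` function are hypothetical names picked for illustration. Instead of one record with an `is_verified` flag plus an optional date (which permits "verified, but no date"), each legal state gets its own type.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Unverified:
    """An address nobody has confirmed yet (hypothetical example type)."""
    address: str


@dataclass(frozen=True)
class Verified:
    """A confirmed address; the date is required, so it can never be missing."""
    address: str
    verified_on: str  # ISO date string, assumed format for this sketch


# An email is exactly one of the two cases. There is no flag/date pair
# that could fall out of sync.
Email = Union[Unverified, Verified]


def can_send_password_reset(email: Email) -> bool:
    # Only the Verified case qualifies; no flag checking needed.
    return isinstance(email, Verified)
```

Python only enforces the `Email` union through a static type checker, so this is weaker than the F# version, but a `Verified` value still cannot be constructed without its date.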

kvnhn · 2023-11-11T00:55:43.000000Z

IMHO, Poetry is the best we have in the Python dep mgmt space, and it's still endlessly frustrating. It's especially hard to recommend it for newbies looking to get up and running with even a simple ML stack. Check out this thread[1] on the Kafkaesque nightmare that is trying to install PyTorch with Poetry.

[1]: https://github.com/python-poetry/poetry/issues/6409

appplication · 2023-11-11T03:24:36.000000Z

Poetry won’t ever be the answer because the Poetry maintainers haven’t shown the maturity to be real leaders in the space. It’s good, as many narrowly opinionated projects are, but ultimately the core maintainers are not interested in use cases they see as outside their vision for the tool. Which means it will never be the one tool to rule them all.

zwaps · 2023-11-11T03:34:12.000000Z

For ML projects, conda is still better since it usually manages to resolve a working environment including pytorch and cuda.

Sure, it doesn’t lead to the exact same environment on every machine, but that never really works anyway, at least with Poetry.

cxr · 2023-11-11T04:49:53.000000Z

The whole "Python is how science gets done" meme is one of the dumbest things we've allowed to be foisted on otherwise unaware/unsuspecting users (such as the kinds of academics who end up being the victims of the Python ecosystem shitshow). Who knows how many setbacks in science we've suffered, not to mention billions of dollars of productivity lost, sticking to such an unworthy programming system/language/environment. All because, like, colons and significant whitespace make programming so much easier to pick up when you compare it to making someone look at curly braces, which, as we all know, is the hardest part of programming.

physicsguy · 2023-11-11T07:54:16.000000Z

Once you start to look at scientific packages in other languages, they have the same issues as Python does, because they start to use scientific libraries written in C and Fortran, as rewriting 50+ years of code is actually really hard.

kvnhn · 2023-11-11T00:50:59.000000Z

PEP 582 was rejected, FYI.

https://peps.python.org/pep-0582/

adamckay · 2023-11-11T19:24:23.000000Z

Bugger.

Thanks for letting me know (embarrassingly I did load up the PEP page to make sure I remembered the right number but I didn't check its status).

Was hoping that it would make things simpler for smaller projects and newbie developers but the rejection reason is solid.

kvnhn · on Sept 28, 2023

I've used DVC in the past and generally liked its approach. That said, I wholeheartedly agree that it's clunky. It does a lot of things implicitly, which can make it hard to reason about. It was also extremely slow for medium-sized datasets (low 10s of GBs).

In response, I created a command-line tool that addresses these issues[0]. To reduce the comparison to an analogy: Dud : DVC :: Flask : Django. I have a longer comparison in the README[1].

[0]: https://github.com/kevin-hanselman/dud

[1]: https://github.com/kevin-hanselman/dud/blob/main/README.md#m...

kvnhn · on Oct 3, 2022

You might be referring to me/Dud[0]. If you are, first off, thanks! I'd love to know more about what development progress you are hoping for. Is there a specific set of features that bar you from using Dud? As for testing, Dud has a large and growing set of unit and integration tests[1] that are run in GitHub CI. I'll never have the same resources as Iterative/DVC, but my hope is that being open source will attract collaborators. PRs are always welcome ;)

[0]: https://github.com/kevin-hanselman/dud

[1]: https://github.com/kevin-hanselman/dud/tree/main/integration...

kvnhn · on Aug 16, 2022

I very much agree with you about DVC's feature creep. The other issue I have with it is speed. DVC has left me scratching my head at its sluggishness many times. Because of these factors, I've been working on an alternative that focuses on simplicity and speed[0]. My tool is often five to ten times faster than DVC[1]. I'd love to hear what you think.

[0]: https://github.com/kevin-hanselman/dud

[1]: https://kevin-hanselman.github.io/dud/benchmarks/

nerdponx · on Aug 17, 2022

Thanks! I really like your clear explanation of how Dud differs from DVC (and I prefer your version in all cases).

Would it be possible for Dud to push/pull from a DVC remote and use the DVC shared cache? That would be really useful so I (iconoclastic free software user) could use Dud on my machine/account, but still share data and artifacts with other people (who don't give a shit what tool they use) using DVC on their machines/accounts.

Also: Does Dud support reflinks at all? Or does it only support symlinks?

kvnhn · on Aug 20, 2022

Unfortunately, there are a few things that currently hinder compatibility with DVC caches. First, Dud uses the Blake3 checksum algorithm, and DVC uses md5. This means the content-addressed caches will have completely different file names. Second, directories are committed to DVC differently than they are in Dud. For directories, not only will the committed file names not match (due to point 1), but the contents will not match either. Both of these things could be addressed, but it would take a lot of effort and would likely cost Dud in terms of its two main goals, speed and simplicity. I'm not opposed to this if we can make it work, though.
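A rough sketch of the first point, in Python for brevity. In a content-addressed cache the file name is derived from the checksum of the contents, so two tools that hash with different algorithms will store the same bytes under unrelated names. Two loud assumptions: BLAKE3 is not in Python's standard library, so `blake2b` stands in for it here, and the two-character directory prefix in `cache_path` is an illustrative layout, not a claim about what either tool writes to disk.

```python
import hashlib


def cache_path(data: bytes, algorithm: str) -> str:
    """Return a content-addressed path for `data` (hypothetical layout)."""
    digest = hashlib.new(algorithm, data).hexdigest()
    # Fan out by the first two hex characters to keep directories small.
    return f"{digest[:2]}/{digest[2:]}"


data = b"the same file contents\n"
md5_path = cache_path(data, "md5")
blake_path = cache_path(data, "blake2b")  # stand-in for BLAKE3

# Same bytes, different algorithm, so a lookup by one name misses the other.
print(md5_path)
print(blake_path)
```

Because the names share nothing, one tool cannot find the other's cached objects without rehashing every file.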

Dud currently does not support reflinks, but I think adding reflink support would be fairly straightforward. Just curious: What filesystem and OS are you using for reflinks?

I'd be happy to chat more about this. Feel free to open GitHub issues for these items. I welcome contributions as well. ;)

kvnhn · on July 25, 2022

I don't know about replacing Make with Docker, but I use the two together to good effect. One of my favorite hacks is adding a 'docker-%' rule in my Makefile to run make commands in a Docker image[1]. It's a bit mind-bending, and there are a few gotchas, but it works surprisingly well for simple rules.

[1]: https://github.com/kevin-hanselman/dud/blob/e98de8fcdf7ad564...

jfkimmes · on July 25, 2022

I toyed with a similar setup recently and I had the problem that I, too, would have rules depend on a "docker build"-step (like `docker-image` in your example). Usually `make` skips rebuilding a non-PHONY target if it finds an up-to-date file with that name, but in this case there is obviously no file for it to find.

I tried `touch`ing hidden files for each step and then adding those as dependencies, but that is not very elegant. Do you have this problem at all?
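For anyone wondering why the `touch` trick works at all: make rebuilds a file target when the file is missing or older than any of its prerequisites. A Docker image is not a file, so there is nothing to compare, and a touched stamp file gives make a timestamp to look at. Below is a minimal Python sketch of that rule only. `needs_rebuild` is a hypothetical helper, and real make does much more (pattern rules, phony targets, and so on).

```python
import os
from typing import Sequence


def needs_rebuild(target: str, prerequisites: Sequence[str]) -> bool:
    """Approximate make's staleness check using file modification times."""
    if not os.path.exists(target):
        # No file on disk (e.g. a Docker image "target"): always rebuild.
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild if any prerequisite was modified after the target.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

With a stamp file such as `.docker-image.stamp` touched after each successful `docker build`, the check returns `False` until the Dockerfile changes again.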

kvnhn · on June 25, 2022

A thousand times, yes. I've wanted to write this same article. Thanks for saving me the time!

The industry is going to great lengths to avoid writing configuration in any ubiquitous imperative programming language. We're seeing the proliferation of hyper-specialized, clunky declarative languages with sub-par tooling and package ecosystems. In what world are templates acceptable code? I don't mean to pick on anything specific, but this[0] is the most recent example I've come across, and it's far from the most unreadable example.

[0]: https://github.com/traefik/traefik-helm-chart/blob/master/tr...
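For contrast, here is a small sketch of what "configuration in an ordinary programming language" can look like: plain functions build plain data, and the result is serialized at the end. The `service` helper and the schema it produces are made up for illustration and do not correspond to the linked chart or to any real tool's format.

```python
import json


def service(name: str, port: int, replicas: int = 1) -> dict:
    """Build one service entry (hypothetical schema for this sketch)."""
    return {"name": name, "port": port, "replicas": replicas}


# Defaults, reuse, and composition come from the language itself
# rather than from template directives embedded in text.
config = {
    "services": [
        service("web", 8080, replicas=3),
        service("metrics", 9100),
    ]
}

print(json.dumps(config, indent=2))
```

The upside is that editors, linters, and tests all work on this as they would on any other code; the output is still just declarative data.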

throwawaymaths · on June 25, 2022

Is it? Terraform has been around for ages.

kvnhn · on May 7, 2022

Here's an example I whipped up: https://github.com/kevin-hanselman/tailscale-forward-auth

kvnhn · on April 13, 2022

Dud author here. I'm guessing this was shared after I mentioned it[0] in the comments on this recent post[1].

Thanks, kind stranger, for sharing! I'm happy to answer any questions.

[0]: https://news.ycombinator.com/item?id=31008992

[1]: https://news.ycombinator.com/item?id=31006003