Hacker News
Coping strategies for the serial project hoarder (simonwillison.net)
224 points by usrme on Nov 28, 2022 | hide | past | favorite | 38 comments



I write down every thought in the open so I have a log of what I thought. See my profile. Some go on to become projects.

If I want to do some feature work on the thought, I create a GitHub issue and write implementation notes; I did this on samsquire/hash-db for document storage.

I write English descriptions of the mental model then implement.

Some people say code is the documentation. This is unfortunate: when the code is decommissioned, the thinking ends with it, its lineage ends, and cross-pollination never happens.

The lifespan of an English description on Wikipedia of an algorithm or project description outlives its implementations.

Please document your mental model of the problem you're solving.

As an analogy: the implementation of Windows 95 (and its user interface design thinking and guidelines) is outlived by its documentation and screenshots.

University and research whitepapers outlive the code written during the research, and potentially many implementations.


> Some people say code is the documentation.

McConnell has a really good quote on this from Code Complete:

  The information contained in a program is denser than the information contained in most books.  Whereas you might read and understand a page of a book in a minute or two, most programmers can’t read and understand a naked program listing at anything close to that rate.  A program should give more organizational clues than a book, not fewer.
I comment meticulously on the why so that if I retrace my code 3 months later, I can remember the decision making context for a particular implementation. It also helps me hand off my work so that anyone can pick it up quickly.


Regarding information density, I don't think this is necessarily true. Quantifying understanding between a reader and a text is always ambiguous, whereas the results of programs are intentionally designed not to be ambiguous (with varying results). Consider the ink and blood spilled over the implications of a single word in a holy text.

Saying that a reader understands a page or two of most books in a minute or so is like "understanding" the rules of Chess but being unable to actually win a game.

What differs between books and programs is the consequences of getting that sense of understanding wrong. When reading a book, the results of misunderstanding or skimming are rarely serious (outside the realm of technical manuals, textbooks, and holy books) whereas the programmer and user usually immediately notice that something is very wrong if the programmer took a wrong turn in their understanding.

Academics paid to "understand" non-technical books face so little consequences for misunderstanding that no one even agrees what understanding is. James Joyce's Ulysses is a crypto-ur-fascist work prefiguring the ascension and demise of Nationalist Irish America as found in the correspondences between Buck Mulligan and Donald Trump? Well, I certainly can't tell you that's wrong. Half of these terms would require volumes themselves to meaningfully define to an extent that a conclusion could even approach falsification.

This failure to understand misunderstanding is also pervasive in the majority of legal systems. They try to define terms and precedents in volumes and still have dogmatic disagreements about meaning and understanding.

Try doing this in a codebase and see how far you can get. "As you see, this comment in the Linux kernel was written by Linus Torvalds. Torvalds had many documented conflicts as a result of his frequently acerbic personality. We can date this comment to after his foray into improving his interpersonal skills. We can infer from the use of language found in the manuals of the University of Oregon's Anger Management school that Torvalds is emotionally disturbed yet diplomatically declining to accede to the Rust development team's demands for special privileges. Seeing that Torvalds, the creator of Linux, wanted to originally decline these changes, I have put forth a commit to undo all commits initiated by anyone associated with the Rust development team."

On the other hand, if programmers were to instantiate legal and semiotic assertions in Prolog from first principles, suddenly there's a legibility and granularity that makes the entire "meaning" and "understanding" game pointless because we can directly see and talk about where the logical assertions fail to correspond or contradict.


I don't understand human-query-engine. The link to it in the <https://github.com/samsquire/wants> README is also broken.


I was interested in quantified self for qualitative statements and the social aspect of asking strangers questions to try to reveal interesting things.

I enjoy using Quora.

I generate random questions based on permutations of templates a bit similar to a madlib generator but for questions.

https://github.com/samsquire/human-query-engine
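A madlib-style question generator along these lines can be sketched in a few lines of Python. The templates and slot values below are hypothetical illustrations, not taken from the project:

```python
import itertools

# Hypothetical templates and slot values; the real project's data differs.
TEMPLATES = [
    "What is your favourite {thing} for {activity}?",
    "How often do you {verb} when you are {activity}?",
]
SLOTS = {
    "thing": ["tool", "book", "place"],
    "verb": ["read", "exercise", "plan"],
    "activity": ["working", "travelling", "learning"],
}

def generate_questions(templates, slots):
    """Expand every template with every combination of its slot values."""
    questions = []
    for template in templates:
        # Determine which slots this template actually uses.
        names = [n for n in slots if "{" + n + "}" in template]
        for values in itertools.product(*(slots[n] for n in names)):
            questions.append(template.format(**dict(zip(names, values))))
    return questions

print(generate_questions(TEMPLATES, SLOTS)[0])
# -> What is your favourite tool for working?
```

Two templates with two three-value slots each yields 18 distinct questions; adding templates or slot values grows the pool combinatorially, which is the appeal of the madlib approach.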

I had another project that could collect salary data and stock market portfolios based on statements you add to the system. It was a personal data collection system that was programmed to produce useful functions based on what you tell the system in the form of templated statements.

The closest software that existed at one point in time is Mozilla Ubiquity, which I loved. It was a Twitter CLI for Firefox.

The idea was meant to be used to give targeted advice and judgements or evaluations to people. It's a bit similar to a choose your own adventure book but with advice or a problem flow chart.

This attempt is here: https://github.com/samsquire/living-documents and here: https://github.com/samsquire/fact-collector

This one uses Prolog to collect statements, then shows statistics of your collection.

Think of it as Siri in reverse. Rather than asking Siri for an answer to a question, you are asked a question and the answer is used to generate something useful. The problem then becomes thinking of useful questions that you can build automation over.

Imagine simply having a Twitter-style post that allows you to say "I am at London" or "I am eating this", then calculates calories, nearby trains home, or travel information.

My other attempts are here. This one calculates train journeys using Neo4j; it parsed plaintext statements of the form "I want to travel between <station> and <station>".
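The templated statements in these projects (travel queries, "I am at London", and so on) could be parsed with plain regular expressions. A minimal sketch in Python; the patterns here are illustrative guesses, since the projects' actual templates aren't shown:

```python
import re

# Hypothetical statement templates, not the projects' real ones.
PATTERNS = {
    "travel": re.compile(r"I want to travel between (?P<origin>.+) and (?P<dest>.+)"),
    "location": re.compile(r"I am at (?P<place>.+)"),
    "eating": re.compile(r"I am eating (?P<food>.+)"),
}

def parse_statement(text):
    """Return (kind, captured fields) for the first matching template."""
    for kind, pattern in PATTERNS.items():
        match = pattern.fullmatch(text.strip())
        if match:
            return kind, match.groupdict()
    return None, {}

kind, fields = parse_statement("I want to travel between Euston and Brighton")
# kind == "travel", fields == {"origin": "Euston", "dest": "Brighton"}
```

The captured fields then become the input to whatever automation the statement triggers (a graph query for the journey, a calorie lookup, and so on).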


> This is the most important tip: avoid side projects with user accounts.

> If you build something that people can sign into, that’s not a side-project, it’s an unpaid job. It’s a very big responsibility, avoid at all costs!

Interesting for sure. But what about those of us that hoard side products?


> The key trick is to ensure that every project has comprehensive documentation and automated tests. This scales my productivity horizontally, by freeing me up from needing to remember all of the details of all of the different projects I’m working on at the same time.

Good insights. Both great steps to being able to re-learn & experiment (or maintain/fix) quickly.


That was a gem of an article. I wish I had read something like that years ago.

Last year I started to break up Common Lisp code monoliths for personal projects into small libraries that are installable with Quicklisp, with applications also Quicklisp installable that are smaller by reusing libraries. I have started doing this to a lesser degree with my Python side projects, and in the last week I set up a personal library system for my occasional Chez Scheme projects.

I bookmarked the author’s two templates for new Python library and command line tool projects.


I'm a compulsive project hoarder and they all build on top of each other. My first project was an open source full stack real-time framework I built about 10 years ago. I worked on it for about a year but it didn't get any traction so I pulled out some of the core logic and turned it into a stand-alone real-time remote event library; then within a few years, it got some traction. Then as WebSockets was becoming mainstream, I migrated my framework from HTTP long-polling to WebSockets but realized that it didn't add much value anymore, so I added pub/sub and RPC functionality to it (I also built some additional components so that it would run as a self-sharding cluster on Kubernetes).

Then I was working in the blockchain space and so I decided to build a lightweight quantum-resistant blockchain using my pub/sub library for peer-to-peer messaging... Then after I finished the blockchain, I decided that it would be fun to build a decentralized exchange on top of it and so I did. It's been running for a few years without issues though it's low volume but the community around it is dedicated.

Now I'm looking for new things which I could build on top of the blockchain and DEX. I'm thinking to use it as a payment system which accepts multiple cryptocurrencies interchangeably. I have a few ideas but I'm more focused on earning money nowadays so I'm hesitant to start anything new. Lol. After all that work, I only earn about $1000 per month in passive income. I'm a brute-force entrepreneur.

It's pretty easy to maintain all this. Like the author says; tests help, but even more important is to keep the number of third-party dependencies to a minimum and to avoid using overly niche programming features or features which are unlikely to be forward-compatible.


> Technically I’m actively maintaining all of them, in that if someone reports a bug I’ll push out a fix.

Ironically, I toned down my enthusiasm for this author's (many) projects after my initial perusing led me to something interesting[0] that didn't work, and the subsequent issues and minor (but linked to issues!) PRs I contributed went completely without response for the last few years. They're still open.

To be clear, I'm grateful for the work the author is freely providing for me and the world! And I could certainly do a better job with some of the projects I help maintain as well. He's under no obligation to respond to issues if he doesn't have time or just doesn't want to. But it does speak to how difficult it can be to maintain over a hundred projects, even if you have a system.

[0]: https://github.com/dogsheep/healthkit-to-sqlite


That is 100% fair criticism. I'm definitely not as on top of issues and PRs across the long tail of my projects as I would like to be.

I've decided that learning to live with that is the price I have to pay for running with way too many projects!


I was trying to find an issue management system for doing “the perfect commit” and tried loads without thinking of GitHub issues, which is obviously the correct place to put things. Exciting to hear about GitHub projects too which sounds fantastic!

I would love to watch a day of programming by someone as prolific as Simon to see what other things he does to maintain speed and keep churning out code!


> GitHub issues, which is obviously the correct place to put things

Unless you care about project portability. In which case, GH issues are obviously the wrong, locked-in place to put things.

Not that there is an obvious solution to this problem. There are a few tools trying to fix this, but they're still a bit awkward. I wish Git added a somewhat-native/idiomatic way to deal with it.


(Disclaimer: I work for GitHub, but not on issues, and I've only been here 6 months)

The counterpoint to this is that GitHub has been around for a long time and I don't think all of these free features are going anywhere. Even beyond that, there's an API you can use to get your data out and lots of tools build on that API.

https://github.com/MichaelMure/git-bug came up here recently and that looks pretty good, though I haven't tried it. I suspect the usability of a lot of these tools will lag behind GitHub Issues/Projects.


> GitHub has been around for a long time and I don't think all of these free features are going anywhere

It's still an external system you don't control the fate of, run by a private company, fundamentally unaccountable, and a massive single point of failure. For some people, this is not a good thing. It's a bit like using GMail vs an actual mail client.


Yes, people were looking to use git notes for this, but it currently looks hacky. Maybe a .issues folder and some CLI tooling could help here, but I guess getting consecutive IDs could be a pain/cause conflicts. Sub-issues might also be difficult.


Do we really need consecutive numbers? Any short letter-number combo would work fine, which could be done with simple hashes.

I reckon git has already all the low-level features it needs to make this work, it just needs a bit of porcelain on top.
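If consecutive IDs are the sticking point, a content-derived short ID sidesteps merge conflicts entirely, since two branches creating different issues get different hashes. A minimal sketch in Python, assuming issues live as files under a hypothetical `.issues/` folder:

```python
import hashlib
import time

def issue_id(title, created=None, length=6):
    """Derive a short, stable identifier for an issue from its title and
    creation time, instead of a consecutive counter that conflicts on merge."""
    created = created if created is not None else time.time()
    digest = hashlib.sha1(f"{created}:{title}".encode()).hexdigest()
    return digest[:length]

# e.g. store the issue as .issues/<id>.md and reference it as PROJ-<id>
print(issue_id("Fix login timeout", created=1669600000))
```

Six hex characters gives about 16.7 million possible IDs, which is plenty for a side project's issue tracker; bump `length` if you're worried about collisions.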


I think project managers might freak out at `PROJ-3eff1a`, but maybe you work with more technical managers than I do. How do we link to these issues? I suppose you could build a read-only UI for them, or maybe each issue is a folder with a README.md in it. It's a lot of work to make something useful at work.


Surely for a manager in tech, it wouldn't be too difficult to use "3eff1a" where they used "#2345".

> How do we link to these issues

The same way we link to commits today? As in, we don't - not in the tool itself. The critical part is that such managers will want a nice web interface, not a command line tool. That's the porcelain I mentioned.


Very inspirational.

“Massively increase your productivity on personal projects with comprehensive documentation and automated tests”.

Is the possibly more enticing subtitle, for people who don't automatically click when they see the simonwillison.net domain, but frankly it goes beyond that to something more like "Getting Things Done for open source side projects".


I'm trying to manage my projects mostly like the steps the OP takes. The only thing that I would add, which has helped me, is having build automation in place as well (CI/CD or just scripts like a Makefile).

This could fall under tests, but I've found that having concrete steps written down on how to build the thing helps, as there's often something easy to forget that has to happen for things to build correctly if I've stepped away from the project for a while.


I'm now religious about adding Makefiles to projects with standard commands to get them up and running. It's built-in, works across languages, overall a win when you switch often between things!
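A minimal sketch of such a Makefile (the target names and commands are illustrative, assuming a Python project; adapt them to your stack). The point is that every project answers to the same verbs:

```make
.PHONY: dev test run clean

dev:    ## install dependencies into a local virtualenv
	python -m venv .venv && .venv/bin/pip install -r requirements.txt

test:   ## run the test suite
	.venv/bin/python -m pytest

run:    ## start the application
	.venv/bin/python -m myapp

clean:
	rm -rf .venv __pycache__
```

After months away from a project, `make dev && make test` is all you need to remember.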


Considered using Bazel? Perfect for what you described.


Writing tests? For personal projects? More like "Coping strategies for the project hoarding masochist" amirite.

I suppose it depends on what you're trying to do, but the thing about personal projects is that you don't have to maintain them :P


If your personal project is small and self-contained enough that you write it once and then you're done, then sure. My projects tend to be much larger, and be works-in-progress. As annoying as it is, taking the time to plod along and do proper testing has saved me time and time again. If you only have a few hours a week to work on a project, the last thing you want to do is spend 8 hours (AKA 3 weeks) figuring out why something isn't working.


Really depends on the size yeah. If you're working on some 20k+ code megaproject then it makes perfect sense, but for the average small side project I feel like the code and approaches change so quickly that you'd just spend half your time rewriting and fixing broken tests for no real gain.


Hmm, well my "small recent" project, that I got to my own personal MVP standards, is already at 2.6k lines of code; and my long-term project is at 9.4k lines of code -- not counting the little side libraries where I put more generic code I thought might be useful elsewhere. So, I guess you're not too far off.


I got the impression that by "some 20k+ code megaproject" the GP meant 20k+ files, not LOC.

But I could of course be wrong.


That's why I gave this talk: intuition says that writing tests for personal projects would slow you down and make you less productive.

But I've found the opposite to be true! My capacity for personal projects dramatically increased after I started habitually writing tests for them.

It turns out tests aren't just useful for large projects with multiple contributors: they enable horizontal scale for small projects with a single contributor too.


As someone who doesn't do this as much as I should, I agree.

On projects that are well covered with tests, I find I'm less afraid to pick it back up again and start working on it, because there is less chance that some context I had in my head months ago, and have since forgotten, will cause my change to break the app.

Most of my small projects have automated deploys using ansible, not because I couldn't do the deploy manually (in most cases it's ssh to the server, git pull, run migrations, collect static, restart the server), but because with ansible I don't need to remember the manual steps.
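Those manual steps map almost one-to-one onto an Ansible playbook. An illustrative sketch, where the module names are real Ansible builtins but the repo URL, paths, and service name are made up:

```yaml
- hosts: appserver
  tasks:
    - name: Pull latest code
      ansible.builtin.git:
        repo: git@example.com:me/myapp.git
        dest: /srv/myapp
        version: main

    - name: Run migrations
      ansible.builtin.command: .venv/bin/python manage.py migrate
      args:
        chdir: /srv/myapp

    - name: Collect static files
      ansible.builtin.command: .venv/bin/python manage.py collectstatic --noinput
      args:
        chdir: /srv/myapp

    - name: Restart the app
      ansible.builtin.systemd:
        name: myapp
        state: restarted
      become: true
```

The playbook doubles as documentation of the deploy procedure, which is exactly the "I don't need to remember the manual steps" benefit.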

I am sorely lacking in the documentation area tho :-) Your post prompted me to try to improve there, as I can definitely see the benefit.


Definitely helps with having the confidence to pick something back up and start working on it again without worrying about breaking stuff.


[flagged]


The author isn't trying to "add pointless best practices and tools into your workflow". He's describing his personal strategy that allows him to handle some 100+ projects in parallel. It's just a bunch of opinions that you're free to adopt, ignore, or use as inspiration. Personally, I think his description of a "perfect commit" is spot on. I've arrived at roughly the same principle (minus the issue link) myself, and I can speak to its benefits.


Although sometimes you don't need to read the article in order to comment on it, in this case it's necessary to at least have a brief look, otherwise your comment makes no sense in the current context.


Heh, that's an elegant way to avoid certain dang-ers...


Idk. I’ve worked with tools like he mentioned.

Recently I set up a media server and didn’t want to mess with Ansible to provision the VMs and containers or Kubernetes to manage the configs. I figured I’d be wasting time getting stuck on little devops details when I can just hand-roll everything once and make a backup. Well, it’s been a couple of weeks and I’m still making changes here and there - and there’s no infrastructure in place to rebuild it from a clean slate. That’s ok, I guess for home use, but it doesn’t feel ideal.

I had another project which was a server+iOS app which got canceled, and I wanted to rebuild it recently. There wasn't any of this automatic infrastructure in place anymore; well, it wasn't open source so I'm not sure if I could have even done it like that on GitHub. But yeah, when I pulled it down it was nearly impossible to run without changes and adaptations to the code in order to get 2018 code (and dependencies) running on 2020+ systems and language versions, and building in Xcode. Actually the easiest way to build the iOS app seemed to be to use a machine with Catalina still installed and an older version of Xcode. I'm sure you've all been there before.

Anyways, yeah, it feels like a headache to pile on a bunch of best practices and “infrastructure” anytime I want to do something with code, whether that’s write a small web service or a script that runs on laptops. But, if you make it a part of your daily process, and you have a repeatable way to init new projects quickly with identical tooling, maybe the tools get out of your way and you can ultimately be more productive and ship more reliable code? I think that’s what the author is getting at.


I guess don't click on it then. It is clear from the title what exactly this post is.


how is this comment relevant to the submission?


Why do you feel documentation is pointless?



