How we manage 200 open-source repos (turbot.com)
169 points by nathanwallace on Oct 6, 2023 | 32 comments



This was actually a worthwhile read; sharing the section titles in case anyone is wondering what they're getting into:

  Lesson 1: Respond instantly
  Lesson 2: Early communication is essential
  Lesson 3: Treat contributors like team members
  Lesson 4: Age reports keep you honest
  Lesson 5: Burndown is a must
  Lesson 6: Consistency is the name of the game
  Lesson 7: Documentation is king


> Lesson 4: Age reports keep you honest

> Long-forgotten issues and PRs sap enthusiasm and hinder progress.

There are a truly massive number of open source projects that would benefit from this.

Mozilla is pretty much the poster child for "major issue that annoys the hell out of users who come across it, for a decade or more." Firefox and Thunderbird are littered with bugs that are anywhere from half a decade to two decades old. There are something like 40,000+ verified bugs in Firefox core.

Instead of those bugs getting fixed, we get shit nobody asked for like massive UI overhauls that everyone hates, integration of SaaS shit like Pocket, secretive data-collecting force-installed plugins for a media conglomerate's TV show (!), and so on.

Maybe they could free up some resources by firing some of the numerous product and project managers, moving their offices out of the most expensive zip codes in the most expensive real estate markets in the world, and trimming the CEO's pay, which has gone up even as market share has plunged; she's failing upwards.


It could be worse. They could be automatically closing all issues that they haven't solved after 6 months, like many projects on GitHub do (VS Code is particularly bad for this).


This is mostly why I have stopped reporting bugs. A good bug report takes a lot of effort to write. Seeing it automatically closed after a few months due to lack of interest is a waste of my precious time.


On what basis are you expecting free bug fixes from Mozilla? None of us is paying Mozilla to develop in any particular direction.

To see the bugs that bother you get fixed, contribute to the open source project - with code, or by paying a developer.

See also http://antirez.com/news/129, the part after "Flood effect", for some discussion of why this doesn't scale for open source projects.


If they have enough resources to foist unwanted UI overhauls on users against their wishes, they have enough resources to fix their bugs. We could also start by firing the worse-than-useless, actively sabotaging CEO, whose first actions upon joining were raising her own pay by a few million and firing 250 employees, and whose pay keeps rising every day.


> Based on what are you expecting bug fixes for free from Mozilla?

Because they want people to use the browser? Seems pretty straightforward.

Anyway they're not a charity; stop treating them like one. Mozilla is a company with a product and should act accordingly.


What addons are you referring to?



I was skeptical when the author implied automation was the solution.

But actually, the given solutions are mostly communication-focused, and the automation is to aid in that.

Good read.


The examples they give on how this worked for them in practice are also great to follow.


100% - automation helps us stay on top of it. But human connection is the key! (In hindsight, our subhead text "automation with a human touch" got it the wrong way around.)


A question for people here saying "use a monorepo", coming from a different direction than the article. Say I want to use a monorepo for all the code I write for personal use and development, but I'll have a folder with dozens of projects cloned from GitHub, often with small tweaks and custom branches. Is the solution submodules? Saving patches or just code snippets in the monorepo and keeping the random misc repos isolated? Hardlinking specific files of interest?


I prefer storing the source code of third-party dependencies directly in the monorepo. Treat third-party code as your own code, since it is used in your project the same way. It may contain bugs and security issues, which are easier to spot and fix if the source code is in your repository. This also protects you from disaster when third-party code becomes unavailable for some reason (no internet connection, a temporary outage at the third-party code host, or permanent deletion or corruption of the third-party code).

Apply custom patches to third-party code in the same way you apply patches to your own code. Prefer submitting these patches to the upstream project you depend on - this improves the quality of the upstream project and reduces the amount of custom patches on your side.

The hardest part is updating heavily patched third-party code to new releases with significant changes. Fortunately, this happens rarely in practice. Bugfix and security releases for third-party dependencies are usually easy to apply, even to heavily patched code in your repository.

Real-life example: we at VictoriaMetrics store all the code for third-party deps inside the "vendor" directory. This is very easy because Go provides easy-to-use tooling for it - just run `go mod vendor` and that's it! It is also very easy to upgrade third-party deps by editing the "go.mod" file and running `go mod tidy && go mod vendor`. Before committing, the resulting changes to third-party code can be inspected for security issues and bugs with `git diff`.
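
A rough sketch of that upgrade flow (the module path here is just a placeholder):

  # bump a dependency to a new version (placeholder module path)
  go get example.com/somelib@v1.2.3

  # sync go.mod/go.sum and refresh the vendored copies
  go mod tidy && go mod vendor

  # review the resulting changes to third-party code before committing
  git diff vendor/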


If the projects were my own, I'd consider a monorepo. We use this approach for Steampipe samples - https://github.com/turbot/steampipe-samples

If it's a collection of changes, small improvements, etc to existing projects and repos then personally I'd go for separate forked repos. Then you can track your changes relative to the original project source code and (hopefully) contribute back PRs etc more easily.
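
For example, a typical setup to keep a fork in sync looks something like this (the upstream URL is a placeholder; the remote and branch names are just the usual conventions):

  # in your fork's clone, track the original project
  git remote add upstream https://github.com/original/project.git

  # pull in the latest upstream changes and replay your tweaks on top
  git fetch upstream
  git rebase upstream/main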

As always - there are pros & cons to both - just a matter of choosing the approach that feels best 51% of the time :-). Of course, it's minor in general compared to the value of just keeping on moving on your projects and work!


For a C++ or similar project, I would recommend fetching and building the depended-on repos using CMake's ExternalProject feature, applying patches that you keep in your main repo. I would never recommend submodules.


Steampipe is amazing. It is a light in the darkness for anyone taking on enterprise compliance reporting. The platform's lateral growth in plugins and features, while somehow making cross-org/multi-account use even easier for end users, all on top of building a cloud-hosted solution for teams, is a testament to everything in this article.


Glad you are enjoying Steampipe and thanks for the kind words :-). If you have any suggestions for where you'd like to see the platform go please let us know!


Another question worth asking is whether 200 repos are worth it. Wouldn't it be more efficient for everyone involved to merge them all into a single monorepo?

That said, the tips in that article are equally valid for single repositories.


Monorepos have advantages and we considered that approach - easier mgmt, linear change timeline, consolidation of GitHub stars, etc. But on balance we felt our plugin and mod model benefited more from the separation - specific audience, clearer release history, simpler clone & install (mods), and encouraging contribution of plugins via repo ownership. Cross-repo tooling helps to offset the negatives of managing many repos, and as you say, the lessons we shared also come from managing our larger repos (e.g. the Steampipe CLI).


Using 200 repos is just creating unnecessary problems. Use a monorepo.

That said, all the solutions are applicable irrespective of how many repos you have. First I'd heard of an "age report" too. I like that idea. It's too easy for bugs to be marked low priority and then never get fixed.


High priority tickets are naturally noisy and fixed quickly for impact reasons. Age reports create a good balance by gradually increasing the "noise" of all tickets regardless of priority.


The blog post contains very valuable advice!

I'd also recommend shrinking the number of repositories - this can significantly reduce the amount of effort needed to sync changes that span multiple repositories. Ideally there would be a monorepo, where any change can be a single atomic commit, even if it touches core code that is used in thousands of places across the rest of the codebase.

Another recommendation is to keep user docs in the same repository as the code - then docs can be updated synchronously with the code in a single commit.

We at VictoriaMetrics follow most of the recommendations from the blog post, and we also try to minimize the number of repositories to reduce the maintenance burden, as described above.


Keeping docs with the code is definitely helpful, we've found that works well for our plugins and mods. All pull requests are then reviewed for consistency and documentation at the same time [1] - e.g. does this new table have good column definitions and clear examples?

1 - https://steampipe.io/docs/develop/plugin-release-checklist


A little late to the reply here, but I'm a volunteer for openlibrary.org and I think they'd benefit from trying some of these things, so I might share this with them.

What tool are you using to make the age reports? How do you determine whether an item is waiting on the submitter or the reviewer? Is it based on the last comment, a review request, etc.?

There was once a similar tool that GitHub bought, and it was quite nice even for a small internal team, but I'd really like to know what works well at your scale.

Cheers!


We use our open source project Steampipe [1] to query data using the GitHub plugin [2] and mods [3]. We have also opened up the custom mod we built for the specific charts & reports [4].
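
As a minimal sketch of the kind of age query this enables (table and column names here are from the GitHub plugin's github_issue table - double-check them against the plugin docs):

  steampipe query "
    select
      repository_full_name,
      number,
      title,
      date_part('day', now() - created_at) as age_days
    from
      github_issue
    where
      repository_full_name = 'turbot/steampipe'
      and state = 'OPEN'
    order by
      age_days desc;
  "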

I hope they help - let us know if you give them a try!

1 - https://github.com/turbot/steampipe

2 - https://hub.steampipe.io/plugins/turbot/github

3 - https://hub.steampipe.io/mods?q=github

4 - https://github.com/turbot/steampipe-mod-community-tracker


So there is GitHub Scheduled Reminders: https://docs.github.com/en/organizations/managing-organizati... - which seems to have some serious limitations ("GitHub will only trigger reminders for up to five repositories per owner and 20 pull requests per repository. Reminders are not sent when changes are merged from upstream into a fork.").


As a maintainer, I feel the first tip is sometimes not the best approach. People tend to raise a lot of "I got a problem" issues (well, we all do sometimes) which they then solve themselves in short order (e.g. a misconfiguration). Responding instantly to those sounds inefficient.


Fair point! Rapid response is definitely valuable for thoughtful issues or PRs. For "questions" we try to just ask for clarifying details or give a brief pointer to the documentation. I find it can also be useful to deliberately slow down responses when getting caught in a low-value conversation - kinda like rate limiting undesirable traffic to your website <grin>.


Curious what tool(s) you use to format SQL?


Most of our team just use their text editor to do formatting via plugins. I also find this online SQL formatter useful - https://extendsclass.com/sql-formatter.html


In addition to this communication guidance, I prefer when issues are disabled entirely. I would like it to be the default.

I am also looking forward to seeing LLMs applied to this space.

Plenty of problems could be solved with some simple research and understanding of the code in a project. If that could act as a barrier between those with issues and those who contribute, I imagine it would help a lot.



