- committing a commit hash from a foreign repository in a text file (.gitmodules), along with
- some convenience tooling to check out that repository's tree at that commit in a subdirectory.
It is useful exactly when you want both of those things at once. Wanting the first is common enough; it's wanting the second that's rarer.
For example, you wouldn't submodule a Rust dependency, because you get what you want by committing the hash to a text file (Cargo.toml), and tooling to check it out in a subdirectory of your project gets you nothing.
I think the error the Git project made with submodules is trying to make them transparent, i.e., allowing you to use Git inside the submodule checkout. This is basically never a good idea; it is understandable that people get confused trying it (the thing they are trying to do is inherently confusing; if you have already been traumatized by submodules, imagine doing it with Cargo.toml or similar - it would be a mess!).
You probably don't want submodules. But they're useful, and not at all broken or poorly specified.
I find this to be one of the most useful features of git submodules. I can easily just cd into the module and do whatever related work I need to do there and then commit to both projects when I'm ready.
Same here. Like I said above [below?], I generally despise submodules, but that is one thing that makes them easier to deal with than packages.
For example, in the project that I'm working on now, I have refined the app's "business logic" to an SPM module that I integrate, through GitHub.
The idea is that I don't change the logic, and I enforce that, by using it as an SPM module.
However, every now and then, I encounter a bug in this module, or I find the need to add something to its API. When that happens, I need to exclude the module from the dependency list in the project, include it directly, as a local link, and work on it that way. Committing is a separate task from the main app (Xcode allows aggregate commits, but Xcode's Git integration is so poor, that I don't use it).
It actually works well. A submodule would also work well, with more Git integration, but the drawbacks far outweigh the advantages.
I've had it work fine on large teams. The rule for me is to use a submodule if contributors to the repo are as likely to change the submodule as the repo itself to resolve any given issue. More succinctly, submodules are for internal dependencies we might change. The reason is that with some package managers and languages, making small incremental changes to a dependency as you try to resolve an issue is a lot of extra typing compared with it just being in a submodule.
What confuses me is that people complain so much about git submodules, but I can't recall ever hearing complaints about svn externals. They're pretty much the same thing.
Co-workers who have moved from svn to git have had no trouble understanding and working with git submodules.
I usually do this when I'm developing one project that has a third-party dependency which either doesn't have a release, or I need to make modifications. That way I can work on the dependency and easily keep track of what version is used in the parent project at any point in history.
I don't think I would agree with that statement. I would say that both are actually what most people want when using submodules, otherwise you would just vendor your dependency.
I would also agree with OP that submodules are horrible. Source: I have to deal with them every day in my company's repo and they cause issues pretty much every time I pull, switch branches, merge/rebase, etc.
IMO, one of the MAIN issues with submodules is that there isn't an actual FILE, checked in the repository, and understood by Git, that lists the hash of all the submodules used. So diffs and PRs on GitHub, git, etc. don't show the actual changes in a way that we're already used to with files.
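For what it's worth, the pinned hash is recorded in the parent repo's tree as a "gitlink" entry (mode 160000) rather than in a regular file, and `git ls-tree` prints it. Here is a minimal Python sketch that picks those entries out of `git ls-tree -r HEAD` style output. The function name, sample text, paths, and hashes are made up for illustration.

```python
def parse_gitlinks(ls_tree_output):
    """Return {path: commit} for every gitlink (submodule) entry.

    Each `git ls-tree` line has the shape "<mode> <type> <object><TAB><path>".
    Submodules show up with mode 160000 and type "commit".
    """
    pins = {}
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        meta, _, path = line.partition("\t")
        mode, obj_type, sha = meta.split()
        if mode == "160000" and obj_type == "commit":
            pins[path] = sha
    return pins


# Made-up sample output; hashes and paths are illustrative only.
SAMPLE = (
    "100644 blob 1111111111111111111111111111111111111111\tREADME.md\n"
    "160000 commit 2222222222222222222222222222222222222222\tvendor/libfoo\n"
    "100755 blob 3333333333333333333333333333333333333333\tbuild.sh\n"
)

if __name__ == "__main__":
    for path, sha in parse_gitlinks(SAMPLE).items():
        print(path, sha)
```

Feeding it the real output of `git ls-tree -r HEAD` from two commits and comparing the dictionaries would give a file-like view of which pins changed.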
I have a convenience project that sets up several others repositories for ease of development and the things you call rare/a bad idea are perfect for this use case. It's working very well so far.
For that use case, how often do you want the other repositories to be fixed to some particular hash, however? At least every time I've wanted such multi-clone behaviour I've wanted to have all the repositories on HEAD of some branch, and very rarely some fixed commit.
I do generally want a fixed commit. I often want to improve the code in the submodule, possibly (voluntarily or accidentally) introducing breaking changes, without worrying about breaking repos that contain it as a submodule.
I’ve used submodules (a lot), in the past, but I hate them.
I feel as if they are a “half-baked” solution. In order to be useful, they’d need to be much more tightly integrated into Git. Instead, they are sort of “duct taped” to the outside.
But they are an ironclad way to ensure that you have an exact version of a foreign repo. It’s actually a pain to stay at the head.
I like using package managers, but they can be quite dangerous, so I tend to write my own packages.
What was really cool, and possibly the only good thing that VSS ever did, was the ability to create “aggregate symbolic repos.” I think I remember how it worked (been a long time):
You could define a repo that was composed of “symlinks” to files from various other repos. Submitting a change into the repo would submit that change into the portion of the other repo.
Perforce had the concept of workspaces. I don’t think you could integrate other repos into them, but you could define a workspace to be a “mask” over your repo, so it would only integrate the files you want. That could be useful to me, as my testing code usually dwarfs my implementation code. Many of my packages[0] consist of a single Swift file, but the test harness might be an entire app.
I'm curious what you mean about obfuscating provenance. Most package managers download the package from a specific source and will allow you to store the hash of that package in your repository.
The whole deal with packages, is to make it really, really easy to discover and integrate them, without having to worry about where they came from, or who has had their fingers in the pie. It's [theoretically] possible to find out, but I have never met anyone that admits to vetting their dependencies in anything near complete fashion. Most look at "buzz" around the package, and at how many stars it has.
That's wonderful. So is going out clubbing, and "getting to know" a whole bunch of different folks that you meet randomly.
Both can have unfortunate side effects.
It's entirely possible to do so safely, but Christian coffeehouses might not be your idea of a good time on Saturday night.
But nothing is perfect, and a dedicated blackhat can leverage just about anything.
It is a great way to tell if someone knows what they're doing with git. Even joking about deleting the repository and recloning is enough to let me know. Unfortunately this includes 99% of people I've ever worked with.
I had awful problems with git submodules in the Ansible repo. When I wanted to change branch, I couldn’t use `git checkout` as usual: I had to blow away the submodules, switch branch, then reinitialize them again. Appalling failure to leave submodules so unfinished that branching doesn’t work properly any more.
I haven't had a problem in general with switching branches and submodules. You just change branches and then `git submodule update`. Or `git checkout --recurse-submodules`. You can also set this as the default behavior so that checkout automatically updates submodules.
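The two ways of switching branches mentioned there can be written down as a tiny helper. This is only a sketch: `switch_branch_commands` and `ENABLE_RECURSE_BY_DEFAULT` are hypothetical names, and the helper just builds the argument lists rather than running git, so you can inspect them or hand them to `subprocess.run` yourself.

```python
def switch_branch_commands(branch, recurse=False):
    """Build the git invocations for switching branches in a repo with submodules.

    recurse=False: plain checkout followed by `git submodule update`.
    recurse=True:  a single checkout that also updates the submodules.
    """
    if recurse:
        return [["git", "checkout", "--recurse-submodules", branch]]
    return [
        ["git", "checkout", branch],
        ["git", "submodule", "update"],
    ]


# Opt-in (per repository) so that checkout recurses into submodules by default.
ENABLE_RECURSE_BY_DEFAULT = ["git", "config", "submodule.recurse", "true"]

if __name__ == "__main__":
    # "feature-x" is a placeholder branch name.
    for argv in switch_branch_commands("feature-x"):
        print(" ".join(argv))
```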
I think the worst problems happened when switching between branches where the same directory changed between being a submodule or being part of the parent repo.
Having to set a configuration option to make submodules work is another example of the feature being unfinished.
I had that command set previously, and OMG it made every checkout slow as hell! There's like a dozen submodules in the repo I work in, it's a nightmare.
By read-only dependency do you mean that you’re not a developer for those repositories? What if you do develop a library and then want to use it in an application?
What is meant (I presume) is that even if you are a developer of those repos, do not edit them within the host repo. Work on them separately, as if you were an independent developer, and then bump their revision as a submodule, the same way you would bump a dependency version in your Makefile/package.json etc.
I assume read-only from the perspective of the dependent. Any fixes belong in the module's upstream repository, and are then pulled in to the dependent once pushed.
Everyone has to learn what they can do and what to avoid.
We use them to build our whole system out of one commit, although we have several repos. We made the artificial rule that a commit that updates a submodule must not contain any other changes. It has reduced the number of problems, especially those related to rebasing.
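A rule like that can be checked mechanically, for example from a pre-commit hook. The sketch below is an assumption about how one might do it, not what that team actually runs: it reads text in the format printed by `git diff --cached --raw` and reports whether submodule (mode 160000) entries are mixed with ordinary file changes. The function name and the sample diffs are made up.

```python
def mixes_submodule_bump(raw_diff):
    """Return True if the staged changes touch both submodules and regular files.

    Each `git diff --raw` line has the shape
    ":<old mode> <new mode> <old sha> <new sha> <status><TAB><path>".
    """
    submodule_paths, other_paths = [], []
    for line in raw_diff.splitlines():
        if not line.startswith(":"):
            continue
        meta, _, path = line.partition("\t")
        old_mode, new_mode = meta[1:].split()[:2]
        if "160000" in (old_mode, new_mode):
            submodule_paths.append(path)
        else:
            other_paths.append(path)
    return bool(submodule_paths) and bool(other_paths)


# Made-up samples; abbreviated hashes and paths are illustrative only.
BUMP_ONLY = ":160000 160000 aaaaaaa bbbbbbb M\tvendor/libfoo\n"
MIXED = BUMP_ONLY + ":100644 100644 ccccccc ddddddd M\tsrc/main.c\n"

if __name__ == "__main__":
    print(mixes_submodule_bump(BUMP_ONLY), mixes_submodule_bump(MIXED))
```

Note that this simple version would also flag a commit that adds a new submodule, since that stages a `.gitmodules` change alongside the gitlink; whether that counts as "other changes" is a policy choice.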
I don't trust GP, I deal with them every day and they are a nuisance every day. Perhaps they are not working in these submodules and just set them to a hash once and never had to make changes there. Then sure, but as soon as you're working actively in these submodules it's mayhem.
> Yes. And buy-in doesn't necessarily mean that people will love the chosen tool. Just that they understand and accept the compromises made.
I think we are saying the same thing in different words.
I don't hate most things.
I don't have an opinion on most things.
I am pretty indifferent about those things.
I don't hate git submodules.
However, I think we should at least hear out the concerns of people who hate git submodules.
If you still want to use git submodules, then at least you've made an educated decision.
How would you quantify impact of something like that? Using a tool like Jellyfish, linearB, Adadot etc or just hope people would see enough difference to justify investment?
I found Git submodules excellent for certain use cases.
Here are some real world examples:
- I have some software I’ve written that runs on a single board computer. Inside of this repository I have a sub module pointing at the commit from which the Linux image is built, that my software runs on. This is extremely useful because if I ever have to rebuild the Linux image there is never any doubt about which version of it that I have been running my software on. So I can rebuild the Linux image from the same commit as before, and then add my software to it. In the future I might also automate more steps and again then I will get even more benefits from this. But already today it is hugely useful for me to have a sub module like this.
- I generated some Rust code for the types of a third-party API that we use. I generated this code from their OpenAPI spec which they have in a git repo. I added their git repo as a sub module in our repo and committed the generated code alongside the sub module pointing at the commit from which the code was generated. Perfect!
Git sub modules are super useful for several things. They can be a bit confusing and frustrating at times. But there are ways like what I mention above where they are perfect to use.
I've found that vcstool[0] is the better solution for your first example and have been using it extensively both in personal and company projects without any issues, except on occasion forgetting to commit some subpackage changes as git sometimes doesn't indicate untracked changes correctly.
The principle is similar but with explicit repos and branches defined in a config file that can then be pulled or cloned as one.
In practice it is POSIX-only, which is not viable in an organization where Windows needs to be a first class development and deployment platform. Running under cygwin or WSL is a non-starter for a number of reasons (I've done that myself for tooling in the past, after which I would never impose it at an organizational level).
git subrepo is what I chose to include a "core" library of functionality in a number of repositories.
Works fairly well, the main thing is that git treats all the files in the subrepo the same as any other files, so no surprises.
> Provide an ad-hoc in-tree script to download the dependency
> Yes, really, git submodule is worse than ad-hoc Makefile runes
Please don't. Submodules are much better.
Projects using submodules and standard build tools are easy to build. They're easier to customize. They can be packaged downstream with little or no patching. On the contrary, projects using ad-hoc scripts are often hard to build, have no convenient methods of customization, and resist downstream packaging.
That's because with the former approach, build descriptions are standardized and (often) declarative. Standardized build descriptions are easier to build and customize because they're predictable and provide uniform methods to configure builds. Declarative build descriptions are easier to customize because they only specify the end state. It doesn't matter what changes you make to the build process as long as it doesn't create conflicts with the described end state.
When building a project, submodules can be viewed as a declarative way of specifying dependencies because they only specify a Git URL, a commit hash, and a target subdirectory. It doesn't really matter how you fetch them. This is convenient for things like package managers because it allows them to fetch dependencies beforehand. With ad-hoc scripts, dependencies are specified imperatively. Or as the article puts it, the script is "in precise control of when/whether the download occurs." If package managers want to prefetch dependencies, they would have to patch that part of the script out.
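To make the "declarative" point concrete, here is a rough Python sketch of what a packaging tool could do: read the URL and path from `.gitmodules` text and combine them with the pinned commits to get a fetch plan, without running any project script. The pinned commit lives in the parent repo's tree rather than in `.gitmodules`, so it is passed in separately as `pins`. All names, the URL, and the hash are hypothetical.

```python
import re

SECTION = re.compile(r'^\[submodule "(?P<name>.+)"\]$')


def parse_gitmodules(text):
    """Return {name: {"path": ..., "url": ...}} from .gitmodules text."""
    modules, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        header = SECTION.match(line)
        if header:
            current = modules.setdefault(header.group("name"), {})
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return modules


def fetch_plan(gitmodules_text, pins):
    """Return a list of (url, commit, path) triples, one per submodule.

    `pins` maps each submodule path to the commit recorded in the parent tree.
    """
    plan = []
    for settings in parse_gitmodules(gitmodules_text).values():
        path = settings["path"]
        plan.append((settings["url"], pins[path], path))
    return plan


# Made-up example data.
SAMPLE_GITMODULES = (
    '[submodule "libfoo"]\n'
    "\tpath = vendor/libfoo\n"
    "\turl = https://example.com/libfoo.git\n"
)
SAMPLE_PINS = {"vendor/libfoo": "2222222222222222222222222222222222222222"}

if __name__ == "__main__":
    for url, commit, path in fetch_plan(SAMPLE_GITMODULES, SAMPLE_PINS):
        print(url, commit, path)
```

How the triples are then fetched (git, a tarball mirror, a local cache) is up to the tool, which is the whole point of the declarative description.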
The concept is not the problem. The problem with submodules is that they have the worst UI ever. A default 'git clone' never sets up the submodules. Checkout, rebase, etc. work half of the time, and some commands leave the submodule untouched unless you add some weird parameters. It's a pain in the ass even if you are not the one maintaining the submodule but just the one using it.
In practice, a lot of repos add shell scripts just to… set up and update the submodule. This should be git's own task to do. But it really isn't done well.
It would be much better if git just treated a submodule as a read-only subdirectory and force-synced everything unless I tell it I want to edit it.
Package managers, on the other hand, don't give you such pain. An 'npm install' will just bring every dependency to the proper state without bothering you about anything.
Looks like a rant from someone who never bothered to add submodule.recurse=true to their git config.
Git requires knowledge and manual configuration. It is a low-level tool that is not user-friendly. Nor is it expected to be. See, early on, git was a toolkit for building VCSs, so it historically contains many plumbing utils not designed to be used as-is.
No, god, no! I see where the author is coming from, and I'll give him that submodules aren't a well-designed or well-implemented feature, but dear lord, subtrees are so much worse...
I work on a project that uses them. In my particular case, the "genius" who set it up decided to create this kind of setup: a "framework" part, and a bunch of subtrees which are taken from a separate repository having several branches for specific versions of the program, and each version is made into a separate subtree in the framework repo. This leads to immense duplication of commits, totally worthless history, no ability to go back without a humongous effort... It's the worst Git repo I've seen in my life. And it was created by someone with a decent knowledge of Git, which, unfortunately, didn't translate into making useful things...
One of the most pernicious problems with submodules is the way that anyone who finds them problematic thinks it's their own fault, and people who have finally figured out how they work (sort of) are so dang pleased with themselves they are now fully indoctrinated into submodule cognitive dissonance. Meanwhile git submodules themselves are leaving massive footguns all over the place, are an absolute nightmare during complex merges and are absolutely terrible for code transparency.
What the "security" team doesn't realize is by blocking actions in this way they've opened up a wider hole.
If a third-party action had a problem, it could be blocked centrally; instead, these modules now live scattered throughout the code base, relying on individual development teams to keep them updated.
Those teams will not always be fully funded to keep an eye on every vulnerability in every module and keep everything up to date the same way a centrally managed dependency system can.
The only thing I dislike about submodules is that older repos have remotes that no longer exist. So I'm left with an incomplete clone because I can't find the submodules anywhere. Otherwise I don't mind them and will continue to use them.
In any case, this is no longer an issue with GitHub monopoly.
Submodules are amazing until you start using them.
The whole idea on paper looks brilliant, but the pitfalls are very painful.
Perhaps if you're working solo on a project it's OK. Part of the problem is that developers aren't used to submodules and it behaves in a way you don't envisage.
> Part of the problem is that developers aren't used to submodules and it behaves in a way you don't envisage.
I would say you correctly identify the problem, but draw the wrong conclusion.
Developers should be able to work with the right tools for the right problems. I've used git submodules in teams, professionally, for more than a decade. Neither I nor my colleagues have ever had a problem with them except when used without the proper understanding.
The solution isn't to replace the tool, but teach developers how to use it.
> The solution isn't to replace the tool, but teach developers how to use it.
Or to improve the tool to have it teach developers and be more helpful in case of common errors. I think in many non-submodule parts git has improved on that front, though that has never been its strength.
Yeah, it works. You just have to remember, if it doesn't, in doubt use "git submodule update", or if this doesn't cut it use "git submodule update --init --recursive". Or maybe you should have run "git submodule sync" first? :)
However, I agree with others in this thread that it's easier to develop a bunch of modules that are used in a top-level project. Just change to the subdirectories, change the files, then when you are done commit the subprojects, then commit the submodules' commit ids to the parent project.
I occasionally (less than once a month) work on the Rust project. They probably rightfully use submodules to pull in llvm and cargo and other dependencies. However, every time I change branch or rebase, there's some kind of conflict or breakage caused by the submodules.
Just last week I rebased and the cargo submodule was changed to dirty? I didn't touch the folder. I deleted the folder and reran my submodule init/update dance to no avail.
I had to fix it by going into that folder and running `git restore .` because somehow everything got removed?
So yeah. I don't like submodules. They probably make sense but they are not at all intuitive if they are so fragile like that
I too loathe git submodules (which I have to use daily).
But there is one case where they're not too terrible: It's where you have some kind of "meta project" that needs to coordinate other large projects together. An example is where you need to combine specific versions of the Linux kernel, glibc and gcc (e.g., ones that you have tested and know work together). A git project with one submodule pinned to the tested commit of each of Linux/glibc/gcc seems to work well. You can, for instance, test a new combination of submodules together and, if they work, push an atomic commit to update them all together.
I don't think any of these arguments are good enough to not simply put git submodules as yet another option to consider. Probably not your first choice but it could be a perfectly acceptable one (think components).
Git subtrees are definitely not a replacement for what submodules can do. It's actually closer to monorepo.
Package systems are absolutely more robust but could be more painful depending on context.
Git submodules, if correctly understood (which is not trivial, even in git terms) can be just the right tool for a certain scenario.
Qt has used them quite well but it's definitely not as friendly to newcomers as it is to "core Devs"
They definitely have flaws, and as usual Git goes out of its way to make the UX extra awful (why isn't --recursive the default??!). But they are pretty great for third party dependencies that you might need to patch a bit, especially if you can't use something like Cargo (or don't want to set up a private registry etc.)
But git subtree has massive flaws too. You squash by default so you lose all git blame support. It's way more janky to pull from remotes and submit patches etc. You have to manually ensure you separate changes to the subtree out into different commits.
Honestly I think no existing solution is very good. A monorepo plus submodules for external dependencies is probably the best option at the moment.
Really Git could do a lot to make submodules better, but I guess - like LFS - if the kernel developers don't need it then screw you we're not putting any effort into it.
1) Define somewhere in the repo that it's a "compound repo", a "workspace", etc.; the name is arbitrary. This is our repo "R".
2) For certain paths in the repo, mark those paths as aliases to other repos identified by a repository URL/path. These are our "r" repos.
3) For every git command executed inside repo "R", run the appropriate commands in the background for each "r" repo, but only if repo "r" was affected by changes initiated in repo "R".
4) If you made changes to repo "r" directly and then returned to "R", after "git pull" you should see nothing other than standard git diffs, conflicts, etc. You should not have to run anything like "sync"/"refresh". Only git pull/rebase/merge etc.
5) A commit in repo "R" which is only responsible for bumping "r" repos should be handled by the git submodule system transparently for the user of "R". I'm not enough of a git expert to tell what kind of commit should be used here. Any ideas? You should commit seeing diffs, of course, not just commit hashes.
6) THAT'S IT.
Everything should work recursively, i.e., you should be able to do 10 layers of "r" repos. Each n-th "r" repo acts as the "R" repo for the (n+1)-th level repo. A ten-fold commit should work like a transaction, e.g., if any of the layers between 1 and 10 has a failing pre-commit hook, the whole operation should fail.
This is timely, as I used submodules years ago and got very confused with them.
But now I have an Ask HN:
I'm building a ruby gem (package) that will be shared across a few of my applications. The apps are separate and proprietary, but I want to open source the gem, so a monorepo doesn't fit.
When the gem is stable, I can just publish it and then reference it in my Gemfile like any other gem - rebundling every time there's a new release.
But at the moment, I'm actively developing the gem and making frequent changes to the API - so I want to reference it directly from the container app and edit both together as different edge cases arise.
The standard Ruby on Rails way to do this is to stick it in /lib or /vendor - then work on it and once done, extract it and publish the gem. But I'm working on several apps that share this gem and will all influence how it works. So I don't want multiple vendored copies scattered across different source trees. And I don't want to be rebundling the app every time I make a minor change to the gem - so referencing the git repo is out.
On paper a submodule sounds like the perfect solution - main app is a repo, with a submodule repo in vendor. I can make changes to both, then push them separately to their own repos. If I start working on app 2, I just fetch the submodule to get the latest version of the gem without having to rebundle everything.
As it's just me working on this set of code at the moment, are submodules a good fit?
Since it is mainly for ease of development for your proprietary apps, you could immediately create/develop the gem in a separate repo and create a symbolic link to it in your apps' vendor folders.
I just clone the "submodule" inside the main repo, then use the aliases above to rename the .git folder to .gitbox and save the current remote, branch, and HEAD commit to a file. The .gitbox folder is ignored.
If I want to update the submodule or commit changes on it upstream, I just use the unbox alias to get the .git folder back, cd into it and pull/commit/push/whatever as normal, then use the box alias again.
Since the folder is no longer named .git it doesn't interfere with the main repo and you can use the main repo as normal: your team doesn't have to know about weird submodule commands or fight the damn thing to do normal stuff which is important if it was hard to get them into git in the first place.
Plus you can make changes to the submodules in your repo without committing them upstream (or having to create another repo to commit them to) since the submodule contents themselves are committed to your main repo. You also won't lose the files if the submodule remote dies.
I don't have many of them so I don't have any automation to recreate the .gitbox folders from the .gitboxinfo files if necessary but it wouldn't be hard to make.
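The aliases themselves aren't shown here, so the following is only a guess at the mechanics, written as a small Python sketch rather than git aliases. `box` and `unbox` mirror the names used above; the JSON layout of `.gitboxinfo` is purely an assumption, and the remote, branch, and HEAD values are passed in by the caller instead of being read from git.

```python
import json
import tempfile
from pathlib import Path


def box(checkout, remote, branch, head):
    """Hide the nested repo: rename .git to .gitbox and record where it came from."""
    checkout = Path(checkout)
    (checkout / ".git").rename(checkout / ".gitbox")
    info = {"remote": remote, "branch": branch, "head": head}
    # Assumed format: the original .gitboxinfo layout is not described.
    (checkout / ".gitboxinfo").write_text(json.dumps(info, indent=2) + "\n")


def unbox(checkout):
    """Bring the nested repo back so normal git commands work inside it."""
    checkout = Path(checkout)
    (checkout / ".gitbox").rename(checkout / ".git")


if __name__ == "__main__":
    # Demonstrate on a throwaway directory with a fake .git folder.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / ".git").mkdir()
        box(tmp, "https://example.com/libfoo.git", "main", "2" * 40)
        print(sorted(p.name for p in Path(tmp).iterdir()))
        unbox(tmp)
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

With `.gitbox` listed in the main repo's ignore file, the boxed directory's contents are committed to the main repo like any other files.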
I've had too many issues trying to (recursively) reset all submodules after switching branches and cloning. Once submodules get in a limbo state it's a massive pain to get back to normal.
Android has a similar tool called 'repo': https://gerrit.googlesource.com/git-repo . I have seen it used outside Android and I think it's mostly ok (definitely better than git submodules). My main beef with it is that it's impossible to google anything related to the tool. Some people do seem to hate it though.
I wonder whether anybody has used this 'west' outside of Zephyr?
We're using submodules to share internal OpenAPI specification files between repositories. These are used for code generation and automated testing of APIs.
It requires some manual checking that branches and commits are up-to-date.
Does anyone have any recommendations on how to do this with a multi-repo setup?
It sounds like the author of this article is trying to use git submodules to replace a package manager. Of course you'll get annoyed with git submodules if you have something like cargo available. Unfortunately, there are languages (C and C++) where package management sucks and git submodules are a godsend.
> Provide an ad-hoc in-tree script to download the dependency
Add this to the list of things I hate. The last thing I want to do when adding a library as a dependency is figure out that it requires Python to download some source, then uses a makefile to build everything, and has a "nice" bash script to tie everything together. If I see that crap, I'm not using your library. This (contrary to what the developer may be thinking) does not make it easier to integrate the library into my project. It makes it so so much more difficult.
Please. Don't.
> Yes, really, git submodule is worse than ad-hoc Makefile runes
Please don't make monstrous makefiles that do a million and 1 things and do them all terribly and don't have any modicum of support for cross platform functionalities. This is terrible advice.
If your language has a package manager, use it. But in my experience, with a language like C or C++, git submodules are the perfect tool for the job and definitely preferable to hacked-together makefiles that do everything except build the source.
TL;DR: this sounds like a blog post from somebody that tried to use submodules to replace a package manager. If you have a package manager, use it. Otherwise, thank God for submodules.
In general you can't just use a package manager in many cases. Or you can, but it'll be even more painful.
Imagine that you're working in some sort of monorepo, except that some internal libraries are submodules. Then you'll often find yourself working outside and inside submodules to test changes. You can't do that easily with a package manager.
That being said, I believe it should be possible to augment (some) package managers to handle local path only when in dev mode.