
Nice to see this posted here. I switched over to it about 2-3 weeks ago, and I haven't looked back. It took a lot of mental rewiring, but I really enjoy the workflow `jj` provides. There's no longer a separate moment where you decide what's being committed, because all file changes automatically amend the working-copy commit. Of course, sometimes you don't want this, so you have things like the ability to split a commit in two, move commits around, etc. But having _everything_ operate on commits is really nice! Other niceties that I like:

- `jj log` is awesome for getting an overview of all your branches. If, like me, you have a lot of work in progress at once, this provides a great map

- Conflict resolution is really cool, as you can partially resolve conflicts, and then switch branches. Conflicts are also tracked specially, but I haven't done too much with this yet.

- The abbreviated changeset ids are really handy. I often will just `jj log` (or in my case just `jj` as that's the default), notice a changeset I want to rebase, then run `jj rebase -s qr -d master`. `qr` here is an abbreviated changeset id for a branch/commit, and usually much quicker than typing the branch name out! This will probably change when clap gets updated to support dynamic tab-completion though.
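
To make the workflow concrete, here's a rough sketch of the kind of session I mean (these are real jj subcommands, but the changeset id and message are made up):

    jj log                          # map of all my in-progress work
    jj describe -m "wip: parser"    # give the working-copy commit a description
    jj split                        # carve the working-copy commit into two commits
    jj rebase -s qr -d master       # move the change with abbreviated id "qr" onto master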




What happens if you accidentally save a file with some sort of secret that gets sucked in?


Isn't the idea that you continue editing the working copy commit until you actually commit it?

Also from the documentation:

https://github.com/martinvonz/jj/blob/main/docs/git-comparis...

"As a Git power-user, you may think that you need the power of the index to commit only part of the working copy. However, Jujutsu provides commands for more directly achieving most use cases you're used to using Git's index for. For example, to create a commit from part of the changes in the working copy, you might be used to using git add -p; git commit. With Jujutsu, you'd instead use jj split to split the working-copy commit into two commits. To add more changes into the parent commit, which you might normally use git add -p; git commit --amend for, you can instead use jj squash -i to choose which changes to move into the parent commit."


Sometimes you have changes that are permanent to your repo (ie local workflow), that you always want to keep locally, but never push to the remote.

In git you would always leave the changes unstaged; does that mean that with jj you would always have to remove them before pushing? I haven't found an answer on the linked page.

Side note: I really wish git had a way to mark a commit as 'no-push' so it never leaves your local copy, as an upgrade of the unstaged workflow.


> Sometimes you have changes that are permanent to your repo (ie local workflow), that you always want to keep locally, but never push to the remote.

I appreciate that there are times when this has to be in the middle of an otherwise committed file, but it's worth avoiding that if at all possible and putting the locally-changed bit in a different file because, as others have pointed out, this is error prone. It feels like the equivalent of keeping your important files in the recycle bin because it's easily accessible.

For 90% of git users 90% of the time, the staging area is an inconvenience that adds an extra step.


> In git you would always leave the changes unstaged

I do this too, but I quite frequently forget that I've done it, run "git commit -am", and end up pushing my private changes anyway.


tell git to treat it as if it's unchanged

i have these 2 aliases:

    assume = update-index --skip-worktree
    unassume = update-index --no-skip-worktree

"assume" as in "assume it's unchanged / not wanted"

which lets me say "git assume path/to/file" and then "unassume" it when/if i want to commit it.
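
For reference, setting those up looks roughly like this (alias names as above):

    git config --global alias.assume 'update-index --skip-worktree'
    git config --global alias.unassume 'update-index --no-skip-worktree'

    git assume path/to/file      # stop seeing local modifications to this file
    git unassume path/to/file    # start seeing them again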


Someone else's warning to not do this:

https://news.ycombinator.com/item?id=36954723


Ooh, that's me! As in the comment, please don't do this.

It's been a while since I worked at a place doing this, and I'm on my phone, so some details may be fuzzy or wrong, but when I was figuring all this out, I remember this SO comment being really helpful:

https://stackoverflow.com/a/23806990

Basically, there's two different APIs, and neither of them are designed for ignoring config files (but both of them happen to do things that look like ignoring config files, therefore get misused).

`assume-unchanged` is an optimisation tool that tells git that it needn't bother checking files and folders that are expensive to check but never change. If changes do ever happen, and git realises this, then git will remove the assume-unchanged flag - because clearly it wasn't true!

`skip-worktree` is a more aggressive version of this that tells git never to touch certain files or folders at all. But now, if changes do happen to those files, it's not clear what git should do. Overwriting the files would cause data loss, but ignoring the files completely means that you miss out on upstream changes. And because you're telling git that these files should never be checked, there's no good way to handle merging and conflicts.

What typically happens with both of these flags is that they work well about 80% of the time, and then cause a lot of confusion the last 20% of the time.

The alternative is almost always some sort of configuration file that is directly referenced in .gitignore, and that therefore never gets checked in. (In addition, it's often useful to have a (e.g.) config.json.default file for getting new developers up and running quickly.) Any system that needs to be configured (database connections, API URLs, IP addresses, etc) should use this configuration file. Alternatively, I find environment variables, along with .env files for local development, to be really effective, because most things can already be configured that way.
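
A minimal sketch of that setup (the config.json.default naming is just the example from above):

    # check in config.json.default with safe defaults, ignore the real config
    echo 'config.json' >> .gitignore
    git add config.json.default .gitignore

    # each developer makes their own local copy and edits it freely
    cp config.json.default config.json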

This typically takes slightly longer to set up the first time, but will work very reliably from then on.

See also these changes in git's official documentation, and the commit message that explains why it was necessary:

https://github.com/git/git/commit/1b13e9032f039c8cdb1994dd09...


For a long time I have been wishing that Git had a capability of "track this file locally, but don't transfer it when pushing or cloning". Such files would be listed using the same syntax as .gitignore, but in a different file, say .gitlocal or something.

Git kind of has some "local tracking" already: if I am not mistaken, lightweight tags behave like this. It would be cool if it could track files in the same way.

Under the hood it could be done via a separate database, stored not in .git but in .gitl directory or some such. The database in .git would behave as if both .gitignore and .gitlocal contribute to ignoring, and .gitl as if .gitlocal was .gitignore in reverse: it would ignore anything not covered in .gitlocal (and also ignore what's in .gitignore). Or something along these lines.


this may not help you, but JetBrains IDEs have "local history" which addresses many of those concerns, including applying tags to meaningful points in time: https://www.jetbrains.com/help/idea/local-history.html


That has saved me so much time over the years. Makes quick experimentation totally painless.


A safer way to keep temporary changes is to create a new local-only branch with a commit of those temporary changes. Then, whenever you need them, you rebase the branch on top of your working branch.

Usually these temporary changes are to config files, so rebasing shouldn’t create conflicts if you don’t otherwise modify the config file.

An alternative is to stash only those temporary changes but I find branches to be easier to work with.
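
A sketch of that flow in git (branch names are made up):

    git checkout -b local-tweaks           # one-off: commit the local-only changes
    git commit -am "local config tweaks"

    # later, whenever you want the tweaks applied on top of your current work:
    git checkout local-tweaks
    git rebase my-feature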


Isn’t this just .gitignore? I feel like I’m missing something.


That won't work for local changes to files that are legitimately committed in the repo. Like if you need to set your local database, or your own dev hostname in config, for example.


In my experience, there is almost always a simpler way to handle these cases that involves being able to put a file in gitignore. This is one of the big advantages of environment variables, for example - they can typically be stored in a .env file (with a .env.default file checked in for new developers to copy from), and loaded dynamically in different configuration systems.

If my local setup requires me to ignore changes in checked-in files, I usually find that I need to handle configuration more cleanly.

(I did work on a project that made use of git update-index - this was a terrible mistake and caused pain every time we needed to update the config file in the repository. Please never go down this route!)


dotenv and direnv are more than enough for this task, I think, but you can make it as complicated as you care to with similar tooling that pulls secrets/config from the cloud (Doppler, parameter store, etc.)

Most places I've worked at have created tooling that more or less merges the two - sane defaults and non-sensitive values go into a `.env` file of some kind (.env, .env.development, whatever), and then a tool merges a subset of that config from a remote store which contains the stuff you don't want committed in the repo.

Usually used for connecting to a remote dev or staging instance if you can't or don't want to run the entire stack locally.


This is a smell that you should refactor how your configuration is done. What happens when there's a legit change to the config? If it's in git with `update-index`, you're going to have a hard time actually changing that file and getting the changes to the team. There are other reasons, but this is why things like 12 Factor recommend putting config in environment variables. It's made my life much easier. https://12factor.net/config


Why wouldn't it work? You gitignore that file, and your modification is ignored, even if it is committed? Or will git delete that file from everyone?


gitignore entries only affect untracked files, so they won't hide changes to a file that's already committed; to actually stop tracking it you'd need `git rm --cached`, and since the .gitignore itself is usually committed (it's treated like a normal file), yes, in this context that would remove the file for everyone.

There's .git/info/exclude, but that has some kinda large surprises if it excludes a tracked file and I don't recommend anyone use it unless they know what to look for and can always remember what they've excluded.


I have a $HOME/.gitignore that I use for this. (You can configure git to use that globally.) It's not a panacea, and I think other commenters are right that you should instead endeavor to organize things so that the project's own gitignore results in a sane workflow. But I think having permanently unstaged changes is worse.
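
For reference, that setup is roughly (the example pattern is my own):

    git config --global core.excludesFile ~/.gitignore
    echo '.DS_Store' >> ~/.gitignore    # now ignored in every repo on this machine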


It's the perfect location to ignore *.log.

Git will fight you if you ignore .config files which are actually used.


For this, you can use .git/info/exclude.


That only works for untracked files, right? Not files that are tracked but whose modifications you want to ignore.


Git has a feature for this, called clean/smudge.
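
Roughly, a clean/smudge setup looks like this (the filter name and sed substitutions are made up for illustration):

    # .gitattributes:
    #   config.ini filter=localcfg

    # smudge rewrites the file on checkout, clean undoes it before anything is committed
    git config filter.localcfg.smudge 'sed s/db.example.com/localhost/'
    git config filter.localcfg.clean  'sed s/localhost/db.example.com/'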


So in your workflow you never "git commit -a"? So you have to always manually mark what you stage. Which is probably more work than always manually removing the changes you don't want to commit.

The ability to rewrite older commits easily in jj also looks like it would help with this use case if you get it wrong once.

Concretely, I think what you would do is: instead of staging part of your changes and then committing as in git, you would call jj split and split the commit into the part you want to keep local and the part you want to push. This way the local changes always stay in your working copy commit.

Even better, just commit the local changes once when you start. Work locally and before you push you call jj diffedit on your initial commit of the local changes and remove them. Now all the work you did since then will be automatically rebased on the edited initial commit and you can now push up. Instead of excluding your local edits every single time you just have to do it once before pushing.
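
A sketch of that second flow (the revision argument is a placeholder):

    jj new -m "local-only config tweaks"    # commit the local changes once, up front
    # ...do all your normal work in later commits...

    jj diffedit -r <local-tweaks-change>    # before pushing, strip the tweaks back out
    jj git push                             # descendants were auto-rebased; push as usual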


> So in your workflow you never "git commit -a"?

I like seeing what I'm about to commit, so I always do `git commit -p` or `git add -i`. Most people where I work do the same, so I don't think this workflow is uncommon.


I never `git commit -a`, because I'm paranoid that I've done something like named a variable "foo" as I'm just working through the logic and not caring about naming.

I'll then go back through, tidy and add all my changes.


If I don't do this (actually I use magit to select each hunk to stage, but that's just add -p with a fancy interface), then I'll accidentally commit testing log lines and that sort of experimental code.

I never, ever commit -a. That flag horrifies me. I want to choose, specifically, each line of code that I am going to publish.


I almost always do "git add -p", "git commit".

"add -p" is great for showing you all the chunks of code you've written one-by-one and then you do "y" or "n" for whether you're adding them. Doing it like this means that you are reviewing what you've changed as a final check that you haven't left a debug line in, or forgotten part of what you meant to do. It's also a natural way of splitting up what you've done into two separate commits.


The analogous command here is `jj split -i`, which interactively splits the current commit (which is your working copy).


`jj split -i` gives:

> error: unexpected argument '-i' found

Actually, maybe I'm just a complete git, but I couldn't figure out how to `git reset HEAD~` my accidental commits, `git rebase -i HEAD~6`, format `jj log` more like `git log --color --oneline --graph --full-history` (which shows one-line-per-commit), `git checkout -p` (and obviously, `git add -p`), `git show HEAD~`, refer to N-th parents, e.g. `master~5`, and a bunch of other things...

It also feels a bit weird that new files are automatically committed without being manually approved, but I suppose this might theoretically help with some of git's annoyances.


Thank you, I have yet to see a workflow mentioned here that is not just as easy in jj (just conceptually different).

Also learning about other people's git workflows though, so that's cool.
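
For instance, a few that I've found have rough equivalents (just a sketch; I haven't double-checked every one of these):

    jj undo             # roll back the last jj operation, e.g. an accidental rebase
    jj abandon <rev>    # drop a commit entirely, rebasing its descendants
    jj edit <rev>       # make an older commit the working-copy commit so you can amend it
    jj diff -r @-       # show the parent commit's changes (@- is the parent in revset syntax)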


I never do `git commit -a` precisely because I don’t want to randomly add files that I have in the repository. If I’m in a hurry, I’ll do `git add -u && git commit` to add changed files that were already in the repository. But, more typically, I use magit to stage exactly what I want to commit.


Git should really steal the only thing I like about Perforce: multiple uncommitted changelists. It should be an easy workflow change to add 'named stages' alongside the default stage. That way you can just leave changes in a different stage, tracked but uncommitted.


Unless I have misunderstood your feature request, that feature exists, and it's called "worktrees"

Worktrees allow you to have multiple branches open potentially with dirty state on any or all of them

https://git-scm.com/docs/git-worktree
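
For example (paths and branch names are made up):

    git worktree add ../myrepo-featurex feature-x    # second checkout with its own dirty state
    git worktree list                                # show all checkouts attached to this repo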


No I don't think that is a similar feature. It looks like worktrees are put on different paths and by default do not operate on the same commit.

My feature request is about having n stages instead of one so uncommitted work can be organized better while staying in the usual working copy.

How would I accomplish this with worktrees?


Is that different to git stash?


Yeah, they stay in the working set but files are marked for different commits. Ideally git could do it at the line level instead of by file.


In git, you have the .git/info/exclude file for handling these. Given that jj is compatible, maybe it works?

Disclaimer: I'm not on my computer. I know a file like that exists, but it may not be at this exact location.


I think you want git's skip-worktree?


Ugh, I was baited by myself here. The scripts I use at the day job (which are the only way I tolerate git, honestly) do this, and also recursively merge from parent branches on commit.

We're on an old-school feature-branch/master-is-production topology though.


Assuming in git you would keep it as an untracked file in the working copy, I find jj's automatic anonymous working-copy commit better.

For one, it avoids the downsides of carrying untracked files around. Ever accidentally committed an untracked file somewhere deep in history during a long rebase?

Also I find it clearer what files are committed: every file, except if it is in .gitignore. Meanwhile in git you scour through your untracked files before each commit to decide which ones you don't want to add. Ever accidentally committed all files and forgot that there were untracked files you didn't want?


Are you simply using it with GitHub repos?

It mentions that it can be used with backends like Dropbox, but it would be wonderful if we finally had a system that could easily be used with IPFS. This is especially important for large data, since you can't store 1TB on github (and no, I don't count lfs, since you have to pay for it).

IPFS is the natural solution here, since everyone that wants to use the dataset has it locally anyway, and having thousands of sources to download from is better than just one.

So if this uses IPFS for the data repo, I'm switching immediately. If it doesn't, it's not worth looking into.


If you're storing 1TB of binary files in git, you're just doing it wrong anyways. You have a bunch of other tools and capabilities for doing this in a way that doesn't make your repository nightmarishly stupid to deal with because of its size.


I didn't exactly intend it to operate precisely the same way that git does, but rather to have extensions of git that unify the system into one easy to use version control for data and code.

In most projects today, the code is (or generates, anyway) the data. This is true for materials science in physics, neural networks, and creation of databases via ETL. So it would make sense to remove the requirement that users of some software regenerate this data, which may take 2 months on a supercomputer. Downloading it would be much faster. You can put it on a university server, or AWS, but now the data is in some system that is not guaranteed to be there. In fact, it's almost guaranteed to *not* be there in a very short period of time (people move positions and lose their access to these servers constantly).

So the very obvious best solution is IPFS for distribution of the data, but it does need to be linked to the git repo somehow. Of course, the data may not be simple or textual and play well with simple text based diffs for version control, so using something like borg can solve the issue of both data privacy, if needed, and block based diffs.

So this isn't to suggest "just git everything", but rather to say, 'if there's a new version control system for data and code, it's probably added some improvements to fit, and this could be a direction that makes sense'.

So I was checking to see if it had gone that direction yet.


One can still use Subversion to store binary files in VCS...


Nexus, Artifactory, Packages (deb, rpm, nix), Cache, GitHub Releases... There are so many places you can grab a signed binary from that are just outright better for the health of your repo, and will respect your developers time.


The issue is not limited to archives, artifacts, and packaging. Game projects, for example, have large directories with many binary assets which need to be change-controlled. Artifact repositories address distribution, in a way, but don't generally support change control much if at all.

(And yeah, git's historically a poor choice for this – so you may see companies sticking with Perforce or other non-distributed solutions.)


The `Backend` interface is not that wide: https://github.com/martinvonz/jj/blob/48b1a1c533f16fc5df5269.... Mostly it just handles reading/writing various objects. You could very plausibly add IPFS support yourself!


In the author’s presentation [0], the Google roadmap includes “custom working copy implementation for our internal distributed VFS”. The related graphic shows a “working copy” block connected to a “distributed file system block”.

This work might be extensible to include IPFS and other distributed virtual file systems.

[0] https://docs.google.com/presentation/d/1F8j9_UOOSGUN9MvHxPZX...


There are two questions in play here:

1. When we ingest files or make new commits, how are these additions to the object store persisted?

2. When operations modify the working copy, how should these changes be reflected in the user's view of that working copy?

Ordinary git handles (2) by directly modifying the files on the filesystem. If you `git checkout` a branch, git will `rm` files that don't exist in the target commit, `open()` and `write()` new ones, and adjust modification timestamps etc as needed. As you make changes to these files, some commands will occasionally "notice" that the file changed after the fact, and some may choose to modify the index to match.

The jj on GitHub also does this, but inside Google, our concept of "working copy" needs to be disconnected from local files. Developers don't have their own local "working copy" backed by files on the ordinary filesystem; instead, we do all development inside a FUSE-mounted virtual FS called "client in the cloud" (CitC), so working anywhere inside our giant monorepo doesn't take any disk space (except caching). I think that's what the "Distributed file system" refers to - instead of modifying the local filesystem, jj would need to talk to whatever remote service provides the user's FUSE-backed view whenever the user runs `jj checkout` or some other jj operation that modifies the working copy.

When you speak of implementing IPFS storage, I think instead you want to keep the object store and operation log on IPFS while keeping the local working copy right on the ordinary file system, similar to how git-LFS keeps local files untouched while modifying the way they're persisted to the git object store.

Alternatively, perhaps we could imagine an IPFS backend similar to `jj git` and `jj native`, perhaps `jj ipfs push/pull`. Then, a completely local repository could push/pull to and from IPFS, completely agnostic of how the user's repository is stored on disk.

In any case, Jujutsu's API surface is flexible enough to support any of these use cases, since the author designed it from the ground up to smoothly support very different needs for internal and external users. Most users outside Google just want a familiar working copy containing ordinary files, and the fact that the repository structure happens to be backed by a git-like object store (Linus' "git is just a merkle tree of files" philosophy) is incidental under the hood. That's just fine, even though most internal users will be interacting with a very different way of using jj when everything's said and done. Ideally, nobody needs to notice or care about the difference.

[1]: More about Google's internal VCS needs: https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...

[2]: Linus Torvalds on git: “In many ways you can just see git as a filesystem — it’s content addressable, and it has a notion of versioning, but I really designed it coming at the problem from the viewpoint of a filesystem person (hey, kernels is what I do), and I actually have absolutely zero interest in creating a traditional SCM system.”


IPFS didn't always have pinning services at BitTorrent prices ($3-$5 a month with 1TB of bandwidth, no crypto wallet needed), and the client used tens to hundreds of KB per second while idle.

Unfortunately, the model of putting every block in the DHT instead of having roots mode be the default, and then spamming your wantlist to tons of peers seems to still be at least partly around.

Right now IPFS looks pretty good thanks to the gateways and services, so I would imagine we'll see more of it in the future, but I can see why it took so long.


Yea, just GitHub repos. In fact just a single repo (work's mono repo) for now, but that's where I spend the majority of my day.


Why do you want such big files in a git repo?


The point is to have an easy way to distribute code as data. This is important for many areas, such as training neural networks (code with proper seeds can reproduce the weights output by training), various applications in basic physics, database creation via ETL, etc.

If the choice is "run this code in the repo, wait 10 weeks while it's running, and retrieve the 50GB file" vs "download this file", of course the latter is better. But many of these processes exist in academia, where you are essentially guaranteed to eventually lose access to the server hosting that file for download, and it can get pretty annoying. Additionally, there's no seamless way of distributing it (it's in the docs, pointing somewhere else that may or may not exist, etc).

Since essentially all big data is really just code, it would make much more sense to tie these directly at the hip. So, a git/repo commit hash that is a key directly to the IPFS data hash would fix this problem directly.

So it's not "wanting big files in a git repo" (an obvious no-no, since central servers shouldn't be used for storing large data, and centralized repos like GitHub should only store single-digit MB or so); it's wanting to relieve the cost of running processes that may require weeks of supercomputer time for QM calculations, etc., by providing a guaranteed hash pairing between the code and its output.
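
As a sketch of what I mean (the DATA_CID file is hypothetical; `ipfs add`/`ipfs get` are the standard CLI commands):

    # producer: pin the generated dataset and record its CID next to the code
    ipfs add -r --quieter output/ > DATA_CID
    git add DATA_CID && git commit -m "Record dataset CID for this code version"

    # consumer: clone the code, then fetch exactly the matching data
    ipfs get $(cat DATA_CID) -o output/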


How about why not? The only reason it's not done is because git doesn't support it.


Maybe I came across as accusatory, but I'm genuinely curious. Do you have 1TB text files, or is this some kind of media management for video production, something like that?


Because it's a source control system, which means it's intended to store source code, not the artifacts generated from the source code. It seems far-fetched that anyone would manage to author 1 TB of source code.


This isn't true at all. We have been storing binary files separately via Maven for Java projects for almost 20 years now.

This was done with SVN projects. Keeping the blobs out of your source repos has been the preferred way for a long time.

[Edit] The only folks who seem to want to do this are game developers, and they are generally not people you would want to emulate.


Then how come git-lfs even exists at all? There's clearly a demand for it. Whether it's good practice is up for debate.

> Keeping the blobs out of your source repos has been the preferred way for a long time.

This is just appeal to tradition.


> This is just appeal to tradition.

It might be, but the argument was that we don't do it because of git.

We haven't been doing it for a long time, but that's not because of git.


I'd be curious to know if someone is successfully using this in a team. How is it when two people are working in the same branch?


It's not really different than using Git to work on the same branch. If you and the teammate commit to the same branch, then you'll need to resolve the divergence somehow, usually a merge or rebase, or choose to forcibly overwrite the other's changes.
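
Concretely, the rebase path looks something like this (a sketch; the branch name is a placeholder):

    jj git fetch                   # pull down your teammate's commits
    jj rebase -d feature@origin    # put your own work on top of the updated branch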


I can theorize about how it might work in a team also, but I was curious how this has played out in practice for anyone


To be clear, I am describing my actual usage of jj. It’s worth noting that there’s currently not a `jj pull` command, so divergences typically involve manually setting a branch pointer for me.



