Starting October 19, storage limit will be enforced on all Gitlab Free accounts (docs.gitlab.com)
167 points by reimertz on Aug 8, 2022 | 179 comments



A lot of discussions about how git repos are supposed to be small are totally missing the point. This storage quota applies to everything, including release artifacts, containers, etc. Forget containers or CI artifacts on every commit, let's look at a very common scenario: using goreleaser to build binaries and deb/rpm/etc. packages for multiple architectures every release. This way a moderately sized Go project can easily consume 50-100MB or more per release. That gives you at most 50-100 releases across all your projects.

Using hosted GitLab for open source projects is looking less and less appealing.

I also posted about issue trackers on gitlab.com not allowing search without signing in a while back: https://news.ycombinator.com/item?id=32252501

Edit: An open source program that upgrades the quota is mentioned elsewhere in the thread: https://about.gitlab.com/solutions/open-source/ I don’t use hosted GitLab for my open source work, so no idea how many people get approved.


Hi, GitLab team member here. The GitLab for Open Source program provides open source projects with Ultimate benefits and higher limits. More info in previous comment: https://news.ycombinator.com/item?id=32387621


Why do you have to “apply” for these benefits? Shouldn’t providing the source publicly in the open (ie not a private repository) by definition be enough for “open source”?


Why do you expect services for free? Is it really too much that you have to ask to be upgraded for free?


I'm guessing it's because you can easily whip up an OS project and pay nothing on Github.


What happens if I publish the code under a restrictive (non-open source) license?

Even though the code is available, that does not mean you can use it however you want, no?


But GitLab already gives more features to public repos.

Public repos have a cost factor of 0.008, so your actual CI quota is 125,000 minutes instead of only 400.

It would actually make it feasible to grow on GitLab if quotas changed based on whether it's a public or a private repo, like how GitHub does it.

https://docs.gitlab.com/ee/ci/pipelines/cicd_minutes.html#co...


Maybe just give this free quota to projects that use a known open source license?


Any plans on removing the sign-up requirement to search issues? It's really annoying.


I guess this is the next step to reduce costs after the brakes were put on the "let's delete old OSS repositories" leaked plan.

For comparison, I think GitHub just has a cap of 100MB on any single individual file, plus:

> We recommend repositories remain small, ideally less than 1 GB, and less than 5 GB is strongly recommended. Smaller repositories are faster to clone and easier to work with and maintain. If your repository excessively impacts our infrastructure, you might receive an email from GitHub Support asking you to take corrective action. We try to be flexible, especially with large projects that have many collaborators, and will work with you to find a resolution whenever possible.

Which is a bit wishy-washy, but sounds like there's room for discretion / exceptions to be made there rather than a hard cap at 5GB.


The difference being that the value Microsoft gets from GitHub isn't its revenue, it's its influence. Whereas Gitlab is just another corporate software suite.


> the value Microsoft gets from GitHub [is] influence

Don't forget easy access to all the code to train Copilot, the AI code launderer. With my tinfoil hat on: even the private repo edition!


I don't have the diffs in front of me, but I'm fairly sure this predates the MS acquisition by quite a while.


Prior to the MS acquisition, GitHub's free offering was really bad: no private repos, no CI pipeline, and I think there was no real "team" feature.


Well, GitHub (free or not) had no CI until GitHub Actions, which came after the acquisition. Integrations with things like Travis were already there and free.


That's a weird definition of bad. It was free and it worked as advertised without providing every imaginable feature.


Do you have a source for this? I vaguely remember GitHub starting to offer private repos when GitLab started getting popular.


Github raised the 5 repo limit to infinite in 2016 https://github.blog/2016-05-11-introducing-unlimited-private...

And made private repos free after the acquisition https://github.blog/2019-01-07-new-year-new-github/


> I think GitHub just have a cap of 100MB on any single individual file

There's a 100MB limit on the size of a push, which also limits the size of any single git object (i.e. file) to 100MB too. However, GitHub supports LFS for large files, and their documentation says to use LFS for files over 100MB:

https://docs.github.com/en/repositories/working-with-files/m...

> GitHub blocks pushes that exceed 100 MB.

> To track files beyond this limit, you must use Git Large File Storage (Git LFS).

https://docs.github.com/en/billing/managing-billing-for-git-...

According to my own GitHub account ( https://github.com/account/billing/data/upgrade?packs=1 ), I'm paying $37/yr for 600GB LFS storage on top of my existing GitHub Pro subscription.


> I'm paying $37/yr for 600GB LFS storage on top of my existing GitHub Pro subscription.

I'd love to know how you managed this! I'm paying $60/yr for 50GB, which feels exorbitant.


My account is still on some now-discontinued pricing tier, so I guess I got grandfathered-in or something.


GitHub has a repo size limit of 100GB (https://web.archive.org/web/20200521202931/https://help.gith...) and you'll get a warning if you exceed 75GB


> Github had a repo size limit of 100GB

The source you link is 2 years old.


Can you provide a source to the contrary? I’m fairly certain it’s still in place.


Just check the unarchived version of the site OP linked: https://help.github.com/en/github/managing-large-files/what-...


It's no longer mentioned in the current version, that's why I linked to the archive.

Yes, the information is 2 years old, but since nobody has any proof of the contrary, I'd say it still stands.

(Sorry, my upload is too slow to just do a quick check and see if I can indeed push a 100gb repo, maybe someone else can try?)


If you build an image for testing on every commit and don't have a retention policy set up, you could be using a massive amount of space without realizing it. I can see why they did this.


Also, for some things there is no retention policy configurable at all -- eg pipelines and their associated stdout logs appear to not be auto-cleaned-up at any expiry date (no default, nothing configurable). If you want to get rid of those you need to script it via the API, it seems: eg https://gitlab.com/eskultety/gitlab_cleaner

(I just ran that on the QEMU project and reduced the usage from 295GB to 165GB by deleting pipelines older than 1 Jan... so that's a lot of low-hanging logfile fruit gitlab could be auto-deleting.)


Gitlab didn't even have retention policies until sometime in 2020. I have a six year old project that's consuming something like 4TB of space in their container registry.


Yeah tbh we are leaving the age of abundant cheap money where everything is unlimited and unpriced. Having people pay for their usage will get them to actually start cleaning up all their junk build artifacts rather than just eating the loss on storing petabytes of old docker images which have no value.


Woah. 4 TB? What kind of project is it?


Not OP, but a simple Node application can very easily make container images that are well over a gigabyte. Build and store those container images on every push and boom, your usage can explode.


Are retention policies easy? I recall fighting on GitHub to setup a sane retention policy awhile back, but haven't revisited in years.


GitLab offers ways to manage retention policies for your container registry. Please see the docs for guidance: https://docs.gitlab.com/ee/user/packages/container_registry/...
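
As a rough sketch, a cleanup policy can also be set through the projects API (values here are placeholders; see the linked docs for the full attribute set):

  curl --request PUT \
    --header "PRIVATE-TOKEN: <token>" \
    --header "Content-Type: application/json" \
    --data '{"container_expiration_policy_attributes": {"enabled": true, "cadence": "7d", "keep_n": 10, "older_than": "14d", "name_regex_delete": ".*"}}' \
    "https://gitlab.com/api/v4/projects/<project-id>"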


I wouldn't say easy because regexes are needed for complex ones (e.g. retain -prod images for 6 months, -staging for 24 days) but they are powerful and not that complicated.


GitLab team member here. Impacted users are being notified via email, and in-app notifications will begin 2022-08-22; so far we've contacted 30,000 users. Only GitLab SaaS users are impacted - the limits do not apply to self-managed users.


We're affected by this change, and have been trying to get support in forums, but can't get hold of anyone.

Build Artifacts are listed as part of what contributes to the quota (which is fair enough), but there's no way (that I could find in the docs) to manage build artifacts that are stored.

I suspect we have a large historic storage, which we don't use / need, but there's no way to browse this, no way to verify it, and no way to delete what we don't need.

I'm dreading getting a big ole bill in a few months for storage we had no way to opt out of.


GitLab Support Engineering Manager here. These limits will first be soft limits as they are now. Impacted users will be notified and there will be about 2 months to take action before enforcement starts. After that point, limits will be enforced.

If you'd like to investigate now, you can use the GitLab API to fetch job artifacts and their storage size: https://docs.gitlab.com/ee/api/jobs.html#list-project-jobs and calculate the total.

I’d suggest collecting all pipeline jobs to be deleted using this API endpoint https://docs.gitlab.com/ee/api/job_artifacts.html#delete-job...

We are looking to improve the visibility for job artifacts in the UI as we get nearer to the storage limit being enforced.
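
As a rough illustration of that approach with curl and jq (untested; <project-id>, <token>, and <job-id> are placeholders, and you'd page through the results for projects with many jobs):

  # Sum the artifact sizes the jobs API reports for one project (one page shown).
  curl --silent --header "PRIVATE-TOKEN: <token>" \
    "https://gitlab.com/api/v4/projects/<project-id>/jobs?per_page=100" \
    | jq '[.[] | .artifacts[]?.size] | add'

  # Then delete the artifacts of jobs you no longer need:
  curl --request DELETE --header "PRIVATE-TOKEN: <token>" \
    "https://gitlab.com/api/v4/projects/<project-id>/jobs/<job-id>/artifacts"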


I have a repo that was almost exactly 5GB and it turned out that the storage was also used exclusively by pipeline logs and artifacts.

Found this recipe on StackOverflow to just nuke everything and worked for me: https://stackoverflow.com/questions/71513286/clean-up-histor.... You can tweak it a bit if you want to keep the more recent pipeline runs.

For some reason I had to run this multiple times to completely remove everything; each run removes about half of all the pipelines and only when there were about 20 remaining did the script remove everything.


What's the difference between self-managed and SaaS users? To me gitlab is a SaaS ?

I mean if I'm self hosting on digitalocean doesn't that just mean that I'm using your FOSS and doing all the work myself and completely separate from gitlab anyway other than the common codebase?

Sorry, I'm not a guru of gitlab and don't know all the common parlance.


GitLab SaaS is the instance at gitlab.com.

Anything else is self-managed.


gotcha, so my instinct wasn't completely off. Thanks.


Honestly for me the bigger issue is the upcoming bandwidth limit.

There is no clear date as to when it's going to be enforced.

But its definition is so wide it's crazy: it's basically any egress data except the web interface and shared runners.

AFAIK it will also include git clones, so if your project suddenly gets popular, the users' clones will cost you money too.

Also if you use your own runner, cloning the repo to the runner will also be included in your bandwidth limit.

And since it should apply to GitLab Pages, it becomes useless for anything you want to get more than a few visits.

Since GitLab is behind cloudflare, you might as well just use Cloudflare Pages at this point.


> Also if you use your own runner, cloning the repo to the runner will also be included in your bandwidth limit.

You make it sound like it's horrible. I'd offer a different take in that sure, it can be a huge negative if done recklessly on GitLab's part. But if done correctly, it can actually be a good thing. I think CI pipelines doing a full clone from scratch on every build (or npm installs or other bootstraps) is extremely wasteful in terms of resources, so I'd be glad if that reduces it significantly.


GitLab's runner agent software doesn't support a local git cache.

GitLab.com SaaS agents have no stickiness, so they basically always clone.


Thanks for the suggestion of adding a Git cache for GitLab Runner. A similar mechanism already exists that allows controlling the Git strategy: fetch into an existing local working copy, which is faster, or clone if one does not yet exist. All benefits and limitations are documented [0].

To help prevent unneeded traffic when a new pipeline job is executed, GitLab Runner uses a shallow Git clone by default on GitLab.com SaaS that only pulls a limited set of Git commits from the current head instead of a full clone. [1]

There is a feature proposal [2] to add support for partial clone and sparse-checkout strategies that have been added in more recent Git versions. Recommend commenting/subscribing.

Please note that for GitLab.com SaaS shared runners, the traffic limits do not apply, as this is mostly internal cloud traffic. [3]

[0] https://docs.gitlab.com/ee/ci/runners/configure_runners.html...

[1] https://docs.gitlab.com/ee/ci/large_repositories/#shallow-cl...

[2] https://gitlab.com/gitlab-org/gitlab-runner/-/issues/26631

[3] https://about.gitlab.com/pricing/#what-counts-towards-my-tra...


Yeah, that does sound wasteful, just like what parent said. They should add some sort of cache support if that's the case.


GitLab CI's default clone configuration already, AFAIK, only includes the last 20 commits, so it's already slimmed down.


If you start from scratch, regardless of history, you're fetching at least the size of the repo. In many cases, that's hundreds of MBs for every build and I'm a huge fan of building for every commit. I find that a colossal waste.

I have a very old fashioned and "not recommended" setup with Jenkins that only pulls new commits, because it works out of a persistent working directory. Works wonders for a Django/ES6 project and sips bandwidth. I wish more of the modern containerized, start-from-scratch and so on setups would work in a similar way wrt the bandwidth they use.


GitLab team member here. Thanks for the comment.

We understand people don’t want their quotas being filled by something outside of their control. For Open Source projects, we have GitLab for Open Source which contains higher limits from the Ultimate tier (250GB Storage, 500GB transfer/month). In addition, we intend to look into other ways to address your concern such as counting only your own traffic or allowing you to limit external traffic.


I love the fact that they haven't thought out a lot of how this needs to work. You can't even decide if it's worth sticking with GitLab because they don't even know how they are going to handle half of this stuff.


I appreciate their open discussion and iterative approach. There’s no need to get your feathers ruffled. These measures mainly impact users freeloading their services, so maybe have some empathy for the company providing them?


> These measures mainly impact users freeloading their services

Does it? We use GitLab at work, and we have to use local runners (for regulatory reasons + GitLab runners don't support what we need anyway).

Unlike other CI/CD providers (e.g. teamcity), GitLab Runners don't have a local git cache. So if you need to do a clean clone for an important build (gitlab runners don't clean up well after themselves), you need to re-clone the repo from GitLab.com.

With a 1:2 ratio for storage vs bandwidth (10GB storage, 20GB bandwidth per month), assuming using 1GB in latest commits, and 2GB total repo size (e.g. a vendored dependency that doesn't change frequently):

- 2 runners, with a clean clone once a week each, at 5% of the monthly bandwidth per shallow clone (1GB of 20GB), works out to about 40% of the bandwidth per month on CI/CD alone.

Leaving space for 6 clones by developers (hope you don't upgrade your dev machines often).

If you're using a submodule in multiple projects you're going to tear through your bandwidth.


Like I said, it /mainly/ impacts freeloaders. Having a 1+2 GB git clone is not a common usecase. That being said, it appears that you would not be affected:

> Transfer is the amount of data egress leaving GitLab.com, except for:

> Paid plans only: self-managed runner transfer and deployments. This is determined by transfer authenticated by either a CI_JOB_TOKEN or DEPLOY_TOKEN

(https://about.gitlab.com/pricing/#what-counts-towards-my-tra...)


Freeloaders... You mean the people they encouraged to use their service and then pull the rug out from under them?


Thanks for sharing your feedback. I have added more insights into caching and checkout strategies to reduce traffic and speed up job execution in GitLab Runners in this comment: https://news.ycombinator.com/item?id=32408960


Am I reading it right that the original Free Tier had a quota of 45,000GB? That seems absurdly high and not very sustainable (hence the change I assume).


I'm reading that as a "stop-the-bleeding" number; presumably there is someone out there with a 44TB repository and they want to impose initial quotas that don't actually impact anyone immediately.

I guess someone has been backing up their movie collections to Gitlab or something.


One case I've seen on our local Gitlab server is someone in data science/HPC (accidentally?) adding the output of a solver to their repo. Easily hundreds or thousands of gigabytes of data.


I had to learn svn surgery because someone imported a 1GB archive to test the ol' signed 32 bit file size bug, got yelled at, and deleted it. Well, except it's still in the repository, mate, it's just hidden.

I replaced it with a fixture that was 2GB of space characters, which compressed down to about 3KB. I know there's a canonical file bomb zip file that's under 1K but there's clever and then there's clever.


No, originally there was no quota enforced to begin with if my memory serves me right. The limits discussed here are likely meant to gradually tighten up the limits, rather than immediately locking out projects that exceed these limits.


I guess an infinitely large storage quota is only slightly less sustainable than a 45,000GB one.


I won’t publish my gitlabfs plugin for 1.7k different programming languages then.


I might be wrong about that but I'm pretty sure that I saw that limit (100 MB) when I was reading about GitLab plans in the past. They just didn't enforce it for some reason.

IMO GitLab does the wrong thing. They should have enforced those limits from the beginning. And if they didn't, they should've eaten those expenses or at least grandfathered old repos.


> And if they didn't, they should've eaten those expenses or at least grandfathered old repos.

Why should they eat the expenses of abusive users? They clearly have been, and are finally taking action to prevent it. Your entire post seems extremely entitled to their money.


No, 45,000GB wasn’t a quota. At the moment, the 5GB limit is a soft limit, and the rollout plan included tiered enforcement. This is an internal implementation detail of our technical rollout plan.


That number sounds like "unlimited, but the system requires a number" kind of thing.


Git excels at tracking human keyboard output. A productive developer might write 100KB of code annually so a git repo can represent many developer years of collaborative effort in just a few MB. That is, unless you require git to track large media files, third party BLOBs, or build output.

However, sometimes tracking these things is necessary, and since there isn't an obvious companion technology to git for caching large media assets ("blob hub?") or tracking historical build output ("release hub?"), devs abuse git itself.

I wish there were a widely accepted stack that would make it easy to keep the source in the source repo, and track the multi-gb blobs by reference.


Git LFS is a pretty widely accepted stack for managing binary blobs by reference in git: https://git-lfs.github.com/

The plugin is installed out of the box in many git distributions now. Many hosts support it today, including Gitlab, which is relevant to this article's discussion: https://docs.gitlab.com/ee/topics/git/lfs/
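
Basic usage is only a few commands (a rough sketch; the tracked patterns here are just examples):

  # One-time setup per machine, then choose which patterns LFS should manage in this repo.
  git lfs install
  git lfs track "*.psd"
  git lfs track "*.stl"
  git add .gitattributes
  git commit -m "Track large binaries with Git LFS"
  # From here on, files matching those patterns are stored as LFS objects,
  # and the repo history only keeps small pointer files.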


I have yet to find a host that has at all usable Git LFS pricing though.

GitLab is $60/m (edit, per year? unclear in pricing page) for 10GB storage and 20GB/m transfer. So, I can checkout my repo twice a month!?

GitHub is similar, with a 1:1 ratio for storage and transfer. So you can checkout your repo once a month.


You mean something like git Large File Storage? It comes with git for Windows by default and every Linux distro I know has it in its repos. MacOS also has it in Homebrew.

https://git-lfs.github.com/

It's basically one click/command away.


> companion technology to git for caching large media assets ("blob hub?")

There is Amazon S3 or any other cloud storage.

> tracking historical build output ("release hub?")

There are OCI artifact registries.

For me, there is rather issue that when all you have is a hammer, everything looks like a nail.


5 gigs? what is this, repo hosting for ants?


That does seem ridiculously small. It's probably fine for 95% of projects though.


Git change compression is quite effective. Unless you start checking in large binaries, most companies will never get close to 5 GB. (Even 1 GB is a pretty substantial amount of code.)


That will depend on how well they handle forks. I could easily see myself exceeding such a limit just issuing bug fixes.

Especially since one of the bugs I'm likely to fix is, "why is this library so damned big?"


GitLab team member here. Forks of projects get deduplicated, so only the changes you make will contribute to your storage consumption as long as the fork relationship is maintained.


Update: our team recently identified an issue impacting how we calculate storage which results in forks being counted towards Usage Quota. This will be addressed before we begin enforcement.


Shrinking the library won't make the repo any smaller. You have to erase history to do that.


Yeah I thought that might be confusing.

Artifact size depends on what garbage from the project infrastructure gets included into the artifact. So simple things like not having a fully populated ignore file will cause things to get included into the project or the artifact.

People who don't care about file size don't care about file size. If the artifact is ridiculous, sometimes the repo is ridiculous too. Therefore if you take a bunch of projects with outsized artifacts, you are going to have above average repository sizes as well.


I need solutions that fit 100% of my projects. I don’t want to use service X for 95% and service Y for 5%.


Are you also willing to pay for it?

Because it sounds like you want to use 100% of the service and not pay for it.


Then you can pay for more data, seems fair no?


Do personal accounts really have 5gigs of code??? Unless you have a lot of models/artifacts, images, plain text code should be well under 5gb for 100s of projects


Where this will really hit is the docker image registry. The lazy CI implementations will just tag a new image every single commit and leave them sitting there forever. Hundreds of megs each, completely useless.

In a way I'm almost glad people will have to start cleaning up after themselves.


Most repos are in the singles or tens of MB. If those are ants, then sure.


I don't disagree with that, especially given the assertion that artifact storage from pipelines does not count towards the 5GB; it just feels like a tiny amount for 2022. Just bumping it to, like, 20GB would make me feel better, even though it doesn't really matter.


That is what github has now, doesn't seem to be out of the ordinary.


That's per repo; GitLab's is for the whole account.


What's the point of tapering it down in stages like this? Between the October 19th quotas and the October 20th quotas, if you wait until the last minute, you have 24 hours to move 37.5TB of data. Then 4 more days to move another 7TB; does that actually help anyone? The proposition of getting that much data out of it at that speed seems a bit unrealistic. Why not just say "the quota will be 5GB on November 9th" and be done with it?


The phased enforcement of the limits are a part of the technical rollout plan for this change that was added to our docs. Related comment: https://news.ycombinator.com/item?id=32387597.

The communication sent to impacted users via email and future in-app notifications includes only the applicable enforcement dates and limits.

Issue to follow: https://gitlab.com/gitlab-org/gitlab/-/issues/368150


Definitely sounds like there was some bureaucracy in there.


A couple of notes:

- If I tag a docker image with multiple tags, and then push it to Gitlab, each tag counts towards the storage limits even though SHAs are identical. eg 100MB container tagged with "latest" and "v0.5" uses 200MB of storage.

- The storage limit is not per repository, but per namespace. So 5GB free combined for all repositories under your user. If you create a group, then you get 5GB free combined for that group. Does this include forks? Does this include compression server side?

- The 10GB egress limit per month includes egress to self-hosted Gitlab Runners in the free tier. Consider this together with the 400 minutes per month limit on shared runners.

These limits feel less like curbing abuse and more like squeezing to see who will jump to premium while reducing operating costs. Is this a consequence of Gitlab hosting on GCP with associated egress and storage costs? Is this a move to improve financials / justify a market cap with fiscal storm clouds on the horizon? Is this being incentivized by $67m in awarded stock between the CFO and 2 directors?

Stock history over last year for GTLB (since IPO in 2021?): https://yhoo.it/3QaExCs

From the golden era of 2015: https://about.gitlab.com/blog/2015/04/08/gitlab-dot-com-stor...

> To celebrate today's good news we've permanently raised our storage limit per repository on GitLab.com from 5GB to 10GB. As before, public and private repositories on GitLab.com are unlimited, don't have a transfer limit and they include unlimited collaborators.


Hi @rohfle! A couple of clarifications:

> If I tag a docker image with multiple tags, and then push it to Gitlab, each tag counts towards the storage limits even though SHAs are identical. eg 100MB container tagged with "latest" and "v0.5" uses 200MB of storage.

Any "duplicated" data under a given "node" (be that the root namespace, a group or a project) counts towards the storage usage only once. So images latest and v0.5 would only represent 100MB in their namespace registry usage, not 200MB.

> Does this include forks?

The registry data is not copied/duplicated when one forks a project. So this is not applicable. But even if it was, as long as the fork and the source are under the same root namespace, any "duplicated" registry data across the two would only count towards the storage usage once.

> Does this include compression server side?

Yes, the measured size is the size of the compressed artifacts on the storage backend.

We should be updating the docs shortly to make these answers more transparent!


I didn't even know there was no storage limit - that seems like an immediate way to get your platform used to store non-code data in very large quantities.


I would guess controls to avoid people storing pirated movies on Gitlab repos and similar stuff have been in place for a long time.


Looks like everyone is rushing to check their storage.

> something went wrong while loading usage details

On my free group’s storage page.


Yes, the container registry and build artifacts are now correctly displayed in the free plans.


What kind of projects are using 500GB much less TB's of data?


Gitlabs own repo is at 33TB (mostly docker images). https://gitlab.com/gitlab-org/gitlab


In gaming, you can have a single branch that is 8TB with 1+ million files. But those projects aren't using git.


Desktop Environment components, I suspect.


5GB isn't much different than the storage limits of other services, but their storage pricing is atrocious. I've seen the writing on the wall for a while and watched as GitLab went from being the cool open source alternative to GitHub to becoming a bloated oversized mess. I know several popular open source projects were offered premium tier upgrades for free. I am curious to see if these changes, especially transfer limits, will impact them enough to move away.


My Qt/C++ cross-platform FOSS Wallpaper Engine project[1] currently uses 47gb of storage. This is because I compile for every platform and store the artifacts for 4 weeks. Not sure what I will do in the future, because having older builds around to try out without recompiling is always nice.

[1] https://gitlab.com/kelteseth/ScreenPlay


I guess that having old artefacts is nice, but expecting 47GB of storage for free per repo seems ludicrous to me.

Maybe it's a generational question, but I would be pretty happy with 1-2GB (i.e. git history, and one artefact per platform).


>but expecting 47GB of storage for free per repo seems ludicrous to me.

I agree with this. This move makes perfect sense. I think GitLab does a lot already for the people that don't pay for the service.


47GB would be less than 3 cents per month for object storage in backblaze b2. And most repos won't have anywhere near that much storage. It isn't zero, but afaik, GitLab's main competition doesn't have a similar limitation.


GitLab for Open Source provides OSS projects with Ultimate tier benefits, and includes 250GB of storage and 500GB transfer/month. Please apply to join the program here: https://about.gitlab.com/solutions/open-source/


And note that even if you're just "a random someone with a bunch of open source repos", you almost certainly already qualify for this program:

---

In order to be accepted into the GitLab for Open Source Program, applicants must:

- Use OSI-approved licenses for their projects: Every project in the applying namespace must be published under an OSI-approved open source license. [1]

- Not seek profit: An organization can accept donations to sustain its work, but it can’t seek to make a profit by selling services, by charging for enhancements or add-ons, or by other means.

- Be publicly visible: Both the applicant's GitLab.com group or self-managed instance and source code must be publicly visible and publicly available.

---

[1] https://opensource.org/licenses/alphabetical


> Every project in the applying namespace must be published under an OSI-approved open source license

Well I guess I would have to split out my one little private repo I use for my dotfiles into a separate account in order to qualify.


tfw GitLab admits that their own product isn't open-source ;)

("can’t seek to make a profit by selling services, by charging for enhancements or add-ons")


That isn't Gitlab saying those are the requirements for being "open-source", it's just their requirements for who they are willing to give a generously large amount of free services to.

I think it's quite fair to say you're not going to give free services out to open source projects that are seeking to fundraise beyond covering their costs.


They’ve been saying that for a long time. https://about.gitlab.com/blog/2016/07/20/gitlab-is-open-core...


That's... an incredible footprint. Surely even if there was no size limit, deduplicating the build output would have made a lot of sense?


Having said that: if you run a public open source project that has requirements beyond what a regular free user gets, then why not apply to the open source program[1] (which you almost certainly already qualify for) so you're not space constrained?

Quoting the requirements:

---

Who qualifies for the GitLab for Open Source Program? In order to be accepted into the GitLab for Open Source Program, applicants must:

- Use OSI-approved licenses for their projects: Every project in the applying namespace must be published under an OSI-approved open source license. [2]

- Not seek profit: An organization can accept donations to sustain its work, but it can’t seek to make a profit by selling services, by charging for enhancements or add-ons, or by other means.

- Be publicly visible: Both the applicant's GitLab.com group or self-managed instance and source code must be publicly visible and publicly available.

---

[1] https://about.gitlab.com/handbook/marketing/community-relati...

[2] https://opensource.org/licenses/alphabetical


Perhaps now is a good time to recommend the ever-popular BFG to anyone unaware: https://rtyley.github.io/bfg-repo-cleaner/

Also my team's biggest repo is a 2.5 GB checkout but gitlab (self-managed) reports it as 185MB "files" and 353 MB "storage" (no CI/CD artifacts).
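
For anyone who hasn't used it, a typical run looks roughly like this (repo URL and size threshold are placeholders; note it rewrites history, so coordinate with collaborators and re-clone afterwards):

  git clone --mirror git@gitlab.com:group/project.git
  java -jar bfg.jar --strip-blobs-bigger-than 50M project.git
  cd project.git
  git reflog expire --expire=now --all
  git gc --prune=now --aggressive
  git push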


Similarly there's also git-filter-repo: https://github.com/newren/git-filter-repo

It's in Python so runs pretty much everywhere *nix out of the box.
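
A roughly equivalent cleanup with git-filter-repo (same caveat about rewriting history; repo URL and threshold are placeholders):

  git clone --mirror git@gitlab.com:group/project.git
  cd project.git
  git filter-repo --strip-blobs-bigger-than 50M
  # filter-repo removes the origin remote as a safety measure, so re-add it before pushing
  git remote add origin git@gitlab.com:group/project.git
  git push --force --mirror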


> Perhaps now is a good time to recommend the ever-popular BFG to anyone unaware: https://rtyley.github.io/bfg-repo-cleaner/

Thanks for sharing this tool to help cleanup the Git history. Please be aware that it will rewrite the history, which could be very impactful to existing branches, merge requests and local clones. A similar approach is described in the documentation: https://docs.gitlab.com/ee/user/project/repository/reducing_...


Seems like a buried lede here is that limits also now apply to paid accounts. Just checked my team’s name space: we have 700GB of storage used, and gitlab is going to start charging us $0.50/month/GB for everything in excess of 50 GB. On top of the hundreds of $/month we’re already paying in per-seat pricing. That seems absurdly expensive.


Thanks for your feedback. I’d suggest starting the analysis to identify the type of most storage being consumed. Maybe CI/CD job artifacts are kept forever and need an expiration configuration [0], or container images in the registry would need cleanup policies [1]. The documentation linked in the FAQ provides more analysis and guides to help. [2]

If you need additional help with analyzing the storage usage, please contact the GitLab support team. You can also post cleanup questions on our community forum. [3]

[0] https://docs.gitlab.com/ee/ci/pipelines/job_artifacts.html#w...

[1] https://docs.gitlab.com/ee/user/packages/container_registry/...

[2] https://about.gitlab.com/pricing/faq-paid-storage-transfer/#...

[3] https://forum.gitlab.com/c/gitlab-ci-cd/23


I'm assuming this is only for the ones they host and not the self-hosted solution. It's insane that anyone uploads terabytes of data to GitLab; is there an actual valid use case that isn't illegal content or a weird backup choice? Is there some big-ass GIS open source project out there that could use the attention of GitLab before they nuke some vital data somehow?


The quota includes not just the git repo, but also the other services that GitLab offers. In particular, the container and package registries are easy to set up in a way that accumulates a lot of data (e.g. by building and tagging a docker image on every commit and not cleaning up old images). In most cases this will be a misconfiguration, and all the quota enforcement is doing is forcing people to configure their projects a bit more considerately.

As far as I know, the quota itself isn't new, it just wasn't enforced in the past.


GitHub limits the package registry to 500M for free accounts, so 5G doesn't seem so bad. Git repos themselves usually don't take up that much space; all my 135 projects on GitHub are ~250M combined (quick count, could be off a bit, but roughly on that order). Even larger repos at $dayjob with daily commits typically take up a few hundred M at the most.

Basically you need to either be in the top 0.1% of highly active accounts or do something specific that requires a lot of disk space, but for most people 5G doesn't seem so unreasonable.


I'm actually somewhat curious how much of an impact this will actually have - I think I've only seen a 5 GB+ repo once or twice and they were not really source code, but mildly "abusing" github CDN/releases for downloads.

At least gitlab is not deleting any data, just rejecting pushes if you're over the limit.


You are correct in your assumption - the storage limit impacts SaaS only. I added some context on the rollout in a previous comment. https://news.ycombinator.com/item?id=32387597


Anybody found out which projects on gitlab exceed the 45 TB limit?

I'm curious what kind of project would even need such a repository size. From a distant view this sounds like heavily mismanaged build artifacts in the project's git history; or abused storage for free CDN of video data or similar.


Maybe a super active repository with a large build matrix? For instance, a repository with 10,000 commits, 25 artifacts per build, and 200 MB/artifact will take 50 TB. It is still ridiculous though.


Are releases/tags also counting in on that?

I was re-reading the announcement but as far as I can tell it only states repository size everywhere.

The linked cleanup instruction articles also mostly show repo size (git gc et al) and nothing related to the releases section of a repository. [1]

I'm also still unclear about what exactly is part of the package registry storage and whether or not releases in the repository are part of that.

[1] https://docs.gitlab.com/ee/user/usage_quotas.html#manage-you...


This doesn't affect me, but a better way to handle this would be to sell extra storage at, say, double GitLab's cost. Digital Ocean sells 250 GB object storage at $5/month and $0.02/GB beyond that.


GitLab team member here. GitLab offers additional storage for $60/year for an additional 10GB of storage and 20GB of transfer/month.


Yes, $0.50/GB/month seems like very reasonable pricing /s

That's over twice as much as the disks to store that much data cost. Every month.


Yeah that really hurts. A single Unity project can easily reach that in a short amount of time. When I get that warning email in a few weeks, I'll be shopping around.


I see now that it was mentioned at the end of the post. (Though I did specify "double GitLab's cost".)


So, we either go to Github, where our licenses are abused for their shitty ML.

Or we pay $20/month to Gitlab. And I can't figure out how the quotas will intersect with "professional", if at all.

For us Open Source devs, neither is a good option. Although I have heard good things about sr.ht / sourcehut. And for the service, it appears to be fair https://sourcehut.org/pricing/


Why do you need more than 5gb for a git repo? IMHO, any repo above 100mb should have an exceptional reason for being so big, and if it's just code a normal repo is more like 1 to 10 mb in size, max.

You don't even need to pay gitlab, free tier can do for most stuff and there's a generous sponsorship for Open Source.

I don't get all the hate Gitlab is receiving these days.


I do hardware hacking, presentations, code, pictures, and full media to reproduce what I've shared.

FreeCAD files get big. And to accommodate easy printing, STLs are also needed.

KiCAD for board layout and schematics can also get larger. And remember, these also have 3d board components too.

Presentations are naturally larger.

Full high-rez pictures eat storage like you wouldn't believe.

Code is small, thankfully.

One such device I have created is hovering around 4.5GB for a full reproduction of the current snapshot. And if you do the command to pull the whole history (and not just the current state), it's around 10GB. And I'm not sure whether GL is counting the whole history or just the current state? And are they pruning old history after a certain date?


> And, I'm not sure if GL is counting the whole history, or the current? And are they pruning old after a certain date?

GitLab stores the Git history with all data revisions as commits. To reduce the repository storage [0], older Git commits can be pruned but this would rewrite the Git history as an invasive change [1]. To store larger files, it is recommended to use Git LFS [2]. Alternatively, you can use the generic package registry to store data. [3]

[0] https://docs.gitlab.com/ee/user/usage_quotas.html#manage-you...

[1] https://docs.gitlab.com/ee/user/project/repository/reducing_...

[2] https://docs.gitlab.com/ee/topics/git/lfs/

[3] https://docs.gitlab.com/ee/user/packages/generic_packages/
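
As an illustration of [3], publishing a large asset to the generic package registry is a single authenticated upload (the file, package name, and version here are placeholders):

  curl --header "PRIVATE-TOKEN: <token>" \
    --upload-file ./enclosure-v1.stl \
    "https://gitlab.com/api/v4/projects/<project-id>/packages/generic/cad-files/1.0.0/enclosure-v1.stl"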


Heavily contributed-to repositories can get quite large, even if they are just plain code. Sure, not quite 5 GB large, but still.

Linux:

  $ git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  $ du -h linux/.git
  ...
  2.8G    linux/.git
Git:

  $ git clone git://git.kernel.org/pub/scm/git/git.git
  $ du -h git/.git
  ...
  120M    git/.git
DefinitelyTyped:

  $ git clone https://github.com/DefinitelyTyped/DefinitelyTyped
  $ du -h DefinitelyTyped/.git
  850M    DefinitelyTyped/.git
home-assistant/core:

  $ git clone https://github.com/home-assistant/core
  $ du -h core/.git
  380M    core/.git

The above also only takes into account repo size, while the GitLab limit applies to everything including release artifacts, CI build artifacts, hosted containers, etc. Many of these GitLab currently provides poor or sometimes even zero support for purging.


I don’t think using the Linux kernel, which is the OG repo that git was created for, is a fair comparison. Though, I guess since it’s barely 3GB, it’s proof that you almost never need >5GB.


It's not just Git. Gitlab hosts a container registry for every repo, so if you build and store a new container image regularly then you could easily top 5GB.


Build a React hello world and you've got a gigabyte Docker image.


I have recently migrated my projects to fossil (https://fossil-scm.org), hosting them myself on a cheap vps.

I also really got hooked on their code/project management philosophy, and it made me rethink a lot about how I want to run collaborative projects in general.


Personally I use codeberg[1] which doesn't have all the fancy bells and whistles but for personal projects it runs like butter.

[1] https://codeberg.org/


Git works perfectly fine without a web interface like that. It's not popular nowadays, but you can just keep the primary repository on your own server and merge patches by email.


> So, we either go to Github, where our licenses are abused for their shitty ML

As long as it's on the internet MS can scrape the data so not being on Github is only a temporary defense.


But GitHub's TOS can only protect them from GitHub repos.

Whether it has an impact on Copilot or not is not clear.


Except not all contributors to projects signed the TOS since you can upload an existing project with existing non-Github contributors. One person uploading a project does not override or change the license of that project. So MS is already not relying on the TOS for legal protection which means they feel legally they can train on any OSS projects.


Well, at least in the US training an AI will probably fall under Fair Use. In the EU there is an explicit copyright exception for data mining. So I don't think there's a legal obligation for Microsoft to only train within the bounds of public GitHub repos.


> Well, at least in the US training an AI will probably fall under Fair Use.

Provoking question: so, will it be OK to train an AI on leaked Microsoft code, then publish the models (not the code)?

Of course, MSFT won't accept this, but we know that big corps are hypocritical by default.


AI fair use is not really clear anywhere.

I think SFconservancy's articles about CoPilot are very helpful.


Hi there, in another reply I mentioned GitLab for Open Source provides Ultimate features and higher limits to Open Source projects for free: https://news.ycombinator.com/item?id=32387621


Yes, and even the Ultimate plan limits will be significantly impacted. If they are not already self hosting now is a good time to look into it or explore other lightweight options like Gitea.


GitLab just can't stop shooting themselves in the foot.

Driving away individuals is apparently their strategy now.

Sad. I used to teach new developers starting with Gitlab pages.


I'm just going to move to GitHub at this point.


Yup, it's looking to be the only platform where projects can realistically grow.

But after Copilot I still don't want to use it; on the contrary, I'm switching to GitLab because all the free stuff on GitHub is just unrealistic and is just there to play the long game.

Also, I don't want to invest my time and effort into a proprietary platform like GitHub Actions, whereas GitLab CI and most other core features of GitLab are open source.

https://sfconservancy.org/blog/2022/jun/30/give-up-github-la...

If GitLab hadn't added the 5 person limit per group, most people would be fine, but now there's the upcoming bandwidth limit, the 5 person limit, and many others.

It's almost impossible to grow on the platform except with the OSS program, which requires a legal agreement with each org member.


I love GitHub Copilot. In fact, I am more inclined to use GitHub simply to help improve their tool.


Reading between the lines, it also says it's going to enforce a 10GB limit on paid tiers.

> Namespaces on a GitLab SaaS paid tier (Premium and Ultimate) have a storage limit on their project repositories. A project’s repository has a storage quota of 10 GB.

Even though it's not mentioned as a change nor in the timeline, that limit does not currently exist.


This change is one of the only MRs about these new limits discussed in the open.

https://gitlab.com/gitlab-org/gitlab/-/merge_requests/91418


Just wanted to point out that it looks like people will be able to go over that limit but will have to pay extra for it according to https://docs.gitlab.com/ee/subscriptions/gitlab_com/index.ht... . Seems somewhat reasonable.


Does the storage measurement work now? The container registry and build artifacts used to not be measured


Yes, the container registry and build artifacts are now measured for Free users.


Mine is still showing negative usage for some repos. The most negative is showing -1.08 GB, both from my profile and from the project page.

Repo 130.82 MB, Artifacts 12.74 MB, Wiki 51.20 KiB, everything else 0 bytes.

Been like this for at least a couple weeks.

This one is public, but I have others that are private with negative values.


Looks like the issue is tracked in https://gitlab.com/gitlab-org/gitlab/-/issues/368326

Example of a public project with negative project storage: https://gitlab.com/strivinglife/book-raspberry-pi


https://gitlab.com/nbdkit/nbdkit/-/usage_quotas

The headline number is 1.1G but the container registry is 16G.

Edit: You should definitely add a simpler way to delete old pipelines. Having to manage it yourself through the API is a pain. (https://stackoverflow.com/questions/71513286/clean-up-histor...)
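
In the meantime, the API loop is short; a rough sketch (untested; <project-id>, <token>, and the cutoff date are placeholders, repeat per page):

  curl --silent --header "PRIVATE-TOKEN: <token>" \
    "https://gitlab.com/api/v4/projects/<project-id>/pipelines?updated_before=2022-01-01T00:00:00Z&per_page=100" \
    | jq -r '.[].id' \
    | while read -r id; do
        curl --request DELETE --header "PRIVATE-TOKEN: <token>" \
          "https://gitlab.com/api/v4/projects/<project-id>/pipelines/$id"
      done
  # Deleting a pipeline also removes its jobs, artifacts, and logs.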


Bitbucket anyone? Or Atlassian is no longer an option?


How much did data hoarders use Gitlab for their backups?


Why are people checking build artefacts into the repo?


Playing devil's advocate: sometimes your build is non-reproducible, annoying to build, or both (for instance, needing a proprietary tool which can only run on a particular developer's laptop because the license is tied to that particular hardware, and which crashes half of the time for no particular reason). Keeping the build artifacts in the repository means you can reproducibly obtain that exact artifact, even years into the future.


This limitation is not just about code; it will also affect all artifacts like the container and package registries, pipeline artifacts, LFS, the lot.


GitLab keeps giving me more reasons to go to Gitea.


I stay on:

https://sourcehut.org/

No bling, no social-media drama, just clean, straightforward code hosting.


Fair enough.


[flagged]


I'm confused at the connection between Microsoft/Github and the decision that Gitlab made here.


I think GP just rushed the free hate


You are confusing Gitlab with GitHub?


But this is related to GitLab…


Haha, great.


With this change, the 5 user limit[1], and original intent to delete dormant repositories[2][3], it seems as though GitLab is no longer able to support the free side of its business. GitLab has been touted as more OSS-friendly than GitHub, but a large part of the OSS ecosystem depends on free repositories. With these changes and this trajectory, I can't see myself putting another OSS project on GitLab.

It's a shame it's come to this, but I'm confident GitLab didn't make this choice lightly. It must be done in order for them to stay afloat.

Thank you GitLab team for your efforts. I hope you guys are successful in your future endeavors.

[1] https://about.gitlab.com/blog/2022/03/24/efficient-free-tier...

[2] https://www.theregister.com/2022/08/04/gitlab_data_retention...

[3] https://www.theregister.com/2022/08/05/gitlab_reverses_delet...


I think a lot of the problem is how they communicated it. It isn't like "Well hey, we have a problem here"; it's just "Hey, here's this super complicated solution to a problem you didn't know existed, and you can take it or leave it."



