A one-line change decreased our build times by 99% (medium.com/pinterest-engineering)
462 points by luord on Oct 26, 2020 | 243 comments



I think it takes some real humility to post this. No doubt someone will follow up with an “of course...” or “if you don’t understand the tech you use...” comment.

But thank you for this. It takes a bit of courage to point out you’ve been doing something grotesquely inefficient for years and years.


I'd be interested to know how they came to realize what was missing. Did they read the Jenkins docs more thoroughly? Post on a mailing list? See something on StackOverflow? Hire a consultant?


I would expect that internally someone profiled the build (i.e. looked at timestamps) and then either profiled git, or just looked at the logs and did some guessing/research. This didn't seem like it would be complicated to find once you realize the time is spent in git.

Also, this probably has been an exponentially increasing problem, and wasn't really a priority to solve until relatively recently. I would bet there are a lot of stale undeleted branches.


It really doesn't sound complicated to find, unless you just have a hands-off approach to building things and don't care as long as "something" comes out the other end.

What makes me wonder, however, is this: it took 40 minutes before they looked into it? 40 minutes is crazy long. What builds take that long? Chrome, Windows, the Linux kernel on a single core? This should have been raising red flags much earlier. The only explanation I can come up with is that the whole build takes hours anyway; otherwise there is no way you wouldn't notice this sooner.


Remember "build" time likely also includes test runs, packaging for deployment, ... 40 mins is easy to get to, and has nothing to do with "on a single core".


You can still see which stages are taking a long time.


I wonder if people got into the habit of synchronizing git pushes with socializing breaks, with the proverbial excuse of "yeah, compiling ...".

One day, someone forgets to brief the foreign intern about the necessity of breaks, the intern fixes the issue, the pointy-haired boss gets wind of the news, the old crew gets fired, and the new intern gets promoted and also fixes the Pinterest spam on Google Images.


> new intern gets promoted and also fixes the Pinterest spam on Google Images

A man can dream.


It takes >12h to build Windows on the MS build platform.

On a single core, Chromium surely takes hours to build.

Though I agree that 40min for the repository in question is highly suspect.


> On a single core, Chromium surely takes hours to build.

Earlier this year Bruce Dawson had a post indicating that it took about a CPU-day, though coalescing files (“jumbo builds”) significantly reduces build time (we're talking down to 5h). However, that comes at the expense of incremental building, and it constrains the code, as you can get symbol collisions between the coalesced files.


I read that comment as only applying the "single core" part to the Linux kernel, not Chromium.


I work in games. A clean sync of our project takes well over an hour (and if you're at home it takes multiple hours), and compiling takes a large amount of time. We use lots of tricks, for example "unity" builds, with our build system detecting modified files and compiling them standalone. On my last workstation (2x Intel Xeon Gold) it took about 30 minutes to compile.


Yep, I’m shocked that it had to be bloated to 40min before they even thought about fixing it. Anyone who has used Jenkins for nontrivial builds must have had the experience of staring at the slowly expanding session log screen? It doesn’t take any “profiling” to realize git clone’s taking forever.


I remember hearing facebook was on the order of hours (6+) to build.


I would bet on "the new recruit found it, 45 minutes into the onboarding process".


My first task at my current job, after familiarising myself with the codebase, was to improve the CI pipeline.


I think everyone has done this, sometimes it really does take a second set of eyes


In my experience it only takes someone who is not drowning in a backlog of tasks yet.

That tends to be the beginners during the onboarding weeks.


The beauty and power of the beginner's mind.

It can see things that were there all along, but everyone who has been there has developed a blindness to.


I see this kind of stuff at companies where only one or two developers work on a project, or the team working on it hasn't had much experience on other projects.

An example would be a company I worked for who ran a pretty standard LAMP setup but had never heard of memcached. Simply adding that reduced the database load by like 90%.


The same way it always happens to me: Over time build time creeps up and you don't really look at this code thinking "I implemented shallow clone years ago so there is nothing I can do, it's slower because we have more code."

Until you or some other person looks what the code is doing.

It could also be that it was a new hire. I shallow clone a huge monorepo similar in commits/branches and it takes seconds. My experience would instantly tell me that something is worth looking into.


[flagged]


Why are you so angry about this? You've commented throughout this post about how this is boring and the Pinterest team is incompetent. Why?

I found it quite interesting. I've been working in deployments for over 20 years at some pretty big places, and never really thought about this before. I now have a new tool in my toolbox, and I'm quite happy about it.


In general I'm frustrated that rigor, standards, etc are out the window in favor of all this warm fuzziness. I guess I might be angry...

The culture change in our industry, towards warm fuzzies and away from tech screens, results in calculable waste. Time, money, electricity, customers. We lose good engineers and tell ourselves they were a bad fit. We push crap on users just to sell ads. Then we write engineering posts to brag about fixing our own mistakes. It's a terrible shame and I speak up about it to remind everyone there was a time when RTFM would be the only response to this.

Edit: rate limited but one last thing: are we this forgiving of Equifax when they oopsie our data? Seeing this would immediately make me wonder if anything I have shared with Pinterest is safe. That's why they owe us a postmortem and not a thirst trap.


> are we this forgiving of Equifax when they oopsie our data?

The kind of culture you're in favor of, blaming engineers for mistakes and punishing them, is exactly what makes the kind of Equifax mistake possible. Suddenly, people stop improving things and just do the minimum possible so they can keep their job, since anything else can cause a mistake that will cost you your next performance cycle (or, even worse, your job).


I know I have certainly left cans of worms closed at various companies because I knew there was only risk and no reward, even though I would have loved to tackle the problem. Fix a major problem working late and weekends and you get an attaboy at the weekly conference. Do a good job but introduce a relatively minor bug that causes a slight delay in deployment and you get on the manager's shit list for the rest of your time there.


I'm talking about blaming and punishing management, not engineers. I'm sorry that wasn't clear.


1. You seem to be projecting a lot of your own thoughts and biases onto this article, and this discussion.

2. RTFM made sense when you wrote code in one terminal window and compiled it in another and then shipped a CD. There is no way you can RTFM for every tool you use at a modern software company. NO ifs ands or buts, it just is not happening. It sounds like you are looking at a nostalgic view of the past, and not understanding the context of the scope and scale of software that is built these days.

3. Engineers should be encouraged to share their learnings with each other to collectively 'raise the tide'. I will never pooh pooh a development team wanting to share their learnings, even if you may or may not think it was a good idea, it may have helped their team or someone reading the article.

4. We've been pushing crap to sell ads since advertising began, grow up and take a longer look, the technology has complicated it, but it's still the same as it always was.


I personally do think there is a systemic problem in companies trying to hire the bare minimum in skills/experience for technical roles and ending up with people operating right at the limit of their abilities (and intermittently being in over their heads).

But I do agree RTFM is frustrating advice for a lot of things. Especially if you aren't going to use such information often enough, so you will keep forgetting it anyway and have to start over with the manual each time.


> But I do agree RTFM is frustrating advice for a lot of things. Especially if you aren't going to use such information often enough, so you will keep forgetting it anyway and have to start over with the manual each time.

This is related to one of my biggest frustrations of the last ten years. We used to master the tool set. But now there are so many tools, and some of them we don't use often enough. I find myself doing a lot of guessing, whereas before I knew what was going on at every step.


> The culture change in our industry, towards warm fuzzies and away from tech screens, results in calculable waste.

I have not noticed, in the past decade, any move away from tech screens whatsoever.


Are we talking about technological screens, like LCD screens here?


I believe they mean screens in the sense of technical interviews to assess candidates' technical ability.


I don’t know, I consider myself fairly competent but I’d never even considered that. It’s just not so relevant until your repo is multiple gigabytes big.

Still, I’ll see if it works for our pipelines, and we can get our clone from 20s to 1s


Good teams profile everything. This team's only goal is to support other engineers. Build time is a huge issue for every ops team. Missing this for so long is wasted money that's easy to calculate. We can be nice to people while still having high standards. It's a missed opportunity for a deeper postmortem, and it's bland content at best.


I think you live in a highly theoretical parallel universe. The one I occupy is the one where 'good teams' profile those things that take too long.

Take yourself as an example: in spite of the wide availability of free certificates you are still hosting your domain without using a secure transport layer. Some would take that as incompetence. Others would assume you have more stuff on your plate rather than that you don't have high standards.


And others would assume that there are other philosophies out there regarding SSL everywhere. So the question is whose POV has more validity and logical rigor attached to it. I actually can't see any side winning here on a purely logical level, only on an ideological level. At least as long as we are talking about consuming public information.

Am I in favor of the aggressiveness of OP in other posts? No. Am I using SSL myself? Hell yes.

Nonetheless, I understand that there are people who feel that consuming public information like on a private homepage is nothing that necessitates using SSL. Even if I myself have a different ideology/value set governing my decision.

I once heard the comparison that it is like the difference of sending a letter and sending a picture postcard. Not sure if I buy into that, but I can't argue against it on a purely rational basis.


Yes, it's by choice. It's read only, public information. We don't set cookies or anything.

We take security very seriously. But we don't take anything too seriously.

Edit: by the way, persuade me that there's an upside and I'll turn it on.


> The one I occupy is the one where 'good teams' profile those things that take too long.

Deciding which things take too long is profiling. Maybe you do it in your head or with pencil and paper instead of using a software approach but I think your position aligns with "good teams profile everything".


This is neither incompetence nor surprising. Maybe you’ve only worked at large companies who have had time to optimize things for years (and even then, I see grotesque software decisions at my large company quite often). Try accepting that software is often written poorly optimized on the first pass, for good reason, and learn to celebrate the wins without needing to shame someone.


This is Pinterest. Every org I've worked at has been smaller. There's space between shame and ignoring mistakes.

The purpose of this post is not to educate. There's nothing in here that anyone can use to improve. It's just marketing.


Well I learned something from this article, and I thought I had a good handle on CI/CD, so either I am incredibly stupid and shouldn't be reading these 'nothing' articles, or maybe there is so much to learn it's impossible to know it all.


If you don't have massive repos, then this is the sort of thing that is not often a big problem. Also, if you are using things like GitLab runners, you might be in the same AZ, and even large repo clones are fast.

And it is impossible to know it all, I like these articles just for the differing ways people work


It sounds like GP has a bit of a chip on their shoulder when it comes to feeling like you're not worthy if you don't know everything.


Are you just an angry ex Pinterest employee? There is something to learn here and a reminder to pay more attention even when you're knee deep in other tasks. Besides the obvious feature/limitation of git that the author points out.


A team discovers a major efficiency win requiring minimal engineering effort and your response is to... punish them?


People praise my git skills at work (among other things) when they come to me for git help.

My response is always the same: I've just run into these bugs more often than they.


Shocking? Incompetent? A hiring postmortem? Really? AYFKM?


They did a billion dollars of revenue last year, their management and hiring systems seem to be getting the job done.


Yep--although multiple unpleasant experiences with pinterest have spurred me to permaban it from search engine results and smite it with network filters, somewhat wasteful CI/CD pipelines have clearly not prevented the company from flourishing.


I don't get why they have to clone their repo frequently in the first place. It seems to me like a brute-force use of a version control system that's prone to high cost.


It is a nice and foolproof way to get a clean working environment: just download everything from nothing. And you want different working folders for different jobs anyway, so they don't mess with each other or build up state between jobs due to scripting mess-ups.


I don't know about a big org like Pinterest, but it's pretty common for "clone the repo" to be the first step of a CI/CD pipeline when using something like CircleCI or GitlabCI.

It's an easy (if inefficient) way to always get the latest changes and if you have disposable build-runners then it all gets thrown away at the end of the pipeline.


It is interesting that we trust our tools so little. A git hash is a pretty robust way to know whether the code in the repo is what it is supposed to be, so a "git fetch" rather than a fresh "git clone" should be safe, but we can't trust the build steps to not trash the build-runner so the entire thing needs to be thrown away.

Edit: for context, I wrote this comment while waiting for `npm ci` to run. Its first step is to delete the node_modules folder, as otherwise it can't be trusted to update correctly.


> we can't trust the build steps to not trash the build-runner so the entire thing needs to be thrown away.

I think it's partly this, and partly that everything is shared infrastructure now. I don't want to pay to keep a machine up 24/7 just to use it to run a build for 10 minutes half a dozen times per day.

So instead I lease time on shared hardware with ephemeral "containers" or "virtual machines" or whatever.


Jenkins has a setting to keep the checkout directory (default) or to clear the directory between builds.

At my last job, the default was letting broken changes pass the build: they would break some step of the setup/run process that isn't run on a partial build. New joiners came in and they couldn't build because the build was broken.

Had to fix it by setting up two jobs, one running from scratch (30 minutes) and one incremental (10 minutes). The build from scratch was catching a broken change or two every week.


Ephemeral CI runners. I have the same problem at work - 4GB repository that is redownloaded on every single pipeline run.

Another reason (which is why we went for ephemeral runners in the first place...) is that if you have stuff that mounts a directory from the repository as a volume in a Docker container (e.g. for processing data), you may end up with the Docker container frying permissions in the repo folder (e.g. 0:0 owned files). Now, you can put a cleanup step as part of the CI (e.g. docker run --rm -v "$(pwd)":/mnt <image> sh -c 'chown -R $runner_uid:$runner_gid /mnt')... but unfortunately, GitLab does not allow a "finally" step that always gets run, so if the processing fails, the build gets aborted, or the server hosting the runner crashes, the permissions will be fried and a sysadmin will need to manually intervene.

An ephemeral runner using docker:dind however? It simply gets removed.


In order to start with a clean slate and to guarantee state and absence of artefacts from previous builds/pulls it is common practice to start off with a clean directory.


Better title: A one-line change decreased our "git clone" times by 99%.

It's a bit misleading to use "build time" to describe this improvement, as it makes people think about build systems, compilers, header files, or cache. On the other hand, the alternative title is descriptive and helpful to all developers, not only just builders - people who simply need to clone a branch from a large repository can benefit from this tip as well.


Right, from the article:

"This simple one line change reduced our clone times by 99% and significantly reduced our build times as a result."

So the title is just completely wrong.


There's also this part of the article:

"We found that setting the refspec option during git fetch reduced our build times by 99%."

So, the article contains contradictions.


They set out to reduce build times, not to reduce git checkout times. It turns out that 99% of the entire build was spent downloading code.


Where does the article say "99% of the entire build was spent downloading code"?


The title. If they reduced the build time by that much, then at least that much of the build time must have been spent downloading code.

If the title is a lie (which it probably is), then nevermind that number, but it's clear where it came from.


The text of the article clearly states that clone time was reduced by 99%.

The only way build time could have been reduced by 99% is if every part of the build other than cloning is negligible. It is far more plausible to assume that the title is simply wrong.


It quotes a jenkins job going from 40 minutes to 30 seconds.


They say "Cloning our largest repo, Pinboard, went from 40 minutes to 30 seconds"

Presumably the build does more than just clone


This isn't true either, as the article says that builds went from 40 minutes to 30 minutes. The time spent cloning was presumably about 10 minutes and came down very far, presumably by 99%.


> the article says that builds went from 40 minutes to 30 minutes.

Where in the article does it say that? The article says this:

> This simple one line change reduced our clone times by 99% and significantly reduced our build times as a result. Cloning our largest repo, Pinboard went from 40 minutes to 30 seconds.

Both of those sentences say the clone time was reduced by 99%. There are no percentage numbers given for how much the build time was reduced, nor any numbers about the total build time.


It says from 40 minutes to 30 seconds, not minutes.


I stand quite corrected. Sorry, all!


This reminds me of my first programming job in 2005, working with Macromedia Flash. They had one other Flash programmer who only worked there every once in a while because he was actually studying in college, and he was working on some kind of project from hell that, among other problems, took about two minutes to build to SWF.

Eventually they stopped asking him to come because he couldn't get anything done, and so I had a look at it. In the Movie Clip library of the project I found he had an empty text field somewhere that was configured to include a copy of almost the entire Unicode range, including thousands of CJK characters, so each time you built the SWF it would collect and compress numerous different scripts from different fonts as vectors for use by the program. And it wasn't even being used by anything.

Once I removed that one empty text field, builds went down to about ~3 seconds.


I take it that this is not something he added himself, but was likely a catch-all default of textfields at the time?


Yep. In order to use non-standard fonts in Flash I recall you had to embed the fonts, even if the movie clip containing the textfield was not being used anywhere.


This is the most I've ever gotten out of Pinterest. Other than this, it's just the "wrong site that Google turns up, that I can't use because it wants me to create an account just to view the image I searched for".


Can we not do the thing where we pick an organization from an article and then bring up the most generic complaint you can about it in a way that is entirely irrelevant to the post? We get it, you don't like Pinterest showing up in search results, nobody does. But this has absolutely nothing to do with the article other than it being pattern matching on the word "Pinterest", which is about the least informative comment you can make aside from outright trolling or spam. There are threads that come up from time to time where such comments would be appropriate, if not particularly substantive.


I guess you're right. I've not noticed this being a topic before, and I should have spent more words telling that the article in question is actually quite interesting, it definitely made me consider our own Jenkins setup.


Thanks :) I don't want to make it seem like I'm after you in particular, it's just that you were the top comment in this thread and it's long past the time of night when I should have logged off and gone to bed, so my patience for this was just a little thinner than it usually is. It's just that enough people have done this that I figured I might as well steal the second-to-top comment spot with this in the hopes that they might see it and not do it anymore.


So if Monsanto/Bayer had a post about their bio informatics stack, you'd expect nobody to complain about the company and its business practices?

Sometimes the negative impact of a company is just more interesting to people than what the article brings to the table.


It’s not surprising when certain firms evoke a strong personal feeling, but it’d be terribly exhausting if every article about, say, React, attracted the annotation that Facebook is the Philip Morris of media. The subsequent discussion then tends toward the divisive and derisive rather than the illuminating and informative. Hard to tell anyone they should suppress what they feel, but overall I’d tip the balance towards “fewer like this please.”


I think it's valuable to keep saying it because otherwise we start thinking it's okay to fetishise a company's products just because they're technologically interesting. If a company made them on the back of incredibly shady and unethical dealings, they shouldn't be getting free advertising here.


Who here is fetishizing products because they learned something from the engineering blog? This is not happening.


I wouldn't expect it, because I have been here long enough to know that that is just not going to happen, but I would very much like it to be so, yes. Rehashing the same topic whenever you see something tangentially related is just a lazy karma grab, not an attempt at creating interesting, insightful conversation.

Look, I get it, sometimes you want to rant about a company that you think is doing something you don't like: my point is that we have specific threads for them where such a comment could at least be on-topic. When you come to an article about Pinterest doing some git thing to make their builds faster and your comment is "they're ruining my search!", you're commenting at the level of someone who hasn't read the blog post.


The point is it’s not directly relevant to the article, and on top of that GP’s particular complaint was especially generic. In this case Pinterest’s negative impact is not that interesting and it’s constantly discussed too.


HN has always been very predictive.

Praise Microsoft for turning the corner, Dislike Google for ads and snooping, Praise Apple for privacy, Dislike Zoom for privacy, Dislike Pinterest for middlewaring Google Image, and so on.


I'm not even complaining about Hacker News being predictive, we all know that likes to have certain conversations and there is no stopping that. My only request is that this doesn't happen in every single thread regardless of whether it is relevant or not. (To be clear, I am "guilty" of the former myself; there are a handful of topics that I have a particular opinion about and I don't hesitate to share them even if I have mentioned them many times before. I just try to not bring them up in places where they clearly have no connection to what's being talked about.)


Friendly amendment: *predictable


Sorry no. If an article is paywalled, on Pinterest, or similar, then please let's discuss the source instead, even if it ruins the discussion, so people learn not to post such links.


TFA isn't paywalled, or on Pinterest, or similar.


Paywall complaints are explicitly off-topic: https://news.ycombinator.com/item?id=10178989. I am not a moderator, but I think I've made it clear that I personally consider comments like the one I responded to be as well.

FWIW, in the all years I have been on this site, I have seen this happen regularly and I have yet to see any reduction in such links or these kinds of discussions. Seeing as you've been here longer, I'd be curious to hear about why you might feel differently.


We heard your complaint but you are acting entitled now. People are free to register, free to comment; if you don't like it, downvote it. It is the top comment, which means it is being upvoted. Get over it.


I tend to downvote very rarely and only for clear violations of the rules, not for comments I don't like. Telling the author why you didn't like something they did often gets them to change or explain their behavior. Just because something is upvoted doesn't mean it is something that should be on Hacker News.


I just don’t mind repeating myself whether it changes anything or not I guess. Simply because discussing paywalled links or Pinterest linked is invariably more interesting than whatever is found (or not found) when following those links.


I am not sure why google does not penalize this behavior in their search ranking.


The most frequent search keyword that I use is "-pinterest"


Yes, there seems to be no way to make it clear to Google that we want to never see certain websites in our search results. Yet, Google claims they need our information to "improve our experience".


If Google wants my information to improve my experience, I'd love to be able to vote search results up or down. Or entire sites, like pinterest and content farms.


Wasn't that a thing? I remember a +1 button somewhere on Google Search

Edit: I misremembered, it was a social network thing from Google+ https://www.techspot.com/news/43064-google-adds-1-button-to-...


There was a thing called SearchWiki where you could adjust your own results. It didn't last long.


For what it's worth, DDG image results don't get spammed by Pinterest. While my browsing is a drop in the ocean compared to Google's market share, using a Google competitor is as clear a signal as one can send that you're unhappy with the Google service.


-pinterest should be a search extension.


That's my experience too. Imagine how many views they have lost over the years, just because they require a login.

And shame on you, Google, for playing along and indexing their shit, when it's not visible when I click through.


This fact has forced people to write browser extensions to filter Pinterest out.

I opt for the "teach non-tech people how to dork" route instead: https://soatok.blog/2020/07/21/dorking-your-way-to-search-re...


This is one situation where a duckduckgo search is objectively of a better signal/noise ratio.


Yeah, I always believed it was some kind of lone evil AI that lives through search results.


The worst experience is when, while doing a web search, you find a dead link whose useful information now exists only as a Pinterest snapshot...


Y'know, I actually made a Pinterest account once because of one particular picture I really wanted. Guess what, even with an account you can't have it. Oh well, guess I'll just let it go.


They also created/maintain the kotlin linter, "ktlint".


On my first job, 20 years ago, we used a custom Visual C framework that generated one huge .h file that connected all sorts of stuff together. Amongst other things, that .h file contained a list of 10,000 const uints, which were included in every file, and compiled in every file. Compiling that project took hours. At some point I wrote a script that changed all those const uints to #define, which cut our build time to a much more manageable half hour.

Project lead called it the biggest productivity improvement in the project; now we could build over lunch instead of over the weekend.

If there's a step in your build pipeline that takes an unreasonable amount of time, it's worth checking why. In my current project, the slowest part of our build pipeline is the Cypress tests. (They're also the most unreliable part.)


At my second job in the industry I worked on a Python project that had to be deployed in a kind of sandboxed production environment where we had no internet access.

Deploys were painful, as any missing dependency had to be searched for on our notebooks over 3G, then copied to external storage, which was then plugged into a Windows machine, uploaded to the production server through SCP, and then deployed manually over SSH. Sometimes we spent hours doing this again and again until all dependencies were finally resolved.

I worked there for almost a year, did many cool gigs and learned a lot. But my most valuable contribution came when, at some point, tired of the unpredictable torture that deploys had become, I started researching solutions. I set up a PyPI proxy on one of our spare office machines and routed all my daily package installs through it. Then I copied the entire proxy contents onto the production machine before every deploy, and voila, no more surprises.

I left this job a few weeks later, but have heard that this solution was very useful for many devs that joined the team afterwards.


I suppose no Docker containers were allowed in prod either?


Of course not. That was before docker, circa 2010. Our production environment was impossible to recreate.


> If there's a step in your build pipeline that takes an unreasonable amount of time, it's worth checking why. In my current project, the slowest part of our build pipeline is the Cypress tests. (They're also the most unreliable part.)

Would you say the (slow and unreliable) Cypress tests are worth it still?


I don't know. We need some sort of e2e tests, and all e2e test frameworks are terrible in one way or another. Cypress is okay. I would prefer to only run it on production or the dev server and have alarms go off when they fail, but either the requirement is, or other developers have decided that it's necessary to pass all e2e tests before a feature branch can be merged into the master branch.

And I get the reason for it; you don't want to accidentally merge breaking changes. But it does make our build pipelines very slow and unreliable.

So are they worth it? I don't know. If I had my way, we'd only run them on master, and not make it a requirement for feature branches to pass them. Because if you fix one tiny thing, you now have to wait 15 minutes again for the Cypress tests to run. I think they'd be better in a different setup than what we're doing.


We had similar issues with integration tests, and made them a separate jenkins job that didn't trigger automatically, but gitlab was still configured to require them to pass for merge. We would kick it off manually only after all other code review was complete. Then the only cases where we had to re-run it were the same cases where it would have failed in master if we only ran the test there, but it saved us the hassle of reverting or feeling pressured to get hotfixes into master quickly.


Check out https://reflect.run/ as a replacement for Cypress. I started using it recently to do E2E testing at work in our staging environment to run a suite of tests before we move anything to production.

So far it's been great and has saved a couple of releases in a month or so of use!


That's the nature of UI tests for the most part. IIRC Cypress tests are written declaratively, which would make them even more unreliable and slow, albeit easier to fix.

Personally I've recently started using Playwright and I'm quite happy with it. There was occasional misunderstanding of their API, but 95% of time it's great. Microsoft is kicking butt these days.


Cypress is horribly unreliable. We used to use it, and tests would pass, then fail on subsequent runs with no code changes, due to internal bugs within Cypress screenshot plugins, if I remember right.

I have no idea if it is any better now, but we dropped it about 6 months ago in favor of pure Selenium C# for our UI tests.

edit: a word


> In my current project, the slowest part of our build pipeline is the Cypress tests

Oh man, I feel your pain.


Personally I think longer tests (like a full Cypress run) should not be a boundary to merging in prod if they take more than 10 minutes, but should be run nightly or continuously in the background.

I've not yet had the opportunity of having a large Cypress suite (working on it as we speak), but is it still more stable than e.g. Selenium is? Honestly 80% of issues we had with that were 'unstable' tests.


Exactly. I would much prefer a setup like that over our current rule that all cypress tests must pass before merging.

A better rule might be that at least one unit or e2e test was added or updated to reflect the change in the code, and that that particular test succeeds. But run all the others on master.

One advantage (or occasional disadvantage) of Cypress test before merging, is that there is someone clearly responsible for fixing it if a test fails. Problem is, sometimes the failing test has nothing to do with anything the creator of the pull request did. It's still a mystery how that's possible, but it happens. Hence my feeling that Cypress tests aren't very reliable. At least some of ours aren't.


Unfortunately, the issues we had with Cypress were with the framework itself, not the tests.

I used to write automation, and I can say that Selenium tests can be written to be very stable. Just depends on how they are written.


I sympathise a lot with this post! Git cloning can be shockingly slow.

As a personal anecdote, clones of the Rust repository in CI used to be pretty slow, and on investigating we found out that one key problem was cloning the LLVM submodule (which Rust has a fork of).

In the end we put in place a hack to download the tar.gz of our LLVM repo from github and just copy it in place of the submodule, rather than cloning it. [0]

Also, as a counterpoint to some other comments in this thread - it's really easy to just shrug off CI getting slower. A few minutes here and there adds up. It was only because our CI would hard-fail after 3 hours that the infra team really started digging in (on this and other things) - had we left it, I suspect we might be at around 5 hours by now! Contributors want to do their work, not investigate "what does a git clone really do".

p.s. our first take on this was to have the submodules cloned and stored in the CI cache, then use the rather neat `--reference` flag [1] to grab objects from this local cache when initialising the submodule - incrementally updating the CI cache was way cheaper than recloning each time. Sadly the CI provider wasn't great at handling multi-GB caches, so we went with the approach outlined above.

[0] https://github.com/rust-lang/rust/blob/1.47.0/src/ci/init_re...

[1] https://github.com/rust-lang/rust/commit/0347ff58230af512c95...
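(For anyone curious, the --reference approach from [1] looks roughly like this; the cache path below is a placeholder:)

    # populate a bare cache once (e.g. restored from the CI cache)
    git clone --bare https://github.com/rust-lang/llvm-project.git /ci-cache/llvm.git
    # later clones borrow objects from the cache instead of refetching them
    git submodule update --init --reference /ci-cache/llvm.git src/llvm-project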


> Contributors want to do their work, not investigate "what does a git clone really do".

Exactly this. Especially if the repo and CI pipeline are complicated, it is incredibly easy to just assume “it’s slow” is a fact of life.

And from the point of view of the dev-productivity team, well, they have tons of possible issues to deal with at any given time. Not just CI but the repos themselves, the build system, maybe IDEs, debuggers, ... Sure the fix ends up being easy but you have to know to go looking for it.


When you’ve got a billion other tasks to do, you might even know that it could be orders of magnitude faster and still not fix it, simply because of higher priority work.

Frankly, I’d rather spend extra time trying to address problems/bugs/potential security holes in the actual shipped code than in fixing a poorly working CI pipeline...and I’m the kind of dev who gets really irritated by these problems. But you have to prioritize.

Basically, barring “external” forces like cost overflow, customer unhappiness, or similar...stuff like that gets fixed at an equilibrium point between how much the problem hurts the dev, how adjacent to the codebase the devs current work is, and how interesting/irritating the dev finds the problem.


Out of curiosity, why not use the submodule.<name>.shallow option in .gitmodules?
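(That option is set per submodule in .gitmodules; a sketch of what I mean, with path/URL as placeholders:)

    [submodule "src/llvm-project"]
        path = src/llvm-project
        url = https://github.com/rust-lang/llvm-project.git
        shallow = true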


Primarily because, until you mentioned it now, I wasn't even aware it was an option!

That said, I generally shy away from shallow clones and probably wouldn't use it here:

- it's a trap for people who ever want to work in that repo normally (we use the trick for more than just LLVM)

- I believe shallow clones, over time (e.g. for contributors), are less efficient than deep clones; I would expect shallow cloning to reuse fewer objects and benefit less from git's design. [0] describes a historic issue on this topic

[0] https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm...


> Even though we’re telling Git to do a shallow clone, to not fetch any tags, and to fetch the last 50 commits ...

What is the reason for cloning 50 commits? Whenever I clone a repo off GitHub for a quick build and don't care about sending patches back, I always use --depth=1 to avoid any history or stale assets. Is there a reason to get more commits if you don't care about having a local copy of the history? Do automated build pipelines need more info?


Some tools (like linters) might need to look at the actual changes that occurred for various reasons, such as to avoid doing redundant work on unmodified files. To do that, you need all the merge bases... which can present a kind of a chicken-and-egg problem because, to figure this out with git, you need the commits to be there locally to begin with. I'm sure you can find a way around it if you put enough effort into scripting against the remote git server, but you might need to deal with git internals in the process, and it's kind of a pain compared to just cloning the whole repo.
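(A rough sketch of the usual workaround, assuming the target branch is origin/master; in practice you'd cap the loop:)

    # keep deepening the shallow clone until a merge base with the target branch exists
    until git merge-base origin/master HEAD >/dev/null 2>&1; do
        git fetch --deepen=50 origin master
    done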


If you're interested in metadata, you can use --filter=blob:none to get the commit history but without any file contents.
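(For example; the URL is a placeholder:)

    # "blobless" partial clone: full commit/tree history, file contents fetched lazily on checkout
    git clone --filter=blob:none https://github.com/example/big-repo.git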


Did not know, that's great, thanks! Seems this is a relatively recent feature?


I can’t speak for the original post, but I’ve seen other people[1] increase the commit count because part of the build process looks for a specific commit to checkout after cloning. If you have pull requests landing concurrently and you only clone the most recent commit, there is a race condition between when you queue the build with a specific commit id and when you start the clone.

All that being said, I don’t know why you would need your build agents to clone the whole damn repo for every build. Why not keep a copy around? That’s what TFS does.

One other thing I've seen to reduce the Git clone bottleneck is to clone from Git once, create a Git bundle from the clone, upload the bundle to cloud storage, and then have the subsequent steps use the bundle instead of cloning directly. See these two files for the .NET Runtime repo[2][3]. I assume they do this because the clone step is slow or unreliable and then the subsequent moving around of the bundle is faster and more reliable. It also makes every node get the exact same clone (they build on macOS, Windows, and Linux).

Lastly, be careful with the depth option when cloning. It causes a higher CPU burden on the remote. You can see this in the console output when the remote says it is compressing objects. And if you subsequently do a normal fetch after a shallow clone, you can cause the server to do ever more work[4].

1: https://github.com/dotnet/runtime/pull/35109

2: https://github.com/dotnet/runtime/blob/693c1f05188330e270b01...

3: https://github.com/dotnet/runtime/blob/693c1f05188330e270b01...

4: https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm...
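(A rough sketch of the bundle trick; URL and paths are placeholders:)

    git clone --mirror https://github.com/example/repo.git repo.git
    git -C repo.git bundle create repo.bundle --all     # snapshot the whole repo into one file
    # ...upload repo.bundle to cloud storage; then on each build agent:
    git clone repo.bundle workdir
    git -C workdir remote set-url origin https://github.com/example/repo.git
    git -C workdir fetch origin                         # top up with anything newer than the bundle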


Also worth noting that git is pretty efficient at cloning a bunch of subsequent commits, due to delta encoding.

edit: looks like git doesn't implement fetching thin packs when populating a shallow clone. It will still avoid fetching unnecessary packs, so the efficiency is still high for most software repositories.


Does git do delta encoding during clones? I know it doesn’t use deltas for most things.


I am fairly sure it uses thin packs during a clone usually. Though I checked the docs at https://www.git-scm.com/docs/shallow, and it says:

> There are some unfinished ends of the whole shallow business:

> - maybe we have to force non-thin packs when fetching into a shallow repo (ATM they are forced non-thin).


Tags. All of my builds use `git describe` to get a meaningful version number for the build.
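(Which is why the tags matter; the output shape here is illustrative:)

    git describe --tags --always --dirty
    # -> v2.3.1-14-gdeadbee  (nearest tag, commits since that tag, abbreviated hash)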


I expected this to be some micro-optimization of moving a thing from taking 10 seconds to 100ms.

> Cloning our largest repo, Pinboard went from 40 minutes to 30 seconds.

This is both very impressive and very disheartening. If a process in my CI was taking 40 minutes, I would have been investigating well before it got to that point.

I don't mean to throw shade on the Pinterest engineering team, but it speaks to an institutional complacency with things like this.

I'm sure everyone was happy when the clone took 1 second.

I doubt anyone noticed when the clone took 1 minute.

Someone probably started to notice when the clone took 5 minutes but didn't look.

Someone probably tried to fix it when the clone was taking 10 minutes and failed.

I wonder what 'institutional complacencies' we have. Problems we assume are unsolvable but are actually very trivial to solve.


I'm not sure this is complacency - this just seems like regular old tech debt. The build takes 40 minutes but everyone has other things to do and there is no time to tend to the debt. Then one day someone has some cycles and discovers a one line change fixes the underlying issue.

I'm sure many engineering projects have similar improvements that just get a ticket/issue opened and never revisited due to the mountain of other seemingly pressing issues. From IPO to the start of the year Pinterest stock price had been trending downwards - I'm sure there was more external pressure to increase profitability than to fix CI build times. The stock has completely turned around since COVID, so I'm sure that changes things


IMHO (from having addressed such CI issues personally on teams that otherwise wouldn't bother) it's likely due to other factors, like a lack of interest, being scared of breaking the build, not being terribly comfortable touching build scripts, or the inability to run scripts locally, than a genuine lack of time. The returns you can get can be ridiculously huge across the entire team compared to the hours you might spend, but I've found many people just aren't terribly interested in sitting down and digging into ugly scripts and pushing dozens of commits to figure out what might be slowing things down. And honestly, it's not exactly trivial to structure things in a way that's simultaneously both efficient and maintainable, especially if you're refactoring an existing system instead of starting from scratch, so that can be another turn-off.


For me the biggest issue is that CI is often siloed to hell and back.

Even when most of the rest of the engineering environment is fine, the build scripts and configuration often aren't under version control themselves, or are manually deployed - meaning any changes require access to carefully guarded server credentials. This may even be by design as a "security measure" - as if I didn't already have the ability to run arbitrary code on the build servers in question through unit tests etc. The gatekeepers in question are often an underfunded IT department that has too much on their plate already, and are underwhelmed by the idea of reviewing a bunch of changes to "legacy" code that they've somehow convinced themselves they'll rewrite "soon" that they don't directly benefit from anyways.

And I find I can rarely run the scripts locally. They're also often hideously locked in to a specific CI solution that I can't locally install without a ton of work on my part to figure out the mess of undocumented dependencies, and rife with edge cases that I can't easily imitate on my dev machines.

My preferred CI setups involve a single configuration file, checked into the same repository it's configuring CI for, that simply forwards to a low-dependencies script that works on dev machines. Getting there from an existing CI setup, however, can be quite the challenge.


Or just creeping build time over years: "it's always taken a while, I guess it just takes longer now". You don't bother optimizing things until they cause you sufficient pain to optimize them.


I can totally see a situation where the engineers who made the script are long gone, the new engineers are justifying their hiring by churning out features and trying not to break things, especially things they don't own that affect everyone, like CI/CD, and that annoying but manageable 40-minute wait just gets put on the backlog, waiting for half a year until someone with just enough experience and frustration makes a push to management to dedicate a bit of time to diving into the issue.


My assumption is that it's some or all of those, more than people thinking it's "fine"; deficiencies more than complacencies.


Yup, it's all about incentives alignment. If you get promoted for shipping a feature but you don't get promoted for saving 40 minutes of everybody's time every day you will get a lot of features, delivered slowly.


This is the kind of thinking I tried to sell at my corp, where cloning the monorepo takes 30 minutes and building this monstrosity takes 1.5 hours (the first time). I got scolded by management for saying that speed of changes should be more important than "looking busy" delivering stuff.


> I wonder what 'institutional complacencies' we have. Problems we assume are unsolvable but are actually very trivial to solve.

I spend a lot of time optimizing builds, because the effect is a multiplicator for everything else in development.

But it is not an easy task. One issue with performance-monitoring is that you have to carefully plan your work, or you will sit around and wait for results a lot:

Try the build: 40 minutes. Maybe add profiling statements, because you forgot them: another 40 minutes. Change something and try it out: no change, 40 minutes. Find another optimization which decreases time locally and try it out: 39.5 minutes, because on the build-server that optimization does not work that well. etc.

You just spent 160 minutes and shaved 0.5 minutes off the build.

I'm not saying it's not worth it, but that line of work is not often rewarding.

On the flip side, I once took two hours to write a Java agent which caches File.exists for class-loading, and sped up local startup by 5x because the corporate virus scanner got triggered less often.


Considering the build host does this hundreds of times every day, a better solution would be to simply have a local git repo cache; that should be secure and reliable given git's object store design, right?

Any simple wrappers for git that can do this transparently?
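(I'm imagining something like the below, with placeholder URL/paths, just handled transparently by a wrapper:)

    # keep a bare mirror on the build host, refreshed cheaply per build
    git clone --mirror https://github.com/example/repo.git /var/cache/git/repo.git   # once per host
    git -C /var/cache/git/repo.git fetch --prune
    # new workspaces borrow objects from the mirror instead of the network
    git clone --reference /var/cache/git/repo.git https://github.com/example/repo.git workdir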


Build servers don't git clone every time though. They do a git clean if needed, followed by a git fetch / git pull equivalent.

GoCD for example maintains a single copy of the repo on the server for every pipeline that refers to it and the agents have the repos that they work on checked out. Any local changes or untracked files are by default cleaned. There are settings to force reclone etc, but it's not the default.


In many cases the build agent is a stateless container which is destroyed as soon as the build is finished. In cases like this the repo needs to be (shallow) cloned each time.


That depends very heavily on the build infrastructure being used however


I doubt that they started off with a 40 mins delay. It probably crept slowly as the repo got bigger and no one noticed it because of the gentle gradient. And they didn't have the time/resources to look into it.


You're confusing a full clone, which for a huge repo can reasonably take that long, with the fix, which was to specify one refspec so they don't fetch the full repo in CI.
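Roughly this shape of change (the refspecs here are illustrative, not Pinterest's exact Jenkins config):

    # before: fetch every branch
    git fetch origin '+refs/heads/*:refs/remotes/origin/*'
    # after: fetch only the branch being built
    git fetch origin '+refs/heads/master:refs/remotes/origin/master'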


People probably did complain, but they were met with, "We're cloning a 20GB repo! It's not going to happen in an instant!"


This is the real complacency

Did someone really think "well it takes 40min, what can you do about it?" and just left it as such?

I knew people who would have that mentality in companies that are not around anymore. Take it as you want.

Yes, git is hard, but you know, maybe someone else has a better idea, or you can check SO, etc. (I don't even know why they were adding the refspecs there)


I’ve found as an industry we’ve moved to more complex tools, but haven’t built the expertise in them to truly engineer solutions using them. I think lots of organizations could find major optimizations, but it requires really learning about the technology you’re utilizing.


It's a natural tradeoff made when we ask for generality and flexibility: Doing that means implicitly saying "I want to do less implementing and more engineering" because a complex configurable dependency becomes an object of study in itself, something that needs empirical testing to use at its best.

Versus the simple thing you would author yourself: if you know the engineering tradeoffs made at the per-line level you have a decent grasp of the performance and flexibility, but you are implementing it and debugging it.


Also, profiling applications is surprisingly easy to learn. It boils down to looking at timestamps, and seeing what takes the longest. The majority of the effort is just figuring out where/how to get the timestamps you are looking for.

I will add that I think software complexity is only going to continue increasing over the long term; it reduces in some domains, but expands in others as we develop more advanced systems. Some kind of analogy to entropy.


Totally agree. Example: now that Node.js supports native `import` and `export` from modules I can see how many JS libraries will not need a transpilation step.

On the other hand TS seems to be more and more popular, which requires a compilation step.


The whole point of being an Agile "generalizing specialist" is that one is a mile wide and an inch deep.


Which i think is a fair approach when you’re early on. When you have a dev efficiency team you’re no longer hiring generalists.


This, this, so much this. When we build more complexity into a system, the less we understand it, similar to how development frameworks create multiple layers of abstraction to the point where the developers have no idea what actual code the framework produces, much less how to fix it.


Yes, we probably need people to stop thinking about tools as if they "solved" problems; what they really do is "transform" them. Now instead of having to deal with the original problem, you only need to deal with part of it and part of the new problem of using the tool that's supposed to help you, plus any leaks you might have because tools rarely solve problems perfectly. It's a trade-off, and you need to be aware of these transformations.


Another way of looking at it is this is the current golden age of infosec.

Think of all these complex systems developers and SysAdmins need to maintain at a company. Then think of how well each person knows each technology. Most of them will be "T" shaped, ie know one tech well but surface-level on all the others.

If I know several tools really well (or better than the company's sysadmins / devs) I can probably find some security issues with them.


We have not "as an industry" moved to git. There's a vocal subset of git fans, but it is by no means an industry standard.


What industry are you part of?

In many domains git has replaced other version control systems.

I would love to see a new approach to version control. Things like subversion or mercurial have exposed too many drawbacks for them to win back industry.


Google and Facebook both don't use git. Google uses a proprietary, perforce-esque system with multiple frontends, and Facebook uses Mercurial.

Among startups, I'm sure git holds a near monopoly, but if you move into other parts of the industry, that monopoly loosens.


Is Git not the most used VCS?


When I first joined one of my previous jobs, the build process had a checkout stage that was blowing away the git folder and checking out the whole repo from scratch every time (!). Since the build machine was reserved for that build job, I simply made some changes to do git clean -dfx && git reset --hard && git checkout of the origin branch. It shaved off like 15 minutes of the build time, which was something like 50% of the total build time.


It's frustrating how many ways there are for a git clone to get out of sync, especially when it's an automation-managed one that is supposed to be long-lived (think stuff like gracefully handling force-pushed branches and tags that are deleted). I've dealt with a bit of this with my company's Hound (code search engine) instance. Currently there's a big snarl of fallback logic in there that tries a shallow clone, but then unshallows and pulls refs if it can't find what it's looking for, culminating in this ridiculousness:

    git fetch --prune --no-tags --depth 1 origin +{ref}:remotes/origin/{ref}
See the whole thing here: https://github.com/mikepurvis/hound/blob/6b0b44db489f9aeff39...

The pipeline I manage is many repos rather than a monorepo, and maintaining long-lived checkouts in this context is not really realistic, but what does work and is very fast is just grabbing tarballs: GitLab and GitHub both cache them, so they don't cost additional compute after the first time, and downloading them is strictly less transfer and fewer round trips than the git protocol.
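(Roughly; owner/repo and the sha are placeholders:)

    # fetch a cached tarball of the exact commit instead of cloning
    curl -sSL https://github.com/example/repo/archive/<sha>.tar.gz | tar -xz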

The only real cost is that anything at build time which needs VCS info (eg, to embed it in the binary) will need an alternate path, for example having it be able to be passed in via an envvar.


A new checkout is good practice. Using refspec and depth options can make it quick.


> In the case of Pinboard, that operation would be fetching more than 2,500 branches.

Ok, I'll ask: why does a single repository have over 2,500 branches? Why not delete the ones you no longer use?
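(Cleaning up merged ones is close to a one-liner; a rough sketch, assuming master is the mainline:)

    # delete remote branches that are already merged into master
    git branch -r --merged origin/master \
      | grep -vE 'origin/(HEAD|master)' \
      | sed 's#^ *origin/##' \
      | xargs -n 1 git push origin --delete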


Where I work doesn't delete branches, because there is no reason to. Git branches have essentially zero overhead, and deleting them is just extra complexity in the CI toolchain. Deleting branches also deletes context in some scenarios. When dealing with an old codebase it's nice to be able to check out the exact version of the code at some point without having to dig through the log to get hashes and then dealing with a detached head.

The example in the article is a bit of a special case. It is a huge, and old, monorepo. In the typical case, fetching everything and fetching master is equivalent because all commits in all branches make their way into master anyway. If you have a weird branching strategy where you maintain multiple, significantly diverged branches at once, but only care about one of those branches at build time, then this optimization would save you time.


> Git branches have essentially zero overhead

Based on the article linked here, they do.


> If you have a weird branching strategy where you maintain multiple, significantly diverged branches at once, but only care about one of those branches at build time, then this optimization would save you time.

It's not the fact that they had lots of branches per se; it's that they had lots of commits hanging out in the middle of nowhere.


If you are doing squash merges, git branches have a cost.


A git branch is literally a file with a commit hash in it. It's conceptually a pointer to a commit. Creating, destroying, and maintaining a branch has all the overhead of a ~40 byte file.
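
For example, a minimal sketch (the hash is illustrative, and this assumes the ref hasn't been packed into .git/packed-refs):

    # A branch is just a tiny file holding a commit SHA
    cat .git/refs/heads/master
    #=> 3f786850e387550fdab836ed7e6dc881de23001b
    wc -c .git/refs/heads/master
    #=> 41 .git/refs/heads/master   (40 hex characters plus a newline)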

Squash merges leave a ton of commits just floating in your old branch. If you delete the branch (the 40B file), all those commits are still there. Doing lots of squash merges brings you into this case I mentioned:

> If you have a weird branching strategy where you maintain multiple, significantly diverged branches at once, but only care about one of those branches at build time, then this optimization would save you time.


If you have several releases with different targets, and want to make future security updates accessible to all of them.


They could already be doing that.

That is, if we assume they copy Google's philosophy of a single monolithic repository.

Pinterest has about 2,000 employees; assuming 20% are active developers, that's about 400 people, or roughly 6 branches per developer, which wouldn't be outrageous.


Because they use a monorepo. With monorepos at large companies the individual git repositories will be much larger and contain a ton more branches than if you have a repository-per-project model.


Probably because they have 1600 employees and the 2500 branches are the active ones.


monorepo culture.


One of the (many) things that drives me batty about Jenkins is that there are two different ways to represent everything. These days the "declarative pipelines" style seems to be the first class citizen, but most of the documentation still shows the old way. I can't take the code in this example and compare it trivially to my pipelines because the exact same logic is represented in a completely different format. I wish they would just deprecate one or the other.


I find the self-congratulatory tone in the post kind of off-putting, akin to "I saved 99% on my heating bill when I started closing doors and windows in the middle of winter."

If your repos weigh in at 20GB in size, with 350k commits, subject to 60k pulls in a single day, having someone with half a devops clue take a look at what your Jenkinsfile is doing with git is not exactly rocket science or a needle in a haystack. (Here's hoping they discover branch pruning too; how many of those 2500 branches are active?)

As a consultant I've seen plenty of appallingly poor workflows and practices, so this isn't all that remarkable... but to me the post seems kind of pointless.


Indeed. I wasn't aware of that specific git option, but a build pipeline with a checkout step taking FORTY MINUTES is unacceptable. Plenty of ways to solve that problem, but it's a problem that never should have made it into a critical workflow.

I don't care for casting stones. It's clearly a big win, and you don't get numbers like that every day. But I feel like someone should've twigged to this much sooner.


Can someone explain the intended meaning behind calling six different repositories "monorepos"?

It sounds to me like you don't have a monorepo at all and instead have six repositories for six project areas.


My interpretation is that each "monorepo" is a big git repository that consists of a collection of individually-deployed services, as opposed to having a single git repository per service.

I do not know whether that's what the blog author meant by that though.


I got that impression too. I can imagine the Pinterest monorepo, for example, has the website and server code together.

Their iOS and Android repos may contain the code for multiple apps. Though I'm not aware of which other apps Pinterest (the company) creates besides the obvious one.


I'm a git noob, so I'm sorry if this sounds dumb but wouldn't

git clone --single-branch

achieve the same thing (i.e, check out only the branch you want to build) ?

Also, why would you check out more than one branch when doing CI?
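
For reference, a sketch of what that would look like (URL is a placeholder; --depth 50 just mirrors the 50-commit fetch discussed elsewhere in the thread):

    # Clone only one branch, with shallow history (--depth already implies --single-branch)
    git clone --single-branch --branch master --depth 50 https://example.com/org/repo.git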


Looks like it's implied, from the documentation:

    Implies --single-branch
https://git-scm.com/docs/git-clone#Documentation/git-clone.t...


Hmm, --depth implies --single-branch, but the +refs/heads/* wildcard refspec overrode it by making sure it had the data to match all branches?


I truly appreciate articles like this — it’s warming to see other companies running into the kinds of issues I’ve ran into or had to deal with, and more so that their culture openly discusses and shares these learnings with the broader community.

The most effective organizations I’ve worked at built mechanisms and processes to disseminate these kinds of learnings and have regular brown bags on how a particular problem was solved or how others can apply their lessons.

Keep it up Pinterest engineering folks.


He says that "Pinboard has more than 350K commits and is 20GB in size when cloned fully." I'm not clear though, exactly what "cloned fully" means in context of the unoptimized/optimized situation.

He says it went from 40 minutes to 30 seconds. Does this mean they found a way to grab the whole 20GB repo in 30 seconds? seems pretty darn fast to grab 20GB, but maybe on fast internal networks?

Or maybe they meant that it was 20GB if you grabbed all of the many thousands of garbage branches, when Jenkins really only needed to test "master", and finding a solution that allowed them to only grab what they needed made things faster.

I'm also curious about the incremental vs "cloning fully" aspect of it. Does each run of Jenkins clone the repo from scratch or does it incrementally pull into a directory where it has been cloned before? I could see how in a cloning-from-scratch situation the burden of cloning every branch that ever existed would be large, whereas incrementally I would think it wouldn't matter that much.


> He says that "Pinboard has more than 350K commits and is 20GB in size when cloned fully." I'm not clear though, exactly what "cloned fully" means in context of the unoptimized/optimized situation.

It probably means including all commits.

It looks like they were successfully only pulling the last 50 commits, but they were doing that for each of 2500 branches. Now they are pulling only the most recent 50 commits for one branch.
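
In git terms the difference is roughly this (a sketch; the wildcard is, if I remember right, the Jenkins git plugin's default refspec):

    # Before: a depth-50 fetch of every branch head
    git fetch --depth 50 origin '+refs/heads/*:refs/remotes/origin/*'

    # After: a depth-50 fetch of master only
    git fetch --depth 50 origin +refs/heads/master:refs/remotes/origin/master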


My similar story goes like this: We had CRM software that let you setup user defined menu options. Someone at our organization decided to make a set of nested menu options where you could configure a product, with every possible combination being assigned a value!

So if you had a large, blue, second-generation widget with a foo accessory and option buzz, you were value 30202, and if it was the same one except red, it was 26420...

Every time the CRM software started up, it cycled through the options and generated a new XML file with all the results; this took about a minute and created something like a 60MB file.

The fix was to basically version the XML file and the options definition file. If someone had already generated that file, just load the XML file instead of parsing and looping through the options file. Started up in 5 seconds!

What was the excuse that it took so long in the first place? "The CRM software is written in Java, so it's slow."


Seems like there's a lot of hostility towards the title, which might be considered the engineering blog equivalent of clickbait. If the authors are around, the post was quite informative and interesting to read, but I'm sure it would have been much more palatable with a more descriptive title.

But back on topic: does anyone have any insight into when git fetches things, and what it chooses to grab? Is it just "when we were writing git we chose these things as being useful to have a 'please update things before running this command' implicitly run before them"? For example, git pull seems to run a fetch for you, etc.


Ok, I'll ask the obvious question: why did setting the branches option to master not already do this?

EDIT

https://www.jenkins.io/doc/pipeline/steps/workflow-scm-step/ makes it sound like the branches option specifies which branches to monitor for changes, after which all branches are fetched. This still seems like a counter-intuitive design that doesn't fit the most common cases.


This is good info. I need to check my own build pipelines now and see whether we are just blindly cloning everything or not. 40 minutes to do a clone is a pretty long time to wait, though.


Parkinson's Law of builds: "work expands so as to fill the time available for its completion", or in this case the available time is the point at which people can't stand the build taking any longer. 30-60 minutes is normal because anything > 1 minute requires you to context-switch anyway, and > 60 minutes means you're now at risk of a build taking all day if you have the work queue of a one-pizza team. So anything in the [1..60] minute range causes a grumble, but nothing gets done about it.


Is there any way to do this for GitLab CI [1]? I'm using GIT_DEPTH=1, but I'm not sure how to set refspecs. It's not too important right now since it only takes about 11 seconds to clone the git repo, but maybe it's a quick win as well.

[1] https://docs.gitlab.com/ee/ci/large_repositories/


The docs seem to give the impression that they already do this, but it'd be great if someone from Gitlab could confirm because it doesn't use the refspec term or show the resulting git command.

> The following example makes the runner shallow clone to fetch only a given branch; it does not fetch any other branches nor tags.

https://docs.gitlab.com/ee/ci/large_repositories/#shallow-cl...


Check out the docs here: https://docs.gitlab.com/ee/ci/yaml/README.html#git-fetch-ext...

GIT_FETCH_EXTRA_FLAGS accepts all options of the git fetch command
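
If you do want to tweak it, a sketch of what that could look like in .gitlab-ci.yml (variable names as per the linked docs; exact defaults may vary by runner version):

    variables:
      GIT_DEPTH: "1"
      GIT_FETCH_EXTRA_FLAGS: "--prune --no-tags"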


Thanks for that link - seems to be further evidence that they already restrict the refspec by default.

> The default flags are:

    GIT_DEPTH.
    The list of refspecs.
    A remote called origin.


> For Pinboard alone, we do more than 60K git pulls on business days.

Can anyone explain this? Seems ripe for another 99% improvement even with hundreds of devs.


An unhealthy obsession with CI/CD is the usual culprit.


Misleading title. They reduced their clone time by 99%. Not their build time.


With a repo that is 20GB, I can imagine that could be 99% of the build time.


My CI servers have to build branches as well, though. A fresh clone for every build? No wonder it was slow, but even this solution seems inefficient. My preferred general solution is a persistent repository clone per build host, maintained by incremental fetches, using git worktree add, not git clone, to check out each build.
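
A sketch of that pattern, with placeholder paths:

    # One persistent clone per build host, one cheap worktree per build
    git -C /var/cache/repo fetch --prune origin                      # incremental fetch into the shared clone
    git -C /var/cache/repo worktree add /builds/job-1234 origin/master   # detached checkout for this build
    # ... run the build in /builds/job-1234 ...
    git -C /var/cache/repo worktree remove /builds/job-1234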


Well, good advice, and good for them, but

> Cloning monorepos that have a lot of code and history is time consuming, and we need to do it frequently throughout the day in our continuous integration pipelines.

No you don't!

If removing per-build clones was the only way to speed things up, I'm absolutely sure you could figure out how with medium difficulty at most.


60K pulls per day for maybe 100 commits a day? What tests are being done that can't leverage earlier pulls?


This just shows how poor visibility into git is; I hope it gets better.

Building a product with poor visibility and then ridiculing users for not knowing its internals is the worst practice in computer science.

Hadoop did the same, and set a record for the fastest software to become legacy.

Super nice to see great comments here and the nice article.


Looks like Pinterest's team is confused about git branches. They are not real, full copies of the main branch like in SVN or TFS; a branch in the git world is simply a pointer to a specific commit in the push history.

Having said that, happy to be proven wrong, and learn about it.


IIUC the issue here is the depth option - they're telling it to only fetch the last 50 commits, but they were fetching the last 50 commits from EACH branch. In other words, they were fetching all commits that are within 50 commits of any branch head. By restricting the branches, they drastically reduce the set of commits to fetch.


Yeah, especially given 2,500 branches.


For CI on large repos, you can do much better than this by using a persistent git cache. It takes a little finessing to destroy it if it's corrupt and avoid concurrent modifications, but it's extremely worth it.


You mean syncing bare repos onto the CI nodes, and then in the build not using a WAN remote but just cloning from the local bare repo (with hard links) and checking out?
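
For illustration, a sketch of that local-bare-mirror idea (placeholder paths and URL; the locking/corruption handling mentioned above is omitted):

    # Keep a bare mirror on the CI node, then clone locally (objects get hard-linked, so it's fast)
    git clone --mirror https://example.com/org/repo.git /var/cache/repo.git   # one-time setup per node
    git -C /var/cache/repo.git remote update --prune                          # refresh before each build
    git clone /var/cache/repo.git "$WORKSPACE/src"                            # local clone, no WAN traffic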


Because of the strife over the 99% claim: if the pull took 39.9 min (and thus the build itself took 0.1 min = 6 sec), then a 99% decrease in pull time would result in roughly a 99% decrease in total time, and you would get about 30 sec total time in the end (rounding to 0 decimal places).

Not that any of this is important for the article to be interesting. In a previous job we had to fight long pull times, and we quickly created a git repo for CI that sat on a machine next to the CI server and periodically pulled from GitHub, to avoid the CI having to pull over the Internet.


The title is a bit of a misnomer, isn't it?

> This simple one line change reduced our clone times by 99% and significantly reduced our build times as a result.

Sounds like it didn't reduce build times quite by 99%.


Misleading title. They reduced git clone time 99%, not build times.


Will this mean even more Google image search spam‽


Alternative title: "How one line of code made our build time 100x what it should have been"


I'm not impressed by the author of the post, since this is also documented in the plugin, which says you should not check out all the branches if you're not interested in them. The default behaviour, of course, is to fetch all of them.


So git doesn't scale well with wide, deep source histories? That's a failing of git, I think, not of the engineers, who may even have written that line when the source base was far less gnarly.


I once cut our test suite from 10 minutes to under 5 minutes by changing 2 characters in 1 line...

The bcrypt work factor! It was originally 12; I reduced it to 1 (don't worry, production is still 12).


Is it common practice to clone the repo on every build (especially for web apps)? I just have Jenkins navigate to an app folder, run a few git commands (hard reset, pull), and build (webpack).


The article is erroneous in many ways as others have described, but the main error I see is that it says 'git clone' is run before the fetch.

It should be 'git init'


It is pinteresting that a webapp for making your image-saving obsession easier to satisfy requires hundreds to thousands of developer actions per day and repositories tens of gigabytes in size.


Semi-related for JS developers: if you do `eslint` as part of your build, make sure `node_modules` (and `node_modules` in subfolders if you have a monorepo-ish setup) is excluded.
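
For instance, a sketch of an .eslintignore covering nested packages (depending on your ESLint version, some of this may already be ignored by default):

    # .eslintignore - keep dependency trees out of lint runs
    node_modules/
    **/node_modules/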


We recently reduced our build times by 5-10% or so by changing the default bcrypt iteration count (for tests). It also felt silly once we found it.


Thanks so much for this tip! I just made this change and some of my tests are now much faster. Here's the result for one of the affected tests (averaged across 5 runs):

Before: 1.39 seconds

After: 0.62 seconds

I have this default line in config/initializers/devise.rb:

    config.stretches = Rails.env.test? ? 1 : 11
So hashing user passwords was already very fast. But I'm also manually calling BCrypt in some other places, so these calls are now much faster as well.


You should consider doing the same thing in production.

It's a trope at this point how often the modern slow hashing algorithms are utterly misconfigured. I've stopped counting how many times I've seen it.

Taking a whole second to compute a hash on the production machine because "hashing is supposed to be slow", when the production server is a low-frequency Xeon that has many cores but each one is half as fast as your development machine with its 4GHz i7-9999.

Hashing is supposed to take milliseconds, not seconds. If it's taking longer than 100 ms you need to make it faster.

edit: found the problem, this bad stackoverflow answer that's been spreading bad recommendations for years https://security.stackexchange.com/questions/17207/recommend...


One thing to keep in mind is that this obviously changes the timing of your software with respect to production behaviour, which may or may not matter depending on what you are testing.


Troubleshooting CI/CD feels like troubleshooting a printer: What the hell is it doing now and why is it doing that?!


I'd rather Pinterest increased their build times by 99% so they could do less damage to search results.


“We have six main repositories at Pinterest: Pinboard, Optimus, Cosmos, Magnus, iOS, and Android. Each one is a monorepo and houses a large collection of language-specific services.”

What is an “iOS monorepo” supposed to be like?


@Dang, can we get an edit?

This did NOT slash build times 99%, but rather time to do a git pull.


If build includes a git pull, maybe it did.


Nitpick... if 99% of your build time is consumed by preparing the workspace, that's the story. This isn't interesting to anyone who doesn't have that exact problem. Most people who click this won't find it interesting.


From the article:

> We found that setting the refspec option during git fetch reduced our build times by 99%.

Seems pretty clear to me that build times were reduced by 99% as a result of cutting the git fetch times significantly (but the exact number is not given). The headline looks correct to me.


FTA: "This simple one line change reduced our clone times by 99% and significantly reduced our build times as a result"

Unless their build is 100% git pull time, this did not reduce build time by 99%.


Exactly. The article makes both statements in different places, and they are contradictory. Kind of gives an impression of sloppiness.


To be fair, if their pull took 40 minutes, that’s a very real option :)


Misleading title. It wasn't the build time that decreased by 99%, only the git checkout step.


TL;DR: can I guess it was doing some extra network round trips or something?


TL;DR: they reduced "git clone" time on their massive monorepo by making it fetch only the master branch when building in Jenkins.



