Tracking developer build times to decide if the M3 MacBook is worth upgrading

Aurornis · on Dec 29, 2023

This is a great write-up and I love all the different ways they collected and analyzed data.

That said, it would have been much easier and more accurate to simply put each laptop side by side and run some timed compilations on the exact same scenarios: A full build, incremental build of a recent change set, incremental build impacting a module that must be rebuilt, and a couple more scenarios.

Or write a script that steps through the last 100 git commits, applies them incrementally, and does a timed incremental build to get a representation of incremental build times for actual code. It could be done in a day.

Collecting company-wide stats leaves the door open to significant biases. The first that comes to mind is that newer employees will have M3 laptops while the oldest employees will be on M1 laptops. While not a strict ordering, newer employees (with their new M3 laptops) are more likely to be working on smaller changes while the more tenured employees might be deeper in the code or working in more complicated areas, doing things that require longer build times.

This is just one example of how the sampling isn’t truly as random and representative as it may seem.

So cool analysis and fun to see the way they’ve used various tools to analyze the data, but due to inherent biases in the sample set (older employees have older laptops, notably) I think anyone looking to answer these questions should start with the simpler method of benchmarking recent commits on each laptop before they spend a lot of time architecting company-wide data collection

lawrjone · on Dec 29, 2023

I totally agree with your suggestion, and we (I am the author of this post) did spot-check the performance for a few common tasks first.

We ended up collecting all this data partly to compare machine-to-machine, but also because we want historical data on developer build times and a continual measure of how the builds are performing so we can catch regressions. We quite frequently tweak the architecture of our codebase to make builds more performant when we see the build times go up.

Glad you enjoyed the post, though!

FunnyLookinHat · on Dec 30, 2023

I think there's something to be said for the fact that the engineering organization grew through this exercise - experimenting with using telemetry data in new ways that, when presented to other devs in the org, likely helped them to all level up and think differently about solving problems.

Sometimes these wandering paths to the solution have multiple knock-on effects in individual contributor growth that are hard to measure but are (subjectively, in my experience) valuable in moving the overall ability of the org forward.

nox101 · on Dec 30, 2023

I didn't see any analysis of network building as an alternative to M3s. For my project, ~40 million lines, past a certain minimum threshold, it doesn't matter how fast my machine is, it can't compete with the network build our infra-team makes.

So sure, an M3 might make my build 30% faster than my M1 build, but the network build is 15x faster. Is it possible instead of giving the developers M3s they should have invested in some kind of network build?

Kwpolska · on Dec 30, 2023

Network full builds might be faster, but would incremental builds be? Would developers still be able to use their favourite IDE and OS? Would developers be able to work without waiting in a queue? Would developers be able to work offline?

If you have a massive, monolithic, single-executable-producing codebase that can't be built on a developer machine, then you need network builds. But if you aren't Google, building on laptops gives developers better experience, even if it's slower.

idontknowifican · on Dec 30, 2023

i hate to say it but working offline is not really a thing at work anymore. it is no one thing, but a result of k8s by and large. i think a lot of places got compliant when you could just deploy a docker image, fuck how long that takes and how slow it is on mac

Kwpolska · on Dec 30, 2023

That depends entirely on the tech stack, and how much you care about enabling offline development. You can definitely run something like minikube on your laptop.

novok · on Dec 30, 2023

That is a very large company if you have a singular 40 million line codebase, maybe around 1000 engineers or greater? Network builds also takes significant investment in adopting stuff like bazel and a dedicated devex team to pull off most of the time. Setting up build metrics to determine a build decision and the other benefits that come from it is a one month project at most for one engineer.

It's like telling an indie hacker to adopt a complicated kubernetes setup for his app.

Cacti · on Dec 30, 2023

1,000 is a small company.

novok · on Jan 1, 2024

A company with ~1000 software engineers is probably in the top 100 in software company sizes in the world, especially if they are not a sweat shop consultancy, bank or defense contractor, which are all usually large companies themselves.

Cacti · on Jan 6, 2024

I mean, the vast majority of software engineers in the world are not in software engineering companies. If we are purposefully limiting ourselves to Bay Tech companies, then sure, I guess 1,000 software engineers is big. But the companies in the world that are around the top 100 employers in the world, like you suggested, these places have 150,000 to 250,000 employees. 1,000 programmers for internal CRUD and system integration is quite realistic; the IT staff alone for a company that size is like 5,000 people, that is not even accounting for their actual business (which these days almost always has a major software component).

This is also not including large government employers like the military, intelligence, or all of the large services like the mail.

1,000 software engineers is simply not that much.

ironick09 · on Dec 31, 2023

They said 1000 engineers. Surely a company consists of roles other than software engineers, right?

dzek69 · on Dec 30, 2023

Maybe, but I feel that s not the point here

rag-hav · on Dec 30, 2023

What do you mean by network build?

kwk1 · on Dec 30, 2023

They probably mean tools like distcc or sccache:

https://github.com/distcc/distcc

https://github.com/mozilla/sccache

baq · on Dec 30, 2023

And incredibuild: https://www.incredibuild.com/

DeathArrow · on Dec 30, 2023

Dedicated build machines.

dimask · on Dec 30, 2023

> This is a great write-up and I love all the different ways they collected and analyzed data.

> [..] due to inherent biases in the sample set [..]

But that is an analysis methods issue. This serves as a reminder that one cannot depend on AI-assistants when they are not themselves enough knowledgeable on a topic. At least for the time being.

For once, as you point, they conducted a t-test on data that are not independently sampled, as multiple data points were sampled by different people, and there are very valid reasons to believe that different people would have different tasks that may be more or less compute-demanding, which confound the data. This violates one of the very fundamental assumptions of the t-test, which was not pointed out by the code interpreter. In contrast, they could have modeled their data with what is called "linear mixed effects model" where stuff like person (who the laptop belongs to) as well as possibly other stuff like seniority etc could be put into the model as "random effects".

Nevertheless it is all quite interesting data. What I found most interesting is the RAM-related part: caching data can be very powerful, and higher RAM brings more benefits than people usually realise. Any laptop (or at least macbook) with more RAM than it usually needs has most of the time its extra RAM filled by cache.

_teyd · on Dec 30, 2023

I agree, it seems like they were trying to come up with the most expensive way to answer the question possible for some reason. And why was the finding in the end to upgrade M1 users to more expensive M3s when M2s were deemed sufficient?

heliodor · on Dec 30, 2023

If employees are purposefully isolated from the company's expenses, they'll waste money left and right.

Also, they don't care since any incremental savings aren't shared with the employees. Misaligned incentives. In that mentally, it's best to take while you can.

barbariangrunge · on Dec 31, 2023

Are m2s meaningfully cheaper? M1s are still being sold at their launch price

sgjohnson · on Dec 30, 2023

Because M2s are no longer produced.

matthew-wegner · on Dec 30, 2023

I would think you would want to capture what/how was built, as like:

* Repo started at this commit

* With this diff applied

* Build was run with this command

Capture that for a week. Now you have a cross section of real workloads, but you can repeat the builds on each hardware tier (and even new hardware down the road)

barbariangrunge · on Dec 31, 2023

The dev telemetry sounds well intentioned… but in 5-10 years will some new manager come in and use it as a productivity metric or work habit tracking technique, officially or unofficially?

dash2 · on Dec 29, 2023

As a scientist, I'm interested how computer programmers work with data.

* They drew beautiful graphs!

* They used chatgpt to automate their analysis super-fast!

* ChatGPT punched out a reasonably sensible t test!

But:

* They had variation across memory and chip type, but they never thought of using a linear regression.

* They drew histograms, which are hard to compare. They could have supplemented them with simple means and error bars. (Or used cumulative distribution functions, where you can see if they overlap or one is shifted.)

dgacmu · on Dec 30, 2023

I'm glad you noted programmers; as a computer science researcher, my reaction was the same as yours. I don't think I ever used a CDF for data analysis until grad school (even with having had stats as a dual bio/cs undergrad).

novok · on Dec 30, 2023

It's because that's usually the data scientist's job, and most eng infra teams don't have a data scientist and don't really need one most of the time.

Most of the time they deal with data the way their tools generally present data, which correlate closely to most analytics, perf analysis and observability software suites.

Expecting the average software eng to know what a CDF is the same as expecting them to know 3d graphics basics like quaternions and writing shaders.

mpoteat · on Dec 30, 2023

A standard CS program will cover statistics (incl. calculus-based stats e.g. MLEs), and graphics is a very common and popular elective (e.g. covering OpenGL). I learned all of this stuff (sans shaders) in undergrad, and I went to a shitty state college. So from my perspective an entry level programmer should at least have a passing familiarity with these topics.

Does your experience truly say that the average SWE is so ignorant? If so, why do you think that is?

oakejp12 · on Dec 31, 2023

> A standard CS program will cover statistics

> graphics is a very common and popular elective

I find these statements to be extremely doubtful. Why would a CS program cover statistics? Wouldn't that be the math department? If there any required courses, it's most likely Calc 1/2, Linear Algebra, and Discrete Math.

Also, out of the hundreds of programmers I've met, I don't know any that has done graphics programming. I consider that super niche.

dehugger · on Jan 1, 2024

Thankfully all programmers have a CS degree, as there are absolutely no career paths into the industry that could possibly bypass a four year degree. What a relief, imagine the horror of working with plebians that never took a course on statistics!

DeathArrow · on Dec 30, 2023

>Expecting the average software eng to know what a CDF is the same as expecting them to know 3d graphics basics like quaternions and writing shaders.

I did write shaders and used quaternions back in the day. I also worked on microcontrollers, did some system programming, developed mobile and desktop apps. Now I am working on a rather large microservice based app.

perfmode · on Dec 30, 2023

you’re a unicorn

azalemeth · on Dec 30, 2023

> ChatGPT punched out a reasonably sensible t test!

I think the distribution is decidedly non normal here and the difference in the medians may well have also been of substantial interest -- I'd go for a Wilcox test here to first order... Or even some type of quantile regression. Honestly the famous Jonckheere–Terpstra test for ordered medians would be _perfect_ for this bit of pseudoanalysis -- have the hypothesis that M3 > M2 > M1 and you're good to go, right?!

(Disclaimers apply!)

whimsicalism · on Dec 30, 2023

12,000 builds? Sure maybe the build time distribution is non-normal, but the sample statistic probably is approximately normal with that many builds.

RayVR · on Dec 30, 2023

Many people misinterpret what is required for a t-test.

azalemeth · on Dec 30, 2023

I meant that the median is likely arguably the more relevant statistic, that is all -- I realise that the central limit theorem exists!

whimsicalism · on Jan 1, 2024

Fair enough - I see what you are saying. Sorry for my accidentaly condescending reply.

Herring · on Dec 29, 2023

>They drew histograms, which are hard to compare.

Note that in some places they used boxplots, which offer clearer comparisons. It would have been more effective to present all the data using boxplots.

tmoertel · on Dec 29, 2023

> They drew histograms, which are hard to compare.

Like you, I'd suggest empirical CDF plots for comparisons like these. Each distribution results in a curve, and the curves can be plotted together on the same graph for easy comparison. As an example, see the final plot on this page:

https://ggplot2.tidyverse.org/reference/stat_ecdf.html

mnming · on Dec 29, 2023

I think it's partly because the audiences are often not familiar with those statistics details either.

Most people hates nuances when reading data report.

fallous · on Dec 30, 2023

I think you might want to add the caveat "young computer programmers." Some of us grew up in a time where we had to learn basic statistics and visualization to understand profiling at the "bare metal" level and carried that on throughout our careers.

jxcl · on Dec 29, 2023

Yeah, I was looking at the histograms too, having trouble comparing them and thinking they were a strange choice for showing differences.

NavinF · on Dec 30, 2023

> cumulative distribution functions, where you can see if they overlap or one is shifted

Why would this be preferred over a PDF? I've rarely seen CDF plots after high school so I would have to convert the CDF into a PDF inside my head to check if the two distributions overlap or are shifted. CDFs are not a native representation for most people

unsung · on Dec 30, 2023

I can give a real example. At work we were testing pulse shaping amplifiers for Geiger Muller tubes. They take a pulse in, shape it to get a pulse with a height proportional to the charge collected, and output a histogram of the frequency of pulse heights, with each bin representing how many pulses have a given amount of charge.

Ideally, of all components are the same, there is no jitter, and if you feed in a test signal from a generator with exactly the same area per pulse, you should see a histogram where every count is in a single bin.

In real life, components have tolerances, and readouts have jitter, so the counts spread out and you might see, with the same input, one device with, say, 100 counts in bin 60, while a comparably performing device might have 33 each in bins 58, 59, and 60.

This can be hard to compare visually as a PDF, but if you compare CDF's, you see S-curves with rising edges that only differ slightly in slope and position, making the test more intuitive.

dash2 · on Dec 30, 2023

If one line is to the right of the other everywhere, then the distribution is bigger everywhere. (“First order stochastic dominance” if you want to sound fancy.) I agree that CDFs are hard to interpret, but that is partly due to unfamiliarity.

LASR · on Dec 29, 2023

Solid analysis.

A word of warning from personal experience:

I am part of a medium-sized software company (2k employees). A few years ago, we wanted to improve dev productivity. Instead of going with new laptops, we decided to explore offloading the dev stack over to AWS boxes.

This turned out to be a multi-year project with a whole team of devs (~4) working on it full-time.

In hindsight, the tradeoff wasn't worth it. It's still way too difficult to remap a fully-local dev experience with one that's running in the cloud.

So yeah, upgrade your laptops instead.

eysi · on Dec 30, 2023

My team has been developing against a fully remote environment (K8s cluster) for some years now and it makes for a really powerful DevEx.

Code sits on our laptops but live syncs to the remote services without requiring a Docker build or K8s deploy. It really does feel like local.

In particular it lets us do away with the commit-push-pray cycle because we can run integ tests and beyond as we code as opposed to waiting for CI.

We use Garden, (https://docs.garden.io) for this. (And yes I am afilliated :)).

But whether you use Garden or not, leveraging the power of the cloud for “inner loop” dev can be pretty amazing with right tooling.

I wrote a bit more about our experience here: https://thenewstack.io/one-year-of-remote-kubernetes-develop...

jayd16 · on Dec 30, 2023

Kind of interesting to think that CI is significantly slower in practice and both systems need to be maintained. Is it just the overhead of pushing through git or are there other reasons as well?

eysi · on Dec 30, 2023

The way we do things is that we build everything in the cloud and store in a central container registry. So if I trigger a build during dev, the CI runner can re-use that, e.g. if it’s needed before running a test or creating a preview env.

Similarly if another dev (or a CI runner) triggers a build of one of our services, I won’t have to next time I start my dev environment. And because it’s built in the cloud there’s no “works on my machine”.

Same applies to tests actually. They run in the cloud in an independent and trusted environment and the results are cached and stored centrally.

Garden knows all the files and config that belong to a given test suite. So the very first CI run may run tests for service A, service B, and service C. I then write code that only changes service B, open a PR and only the relevant tests get run in CI.

And because it’s all in prod-like environments, I can run integ and e2e tests from my laptop as I code, instead of only having that set up for CI.

mewpmewp2 · on Dec 30, 2023

You would need a very perfect and flexible CI system in place that wouldn't need to rebuild anything it doesn't need and only run the tests you want or only recently failed tests etc.

Many CI systems would spin up a new box instead of using persistent so likely have to rebuild if no cache, etc.

So basically I would say most of the overhead is in not having a persistent box with knowledge of last build or ability to choose what to run in there, which pretty much just equals to local capabilities.

peanball · on Dec 30, 2023

Often you also have the CI system designed in a way to verify a “from scratch” build that avoids any issues with “works on my machine” scenarios due to things still being cached that shouldn’t be there anymore.

jayd16 · on Dec 31, 2023

Having persistent boxes with sticky sessions seems seems pretty achievable.

8n4vidtmkvmk · on Dec 30, 2023

I tried Garden briefly but didn't like it for some reason. DevSpace was simpler to set up and works quite reliably. The sync feature where they automatically inject something into the pod works really well.

eysi · on Dec 30, 2023

DevSpace is a great tool but it’s bummer you didn’t like Garden.

Admittedly, documentation and stability weren’t quite what we’d like and we’ve done a massive overhaul of the foundational pieces in the past 12 months.

If you want to share feedback I’m all ears, my email is in my profile.

rsanek · on Dec 29, 2023

This might have to do with scale. At my employer (~7k employees) we started down this path a few years ago as well, and while it has taken longer for remote to be better than local, it now definitively is and has unlocked all kinds of other stuff that wasn't possible with the local-only version. One example is working across multiple branches by switching machines instead of files on local has meant way lower latency when switching between tasks.

kaishiro · on Dec 29, 2023

One thing I've never understood (and admittedly have not thoroughly researched) is how a remote workspace jives with front-end development. My local tooling is all terminal-based, but after ssh'ing into the remote box to conduct some "local" development, how do I see those changes in a browser? Is the local just exposed on an ip:port?

Ambroos · on Dec 30, 2023

Facebook's web repo which includes all the PHP and JS for facebook.com and a bunch of other sites is one big massive repo. For development you claim out a server that has a recent checkout of the codebase. Right after claiming it it syncs in your personal commits/stacks you're working on, ready to rebase. You access that machine on a subdomain of any of the FB websites. As far as I remember it was something along the lines of 12345.od.facebook.com, but the format changed from time to time as infra changed. Client certificate authentication and VPN needed (that may no longer be the case, my info is 1y+ old).

There was an internal search provider (bunnylol) that had tools like putting @od in front of any FB URL to generate a redirect of that URL to your currently checked out On Demand server. Painless to work with! Nice side benefit of living on the same domain as the main sites is that the cookies are reused, so no need to log in again.

avbor · on Dec 29, 2023

You can expose the browser port via ssh, with a command line flag like `-L 8080:127.0.0.1:8080`. So you can still preview locally

kaishiro · on Dec 30, 2023

Ah yeah, tunneling it back makes perfect sense - not sure why I never considered that. I'll explore that a bit - thanks for the heads up!

MonaroVXR · on Dec 30, 2023

If you're using vs code, vscode is doing that automatically

neurostimulant · on Dec 30, 2023

Have you tried vs code remote development plugin? It can do port forwarding (e.g. forwarding port 8080 on your local machine to port 8080 on the remote machine).

fragmede · on Dec 30, 2023

Yes, modulo networking VPN magic so it's not available over the wider Internet for hackers to discover.

notzane · on Dec 30, 2023

My company is fully using cloud desktops for engineering except iOS and Android development (we get faster laptops instead).

kaishiro · on Dec 30, 2023

Are you using a product or have you just rolled your own solution?

vladvasiliu · on Dec 30, 2023

Are you using a public cloud to host the dev boxen? Is compilation actually faster than locally – assuming that your pc's having been replaced to lower-specced versions since they don't do any heavy lifting anymore?

I work for a not-really-tech company (and I'm not a full-time dev either), so I've been issued a crappy "ultra-portable" laptop with an ultra-low-voltage CPU. I've looked into offloading my dev work to an AWS instance, but was quite surprised that it wasn't any faster than doing things locally for things like Rust compiles.

spockz · on Dec 30, 2023

In our case it is mostly faster when provisioning a machine with significantly more cores. In cloud machines you get “vcores” which are not the same as a core on a local cpu.

I’ve been integrating psrecord into our builds to track core utilisation during the built and see that a lot of time is spent in single threaded activities. Effort is required to compile modules in parallel but that is actually quite straightforward. Running all tests in parallel is harder.

We get the most out of the cloud machines by being able to provision a 16+ core machine to run more complicated (resilience) tests and benchmarks.

Also note that typically the cloud machines run on lower clocked CPUs than you would find in a workstation depending on which machine you provision.

ParetoOptimal · on Dec 30, 2023

Can't you locally switch between branches with git worktrees if you make your build cache key on worktree name?

vghaisas · on Dec 29, 2023

Haha, as I read more words of your comment, I got more sure that we worked at the same place. Totally agree, remote devboxes are really great these days!

However, I also feel like our setup was well suited to remote-first dev anyway (eg. syncing of auto-generated files being a pain for local dev).

tracker1 · on Dec 30, 2023

Second on this. Not being able to run a solution entirely local introduces massive friction in terms of being able to reason with said solution.

When you need to have 200+ parts running to do anything, it can be hard to work in a single piece that touches a couple others.

With servers that have upwards of 128+ cores and 256+ threads, my opinion is swinging back in favor of monoliths for most software.

jeffbee · on Dec 30, 2023

My company piles so much ill-considered Linux antivirus and other crap in cloud developer boxes that even on a huge instance type, the builds are ten or more times slower than a laptop, and hundreds of times slower than a real dev box with a Threadripper or similar. It's just a pure waste of money and everyone's time.

It turns out that hooking every system call with vendor crapware is bad for a unix-style toolchain that execs a million subprocesses.

xvector · on Dec 30, 2023

This is just a manpower thing.

At large tech companies like Google, Meta, etc the dev environment is entirely in the cloud for the vast majority of SWEs.

This is a much nicer dev experience than anything local.

dmos62 · on Dec 29, 2023

My dev box died (that I used for remote work), and instead of buying something new immediately, I moved my setup to a Hetzner cloud vps. Took around 2 days. Stuff like setting up termux on my tablet and the cli environment on the vps was 90 percent of that. The plus side was that I then spent the remaining summer working outside in the terrace and in the park. Was awesome. I was able to do it because practically all of my tools are command line based (vim, etc).

sweetjuly · on Dec 30, 2023

How much does this cost you? I've been dealing with a huge workstation-server thing for years in order to get this flexibility and while the performance/cost is amazing, reliability and maintenance has been a pain. I've been thinking about buying some cloud compute but an equivalent workstation ends up being crazy expensive (>$100/mo).

dinvlad · on Dec 30, 2023

There’s a crazy good deal for a dedicated server with 14-core/20-thread i5-13500 CPU and 64GB RAM, for just around 40 EUR/mo: https://www.hetzner.com/dedicated-rootserver/matrix-ex

This is honestly a bit overkill for a dev workstation (unless you compile Rust!), but since it’s a dedicated server it can also host any number of fully isolated services for homelab or saas. There’s nothing else like it in the wild, afaik.

dmos62 · on Jan 2, 2024

6 eur/month. It's the most basic offering. I don't really need power for what I'm doing. Running tests locally took a bit longer than I had been used to, but just a bit. I'm pretty used to underpowered environments though. I find that underpowering is a good way to enforce a sort of hygiene in terms of what you do and how.

randomgiy3142 · on Dec 30, 2023

I’d be careful with Hetzner. I was doing nothing malicious and signed up. I had to submit a passport which was valid US. It got my account cancelled. I asked why and they said they couldn’t say for security reasons. They seem like an awesome service, I don’t want knock them I just simply asked if I could resubmit or something the mediate and they said no. I don’t blame them just be careful. I’m guessing my passport and face might have trigged some validation issues? I dunno.

ametrau · on Dec 30, 2023

You have to give a hosting company a copy of your passport?!? (And hope they delete it… eventually?)

albert180 · on Dec 30, 2023

Only if you triggered some risk checking systems. I didn't need to provide anything when I signed up this year.

ametrau · on Jan 1, 2024

Ah, so after you’ve been with them a while and it’s a pain to go elsewhere.

albert180 · on Jan 12, 2024

Nope, I guess they use some algorithms during sign up to decide if they would like to have additional data or not

dmos62 · on Jan 2, 2024

Thanks for the heads up. I had to provide a passport as well.

makeitdouble · on Dec 30, 2023

It of course steongly depends on what your stack is, my current job provides a full remote dev server for our backend and it's the best experience I've seen in a long time. In particular having a common DB is suprinsingly uneventful (nobody's dropping tables here and there) while helping a lot.

We have interns coming in and fully ready within an hour or two of setup. Same way changing local machines is a breeze with very little downtime.

__jem · on Dec 30, 2023

Isn't the point of a dev environment precisely that the intern can drop tables? Idk, I've never had a shared database not turn to mush over a long enough period, and think investing the effort to build data scripts to rebuild dev dbs from scratch has always been the right call.

makeitdouble · on Dec 30, 2023

Dropping tables to see what happens or resetting DBs every hour is fine with a small dataset, but it becomes impractical when you work on a monolith that talks to a set of DB with a hundred+ tables in total and takes 5 hours to restore.

As you point out rebuilding small test datasets instead of just filtering the prod DB is an option, but those also need maintenance, and take a hell of time to make sure all the relevant cases are covered.

Basically, trying to flee from the bulk and complexity tends to bring a different set of hurdles and missing parts that have to be paid in time, maintenance and bugs only discovered in prod.

PS: the test DB is still reset everyday. Eorse thing happening is we need to do something else for a few hours until it's restored.

arthens · on Dec 30, 2023

> We have interns coming in and fully ready within an hour or two of setup. Same way changing local machines is a breeze with very little downtime.

This sounds like the result of a company investing in tooling, rather than something specific to a remote dev env. Our local dev env takes 3 commands and less than 3 hours to go from a new laptop to a fully working dev env.

rc_kas · on Dec 30, 2023

my company did this. fuck i hate it so much. if anyone wants to hire me away from this remote desktop hellscape, please do.

vladvasiliu · on Dec 30, 2023

If I understand correctly, they're not talking about remote desktops. Rather, the editor is local and responds normally, while the heavy lifting of compilation is done remotely. I've dabbled in this myself, and it's nice enough.

vb6sp6 · on Dec 30, 2023

I've been working this way for years, really nice. What is main complaint?

Aeolun · on Dec 30, 2023

Slowness, latency, lack of control. The usual suspects?

There’s moments where you try to do a thing that normal on a local PC and it’s impossible on remote. That cognitive dissonance is the worst.

rc_kas · on Dec 31, 2023

Yep. And shortcut keys and other smaller behaviors like that get weird sometimes.

tuananh · on Dec 30, 2023

Thanks for the insight. It maybe depends on each team too.

While my team (platform & infra) much prefer remote devbox, the development teams are not.

It could be specific to my org because we have way too many restrictions on the local dev machine (eg: no linux on laptop but it's ok on server and my team much prefer linux over crippled Windows laptop).

WaxProlix · on Dec 29, 2023

I suspect things like GitHub's Codespaces offering will be more and more popular as time goes on for this kind of thing. Did you guys try out some of the AWS Cloud9 or other 'canned' dev env offerings?

hmottestad · on Dec 29, 2023

My experience with GitHub Codespaces is mostly limited to when I forgot my laptop and had to work from my iPad. It was a horrible experience, mostly because Codespaces didn’t support touch or Safari very well and I also couldn’t use IntelliJ which I’m more familiar with.

Can’t really say anything for performance, but I don’t think it’ll beat my laptop unless maven can magically take decent advantage of 32 cores (which I unfortunately know it can’t).

Kwpolska · on Dec 30, 2023

AWS Cloud9 is a web IDE that can run on any EC2 box. The web IDE is a custom Amazon thing and is quite mediocre.

jiggawatts · on Dec 29, 2023

https://xkcd.com/1205/

behnamoh · on Dec 30, 2023

I disagree though. If a task is boring and repetitive, I just won't ever do it. So the comparison for people like me is:

    "spend X time to automate this task vs not do the task at all".

Whereas the xkcd is like (n = frequency that you do the task):

    "Spend X time to automate this task that takes Y×n time normally and get it down to Z×n time, vs spend Y×n time to do the task"

faeriechangling · on Dec 31, 2023

Even where a task being automated is likely to be done, automation can mean it’s done more reliably or quickly. Automation is generally needed to advance to greater and greater levels of complexity, time-efficient or not.

Also not all minutes have equal value, spending a few hours to save 5 minutes in an emergency with automation can be well worth it.

mdbauman · on Dec 29, 2023

This xkcd seems relevant also: https://xkcd.com/303/

One thing that jumps out at me is the assumption that compile time implies wasted time. The linked Martin Fowler article provides justification for this, saying that longer feedback loops provide an opportunity to get distracted or leave a flow state while ex. checking email or getting coffee. The thing is, you don't have to go work on a completely unrelated task. The code is still in front of you and you can still be thinking about it, realizing there's yet another corner case you need to write a test for. Maybe you're not getting instant gratification, but surely a 2-minute compile time doesn't imply 2 whole minutes of wasted time.

newaccount74 · on Dec 30, 2023

If you can figure out something useful to do during a two minute window, I envy you.

I really struggle with task switching, and two minutes is the danger zone. Just enough time to get distracted, by something else; too little time to start meaningful work on anything else...

Hour long compiles are okay, I plan them, and have something else to do while they are building.

30 second compiles are annoying, but don't affect my productivity much (except when doing minor tweaks to UI or copywriting).

2-10 minute compiles are the worst.

chiefalchemist · on Dec 29, 2023

Spot on. The mind often needs time and space to breathe, especially after it's been focused and bearing down on something. We're humans, not machines. Creativity (i.e., problem solving) needs to be nurtured. It can't be force fed.

More time working doesn't translate to being more effective and more productive. If that were the case then why are a disproportionate percentage of my "Oh shit! I know what to do to solve that..." in the shower, on my morning run, etc.?

majikandy · on Dec 29, 2023

I love those moments. Your brain has worked on it in the background like a ‘bombe’ machine cracking the day’s enigma code. And suddenly “ding… the days code is in!”

chiefalchemist · on Dec 30, 2023

You might like the book "Your Brain on Work" by Dr David Rock. In fact, I'm due for a re- read.

https://davidrock.net/about/

seadan83 · on Dec 30, 2023

I agree to some extent. Though, I don't think it has to be a trade-off though. After a sub-5 second compile time, I go over to get a coffee to ponder the results of the compile rather than imagine what those results might be. Taking time to think is not mutually exclusive to a highly responsive dev process.

norir · on Dec 30, 2023

I get what you are saying but I still think fast compilation is essential to a pleasant dev experience. Regardless of how fast the compiler is, there will always be time when we are just sitting there thinking, not typing. But when I am implementing, I want to verify that my changes work as quickly as possible and there is really no upside to waiting around for two minutes.

perrygeo · on Dec 30, 2023

Yes! Pauses allow you to reflect on your expectations of what you're actually compiling. As you sit in anticipation, you reflect on how your recent changes will manifest and how you might QA test it. You design new edge cases to add to the test suite. You sketch alternatives in your notebook. You realize oh compilation will surely fail on x because I forgot to add y to module z. You realize your logs, metrics, tests and error handling might need to be tweaked to unearth answers to the questions that you just now formulated. This reflection time is perhaps the most productive time a programmer will spend in their day. Calling it "wasted" reflects a poor understanding of the software development process.

tomaskafka · on Dec 29, 2023

My personal research for iOS development, taking the cost into consideration, concluded:

- M2 Pro is nice, but the improvement over 10 core (8 perf cores) M1 Pro is not that large (136 vs 120 s in Xcode benchmark: https://github.com/devMEremenko/XcodeBenchmark)

- M3 Pro is nerfed (only 6 perf cores) to better distinguish and sell M3 Max, basically on par with M2 Pro

So, in the end, I got a slightly used 10 core M1 Pro and am very happy, having spent less than half of what the base M3 Pro would cost, and got 85% of its power (and also, considering that you generally need to have at least 33 to 50 % faster CPU to even notice the difference :)).

mgrandl · on Dec 29, 2023

The M3 Pro being nerfed has been parroted on the Internet since the announcement. Practically it’s a great choice. It’s much more efficient than the M2 Pro at slightly better performance. That’s what I am looking for in a laptop. I don’t really have a usecase for the memory bandwidth…

orangecat · on Dec 30, 2023

The M3 Pro and Max get virtually identical results in battery tests, e.g. https://www.tomsguide.com/news/macbook-pro-m3-and-m3-max-bat.... The Pro may be a perfectly fine machine, but Apple didn't remove cores to increase battery life; they did it to lower costs and upsell the Max.

Ar-Curunir · on Dec 30, 2023

It might be the case that the yield on the chips is low, so they decided to use “defective” chips in the M3 Pro, and the non-defective in the M3 Max.

dagmx · on Dec 30, 2023

In all M generations, the max and pro are effectively different layouts so can’t be put down to binning. Each generation did offer binned versions of the Pro and Max with fewer cores though.

brandall10 · on Dec 30, 2023

These aren't binned chips... the use of efficiency cores does reduce transistor count considerably which could improve yields.

That said, while the transistor count of the M2 Pro -> M3 Pro did decrease, it went up quite a bit from the M2 -> M3.

It seems most likely Apple is just looking to differentiate the tiers.

tomaskafka · on Dec 29, 2023

Everyone has a different needs - for me, even M1 Pro has more battery life than I use or need, so further efficiency differences bring little value.

Joeri · on Dec 30, 2023

AI is the main use case for memory bandwidth that I know of. Local LLM’s are memory bandwidth limited when running inference, so once you fall into that trap you end up wanting the 400 gb/s max memory bandwidth of the m1/m2/m3 max, paired with lots and lots of RAM. Apple pairs memory size and bandwidth upgrades to core counts a lot more in m3 which makes the m3 line-up far more expensive than the m2 line-up to reach comparable LLM performance. Them touting AI as a use case for the m3 line-up in the keynote was decidedly odd, as this generation is a step back when it comes to price vs performance.

dgdosen · on Dec 29, 2023

I picked up an M3Pro/11/14/36GB/1TB to 'test' over the long holiday return period to see if I need an M3 Max. For my workflow (similar to blog post) - I don't! I'm very happy with this machine.

Die shots show the CPU cores take up so little space compared to GPUs on both the Pro and Max... I wonder why.

wlesieutre · on Dec 29, 2023

I don’t really have a usecase for even more battery life, so I’d rather have it run faster

Aurornis · on Dec 29, 2023

My experience was similar: In real world compile times, the M1 Pro still hangs quite closely to the current laptop M2 and M3 models. Nothing as significant as the differences in this article.

I could depend on the language or project, but in head-to-head benchmarks of identical compile commands I didn’t see any differences this big.

lawrjone · on Dec 29, 2023

That's interesting you saw less of an improvement in the M2 than we saw in this article.

I guess not that surprising given the different compilation toolchains though, especially as even with the Go toolchain you can see how specific specs lend themselves to different parts of the build process (such as the additional memory helping linker performance).

You're not the only one to comment that the M3 is weirdly capped for performance. Hopefully not something they'll continue into the M4+ models.

tomaskafka · on Dec 29, 2023

That's what Xcode benchmarks seem to say.

Yep, there appears to be no reason for getting M3 Pro instead of M2 Pro, but my guess is that after this (unfortunate) adjustment, they got the separation they wanted (a clear hierarchy of Max > Pro > base chip for both CPU and GPU power), and can then improve all three chips by a similar amount in the future generations.

Reason077 · on Dec 29, 2023

> ”Yep, there appears to be no reason for getting M3 Pro instead of M2 Pro”

There is if you care about efficiency / battery life.

kjkjadksj · on Dec 30, 2023

Don’t you get better single core performance in m3 pro? Iirc it has stronger performance and efficiency cores as well.

ramijames · on Dec 29, 2023

I also made this calculation recently and ended up getting an M1 Pro with maxed out memory and disk. It was a solid deal and it is an amazing computer.

jim180 · on Dec 29, 2023

I love my M1 MacBook Air for iOS development. One thing, I'd like to have from Pro line is the screen, and just the PPI part. While 120Hz is a nice thing to have, it won't happen on Air laptops.

geniium · on Dec 29, 2023

Basically the Pareto effect in choosing the right cpu vs cost

euos · on Dec 30, 2023

I am ex-core contributor Chromium and Node.js and current core contributor to gRPC Core/C++.

I am never bothered with build times. There is "interactive build" (incremental builds I use to rerun related unit tests as I work on code) and non-interactive build (one I launch and go get coffee/read email). I have never seen hardware refresh toggle non-interactive into interactive.

My personal hardware (that I use now and then to do some quick fix/code review) is 5+ year old Intel i7 with 16Gb of memory (had to add 16Gb when realized linking Node.js in WSL requires more memory).

My work laptop is Intel MacBook Pro with a touch bar. I do not think it has any impact on my productivity. What matters is the screen size and quality (e.g. resolution, contrast and sharpness) and storage speed. Build system (e.g. speed of incremental builds and support for distributed builds) has more impact than any CPU advances. I use Bazel for my personal projects.

gizmo · on Dec 30, 2023

Somehow programmers have come to accept that a minuscule change in a single function that only result in a few bytes changing in a binary takes forever to compile and link. Compilation and linking should be basically instantaneous. So fast that you don't even realize there is a compilation step at all.

Sure, release builds with whole program optimization and other fancy compiler techniques can take longer. That's fine. But the regular compile/debug/test loop can still be instant. For legacy reasons compilation in systems languages is unbelievably slow but it doesn't have to be this way.

PhilipRoman · on Dec 30, 2023

This is the reason why I often use tcc compiler for my edit/compile/hotreload cycle, it is about 8x faster than gcc with -O0 and 20x faster than gcc with -O2.

With tcc the initial compilation of hostapd it takes about 0.7 seconds and incremental builds are roughly 50 milliseconds.

The only problem is that tcc's diagnostics aren't the best and sometimes there are mild compatibility issues (usually it is enough to tweak CFLAGS or add some macro definition)

cole-k · on Dec 30, 2023

I mean yeah I've come to accept it because I don't know any different. If you can share some examples of large-scale projects that you can compile to test locally near-instantly - or how we might change existing projects/languages to allow for this - then you will have my attention instead of skepticism.

euos · on Dec 30, 2023

That’s why I write test first. I don’t want to build everything.

euos · on Dec 30, 2023

I am firmly in test-driven development camp. My test cases build and run interactively. I rarely need to do a full build. CI will make sure I didn’t break anything unexpected.

karolist · on Dec 30, 2023

I too come from Blaze and tried to use Bazel for my personal project which involves backend + frontend dockerized, the build rules got weird and niche real quick and I was spending lots of time working with the BUILD files making me question the value against plain old Makefiles, this was 3 years ago, maybe the public ecosystem is better now.

euos · on Dec 30, 2023

I use Basel for C++. I would write normal dockerfile if I need it. Bazel docker support is an atrocity. For JS builds I also use regular TSC.

dilyevsky · on Dec 30, 2023

rules_docker is now deprecated but rules_oci[0] is the replacement and so far I find it much nicer

[0] - https://github.com/bazel-contrib/rules_oci

euos · on Dec 31, 2023

You may pry my Dockerfiles from my cold dead fingers :)

syntaxing · on Dec 30, 2023

Aren’t M series screen and storage speed significantly superior to your Intel MBP? I transitioned from an Intel MBP to M1 for work and the screen was significantly superior (not sure about storage speed, our builds are all on a remote dev machine that is stacked).

euos · on Dec 30, 2023

I only use laptop screen in emergencies. Storage is fast enough.

syntaxing · on Dec 30, 2023

For my curiosity, what do you use for your main monitor? I’ve been wanting to replace my ultrawide with something better.

euos · on Dec 30, 2023

I use 4k 32' as my monitor. My home monitor is Dell U3219Q, I am very happy with picture quality, though kids say it is bad for gaming.

steeve · on Dec 30, 2023

This is because you’ve been spoiled by Bazel. As was I.

euos · on Dec 30, 2023

One day I will learn cmake. But not today :)

Kwpolska · on Dec 30, 2023

Chromium is a massive project. In more normally-sized projects, you can build everything on your laptop in reasonable time.

euos · on Dec 30, 2023

When I worked at Chromium there were two major mitigations:

1. Debug compilation was split in shared libraries so only a couple of them has to be rebuilt in your regular dev workflow. 2. They had some magical distributed build that "just worked" for me. I never had to dive into the details.

I was working on DevTools so in many cases my changes would touch both browser and renderer. Unit testing was helpful.

dilyevsky · on Dec 30, 2023

Bazel is significantly faster on m1 compared to i7 even if it doesn’t try to recompile protobuf compiler code which it’s still attempting to do regularly

white_dragon88 · on Dec 30, 2023

5+ year old i7 are potato and would be a massive time waster today. Build times matter.

ComputerGuru · on Dec 30, 2023

I have a seven year old ThreadRipper Pro and would not significantly benefit from upgrading.

wtallis · on Dec 30, 2023

The Threadripper PRO branding was only introduced 3.5 years ago. The first two generations didn't have any split between workstation parts and enthusiast consumer parts. You must have a first-generation Threadripper, which means it's somewhere between 8 and 16 CPU cores.

If you would not significantly benefit from upgrading, it's only because you already have more CPU performance than you need. Today's CPUs are significantly better than first-generation Zen in performance per clock and raw clock speed, and mainstream consumer desktop platforms can now match the top first-generation Threadripper in CPU core count and total DRAM bandwidth (and soon, DRAM capacity). There's no performance or power metric by which a Threadripper 1950X (not quite 6.5 years old) beats a Ryzen 7950X. And the 7950X also comes in a mobile package that only sacrifices a bit of performance (to fit into fairly chunky "laptops").

ComputerGuru · on Dec 30, 2023

I guess I should clarify: I am a rust and C++ developer blocked on compilation time, but even then, I am not able to justify the cost of upgrading from a 1950X/128GB DDR4 (good guess!) to the 7950X or 3D. It would be faster, but not in a way that would translate to $$$ directly. (Not to mention the inflation in TRx costs since AMD stopped playing catch-up.) performance-per-watt isn’t interesting to me (except for thermals but Noctua has me covered) because I pay real-time costs and it’s not a build farm.

If I had 100% CPU consumption around the clock, I would upgrade in a heart beat. But I’m working interactively in spurts between hitting CPU walls and the spurts don’t justify the upgrade.

If I were to upgrade it would be for the sake of non-work CPU video encoding or to get PCIe 5.0 for faster model loading to GPU VRAM.

washadjeffmad · on Dec 31, 2023

sTR4 workstations are hard to put down! I'll replace mine one day, probably with whatever ASRock Rack Epyc succeeds the ROMED8-2T with PCIe 5.0.

In the meantime, I wanted something more portable, so I put a 13700K and RTX 3090 in a Lian Li A4-H2O case with an eDP side panel for a nice mITX build. It only needs one cable for power, and it's as great for VR as it is a headless host.

euos · on Dec 30, 2023

I don’t notice myself sitting and waiting for a build. I don’t want to waste my time setting up a new workstation so why bother?

drgo · on Dec 30, 2023

To people who are thinking about using AI for data analyses like the one described in the article:

- I think it is much easier to just load the data into R, Stata etc and interrogate the data that way. The commands to do that will be shorter and more precise and most importantly more reproducible.

- the most difficult task in data analysis is understanding the data and the mechanisms that have generated it. For that you will need a causal model of the problem domain. Not sure that AI is capable of building useful causal models unless they were somehow first trained using other data from the domain.

- it is impossible to reasonably interpret the data without reference to that model. I wonder if current AI models are capable of doing that, e.g., can they detect confounding or oversized influence of outliers or interesting effect modifiers.

Perhaps someone who knows more than I do on the state of current technology can provide a better assessment of where we are in this effort

balls187 · on Dec 30, 2023

That is effectively what the GPT4 based AI Assistant is doing.

Except when I did it, it was python and pandas. You can ask it to show you the code it used to do it's analysis.

So you can load the data into R/Python and google "how do I do xyzzzy" and write the code yourself, or use ChatGPT.

drgo · on Dec 30, 2023

so ChatGPT can build a causal model for a problem domain? How does it communicate that (using a DAG?)? It would be important for the data users to understand that model.

mcny · on Dec 29, 2023

> All developers work with a fully fledged incident.io environment locally on their laptops: it allows for a <30s feedback loop between changing code and running it, which is a key factor in how productively you can work with our codebase.

This to me is the biggest accomplishment. I've never worked at a company (besides brief time helping out with some startups) where I have been able to run a dev/local instance of the whole company on a single machine.

There's always this thing, or that, or the other that is not accessible. There's always a gotcha.

8n4vidtmkvmk · on Dec 30, 2023

I never couldn't run the damn app locally until my latest job. Drives me bonkers. I don't understand how people aren't more upset and this atrocious devex. Damn college kids don't know what they're missing.

kamikaz1k · on Dec 30, 2023

I used to work in a company like that, and since leaving it I’ve missed that so much.

People who haven’t lived in that world just cannot understand how much better it is, and will come up with all kinds of cope.

physicles · on Dec 30, 2023

I can’t imagine not having this. We use k3s to run everything locally and it works great. But we (un)fortunately added snowflake in the last year — it solves some very real problems for us, but it’s also a pain to iterate on that stuff.

daxfohl · on Dec 30, 2023

We used to have that, but it's hard to support as you scale. The level of effort is somewhat quadratic to company size: linear in the number of services you support and in the number of engineers you have to support. Also divergent use cases come up that don't quite fit, and suddenly the infra team is the bottleneck to feature delivery, and people just start doing their own thing. Once that Pandora's Box is opened, it's essentially impossible to claw your way back.

I've heard of largeish companies that still manage to do this well, but I'd love to learn how.

That said, yeah I agree this is the biggest accomplishment. Getting dev cycles down from hours or days to minutes is more important than getting them down from minutes to 25% fewer minutes.

mcny · on Dec 31, 2023

I was thinking of this as I was reading the comments on another thread here

https://news.ycombinator.com/item?id=38816135

Like if you have logic apps and azure Data pipelines, how do you create and more importantly keep current the local development equivalents for those?

I'm not saying if you are YouTube that all the videos on YouTube must fit on a developer's local machine but would be nice if you could run the whole instance locally or if not, at least be able to reproduce the whole set up on a different environment without six months worth of back and forth emails.

daxfohl · on Jan 1, 2024

Yeah, the whole top down vs bottom up quandary. Top down is more boring but easier to support at scale. Bottom up is more dynamic but cross cutting changes end up taking inordinate effort.

teaearlgraycold · on Dec 30, 2023

I’m currently doing my best to make this possible with an app I’m building. I had to convince the CEO the M2 Max would come in handy for this (we run object detection models and stable diffusion). So far it’s working out!

lawrjone · on Dec 29, 2023

Author here, thanks for posting!

Lots of stuff in this from profiling Go compilations, building a hot-reloader, using AI to analyse the build dataset, etc.

We concluded that it was worth upgrading the M1s to an M3 Pro (the max didn’t make much of a difference in our tests) but the M2s are pretty close to the M3s, so not (for us) worth upgrading.

Happy to answer any questions if people have them.

aranke · on Dec 29, 2023

Hi,

Thanks for the detailed analysis. I’m wondering if you factored in the cost of engineering time invested in this analysis, and how that affects the payback time (if at all).

Thanks!

lawrjone · on Dec 30, 2023

Author here: this probably took a 2.5 days to put together, all in.

The first day was spent hacking together a new hot reloaded but this also fixed a lot of issues we’d had with the previous loader such as restarting into stale code, which was really harming people’s productivity. That was well worth even several days of effort really!

The second day I was just messing around with OpenAI to figure out how I’d do this analysis. We’re right now building an AI assistant for our actual product so you can ask it “how many times did I get paged last year? How many were out-of-hours? Is my incident workload increasing?” Etc and I wanted an excuse to learn the tech so I could better understand that feature. So for me, well worth investing a day to learn.

Then the article itself took about 4hrs to write up. That’s worth it for us given exposure for our brand and the way it benefits us for hiring/etc.

We trust the team to make good use of their team and allowing people to do this type of work if they think it’s valuable is just an example of that. Assuming I have a £1k/day rate (I do not) we’re still only in for £2.5k, so less than a single MacBook to turn this around.

bee_rider · on Dec 30, 2023

They could also add in the advertising benefit of showing off some fun data on this site :)

smabie · on Dec 30, 2023

But then they'd have to factor in the engineering time invested in the analysis of the analysis?

zarzavat · on Dec 30, 2023

Zeno’s nihilism: nothing is worth doing because the mandatory analysis into whether something is worth doing or not takes infinite time.

hedgehog · on Dec 30, 2023

I'm curious how you came to the conclusion the Max SKUs aren't much faster, the distributions in the charts make them look faster but the text below just says they look the same.

firecall · on Dec 30, 2023

So can we assume the M3 Max offered little benefit because the workloads couldn’t use the cores?

Or the tasks maybe finished so fast that it didn’t make a difference in real world usage?

afro88 · on Dec 30, 2023

Great analysis! Thanks for writing it up and sharing.

Logistical question: did management move some deliverables out of the way to give you room to do this? Or was it extra curricular?

BlueToth · on Dec 29, 2023

Hi, Thanks for the interesting comparison. What I would like to see added would be a build on a 8GB memory machine (if you have one available).

ribit · on Dec 30, 2023

Interesting idea, but the quality of data analysis is rather poor IMO and I'm not sure that they are actually learning what they think they are learning. Most importantly, I don't understand why they would see such a dramatic increase of sub 20s build times going from M1 Pro to M2 Pro. The real-world performance delta between the two on code compilation workloads is around 20-25%. It also makes little sense to me that M3 machines have fewer sub-20s builds than M2 machines. Or that M3 Pro with half the cores has more sub-20s builds than M3 Max.

I suspect there might be considerable difference in developer behavior which results in these differences. Such as people with different types of laptops typically working on different things.

And a few random observations after a very cursory reading (I might be missing something):

- Go compiler seems to take little advantage from additional cores

- They are pooling data in ways that makes me fundamentally uncomfortable

- They are not consistent in their comparisons, sometimes they use histograms, sometimes they use binned density plots (with different y axis ranges), it's real unclear what is going on here...

- Macs do not throttle CPU performance on battery. If the builds are really slower on battery (which I am not convinced about btw looking at graphs), it will be because of "low power" setting activated

epolanski · on Dec 30, 2023

> also makes little sense to me that M3 machines have fewer sub-20s builds than M2 machines.

M3s have a smaller memory bandwidth, they are effectively a downgrade for some use cases.

ribit · on Dec 30, 2023

You are not going to saturate a 150GB/s memory interface building some code on a six-core CPU... these CPUs are fast, but not that fast.

inkyoto · on Dec 31, 2023

Oh, yes, you are. The optimising steps, linking and the link-time optimisation (LTO) are very heavily memory bound, especially on large codebases.

The Rust compiler routinely hits the 45-50Gb/sec ballpark of the intra-memory transfer speed on compiling a medium sized project, more if the code base large. Haskell (granted, a fringe yet revealing case) case just as routinely hits the 60-70 Gb/sec memory transfer speed at the compile time, and large to very large C++ codebases add a lot of stress on the memory at the optimisation step. If I am not mistaken, Go is also very memory bound.

Then there comes the linking and particularly the LTO that want all the memory bandwidth they can get to get the job done quickly, and the memory speed becomes a major bottleneck. Loading the entire codebase into memory, in fact, the major optimisation technique used in mold[0] that can vastly benefit from a) faster memory, b) a wider memory bus.

[0] https://github.com/rui314/mold

ribit · on Dec 31, 2023

Which hardware were these results obtained on? Are you talking about laptops or large multi-core workstations? I am not at all surprised that linking needs a lot of memory bandwidth (after all, it’s mostly copying), but we are talking about fairly small CPUs (6-8 cores) by modern standards. To fully saturate M3 Pro’s 150GB/s on a multicore workload you’d need to transfer ~8 bytes per cycle/core between L2 and the RAM on average, which is a lot for a compile workload. Maybe you can hit it during data transfer spikes but frankly, I’d be shocked if it turned out that 150GB/s is the compilation bottleneck.

Regarding mold… maybe it can indeed saturate 150GB/s on a 6-core laptop. But they were not using mold. Also, the timing differences we observe here are larger than what would be expected with a linker like mold with a 25% reduction in bandwidth. I mean mold can link clang in under 3 seconds. Reducing bandwidth would increase this by a second at most. We see much larger variation in M2 vs. M3 results here.

inkyoto · on Jan 1, 2024

I was directly addressing the «saturate» part of the statement, not memory becoming the bottleneck. Since builds are inherently parallel nowadays, saturating the memory bandwidth is very easy since each CPU core runs a scheduled compiler process (in the 1:1 core to process mapping scenario), and all CPU cores suddenly start competing for memory access. This is true for all architectures and designs where memory is shared. The same reasoning does not apply to NUMA architectures but those are nearly entirely non-existant apart from certain fringe cases.

Linking, in fact, whilst benefitting from faster/wider memory is less likely to result in the saturation unless the linker is heavily and efficiently multithreaded. For insance, GNU ld is single threaded, gold is multi-threaded but Ian Taylor has reported very small performance gains from the use of multithreading in gold, and mold takes the full advantage of the concurrent processing. clang's lld is somewhere in between.

In M1/M2/M3 Max/Ultra, the math is a bit different. Each performance core is practically capped at the ~100Gb/sec memory transfer speed. Then the cores are organised into core clusters of n P and m E cores, and each core cluster is capped at the ~240Gb/sec speed. The accumulative memory transfer is ~400Gb/sec (800Gb/sec for the Ultra setup) for the entire SoC, but that is also shared with GPU, ANE and other compute acceleration cores/engines. Since each core cluster has multiple cores, a large parallel compilation process can saturate the memory bandwidth easily.

Code optimisation and type inference in strongly statically typed languages with polymorphic types (Haskell, Rust, ML and others) are very memory intensive, esp. at scale. There are multiple types of optimisation and most of them are of either the constraint solving or NP completeness type, but the code inlining coupled with the inter-procedural optimisations require very large amounts of memory on large codebases, and there are other memory bound optimisation techniques as well. Type inference for polymorphic types in the Hindley–Milner type system is also memory intensive due to having to maintain a large depth (== memory) in order to be able to successfully deduce the type. So it is not entirely unfathomable that «~8 bytes per cycle/core between L2 and the RAM on average» is rather modest for a highly optimising modern compiler.

In fact, I am of the opinion that the inadequate computing hardware coupled with the severe memory bandwidth and capacity constraints was a major technical contributing factor that led to the demise of the Itanium ISA (coupled with less advanced code optimisers of the day).

Tarball10 · on Dec 30, 2023

This is only tangentially related, but I'm curious how other companies typically balance their endpoint management and security software with developer productivity.

The company I work for is now running 5+ background services on their developer laptops, both Mac and Windows. Endpoint management, priviledge escalation interception, TLS interception and inspection, anti-malware, and VPN clients.

This combination heavily impacts performance. You can see these services chewing up CPU and I/O performance while doing anything on the machines, and developers have complained about random lockups and hitches.

I understand security is necessary, especially with the increase in things like ransomware and IP theft, but have other companies found better ways to provide this security without impacting developer productivity as much?

andrekandre · on Dec 30, 2023

  > have other companies found better ways to provide this security without impacting developer productivity as much?

only way i've seen is if things get bad, report it to it/support and tell them what folder/files to exclude from inspection so your build temp files and stuff don't clog and slow up everything

suchar · on Dec 30, 2023

Same here, but IMO, if company believes that such software is useful (and they wouldn't be using it if company believed otherwise), then why do they often (always?) include node_modules in exclusion rules? After all, node_modules usually contains a lot of untrusted code/executables

rendaw · on Dec 29, 2023

> People with the M1 laptops are frequently waiting almost 2m for their builds to complete.

I don't see this at all... the peak for all 3 is at right under 20s. The long tail (i.e. infrequently) goes up to 2m, but for all 3. M2 looks slightly better than M1, but it's not clear to me there's an improvement from M2 to M3 at all from this data.

vessenes · on Dec 29, 2023

The upshot: M3 Pro is slightly better than M2 and significantly better than M1 Pro is what I've experienced with running local LLMs on my Macs; currently M3 memory bandwidth options are lower than for M2, and that may be hampering the total performance.

Performance per watt and rendering performance are both better in the M3, but I ultimately decided to wait for an M3 Ultra with more memory bandwidth before upgrading my daily driver M1 Max.

lawrjone · on Dec 29, 2023

This is pretty much aligned with our findings (am the author of this post).

I came away feeling that:

- M1 is a solid baseline

- M2 improves performance by about 60% - M3 Pro is marginal on the M2, more like 10%

- M3 Max (for our use case) didn’t seem that much different on the M3 Pro, though we had less data on this than other models

I suspect Apple saw the M3 Pro as “maintain performance and improve efficiency” which is consistent with the reduction in P-cores from the M2.

The bit I’m interested about is that you say the M3 Pro is only a bit better than the M2 at LLM work, as I’d assumed there were improvements in the AI processing hardware between the M2 and M3. Not that we tested that, but I would’ve guessed it.

vessenes · on Dec 29, 2023

Yeah, agreed. I'll say I do use the M3 Max for Baldur's gate :).

On LLMs, the issue is largely that memory bandwidth: M2 Ultra is 800GB/s, M3 Max is 400GB/s. Inference on larger models are simple math on what's in memory, so the performance is roughly double. Probably perf / watt suffers a little, but when you're trying to chew through 128GB of RAM and do math on all of it, you're generally maxing your thermal budget.

Also, note that it's absolutely incredible how cheap it is to run a model on an M2 Ultra vs an H100 -- Apple's integrated system memory makes a lot possible at much lower price points.

lawrjone · on Dec 29, 2023

Ahh right, I'd seen a few comments about the memory bandwidth when it was posted on LinkedIn, specifically that the M2 was much more powerful.

This makes a load of sense, thanks for explaining.

cameroncairns · on Dec 29, 2023

I've been considering buying a Mac specifically for LLMs, and I've come across a lot of info/misinfo on the topic of bandwidth. I see you are talking about M2 bandwidth issues that you read about on linkedin, so I wanted to expand upon that in case there is any confusion on your part or someone else who is following this comment chain.

M2 Ultra at 800 GB/s is for the mac studio only. So it's not quite apples to apples when comparing against the M3 which is currently only offered for macbooks.

M2 Max has bandwidth at 400 GB/s. This is a better comparison to the current M3 macbook line. I believe it tops out at 96GB of memory.

M3 Max has a bandwidth of either 300 GB/s or 400 GB/s depending on the cpu/gpu you choose. There is a lower line cpu/gpu w/ a max memory size of 96GB, this has a bandwidth of 300 GB/s. There is a top of the line cpu/gpu with a max memory size of 128GB, this has the same bandwidth as the previous M2 chip at 400 GB/s.

The different bandwidths depending on the M3 max configuration chosen has led to a lot of confusion on this topic, and some criticism for the complexity of trade offs for the most recent generation of macbook (number of efficiency/performance cores being another source of criticism).

Sorry if this was already clear to you, just thought it might be helpful to you or others reading the thread who have had similar questions :)

karolist · on Dec 30, 2023

Worth noting that when AnandTech did their initial M1 Max review, they never were able to achieve full 400GB/s memory bandwidth saturation, the max they saw when engaging all CPU/GPU cores was 243GB/s - https://www.anandtech.com/show/17024/apple-m1-max-performanc....

I have not seen the equivalent comparisons with M[2-3] Max.

cameroncairns · on Dec 30, 2023

Interesting! There are anecdotal reports here and there on local llama about real world performance, but yeah I'm just reporting what Apple advertises for those devices on their spec sheet

vessenes · on Dec 30, 2023

All this sounds right!

If money is no object, and you don't need a laptop, and you want a suggestion, then I'd say the M2 Ultra / Studio is the way to go. If money is still no object and you need a laptop, M3 with maxed RAM.

I have a 300GB/s M3 and a 400 GB/s M1 with more RAM, and generally the LLM difference is minimal; the extra RAM is helpful though.

If you want to try some stuff out, and don't anticipate running an LLM more than 10 hours a week, lambda labs or together.ai will save you a lot of money. :)

cameroncairns · on Dec 30, 2023

The tech geek in me really wants to get a studio with an M2 ultra just for the cool factor, but yeah I think cost effectiveness wise it makes more sense to rent something in the cloud for now.

Things are moving so quickly with local llms too it's hard to say what the ideal hardware setup will be 6 months from now, so locking into a platform might not be the best idea.

nightski · on Dec 30, 2023

H100 is kind of a poor comparison. There are much cheaper ways to get to decent memory without that. Such as 2 A6000s.

Aurornis · on Dec 29, 2023

> - M2 improves performance by about 60%

This is the most shocking part of the article for me since the difference between M1 and M2 build times has been more marginal in my experience.

Are you sure the people with M1 and M2 machines were really doing similar work (and builds)? Is there a possibility that the non-random assignment of laptops (employees received M1, M2, or M3 based on when they were hired) is showing up in the results as different cohorts aren’t working on identical problems?

lawrjone · on Dec 29, 2023

The build events track the files that were changed that triggered the build, along with a load of other stats such as free memory, whether docker was running, etc.

I took a selection of builds that were triggered by the same code module (one that frequently changes to provide enough data) and compared models on just that, finding the same results.

This feels as close as you could get for an apples-to-apples comparison, so I'm quite confident these figures are (within statistical bounds of the dataset) correct!

sokoloff · on Dec 29, 2023

> apples-to-apples comparison

No pun intended. :)

aschla · on Dec 29, 2023

Side note, I like the casual technical writing style used here, with the main points summarized along the way. Easily digestible and I can go back and get the details in the main text at any point if I want.

lawrjone · on Dec 29, 2023

Thank you, really appreciate this!

JonChesterfield · on Dec 30, 2023

If dev machine speed is important, why would you develop on a laptop?

I really like my laptop. Spend a lot of time typing into it. It's limited to a 30W or similar power budget on thermal and battery constraints. Some of that is spent on a network chip which grants access to machines with much higher power and thermal budgets.

Current employer has really scary hardware behind a VPN to run code on. Previous one ran a machine room with lots of servers. Both expected engineer laptops to be mostly thin clients. That seems obviously the right answer to me.

Thus marginally faster dev laptops don't seem very exciting.

solatic · on Dec 30, 2023

> Current employer has really scary hardware behind a VPN to run code on. Previous one ran a machine room with lots of servers. Both expected engineer laptops to be mostly thin clients. That seems obviously the right answer to me.

It's quite expensive to set up, regardless of whether we're talking about on-prem or cloud hardware. Your employer is already going to buy you a laptop; why not try to eke out what's possible from the laptop first?

The typical progression, I would think, is (a) laptop only, (b) compilation times get longer -> invest in a couple build cache servers (e.g. Bazel) to support dozens/hundreds of developers, (c) expand the build cache server installation to provide developer environments as well

db48x · on Dec 30, 2023

This is bad science. You compared the thing you had to the thing you wanted, and found a reason to pick the thing you wanted. Honesty should have compelled you to at least compare against a desktop–class machine, or even a workstation with a Threadripper CPU. Since you know that at least part of your workload is concurrent, and 14 CPUs are better than 10, why not check to see if 16, 32, or 64 is better still? And the linker is memory bound, so it is worth considering not just the quantity of memory but the actual memory bandwidth and latency as well.

fourfour3 · on Dec 30, 2023

Being Mac only can be an advantage - I’ve been on both sides of trying to maintain & use non-trivial dev environments and the more OSes you bring in for people to work on, the harder it gets.

Bringing in Windows or Linux has a set up cost and a maintenance cost that may exclude it from even being considered.

Edit: plus, Macs are ARM, other options are inevitably x86. So it’s also two CPU architectures to maintain support for, on top of OS specific quirks - and even if you use eg Docker, you still have a lot of OS specific quirks in play :/

smoldesu · on Dec 30, 2023

My biggest issue with Mac-only shops is that almost nobody actually deploys to Mac. The majority of Mac-only firms I've worked at deploy to x86 Linux and develop in a VM on their Macbook (even pre-M1). Unless your business is writing Mac-native apps, MacOS is probably going to be a second-class deployment platform for you.

Even in an ideal scenario where your app already works on ARM, you will be dealing with OS-specific quirks unless your production machine runs MacOS.

fourfour3 · on Dec 30, 2023

These are fair points, and definitely a rough spot.

Eg at work we use M1/M2 macs and dev on those using docker - so that’s a Linux VM essentially with some nice tooling wrapped around it.

We certainly see differences - mostly around permissions (as docker for Mac doesn’t really enforce any access checks on files on the host), but we also mostly deploy to ARM Linux on AWS.

We went Mac only from a mix of Linux, Windows and Mac as we found the least overall friction there for our developers - Windows, even with WSL, had lots of problems, including performance issues. Linux we had issues finding nice laptops, and more support issues (developers are often not *nix experts!). Mac was a nice middle ground in the end.