Please do not require AVX support for your software (pavel.network)
45 points by pavel_odintsov on May 15, 2023 | 52 comments



Apple switched to Apple Silicon precisely because Intel was dragging its feet on acceleration and by doing so could fire customers like you.

Seriously though, Intel has not been good at keeping all of its chips up to date with AVX, and one consequence is that developers either don't use it at all or use a version that is 10 years out of date. If you look at those graphs in the article, single-thread performance really is dying out there. The one thing Intel could offer to make its newer offerings compelling is the latest version of AVX, but they have been so hypnotized by smartphones that they just had to make a "big-little" clone that disables the most advanced AVX on the good cores, so the weak cores not only waste your money, time and power, but actually sap performance from your strong cores. Customers buy AMD or Apple Silicon instead. Intel and their paid shills in the tech media will make as many excuses for this as AMZN will for why you can only get 5 day shipping with Prime, but all it means is 20 years from now some people will remember Intel the way we remember Sears or Atari or AMC.


> they have been so hypnotized by smartphones that they just had to make a "big-little" clone that disables the most advanced AVX on the good cores, so the weak cores not only waste your money, time and power, but actually sap performance from your strong cores.

Absolutely, absolutely, amen. I'm going to hunt for a good 12600 with AVX-512 because my BIOS has an option to force-enable it—but I have to find a CPU that doesn't have the AVX-512 fused off on a hardware level (despite it being on the silicon).


"Please do not require racing slicks for your formula one cars, some of us are still racing with wooden wagon wheels and can't afford to upgrade to Bridgestone rubber tyres."

The listed excuses are all in the category: "We've been doing things wrong, sometimes for a decade or more, please don't make us change our erroneous ways!"

First of all, if anyone is using any kind of cloud hosting, then there is no excuse. Zero. None. All public clouds allow the choice of CPUs with instruction sets up to and including AVX-512. It's a dropdown menu. Go look at the options it has.

You didn't, did you? You deployed your high performance data warehouse cluster (ClickHouse) with the default VM SKU, didn't you? Admit it. That's a pile of money you'd rather burn than admit that you are too lazy to even glance at the menu options when creating a $50K/month VM cluster.[1]

The next one is the VMware Enhanced vMotion Compatibility (EVC), which allows specific CPU features to be masked out, enabling old servers to coexist with new servers in a single cluster. This, too, is a setting that was almost certainly set once when the cluster was created, and nobody bothered to revisit it a decade later. I bet 95% of VMware clusters with AVX masked are running on hardware 100% capable of using AVX. Again, this is leaving a ton of performance on the floor. Heck, I've seen brand new, uniform-hardware clusters with AVX-512 capable CPUs lobotomised down to SSE4.2 because admins have no clue what EVC actually does.

I remember reading the same articles by admins just as lazy as the guy from that rant when 64-bit-only server software first appeared. "It's too hard to ask the SOE team to make a 64-bit SOE!", or "Our backup software from 1999 doesn't support 64-bit!", etc...

[1] I watched several customers make this mistake, not just occasionally, but literally every time. We eventually had to block the 11-year-old VM SKUs using cloud policies to stop the unfathomable laziness.


OK, so MongoDB adds a requirement for AVX, and then you blame anyone with non-AVX hardware for laziness? And require them to buy new hardware to run that software? When it would be possible to use AVX where the feature exists and skip it where it does not?

Similarly, Win11 adds new requirements for TPM 2.0 etc., so everyone who doesn't have that is blamed for laziness and required to buy new hardware to run Win11?

As if there weren't enough landfill already.


Great feedback, thank you. We have customers in many countries around the world, and in many cases even medium-sized businesses cannot afford to buy new equipment just to make some random software happy about CPU flags.

In some countries import tax can add 50% on top of the equipment cost. Combined with an exceptionally weak local currency, even a mid-range server ends up costing as much as a racing car.

Old equipment without AVX is perfectly capable of running modern workloads, and an artificial AVX requirement hurts people and widens the digital divide.

To address such cases we found a nice trick, using FerretDB to replace MongoDB in such environments: https://fastnetmon.com/docs-fnm-advanced/using-fastnetmon-ad...


> Similarly, Win11 adds new requirements for TPM 2.0 etc., so everyone who doesn't have that is blamed for laziness and required to buy new hardware to run Win11?

How are people even finding hardware without a TPM? My random ASUS laptop from 2020 had a TPM, and Apple has been adding (custom) TPMs to all their computers since 2016. It's honestly more impressive that there are so many people without TPMs to use.


I don't find it surprising at all. Many of my clients and other individuals I know are still using PCs from 2014 or before.

Hardware from that era is still very capable, especially for basic office or home usage.

Even if those computers originally ran Windows 7, or 8, or 8.1, pretty much all of the ones I'm familiar with have been upgraded to Windows 10 since then. They could probably run Windows 11 just fine, too, if it weren't for these new requirements that don't really seem all that necessary.

Being able to use Windows 11 just isn't a compelling enough reason for these kinds of people and organizations to buy new hardware, and to go through the hassle of replacing PCs that are otherwise working fine and have been for nearly a decade or longer.


I feel like I just don't understand, as someone who has to upgrade hardware every few years just because it breaks.

My PC laptop's keyboard is broken, the bottom panel is snapped off, one of the hinge covers is missing, the ethernet port is broken off the motherboard, the charging cable is soldered in place (because the barrel jack kept breaking), the USB-C port is iffy, the fans keep failing, and it took all that to make me get a new desktop. But that happened in the span of about two years of using the machine in bed. I can't imagine trying to make that machine last any longer, it's a lost cause.

> Hardware from that era is still very capable, especially for basic office or home usage.

Are people used to waiting for their computers to do things? I can't use slow computers because I heavily depend on multitasking and task switching, but seeing some of the things people put up with—like Firefox taking 15 minutes to load—makes me wonder if everyone else is just okay with having a slow computer.


> I feel like I just don't understand, as someone who has to upgrade hardware every few years just because it breaks. My PC laptop's keyboard is broken [...]

That's your mistake: you're using a laptop. In my experience, desktops are much more robust than laptops (and they're also more modular, so partial replacement when something breaks is viable); most of these people "using PCs from 2014 or before" are probably using desktops.

> Are people used to waiting for their computers to do things? I can't use slow computers because I heavily depend on multitasking and task switching, but seeing some of the things people put up with—like Firefox taking 15 minutes to load—makes me wonder if everyone else is just okay with having a slow computer.

Yes, they're used to it. You can't stand it because you're used to a faster computer, but those who are used to a slower computer might not even notice it, or they work around it (like going for a coffee while the software starts up).


> That's your mistake: you're using a laptop.

Not anymore. I've replaced my laptop with a desktop.

But yes. My mistake was indeed thinking that a Windows laptop could fill the void of a broken MacBook Pro. Build quality simply does not exist. The MacBook lasted over 5 years before it had a random logic board crap-out (unknown whether it was my fault or not).

> most of these people "using PCs from 2014 or before" are probably using desktops.

Yep.

> Yes, they're used to it. You can't stand it because you're used to a faster computer, but those who are used to a slower computer might not even notice it, or they work around it (like going for a coffee while the software starts up).

Valid argument, honestly.

I've experienced this with monitors: before my MacBook I could use a regular old 1080p monitor and not have any issues whatsoever, but now trying to use the same exact 1080p monitor today is extremely painful, and my minimum requirement is 4k.

And because my computers have been so fast, I'm used to being able to use alt+tab instead of having multiple windows on-screen, I'm used to being able to augment every conversation with quick Google searches, and so on. My workflow wouldn't work on a slow computer, because my workflow is efficient and demands interactivity.

But I guess if people are just OK with their computer being slow, there's no reason to upgrade just because it's getting out of date. And I guess if software requirements are the only reason people are being forced to upgrade, software requirements are what people are going to get upset with.


> That's your mistake: you're using a laptop.

Even so -- I just replaced the cheap laptop I was using as my main driver because it was starting to get a little flaky. It was 10 years old.


> as someone who has to upgrade hardware every few years just because it breaks.

You do? What are you doing to those poor machines??

> if everyone else is just okay with having a slow computer.

I don't have a good handle on what you consider to be a "slow" machine, but most people that I know aren't really that fussed about having the fastest possible machine.


> You do? What are you doing to those poor machines??

I... legitimately have no freaking idea. I do nothing to the machine, the fan bearings start to fail. I open the screen too many times, the bezel starts popping off and the hinge cover breaks due to that. And the ethernet cable was always loose in the port and trying to fix it eventually resulted in the port snapping off. And the charger barrel is so cheap that the metal coating scrapes right off the moment it ever rotates, causing it to stop being conductive. I think the main thing that was actually my fault was the keyboard because I think I drove a screw into the PCB and cracked it during one of my repairs (you have to remove the entire heatsink assembly and repaste the entire machine in order to get at the fans).

> I don't have a good handle on what you consider to be a "slow" machine, but most people that I know aren't really that fussed about having the fastest possible machine.

I started considering my laptop slow when it started taking a couple seconds to alt+tab. Sometimes it would even skip as if I pressed Tab twice when I really didn't. That kinda stuff annoys me so much because I need to be able to switch tasks really quickly. But I don't know if that's how decade-old machines really perform, which is why I just said "slow computer".

I guess by "slow" I really meant "unresponsive" where it doesn't react quickly to your inputs. That's what would annoy me the most from a computer.


The desktop computer I've been using since 2015 doesn't have a TPM and is still plenty fast for my needs. And probably lots of machines a fair bit newer don't have them either. Hardware from 2020 is very new.

Luckily, I have no desire to run Windows 11.


> The desktop computer I've been using since 2015 doesn't have a TPM

Yeah but it's from 2015. Windows 11 is from, like, 2021. That's over 6 years before the OS started asking for something you don't have. I think this is why people cry laziness, because it does legitimately sound that way from an outside perspective.


That doesn't sound lazy. That sounds like not wanting to replace perfectly good equipment.


For me "perfectly good" means "supporting the capabilities required to run the software I need to run" so once I need to run something that my computer doesn't support, it stops being perfectly good.


> How are people even finding hardware without a TPM?

Most of the computers I use don't have a TPM. I buy older, used computers, but I think the oldest one I have in everyday use is 7 years old, so not ancient.

Of my friends, both technical and not, most of them have computers that are older than 4 years.


This comment honestly smells like arrogance. It's frustrating that you write as if ClickHouse and MongoDB are only used by large companies that can afford either new servers or cloud hosting, since I have seen cases where such software (especially MongoDB) is used on desktop-class and embedded-class hardware. To note, "Not all CPUs from the listed families support AVX. Generally, CPUs with the commercial denomination Core i3/i5/i7/i9 support them, whereas Pentium and Celeron CPUs before Tiger Lake do not." Intel is rather famous for withholding features for the sole sake of market segmentation (ECC, anyone?).

I'll withhold my expletives about your narrow-minded comment, but the point is that you are considering only a subset of actual users, to the point that it's not funny.


Quite an edge case, but Graylog 5 requires MongoDB 5, which requires AVX.

I was unable to get this set up on my 3-year-old NAS (920+) and had to resort to running an older version, which will most likely stop getting updates very soon. AVX is old, but apparently Intel decided to keep it out of certain lineups.


Yep, that's a serious issue, and it's similar to our case. Our main product can work just fine even without SSE 4.2, but MongoDB requires it, and since we use MongoDB as storage that indirectly makes AVX1 a requirement for us too. We did a PoC with FerretDB last month and I think it may be a good option for Graylog: https://www.ferretdb.io


It’s not only about old CPUs. Intel Atom still doesn’t support AVX. Atoms are quite common on industrial controllers. For example, Tensorflow crashes with an obscure error on Atom CPUs. If your software requires AVX, at least output an error message when it is missing.
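
A minimal sketch of such a startup check, in Python for brevity. It assumes a Linux host where `/proc/cpuinfo` exposes a `flags` line; the helper names (`parse_cpu_flags`, `missing_features`, `check_cpu_or_exit`) are made up for illustration and are not taken from Tensorflow or any other project mentioned here.

```python
import sys
from pathlib import Path


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(flags: set[str], required: set[str]) -> list[str]:
    """Return the required features the CPU lacks, sorted for stable output."""
    return sorted(required - flags)


def check_cpu_or_exit(required: set[str], cpuinfo_path: str = "/proc/cpuinfo") -> None:
    """Exit with a readable message instead of dying later on an illegal instruction."""
    path = Path(cpuinfo_path)
    if not path.exists():
        return  # Assumption: not Linux, so skip the check rather than guess.
    missing = missing_features(parse_cpu_flags(path.read_text()), required)
    if missing:
        sys.exit("This build requires CPU features not present here: " + ", ".join(missing))
```

Calling `check_cpu_or_exit({"avx"})` as the first thing in `main` would turn an obscure crash into a one-line explanation. A native program would more likely query CPUID directly; reading `/proc/cpuinfo` is just the simplest thing to show.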


Recent Atom CPUs (since Gracemont) do support AVX and AVX2.


FWIW, the 'AVX' CPU feature is part of the x86-64-v3 level of the x86-64 microarchitecture standard [0].

There are chips produced in 2015 that only support x86-64-v2 [0]. Also, according to [1], "The new server-class CPUs released in 2020 [2] do not implement the AVX instruction set."

FWIW2, Red Hat Enterprise Linux 9 (RHEL9) requires x86-64-v2 or newer [1]. So, as a reference, they decided not to require AVX support yet.

[0] https://en.wikipedia.org/wiki/X86-64#Microarchitecture_level...

[1] https://developers.redhat.com/blog/2021/01/05/building-red-h...

[2] https://www.intel.com/content/www/us/en/products/details/pro...
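
For anyone who wants to see which level a given machine reaches, here is a rough sketch that maps a set of CPU flags to an x86-64 microarchitecture level. The flag spellings are the Linux `/proc/cpuinfo` ones (for example `pni` for SSE3 and `abm` for LZCNT), and the per-level lists are my reading of the level definitions, so treat them as an assumption to verify against the spec. `x86_64_level` is a made-up name.

```python
# Features every x86-64 CPU is expected to have (level 1).
BASELINE = {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse2"}

# Additional features required by each higher level, in ascending order.
LEVELS = [
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags: set[str]) -> int:
    """Return the highest level whose requirements, and all lower ones, are met.

    Returns 0 if even the baseline is incomplete.
    """
    if not BASELINE <= flags:
        return 0
    level = 1
    for number, required in LEVELS:
        if not required <= flags:
            break  # Levels are cumulative: stop at the first one not satisfied.
        level = number
    return level
```

One consequence worth noting: a CPU that has AVX but not AVX2, FMA and the rest (Sandy Bridge, for instance) only reaches v2 here, so "requires AVX" and "requires x86-64-v3" are not the same requirement.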


Thank you for sharing such great insights.


I don't much like this article, since SIMD is not “niche” as the author says, even if people aren’t aware of it. It’s hard to pick up a single piece of software and say “yes, my overall workload will improve” by turning it on, but if you are aware of it, it will speed up most non-branching loop operations via auto-vectorisation.

Having had to work on software that did runtime dispatch for SIMD - there is a small performance cost to this. For most software it’s just not worth it. Compiling multiple versions is preferable but more confusing for users, and more costly for the developer in CI and build time. There are therefore good reasons not to support multiple microarchitectures and picking one released 12 years ago is a good compromise. I wouldn’t at all advocate picking AVX512 but this is not that - almost all consumer and professional grade hardware sold in the last 10 years supports AVX, and it’s unreasonable to expect vendors to continue supporting an ever shrinking niche…


“Runtime dispatch” does not require that the CPU feature be queried on each call. There are designs that can do a single query on launch and set up the environment so that future calls do not incur any additional penalty.
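
A toy sketch of that design: the feature query happens once, the chosen implementation is bound to a name, and every later call goes straight to it. It is written in Python only to keep it short; in C or C++ the same shape is typically a function pointer set at startup. Both "kernels" are plain Python stand-ins rather than real SIMD code, and all names are hypothetical.

```python
from typing import Callable, Sequence, Set

Kernel = Callable[[Sequence[float]], float]


def sum_squares_scalar(xs: Sequence[float]) -> float:
    """Portable fallback: one element per step."""
    total = 0.0
    for x in xs:
        total += x * x
    return total


def sum_squares_wide(xs: Sequence[float]) -> float:
    """Stand-in for a SIMD kernel: four 'lanes' per step, then a scalar tail."""
    lanes = [0.0, 0.0, 0.0, 0.0]
    full = len(xs) - len(xs) % 4
    for i in range(0, full, 4):
        for lane in range(4):
            lanes[lane] += xs[i + lane] * xs[i + lane]
    return sum(lanes) + sum(x * x for x in xs[full:])


def make_sum_squares(detect_flags: Callable[[], Set[str]]) -> Kernel:
    """Query CPU features exactly once and return the kernel to use from then on."""
    flags = detect_flags()  # The only feature query in the program's lifetime.
    return sum_squares_wide if "avx" in flags else sum_squares_scalar
```

At launch you would do something like `sum_squares = make_sum_squares(read_flags)` with whatever `read_flags` fits the platform; after that, callers use `sum_squares(...)` and never touch the detection code again.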


I still have an Intel Westmere CPU running, which is the CPU iteration before Sandy Bridge. Honestly this platform is outdated; it's inefficient and slow. It's not even cheap, considering you can get Broadwell/Epyc Rome cheap on eBay today. I'm just running it because I have not had time to migrate yet and it works. I could probably run it for another 10 years, as long as I don't update anything.

However, if Linux were to require a newer CPU generation, I would quickly migrate to my Epyc Rome VM server. No questions asked. A 10-year-old CPU is outdated. Its single-core performance may be OK, but this generation had 4-6 cores max and drew 100 watts at full load, around 25 watts per core. My Epyc uses 2 watts per core and is also faster per core.


This entire article is just laziness. Including the data.

I mean the graph provided is just incredibly outdated: it gives hard data until 2010 and projects outwards. Charitably that's laziness, uncharitably it's cherry-picked data.

https://mlech26l.github.io/pages/2020/12/17/cpus.html

You can see there's about a 10-15% uplift in perf every year in single thread performance. Multi-thread increases performance significantly as 16+ core cpus become more common.
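
To put the 10-15% figure in perspective, a quick compounding calculation (the rates are the ones quoted above; the 10-year horizon is just an example I picked):

```python
def compound_uplift(annual_gain: float, years: int) -> float:
    """Total speedup factor after compounding a yearly gain for the given number of years."""
    return (1.0 + annual_gain) ** years


low = compound_uplift(0.10, 10)   # 10% per year for a decade
high = compound_uplift(0.15, 10)  # 15% per year for a decade
print(f"{low:.2f}x to {high:.2f}x single-thread uplift over 10 years")
```

That works out to roughly 2.6x to 4.0x over a decade, which is slower than the old days but hardly flat.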


This is, at its core, a software distribution problem. Windows, Linux distros, etc. need a way for the OS to request a more optimized version of a package.

I think this is most easily fixed for Linux distros, all it takes is for them to create a new architecture, say amd64_avx, which only contains packages with avx optimizations enabled where applicable.


One complication is that AVX is not one but more than a dozen ISA extensions each of which may or may not be implemented on a particular processor. This means software delivered to a customer should ideally check CPUID at runtime to dispatch the appropriate processing kernel. https://en.m.wikipedia.org/wiki/AVX-512
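
A sketch of what that looks like once more than one extension is involved: each kernel declares the full set of features it needs, and the dispatcher picks the first entry whose set is entirely present. The kernel names and their groupings are invented for illustration; the flag spellings follow Linux `/proc/cpuinfo`.

```python
# Ordered from most to least specialised; the last entry needs nothing extra
# because SSE2 is baseline on x86-64.
KERNELS = [
    ("avx512_vnni", {"avx512f", "avx512bw", "avx512vl", "avx512_vnni"}),
    ("avx512", {"avx512f", "avx512bw", "avx512vl"}),
    ("avx2", {"avx", "avx2", "fma"}),
    ("avx", {"avx"}),
    ("sse2", set()),
]


def pick_kernel(flags: set[str]) -> str:
    """Return the name of the first kernel whose required features are all present."""
    for name, required in KERNELS:
        if required <= flags:
            return name
    raise RuntimeError("no kernel matched; the table should end with a baseline entry")
```

Checking whole sets rather than a single "has AVX-512" bit matters: a chip that reports `avx512f` but lacks `avx512bw` or `avx512vl` falls through to the `avx2` kernel here instead of crashing on an instruction it does not implement.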


If someone just says "AVX" they usually mean AVX(1). And that's what the article is discussing.

AVX-512 forking into tons of different beasts is a separate, but related, problem. But it's more like how SSE2, 3, 4, 4.1 and so on existed.

Sometimes people said "SSE" and they might have meant one of the later versions, but I don't hear the same statement with AVX, since people very explicitly seem to say AVX2 and AVX512.


I also want to point out that there’s also the extreme-crazy option that is Gentoo linux.

All packages are distributed as source and compiled on the destination machine before being installed there. And yes, you can modify build flags per package to enable/disable compile flags.

I recommend doing it for fun. It’s a crazy world.


The article was referring to the original AVX1 not the subsequent variants. Likewise I was referring to the first set of extensions.


It also seems like ClickHouse could offer runtime detection of AVX and dispatch to the optimized functions in that case.


Yes, here is the article about the techniques: https://maksimkita.com/blog/cpu-dispatch-in-clickhouse.html


It’s considerably more onerous than just compiling to a single/multiple microarchitecture(s) though. Plus when you do this, you need to split out this code to be conditionally compiled so that you can support other architectures like ARM.


Here is an example on how to do this using github.com/google/highway: https://gcc.godbolt.org/z/zP7MYe9Yf

You write the code only once and do not have to worry about any #pragma/conditional compilation. Just copy-paste about a dozen lines of boilerplate, link with the Highway library, and done.

Disclosure: I am the main author; happy to discuss.


That’s great! Never seen this library before. It’s much neater than the other approaches I’ve seen/used.


Thanks, glad to hear :) Feedback is always welcome, do let us know via Github issue if there is anything you think can be improved.


If anyone is interested, Intel has its own Linux distribution called Clear Linux which has all optimizations enabled. Of course this has the issue of not running on systems which cannot support all these optimizations, in particular, very few chips have AVX512 unless you have the latest 11th and 12th gen Intel CPUs.


Wasn’t AVX-512 disabled rather shortly after Alder Lake’s release?


> very few chips have AVX512 unless you have the latest 11th and 12th gen Intel CPUs

12th gen had AVX-512 physically fused off because the E-cores didn't support it.


No. AVX512 was disabled because the money-men want to force you to buy the far more expensive Sapphire Rapids, which are the same chip. Same old Intel. Learned nothing. They didn't just fuse it off, they also forced microcode updates that permanently disable it on working Alder Lakes.


It's almost as if people make these throwaway accounts just to post hot takes without affecting their real karma value.


Games used to shy away from AVX support for various reasons. Tbh even SSE4.2 was in bad shape due to AMD chips that were before Ryzen.

But not anymore. With hardware requirements of "new generation only" games are going to require AVX and many even AVX2. This is the only practical way to have consistency between platforms. Especially when crossplay is the norm. Idea that software should be 'portable everywhere' is impractical for games.


Sometimes it's interesting to read the comments on articles like these. Many if not most of the comments which lean towards ignoring support for older or lesser CPUs don't seem to care about much more than sheer popularity, as though something that 90% of the world has should become the de facto standard. That's the same logic that used to be used to rationalize not supporting free Unix / Linux.

Why do people who have money and fancy, sometimes expensive things so often insist on getting even more, when getting more means that those without the means to afford better get less? I suppose we could fix the planet if we could figure out how to deal with this.

So, in a nutshell, I dismiss those who have fancy CPUs and who insist that everyone use AVX, even when it affects those who can't.

Really, though, the answer is quite simple: run-time code paths for various CPU feature sets. This problem would be moot if programmers just programmed and let compilers and assemblers output AVX optimized code for CPUs that have it and regular x86 assembly for those that don't, in the same binary.


I think this article gives poor advice. You should provide binaries that give the best experience to the majority of your users. Old CPUs and old operating systems become edge cases at a certain point, and they have a maintenance cost that should be weighed wisely.

It might be a good option to provide alternative "legacy" binaries, or to let users compile their own binaries, depending on project resources.

Too many folks, particularly in open source, are very forceful about maintainers meeting their needs without understanding that resources can be limited, and that if you're saying yes to something you're saying no to something else: new features, better quality for the majority of users, etc.



Seems like JIT to the rescue here?


Here is a discussion about introducing AVX by default in ClickHouse: https://github.com/ClickHouse/ClickHouse/issues/40459

TLDR: runtime CPU dispatching and JIT - ok, AVX by default - not ok.


Amusingly, this discussion led to an SSE2-only build and the discovery of some issues with their tests. Nothing serious, but still!

https://github.com/ClickHouse/ClickHouse/pull/41498



