
> No one's doing ASM programming on x86 CPUs these days

I don't think that's entirely true... it's still pretty common to write high-performance / performance-sensitive computation kernels in assembly or intrinsics.
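For what it's worth, a minimal sketch of what such a kernel can look like with SSE intrinsics (the function name and array shapes are made up for illustration):

    #include <immintrin.h>

    /* Illustrative only: sum two float arrays four lanes at a time.
       Assumes n is a multiple of 4 and the pointers are 16-byte aligned. */
    void add_f32(const float *a, const float *b, float *out, int n) {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_load_ps(a + i);
            __m128 vb = _mm_load_ps(b + i);
            _mm_store_ps(out + i, _mm_add_ps(va, vb));
        }
    }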


> (Intermediate)1. Adding to memory faster than adding memory to a register

I'm not familiar with the Pentium, but my guess is that a memory store is relatively cheaper than a load on many modern (out-of-order) microarchitectures.
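For clarity, the two forms being compared look roughly like this in C (a hedged sketch; the actual instruction selection is up to the compiler):

    /* "Adding to memory": read-modify-write on a memory operand,
       typically an add [mem], reg on x86. */
    void add_to_mem(long *counter, long x) {
        *counter += x;
    }

    /* "Adding memory to a register": load a memory operand and
       accumulate in a register, typically add reg, [mem] on x86. */
    long add_mem_to_reg(long acc, const long *p) {
        return acc + *p;
    }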

> (Intermediate)14. Parallelization.

I feel like this is where compilers come in handy, because juggling critical paths and resource pressure at the same time sounds like a nightmare to me.
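As a concrete example of the kind of juggling involved, here's a hedged sketch of the classic trick of splitting a reduction into independent accumulators to shorten the critical path (loop bounds kept simple for illustration):

    /* One long dependency chain: every add waits on the previous one. */
    double sum_serial(const double *a, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += a[i];
        return s;
    }

    /* Four independent chains let more additions be in flight per cycle.
       Assumes n is a multiple of 4 for brevity. */
    double sum_ilp(const double *a, int n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (int i = 0; i < n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        return (s0 + s1) + (s2 + s3);
    }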

> (Advanced)4. Interleaving 2 loops out of sync

Software pipelining!
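A hedged sketch of the idea in C (process() is a placeholder for per-element work): the load for iteration i+1 is issued while iteration i is still being computed, so the two streams run out of sync:

    /* Placeholder for some expensive per-element computation. */
    static float process(float x) {
        return x * x + 1.0f;
    }

    /* Software-pipelining sketch: overlap the load of the next element
       with the work on the current one. Assumes n >= 1. */
    void run(const float *in, float *out, int n) {
        float cur = in[0];
        for (int i = 0; i < n - 1; ++i) {
            float next = in[i + 1];   /* start iteration i+1's load */
            out[i] = process(cur);    /* finish iteration i's work */
            cur = next;
        }
        out[n - 1] = process(cur);    /* drain the last iteration */
    }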


Reminds me that this is also many people's speculation about how Qualcomm builds their RISC-V chips -- swap the ARM decoder for a RISC-V one.


That's not speculation.

Qualcomm made a 216-page proposal for their Znew[0] "extension".

It was basically "completely change RISC-V to do what Arm is doing". The only reason for this was that it would allow a super-fast transition from ARM to RISC-V. It was rejected HARD by all the other members.

Qualcomm is still making large investments into RISC-V. I saw an article estimating that the real reason for the Qualcomm v Arm lawsuit is that Qualcomm's old royalties were 2.5-3% while the new royalties would be 4-5.5%. We're talking about billions of dollars and that's plenty of incentive for Qualcomm to switch ISAs. Why should they pay billions for the privilege of designing their own CPUs?

[0] https://lists.riscv.org/g/tech-profiles/attachment/332/0/cod...


The RP2350 is already using a hybrid of ARM and RISC-V. Also, it's not really hard to use RISC-V not as the main computing core but as a controller in the SoC. Because the area of a RISC-V core is so small, it's pretty common to put a dozen or more (16, to be specific) into a chip.


I think this was indeed invented for controlled burns when fighting bushfires.


So do I.


I'm actually glad to see that restriction in their study, because there are many cases where you just can't use PGO, or it's much more difficult to. One of the most common reasons I've run into is that clients refuse to use it (no, I'm not even joking): either they don't know how to come up with good training data (in which case they'll point at your nose shouting IT DOESN'T WORK!!) or they think it's a stupid idea (again, not joking). You'd be amazed by some people's stubbornness.


I only buy keyboards with Home/End keys, even when I want a compact one, because they're so much more productive, especially when coding in a terminal.


> If you’re relying on it then you’re playing with fire.

Unfortunately I think the Linux kernel is one of the most notable examples: you have to compile it with -O1 or higher.


That's not playing with fire.

Playing with fire is when you require an optimization to hit in a specific place in your code.

Having software that only works if optimized means you're effectively relying on optimizations hitting with a high enough rate overall that your code runs fast enough.

It's like the difference between an amateur gambler placing one huge-ass bet and praying that it hits, and a professional gambler (like a blackjack card counter, or a poker pro) placing thousands or even millions of bets and relying on positive EV to make a profit.


The reason NASA used to be nice about SLS was that they really didn't have a choice, plus congressional pressure (hiring people who lost their jobs due to the Space Shuttle's cancellation), but now it seems like NASA _does_ have the choice of an alternative (and waaay cheaper) vehicle over SLS. Curious whether those senators will keep up their pressure.


which flags did you use and which compiler version?


clang 19, -O3 -ffast-math -march=native


can confirm fast math makes the biggest difference


I feel like I'm kinda being the bad aunt by encouraging -ffast-math. It can definitely break some things (e.g. https://pspdfkit.com/blog/2021/understanding-fast-math/ ) but I use it habitually and I'm fine, so clearly it's safe.


> It can definitely break some things

I recall it totally fudged up the ray/axis-aligned bounding box intersection routine in the raytracer I worked on. The routine relied on infinities being handled correctly, and -ffast-math broke that.

I see the linked article goes into that aspect in detail, wish I had it back then.

IIRC we ended up disabling it for just that file, as it did speed up the rest by a fair bit.
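For reference, a hedged sketch of that kind of slab test (illustrative, not the original code): when a ray direction component is zero, 1.0f / dir[i] is +/-inf, and the min/max still produce a correct interval under IEEE rules. -ffast-math implies -ffinite-math-only, which lets the compiler assume those infinities never occur.

    #include <math.h>

    /* Ray vs. axis-aligned box, slab method. Relies on IEEE infinity
       semantics when a direction component is exactly zero. */
    int ray_hits_aabb(const float orig[3], const float dir[3],
                      const float lo[3], const float hi[3]) {
        float tmin = 0.0f, tmax = INFINITY;
        for (int i = 0; i < 3; ++i) {
            float inv = 1.0f / dir[i];           /* may be +/-inf */
            float t0 = (lo[i] - orig[i]) * inv;
            float t1 = (hi[i] - orig[i]) * inv;
            if (inv < 0.0f) { float tmp = t0; t0 = t1; t1 = tmp; }
            tmin = fmaxf(tmin, t0);
            tmax = fminf(tmax, t1);
        }
        return tmin <= tmax;
    }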


I would love a fast-math implementation which handled inf correctly, but no language/compiler seems to care.


Fast Math basically means "who cares about standards, just add in whatever order you want" :)
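A quick illustration of why the order matters at all (plain C; no fast-math needed to see it):

    #include <stdio.h>

    int main(void) {
        float big = 1.0e8f, small = 1.0f;
        /* Floating-point addition is not associative, so letting the
           compiler regroup sums can change the result: */
        printf("%f\n", (small + big) - big);  /* 0.000000: small is lost to rounding */
        printf("%f\n", small + (big - big));  /* 1.000000 */
        return 0;
    }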

