
> No one's doing ASM programming on x86 CPUs these days

I don't think that's entirely true... it's still pretty common to write high-performance / performance-sensitive computation kernels in assembly or intrinsics.
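For what it's worth, a minimal sketch of what such a kernel can look like with SSE intrinsics (the function name and array shapes are made up for illustration):

    #include <immintrin.h>

    /* Illustrative only: sum two float arrays four lanes at a time.
       Assumes n is a multiple of 4 and the pointers are 16-byte aligned. */
    void add_f32(const float *a, const float *b, float *out, int n) {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_load_ps(a + i);
            __m128 vb = _mm_load_ps(b + i);
            _mm_store_ps(out + i, _mm_add_ps(va, vb));
        }
    }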


> (Intermediate)1. Adding to memory faster than adding memory to a register

I'm not familiar with the Pentium, but my guess is that a memory store is relatively cheaper than a load on many modern (out-of-order) microarchitectures.
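For clarity, the two forms being compared look roughly like this in C (a hedged sketch; the actual instruction selection is up to the compiler):

    /* "Adding to memory": read-modify-write on a memory operand,
       typically an add [mem], reg on x86. */
    void add_to_mem(long *counter, long x) {
        *counter += x;
    }

    /* "Adding memory to a register": load a memory operand and
       accumulate in a register, typically add reg, [mem] on x86. */
    long add_mem_to_reg(long acc, const long *p) {
        return acc + *p;
    }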

> (Intermediate)14. Parallelization.

I feel like this is where compilers come in handy, because juggling critical paths and resource pressure at the same time sounds like a nightmare to me.
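As a concrete example of the kind of juggling involved, here's a hedged sketch of the classic trick of splitting a reduction into independent accumulators to shorten the critical path (loop bounds kept simple for illustration):

    /* One long dependency chain: every add waits on the previous one. */
    double sum_serial(const double *a, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += a[i];
        return s;
    }

    /* Four independent chains let more additions be in flight per cycle.
       Assumes n is a multiple of 4 for brevity. */
    double sum_ilp(const double *a, int n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (int i = 0; i < n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        return (s0 + s1) + (s2 + s3);
    }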

> (Advanced)4. Interleaving 2 loops out of sync

Software pipelining!
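A hedged sketch of the idea in C (process() is a placeholder for per-element work): the load for iteration i+1 is issued while iteration i is still being computed, so the two streams run out of sync:

    /* Placeholder for some expensive per-element computation. */
    static float process(float x) {
        return x * x + 1.0f;
    }

    /* Software-pipelining sketch: overlap the load of the next element
       with the work on the current one. Assumes n >= 1. */
    void run(const float *in, float *out, int n) {
        float cur = in[0];
        for (int i = 0; i < n - 1; ++i) {
            float next = in[i + 1];   /* start iteration i+1's load */
            out[i] = process(cur);    /* finish iteration i's work */
            cur = next;
        }
        out[n - 1] = process(cur);    /* drain the last iteration */
    }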


Reminds me that this is also many people's speculation about how Qualcomm builds their RISC-V chips -- swap the ARM decoder for a RISC-V one.


That's not speculation.

Qualcomm made a 216-page proposal for their Znew[0] "extension".

It was basically "completely change RISC-V to do what Arm is doing". The only reason for this was that it would allow a super-fast transition from ARM to RISC-V. It was rejected HARD by all the other members.

Qualcomm is still making large investments into RISC-V. I saw an article estimating that the real reason for the Qualcomm v Arm lawsuit is that Qualcomm's old royalties were 2.5-3% while the new royalties would be 4-5.5%. We're talking about billions of dollars and that's plenty of incentive for Qualcomm to switch ISAs. Why should they pay billions for the privilege of designing their own CPUs?

[0] https://lists.riscv.org/g/tech-profiles/attachment/332/0/cod...


The RP2350 is already using a hybrid of ARM and RISC-V. Also, it's not really hard to use RISC-V not as the main computing core but as a controller in the SoC. Because the area of a RISC-V core is so small, it's pretty common to put a dozen or more (16, to be specific) into a chip.


I think this was indeed invented for controlled burns when fighting bushfires.


So do I.


I'm actually glad to see that restriction in their study, because there are many cases where you just can't use PGO, or it's much more difficult to. One of the most common reasons I've run into is that clients refuse to use it (no, I'm not even joking): either they don't know how to come up with good training data (in which case they'll point at your nose shouting IT DOESN'T WORK!!) or they think it's a stupid idea (again, not joking). You'd be amazed by some people's stubbornness.


I only buy keyboards with Home/End keys, even when I want a compact one, because they're so much more productive, especially when coding in a terminal.


> If you’re relying on it then you’re playing with fire.

Unfortunately I think the Linux kernel is one of the most notable examples: you have to compile it with -O1 or higher.


That's not playing with fire.

Playing with fire is when you require an optimization to hit in a specific place in your code.

Having software that only works if optimized means you're effectively relying on optimizations hitting with a high enough rate overall that your code runs fast enough.

It's like the difference between an amateur gambler placing one huge-ass bet and praying that it hits, and a professional gambler (like a blackjack card counter, or a poker pro) placing thousands or even millions of bets and relying on positive EV to make a profit.


The reason NASA used to be nice about SLS was that they really didn't have a choice, plus congressional pressure (hiring people who lost their jobs due to the Space Shuttle's cancellation), but now it seems like NASA _does_ have the choice of an alternative (and waaay cheaper) vehicle over SLS. Curious whether those senators will keep up their pressure.


which flags did you use and which compiler version?


clang 19, -O3 -ffast-math -march=native


can confirm fast math makes the biggest difference


I feel like I'm kinda being the bad aunt by encouraging -ffast-math. It can definitely break some things (e.g. https://pspdfkit.com/blog/2021/understanding-fast-math/ ) but I use it habitually and I'm fine, so clearly it's safe.


> It can definitely break some things

I recall it totally fudged up the ray/axis-aligned bounding box intersection routine in the raytracer I worked on. The routine relied on infinities being handled correctly, and -ffast-math broke that.

I see the linked article goes into that aspect in detail, wish I had it back then.

IIRC we ended up disabling it for just that file, as it did speed up the rest by a fair bit.
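For reference, a hedged sketch of that kind of slab test (illustrative, not the original code): when a ray direction component is zero, 1.0f / dir[i] is +/-inf, and the min/max still produce a correct interval under IEEE rules. -ffast-math implies -ffinite-math-only, which lets the compiler assume those infinities never occur.

    #include <math.h>

    /* Ray vs. axis-aligned box, slab method. Relies on IEEE infinity
       semantics when a direction component is exactly zero. */
    int ray_hits_aabb(const float orig[3], const float dir[3],
                      const float lo[3], const float hi[3]) {
        float tmin = 0.0f, tmax = INFINITY;
        for (int i = 0; i < 3; ++i) {
            float inv = 1.0f / dir[i];           /* may be +/-inf */
            float t0 = (lo[i] - orig[i]) * inv;
            float t1 = (hi[i] - orig[i]) * inv;
            if (inv < 0.0f) { float tmp = t0; t0 = t1; t1 = tmp; }
            tmin = fmaxf(tmin, t0);
            tmax = fminf(tmax, t1);
        }
        return tmin <= tmax;
    }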


I would love a fast-math implementation which handled inf correctly, but no language/compiler seems to care.


Fast Math basically means "who cares about standards, just add in whatever order you want" :)
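A quick illustration of why the order matters at all (plain C; no fast-math needed to see it):

    #include <stdio.h>

    int main(void) {
        float big = 1.0e8f, small = 1.0f;
        /* Floating-point addition is not associative, so letting the
           compiler regroup sums can change the result: */
        printf("%f\n", (small + big) - big);  /* 0.000000: small is lost to rounding */
        printf("%f\n", small + (big - big));  /* 1.000000 */
        return 0;
    }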

