The compiler will optimize that away (royalsloth.eu)
348 points by 0-_-0 on May 2, 2021 | 324 comments



The implicit premise of the article seems to be that all software has to be heavily optimized. This is completely wrong. All the languages mentioned in the article are still around because it doesn’t matter how performant 99% of code is. For the 1%, we can think about cache misses, SIMD and data parallel approaches. In my experience this is totally possible and not too hard, but has the enormous downside of making that code hard to reason about. You are also optimizing for today’s hardware, which means that it might not run that great on new hardware in 20 years. This means that somebody has to go in and rewrite that hard to understand optimized code. It’s really a tradeoff between the cost of that and the benefits from optimization.

Some of the future costs can be mitigated by using a few techniques from the start. One good idea is to keep a vanilla, unoptimized version of the part that you are optimizing around as a reference implementation. This will help with understanding what the optimized code does and can be used as a stopgap to support new hardware. It can also be a nice baseline to compare optimizations against.
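
To illustrate that idea, here's a minimal sketch (the function names and the unrolled "optimization" are made up for the example):

    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Reference implementation: trivially correct, easy to read, kept around forever.
    double sum_reference(const std::vector<double>& v) {
        double total = 0.0;
        for (double x : v) total += x;
        return total;
    }

    // "Optimized" version (manual unrolling stands in for real SIMD/cache work).
    double sum_optimized(const std::vector<double>& v) {
        double a = 0, b = 0, c = 0, d = 0;
        std::size_t i = 0;
        for (; i + 4 <= v.size(); i += 4) {
            a += v[i]; b += v[i + 1]; c += v[i + 2]; d += v[i + 3];
        }
        double total = a + b + c + d;
        for (; i < v.size(); ++i) total += v[i];
        return total;
    }

    int main() {
        std::vector<double> v(1000, 0.5);
        // The reference doubles as a correctness check and a performance baseline.
        assert(sum_optimized(v) == sum_reference(v));  // real code would allow a tolerance
    }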

The beauty of compilers is that they do a decent job of optimizing the 99% of code that would otherwise never be optimized, at 0 cost. This is a different sweet spot on the cost benefit curve.


I disagree. Almost all software I use these days is absurdly slower than it should be and probably a massive drain on collective productivity. I have an SSD, fast internet, 64 GB of DDR4, etc.; almost everything I do should be imperceptibly instant but instead takes seconds and even minutes.

Yeah, of course most of that slowness is not even because of ignorance of the low-level stuff but because of complete disregard for the performance aspects of software. If it's even slightly more convenient for a programmer to do something that is very stupid performance-wise, they will do it.

Sure, developer time costs money, but this cannot actually be the reason, because tooling is also ridiculously slow in exactly the same way and wastes countless hours of that costly developer time.


> I disagree. Almost all software I use these days is absurdly slower than it should be and probably a massive drain on collective productivity.

This is absolutely true, but the problem is usually a lack of due care about performance at higher levels, rather than low-level optimization. Things are usually slow because they do entirely too much of the wrong things, rather than because nobody optimized them, or the wrong registers were used, or what not.


The article presents the case that writing code with a specific style - OOP compared to Data Oriented - is responsible for slowing down performance, and that code architecture is primarily the cause, regardless of the actual implementation, because some architectures will never work well with the compiler.

I get that picking an O(N) algorithm over an O(N^2) one is always going to be more important, but it doesn't seem like the article is disagreeing with you at all; it's just another area where people need to consider their decisions.


But only a strict subset of problems are heavily data oriented. For your 3-button GUI app, whether you use a tightly packed AoS, or even an SoA, instead of a slow linked list of pointers for that 3-element list simply doesn't matter.

And OOP is simply not contradictory to DOD.


(Agreeing) The 3 button GUI app is much more likely to be/feel slow because the UI framework used makes it difficult to get the widgets on the screen quickly than because of data processing.

Maybe it insists on a deep hierarchy of widget inheritance, as many do. Maybe the framework wants to load and configure all the possible widgets, even though the application only uses a few. Maybe there's a bunch of images loaded from disk even though they're not used. Maybe it's just the fantastic default compositing system that puts everything a frame behind, adding up with everything else.

There certainly are applications where data structures matter, and time spent processing data is significant to user experience, but that's not why the whole computing experience feels slower (although maybe prettier) than 25 years ago, despite capability being so much greater for most things (as pointed out in the article, RAM is better than it was years ago, and we certainly have more of it, but RAM access takes a lot more CPU cycles now).


I am torn on the issue. On one hand, there is definitely useless technical debt, over-abstraction, and no-longer-useful features slowing down our programs. But on the other hand, we now actually handle every language on Earth properly (or at least we very well should by now) instead of just saying that ASCII is good enough, we care somewhat about accessibility (though far from enough), etc.

Also, actual native GUI apps are quite fast, IMO. Only Electron apps are slow, but I really do think that is mostly due to the web having the wrong abstraction for GUIs, not due to other reasons (of course there are shitty web apps, as well as shitty programs everywhere else).


> Sure developer time costs, but this cannot actually be the reason.

It totally is, because we've shifted the cost of software engineering onto the hardware.

Back in the 90s, people had to basically beg for memory when writing software, while modern development is built on the theory of getting the fastest delivery speed at the cost of performance. We outsource the cost of software onto millions of customers' hardware, and it's never coming back, because going to market quickly makes you money.


The point stands anyway. Why are all my dev tools super slow? If my time is important, shouldn't those be fast and heavily optimized? And don't try to tell me that 4 billion cycles a second across 16 cores operating on in-memory contiguous values isn't enough to update a watch window more than once per second.


If my tools are faster than me, there’s no point in making them even faster. In fact, I’d trade some of that speed for a smaller memory footprint.

In your example of the watch window, updating it at more than 30 fps is probably overkill.

One very common mistake is optimising the wrong thing. My advice is always the same: first run a profiler with a real workload (absent that, run it over your automated tests), then look at what you need to optimise for speed. All the rest should be optimised for readability.

In the article's case, if counting warriors is so important, keeping a counter in the AntColony class that's updated every time you write to an Ant would be the obvious solution.
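
A minimal sketch of that idea (the field and method names are my own guesses; the article's Ant may look different):

    #include <cstddef>
    #include <vector>

    struct Ant {
        bool is_warrior = false;
        // ... other fields ...
    };

    class AntColony {
    public:
        void add(const Ant& a) {
            if (a.is_warrior) ++warrior_count_;
            ants_.push_back(a);
        }

        void set_warrior(std::size_t i, bool warrior) {
            // Keep the cached counter in sync on every write to an Ant.
            if (ants_[i].is_warrior != warrior) {
                if (warrior) ++warrior_count_; else --warrior_count_;
            }
            ants_[i].is_warrior = warrior;
        }

        // O(1), instead of scanning every ant each time the count is needed.
        std::size_t warrior_count() const { return warrior_count_; }

    private:
        std::vector<Ant> ants_;
        std::size_t warrior_count_ = 0;
    };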


> Why are all my dev tools super slow? If my time is important,

Quantify "important" into how much you spend on your tooling.

Aside from some key bits of development software at the IC / board layout level, almost everything a software developer needs is free, open source, or cheap. No one pays $1000 for an IDE anymore.


> No one pays $1000 for an IDE anymore.

How many minutes per developer per day before that level of expenditure pays for itself inside 2-3 months?

Who cares? The premise is false; you can't buy it. There are no "fast tools" you can pay for. Every decent shop got their devs an SSD at the first available opportunity because it paid for itself really quickly; it cost you money _not_ to buy them. The very definition of a false economy was "no, the devs aren't getting new shiny sportsmode SSDs for their boxes."

We all use gcc and clang (for relevant languages), for example, because everything else you might pay for sucks a lot harder, whatever ideology you claim or deny. Intel literally damaged their customers' products to try to make themselves look better than AMD with theirs! They're a non-starter after showing themselves willing to do that, but if you must: yes, even running exclusively on Intel, their compilers still suck more than gcc. How is Microsoft's compiler's standards chasing going? In the 21st century yet, or /still/ not? (Oh come on, we're mostly there but for ... yeah). Compiling template-heavy C++? Does developer time matter?

(But just quietly, terminals & cmdline tools feel a lot more productive than IDEs to me and I use both in differing circumstances).


I don't have hard math here, but here's an example: JRebel. It allows for instant class reloading in the JVM. It increases one's productivity tremendously. And it's expensive as hell. They used to have a free version; they don't have it anymore. I can only judge that they're struggling and forced to squeeze all they can, because developers will prefer the free restricted version over the paid full-featured version. I know, I do; I don't value my own time and I'll never pay more than $100/year for a tool (and hopefully much less). I'm already trying hard to justify the IDEA subscription, because it seems that IDEA CE contains almost everything I need.

If developers were willing to spend thousands of bucks here and there, the paid developer tools industry would flourish. But I think it has even degraded compared to the past. I remember the paid Delphi IDE and I remember paid components, and that was a thing back then. I can't remember any popular paid React component right now, despite the fact that far more people are using React than ever used Delphi.


Shit is slower than it was in the 90s.

Spotify is slower than Winamp, Slack is slower than IRC, Webmail is slower than my e-mail client, VS Code is slower than Visual C++ 6.0.


People forget that Winamp was instant on a 60 MHz machine with a spinning rust drive. It's difficult to buy a computer that slow these days. Modern CPUs modulate their speeds by far more than that as part of their moment-by-moment power management!


Related: when MacBooks switched to SSDs, the performance of my old spinning-disk MBP dropped noticeably basically overnight.


Same thing on Windows. It became incredibly slow on HDDs when SSDs started to become popular.


When I first bought an SSD and noticed the crazy speed-up, my next thought was that developers will get used to it as the "new normal" and stop caring about IOPS which would slow down SSDs (ruining my blink-of-an-eye super-speed fun) and make HDDs near-unusable. Two to four years later, here we were on Windows, and to some extent, on Linux too.


The amount of effort involved in making sure reads were sequential back in the HDD days was immense. It’s not surprising people dropped that as soon as possible.


Having an HDD running at 100% on Windows 8, slowing everything down, is actually what made me switch to Linux Mint. Never looked back.


Winamp didn't stream all music in existence over the internet. How is it really comparable?


Because streaming music is a trivial task, computationally. It could have had some performance impact in 2000, but we're in 2021 now.

And nothing else related to the functional differences between Spotify and WinAMP can explain why Spotify is slow. Done properly, it should feel snappier than WinAMP felt in 2000.


It's comparable only on an abstract level - mostly because there's hardly anybody who really wants/needs "all music in existence over the internet", most people just want to "listen to music that I like".

On that level, Winamp did the job just fine.


It really did not for me by that standard. I have access to orders of magnitude more music now that I know I like than I did during the Winamp days and I was a pretty early and savvy user back then so I had more access than a lot of people.

If we're trading music player performance for the massive increase in availability I'll make that trade.


Is that why everyone uses a streaming service and no one collects music any more?


Lol arguably Winamp + Napster did :D

But really, 99% of the value-add of Spotify over Winamp is stuff that happens server-side. Your Spotify client doesn’t have a database of all the songs, nor the ML models for computing recommendations. As far as playing streaming music goes, Winamp would happily play an m3u (mmmm soma.fm).

Don’t get me wrong, the discovery and search features in Spotify have brought me a ton of value. But wow is the client resource usage dramatically disproportionate for the added functionality


And the kicker is, the music client is a completely separate concern from a song database, or a recommendation engine.

This is actually a point in favor of directly comparing WinAMP and Spotify - the important "value-add" bits of Spotify all happen server side, so they should have exactly zero impact on client performance.


It had Shoutcast.


Was Shoutcast all that fast? It didn't seem nearly as rock solid as the rest of winamp but I guess it could have been my internet at the time.


You could go to the shoutcast website, get the shoutcast links for the radios you were interested in, then make a playlist that could be dealt with just like any other playlist.

And it was almost instant.


Yes, it was instant.


Winamp 2 was instant. Winamp 3 and 5 were slow.


>Spotify is slower than Winamp, Slack is slower than IRC, Webmail is slower than my e-mail client, VS Code is slower than Visual C++ 6.0.

All of which are Electron apps, apart from "Webmail" which is, well, what one might call a "distributed Electron app" (a website.)

Maybe, just maybe, Electron is the problem.


Spotify is not an Electron app, they are using Chromium Embedded Framework.


Yeah, this is the weirdest thing to me: Windows NT 3.51 on a Pentium was much snappier perceptibly than even my new M1 MBP (developers are already expanding to the new performance envelope).


Dan Luu has done some measurements around this:

https://danluu.com/input-lag/

https://danluu.com/term-latency/


Similarly, SNES games were more responsive than modern games and have zero loading screens.


And they didn't render millions of polygons at 60+ fps while having many GBs of assets like textures, so I don't see how it is relevant.

One could write those programs in a truly inefficient manner in a not too performant language and it would run without problem. Today’s computers are really fast.


The problem is that, instead of the speed of modern computers translating to perceptible performance improvements for normal users, it translates to applications that use more resources.

Whether or not the new features make up for the general perceptual slowness of modern computers is a somewhat open question.


I would agree with you, but games are simply not a great example. It may be questionable why we would want photorealistic games when 2D, visible pixels are good enough, but games are not known for being inefficient.


Yet many of the games were more entertaining than many games today.


That’s completely orthogonal.


Windows Server 2019 has a much snappier UI than Windows 10, I've found (I recently spent a ton of time automating installs for both and boy did that become noticeable).

Then you look at the Windows 10 list of crapware and it becomes a lot clearer.


Windows 10 is an abomination.

It’s incredible how much grief Windows Vista got in comparison.


Vista was a worse abomination on 2 GB of RAM. It's likely that if I installed a build of Windows 10 from 2 years ago on the machine in question, it would be faster than greased lightning in comparison.


> Then you look at the Windows 10 list of crapware and it becomes a lot clearer.

Is there an actual causal relationship? And which crapware specifically? Asking because I only have access to a bunch of Windows 10 Pro machines which already don't seem to have most of the crapware on there (i.e. often I see threads here where people complain about all kinds of ads and other things I never even knew existed) and the rest disabled (as far as I can tell), and still it's less snappy than this one Windows Server instance on a comparable machine. (And sadly no modern OS feels as snappy as even Windows XP SP2 did for a similar level of functionality on not-so-recent hardware.)


The difference is the list of running background services. Kernel and drivers are the same, libraries are the same, but server doesn't have (ex) the AllJoyn Router Service for Smart Things Control, the 4G LTE Push Notifications Service, the WAP Push Message Service, the Fax Service, Windows Image Acquisition, and so on. Candy Crush doesn't run on startup, these do.


The subjectively fastest, which is to say the most responsive or lowest input latency, computer I've ever used was a Macintosh 512ke upgraded to 2.5 megabytes of memory, running the system software and apps off of a 1.5 megabyte RAM-disk. When you double-clicked, say, MacPaint in the Finder, it was loaded before you finished the physical mouseup. This was when that hardware and software was still current.


Spotify runs on all of my devices. In the morning I can ask my smart speaker to play songs that Spotify presumes I like, continue from my car with a voice command, see what my friends are playing or play their playlists. I'll gladly wait for the ~5 seconds it takes for an app to start if it delivers all this.

Of course IRC is much faster; it's also a few orders of magnitude simpler than Slack. For work-related communications I prefer Slack to IRC, but for chatting with friends IRC does just fine. One simple protocol vs hundreds of APIs that provide extremely rich content. Once again, Slack takes a few seconds to launch on a modern machine; it's not that bad considering it does so much.

Yeah shit is slower, shit is also way more connected and complex than in the 90s.


Slack is definitely faster than my IRC client since it's not running over ssh to a host. You can't actually run IRC locally or you'll miss all the chat from when you're not signed in.


You can use a bouncer so that's not true


I don't know if you've noticed, but most programmers actually do care about performance. They're just bad at it. Software is slow precisely because programmers don't get it: optimising all the code will make slow software. They spend their budget optimising 99% of the code and end up with no more money to make that last 1% fast. For real (game engine programmer speaking). Plus, optimising all the code makes all the code unreadable, which drains even more money from the budget. Just as OP said.


Usually simpler code is better. I used to see a lot of mistaken "high performance" code where someone has unrolled all of the loops because they think unreadable code goes faster, though maybe people have gotten better about this? (OpenSSL is an example here.)

Unfortunately not all kinds of slowness only happen in hotspots. This is true for CPU cycles, but if an occasional task uses all memory, it's going to mess up everything downstream as well.


I wonder where the trade-off is for loop unrolling.

Like, unrolling a `for` loop that only has 5 iterations makes sense. But if you have 100 iterations, then the larger memory footprint of all the code might actually make it slower than just keeping the `for` loop.

In some cases, you can use Duff's Device. https://en.wikipedia.org/wiki/Duff%27s_device
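
For reference, the classic shape of it, adapted to a byte copy (a sketch; assumes count > 0, and on modern compilers it is rarely a win over a plain loop):

    #include <cstddef>

    void copy_bytes(char* to, const char* from, std::size_t count) {
        // Unrolled 8x; the switch jumps into the middle of the loop body
        // to handle the remainder, so no separate cleanup loop is needed.
        std::size_t n = (count + 7) / 8;
        switch (count % 8) {
        case 0: do { *to++ = *from++;
        case 7:      *to++ = *from++;
        case 6:      *to++ = *from++;
        case 5:      *to++ = *from++;
        case 4:      *to++ = *from++;
        case 3:      *to++ = *from++;
        case 2:      *to++ = *from++;
        case 1:      *to++ = *from++;
                } while (--n > 0);
        }
    }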


> I dont know if you've noticed, but most programmers actually do care about performances. They just are bad at it.

I disagree. Programmers may care about performance, but project managers certainly don't and they always have the final say.


> Programmers may care about performance, but project managers certainly don't and they always have the final say

LOL no. Offer a dev a choice between a framework that is trending on Twitter vs one that is 100x faster but "legacy" (meaning more than a year old), and she'll choose the first one every time, and the project manager won't know the difference, or care if he did.


If it's good to develop with, I don't care one whit whether a software framework is 2 months old or 20 years old. Besides, not everyone works on greenfield projects and can pick what framework they want.


I’m with both of you and I don’t think what you two are saying is mutually exclusive. I’m kind of a performance nut (often working with teams, I’m the guy that introduces everyone to a profiler) and, in my N=small experience, PMs are opposed to spending time on performance optimization precisely because most people are bad at it and it takes an undefined amount of time for unknown performance gains.

When I take an hour to profile and analyze some request and say “I think we can cut this thing’s response time by 80% by optimizing $X, which will result in an overall page load time reduction of 3s”, that’s way easier to sell than “I want to figure out why this page load is so slow” and then reporting back a week later with “shrug I optimized 5 things and it didn’t get faster. Dunno.”


Most performance concerns that I run into are related to advertisements that need to load.

The business model isn't right for performant experiences


Faster = more ad revenue. If your app/site is too slow, people will stop using it and won't see your ads. Also, search engines will detect that and downrank you, meaning even less ad impressions. It has been studied many times, and yet, developers don't do it.

Simply, performance is expensive and for most businesses, the additional revenue is not worth the additional expense.


> Almost all software I use these days is absurdly slower than it should be and probably a massive drain on collective productivity

Productivity, energy usage, e-waste in landfill as people are forced to upgrade in order to do simple tasks with the current software... The world is paying a high price for programmer laziness.


Is it programmer laziness, or insane demand for software due to many factors, mostly profit?


> Is it programmer laziness, or insane demand for software due to many factors, mostly profit?

There's insane demand for games, and those programmers are able to optimise their code, insane demand for trading software and those programmers are able to optimise their code. But webdevs just don't care. I look at pages now that do nothing more in functionality than the same app would have done 20 years ago, yet they are slower now, and I have literally 1000x more CPU power. It's insane.


Optimized games have a well known impact on sales. I'd assume the same with trading software. Putting out a blanket statement that web devs are lazy is just silly. Developers rarely have much say in how much time they can dedicate to features and the app as a whole, that usually boils down to PMs and management. If the org as a whole doesn't value performance or doesn't think it matters, then that will get reflected in the product regardless of what the developers actually want.


Do you use browsers? Browser makers spend billions of dollars in engineering time making them "fast", with great results.


Do you use websites? Because developers unthinkingly pull in billions of dependencies, making them "features", with horrific results.

And fast is one aspect of performance. What's the memory footprint?


And yet developers working on front-end code to run in those browsers are waiting seconds for their linter or test suite or transpiler to process a few thousand lines of JS code, on a modern PC. Not a few million lines, a few thousand. That is orders of magnitude slower than it should be, and it's a cost that hits huge numbers of developers many times every day. Quite often, that awful performance is because the slow tools are written in a certain language also used in those "fast" browsers, and far too many developers are still giving this a pass because they incorrectly assume some magical runtime and JIT will make up for using a language that was never designed for high performance work and writing large, complicated programs.


The problem with the NPM ecosystem is really dire because know-nothing programmers who write JS plug in to that ecosystem and write bad code, and then the know-nothing programmers who don't write JS come along, too, and say look how bad JS is.

JS is plenty fast and can be used for "large, complicated programs"[1] just fine. The problem with most JS written today is in programmer practices - the way that the community associated with NodeJS pushes one another to write code (which is, ironically, not even a good fit for the JavaScript language). It turns out that how you write code actually matters (which is the central point of the article here to begin with...)

1. https://web.archive.org/web/20070127104743/http://www.spread...


Browsers are fast, but the content they render is very slow.


These are related: the faster the browser and the typical internet connection is, the bigger sites will be (absent some force to prevent developers from spending the entirety of the new performance budget.)


Reminds me of the "paradox" that making technology more energy-efficient has the effect of increasing overall energy use.



>"Browsers... "fast", with great results."

Are you joking?

I have seen progressively worse performance from even the best browsers, and pages that should load instantly often literally take minutes, or never load unless I completely kill the browser process and come back. The slowdown is nearly inexorable, with occasional improvements in some versions before resuming the dismal trend.

Simple word processing and spreadsheets are so laggy on keystrokes as to be almost unusable; and by unusable, I mean unresponsive on a level hundreds of times worse than DECADES ago, on computers orders of magnitude less powerful. Simple cursor movements are so laggy that I must set aside my train of thought to attend to the tool.

And this is on a very solid CAD-level computer, FIOS connection, etc.

It is disgusting and unforgivable. I quit my software career 15 years ago for a new industry, in no small part because I saw this trend of ever more complex "tools" & "frameworks", etc. creating a situation of building castles on shifting sands, with serious declines in the ability to reason about debugging, performance, or security. It is only worse now, probably exponentially. Worse yet, it seems to have yielded no perceptible "programmer productivity".

It is one thing to architect and program to take advantage of upcoming advances in hardware. It is quite another thing to ignore it and assume that the hardware builders will save you from the bloatware that you foist on the world without a serious thought.


> Are you joking? I have seen progressively worse performance from even the best browsers

Where is the evidence that you're seeing poor performance from the browser itself and that the source of the problem does not lie in the difference between what the server is sending down the tubes today compared to what it was sending 10 years ago?


I agree that what the server is 'sending down the tubes' is a huge part of the problem.

But this is the root cause we're discussing - programmers selecting tools for their convenience (and worse yet, cool factor), instead of FIRST considering the responsiveness of the system as they design and code.

Optimization as an afterthought is about as good as security as an afterthought - anything from a complete waste of time to a disaster.

There are indeed pages that load like lightning, so it can be done (e.g., HN takes about 1.5sec to create a new window and load, so not exactly lightning, but usable), but many are horrible, and clearly due to bad programming.

For starters, when I see a page that loads code from 25 different sites that need NoScript privs to even display, that alone is pretty questionable - license and manage your own damn code (for the sake of minimizing dependencies alone!). Twitter is particularly egregious in the last year or so, a new page taking 10sec-?? to load, and the LAST thing that loads is the list of posts -- at the same load times it would feel so much more responsive if that were the first thing to load, and the other navigation, news, etc. panels loaded later while I was reading. That is a lot of bad programming choices.


> this is the root cause we're discussing

It's not. Someone explained that browser makers spend billions on top talent to make browsers fast, and you posted a flippant comment — "are you joking?" — about your observations that performance of browsers is getting worse.

Now you're talking about stuff that web developers do on the pages that you visit.

Browsers are an example of software that is fast because companies have put effort into making them that way, instead of not caring. That's the claim made by the person you responded to. Dispute it, if you want, but don't make claims and then change the subject or shut down inquiry into the things that you're saying.


But the browser itself is plenty fast. Just open a huge static HTML file, or some well-optimized JS app like a simulation.

The problem is some websites.


I'm using a very underpowered laptop (Dell 3410) and the web is extremely fast for me. There's some serious problem in your OS or your network if websites take minutes to load. The dreadful Gmail or YouTube loads in 1-2 seconds for me and then works instantly. Simpler websites like HN load in a fraction of a second.


Maybe you should visit better websites? Try http://www.quakejs.com/



You're both right - but about different kinds of software.

I believe the GP post was thinking of an application which has some heavy computational kernel to it, where most of the processor time (and other resources) is spent - with a lot of other code for UI, configuration, some parsing, etc.

And you seem to be talking about everyday desktop applications: browser, mail client, word processor, instant messaging client, and maybe even your IDE as a programmer. Those kinds of apps don't have that kernel of hard computational work - their work is much more diffuse.


I spent years optimizing physics simulations from different domains (weather, fluid dynamics, all sorts of stuff). Even there you have to carefully pick the parts you want to optimize to get the best outcome for the resources you invest in optimization. It's completely infeasible to make everything crazy fast with a blanket approach.


You're speaking directly past your parent's point. I've spent years optimizing physics simulations, more abstract discrete math stuff, etc. myself. There, especially there, it's usually pretty easy to identify what should be the hot loop, by eye. Easier with a profiler if the project gets big. In scientific computing, it's very common to find that 99% of the time is spent on a scant few lines of code. Unless somebody bungs up the I/O and you end up spending 99% of the time there instead, but I digress.

But a whole OS, with a browser, network stack, dozens of applications, etc. - it's pretty common to find hundreds or thousands of things which are all more or less equally sucking performance. So on one hand, it's a much harder problem. On the other hand, front-end people have this attitude that performance doesn't matter because hardware is fast enough and it's better to write in the most abstract language possible because changing diapers makes your hands smell gross.


I think you missed the point. Parent is saying that without compiler optimization everything would be even slower.


I disagree that everything would be slow. Yeah, there are websites with so much JS bloat that they can make the fans spin up in my laptop, but otherwise, your computer does a shitton of things, most of which is essential complexity.

To-the-point software is actually really fast: gcc/clang do tons of optimizations compared to what compilers did a few decades ago, and I doubt you could realistically optimize them by a significant percentage in the general case. Same for databases. Browsers themselves are also pushing the boundaries of hardware; they can display static HTML/CSS ridiculously fast even though it is not ideal for layout. Pretty much only the top of the abstractions "sucks" sometimes, but there are really fast JS apps as well.


This attitude is very common and has clearly degraded software development particularly for users, and like most issues afflicting modern software it's not perceived by developers so it "doesn't exist" to them. As the top reply to this mentioned, anyone can easily tell how much slower software is today even when computers and devices are so fast now! All in the name of "making code easy to reason about."

I have news for you: all code is difficult to reason about at some level. Sure, certain abstractions can be easier to reason about than others if you take certain examples that I can generally argue are reductionist, but any reasonably well-developed piece of code (read: actually useful code) will require a certain level of sophistication and complexity that will eventually make it difficult to reason about.

I know this is a reach and a bit of a red herring, but developers really, really need to remember that the allergy to "code being difficult to reason about" is sometimes a useful intuition or heuristic, the heuristic being "easier to reason about is better," but it really is just a heuristic at the end of the day. Abuse of this heuristic is often a motive behind all sorts of detrimental behaviors in tech: reinventing the wheel and NIH (I didn't write it, so it's difficult to reason about!), rewrites from scratch (our "technical debt" is now hard to reason about, let's rewrite from scratch!) and the problem OP refers to (a solution that maps to the machine better is hard to reason about, let's use this less efficient solution instead!). And a big problem here is that these issues are often not issues to hackers, because it just means more programming for them, which they love! It just hurts users and customers.


>As the top reply to this mentioned, anyone can easily tell how much slower software is today even when computers and devices are so fast now! All in the name of "making code easy to reason about."

Yeah - it also doesn't crash my PC when an app segfaults; hell, even the video driver failing these days won't bring my system down. I don't need to restart my PC when I plug in a device. That's not really compatible with your childhood DOS games that enter kernel mode and talk to HW directly - you pay for abstractions - and it's a good tradeoff.

I can also have files with more than 5 character file names, and my FS can revert file changes and create snapshots.

These "good old days" people are just unrealistic, VIM is fast even today - but I prefer having a graphical representation of file tree with fancy icons and search. But that's just me and 99% of people out there.

Performance is a feature - and it's not a feature most people rank high until it becomes an issue. As much as a minority of people here cry about Electron and HTML, a huge number of developers use VSCode daily and enjoy the approachability of a JS/DOM plugin ecosystem. VIM has been free and available for decades now, and it has such amazing UX that one of the most upvoted questions on SO was how to exit it...

Slack is dog slow - and IRC has been freely available for decades, yet most people here use Slack daily. Who uses IRC anymore?

So keep using your Emacs integrated window manager and stop bothering the rest of us :D


My point really isn't about luddite vs. anti-luddite; I'm talking about performance and how it affects users. Also, you're right that less performant or less elegant tools win out sometimes (like the good old Mac vs Windows debates before macOS and Apple in general made a comeback), and that is due to a variety of reasons. That's fine, life is complicated and many factors affect who wins in life; it doesn't mean we can't still ask for better things though.

Still, I don't know what bearing all of this has on the point I made above.


I think other users in here are more on the right track about where the blame for this lies.

As you alluded to, a lot of programmers are happy to tweak and refine for ages so it's probably not that.

Is it just that programmers are lazy or need to care more or be better educated about optimization?

Honestly I really doubt it. I fully agree with the other takes that companies have incentives to ship working software quickly. Features sell subscriptions, not performance. If it comes down to choosing between functionality and performance, leaders will ship the slow feature every time.

I'm not even sure they're necessarily wrong most of the time. If you know where to look, the internet is a graveyard of carefully engineered products that were beaten to market and obliterated by three people with a shitty Electron app.


You're probably right, honestly, that bean counters are mostly to blame. I will say the attitude of being afraid of difficult code is a problem in modern IT (IT of any generation, really) and doesn't help, but companies not caring about users being the primary cause of bad, unperformant code is the correct take. Conversely then, the handwaving of the OP comment is wrong too, so if we limit the discussion to what one ought to do, I still stand by my point.


I generally only observe value in application code optimizing for correct data structures/batching/data parallelism. Once you're getting into fine-tuning the batch size/SIMD ops for a particular platform, you are quickly into the realm of diminished ROI due to fragility against changing hardware and application usage patterns, and diminished returns where dropping an op from 1 microsecond to 10 nanoseconds just doesn't matter as much.

There are cases at the tail where this will make or break a project, but those are pretty rare.


This is exactly how we ended up with someone charging you $50/month to run a browser on a supercomputer.


> All the languages mentioned in the article are still around because it doesn’t matter how performant 99% of code is.

It’s worth challenging this assumption and asking whether it’s really true, or whether this belief is old and outdated like the old languages. I’m certain that providers of compute infrastructure like Amazon, Microsoft, Google and others completely disagree that performance doesn’t matter. Letting slow code run literally costs them money because it prevents multiple people from being able to use the same hardware resource. Even on your own machine, slow processes hog resources that other processes could use. The reason my laptop fans keep spinning up in Windows while my machine is basically idle is due to hundreds of processes written by people who erroneously believe the performance of 99% of the code doesn’t matter. We’re also at a point where we’ve started to measure the energy efficiency and environmental impact of wasting compute cycles. So, performance always matters.

> For the 1%, we can think about cache misses [...] but that has the enormous downside of making code hard to reason about.

You're agreeing with the author here. Maybe you missed the explicit point in the article that is calling for a language that makes basic cache awareness a first-class citizen, precisely so that it's easier to reason about?

“This is exactly the type of problem that should be solved via the features of our programming languages, since writing these custom sorting functions is very tedious. What if a programming language would provide us with a structure that would act like an array of structs, but internally it would really behave like a struct of arrays? We could program in the typical object oriented way that is convenient for humans, while still enjoying in great performance due to playing nice with the hardware.”

> You are also optimizing for today’s hardware, which means that it might not run that great on new hardware in 20 years.

For the last 20 years, the need to be cache-aware has increased in importance. For the next 20 years, the gap between memory and compute will get larger still. There’s no scenario on the horizon where we don’t have to become more memory conscious as programmers or use more memory conscious languages and tools. There aren’t many performance tricks and concepts that worked 20 years ago and don’t work today. Using a structure of arrays (SoA) instead of an array of structures (AoS) has long been and will continue to be a good idea.
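
For anyone unfamiliar with the two layouts, a bare-bones sketch (field names are mine, loosely borrowing the article's ant example):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Array of structures (AoS): convenient to write, but a loop that only reads
    // is_warrior still drags every other field of each Ant through the cache.
    struct Ant {
        float x, y;
        std::uint32_t colony_id;
        bool is_warrior;
    };

    std::size_t count_warriors_aos(const std::vector<Ant>& ants) {
        std::size_t n = 0;
        for (const Ant& a : ants) n += a.is_warrior;  // loads whole Ant structs
        return n;
    }

    // Structure of arrays (SoA): each field is contiguous, so the same loop
    // streams through a dense array of flags and nothing else.
    struct Ants {
        std::vector<float> x, y;
        std::vector<std::uint32_t> colony_id;
        std::vector<std::uint8_t> is_warrior;
    };

    std::size_t count_warriors_soa(const Ants& ants) {
        std::size_t n = 0;
        for (std::uint8_t w : ants.is_warrior) n += w;
        return n;
    }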


Feel free to profile a high-profile application. It is almost always a few hotspots that “suck”.

There are rare cases where there is no one point that is slow, that’s a bad situation to find oneself in — it usually means that the base architecture/abstraction sucks and needs a big refactor.

But even in so low-level code as the linux kernel, you see the use of linked lists all the time. Because it simply doesn’t matter if you are not in a hot loop and iterate slowly over 3 elements.


FWIW, I profile intensive applications all the time, it's my job. You're right about the existence of hot spots, but that's not particularly relevant to my point or the author's. Just because one spot is hot doesn't mean you should rationalize using lazy or bad-practices programming on the rest. Just because you don't see a bottleneck somewhere when you profile doesn't mean it'll stay that way when other processes want the same resources at the same time. Knuth's famous quote was not intended to be a license to ignore using best practices in general for non-hot-spot code.

Use of linked lists in the linux kernel is also not very relevant, and somewhat of a red herring. Kernel-level (as well as games & embedded programming) linked lists are frequently allocated and used very differently from typical high level language linked lists. Scripting languages are allocating individual linked list nodes by default. Linux uses linked lists for the memory allocator itself, for example, and allocates nodes at a lower level in blocks. There are no particularly good alternatives, and no great ways for the allocator itself to be cache aware, though that is ongoing research. The main perf problem with using linked lists in high level languages is not the fact that they cause cache-incoherent access, it's worse, it's the fact that they allocate and deallocate memory constantly which is much much slower. This is one reason why linked lists don't get used as much as arrays in high level languages, when arrays are a reasonable alternative.

Anyway, it's somewhat of a straw man to bring up linked lists when the author's valid point was SoAs are better than AoSs from a cache perspective, and that our languages aren't yet doing much to help us code that way, but they could in the future if we decide to want them to.


I apologize for my cocky reply.

I just meant to point out that SoA vs AoS or an even slower data structure is not necessarily relevant for all programs.

> The main perf problem with using linked lists in high level languages is not the fact that they cause cache-incoherent access, it's worse, it's the fact that they allocate and deallocate memory constantly which is much much slower.

This is not true though; at least the JVM most definitely doesn't do that, and I'm sure other state-of-the-art GCs, like V8's, don't either.

First of all, allocation is basically free, much faster than malloc and friends. It’s only a pointer bump. And the cost of deallocation is amortized/may not even happen. With plenty of memory available, GC doesn’t really happen, and for long-lived objects in case of a generational GC, they may not necessarily get checked at all after a time.

So the usual linked list that gets appended a few times is close to optimal; iteration has the same cost as in lower-level languages with pointer chasing (usually a bit more, because objects usually have headers that take quite some space, so fewer of them potentially fit into cache). It even has some advantage in that a modern compacting GC may move the objects close to each other.


>So the usual linked list that gets appended a few times is close to optimal; iteration has the same cost as in lower-level languages with pointer chasing (usually a bit more, because objects usually have headers that take quite some space, so fewer of them potentially fit into cache). It even has some advantage in that a modern compacting GC may move the objects close to each other.

I wonder why array-backed linked lists aren't more of a thing. I've used them before in C in hobby projects. You allocate an extendable array of nodes, and the nodes' next and/or prev pointers become indices into the array (although pointers would still work). Added nodes go on the end; deleted nodes stay where they are and just aren't referenced. If there is high churn, do something to keep track of inactive nodes to fill next. You're still jumping around an area of memory when iterating the list, but it's at least confined to one guaranteed contiguous block, and you're only allocating to extend the array rather than doing a lot of allocating and freeing of memory. You still have the memory overhead of the next/previous indices, though, so fewer nodes fit in cache than otherwise would.
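
Roughly like this, as a sketch (singly linked, indices instead of pointers, deleted slots recycled through a free list):

    #include <cstdint>
    #include <vector>

    struct IndexList {
        static constexpr std::uint32_t npos = 0xFFFFFFFFu;

        struct Node {
            int value;
            std::uint32_t next;
        };

        std::vector<Node> nodes;        // one contiguous, growable allocation
        std::uint32_t head = npos;
        std::uint32_t free_head = npos; // chain of recycled slots

        void push_front(int value) {
            std::uint32_t i;
            if (free_head != npos) {    // reuse a previously deleted slot
                i = free_head;
                free_head = nodes[i].next;
                nodes[i].value = value;
            } else {                    // or append a fresh one at the end
                i = static_cast<std::uint32_t>(nodes.size());
                nodes.push_back(Node{value, npos});
            }
            nodes[i].next = head;
            head = i;
        }

        void remove_after(std::uint32_t i) {
            std::uint32_t dead = nodes[i].next;
            if (dead == npos) return;
            nodes[i].next = nodes[dead].next;
            nodes[dead].next = free_head;  // slot stays put, just goes on the free list
            free_head = dead;
        }
    };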


Agreed, especially for high level languages and application level linked list usage. Worth mentioning I think the good linked list libs usually do this under the hood, and kernel & allocator level linked lists are always doing this. People implementing simple linked lists themselves, and/or copy-pasting the kind of code you get by searching for simple code examples (like the link in my sibling comment here), that's when they end up with very poor performing linked lists.

A closely related concept to what you're describing is internal storage linked lists [1], meaning the class/node contains the link(s) and is specialized for one particular kind of data, which often makes it more compact, and easier to allocate a block of nodes in a single call.

Anyway, I'd guess a big reason it's harder to find array backed lists when searching for linked list implementations is just because it's more advanced and may introduce memory management topics into articles and blog posts that the author doesn't want to cover. But I agree, once you use array-backed lists, the common examples of linked lists you can find online where new is used twice per node, once for a generic node, and once for the payload, seems ridiculously wasteful. This is the kind of thing I meant earlier - best practice would be to avoid a naive bad linked list even in code that isn't a hot spot. Time doesn't have to be spent optimizing it, and it doesn't need to appear as a hotspot, to know it should be avoided.

[1] https://en.wikipedia.org/wiki/Linked_list#Internal_and_exter...


> at least the JVM most definitely doesn’t do that, and I’m sure other state-of-the-art GCs neither, like V8.

Which implementation are we talking about here? Do you mean JVM internal linked lists, or user allocations of linked list nodes? I was talking about implementations such as (first Google result for "Java linked list implementation") https://www.geeksforgeeks.org/implementing-a-linked-list-in-...

> First of all, allocation is basically free

No, definitely not true. You're claiming that memory allocation in a tight loop is not a bottleneck. I completely disagree, having written several high performance web apps with thousands to millions of users, the single most important performance "best practices" strategy in any tight-loop Javascript is to avoid memory allocation in the hot spot. Also true in Python, true in C++, true in every language I've ever used, and has been true for decades and is still true despite significant performance improvements in V8 and other engines. This is the whole reason there are high performance libraries in JavaScript like https://glmatrix.net/ that specifically structure the API so that the user has complete control over allocations, precisely because batching and re-using allocations is faster than not.


What are the specific situations that we are talking about? Of course needless creation of non-single-use objects will have a cost, but object reuse via pools may be even worse than creating non-escaping objects in a tight loop. The "problem" with high-level languages is the complexity of their runtimes - you can't easily reason about what will get optimized and how. It can change from version to version, so one should profile.

Not too familiar with JS vm internals, but OpenJDK has quite a good escape analysis.


> The “problem” with high level languages is the complexity of their runtimes - you can’t easily reason about what and how will get optimized.

Yes, agreed. This is the point the author was making, and noting that we have room to actually improve the languages, and make some basic optimizations easier to think about and code explicitly.

> object reuse via pools may be even worse than creating non-escaping objects in a tight loop.

When does this happen in practice, and how likely is it? What does it imply about object lifetime if your objects don't escape, and why wouldn't you be re-using an object on the stack or in local scope instead? How often would replacing a non-escaping object creation with a re-use that escapes be a realistic alternative? I'm sure a contrived case can be set up, but this seems like an unlikely explanation to me as general advice. It's bad practice to hope your objects don't escape, and difficult to ensure. In general, if you want higher performance, then avoid using new in hot spots. The difference is often an order of magnitude in a flat O(n) loop.


I’m not disagreeing with you, and as a very general advice, less allocation is better of course.

My point was more that sometimes very non-intuitive (from a low-level perspective) solutions may optimize better, because that's what the compiler writers optimized for.

And object pools, at least in Java, are not necessarily good. The first go at it should be the plainest implementation one can imagine, on a micro level. (Of course, architecturally it is important to think about optimization, like whether it will be a class or an array, etc.)


Slightly off topic, but I just realized that not only does optimized code = fewer processor cycles, but fewer cycles = less power used.

This brings up two rhetorical questions: how much more effective compute would we have if all code running today were perfectly optimized, and how much less energy would be needed under the same circumstances?

The compute gains could only be used in networks where unused compute can be scheduled / sold to other tasks. The energy gains could only be reaped if processors had proper low-power modes they could fall into, or some other method of conservation.


High performing code doesn’t have to be hard to reason about.

You wouldn’t say the plan for a safer building is harder to read.

People excusing their poorly written code as “highly-optimized” is the oldest cop-out we’ve got as programmers.


Good article. My own thinking when coding for performance is also towards data-oriented approaches, for example Bevy in Rust. If stuff needs to be fast, it needs to be in cache. To do that, have everything nicely packed so you only ask for a chunk as often as you need. Then when you need it, it's already there.

Thing about old-fashioned OO is it's often fine enough for your run-of-the-mill CRUD app. If you look at the latency chart he provides, there's another level to it: going to a database server. If you're gonna do that, you're not really in the speed game anyway, so why not make it simple? Just write it in the most uncomplicated way the language allows, and that's it. Waiting for the DB will take way longer than filtering the ants; forget all the optimizations.

Edit:

Forgot to mention: with ECS, you have a different organizing principle. It can be quite enlightening to work with "systems that operate on data" as opposed to OO, where you have the data declared next to the functions that mutate it. It can add clarity for certain problems. Somehow over the years I've found that the animals/cars analogies given in OO tutorials are one of the few places that fit well with the model.

If you imagine a game that has characters, relationships, items, spells, and so on, it's often not that easy to model as encapsulated objects. Does each character have a list of other characters that they have a relationship with? What happens when you want to change the relationship, or a character casts a spell that temporarily changes the relationship? Can easily end up a mess, because it's not obvious where such things should live in an OO model.


> Somehow over the years I've found that the animals/cars analogies given in OO tutorials are one of the few places that fit well with the model.

Yes, and I've never actually needed to implement a cat or a cow in any project :)

The other thing for which OO works much better than plain data is GUIs - and I think it is not a coincidence that OO popularity exploded together with the coming-of-age of GUIs.

The other canonical example of OO - "Shapes" - doesn't actually work well at all; It doesn't work better with "plain data", but it exposes the fallacies of trying to use OO inheritance to model the real world. Every square is a rectangle, so square should inherit from rectangle ... but, you can't stretch width and height independently in a square, so it's not really a rectangle, etc. etc.


> The other canonical example of OO - "Shapes" - doesn't actually work well at all.

Shapes work quite well with OO if you design the hierarchy right using is-a relationships. The typically cited problem involves applying is-a relationships which hold for the immutable geometric concept of shapes to mutable representations whose mutation contracts don't support the is-a relationship of the immutable shapes. This would actually be fine if the contracts were written in a way which respects the is-a relationships properly. E.g., the common Circle-Ellipse problem goes away if the contract for the mutator for, say, the major axis of an Ellipse doesn't specify what the effect is on the minor axis. For the base class, the two can then be independent, with the Circle subclass having them invariably equal under mutation.

Alternatively, it's not a problem with a Smalltalk-like "become", in which case an unequally stretched Circle becomes an Ellipse with appropriate attributes.

Or, the mutable elements are restricted to things that don't change the shape, like scale and location, and shape transforms return a new Shape; if Ellipse's StretchMajorAxis returns an Ellipse and so does Circle's, there's no problem either.
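
A small C++ sketch of that contract trick (my own illustration):

    // The base-class contract for set_major() deliberately does NOT promise
    // that the minor axis is left untouched, so Circle can keep both axes
    // equal without breaking any promise it inherited.
    class Ellipse {
    public:
        virtual ~Ellipse() = default;

        double major() const { return major_; }
        double minor() const { return minor_; }

        // Contract: afterwards major() == m; the effect on minor() is unspecified.
        virtual void set_major(double m) { major_ = m; }

    protected:
        double major_ = 1.0;
        double minor_ = 1.0;
    };

    class Circle : public Ellipse {
    public:
        // Honors the base contract, and additionally maintains major() == minor().
        void set_major(double m) override { major_ = minor_ = m; }
    };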


I should have added "in C / C++ / C#".

Everything OOP works well in Smalltalk (and most of it does in Python), but you pay dearly for that in efficiency - which is what this article is about.


> I should have added "in C / C++ / C#".

The only part of my response that would have affected is the aside about Smalltalk-style "become", not the main point: that the problem is one of constructing an inheritance hierarchy based on relations between immutable entities and using it for mutable entities with mutation contracts that don't observe the same is-a hierarchy.


I agree, but that's the thing: it is non-idiomatic to write immutable Shape-style classes in C++. Possible, yes. Idiomatic, no.

IIRC, in Stroustrup's book (the one I read a decade ago, anyway), he concludes that Rect:Square and Circle:Ellipse cannot inherit from each other in either direction, and he did not suggest switching to immutable everything.


> The other thing for which OO works much better than plain data is GUIs - and I think it is not a coincidence that OO popularity exploded together with the the coming-of-age of GUIs.

Yeah, and I think the reason has nothing to do with objects per se. I've recently been coming to the conclusion that the one thing that makes OOP last is that it neatly packages an important feature that other programming paradigms struggle with or ignore: late binding (a.k.a. dynamic dispatch).

Is it better to write a class Foo with a method Bar(stuffs), or a struct Foo and a function Bar(Foo&, stuffs)? I don't care. They're literally the same thing.

But say I want to change the system so that, in some cases, it calls Baz instead of Bar - but I don't want to rip the system apart and change every call site. With OOP, I just make a class Quux inheriting from Foo, have it override Bar(stuffs) to run the Baz code, and shove a Quux object into the system. Done. Dynamic dispatch ensures that my new implementation gets called for Quux objects.

Meanwhile, in most programming languages, you can't pull that same trick with a struct. Not without having to manually reimplement dynamic dispatch - at which point you may as well use an OOP language.

Dynamic dispatch is a super useful thing to have. You can minimize its use - and probably should, for performance reasons - but when your language doesn't support it as a built-in feature, these few cases in which you need it get very painful to write.
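
In C++ terms, taking the names from the comment, that's roughly:

    class Foo {
    public:
        virtual ~Foo() = default;
        virtual void Bar(int stuffs) { /* original behaviour */ }
    };

    class Quux : public Foo {
    public:
        // Overrides Bar to run the Baz logic instead; no call site changes.
        void Bar(int stuffs) override { /* Baz code */ }
    };

    // The rest of the system keeps calling Bar and never knows the difference.
    void system_step(Foo& f) { f.Bar(42); }

    int main() {
        Quux q;
        system_step(q);  // late binding picks Quux::Bar at run time
    }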


> But say I want to change the system so that, in some cases, it calls Baz instead of Bar - but I don't want to rip the system apart and change every call site. With OOP, I just make a class Quux inheriting from Foo, have it override Bar(stuffs) to do Baz code, and shove a Quux object into the system. Done. Dynamic linking ensures that my new implementation gets called for Quux objects.

Functional programming does this with pattern matching, it is just as easy and avoids the dangers of sub-classing.
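For example, a sketch of the same substitution with sealed types and a pattern-matching switch (this particular sketch needs a recent Java, 21+; the names are hypothetical):

    sealed interface Foo permits PlainFoo, Quux {}
    record PlainFoo() implements Foo {}
    record Quux() implements Foo {}

    static void bar(PlainFoo f) { /* original behaviour */ }
    static void baz(Quux q)     { /* the new behaviour   */ }

    static void callSite(Foo foo) {
        // Exhaustiveness is checked: add a third variant and this stops compiling
        // until every switch handles it - no silent fall-through to a base class.
        switch (foo) {
            case PlainFoo f -> bar(f);
            case Quux q     -> baz(q);
        }
    }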


There’s another side, which is that from a source code management perspective, modern pattern matching forces you to group implementation by operation (to get all the benefits), while modern OO lets you group by “class”.

There’s no right or wrong answer here. For GUIs, it appears grouping by class is more effective. But many other times, it’s not.


The source of this capability is hidden in that you had to pass in a Foo to the call site.

If you had done similarly in the struct example by passing in the function Bar to the caller then you could achieve similar functionality by shoving in a Baz function instead.


Yeah, but that's the thing: in case of dynamic dispatch in OOP, the function is tied to the argument you're passing. In a typical OOP language, your object carries an extra pointer that's hidden from your view - a pointer to a table of pointers to functions, specific for that object's class. Dynamic dispatch goes through that pointer on the object.

Doing what you describe with a struct would require at least keeping explicit function pointers on the structure; the OOP mechanism described above is a generalization of that.
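A bare-bones sketch of that "by hand" version (Java, made-up names): the struct carries an explicit dispatch slot, which is exactly what the hidden vtable pointer automates.

    import java.util.function.Consumer;

    class FooData {
        int stuffs;
        // Explicit, per-instance "function pointer" - assigned at construction or later.
        Consumer<FooData> barImpl = f -> { /* default Bar behaviour */ };
    }

    static void callSite(FooData foo) {
        foo.barImpl.accept(foo);   // manual dynamic dispatch
    }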

(There are other, neater approaches too, allowing late binding dispatched on the types of more than one argument - see e.g. methods in Common Lisp.)


OO is a tree and html (UI) on a page is a tree

The problem is when you want to go to page two and want to display the same data as page 1 but in a different configuration

OO is suddenly terrible because the data is the wrong shape, you should hold your data as a graph then derive trees out of it to satisfy views

OO is not good for UIs unless you have one page and the contents on that page are static and don't change their placement outside the statically defined tree shape of OO


> OO is a tree

Inheritance hierarchies in single-inheritance languages are trees. In MI they are DAGs. But neither of those constrains data representations; OO can support arbitrary data graphs.

> OO is suddenly terrible because the data is the wrong shape, you should hold your data as a graph then derive trees out of it to satisfy views

How does this make OO terrible? What you describe is exactly standard OO GUI practice (specifically, in MV<whatever> architecture, the model layer is an arbitrary graph representing the data, from which the view is derived.)


I was referring to the lower-level GUI world, the one in which you implement "Button" and "TextField" - OO was, and still is, a great win there. The Win32 / Cocoa / Xt level.

But even inside the browser, the fact that even div/span/thing is an object, to which you can attach listeners, and otherwise apply methods, works rather well - I'm not familiar with a better alternative for implementation and dynamic control of this layer


OO can be a tree, as well as pretty arbitrary graphs. As has been said for over a decade, prefer composition over inheritance. There are cases where inheritance is really cool, but the main point of OOP is encapsulation. It can enforce class invariants.


been thinking about this comment since I read it and really love it (even though OO isn't always a tree). This graph <> tree relationship seems central to the successes and failures of many implementation approaches I've seen over the years. Thanks for sharing.


> The other thing for which OO works much better than plain data is GUIs

This cannot be further from my experience. Switching from Java/AWT/Swing to ClojureScript/Reagent/Re-frame has been super liberating.

Every GUI programmer must fiddle with reagent once:

https://reagent-project.github.io/


Well, you really picked the worst representative with awt/swing. I'd say Qt or even classic VB6/Delphi are great examples for OOP gui development.


How about JavaFX? Swing has been old hat for years now.


I haven't touched java gui code in almost a decade, but didn't they suddenly pull a 180 on JavaFX a few years ago?


Not exactly, they moved it out of the core Java standard library. The main effect of this seems to have been that many developers are now uncertain of whether JavaFX can be relied on, but it is still maintained.


Cocoa has been a great OOP-based UI framework forever. The key is that it doesn't use inheritance and has messaging (https://wiki.c2.com/?AlternateHardAndSoftLayers).


> The key is that it doesn't use inheritance

Cocoa is literally based on inheriting from NSView (which itself inherits from NSResponder)

https://developer.apple.com/documentation/appkit/nsview


That's if you want to make a custom view, which very often you don't. Placing default views in a window and receiving events/values from them doesn't use inheritance, instead using composition and delegate patterns.

UIKit lost some of the convenience features like bindings when it shrunk to fit on phones, which might be why people got annoyed enough to invent entirely new UI frameworks.


> That's if you want to make a custom view, which very often you don't.

I have never seen any non-trivial software not have custom-drawn widgets - grepping for `: NSView` in my ~ gives me a few thousand matches (and it's not even a mac)

> Placing default views in a window and receiving events/values from them doesn't use inheritance, instead using composition and delegate patterns.

... but you can't have composition without at least interface inheritance, and in this case you really want access to the parent methods & properties of NSView when creating your own widgets (e.g. bounds, rects, tooltips, etc) so you also want implementation inheritance.


I just...I really want to like Clojure, but I cannot get past the brackety notation of lisp-likes. Surely there's a way to do it where I'm not in bracket hell?


Use tools like paredit that support structural editing. There are tools like this for all major editors.

https://calva.io/paredit/


But you're super limited if you're trying to make a GUI inside a browser, how is this a good example ?


The same technique can be used outside the browser with Clojure + JavaFX [0] and even at the terminal [1]!

[0] https://github.com/cljfx/cljfx

[1] https://github.com/eccentric-j/cljs-tui-template


I’m on thin ice here. A system's complexity can be measured in the number of tops it has - as in, the number of different ways to see it as hierarchical.

Could it be that OO favors systems with one top? The tomato example comes to mind. Is it a fruit or a vegetable? According to American law it was classified as a vegetable at some point to meet a political need to limit imports. Of course botanically it’s a fruit, but it's usually perceived as a vegetable. So that's three tops identified just in this example, depending on context: imports, science, or consumers.


> The other canonical example of OO - "Shapes" - doesn't actually work well at all; It doesn't work better with "plain data", but it exposes the fallacies of trying to use OO inheritance to model the real world.

Shapes work well, but people do the wrong things with them, as you state...

> Every square is a rectangle, so square should inherit from rectangle ... but, you can't stretch width and height independently in a square, so it's not really a rectangle, etc. etc.

A square is just a rectangle where the length and width happen to be the same size and shouldn't be its own class. Books and websites teaching inheritance should stop trying to use it as an example.



> The other thing for which OO works much better than plain data is GUIs

I think the absolute dominance of React and its functional style (especially among juniors) is pretty good evidence against this belief.


It is pretty much an MVC separation that was mainstream in OOP-based GUI frameworks since the 90s at least.

But instead of “state”, you have a model, which uses the Observable pattern to notify the view of changes.


> If you imagine a game that has characters, relationships, items, spells, and so on, it's often not that easy to model as encapsulated objects.

I...disagree. ECS definitely has efficiency advantages for games in terms of supporting performant implementations, which for games is often overwhelmingly critical, but there are well-understood OO ways to address the modelling issue you raise.

> Does each character have a list of other characters that they have a relationship with?

It probably acts like it does, but it probably doesn’t maintain it, instead deferring to a separate object that acts as a repository of relationships, which are themselves objects, which allows you to modify relationships without touching the characters at each side (which may be more than two for multilateral relationships).

> What happens when you want to change the relationship, or a character casts a spell that temporarily changes the relationship?

You retrieve and act on the relationship object, which knows how to handle modifications.


That doesn't really encapsulate anything, then. You end up with global data wrapped in classes.


You always have "global data" in your program - i.e. the sum of all it models. The issue is with controlling who can see and modify it. Encapsulation is a way to establish that control, and it's purpose is to minimize cognitive load on the programmer.

The problem with the relationship example is that modelling this is tricky, and you have to be careful - and that's regardless of the programming paradigm used. For instance, if you're talking about "a relationship between two characters" as an objective, categorical thing (e.g. "A and B are friends", "C and D are enemies"), you're thinking about a concept that's separate from the characters themselves. So you don't want your characters to have a list of other characters, because it splits a single concept into two pieces and risks creating inconsistencies (A has a "friend" pointer to B, but B doesn't have a "friend" pointer to A). On the other hand, character A may want to query the list of characters to whom they're related, so you'll want to be able to generate that list on the fly.

Thus the OOP design would be: some object that manages all the relationships, that object being reachable from every character, and able to answer a query "what are all relationships for character $X?". An action modifying a relationship (say a spell temporarily making A and B enemies) would therefore act on that relationship manager too - which makes sense, because the action is modifying a relationship, a concept that's separate from characters themselves.

On the other hand, if you wanted to model relationships from the point of view of your characters, i.e. what they think of another character, then such relationships stop being an independent concept. You're now modelling the subjective perspective - and thus these would belong to characters, and in OOP, you'd likely model them as a list on the character object.

Whether you're doing classic OOP, plain functional programming, data-oriented programming - you still have to think about the questions described above in order to pick a correct representation for your paradigm. These questions are independent from the programming paradigm.

(And to add a further example to the mix, if you're storing character data in a relational database, these questions boil down to "is 'relationship' many:many, or 1:many?", and the answer determines the kind of bridge table you'll use.)


It's mostly a question of hierarchy of data and responsibilities. Character objects that administer changing relationships are doing too much.

For example, relationships involving "characters", "relationships", "items", "spells" etc can be handled by objects like "party" and "inventory" and "spell book" (glorified lists supporting a few custom named operations that basically do the same things) which contain the links to other objects. A spell that temporarily replaces your equipment simply swaps out your "character's" "inventory" object with some magical temporary one, and then swaps the real one back again when the "effect" from that "spell's" "invocation" ends. After this, adding support for something else like a spell that swaps inventory with the victim or steals their equipment would be trivial.

Convert this to functional terms and you've basically got a hierarchy of data that you operate functions on and generate new data to put back into the hierarchy, matching the morphing environment. It's all about the data in the end, regardless of your source point of view.


> If you imagine a game that has characters, relationships, items, spells, and so on, it's often not that easy to model as encapsulated objects. Does each character have a list of other characters that they have a relationship with? What happens when you want to change the relationship, or a character casts a spell that temporarily changes the relationship? Can easily end up a mess, because it's not obvious where such things should live in an OO model.

It's not really that hard. You just have one RelationshipManager that every Character has access to via simple public methods. OO is not so bad, especially when you forget about inheritance and use composition.
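Something like this, as a bare-bones sketch (Java; all names hypothetical, and "GameCharacter" only to dodge java.lang.Character):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.stream.Collectors;

    enum Standing { FRIEND, NEUTRAL, ENEMY }

    class GameCharacter { String name; }

    class Relationship {
        final GameCharacter a, b;
        Standing standing;
        Relationship(GameCharacter a, GameCharacter b, Standing s) {
            this.a = a; this.b = b; this.standing = s;
        }
    }

    class RelationshipManager {
        private final List<Relationship> all = new ArrayList<>();

        void set(GameCharacter a, GameCharacter b, Standing s) {
            all.add(new Relationship(a, b, s));   // simplified: no dedup or removal
        }

        // "What are all relationships for character X?"
        List<Relationship> of(GameCharacter x) {
            return all.stream()
                      .filter(r -> r.a == x || r.b == x)
                      .collect(Collectors.toList());
        }
    }

A spell that temporarily changes a relationship acts on the Relationship (or the manager) and never has to touch the characters on either side.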


That sounds like a poor man's component if you already separated it into another class...? And that data no longer belongs to the instance.


> And that data no longer belongs to the instance.

It belongs to the correct instance. The noun in the domain to which the information logically uniquely belongs is a relationship, not a character (of which there is typically more than one with different roles in a relationship, the number and roles varying by type of relationship.)

One problem that textbook OOP examples produce is that they tend to favor a hierarchy of physical entities as examples, which has the virtue of familiarity but really obscures the analysis of nouns in the domain, the most important of which in most real domains are the nouns describing relationships, activities, and events.


The same thing could apply to every other property, I think? You could probably separate User::name into a PersonIdentityProperty or something like that, and remove the type limit to allow it to contain any property. Then it is basically an ECS. And that actually has some pros: people can have multiple names, and your system at least works properly with that now (and for free).


OOP based component approaches are really common so it shouldn't be surprising.


Ah ... here we are again.

http://www.paulgraham.com/reesoo.html

OO is not so bad because OO is not well defined.


But splitting it out into another class is halfway there already, no? The relations object doesn't correspond to a real domain object anymore. Next step is just to split out all the other components.


> The relations object doesn't correspond to a real domain object anymore.

“relationship” is a noun describing a real feature of the domain, and thus is a real domain object.


You could say that about anything you construct in your model though? Stops being OO if it's not somehow similar to an admittedly vague concept of what an object is, surely?

I could write the whole thing as an ECS and call all the objects systems or components, and then they are nouns in the domain?


> You could say that about anything you construct in your model though?

That’s kind of the point, yes, model -> nouns -> objects.

> Stops being OO if it's not somehow similar to an admittedly vague concept of what an object is, surely?

There’s nothing particularly vague about “noun in the domain”; linguistic-based modelling certainly isn’t the be-all and end-all of OOP modelling, but it's one of the traditional approaches dating back to the early 80s and it's more sophisticated than the kind of naive tangible-item approach that seems to be set up as a strawman here.

> I could write the whole thing as an ECS and call all the objects systems or components, and then they are nouns in the domain?

A relationship is a real thing in the domain being modelled; “systems” and “components” are (in the sense you are using them) not, unless the thing you are modelling is an ECS implementation of a domain. There might be some utility for such a second-order model in OOP (if the OOP is for a code generator, for instance), but it wouldn’t be a model of the domain addressed by the ECS system, but of the ECS system itself.


ECS can definitely be built in an OO fashion.


> If you look at the latency chart he provides, there's another level to it: going to a database server. If you're gonna do that, you're not really in the speed game anyway, so why not make it simple?

OTOH, this is where I can see the biggest benefits from building a language/system around the article’s approach. You’d really like to execute a lot of the logic on the DB (or some other distributed system) which needs to be deeply integrated with the DB to be as convenient and expressive as pulling the data and operating locally.


I always laugh when people talk about ECS vs OO as if the entities, components and systems weren't objects. It's an OO pattern.

Even in the most abstract form when entities are just ids, and you combine components and systems, you're still going to implement the component-system like an object because OO is super useful. Using arrays doesn't mean you're not using OO.


You're talking like ECS and OOP are separate, while ECS is actually relational OOP : https://www.gamedev.net/blogs/entry/2265481-oop-is-dead-long...


To elaborate on Bevy: the data-oriented approach used is called an Entity Component System.


And super importantly an ECS is not necessarily data-oriented.


> as opposed to OO where you have the data declared next to the functions that mutate it.

In the popular OO languages. As usual, the Common Lisp Object System is always worth a look. You have classes that encapsulate data, and then you have "free" generic functions (well, methods that implement those generic functions) that operate on that data.


How does that enforce class invariants?


Character interacts with the 'world' or another local context object which contains all characters it can interact with.


Summary: the computer has changed -- memory latency measured in CPU cycles has grown a lot. So we should not be using the traditional struct/class-like programming model, where we put all properties of an object next to each other. Instead, we should be using game-style "data oriented programming" a.k.a. "column databases" for much higher performance.

However, most modern languages (C, C++, Python, Java, etc.) are not making it easy, and this is bad! Meanwhile, you can put your data into columns and enjoy faster programs.


> So we should not be using the traditional struct/class-like programming model, where we put all properties of an object next to each other. Instead, we should be using game-style "data oriented programming" a.k.a. "column databases" for much higher performance.

Something like DataDraw (http://datadraw.sourceforge.net/) does with C?

> However, most modern languages (C, C++, Python, Java, etc..) are not making it easy

Funnily enough, it was yesterday that I thought about how cool it would be to make an implementation of Oberon-07 (http://oberon07.com , https://people.inf.ethz.ch/wirth/ProjectOberon/index.html - a simple-enough language to experiment with in various ways) that would use something like this as its in-memory data representation.

It came up as a consequence of me wanting to have an implementation of Oberon-07 for WebAssembly, and realizing that the necessity of "restarting" the whole binary at any code change (necessitated by how WebAssembly works) also allows for some liberties in data representation (for example running out of compressed IDs for instances of a type could be handled in exactly the same way as changing the code of the running program - by generating a new binary with expanded space for these IDs and restarting it using the serialized old heap).


> Instead, we should be using game-style "data oriented programming" a.k.a. "column databases" for much higher performance.

This makes logical sense, but I don’t buy it in practice. Most of the heavy data reads are handled by databases, which do optimize for this stuff. I just doubt that, in most software, a significant share of performance issues is the result of poor memory alignment of data structures.


Cache misses can lead to major slowdowns.

Everything depends on the access patterns in critical loops. If you need most of the fields of a struct in each iteration, the classic way is beneficial. If you need a narrow subset (a "column") of fields in a large collection of objects, splitting objects into these columns speeds things up.
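For instance, with the article's ant example (a Java sketch; field names loosely follow the article):

    // Array of structs: good when each iteration needs most of an ant's fields.
    class Ant {
        String  name;
        float   age;
        boolean warrior;
    }

    // Struct of arrays: good when a hot loop scans one "column" across all ants.
    class AntColony {
        String[]  name;
        float[]   age;
        boolean[] warrior;

        // Touches only the contiguous age column - no names or flags dragged into cache.
        double averageAge() {
            double sum = 0;
            for (float a : age) sum += a;
            return sum / age.length;
        }
    }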


The most upvoted stackoverflow question is about branch prediction (which is caching in a way):

https://stackoverflow.com/questions/11227809/why-is-processi...


When your datasets are quite large a sql database becomes an absurd bottleneck. Think of weather or aerodynamics simulations.


Structs are easy to envision as a repeating pattern tessellating across memory space; it makes slicing regions of data and bulk memory copy (e.g. DMA) easy.

Yet if the algorithms mostly work with subsets of those fields then a 'column database' type of data layout clearly has benefits.

Though for the complex case I would prefer more syntax sugar from the language.

A "column struct" datatype that might include tuning suggestions for stride, size, etc; but would present a simple interface for humans. Given multiple underlying blocks of records a parallalizable foreach style loop might even shard across NUMA distributed threads if supported at runtime.


What do you mean by game-style here? I’m very intrigued. Any place to read more about these?


Look for "Data oriented design" or "Structs of arrays".

Here's a talk on the former by Mike Acton: https://www.youtube.com/watch?v=rX0ItVEVjHc

On the latter, there's Jai, a new programming language for games (WIP, unpublished) by Jonathan Blow, who has been very public in documenting the process of creating it, and which is centered around such concepts. There is some unofficial documentation of the ideas on SoA vs AoS and how the language can help switching between the two, e.g. here: https://pixeldroid.com/jailang/overview/Features/SOA/#/overv... and here: https://github.com/BSVino/JaiPrimer/blob/master/JaiPrimer.md...

Edit: Just realized that the article even mentions Jai towards the end.


It's called entity-component systems in games. Note that OOP was designed for simulations but games, the only kind of simulation people like using, don't use OOP.


I think the more accurate reference is "Parallel arrays", aka "Structure of Arrays" (SoA) [1]

The amusing thing is that SoA is something you could see in old languages that didn't have data structures, or the code of newbies that don't know data structures.

[1] https://en.wikipedia.org/wiki/Parallel_array


Plenty of games use OOP (C++ !). And ECS is just relational OOP.


It's "Entity Component System". The "system" is part of the pattern name.


Gaming has always been pushing the boundaries of what's possible performance-wise. For high-throughput data loops, there are already established patterns used by gamedevs that maximize use of cache and CPU/GPU.



The keywords here are things like data oriented or "structures of arrays". Some examples are unity ECS or amethyst and bevy in Rust language.


> However, most modern languages (C, C++, Python, Java, etc..) are not making it easy,

The article shows how easy it is to declare a struct of arrays in Java: You declare a struct (class). Inside it you declare arrays. Done.


The code you need to actually use it is pretty dangerous - you have to keep the index and “main pointer” in separate variables, but always use them together. And indexes are all plain untyped integers, so no compiler will warn you if you use the wrong index variable. And GC is no longer a thing, so “write after free” and friends come back.

Going to column arrays in high-level languages brings back many of the C bugs that we really hoped to forget.


It is easy to declare, but it is not easy to use:

You can pass the whole thing, but what if you want to pass a subset of it? Just one element?

When you want to extend it, you have to extend many individual arrays, etc. When you want to copy entry 17 to entry 50, you can't loop over the fields - you have to spell out every field.


A subset of an AntColony is another AntColony containing sub-arrays. One element of an AntColony is an Ant which you can instantiate with the data at the appropriate index from each array. I'm not trying to be difficult, I really don't see the problem with implementing these basic programming tasks.

I agree that spelling out the fields can become tedious if you have a lot of structures of arrays or lots of fields per structure. A 100-line Python script can generate all this code for you if you prefer. Sure, it's a "limitation" of most programming languages that they don't provide this out of the box. But not a big limitation, I'd argue. Many languages even have features that allow you to write this functionality as a library, 100% inside the language (Python and Java annotations, for example).
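Hand-written, those two operations are a few lines each (a Java sketch reusing the Ant/AntColony names from the article's example; this is also exactly the kind of boilerplate a generator would emit):

    import java.util.Arrays;

    class AntColony {
        String[]  name;
        float[]   age;
        boolean[] warrior;

        // "One element": materialize a plain Ant (fields as in the article's example) from row i.
        Ant get(int i) {
            Ant a = new Ant();
            a.name = name[i]; a.age = age[i]; a.warrior = warrior[i];
            return a;
        }

        // "A subset": another AntColony built from a sub-range of every column.
        AntColony slice(int from, int to) {
            AntColony c = new AntColony();
            c.name    = Arrays.copyOfRange(name, from, to);
            c.age     = Arrays.copyOfRange(age, from, to);
            c.warrior = Arrays.copyOfRange(warrior, from, to);
            return c;
        }
    }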


> I really don't see the problem with implementing these basic programming tasks.

That you have to write them over and over again for every single class. You offer to do that with Python.

> But not a big limitation, I'd argue. Many languages even have features that allow you to write this functionality as a library, 100% inside the language (Python and Java annotations, for example).

This is an argument of the "all languages that are Turing complete are essentially equivalent" kind. True, but not interesting in this context.

Ecosystem matters. If your Java program is the only one that does these things, it may just as well not be Java. You will be unable to use any other library without back-and-forth adapters, for example.


> That you have to write them over and over again for every single class. You offer to do that with Python.

Or with some other code generator. A mythical language with SoA/AoS transformation "built in" will do exactly the same under the covers. But it might not offer other features you are used to, or (your argument) a sufficiently rich library ecosystem. You might as well do it yourself and use a language you are otherwise familiar and happy with.

> This is an argument of the "all languages that are Turing complete are essentially equivalent" kind.

Is it? You can't do what I described in C, for instance. To be clear, what I meant when I wrote "as a library, 100% inside the language" was that, given the appropriate definitions in a library, not in an external tool you run before your build, you just write something like the following Python:

    class Ant:
        someField = ...
        someOtherField = ...

    @StructureOfArrays
    class AntColony:
        pass
or something like the following Java:

    class Ant {
        int someField;
        boolean someOtherField;
    }

    @StructureOfArrays
    class AntColony {
    }
and get all the features you can dream of generated for you inside the AntColony class, which you then use exactly as if you had typed things out yourself. You can't do this in C. This isn't an "all Turing complete languages" argument as far as I can see.

> You will be unable to use any other library without back-and-force adapters, for example.

I don't understand what you mean. In the above example, the Ant class exists, so you don't need adapters. The arrays inside the AntColony class exist (once the annotation processor is done with them), so you don't need adapters.


> You might as well do it yourself and use a language you are otherwise familiar and happy with.

Nim, Rust, D, and Lisp already support this, at the macro level, as compiled languages, without sacrificing anything - perhaps Zig does too. Jai offers it as a language feature, but I don't think that's a particularly good idea - it's just that Jai tries to stay simple in such a way that doesn't let it be a user-level thing.

> Is it? You can't do what I described in C, for instance.

Not exactly, no, but you can with "X-macros", and I have - in fact - been doing that since 2005 or so, using only features from C89 ; I admit I have not used Java in the last 10 years, and then it was impossible - maybe some of the revisions in the last decade made it possible "inside the language" - what's the name of the feature I need to look up?

> I don't understand what you mean. In the above example, the Ant class exists, so you don't need adapters. The arrays inside the AntColony class exist (once the annotation processor is done with them), so you don't need adapters.

And now, you try to use the standard library "Sort" or "Find", it doesn't work without adapters, even if you have comparators defined for Ant -- and many other things that assume that objects are not part of a colony.

It's hard to explain in a short post. I occasionally use a language called "K" of the APL family, which makes all of this incredibly trivial from the get go. It's not a new language - K itself IIRC predates Java by a couple of years, APL (of which K is often considered a dialect) was first specified in 1958 and implemented in 1962. It basically considers everything an array - one or multiple dimensional - and field access is nothing more than having one of the axes specified by symbols rather than numbers. This makes everything incredibly simple and uniform to an extent that is hard to describe to someone who has not experienced it. (And indeed, in the turing sense it makes no difference .... but if you implement your Java objects as "mapping from field to value" you may get some of the benefits but will not really be using Java or be able to use the ecosystem)


> Not exactly, no, but you can with "X-macros",

I only gave this a passing thought before and was too quick to dismiss it. Yes, I see how it can be done. Which is just another argument that the whole thing is a non-issue, no? Languages with macro systems can do it. Languages with other build time code generation systems can do it as well.

> what's the name of the feature I need to look up?

Annotation processors. Here's the first thing I found with an example of code generation: https://cloudogu.com/en/blog/Java-Annotation-Processors_3-Ge.... Yes, it's considerably more verbose than a macro system. But it's still a tiny, write-once (or completely off-the-shelf) part of your overall project.

> And now, you try to use the standard library "Sort" or "Find", it doesn't work without adapters

OK, but the same is true for all the macro based systems you mentioned above, isn't it? And if it isn't: whatever you can generate with macros you can also generate some other way.

Edit: You won't like this, but looking at the docs for Java's Collections.sort method (https://docs.oracle.com/javase/7/docs/api/java/util/Collecti...: "This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array." So you can use it just fine if the SoA implements the List interface. But yes, at the cost of temporarily materializing an array of structs.
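Roughly like this (an untested sketch, assuming Ant/AntColony shaped like the article's example): a mutable List<Ant> view over the columns, so Collections.sort dumps it into an array, sorts that, and writes the result back through set().

    import java.util.AbstractList;

    class AntColonyView extends AbstractList<Ant> {
        private final AntColony c;
        AntColonyView(AntColony c) { this.c = c; }

        @Override public int size() { return c.age.length; }

        @Override public Ant get(int i) {
            Ant a = new Ant();
            a.name = c.name[i]; a.age = c.age[i]; a.warrior = c.warrior[i];
            return a;
        }

        @Override public Ant set(int i, Ant a) {
            Ant old = get(i);
            c.name[i] = a.name; c.age[i] = a.age; c.warrior[i] = a.warrior;
            return old;
        }
    }

    // Usage sketch:
    // Collections.sort(new AntColonyView(colony), Comparator.comparingDouble(a -> a.age));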


That's what I refer to as the "Turing complete" argument. Yes, you can definitely do all of those. But almost no one does - and those that do usually silo themselves from the ecosystem.

Yes, if you like Java, you can definitely implement a Java compiler/interpreter in C and get everything you like about Java using C. But no one does.

Thanks for the reference, I'll read about annotation processors later tonight.


I appreciate the discussion, but I believe you're arguing in bad faith.

> Yes, you can definitely do all of those. But almost no one does

<shrug> That's what you say. What is this based on, considering that until a few hours ago you didn't even know that Java has this built-in code generation facility?

When I search for "entity component sytem java", the first hit is this project: https://github.com/Rubentxu/Entitas-Java which uses a "code generator [that] generates classes and methods for you, so you can focus on getting the job done. Radically reduce the amount of code you have to write and improve readability. It makes more of a sea code less prone to errors to ensure the best performance."

Sure, it's probably not as widely used as other frameworks in other languages. Java isn't necessarily the first thing on people's mind when it comes to implementing AAA games.

> and those that do, usually silo themselves from the ecosystem.

You have not shown this. And if you mean that your C macro implementations do this -- fine. But you still seem to find them useful enough to keep using them?

> Yes, if you like Java, you can definitely implement a Java compiler/interpreter in C and get everything you like about Java using C.

That is a Turing equivalence argument. But using a very simple code generator for a very simple problem is not the same thing. Given that Java syntax is very similar to C, I wouldn't be surprised if your existing macros could be used 1:1, with the hardest part being getting a Java build system to call the C preprocessor on a Java file. Not quite the same thing as implementing an interpreter.

(Also, as far as I'm concerned this was never about Java per se.)


I appreciate it too, and I don't believe I am.

I wasn't aware it is now an integrated part of the JDK, but it doesn't change much; I've used ANTLR decades ago (and YACC even more decades ago) which is ... essentially the same thing, except Java has now standardized and simplified it so that you don't need to fiddle with your Ant or Maven or Makefile.

> Sure, it's probably not as widely used as other frameworks in other languages.

I've been doing professional work for over 30 years now, so I think I have some perspective, which may of course be different than yours. Code generators and processors have always had their niches, but that's what they are - niches - and they are quite small.

People used to use YACC, or ANTLR or Bison or Lemon or whatever your favorite parsing tool is. That's because they save an enormous amount of work in most use cases where they are applied. But they also come with great costs: Hard to debug when things go wrong, but most importantly - inflexible. GCC and many, many other projects move off parser generators at some point in their life for flexibility reasons.

Qt has MOC, which was essential in older C++ and is now IIRC optional because modern C++ has mostly caught up. It is widely hated among Qt users, though it did offer significant advantages. But I'm not aware of any project that used MOC except to interface with Qt - the boundary is always evident.

> You have not shown this. And if you mean that your C macro implementations do this -- fine. But you still seem to find them useful enough to keep using them?

How could I "show" this? It is obviously predicated on the projects I've been involved with, which - I noted - have not been Java in the last ten years. It's not like it's something that can be "shown", except in some statistical sense, the validity of which we will argue forever.

And yes, people shy away from macro heavy C code (whenever J source code or A source code gets posted on HN, people are less than happy, e.g. https://news.ycombinator.com/item?id=8533843 ; the whole J system looks like that ).

I use macro heavy code only when I don't have to share it with others, for that reason.

> But using a very simple code generator for a very simple problem is not the same thing.

Well, perhaps we're not thinking of the same level of simple - e.g. c++ template meta programming, which I first used in 1995 is built in, and (literally) turing complete, so it can do anything, but is far from "simple" - not to actually write, not to use, and definitely not to debug.

What I had in mind was things like https://github.com/Hirrolot/poica - It uses C macros to do incredible magic. All it needs is one #include and it gives C introspection, oop and a few other things. Is it simple? I wouldn't say that. But more importantly, is it C ? I wouldn't say that, and so does the guy who made it. It can be used in a C program, yes. It can e.g. go further and define comparators for you so you can use qsort() on the new types it helps you define. But it's not really C because when you try to integrate it with an existing C codebase, you'll find that you keep having to translate things at every boundary - which is quite likely to happen with a Java SoA library.

I think my point is that languages in common use (Java, C, C#, ...), despite offering the facilities to locally solve the issues described in the original article (mostly locality of reference), do not practically allow these solutions to become widespread or common, because they are not part of the core, and as a result you lose all the benefits at library boundaries.


columnar layout is only faster if reads are sequential


If things work one or a few columns at a time this isn't necessarily true. Prosumer CPU L3 caches are already up to 128MB and will be up to 1GB before long. If one or a few full columns fits fully in cache but all the data doesn't, columnar layout may still be faster in the face of random access within the column.


There’s a nice introduction to a data-oriented approach in here, but it’s buried in some junk.

E.g., I don’t think a lot of developers believe compilers are magic optimizers. And programming languages, including the ones referenced, have changed quite a bit over the last 20 years. A strong enough case for general data-oriented features would cause at least some of them to support it.

Also, a great amount of code should be written for maintainability — that is, to be read and easily understood by a future developer without special insight — rather than for performance. Talk of “software cultists” is silly and shows how poorly the author understands the issues driving software development approaches.

Not to mention, a performance analysis of software systems that stops at main memory is about only a subset of software (one that has a pretty thin slice of overlap with “enterprise” software, where your bottlenecks are overwhelmingly IO and DB).


To me, the goal of language implementations should be to make the most maintainable code also the fastest code. I shouldn't have to worry much about the performance characteristics of my software outside of general concerns about algorithmic complexity.


At the end of the day our software is going to run on real machines, sending data across real networks. I feel like we'll always be leaving performance on the table if we don't understand the characteristics of the hardware we're developing on.

I can build a cubby house in the backyard (low performance system) without understanding the physical characteristics of the materials I'm using, but if I want to build a sky-scraper (high performance system) I need to understand the actual physical materials and how to use them effectively.


> I feel like we'll always be leaving performance on the table if we don't understand the characteristics of the hardware we're developing on.

Absolutely. Mine is not a suggestion to ignore performance, but to build languages that push programmers towards doing the right thing for performance and maintainability. These do not have to be at odds, except often at the extreme of performance.


That all depends on what type of software you write. I don't think there'll ever be a language that ticks every requirement box.


You can’t get all the way there but it’s a North Star, I think.


> So far, the only programming language I know of that supports this type of crazy data transformations is JAI,

As far as I know JAI is indeed the only language that explicitly lets you switch between AoS and SoA with one bit, but the APL family of languages (APL, J, K, Shakti, and a couple more) has basically - for 60 years now - taken the "data oriented" SoA approach for storage, and provides the language support that makes it as easy to use as the "object oriented" AoS approach. (For some definition of "as easy as" - the languages themselves are generally not considered easy to use, but within the language, treating things either way is straightforward, at most a "flip" away, and usually not even that is needed.)

Additionally, Nim macros (and I suspect Rust and D as well) allow this to be a library-level thing as well. Lisp does too, of course -- but Nim/Rust/D are much closer to the Algol family-and-friends list given in the article.


I also think that it's a shame he didn't bring up NumPy & Pandas, or R. He's just created a data frame and then complained that there are no functions to sort it, but we do have those. They are here:

    import pandas as pd

    ants = pd.DataFrame({
      "name": ["bob", "alice", "carol"],
      "color":["red", "blue", "red"], 
      "age":[1.1, 0.5, 1.2], 
      "warrior":[True, False, True]})
    # Or read_csv to get the data in.

    # Count number of warriors.  True => 1, False => 0
    ants['warrior'].sum() # Returns 2

    # Count old red ants
    ((ants.color == "red") & (ants.age > 1.0)).sum() # 2 again

    ants.sort_values(by="age")
I think Pandas and NumPy should be up there as making the language easy to use in a SoA way.


Yeap. I think Pandas could and should see a lot more use outside data science as performant in-memory data storage in Python.

The best thing is? Most data scientists don't even care about SoA vs AoS - tabular structure is easy to grok, easy to use AND performant by default!


By coincidence, there is a new Nim blog post demonstrating exactly that. With the same ant example. https://nim-lang.org/blog/2021/05/01/this-month-with-nim.htm...


ISPC also has first class support for "(array of) structure of arrays", see: https://ispc.github.io/ispc.html#structure-of-array-types

For example:

  soa<8> Point pts[...];
declares an array of struct of arrays, where each inner struct array contains 8 elements. This would have the advantage of playing well with the streaming prefetcher if you're working with all fields of the `Point` type, and allowing the compiler to use and increment a single pointer for loading/storing, accessing each field with a small compile time offset (that can get folded into addressing calculations, making them essentially free).

Julia's StructArrays package is also very convenient: https://github.com/JuliaArrays/StructArrays.jl



> Additionally, Nim macros (and I suspect Rust and D as well)

Aye somebody posted their SoA library on /r/rust just a few days back (https://crates.io/crates/soa_derive), and I don’t think it was the first such.


Impedance mismatch. Languages don't give the compiler enough semantic information to work with, and some require heavy runtimes. Compilers, on top of having to infer semantic information about the above, aren't able to sufficiently utilize the hardware, either due to constraints on the types of analyses done, or lack of control. The hardware exposes an API too high-level or too unfamiliar to existing compilers.

IMO, the future is bright, because all these layers are being re-thought. However, phrasing this as a debate over programming paradigms is a red herring. If coupling data and behavior is a useful mental model, we ought to accommodate that while giving the compiler more to work with.


If you want to utilise this data-oriented approach in Rust, check out soa-derive[0]. SoA stands for struct of arrays, rather than the conventional array of structs.

[0]: https://crates.io/crates/soa_derive


I hacked up something similar (https://gitlab.com/mbryant/functiontrace/-/blob/master/src/p...) a while ago. Your variant looks much more useful!


Yeah I'm a big fan of proc-macros. If utilised well, it can do some really great things


This is amazing!


I can remember the columnar approach used in Fortran back in the day. Fortran did not have records, only arrays of scalars, so a natural way to represent a bunch of "objects" was to have a bunch of arrays, one for each property.

IDK if such prior art is any helpful today, though :)


This approach is still used (conceptually, at least) in statistical computing. NumPy and R are good examples of this approach, and also have solutions for the indexing problems outlined in the article.

That being said, statistical computing is at the mercy of the high cost of matrix multiplication.

I think that you can alter the memory order of arrays in NumPy and examine the performance difference.


I think the desired functionality is closer to Pandas than Numpy — you want coordinated arrays of different data types, not higher-dimension arrays of the same data type.

R and Pandas aren’t exactly known for their outstanding native performance (they’re generally fast after converting dataframes to arrays and outsourcing to C or Fortran) so they’re great demonstrations of the syntax for some of these indexing problems but leave a lot to be desired.


ROOT TTrees will "split" class members to effectively transform them into AoS although this is mostly for IO efficiency.


Bullshit!

The most important feature for the vast majority of enterprise software is to be maintainable across requirement, management, and employee changes.

Such optimizations as those described in the article make sense only in a handful of situations, and using them must not be a light-hearted decision, because their cost is big.

I fully agree with going all in when it makes sense, but we all know what is the root of all evil.


> Such optimizations as those described in the article make sense only in a handful of situations, and using them must not be a light-heart decision, because their cost is big.

This is really not true. Inheritance is not only slow, it is needlessly complicated. It introduces dependencies and indirection while gaining nothing.

People who have never programmed before think having a car class inherit from a vehicle class makes sense. Eventually they realize that what they have are arrays of positions and velocities that they need to combine. That could be multiple files with lots of boilerplate nonsense, or it could be a single loop.

> I fully agree with going all in when it makes sense, but we all know what is the root of all evil.

If you are implying optimization, you should realize that that quote comes from someone telling his students not to noodle the tiniest details like post or pre increment in their for loops to get the compiler to output one less instruction when their program isn't even done yet.

If you want modern software to run fast, you need to architect for it first. Then you can make things work, then you can profile and optimize. This idea that there are no consequences to the early design decisions you make is a gross distortion of the context of this quote.


There is a Nim library inspired by this blog post that simulates array-of-structs with struct-of-arrays (implemented through macros):

https://github.com/liquidev/datarray


Wish normal people witnessed how few innovations are being introduced on the CPU/GPU side in the 21st century. The high-performance software industry is low-key carrying the hardware architecture industry on its back. Shaders, pipelines, coroutines, etc are there to make the programmer's job easier in the end.

On my old Nokia 3110c, I could access my calendar and appointments instantly by clicking right on the d-pad. On my Nokia Lumia 710, I had to wait for a splash screen. On my Nokia 3.1, I had to wait for internet syncing. The difference between slow and fast software far outstrips that between slow and fast hardware.


> The high-performance software industry is low-key carrying the hardware architecture industry on its back

I don't understand this statement. To me it evidently looks like the opposite: the software gets worse (because most programmers don't even learn how CPUs/GPUs really work), and the hardware has to make up for it.

The reason the calendar loads up slower is not because the hardware got worse, it's the software.


I can't believe anyone would say this, it is basically the opposite of reality.

CPUs and GPUs have come a long way in two decades. Even CPUs are incredibly fast, but between memory allocations, memory layout, multi-threading and just weeding out terrible algorithms, most software can probably be sped up by 100x.

> The high-performance software industry is low-key carrying the hardware architecture industry on its back.

That doesn't even make sense. Do you think GPUs aren't more powerful than 20 years ago and that the difference between Quake 3 and The Order: 1886 is software?

> On my Nokia 3.1, I had to wait for internet synching. The difference between slow and fast software far outstretches that of slow and fast hardware.

Right - doesn't this contradict everything you just said?


These are steady, predictable, iterative improvements. We have nothing groundbreaking like 1970s-1990s. Of course I'm not saying modern hardware is slow, but it's not progressing like it used to.


That is both not true and not what you said at first. Transistor density has continued to rise but there is only so much you can do when people mostly run javascript on a single thread.

There were a lot more breakthroughs in the early days of electronic components too, because everything was new.

CPUs are a world away from where they were two decades ago, but people don't notice because typing into a facebook window still lags. It is an incredibly superficial nonsense way to look at what has really happened at the hardware level.


If you only have a single core (common on battery powered devices) then you should only run a single thread. Excess parallelism doesn't make things go faster and uses more memory. The important thing is avoiding unnecessary waits on non-CPU resources like I/O.


This reads like a reply to a different comment. The person above was saying there haven't been any advancements in CPUs or GPUs which is obviously not true. No one said anything about battery powered devices and even those have had multiple cores for many years now - as a result of transistors still shrinking.


They're multicore devices but often only one core is free. Depending on how important you are, the rest are busy doing other things and you're not going to get scheduled.


This thread was someone saying there haven't been any advances in CPUs and GPUs in general.

You seem to be in a completely different goal post shifting context of some mythical mobile device that has only one core, but actually has multiple cores, but all them are pegged at 100% all the time. This doesn't make any sense in any context since it isn't even true, but it has nothing to do with what you are replying to.


Did you also notice how the need for interface and type abstractions goes away? If you have a function that sum()'s a list of numbers, you can use that to sum() any of the Ant's members by passing a ref to the list. Otherwise you have to write AntSummer classes that take Ant (and all the children of Ant's class).
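For example, one dumb sum over a float[] (a Java-ish sketch; the colony/AntSummer names follow the discussion above) works unchanged for any column you hand it:

    static float sum(float[] xs) {
        float total = 0;
        for (float x : xs) total += x;
        return total;
    }

    // e.g. sum(colony.age) - no AntSummer class hierarchy required.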

It does occur to me that there's a certain type of programmer for whom this is obvious. And that type of programmer was not using classes anyway for the reasons above. For that programmer, objects are a massively overused hammer in software design. I know those types, and they're very good. It might be there's too much brainpower wasted in designing object hierarchies that could go to exploring functionality or producing value. Or it might be coincidence.

I'm not really that guy necessarily, but for simple reasons of communicating and comprehending designs I always thought objects with more than a few members were just plain wasteful. There just never seemed to be a reason to have all those mutable hidden member variables unless it was to abstract something like a data structure.


Another orthogonal aspect with modern hardware is the increased amount of parallel execution. Most of the standard programming languages are not really designed to support this well. So we use extensions like CUDA. But this is not really general purpose but only for GPU.

Once we reach maybe 100 cores, or 1000 cores, or some orders of magnitude more cores in the CPU, we have to have better general purpose language support for this. Most memory is local for some subset of cores. There are warps which execute the same code (SIMT).


What's this "once"? This was a theory of future CPUs from maybe a decade ago that totally hasn't happened. We're instead getting increasingly smaller few-core CPUs in things like smartwatches. In laptops, M1 replaces a hyperthreading CPU with one that runs a single thread much faster.


> What if a programming language would provide us with a structure that would act like an array of structs, but internally it would really behave like a struct of arrays?

I do this all the time (well, it’s not uncommon) in C++. The nice thing is I can mix the two with a bit of manual glue so callers don’t even have to know (though if you are doing this, it’s often worth letting some callers know for the reasons described in this post).


I’ve never been able to get my head around the “anti-optimization” mindset. Like... what else should we be doing? If you’re going to be writing code anyway, you might as well take optimal advantage of the hardware while you’re at it. Now, some people point out that attempts to optimize sometimes backfire, but that’s a different problem - that’s something you should work on improving through practice (and measurement).


"Anti-optimization" advocates doesn't say "optimized code is bad." It means "other things are more important than optimization."

One of those things is the amount of work required. For example, if it takes 30 minutes to get a "good-enough" solution, and 2 days to get a "very optimized" solution, most of the time you should go with the good-enough solution, because most of the code you're writing isn't performance-critical.


> What if a programming language would provide us with a structure that would act like an array of structs, but internally it would really behave like a struct of arrays? We could program in the typical object oriented way that is convenient for humans, while still enjoying in great performance due to playing nice with the hardware.

> So far, the only programming language I know of that supports this type of crazy data transformations is JAI,

Haskell supports this type of data transformation.

http://hackage.haskell.org/package/vector-0.12.3.0/docs/Data...

> unboxed vectors of pairs are represented as pairs of unboxed vectors.


Haskell uses linked lists everywhere, so any transformation like this likely does nothing for speed.


Haskell uses linked lists as a metaphor for generators (since it's a lazy language), which is a strange decision since linked lists are a bad data structure, so you have to write the program to avoid accidentally ever doing what it says it does.


> linked lists are a bad data structure

Why is that? They're perfectly fine for most purposes, and have the advantage that when you pass one across a function boundary the callee can play with it or modify it or do whatever the hell they want and when the callee returns, the caller's linked list is untouched. That's an incredibly valuable feature for correctness and safety.


They have poor cache performance when you want to iterate over them, and more size overhead, because there's one pointer lookup per item. Also, the common definition of a linked list has the list responsible for allocating each item, meaning an item can't be a member of more than one list at once. The actually useful version of this is called intrusive linked lists, which isn't the one taught.

> That's an incredibly valuable feature for correctness and safety.

Are you thinking of something else? That's an effect of call by value, but linked lists actually aren't good at that and you have to copy them to pass them around. You need something like these to make it efficient:

https://en.wikipedia.org/wiki/Finger_tree

https://en.wikipedia.org/wiki/Read-copy-update


> Are you thinking of something else? That's an effect of call by value, but linked lists actually aren't good at that and you have to copy them to pass them around.

Are you thinking of something else? Haskell (and all other ML-family languages, and Lisps) lists are passed around by copying a single pointer to the head element. Different lists can share tail elements, i.e., elements can be members of more than one list.

All this is in contrast to Java-style linked lists, where there is a list object that controls its elements.


> Are you thinking of something else?

I think the point is to contrast them with dynamic arrays --- linked lists, unlike arrays, are persistent data structures, which does make them easier to reason about.

> Also, the common definition of a linked list has the list responsible for allocating each item

I'm not sure how this is relevant in the context of Haskell, given that it's garbage collected.


The given link describes unboxed vectors. This is Haskell-speak for "actual array of actual machine numbers".


> The programming languages listed above are all over 20 years old and their initial design decisions, like Python’s global interpreter lock or Java’s everything is an object mindset, no longer make any sense today

I wonder what general-purpose languages (as the mentioned Jai is meant primarily for games) will be the best fit for modern architectures, besides obvious things like doing it in C and just doing everything manually to fit the machine nicely.


It's the APL/J/K/Fortran model. It fits modern CPU and memory architectures, it fits GPUs, and slow external storage.

And rather surprisingly, it always has, even when memory latency was 1 cycle and there was no prefetch (for which you have to go back to the late 70s). The first APL implementation is from 1962 or so, first Fortran is 1955 or so.

R and Numpy use it to deliver their speed while still providing high level semantics.


I too suspect "data oriented programming" is just array programming rebranding itself.


> I wonder what general purpose languages

I'm looking forward to Zig. The author is taking a lot of care and effort to make sure the language works on legacy systems as well as modern systems, so I think the level of flexibility that entails will also mean the flexibility to adapt to the future.

Also, IIRC zig has an AOS to SOA transformation in its stdlib? In any case, they have been refactoring the language parser to use SOA to great effect:

https://github.com/ziglang/zig/pull/7920


I don't see any particular reason why Jai shouldn't be usable for stuff other than games; there are no baked-in assumptions about any gamey nature of programs.


I'm late to the party, but his table is missing an entry:

   Event            Latency   Scaled Latency
   Network Access   50-200ms  5-20 years.


While this article starts to discuss the low level details that underlie our code, I feel like it's incomplete.

* Latency hiding -- CPUs work extremely hard on latency hiding. A lot of those OOP-indirect calls will be branch predicted by the CPU in the simple cases (ex: if one object-type is more common than others, then those indirect branches will be branch predicted correctly in most situations).

There are other bits of latency hiding: Hyperthreads, Out-of-order execution, pipelines.

* Cache: L1, L2, and L3 caches can grossly mitigate a lot of the problems. While CPUs today are far faster than DDR4 RAM, L1 cache scales perfectly with the CPU. If you can keep your code+data under 64kB, you'll always be executing in L1 cache and basically get the full benefits.

If 64kB is too restrictive, then you have 256kB or 512kB (L2 cache), or 8MB (L3 cache), both of which scale with the CPU.

------------

For SIMD and multicore, it's worth looking backwards, IMO. It turns out that the 1970s and 1980s were filled with very intelligent programmers working on extremely sophisticated supercomputers. I enjoy going back to the 1980s and reading Connection Machine articles on C-Star (a parallel language that's arguably the precursor to CUDA), or Star-Lisp.

--------

If you write multicore software, then SMT / Hyperthreads are very effective at latency hiding. Running two threads on one core (or 4-threads or 8-threads, in the case of some IBM processors) allows your cores to "find something else to do" during those long latency periods where your CPU is traversing memory and waiting for RAM to respond.

Working extremely hard to optimize just a single thread seems somewhat counterproductive. It's often easier to write "less optimized" multicore software than highly optimized single-core software.

And SMT / hyperthreads will converge those onto a single core ANYWAY. So might as well take advantage of that feature.


> * Cache: L1, L2, and L3 caches can grossly mitigate a lot of the problems. While CPUs today are far faster than DDR4 RAM, L1 cache scales perfectly with the CPU. If you can keep your code+data under 64kB, you'll always be executing in L1 cache and basically get the full benefits.

I think even now you're underselling this. Depending on the types of transforms you're doing, an object-oriented (or row-oriented) approach can be faster than a data-oriented (or column-oriented) one due to caching. With OOP approaches, the object you're operating on will basically always be in L1 cache. On the other hand, if your ant's name and age are in different columns and each column has 10000 things, you'll have to load both into L1 cache separately.

Stuff like vectorization can further complicate things, but it isn't as simple as "data-oriented is more performant."


The real answer is you almost never need to solve this silly problem. If your data is static, why are you iterating through arrays? If it's dynamic, your n is probably low enough that it doesn't matter, or you're using some better data structure than an array.

If you do hit this kind of problem its usually in games (lots of dynamic user generated content) and even then the n is usually negligible. The parts where this does matter are already very data oriented. (Although not just in the way the article implies. Think vertex buffers and bitmaps.)

Point is, languages could probably add some kind of column oriented type of array and compilers could efficiently loop through them but it probably wouldn't really help all that much.


Would have to agree, common data structures like linked lists and maps can't really be easily composed into this sort of sequential layout as required, and the main advantages only come about when you're searching through the data for something.

The other data we process a lot of is strings, which are commonly variable in length and scattered around in separate memory allocations.

Still, the performance difference between the 2 examples was eye-opening, so I'll definitely spend a bit of time looking at the Data-Oriented Design book for some ideas.


"What if a programming language would provide us with a structure that would act like an array of structs, but internally it would really behave like a struct of arrays?"

What about SQL when using a columnar data store?


An array of structs is already a struct of arrays if you're willing to do some offset math.


That would make the whole thing moot. The crucial difference between AOS and SOA is that the "physical" memory layout is different:

AOS: [xyzw][xyzw][xyzw][xyzw]

vs

SOA: [xxxx][yyyy][zzzz][wwww]

If you have such a layout, a SIMD instruction touching 4 packed words has quite different behavior. Arguably, SIMD is pointless without SOA or AOSOA.
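
In code, the two layouts look roughly like this (x/y/z/w are just placeholder fields):

  #include <vector>

  // AoS: the fields of one element are adjacent, elements follow each other.
  // Memory: x y z w | x y z w | x y z w | ...
  struct PointAoS { float x, y, z, w; };
  using PointsAoS = std::vector<PointAoS>;

  // SoA: each field gets its own contiguous array.
  // Memory: x x x x ... | y y y y ... | z z z z ... | w w w w ...
  struct PointsSoA {
      std::vector<float> x, y, z, w;
  };

  // With SoA, a loop that only touches x reads one dense stream, which is
  // exactly what packed SIMD loads want; with AoS the same loop strides
  // past y/z/w it never uses.
  inline void scale_x(PointsSoA& p, float s) {
      for (float& v : p.x) v *= s;   // compilers can typically vectorize this
  }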


> SOA: [xxxx][yyyy][zzzz][wwww]

That difference only occurs if, for some random reason, you care about what the address of element zero of that array is, or if for an even rarer reason you have a hard requirement on when you cease to process x and start to process y, and you find a single memory jump prohibitively expensive.

Meanwhile, AOS is already by definition [xxxx] and [yyyy] and [zzzz] and [wwww], given the remaining members of the struct are treated as padding.


> Meanwhile, AOS is already by definition [xxxx] and [yyyy] and [zzzz] and [wwww]

There must be a misunderstanding here. AOS is by definition [xyzw][xyzw](...), the (C/C++) compiler has no leeway in rearranging this. I'm also assuming that there's no padding necessary, in either case.

Perhaps my way of notation is confusing, in that case I'm going to refer you to this page, which has code examples:

https://software.intel.com/content/www/us/en/develop/article...


> So far, the only programming language I know of that supports this type of crazy data transformations is JAI

I believe zig supports this too, and it's very encouraging that the creator has recently done a refactoring of the language parser taking a data-oriented approach, so that has surfaced pain points to be repaired for folks who would like to do things that way.


It's interesting that Java/JVM is both one of the worst and one of the best cases here. When you work only with arrays of primitives its performance is shockingly good and aligns well with the need for locality. Hopefully future improvements extend that to include structs/records, and then it may actually be quite a powerful platform in this regard.


“Programming languages are old, therefore they will never take advantage of our hardware.” Have you looked at how compilers have changed the past 30 years? How they take advantage of SIMD? AVX? The LLVM-revolution? It’s delusional to think compilers will never be smart enough to vectorise object-oriented code, which is the main gripe of this article.


No, the main gripe of this article is data locality, not vectorization. The two are orthogonal and complementary: you can have data locality without vectorization and vectorization without data locality, and both bring performance improvements. But vectorization brings a constant speedup (from x2 to maybe x16, depending on the ISA of the processor), while data locality can improve performance much more, because memory accesses are hundreds of times slower than processor cycles.

But if the semantics of your programming language don't allow your compiler to change the layout of your data in memory, then there is almost nothing the compiler can do to improve data locality in your application.


Struct-of-arrays layouts do put you in a good position to apply vectorization techniques, though. The article is more about columnar vs row formats for program entities than about locality generally. Columnar databases often use vectorized evaluation strategies because it's low-hanging fruit.


The impetus for using a columnar format is precisely to get data locality. It is absolutely about data locality generally. You wouldn't take his approach at all if you didn't get the locality with its associated performance improvements. It would be strictly more work for no gain. Locality is the entire point.


This article is about transforming memory layouts, which compilers don't do. GCC has some really really basic support for it which never works.

Autovectorizing doesn't transform memory layouts, which is part of the reason it doesn't work that well either.


The sad part is that Chris Lattner (LLVM author)'s PhD dissertation was on that exact topic. And yet nothing seems to have come of it since, despite LLVM's hegemony. (https://llvm.org/pubs/2005-05-04-LattnerPHDThesis.html)


The article seems to suppose the entities will only ever exist in one collection. If that’s the case then sure, remap the field layouts to behave better in cache. But the second you want to reference only a single element of that collection, or reference many elements of many different collections, you’re now dealing with managing N pointers for N fields, which of course all get invalidated when the collection is modified, and have lifetimes tied to the lifetime of the collection. As opposed to the single object with its own lifetime. And of course all the cache locality arguments go out the window.

The takeaway seems to be: if the entity is an atomic collection of items, represent that collection as the entity, in as efficient a manner as you see fit, rather than creating entities for each item. This makes sense, and is not out of line with OOP reasoning: each object corresponds to an atomic logical unit.


The data-oriented design approach is and has been standard practice for doing numerical simulations on High Performance Computing (HPC) environments. Still the problem of language support and syntactic sugar to make this style less error-prone and easier to do needs more work.


I think this approach applies even more so for GPU compute as it’s the only sensible way to code these devices.


Something I genuinely don't understand: "buttery smooth" 3D games are created with C++ too, so why isn't C++ an issue for games?

This seems to contradict the premise, which is that programming languages were designed in the past and hence are not efficient for modern computers.


If you use C++ in the "default" object-oriented way, your program won't be as efficient as it could've been. Computers are fast, so you can still make games which run well, but not as well as they could've been.

C++ can also be used in ways which are really fast on modern machines. You can, for example, write code in a data-oriented "struct of arrays" approach with C++. The argument is just that C++ isn't designed with that in mind, and that a high performance language designed today could be designed with modern machines in mind.


> If you use C++ in the "default" object-oriented way, your program won't be as efficient as it could've been

That’s way too general to hold any truth. C++’s classes are zero cost, and SoA vs AoS is not always relevant. If I have three instances of my class, each doing something compute-intensive in a simple method, how is that different from DOD?

Also, even virtual functions are not necessarily any more expensive than the alternative, if one knows what one is doing.


The context of the discussion is one where the difference between SoA and AoS is significant. It’s not always true in general that your program won’t be as efficient as possible, but it’s true in the context we’re discussing, where you have a lot of things (entities or whatever) and you’re iterating through them, which is what is being discussed.

I was writing a direct answer to the question of “how can it be true that C++ isn’t built for modern machines even though modern games with good performance on modern machines use C++”. I think the answer is entirely correct in context.


Fair enough, sorry I misunderstood your point.


C++ provides programmer control over memory layout, which is something managed languages don't provide. So although C++ is an older language and it isn't designed for modern machines, it does allow you the control to make performant programs for modern machines.


C++ is an issue for games. Most game engines embed a garbage-collected scripting language or runtime (e.g. Lua, the CLR, JavaScript...) to implement huge parts of the game logic.


They’re not implementing game logic in those languages for performance reasons though. The question, as I interpret it, is about why C++‘s performance isn’t an issue for games. Using those languages for scripting/game logic is a trade-off where performance is sacrificed for developer productivity.


You are right. I misunderstood the question.

I thought it was about the language as an efficient tool, not about performance. The Unreal engine is not even using data-oriented design (at least for most parts), so I was assuming that C++ being sufficiently efficient was not in question. I agree with your argument that the games would probably be faster. Game object processing is probably not the most pressing performance issue in games, so I guess the impact would not be that visible (when talking about the stuff that is implemented in C++ in game engines).


The article seems to claim that if we did as he says and rewrote all of our code to be data-oriented instead of object-oriented, then our code would run ~2X faster and all of our performance problems would be solved. However, I got my first 64 thread server CPU 15 years ago and my first 64 thread desktop CPU six years ago, so I could get a much larger increase by instead keeping my OO code and making it concurrent (64X > 2X). Also, if being 2X faster would make everything better, we could just wait 18 months and let Moore's law take care of the 2X savings, rather than the much longer period of time it would take to redesign our languages and rewrite all of our code.


Why can't a compiler optimize that? Perhaps I'm naive, but it seems totally possible for a compiler to have some sort of heuristic that allows it to do this transformation behind the scenes when it makes sense.


The ironic part is that the compiler can't because the languages are fairly low level for performance's sake.

C and Rust make various guarantees about data layout which programmers rely upon. C compilers cannot turn an array of structs into a struct of arrays because the layout of arrays is guaranteed, and needed to make common and trivial C accesses work which work by pointer arithmetic.

Turning an array of structs into a struct of arrays is indeed an optimization that would be possible in a language that did not make such guarantees about data layout. Perhaps it would be interesting to have a data structure that provides less guarantees on lower level access that in turn could be more aggressively optimized.
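
A trivial illustration of the kind of code that relies on those guarantees (Item is a made-up struct):

  #include <cstddef>

  struct Item { int id; float value; };

  // The standard guarantees the i-th element sits at a fixed offset from the
  // base pointer, so arithmetic like this is well-defined and very common:
  float* value_of(Item* items, std::size_t i) {
      return &(items + i)->value;   // relies on the [id value][id value]... layout
  }

  // A compiler that silently rewrote Item[] into separate id[] and value[]
  // arrays would break code (and ABIs) that depend on exactly this.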


It is technically also possible in C as long as the program can't tell the difference (thanks to the as-if rule). This is hard to prove though and in practice requires whole program compilation.


The C standard provides guarantees about data layout even if the compiler can prove it will never be used internally, for the sake of, say, external debugging.


It doesn't require this as long as the program has the same side effects - the contents of memory are not a side effect necessarily.

The problem is that there are lots of operations which imply the existence of part of an object, like passing around pointers or allocating single objects at a time, and that would disable these layout optimizations. An optimization that's easy to break isn't useful even if it would help a lot; you want them to be predictable more than anything.


And C compilers routinely optimize structures in such a way that they are impossible to inspect in a debugger. Because of SRA, non-escaping aggregates might not even be allocated a memory location at all.


What is an "escaping aggregate"?; this term has 80 hits on Google, none of which seem to be related to programming.

Looking it up, “aggregate” is a C++ term for which the standard indeed does explicitly allow various optimizations and for which guarantees about data layout are weakened; this is not the case with C arrays.

Do you have any practical evidence to C compilers altering the data layout of an array in any way?


GP’s comment is referring to Scalar Replacement of Aggregates on the one hand. This is an optimization whereby the compiler will, for a function with a struct argument, only pass in a pointer to / copy of the actual elements of the struct used inside the function. Hence the name: the compiler is replacing the aggregate struct by one or more of the individual values it contains.

The reference to non-escaping aggregates describes the compiler optimizing a local-only struct by keeping the individual elements in registers during usage; because the struct as a whole does not leave the local scope, these elements are just discarded when that scope closes.

Both of these transformations/optimizations are performed by all three industrial C and C++ compilers (gcc/llvm/msvc).
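
A toy example of the non-escaping case (made-up names):

  struct Vec2 { double x, y; };

  double dot(double ax, double ay, double bx, double by) {
      // Neither aggregate escapes this function, so after scalar replacement
      // the compiler typically keeps the four doubles in registers and never
      // gives a or b a memory location at all.
      Vec2 a{ax, ay};
      Vec2 b{bx, by};
      return a.x * b.x + a.y * b.y;
  }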


Exactly. Thanks.


Which sections of the standard are you referring to?


Disagree about garbage collected runtimes.

A lot of widely used software is written in C or C++ and uses malloc/free extensively (C++ new/delete is mostly a wrapper around it). This results in a memory layout worse than an equivalent managed heap would have. That happens because memory allocated on the C heap is immovable, while garbage collectors may move data around to defragment the heap.


Actually...https://arxiv.org/pdf/1902.04738.pdf

On a more serious note, most C programs I see don't do very much of that, preferring to allocate fixed-size buffers on the stack. Additionally, most GC'd languages that aren't Java have crappy GCs, and those GC'd languages that are Java lack value types. This harms spatial locality quite a bit, and compaction can't fully compensate for that.

They are working on adding value types to the jvm, though; it will be very interesting to see how performance changes. (I expect not very much in practice—those projects for which performance was important were already using whatever tricks they needed to—but perhaps it will improve the ergonomics of performant java.)


> preferring to allocate fixed-size buffers on the stack

When working set is measured in megabytes, it fits in L3 cache of modern CPUs. Memory layout is not too important for these programs.

> most GC'd languages that aren't java have crappy GCs

C# is good too. It also has value types, native memory spans, SIMD intrinsics, and stackalloc.


Don't forget that good cache locality can also cause data to be pulled into cache that the prefetcher knew nothing about.

I can create you a shitty linked list that fits perfectly in L3, but still has terrible cold cache performance because each individual cacheline has to be pulled in one by one.


Indeed, there are edge cases. Still, how many people are using linked lists on the stack?


Doesn't D have value types via structs, since by default they're stack allocated and copied? Though I'm not sure how well its GC performs when you do allocate on the heap.


Oh yes, plenty of GCed languages have value types. But they're not java and their GCs are garbage. D's is particularly bad, probably 30 years out of date.


Data-oriented designs like ECS implementations often use generational indices, which can support relocation and defragmentation.
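
Roughly, the idea looks like this (a minimal, made-up sketch):

  #include <cstdint>
  #include <vector>

  // Unlike a raw pointer, a handle survives the storage being moved around
  // (reallocation, compaction); the generation check catches the case where
  // the slot has since been reused for something else.
  struct Handle {
      std::uint32_t index;
      std::uint32_t generation;
  };

  template <typename T>
  struct Pool {
      std::vector<T>             items;
      std::vector<std::uint32_t> generations;  // bumped whenever a slot is reused

      T* resolve(Handle h) {
          if (h.index < items.size() && generations[h.index] == h.generation)
              return &items[h.index];
          return nullptr;  // stale handle
      }
  };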


There are several versions available of Scott Meyers' "CPU Caches and Why You Care" talk:

- https://vimeo.com/97337258

- https://www.aristeia.com/TalkNotes/PDXCodeCamp2010.pdf


> You may even realize that all of that enterprise cruft is not really necessary

I will never understand why so many Java developers insist that every public value must be implemented as a private field with get/set methods, with no exceptions.

Or that every class needs to extend an abstract class that implements an interface.

Haven't they heard of YAGNI?


It’s true that compilers won’t optimise memory layout, but I would argue this is an issue for only a subset of the software engineering industry. More important is robust, bug-free code with separation of concerns, which these tools allow. Remember the first rule of optimisation: “don’t”. The second rule of optimisation: “don’t... yet”.


Those two are followed by a third rule: "whoops it's too late now, we'd have to rearchitect and rewrite everything and we can't justify the cost"


At some point there's an assumption that OOP and DOD are incompatible, while they are actually orthogonal.


Back in the real world the best code is code that ships.

Occasionally I find myself doing something stupid (if you use Unity and try to do advanced distance calculations in Update, you're going to have a bad time), but usually I never have to optimize. Modern computers might as well be magic.


The first approach creates a list, which means cheap insert/delete. But it's not a fair comparison, as the second approach doesn't let you do that. For that, you would need to allocate an array of objects in the first approach instead of a list.


If the memory controller could be given a mask for which bits or bytes are wanted, then the memory overhead could be reduced without changing the representation in the programming language.

At that point the compiler can optimize memory access by generating the mask.


If you had the opportunity to change the entire cache hierarchy at a hardware level and make all the compilers support that, you could do a lot of things.

You don't. You control the software.


> While not having to think about the memory is going to make you more productive, that productivity comes at a heavy price.

I highly recommend "Code Optimization: Effective Memory Usage" if you want to think more about these problems.


Jai can't come out fast enough. Don't think for one instant that the language is only for writing games; it will serve equally well for writing any kind of software where performance is nice to have.


The metaprogramming available in languages like Rust and C++ could maybe be leveraged to make the struct-of-arrays model easier. Wouldn't be the same as native support, though.
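
As a rough illustration, in C++17 the glue might look something like this (purely a sketch; without reflection the fields end up addressed by position rather than by name):

  #include <cstddef>
  #include <tuple>
  #include <vector>

  // One std::vector per field; push_back fans a "row" out across the columns.
  template <typename... Fields>
  struct SoA {
      std::tuple<std::vector<Fields>...> columns;

      void push_back(const Fields&... values) {
          std::apply([&](auto&... col) { (col.push_back(values), ...); }, columns);
      }

      template <std::size_t I>
      auto& column() { return std::get<I>(columns); }

      std::size_t size() const { return std::get<0>(columns).size(); }
  };

  // Usage: SoA<float, float, int> particles; particles.push_back(1.0f, 2.0f, 7);
  // Losing the field names is exactly the ergonomic gap native support
  // (or static reflection) would close.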


Julia has a package that provides incredibly easy to use SoA: https://github.com/JuliaArrays/StructArrays.jl Julia's metaprogramming and multiple dispatch mean that it can use user structs fully transparently. Everything just works.


I've seen this done in D fairly easily, almost definitely closed source unfortunately.


There are some, let's call them experimental, C++ libraries for that. But C++ not having static reflection is the main showstopper on that front.


> There are some, let's call them experimental, C++ libraries for that.

Could you provide some links?


https://github.com/Yamahari/struct_array

It uses C++20 features and macros. The macros limit it to a POD count of 8.


Thanks!


>The hardware has improved, but the languages didn’t

I kinda disagree with the author, because "languages" did improve.

Let's take a look at recent features in C#:

Spans and CPU Intrinsics


Why wouldn't a new programming language like Python nowadays decide to use a GIL? I thought the point was to make the interpreter easier to write.


A global interpreter lock made sense in the 1990s, when multiple CPUs were rare, and when they did occur, it was usually two.

Now, it's hard to find a system with just one CPU. Maybe something really embedded; but even IoT stuff is sometimes multiprocessor these days. Anything intended for regular computers needs to be prepared for 1-16 CPUs, and more isn't hard to find if you need it. Building a new programming language with a big constraint preventing it from using modern computing seems like a mistake. That doesn't mean an early version wouldn't start with a big lock, but fine-grained locking is probably needed before release.


> 4. Data oriented way of programming has its own set of problems.

This should be a top reason. Author is too shy to reveal the true cause.


Data-oriented programming is not hard by itself, rather as a consequence of how programming is done and what sort of flexibility you get in return.

The traditional (or should I call it "inverse") approach is to define what the program should do and make it modular enough so that parts can be changed easily. For this to work well, functions are given a state snapshot to read, operate on, or change. Encapsulating this state into a struct/object allows functions operating on the same data to be extended easily, without having to think about which functions do what, and when and where they're called.

Thinking about locality though does force you to think not only about program structure, but overall program flow, which can be much, much harder to implement in a flexible fashion.

Struct-of-arrays or arrays-or-structs has little to do with it, although it's a common pattern seen when working with some kind of highly structured datasets.

Look at high-performance graphics pipelines to see how this affects the structure of the code. There's a lot of duplication, required to handle data that is structured in different ways for efficiency; something which is not solved by just SOA/AOS swapping, but requires thinking about what data is used, what the access patterns of the algorithm are, and how to pack the data so that locality is optimized. It's also harder to change as a consequence.


I have experience with performant, cache-friendly code. Yes, a data-oriented approach can be useful in some fields. But the author blames the whole industry for having low demand for performant code and wants Python to accommodate cache-friendliness. This is high-level BS.


Data oriented programming isn't a framework or paradigm. It's simply a principle that says that programs should be written to solve particular problems on particular hardware. If objects serve a purpose to this end, then they are also data oriented programming.


Extremely well written and communicated.


It's not a fun language to work in, but SQL does a lot of what the author is asking for.


Why aren’t hardware designers designing hardware for the languages and methods we have?


They are (through prediction, reordering, etc in the CPU), but manual approaches like hinting or VLIW have always turned out to fail, and the automatic approaches we're using now turn out to have security issues like Spectre.


There are physical limits. E.g., clock speed can't go so high that information must travel across the CPU faster than the speed of light. E.g. 2: as caches get bigger, it takes more time to index them.


This article makes a bunch of invalid (sometimes implicit) assumptions or claims:

*A 20-year-old language's design decisions don't make sense today because the hardware performance situation has changed.*

1. Most languages were not designed for optimal utilization of the specific hardware situation during their conception.

2. Even languages designed with performance in mind somehow are still incredibly general and flexible things.

3. The article itself shows that CPU hardware has been changing along a mostly predictable path for decades. Everyone knew Dennard scaling would end, even if you couldn't predict it to within the single year. Everyone saw CPU latency was decreasing much much faster than memory latency, already in the 1980s.

--

* The main problem that we have today lies in utilizing the CPUs to their full potential.*

This is _a_ main problem, not _the_ main problem:

* There are other, specialized processing hardware (like GPUs).

* There's computation happening elsewhere on the system, e.g. on disk controllers and NICs.

* And actually, if you look at desktop environments and apps: They really don't need to utilize the CPU to its full potential; they're a mess of entanglement of things which get in each other's way and it's difficult to figure out why your system ends up being sluggish despite your brand-new shiny hardware.

--

*Another problem with the current programming languages is that they didn’t evolve beyond the C-style type of programming. *

* C-style type of programming is just fine for C-style kinds of programming tasks. Every language has strengths and weaknesses. Some pairs of languages have a more obvious "X is an evolved Y" or "X is a better Y" relationship, but not that many.

* Most languages do not evolve or change to exploit specific hardware capabilities. Or at least - few language changes are intended to achieve that.

--

etc.

It's not a junk article though. The data orientation of program design is a useful notion, in many programming languages. It's just that, instead of reading this blog post, which quotes Mike Acton, you can just watch Acton's talk on this principle:

https://www.youtube.com/watch?v=rX0ItVEVjHc


Another example:

> “Use const and the compiler will optimize that away!” Nobody knows what the compiler will do, if nobody checked what the compiler did.

Rust (in the Rustonomicon if nowhere else) specifically promises that const gives you the same results you'd get by calculating and manually inlining the constant itself everywhere. It even spells out all the things you thus can, and can't do in a constant so that your compiler can definitely correctly evaluate the constant from the point of view of the target architecture despite potentially different hardware.

e.g. if you tell Rust

  const BAR: isize = isize::from_be(0x1234);

You've conjured a constant BAR whose size and data layout depend on the target architecture, but the language promises that figuring this out incurs no runtime cost; the work will be done at compile time.

Indeed Rust will refuse to compile your program if it can't figure out how to do this, explaining what went wrong. E.g. that same snippet above would not have worked years ago, because isize::from_be() was not yet marked as suitable for such endeavours (i.e. was not yet a const fn), and so the compiler wouldn't attempt it.

If you mark a function the compiler can't evaluate as being suitable, the compiler will reject that too; even if you don't use the function, somebody else might, and it won't disappoint them.



