Data-oriented design or why you might shoot yourself in the foot with OOP (2009) (gamesfromwithin.com)
236 points by tempodox on June 28, 2021 | 359 comments



In all my experience with OOP, it's always been inheritance that is the root of all evil. Rust and Go got this correct by having class-like objects with no inheritance, to achieve encapsulation without fragility.

Unfortunately, all the other languages that included inheritance in their design can't wish it away. Devs are going to keep reaching for inheritance as the closest, most comfortable abstraction.


>In all my experience with OOP, it's always been inheritance that is the root of all evil.

It's not, though, and the fact that people keep repeating this meme shows that most developers don't even bother thinking about issues they face beyond superficial blamesplaining.

The reason inheritance causes so many issues in languages like Java is that they are statically typed and also use classes as types[1]. Classes must be somewhere in the inheritance tree, hence you are forced into some place in that tree. To make things worse, Java has many keywords that restrict what an inheritor of a class can do (private, final, etc).

Inheritance is much less troublesome in, say, Smalltalk, since the language is dynamically typed. If someone expects you to implement Foo, you can (almost always) just implement its relevant methods without explicitly extending the class. Thus, a whole host of annoying scenarios simply does not occur.

--

[1] BTW, this breaks one of the fundamental commandments of classic OOP: you should not depend on implementation details of an object, only on its message protocol. Obviously, it's impossible to be independent of implementation details if some library forces you to use a particular class.
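A minimal Python sketch of that situation (Python standing in for Smalltalk, since both are dynamically typed; `Logger`, `ListLog` and `report` are hypothetical names): `report` relies only on the `write` message, so an object that never extends `Logger` works just as well.

```python
class Logger:
    """A 'base class' some library might expect you to extend."""

    def write(self, message: str) -> None:
        print(message)


class ListLog:
    """Does NOT extend Logger; it just answers the same message."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def write(self, message: str) -> None:
        self.lines.append(message)


def report(log, value: int) -> None:
    # Only the message protocol matters: anything with .write(str) works.
    log.write(f"value={value}")


log = ListLog()
report(log, 42)
print(log.lines)  # ['value=42']
```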


You're describing interface-based polymorphism, which is what Go and Rust use. In Go, I can have a struct with methods that implements a particular interface by implementing all the methods described in that interface, but I can't inherit from another struct. The person you're replying to called this out as a better system too.


Polymorphism is good. You describe polymorphism.

Inheritance is bad. Inheritance is patching a class and overriding some of its methods, while leaving others intact. This brings all kinds of unexpected interplay between methods of different levels of overriding. A typical example is http://www.cse.psu.edu/~deh25/cmpsc473/jokes00/joke01.html
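The classic demonstration of that interplay is the "fragile base class" counting-collection example popularized by Joshua Bloch's Effective Java. A Python transliteration with hypothetical `Bag`/`CountingBag` classes: the subclass overrides both entry points without knowing that the base class implements `add_all` in terms of `add`.

```python
class Bag:
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def add_all(self, items):
        # Implementation detail: add_all reuses add().
        for item in items:
            self.add(item)


class CountingBag(Bag):
    """Tries to count every item ever added by overriding both methods."""

    def __init__(self):
        super().__init__()
        self.added = 0

    def add(self, item):
        self.added += 1
        super().add(item)

    def add_all(self, items):
        self.added += len(items)
        super().add_all(items)  # calls back down into CountingBag.add!


bag = CountingBag()
bag.add_all(["a", "b", "c"])
print(bag.added)  # 6, not 3: each item was counted twice
```

Nothing in `Bag`'s public interface tells the subclass author about that internal call, which is why the result depends on implementation details.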

Ideally all "concrete classes" with method implementations should be final, and the polymorphism should be achieved via interfaces / typeclasses / traits, or purely abstract classes where these are not available. Reuse of implementation should be achieved via composition; there are several ergonomic ways to express it.
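A minimal Python sketch of that recipe applied to a counting collection (`Collection`, `Bag`, `CountingBag` and `fill` are hypothetical names). Python has no enforced `final`; `typing.final` is only checked by static tools such as mypy, so here it documents intent.

```python
from collections.abc import Iterable
from typing import Protocol, final


class Collection(Protocol):
    """The interface: polymorphism goes through this, not a base class."""

    def add(self, item: str) -> None: ...
    def add_all(self, items: Iterable[str]) -> None: ...


@final  # a static checker will reject subclasses of this
class Bag:
    def __init__(self) -> None:
        self.items: list[str] = []

    def add(self, item: str) -> None:
        self.items.append(item)

    def add_all(self, items: Iterable[str]) -> None:
        for item in items:
            self.add(item)


@final
class CountingBag:
    """Reuses Bag by holding one (composition), not by extending it."""

    def __init__(self, inner: Bag) -> None:
        self.inner = inner
        self.added = 0

    def add(self, item: str) -> None:
        self.added += 1
        self.inner.add(item)

    def add_all(self, items: Iterable[str]) -> None:
        items = list(items)
        self.added += len(items)
        self.inner.add_all(items)  # inner calls Bag.add, never ours


def fill(c: Collection) -> None:
    c.add_all(["a", "b", "c"])


counting = CountingBag(Bag())
fill(counting)
print(counting.added)  # 3
```

Because `CountingBag` wraps rather than extends `Bag`, it only sees `Bag`'s public methods, so `Bag` can change how `add_all` works internally without breaking the count.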


> Simulator supervisors report that pilots from that point onward have strictly avoided kangaroos, just as they were meant to.

Won't fix; working as intended.


> Polymorphism is good. You describe polymorphism.

It also seems like they're describing the nominal typing of Java versus a structural approach.


Haskell and, IIRC, Rust allow you to declare that a certain data type conforms to some interface, and describe how, by listing / adding the functions with necessary signatures.

This gives you the upsides of structural polymorphism without losing static checks.

Go, OTOH, goes all the way structural.


> Haskell and, IIRC, Rust allow you to declare that a certain data type conforms to some interface ...

I believe at least in the case of Haskell, you are referring to type classes[0].

0 - https://wiki.haskell.org/OOP_vs_type_classes


I'm unfamiliar with anything like that in Rust, beyond Rust's structural approach to tuples. I'd love to hear more about it, though!


I think they're just talking about how you have to declare which trait a function implementation is for, rather than having it derived from the type signature alone: the `impl Trait for Type` syntax[0]. In Go, you don't need to declare that the function implementations are being implemented for a particular interface; you just have to match the type signatures and function names.

Rust's way can help avoid some errors. You can't accidentally implement an interface, whereas in Go you can if you happen to implement a group of functions with appropriate names and type signatures. It's unlikely to cause actual bugs (you'd have to misuse the resulting implementation) but can be conceptually somewhat confusing.

[0] https://doc.rust-lang.org/book/ch10-02-traits.html#returning...
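Python happens to offer both styles, which makes the difference easy to poke at; treat it as an analogy rather than a model of Go or Rust. A sketch with hypothetical `Closer`, `CloserABC` and `Door` types: a `runtime_checkable` `Protocol` matches any class with the right method names (it does not even check signatures), while an `ABC` only matches classes that subclass it or explicitly register with it.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class Closer(Protocol):
    """Structural, like a Go interface: having close() is enough."""

    def close(self) -> None: ...


class CloserABC(ABC):
    """Nominal, closer to a Rust trait: you must opt in explicitly."""

    @abstractmethod
    def close(self) -> None: ...


class Door:
    # Written with no interface in mind; "close" just happens to match.
    def close(self) -> None:
        print("door shut")


print(isinstance(Door(), Closer))     # True: conforms by accident
print(isinstance(Door(), CloserABC))  # False: never declared

CloserABC.register(Door)              # the explicit opt-in
print(isinstance(Door(), CloserABC))  # True
```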


> It's not, though, and the fact that people keep repeating this meme shows that most developers don't even bother thinking about issues they face beyond superficial blamesplaining.

I don't know that I'd say "inheritance is the root of all evil" (there are lots of antipatterns in OOP that are unrelated to inheritance, like Joe Armstrong's banana-gorilla-jungle observation) but I will say that inheritance is pretty close to useless in the best case and harmful in most cases. And I say this as someone who learned to program and then became a professional programmer when OOP was all the rage. I was taught OOP without the previous bias of other paradigms; it was only after learning other paradigms that I was able to articulate frustrations I was having with OOP. The implication that people who criticize inheritance in this way "haven't bothered to think" is patently false in the best case, and laughably arrogant in the worst case.

> The reason inheritance causes so many issues in languages like Java is that they are statically typed and also use classes as types[1]. Classes must be somewhere in the inheritance tree, hence you are forced into some place in that tree. To make things worse, Java has many keywords that restrict what an inheritor of a class can do (private, final, etc).

Fear not, Python is dynamically typed and inheritance is a mess there as well.

> If someone expects you to implement Foo, you can (almost always) just implement its relevant methods without explicitly extending the class.

This is just structural subtyping (see Go's interfaces for a statically typed example of structural subtyping) also known as "duck typing". It seems like you're positing that the problems with inheritance derive from nominal subtyping (e.g., Java's `implements` keyword), but these things are orthogonal. Python has duck typing ("structural subtyping") and its inheritance is no less painful than Java's. Similarly, Rust has nominal subtyping (a type must explicitly implement a trait) and it has none of the inheritance-related problems that Python and Java have.


I feel like OOP always had the nerd catnip problem. Since the very beginning the various programming tutorials would have the contrived examples of animals and canines and dogs, or geometric shapes and triangles etc. which just managed to ring a particular very satisfying bell in people's heads. It was just such a neat concept with those examples that just made sense. How it turned out in practice is a different story but I feel this had a lot to do with the enthusiastic uptake.


Smalltalk-80: The Language and Its Implementation (1983) by Adele Goldberg and David Robson had pretty good examples with none of this animal/mammal/dog crap. Not sure when the trend for giving awful examples like this really started, but I don't think it was "from the very beginning".


To me there are about three tiers of this basic insight:

1. Inheritance causes all kinds of issues so you shouldn’t use it.

2. Actually, inheritance is fine as long as you do it right (e.g. Liskov)

3. Actually, getting part 2 right is difficult, and the heavy risks of getting it wrong aren’t worth the minor benefits of inheritance.


> Inheritance is much less troublesome in, say, Smalltalk, since the language is dynamically typed. If someone expects you to implement Foo, you can (almost always) just implement its relevant methods without explicitly extending the class.

Sorry, I don't understand this sentence. Isn't inheritance simply a way to avoid writing duplicate code? If you write the code to implement methods, isn't that not inheritance anymore?


Inheritance conflates code reuse ("avoid writing duplicate code") with polymorphism (allowing for multiple different instances to implement the same interface). It also allows for trampolining method calls up and down a hierarchy (a method in a base class might call another method which might be overridden by another class in the hierarchy).

Outside of OOP, we use composition for reuse and interfaces for polymorphism, and we don't trampoline method calls up and down a hierarchy because it's (probably?) always a bad idea. When we really need reuse and polymorphism, we can use both composition and interfaces, since the two are correctly orthogonal.


> Inheritance conflates code reuse ("avoid writing duplicate code") with polymorphism (allowing for multiple different instances to implement the same interface).

Note that languages like C++ allow for inheritance without polymorphism, i.e. pure implementation inheritance.

However, I also think that composition should be preferred whenever possible.


What the grandparent post means is that in dynamic languages you can just implement one of the "base" methods yourself instead of inheriting from a class that's bigger than you need, in order to avoid problems. I personally don't have an opinion on that, but it's not something I'd do myself.

Also, like the sibling said, inheritance is a tool that does multiple things: code reuse (implementation inheritance), which everyone hates (the age-old advice is to use composition for code reuse instead), and interface inheritance, which everyone loves.


> In all my experience with OOP, it's always been inheritance that is the root of all evil.

I have this "theory" in the back of my head that trees are usually the wrong way to model things in real life, but they're what comes to us naturally. For example, a blog with categories and sub-categories for articles (a tree, inheritance) can often describe the content better by using tags (a graph, composition). I think that's because trees are easy to deal with and understand, but graphs are more "open" with what you can do.


Data modelling in OOP is an exercise in coming up with Platonic ideals, resulting in a hierarchical (tree-like) ontology as you try to choose one of the attributes as the categorisation dimension, and leaving everything else as properties.

    class Animal
    class Mammal inherits Animal
    class Feline inherits Mammal
    class Cat inherits Feline
    ...
This is different than just asserting facts with data, which can lie in multiple dimensions.

    Is Feline
    Is Mammal
    Is Fluffy
    Is White
    Does Meow
The latter is a much more flexible data model as it more closely mimics observed (subjective) reality, and is less disturbed when a new (counter-)example is introduced, but is also harder to reason about than idealised categories.
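A minimal Python sketch of the fact-based model, assuming facts are stored as (subject, predicate, value) triples; the subjects (`tom`, `snowball`, `robo`) are made-up opaque identifiers, not categories, and `matching` is a hypothetical helper.

```python
# Facts as (subject, predicate, value) triples; subjects are opaque IDs.
facts = {
    ("tom", "is", "feline"),
    ("tom", "is", "mammal"),
    ("tom", "is", "fluffy"),
    ("tom", "is", "white"),
    ("tom", "does", "meow"),
    ("snowball", "is", "fluffy"),
    ("snowball", "is", "white"),
}


def matching(*conditions: tuple[str, str]) -> set[str]:
    """Subjects for which every (predicate, value) condition is asserted."""
    subjects = {s for s, _, _ in facts}
    return {s for s in subjects if all((s, p, v) in facts for p, v in conditions)}


print(sorted(matching(("is", "white"), ("is", "fluffy"))))  # ['snowball', 'tom']

# A counter-example that meows but is not a feline is just one more fact;
# there is no hierarchy to restructure.
facts.add(("robo", "does", "meow"))
print(sorted(matching(("does", "meow"))))  # ['robo', 'tom']
```

The flip side, as the comment notes, is that nothing guarantees any subject has a sensible combination of facts, which is what makes this harder to reason about.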


To your point, there are programs that deal in ontologies. These are the only times that it makes sense to care about the relationship between things. For example, an ontology might have a concept of a city and it might know that Munich is a city. But this is all data, it isn't about "types". It never makes sense to write `class Munich extends City {}` for the purpose of your program. Rather, you might have:

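    struct Entity {
        name: String,
        // Boxed: a struct cannot contain itself directly (infinite size).
        parent: Option<Box<Entity>>,
    }

    fn main() {
        let city = Entity { name: "city".to_string(), parent: None };
        let munich = Entity { name: "Munich".to_string(), parent: Some(Box::new(city)) };
        // The "is a city" relationship is ordinary data, not a type.
        assert_eq!(munich.parent.as_ref().map(|p| p.name.as_str()), Some("city"));
    }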
That said, if you really wanted to make life hard for yourself, you could use types as data provided your language has a runtime type system and reflection (you could dynamically generate `class City` and `class Munich extends City` when deserializing `[{name: "city", parent: null}, {name: "Munich", parent: "city"}]` or something). But this is the kind of Rube-Goldberg territory that "Kingdom of Nouns" thinking leads us toward.
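A Python sketch of both approaches (Python because it has the runtime type system and reflection mentioned above; `records`, `is_a` and `classes` are hypothetical names), deserializing the same two records written as JSON:

```python
import json

raw = '[{"name": "city", "parent": null}, {"name": "Munich", "parent": "city"}]'

# The sane version: the ontology stays data.
records = {r["name"]: r["parent"] for r in json.loads(raw)}


def is_a(name, ancestor) -> bool:
    """Walk parent links until we hit the ancestor or run out."""
    while name is not None:
        if name == ancestor:
            return True
        name = records[name]
    return False


# The Rube Goldberg version: turn each record into a class at runtime.
classes: dict[str, type] = {}
for r in json.loads(raw):  # assumes parents appear before their children
    bases = (classes[r["parent"]],) if r["parent"] else (object,)
    classes[r["name"]] = type(r["name"], bases, {})

print(is_a("Munich", "city"))                          # True
print(issubclass(classes["Munich"], classes["city"]))  # True, via codegen
```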


> Data modelling in OOP is an exercise in coming up with Platonic ideals, resulting in a hierarchical (tree-like) ontology as you try to choose one of the attributes as the categorisation dimension, and leaving everything else as properties.

Only if that is how you choose to model the problem domain.

> This is different than just asserting facts with data, which can lie in multiple dimensions.

The "Is ..." examples you detail can just as easily be modeled "in OOP" as:

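  class Animal(knownFacts: Set[String]) {
    // the facts are plain data held by the object, not a place in a hierarchy
    def is(fact: String): Boolean = knownFacts.contains(fact)
  }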
Without the need for "a hierarchical (tree-like) ontology", since obviously this would be a poor choice in this situation.


Not the same thing, as the facts are now a property of Animal. My second example doesn't even mention Animal. You still have the problem of "putting things into a category" vs. just asserting facts.


There is an implicit subject you are asserting facts about though. You clearly are talking about a fluffy, white Cat in your example. It’s essentially structural rather than nominal typing. The “fluffy, white Cat” is defined by its traits. We could define a type Cat which has a subset of those traits and then be able to use our “fluffy, white Cat” anywhere we can use a Cat. We only name them to avoid having to name all the traits all the time. Doesn’t make it any less object based.

Structural typing is really cool though. An object built from a named, saved recipe will work just as well as something cobbled together on the fly, and at runtime you won’t even know which is which. It’s the basis of the composition-based game object interfaces in a lot of general-purpose game engines.

I’ve also found it extremely fun to use with TypeScript.
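A Python sketch of that idea with a hypothetical `Cat` protocol: one object comes from a named dataclass, the other is cobbled together at runtime with `SimpleNamespace`, and `describe` cannot tell them apart. Python checks nothing at runtime here; a static checker such as mypy can verify `SavedCat` against `Cat`, whereas the ad hoc object is only "checked" by running it.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Protocol


class Cat(Protocol):
    """A named type: anything with these traits counts as a Cat."""

    fluffy: bool
    color: str

    def meow(self) -> str: ...


@dataclass
class SavedCat:
    """Built from a named, saved recipe."""

    fluffy: bool
    color: str

    def meow(self) -> str:
        return "meow"


def describe(cat: Cat) -> str:
    fur = "fluffy" if cat.fluffy else "short-haired"
    return f"{fur} {cat.color} cat says {cat.meow()}"


# Cobbled together at runtime, never declared as anything.
adhoc = SimpleNamespace(fluffy=True, color="white", meow=lambda: "mrrp")

print(describe(SavedCat(fluffy=True, color="white")))  # fluffy white cat says meow
print(describe(adhoc))                                  # fluffy white cat says mrrp
```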


> There is an implicit subject you are asserting facts about though. You clearly are talking about a fluffy, white Cat in your example.

No, I'm not, because I didn't assert this fact (it's a Cat).

See how hard it is to break free from this mindset of objects.


Right as I said it’s structural typing. It’s implicit that the traits are grouped together to describe something. And individual traits could be grouped together with completely different ones to describe something else. That you choose not to name it doesn’t change that the trait collection applies to fluffy, white Cats whether you like that or not. I can even choose to call it one thing and you can choose to call it something else and the types will remain interchangeable. You can even leave your type anonymous and it will still be interchangeable.


Uh, no. Either of those are possible without leaving the OO paradigm, and only very poorly taught and inexperienced students model data as an object inheritance hierarchy.


If only.

Spoken like someone who has never seen any kind of representative sample subset of real world code...


No, the problem is that code by poorly taught and inexperienced developers is rife. Heck it’s on display throughout the threads on this topic; for maximum irony, often as an example of “why <concept> is bad”, the author not realising this merely telegraphs their own limitations.


If f(x) is implemented for white xs and separately for fluffy xs, how do you disambiguate f(x) for x that's both white and fluffy?
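One concrete answer, for class-based multiple inheritance in Python (the `White`/`Fluffy` mixins are hypothetical): the clash is resolved silently by the order of the base classes, i.e. the method resolution order, unless the combined type defines `f` itself and chooses explicitly.

```python
class White:
    def f(self) -> str:
        return "white f"


class Fluffy:
    def f(self) -> str:
        return "fluffy f"


class WhiteFluffy(White, Fluffy):
    pass  # no f of its own: the first base listed wins


class FluffyWhite(Fluffy, White):
    pass  # same parts, opposite order, different behavior


class Explicit(White, Fluffy):
    def f(self) -> str:
        # Resolve the clash on purpose instead of by base-class order.
        return f"{White.f(self)} + {Fluffy.f(self)}"


print(WhiteFluffy().f())  # white f
print(FluffyWhite().f())  # fluffy f
print(Explicit().f())     # white f + fluffy f
```

Rust takes the opposite default: if two traits in scope both provide a method with the same name, an ambiguous method call is a compile error, and you pick one with fully qualified syntax such as `White::f(&x)`.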


Hence traits?


And ECS is fundamentally a trait system!


I know all the terms are overloaded nowadays so everything’s kinda unclear, but I always wished that ECS components had been called traits, because adding a component to an entity gives it a trait like “this thing has a position in the world” or “this thing can be drawn” (and perhaps systems could have been named “behaviors”, because systems add behavior to entities based on the traits/components that they have).

Years ago, when ECS was just starting to be talked about (after Adam Martin's blog posts), I wrote a toy ECS where I used that naming convention. Nowadays I stick to the mainstream terminology since that’s what other people know.


They have, but there is a certain tendency to ignore CS literature.

"Component Software: Beyond Object-Oriented Programming"

https://www.amazon.com/Component-Software-Beyond-Object-Orie...

1st edition, 1998


Yeah I've commented before that "entity component system" is roughly synonymous with "thing piece thing" or something like that - it's a really bad name because it's so ambiguous. Anything including the term trait would be 1000x better because at least "trait" means something.


I think it's named ECS because entity+component designs were already common in games before then. ECS generally takes the behavior off the C and puts it in the S.


There's a great old (2005) blog post on this topic:

Clay Shirky - Ontology is Overrated: Categories, Links, and Tags

https://web.archive.org/web/20191117162526/http://shirky.com...


Folks may enjoy some of these old posts too.

https://en.m.wikipedia.org/wiki/Pyrrhonism

For example, putting regions under a cold climates category, might not make sense to someone living near the pole where they would consider the same regions to be warm climates.


Thanks, that was a great read. Tangentially related, but I wonder what's the impact of Windows having a really bad search. Maybe people are thus relying more on the folder hierarchies, and that influences how they think?


Humans have a natural tendency to refine a single idea by splitting it into two based on a differentiating factor. This ends up looking like a tree when applied repeatedly. Dichotomous keys for species identification are another example of this.


The problem being that they don't categorize things into one tree, but many. One can view the same thing in different ways. OOP tree hierarchies do not allow that.

It's like trying to categorize your photos in a directory tree. Do you categorize by year first, by person, or location? There is no correct answer. What people want instead is a photo album with tags. The same problem applies to OOP.


> It's like trying to categorize your photos in a directory tree

A problem that made me think about tags instead of categories was precisely that: I have photos that I want to organize. I started by organizing them by person, with a folder for each person. But how do I handle a photo with multiple people in it? Tags don't have this problem. Unfortunately, most file systems don't support tags natively.


> What people want instead is a photo album with tags

What you really want is hierarchical tags :)


No, no. We want tags. Later we want hierarchical tags (with some complex workarounds to maintain some legacy thing included in the implementation).


  Silly monkeys
  Give them thumbs, they forge a blade
  And where there's one they're bound to divide it
  Right in two
SCNR :)

For everyone who doesn't know the lyrics: I recommend listening to Tool's "Right in Two". Although the lyrics are originally about war and strife, not programming. ;)


Oh man, I have a rabbit hole for you.

https://en.wikipedia.org/wiki/Rhizome_(philosophy)


Here's a near-decade old talk from me on this exact topic: https://www.youtube.com/watch?v=YfKAScYkGlk

I haven't watched this in like... a long time, so maybe I'd think it's bad now.


I recommend Manuel de Landa (he styles his name DeLanda these days), particularly “Intensive Science and Virtual Philosophy” if your flavor is Anglo-style analytic philosophy.

The relevant Deleuze texts (A Thousand Plateaus) can be infuriating if you’re not open to this whole other style of thinking, but Deleuze is no postmodernist; he’s a realist and a materialist and sort of a science worshipper, albeit from an angle that would make Neil deGrasse Tyson start bleeding from his nose until he passed out, if he ever grokked it. Start with DeLanda, probably.

[googling a little, this isn’t great but isn’t bad for something readily accessible: http://dar.aucegypt.edu/bitstream/handle/10526/3534/DeLanda%... ]


Heck yeah, in full agreement with all of that.


A tree is a graph where all vertices have exactly one path through them.

Trees are often implemented using composition, and inheritance can form a general graph rather than a tree (the Diamond Problem is not game over).
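For example, Python linearizes multiple inheritance with the C3 algorithm, so in a diamond each class appears exactly once in the method resolution order and cooperative `super()` calls reach the shared base only once. A sketch with hypothetical classes `A` to `D`:

```python
class A:
    def hello(self) -> list[str]:
        return ["A"]


class B(A):
    def hello(self) -> list[str]:
        return ["B"] + super().hello()


class C(A):
    def hello(self) -> list[str]:
        return ["C"] + super().hello()


class D(B, C):  # the diamond: D -> B, C -> A
    def hello(self) -> list[str]:
        # super() follows D's MRO, so B's super() call goes to C, not A.
        return ["D"] + super().hello()


print([k.__name__ for k in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(D().hello())                      # ['D', 'B', 'C', 'A']
```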


    A
   / \
  B   C
 / \ / \
 D E F G
This is clearly a tree, but doesn't B have 2 paths through it? A->B->D and A->B->E.


You're right. The correct definition is that any two vertices are connected by (i.e. are the endpoints of) exactly one path.

Fun fact -- since graph-theory trees are undirected by definition, an inheritance graph is more properly called an arborescence (for single inheritance). For multiple inheritance it's a DAG (with diamond-pattern) or a directed tree (without).
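The corrected definition can be checked mechanically. A small Python sketch (`is_tree` and the vertex numbering are illustrative), relying on the standard equivalence that an undirected graph on n vertices is a tree exactly when it has n - 1 edges and no cycle; union-find catches the first edge that would create a second path.

```python
def is_tree(n: int, edges: list[tuple[int, int]]) -> bool:
    """True iff the undirected graph on vertices 0..n-1 is a tree."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n))  # union-find forest

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v already connected: a second path
        parent[ru] = rv
    return True


# The seven-node diagram from this subthread: A=0, B=1, C=2, D=3, E=4, F=5, G=6
diagram = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
broken = diagram[:-1] + [(3, 4)]  # drop C-G, add D-E

print(is_tree(7, diagram))  # True
print(is_tree(7, broken))   # False: D and E now joined by two paths
```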


> any two vertices are connected by (i.e. are the endpoints of) exactly one path

No.

You just described a connected acyclic graph, not a tree.

In addition to being connected and acyclic, a tree must also have a root, and is thus implicitly directed.


> I have this "theory" in the back of my head that trees are usually the wrong way to model things in real life, but they're what comes to us naturally.

Have you by any chance read the relevant passages in SICP? It has some things to say about OOP ontologies.


I don't think so; I didn't finish the first chapter of SICP. I'm sure I'm not the first one to come up with this, though.


Could you link this?


https://mitpress.mit.edu/sites/default/files/sicp/full-text/...

Possibly that section. It's not about OOP specifically, but about type hierarchies generally.


You might enjoy this essay from 1965, "A City is Not a Tree": https://www.patternlanguage.com/archive/cityisnotatree.html


Except that trees are by definition graphs with specific conditions on direction and cycles.


Perhaps it is those very conditions which make trees less useful than they at first appear.


I would argue that it's those restrictions that make trees a useful simplification, but also a simplification.


In general yes, but in certain case they force you to simplify in a way that's later painful.


I've previously suggested that initiation is the "root of all evil." See my essay:

Object Oriented Programming Is An Expensive Disaster Which Must End:

http://www.smashcompany.com/technology/object-oriented-progr...


Funny to see the author of one of my favourite blog posts on the internet get downvoted.


Thank you. I believe some people downvote it based on the title, rather than the argument, but I believe the title is also accurate.


The weirdest thing is that the ECS as a way of building a game is inherently object oriented. You take a set of components and compose an object called an entity. The components on the entity define not only its data but also its behavior by the set of systems that act on the corresponding components. And you can take these object definitions and inherit them to add additional behavior or change the existing behavior by adding more components to the new definition.

Then if you solve the entity communication conundrum with message passing and don't allow entities to directly access one another's data you basically have all the elements.


> You take a set of components and compose an object called an entity.

That’s an overly broad definition of “object”, since under that same definition a record type (C struct) or any other blob of memory is an object.

In the common type of “components only store data” ECS, the entity is an ID (think a foreign key) that connects multiple records together and systems are independent functions (they are not tied to nor live in an entity) that operate on collections of subsets of these components.

That sounds a lot more like old-school C-like procedural programming to me than it does like OOP. There’s more to OOP than the data attributes a class contains (e.g. the associated methods).

I suppose it depends on your game engine and your ECS, but since entities don’t contain logic, it’s the systems that communicate with each other (either by sending messages, by accessing other entities' components, or by just calling functions of other systems). This isn’t all that different from different parts of a procedural program communicating. I do personally think that making a system an OOP object makes sense, but it doesn’t have to be one.

With that said, it seems pretty common in games to use a component system that isn’t “pure ECS” (like the default Unity components prior to their new ECS), which definitely seems like typical OOP to me, just decomposed a bit more.
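A minimal Python sketch of the "components only store data" style described in this comment (`World`, `Position`, `Velocity` and `movement_system` are hypothetical names). It shows the shape, not the performance: real ECS implementations typically store components in packed arrays for cache locality, which dicts of objects do not give you.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Position:
    x: float
    y: float


@dataclass
class Velocity:
    dx: float
    dy: float


class World:
    """Entities are bare integer IDs; components live in per-type tables."""

    def __init__(self) -> None:
        self._ids = count()
        self.tables: dict[type, dict[int, object]] = {}

    def spawn(self, *components: object) -> int:
        entity = next(self._ids)
        for c in components:
            self.tables.setdefault(type(c), {})[entity] = c
        return entity

    def query(self, *types: type):
        """Yield (entity, components...) for entities having every type."""
        tables = [self.tables.get(t, {}) for t in types]
        for entity in tables[0]:
            if all(entity in t for t in tables[1:]):
                yield (entity, *(t[entity] for t in tables))


def movement_system(world: World, dt: float) -> None:
    # A system is a free function over a subset of components.
    for _, pos, vel in world.query(Position, Velocity):
        pos.x += vel.dx * dt
        pos.y += vel.dy * dt


world = World()
mover = world.spawn(Position(0, 0), Velocity(1, 2))
rock = world.spawn(Position(5, 5))  # no Velocity: movement skips it
movement_system(world, dt=0.5)
print(world.tables[Position][mover], world.tables[Position][rock])
# Position(x=0.5, y=1.0) Position(x=5, y=5)
```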


> That’s an overly broad definition of “object”, since under that same definition a record type (C struct) or any other blob of memory is an object.

I think that's because you seem to have stopped at the second sentence; the rest is important as well. I'm also talking about a level above the ECS implementation: what is the running thing actually doing?

> With that said, it seems pretty common in games to use a component system that isn’t “pure ECS” (like the default Unity components prior to their new ECS), which definitely seems like typical OOP to me, just decomposed a bit more.

Yes this also models much the same thing at runtime.


How is the running thing operating on components any different from functions in a purely procedural language like C operating on records/structs?

> Yes this also models much the same thing at runtime.

In a different way, though. It also generally misses out on the data-oriented benefits of an ECS.


It's the conceptual organization. The Entity is defined by data (components) that bring along behavior (systems). So an entity executing at runtime (say you're making Pacman and it's the Red Ghost) is an object and is defined by the combination of data and behavior.

The underlying implementation is basically irrelevant. You could implement the ECS in an OOP style and the same is true. You could do it in a functional style and it would be true. You could do it in straight bytecode for some obscure hobby VM and it would be true.


Unlike traditional OOP, the data and behavior are decoupled though. Similar to data and functions.

That is, you can add components that don’t get operated on by any particular systems because the entity doesn’t have the other prerequisite components and you can have systems that don’t operate on the components. You can have many systems operate on one particular component and many components operated on by a system.

In OOP, the data and the operations are packaged together as one. You also typically have encapsulation, and it’s considered bad practice for one class to operate on another class's data directly.

It seems that both models achieve similar things, but they’re far from the same thing. Just like how procedural or functional programming achieve similar things to OOP, and you can do OOP in these paradigms or these paradigms in OOP. There’s a lot of cross over, but that doesn’t make them all the same thing.

If anything, I’d say that ECS are a relational model but with a very limited query system compared to something like SQL.


> Unlike traditional OOP, the data and behavior are decoupled though. Similar to data and functions.

Except the data and behavior aren't decoupled. The components are decoupled from the systems, but the systems are still very much dependent on the components. Just like a method is usually dependent on the instance's data or a function is dependent on the data passed in.

> That is, you can add components that don’t get operated on by any particular systems because the entity doesn’t have the other prerequisite components and you can have systems that don’t operate on the components. You can have many systems operate on one particular component and many components operated on by a system.

You can have a member that isn't operated on by any methods and methods that don't operate on members.

At the level you're talking about there isn't much difference between a function and a method. It's mostly syntax.

method(instancedata);

versus

instancedata.method();
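Python makes this literal: a method is an ordinary function stored on the class, and `obj.method()` is shorthand for calling that function with the instance as its first argument. A tiny sketch with a hypothetical `Counter` class:

```python
class Counter:
    def __init__(self) -> None:
        self.n = 0

    def bump(self) -> int:
        self.n += 1
        return self.n


c = Counter()
print(c.bump())                        # 1: instancedata.method()
print(Counter.bump(c))                 # 2: method(instancedata)
print(c.bump.__func__ is Counter.bump) # True: one function, two spellings
```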

Really we're getting caught up in implementation details because a class definition isn't the be all of how to define an object. There is really no reason we couldn't define objects in a programming language through composition.

ECS very much is a relational model, and you're right that it's very limited in comparison to things like SQL, because it's trying to model something very simple: game objects! The relations defined are exactly what brings data and behavior together to create the runtime object we call an Entity under the pattern's conventions.


> Just like a method is usually dependent on the instance's data or a function is dependent on the data passed in.

Just like a C function operating on a C struct. So, what, in your opinion, is the difference between procedural programming and OOP?

> It's mostly syntax.

Which is why I think there is more to OOP than a class's attributes and its methods. There is also inheritance, encapsulation levels, the fact that an object's identity is its attributes (the object is its data, whereas an entity has its components but is separate from them), and the fact that an object is a singular thing which its methods operate on (as opposed to how systems operate on collections of components; imagine a class system where a method operated on all instances of that class!).

Sure, at the end of the day it’s all the same and we’re just arguing semantics, but that was my point and what I led with: it’s an overly broad definition. If definitions are too broad they don’t add any value, and I believe a distinction between OOP and ECS is useful because they are used in different ways.

But fundamentally I don’t disagree; I even once wrote a blog post about how all of the OOP principles exist in an ECS! I just don’t believe that thinking of them as slightly different implementations of OOP is useful, because their properties differ.


I actually wrote way back at the start about encapsulation and inheritance (along with message passing). So I'm not sure my definition really is overly broad.

I'm also mostly talking about the runtime consequences of the things that most people worry about at the time of programming.

But thanks for making me defend my thought!


> The components on the entity define not only its data but also its behavior by the set of systems that act on the corresponding components. And you can take these object definitions and inherit them to add additional behavior or change the existing behavior by adding more components to the new definition.

This seems to miss what ECS actually is, unless you're just referring to the old-school way of doing entity components and not the data-oriented way.

Data-oriented ECS way of doing things is to separate state and behaviour. Entity components essentially become structs where their only behaviour is potentially some getter/setter utilities.

Behaviours are then stateless systems (just functions, essentially) which act on a set of components.

For example, a PhysicsUpdateBehaviour might take in a RigidBodyComponent and a HealthComponent to perform a physics update and apply physics/fall based damage.

The main benefit of ECS (imo) isn't even really performance. It makes code in complicated game projects much easier to manage by clarifying the game loop and by making it much more obvious how and when entity state is being modified.

It's the kind of thing that potentially complicates a smaller project, but makes larger more complex projects easier to manage.

This Overwatch GDC talk is the best breakdown/example of data-oriented ECS in a AAA game that I know of: https://www.youtube.com/watch?v=W3aieHjyNvw


I know what an ECS is. Components are decoupled from systems (but not vice versa), but the actual behavior of an entity is defined by the set of systems that run on its set of components, so in that sense the set of components defines what the Entity is, including its behavior. An Entity is defined in terms of its data, and its data brings along behavior.


Sure, ECS is object oriented in the same way C99 is. Yeah, technically you are building up some OOP functionality, the same way you emulate constructors and instance methods in C by writing functions that initialize data structures and functions that take a pointer to a struct to modify its data. That doesn't make C object oriented.

In ECS you are decoupling data from behavior, which is basically the entire paradigm of languages like Rust and Go. You could argue that by defining systems in a way that they run on certain components you are defining behavior and data in one, but I think that's a stretch.

It clearly differs from OOP when I have 2 entities with components that have overlapping and non-overlapping systems. If e1 has components c1 and c2, and e2 has components c2 and c3, and c1 and c2 are used in system s1 while c2 and c3 are used in system s2, I don't see how you would model that with OOP without adding data to classes that don't need it. In OOP both e1 and e2 would need all the logic from s1 and s2, or needlessly specialized versions of s1 and s2. Which would be solved via inheritance (either class based or interface based).
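A sketch of that exact e1/e2/s1/s2 setup in Go, assuming each entity's component set is tracked as a bitmask (a common ECS technique, but the types here are hypothetical). Each system runs only on the entities that have everything it needs, so neither entity carries data or logic it doesn't use:

```go
package main

import "fmt"

// Each component type gets one bit; an entity's signature is the set of
// components it currently has.
type Mask uint32

const (
	C1 Mask = 1 << iota
	C2
	C3
)

// A system declares which components it needs and runs on every entity
// whose signature contains all of them.
type System struct {
	Name     string
	Requires Mask
}

// Matches returns the entities (by index) the system would run on.
func (s System) Matches(entities []Mask) []int {
	var out []int
	for id, sig := range entities {
		if sig&s.Requires == s.Requires {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	entities := []Mask{C1 | C2, C2 | C3} // e1 (id 0) and e2 (id 1)
	s1 := System{"s1", C1 | C2}
	s2 := System{"s2", C2 | C3}
	fmt.Println(s1.Matches(entities), s2.Matches(entities)) // [0] [1]
}
```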

In ECS your data exists in an array of components and any part of your program can operate on any component however it wants. I've never needed message passing for anything I've worked on.

That's not to mention that the main benefits of ECS have nothing to do with language paradigm. ECS main advantage is cache coherency and easier parallelism.


You're too worried about the underlying implementation. Think a bit more about the runtime expression in terms of the resulting Entities and how the set of components linked to them defines data and behavior and what you could do conceptually to extend that.

> That's not to mention that the main benefits of ECS have nothing to do with language paradigm. ECS main advantage is cache coherency and easier parallelism.

ECS is not data-oriented by default. :)

I have a long post explaining things in this thread here: https://news.ycombinator.com/item?id=27663218


My comment has nothing to do with implementation. We're talking about ECS in the context of DoD so I'm not sure what relevance being data-oriented by default has. It seems like you're just confusing the concepts of ECS, DoD, and composition.

Your entire post on ECS is, "If you don't use DoD with ECS then you're not using DoD". Well, yeah... obviously? If you implement an ECS and then use it without data-oriented structures, then obviously you don't have data-oriented design.

You're creating a strawman. You're saying if you take ECS, remove the idea of storing components independently of entities, and pass them in an inefficient manner to systems, then you don't have DoD. ECS isn't inherently DoD, literally nothing is. Arrays aren't inherently cache friendly. There's nothing stopping you from making a language that scatters data randomly throughout reserved memory and makes every array element a pointer to one of those locations. No one is arguing ECS is inherently DoD, but it is a good design to facilitate DoD.

> For example we might want to do damage to another entity entirely.

Add a damaged component to the entity to damage. Consume damage component in a system.
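A minimal Go sketch of that pattern, assuming components live in maps keyed by a hypothetical `Entity` ID. The attacker only attaches a component; one system applies and removes it:

```go
package main

import "fmt"

type Entity int

// Damaged is a short-lived component meaning "this entity takes damage".
type Damaged struct{ Amount int }

type World struct {
	Health map[Entity]int
	Damage map[Entity][]Damaged // several hits can land in one frame
}

// AddDamage is what an attacking system calls: it never touches the
// target's health directly, it only attaches a component.
func (w *World) AddDamage(target Entity, amount int) {
	w.Damage[target] = append(w.Damage[target], Damaged{amount})
}

// DamageSystem consumes every Damaged component and then removes it.
func DamageSystem(w *World) {
	for e, hits := range w.Damage {
		for _, h := range hits {
			w.Health[e] -= h.Amount
		}
		delete(w.Damage, e) // deleting during range is allowed in Go
	}
}

func main() {
	w := &World{Health: map[Entity]int{1: 100, 2: 50}, Damage: map[Entity][]Damaged{}}
	w.AddDamage(2, 10)
	w.AddDamage(2, 5)
	DamageSystem(w)
	fmt.Println(w.Health[1], w.Health[2], len(w.Damage)) // 100 35 0
}
```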

> Or we might want to look up the properties of the piece of ground we're stood on

Use a position component on the entity standing on the ground. Consume the position in a system and look at the properties of the terrain map at that position. Even simpler for grid based maps.
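For the grid case, a sketch in Go assuming a row-major terrain grid and a per-entity speed multiplier as the output; all type and function names here are hypothetical:

```go
package main

import "fmt"

type Terrain uint8

const (
	Grass Terrain = iota
	Mud
	Ice
)

type Position struct{ X, Y float64 }

// TerrainMap is a row-major grid; each cell covers CellSize world units.
type TerrainMap struct {
	Width, Height int
	CellSize      float64
	Cells         []Terrain
}

// At returns the terrain under a world position, clamped to the map edge.
func (m *TerrainMap) At(p Position) Terrain {
	cx := clamp(int(p.X/m.CellSize), 0, m.Width-1)
	cy := clamp(int(p.Y/m.CellSize), 0, m.Height-1)
	return m.Cells[cy*m.Width+cx]
}

func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// SpeedSystem reads Position components plus the shared terrain map and
// writes a speed multiplier per entity.
func SpeedSystem(m *TerrainMap, positions []Position, speed []float64) {
	for i, p := range positions {
		switch m.At(p) {
		case Mud:
			speed[i] = 0.5
		case Ice:
			speed[i] = 1.5
		default:
			speed[i] = 1.0
		}
	}
}

func main() {
	m := &TerrainMap{Width: 2, Height: 2, CellSize: 10, Cells: []Terrain{Grass, Mud, Ice, Grass}}
	speed := make([]float64, 2)
	SpeedSystem(m, []Position{{15, 5}, {5, 15}}, speed)
	fmt.Println(speed) // [0.5 1.5]
}
```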

> We're also ignoring interacting with other components or the world and how that might work

You interact with other components by defining interactions in systems based on those components.

You've created an ECS in a way that doesn't take advantage of any of the benefits, and complaining that all you're left with is the disadvantages.


The approach I talk about with archetypes is the one used by Unity and many open source ECS implementations. It’s a pretty standard way to solve the issue.
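For anyone unfamiliar with archetypes, here is a heavily simplified Go sketch of the idea: entities with exactly the same component set share one "archetype" that stores each component type in its own dense column. This is not Unity's implementation or API; the two component types and the bitmask signature are assumptions, and a real implementation also moves an entity to a different archetype when a component is added or removed, which is left out here.

```go
package main

import "fmt"

type Mask uint8

const (
	PosBit Mask = 1 << iota
	VelBit
)

type Position struct{ X, Y float64 }
type Velocity struct{ DX, DY float64 }

// An Archetype holds every entity whose component set is exactly Sig,
// with one dense column per component type (unused columns stay empty).
type Archetype struct {
	Sig      Mask
	Entities []int
	Pos      []Position
	Vel      []Velocity
}

type World struct {
	archetypes map[Mask]*Archetype
}

func NewWorld() *World { return &World{archetypes: map[Mask]*Archetype{}} }

// Spawn appends an entity to the archetype matching its components.
func (w *World) Spawn(id int, pos *Position, vel *Velocity) {
	var sig Mask
	if pos != nil {
		sig |= PosBit
	}
	if vel != nil {
		sig |= VelBit
	}
	a, ok := w.archetypes[sig]
	if !ok {
		a = &Archetype{Sig: sig}
		w.archetypes[sig] = a
	}
	a.Entities = append(a.Entities, id)
	if pos != nil {
		a.Pos = append(a.Pos, *pos)
	}
	if vel != nil {
		a.Vel = append(a.Vel, *vel)
	}
}

// Move is a system: it skips archetypes lacking either component and walks
// the matching columns linearly.
func (w *World) Move(dt float64) {
	for sig, a := range w.archetypes {
		if sig&(PosBit|VelBit) != (PosBit | VelBit) {
			continue
		}
		for i := range a.Pos {
			a.Pos[i].X += a.Vel[i].DX * dt
			a.Pos[i].Y += a.Vel[i].DY * dt
		}
	}
}

func main() {
	w := NewWorld()
	w.Spawn(1, &Position{0, 0}, &Velocity{1, 2})
	w.Spawn(2, &Position{5, 5}, nil) // different archetype: no Velocity
	w.Move(1)
	fmt.Println(w.archetypes[PosBit|VelBit].Pos[0]) // {1 2}
}
```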


From a distant point of view, everything is OOP. You can treat anything as a black box that you push buttons on to make things happen. You press keys on your keyboard without knowing how it works. Your keyboard activates things on your computer without knowing how it works. The computer asks the screen to update with the new date without knowing how it works.

From a distant point of view, everything is data oriented. Your thoughts are transformed into key presses by the keyboard, which are transformed into events by your computer, which are transformed into what you see by your screen.

I could do the same with a frozen pizza factory: you can see the ingredients flow in the machines (functions), or you can see the different machines passing things to others like objects. The problem is that then the classification between "OOP" and "non-OOP" doesn't mean anything anymore and is now useless.


This isn’t a distant point of view though. I make games every day professionally and think a lot about how to make building them more accessible. Both are part of my job. People building them shouldn’t just consider how an ECS looks under the hood but how it works for someone using it, which is as a system for building runtime objects. Particularly if you look a little deeper than the popular Internet view of the basics, to what an actual usable implementation looks like.


ECS is really just OOP with dynamic multiple inheritance: an object can inherit from multiple base "classes" (with "components" providing the data and "systems" providing the code) and this inheritance structure can be changed at runtime, by adding/removing components. Everything else (struct-of-arrays vs. array-of-structs) is just low-level implementation details.

When I implemented a variation on ECS for a game I'm building, I did exactly as you suggest re: message passing: components receive and respond to messages, but their implementation is hidden.


> ECS is really just OOP with dynamic multiple inheritance

It's composition, not inheritance.


Technically, it's neither. Unless your programming language directly supports ECS, you're not going to be implementing the relationship between entities and components as either proper inheritance or a collection of data members, because neither of those can be changed dynamically.


No, "technically" and in all other regards, it's composition, plain and simple.

There's nothing special about composition that it requires language support, or that it has to be static for it to be considered composition. You can implement dynamic composition in OOP simply by having a List of components, and that's how many games that don't use ECS did and still do. Composition has absolutely nothing to do with inheritance or with requiring static data members.
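A minimal Go sketch of that kind of dynamic composition, assuming a hypothetical `Component` interface and a `Mover` component: the entity just holds a slice of components that can be added and removed at runtime, with no inheritance involved.

```go
package main

import "fmt"

// Component is anything an entity can be composed of.
type Component interface {
	Update(e *Entity)
}

// Entity holds a plain slice of components that can change at runtime.
type Entity struct {
	Name       string
	X          float64
	Components []Component
}

func (e *Entity) Add(c Component) { e.Components = append(e.Components, c) }

// Remove drops every component matching the predicate (in-place filter).
func (e *Entity) Remove(match func(Component) bool) {
	kept := e.Components[:0]
	for _, c := range e.Components {
		if !match(c) {
			kept = append(kept, c)
		}
	}
	e.Components = kept
}

func (e *Entity) Update() {
	for _, c := range e.Components {
		c.Update(e)
	}
}

// Mover is a hypothetical component that moves its entity each update.
type Mover struct{ Speed float64 }

func (m *Mover) Update(e *Entity) { e.X += m.Speed }

func main() {
	e := &Entity{Name: "crate"}
	e.Add(&Mover{Speed: 2})
	e.Update()
	e.Remove(func(c Component) bool { _, ok := c.(*Mover); return ok })
	e.Update() // no Mover any more, so X stays put
	fmt.Println(e.X) // 2
}
```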

Contrary to what you're claiming, the relationship between entities and components definitely does exist in ECS, just not in an OOP way, because ECS is not OOP (even though ECS can be implemented in any language).


Inheritance is one way of describing it but I don't think the term really fits. An object is composed of multiple components, and the composition can be changed at runtime. Saying that the object inherits from multiple base "classes" seems like it just makes the concept less clear.

Some languages have class-based systems with inheritance: one class inherits from another, and methods implemented in the superclass can be used in the subclass. Some languages have prototype-based systems with inheritance: one object inherits from another, and methods implemented in the prototype can be used in the object.

Component-based systems don't really fit my mental model of inheritance here.


ECS (which I have not used) sounds a lot like Traits. The name and core concepts for Traits were defined in 2003 in an ECOOP paper [1]. I think traits were first implemented by Squeak Smalltalk in 2005.

[1] http://scg.unibe.ch/archive/papers/Scha03aTraits.pdf


Very similar although in an ECS the relationship is backwards, Entities get behaviour based on what data they contain rather than getting behaviour from traits and needing to add state to make them work.

The ECS approach can lead to some confusing things, like adding a component to an Entity and getting strange behaviour because a system the programmer didn’t expect to be triggered now runs on it. This can lead to systems having quite complex definitions, based not just on the components the system needs to run but also on the components that must not be present, and so on.


Like anything, inheritance can be used poorly, just as anyone can write poorly encapsulated code in Rust or Go. You might be able to convince me that inheritance is too dangerous for idiots, but then so is a computer, and we'd be debating where to draw the line of how smart/experienced you have to be to use it safely.

This article from Noel, and the ones from Mike he links to, get under the hood and into "what is the compiler doing" and "what is the CPU doing". Down here, we're looking at how to use the features of whatever language we're using to get the results we want, rather than "how should i program oop gud".


It's the dose that makes the poison. Of course we want nice typed collection libraries and interfaces. The kids go overboard though.


Go actually does implement inheritance, albeit in a roundabout sort of way: a struct can have one or more base members, and any method defined on the base members is accessible from the new struct implicitly, so they also implicitly implement any interface that was implemented by their base members.

Here's an example (in the playground, because it gets a bit long): https://play.golang.org/p/TblQypAbIL2


That's not inheritance, it's just syntax sugar that lets a struct delegate method calls to one of its members. In other words:

    type Foo struct{}

    func (f *Foo) baz() { println("Foo.baz()") }
    func (f *Foo) qux() { f.baz() }
Given the above, `type Bar struct{ Foo }` is the same as:

    type Bar struct{ Foo Foo } // field called `Foo` of type `Foo`

    func (b *Bar) baz() { b.Foo.baz() }
    func (b *Bar) qux() { b.Foo.qux() }
Since it's just syntax sugar and not inheritance, we can't put a `Bar` in a list of `Foo`s, nor can we pass a `Bar` into a function that expects a `Foo`. It also means that if Bar overrides its `baz()` method like so:

    func (b *Bar) baz() { println("Bar.baz()") }
then calling `Bar.qux()` will still print "Foo.baz()" and not "Bar.baz()" (most languages with inheritance would print "Bar.baz()", which is to say methods are virtual by default).


Using interfaces, you easily get polymorphism as well:

  type FooI interface{ baz(); qux(); }
  foos := []FooI{&Foo{}, &Bar{}}
Regarding overriding, you're right that it doesn't work out of the box. However, you can make "extensible classes" with just a little boilerplate:

  type Foo struct{ this FooI } // `this` points at the outermost object
  func (f *Foo) baz() { println("Foo.baz()") }
  func (f *Foo) qux() { f.this.baz() } // dispatch through the interface
  func NewFoo() *Foo { f := &Foo{}; f.this = f; return f }
Now, to extend Foo:

  type Bar struct{ Foo }
  func (b *Bar) baz() { println("Bar.baz()") } //the override
  func NewBar() *Bar { b := &Bar{}; b.Foo.this = b; return b } //this would work even if we didn't override baz

  func main() {
    foos := []FooI{NewFoo(), NewBar()} // constructors return pointers so `this` isn't left pointing at a discarded copy
    for _, f := range foos {
      f.qux()
    } //prints Foo.baz(), then Bar.baz()
  }
Since best practice even in C++ or C# or Java is to only allow inheritance for classes that are designed with it in mind, and since Go anyway has lots of other boilerplate, this shouldn't be unbearable if required.

Playground link for anyone curious: https://play.golang.org/p/SKGhANuBGgB


yeah, you absolutely can emulate this stuff to a large degree. My point wasn't that it's impossible, but rather you have to build it from orthogonal primitives. And even then I don't think you can get the same degree of trampolining that you can get with inheritance (for example, we can get Bar.qux() to call Foo.baz() easily enough via interfaces, but then IIRC it's trickier to get Foo.baz() to call Bar.asdf()--that said I'm too busy to think it through properly).


I think that it can work. IMO ActiveRecord is a perfect use of inheritance. You get tons of useful functionality out of the box, you don't have to worry about what that code looks like, and it's easy to extend or modify it. But often when I see co-workers come up with their own hierarchies, it saves maybe a couple of lines of code and makes it 5x more difficult to read, since you're jumping between parent and child classes and trying to keep track of the order of execution.


I don't agree with this, and I'm personally an anti-OOP militant.

Inheritance isn't the root of all evil, dynamic dispatch is. It's a remarkably powerful implementation detail but one with enormous cost, regardless of whether you're using an AoT/JIT compiled or interpreted language.


What enormous cost — slightly slower function calls? That's a pretty minor cost, and dynamic dispatch can be extremely useful sometimes.


Enormously slower function calls and zero inlining without advanced dynamic compilation, which has a number of pitfalls.


As opposed to passing function pointers everywhere, callback hell and friends? Or which language does it well, in your opinion? The fact is, dynamic dispatch is needed, because not everything can be known at compile time. And as an HNer rightly noted in a recent thread (I couldn't find where I read it), the actually expensive thing in programming is flexibility.


Devirtualization optimizations can turn semantically "dynamic" dispatches into static dispatches, but sometimes you really just need a dynamic dispatch. Note that a dynamic dispatch doesn't have to be anything more than a branch. Further, sometimes devirtualizing everything leads to enormous binaries and compile times. Runtime performance isn't everything, and it's typically better to opt-into devirtualization rather than to opt-out of it.
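To illustrate "a dynamic dispatch doesn't have to be anything more than a branch", here is a Go sketch contrasting an open set of types dispatched through an interface with a closed set dispatched by switching on a tag. The shape types are hypothetical; both versions choose behaviour at runtime, but the second is a direct call with a branch inside, which the compiler is free to inline:

```go
package main

import (
	"fmt"
	"math"
)

// Open set of types: each call goes through the interface's method table.
type Shape interface{ Area() float64 }

type Circle struct{ R float64 }
type Rect struct{ W, H float64 }

func (c Circle) Area() float64 { return math.Pi * c.R * c.R }
func (r Rect) Area() float64   { return r.W * r.H }

// Closed set of types: the same runtime choice as a branch on a tag.
type Kind uint8

const (
	KindCircle Kind = iota
	KindRect
)

type TaggedShape struct {
	Kind Kind
	A, B float64 // R for circles; W and H for rectangles
}

func (s TaggedShape) Area() float64 {
	switch s.Kind {
	case KindCircle:
		return math.Pi * s.A * s.A
	default:
		return s.A * s.B
	}
}

func TotalArea(shapes []Shape) float64 {
	sum := 0.0
	for _, s := range shapes {
		sum += s.Area() // indirect call per element
	}
	return sum
}

func TotalAreaTagged(shapes []TaggedShape) float64 {
	sum := 0.0
	for _, s := range shapes {
		sum += s.Area() // direct call, branch inside
	}
	return sum
}

func main() {
	a := TotalArea([]Shape{Circle{1}, Rect{2, 3}})
	b := TotalAreaTagged([]TaggedShape{{KindCircle, 1, 0}, {KindRect, 2, 3}})
	fmt.Println(a, b)
}
```

The tagged version trades extensibility (adding a shape means editing the switch) for calls the compiler can see through, which is the opt-in devirtualization tradeoff described above.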


You must have absurd standards if you find dynamic dispatch to be unacceptably slow. Also, yeah, not every function call needs to be inlined. One level of indirection on top of jumping into a new function isn't really much overhead at all, unless you're doing it for literally every function call.


> Inheritance isn't the root of all evil, dynamic dispatch is. It's a remarkably powerful implementation detail but one with enormous cost ...

Dynamic dispatching typically costs one pointer lookup in a vtable[0]. By "typically", I specifically mean "in any production quality run-time environment." This is not an "enormous cost" by any reasonable definition.

0 - https://en.wikipedia.org/wiki/Virtual_method_table


Disagree. Go and Rust both have dynamic dispatch and neither have the problems that inheritance has. Even in C which lacks dynamic dispatch, people will either try to build it at the expense of type safety or they will try to manage an impossibly complex implicit state machine (I've seen this in a lot of critical real time systems).


I would wager that in most ordinary C programs it is worse. You can’t even reason about code anymore with all the callbacks.


There is some middle ground between data-oriented design and OOP: just organize your objects in such a way that:

a) objects of the same type occupy continuous blocks in memory,

b) messages are passed to objects of the same type, then to objects of another type etc.

In this way, you don't lose the advantages of encapsulation, inheritance, polymorphism etc but you also don't sacrifice cache coherence much.

OOP does not enforce a 'random' memory access order; you can very well organize your objects in such a way that speed is not sacrificed much.
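A Go sketch of points (a) and (b), with hypothetical `Asteroid` and `Ship` types: objects keep their methods, but each type lives in its own contiguous slice, and the same message is sent to every object of one type before moving on to the next type.

```go
package main

import "fmt"

// Each type keeps its own methods but lives in a contiguous slice
// rather than behind individually allocated pointers.
type Asteroid struct{ x, vx float64 }

func (a *Asteroid) Update(dt float64) { a.x += a.vx * dt }

type Ship struct{ x, fuel float64 }

func (s *Ship) Update(dt float64) {
	if s.fuel > 0 {
		s.x += dt
		s.fuel -= dt
	}
}

type World struct {
	Asteroids []Asteroid
	Ships     []Ship
}

// Update messages all asteroids, then all ships, so each loop walks one
// block of memory front to back.
func (w *World) Update(dt float64) {
	for i := range w.Asteroids {
		w.Asteroids[i].Update(dt)
	}
	for i := range w.Ships {
		w.Ships[i].Update(dt)
	}
}

func main() {
	w := &World{Asteroids: []Asteroid{{0, 2}}, Ships: []Ship{{0, 1}}}
	w.Update(0.5)
	fmt.Println(w.Asteroids[0].x, w.Ships[0].x) // 1 0.5
}
```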


This is kinda what Entity Component Systems do: they implement an in-memory relational database for game objects, handle dependencies and allow your game logic code to run efficiently over them while still keeping the pretense of OOP :)

Why pretense? Because behaviors (Systems in ECS terms) are completely separated from data (Components) and data for different game objects (Entities) is kept together in regular or sparse arrays.

Encapsulation is nowhere to be seen, code is written to specify the components it depends on and run on these arrays.

ECS is very fashionable in gamedev lately as it allows for efficient multithreading, explicit dependencies for each subsystem, cache locality and trivial (de)serialization. Used together with handles (tagged indexes instead of direct pointers), it reduces the likelihood of dangling pointers and other memory management bugs.
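A minimal sketch of such handles in Go (needs Go 1.18+ for generics), assuming the common "generational index" scheme: a handle is a slot index plus the slot's generation when it was issued, and freeing a slot bumps its generation so old handles stop resolving instead of dangling. The `Pool` and `Handle` names are hypothetical.

```go
package main

import "fmt"

// Handle is a slot index plus the generation of that slot when issued.
type Handle struct {
	Index      uint32
	Generation uint32
}

type slot[T any] struct {
	value      T
	generation uint32
	alive      bool
}

// Pool stores values densely by index and recycles freed slots.
type Pool[T any] struct {
	slots []slot[T]
	free  []uint32
}

func (p *Pool[T]) Insert(v T) Handle {
	if n := len(p.free); n > 0 {
		i := p.free[n-1]
		p.free = p.free[:n-1]
		s := &p.slots[i]
		s.value, s.alive = v, true
		return Handle{i, s.generation}
	}
	p.slots = append(p.slots, slot[T]{value: v, alive: true})
	return Handle{uint32(len(p.slots) - 1), 0}
}

// Remove frees the slot and bumps its generation, invalidating old handles.
func (p *Pool[T]) Remove(h Handle) bool {
	if _, ok := p.Get(h); !ok {
		return false
	}
	s := &p.slots[h.Index]
	var zero T
	s.value, s.alive = zero, false
	s.generation++
	p.free = append(p.free, h.Index)
	return true
}

// Get reports ok == false for a stale or out-of-range handle.
func (p *Pool[T]) Get(h Handle) (T, bool) {
	var zero T
	if int(h.Index) >= len(p.slots) {
		return zero, false
	}
	s := p.slots[h.Index]
	if !s.alive || s.generation != h.Generation {
		return zero, false
	}
	return s.value, true
}

func main() {
	var p Pool[string]
	h := p.Insert("player")
	p.Remove(h)
	_, ok := p.Get(h)
	fmt.Println(ok) // false: the stale handle is detected
}
```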


ECS is standard for enterprise web apps as well


I have seen some enterprise web apps, but they never used ECS. Can you please share more details about your experience?


They may be referring to the common 3-layer architectures (see Fowler's PoEAA), which map closely to ECS:

Top layer is for "frontend", whatever that means for the product (UI, sound, simulation, etc.), the stuff with side effects. "Systems".

Middle layer is purely functional, for business/domain logic AKA utility functions. The most liquid layer, but should not be confused as trivial.

Bottom layer is where state (or a way to access & modify it) lives. Data access layer, component layer, etc.


I don't think they really map that closely... But you might be right. Thanks.


I am curious as to what you are referring to. Are you thinking of redux-like architectures?


> objects of the same type occupy continuous blocks in memory,

Depending on the language, a single object may have a lot of overhead that adds up in an array. What you often see is one ArrayObject with arrays of properties, kind of like a transposition.

A problem there is that in memory the arrays are of course laid out one after the other, which actually destroys cache locality if you need to access more than 1 property inside a loop (it will need to load back and forth to the different property arrays), so it's a somewhat dumb approach. But, at least it saves the overhead, so maybe not too bad. And in a high level interpreted language like php you likely weren't gonna get cache locality anyway.

The point is to group all properties you are going to be accessing in a hot loop together in a small-ish array.

C has structs for this: zero-overhead "entities" (although the compiler may insert padding so that each field is properly aligned, e.g. to a multiple of 4 or 8 bytes, so keep that in mind). You have compiler-specific keywords to forgo padding ("struct packing"), or maybe you're lucky and the data just fits exactly right. Either way, in such cases an array of structs is imo the most sane way to go.

In fact, C++ offers classes and structs. In my opinion, struct should be used for entities like "weapon" or "car". CLASSES (or objects) should be unix-philosophy adhering miniprograms that do one task and do it well (oh hey, it's the single responsibility principle!).

The way most programmers write OOP is a pretty convoluted way to model actual entities anyway. car.drive()? Oh? The car drives itself? No. agent.drive(car) should be the actual method. Agent, mind you, can be a driving AI, or a human driver, or whatever. Maybe the agent is a part of the car? In that case, use composition, not inheritance. (oh hey, entity component system!)


> A problem there is that in memory the arrays are of course laid out one after the other, which actually destroys cache locality if you need to access more than 1 property inside a loop (it will need to load back and forth to the different property arrays), so it's a somewhat dumb approach.

Caches are perfectly capable of dealing with more than one stream of data (there are some very specific edge cases you may have to consider), accessing multiple arrays linearly in a loop is generally more efficient than accessing a single array of structs when you don't use almost all the struct elements.


I've seen a lot of ECS implementations that store components in hash maps, keyed by entity ID. They iterate over one hash map in a linear way which is fast, but then they do a bunch of slow lookups like GP is saying.

In those situations, GP's suggestions are wise.

If you can iterate over arrays in parallel like you say, that's also a good approach.


There are tradeoffs between regular arrays, sparse arrays and hash maps in ECS: it's very similar in concept to storage hints in relational databases, and similarly to relational databases you can add indexes if needed.


> They iterate over one hash map in a linear way which is fast, but then they do a bunch of slow lookups like GP is saying.

Usually all but the most simple "systems" will need to access more than one component, which means you have a choice between a) store component data in regular arrays (and potentially waste huge amounts of space if relatively few entities have those components) or b) store component data in some kind of hash table (and then you lose cache locality for all but the "primary" component of a system).


> In fact, C++ offers classes and structs. In my opinion, struct should be used for entities like "weapon" or "car". CLASSES (or objects) should be unix-philosophy adhering miniprograms that do one task and do it well (oh hey, it's the single responsibility principle!).

Please no. The only thing that matters is the language rules; any arbitrary rule like this that the compiler can't check, layered on top of the language rules, just adds another lava layer.

There is exactly one difference between class and struct, and it's default visibility. Use whichever one makes fewer tokens appear in your code.


Fair point


OOP doesn't force you to do car.drive().

You can have an abstract agent class/interface with virtual "drive(Car c)" method. The method would be overriden by AIAgent, HumanAgent etc.

The car itself would have more basic behavior, such as "accelerate()", "turnLeft()", "turnRight()"
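The same idea in Go, where an interface plays the role of the abstract agent class. `Car`, `AIAgent`, `HumanAgent` and the method names are hypothetical, adapted from the names in the comment:

```go
package main

import "fmt"

// Car exposes only basic controls; it does not "drive itself".
type Car struct {
	Speed, Heading float64
}

func (c *Car) Accelerate() { c.Speed++ }
func (c *Car) TurnLeft()   { c.Heading -= 15 }
func (c *Car) TurnRight()  { c.Heading += 15 }

// Agent is whatever decides how a car gets driven.
type Agent interface {
	Drive(c *Car)
}

type AIAgent struct{}

func (AIAgent) Drive(c *Car) {
	c.Accelerate()
	c.Accelerate()
}

type HumanAgent struct{ WantsLeft bool }

func (h HumanAgent) Drive(c *Car) {
	c.Accelerate()
	if h.WantsLeft {
		c.TurnLeft()
	} else {
		c.TurnRight()
	}
}

func main() {
	c := &Car{}
	for _, a := range []Agent{AIAgent{}, HumanAgent{WantsLeft: true}} {
		a.Drive(c)
	}
	fmt.Println(c.Speed, c.Heading) // 3 -15
}
```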


> You can have an abstract agent class/interface with virtual "drive(Car c)" method. The method would be overriden by AIAgent, HumanAgent etc.

Is that a convoluted way of saying "use multiple dispatch", or am I reading it wrong?


I think you are over-engineering it. The way I imagine it:

Car {float throttle, float brake, float wheel} inherits PhysicalObject {velocity, mass, position}

Agent.accelerate(Car c){ c.throttle++; }

Agent.drive(Car c){ ... accelerate(c); ... }

Human inherits Agent


Interface is the best way to do a clean job in this case.


Kill me now


> A problem there is that in memory the arrays are of course laid out one after the other, which actually destroys cache locality if you need to access more than 1 property inside a loop (it will need to load back and forth to the different property arrays), so it's a somewhat dumb approach.

This is actually why memory layout != DoD. You need to account for this in the architecture of the program, so that the systems only operate on a small amount of data that are relevant to them at one time.

The tradeoff is between paying for all the data all the time, and paying for some of the data most of the time. For a large class of programs that can be architected around mostly non-unique, trivially copyable fields with few relations, the choice between AoS and SoA is obvious.

For other programs where your entities need relational information and form trees or graphs, it can be less obvious whether the data representation is going to be faster. However in these cases you store the relationship as your data (for example, as an adjacency matrix), but implementing any kind of textbook algorithm over it is basically reverse engineering pointers with indexes.
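A Go sketch of what "pointers as indexes" looks like in practice. Instead of the adjacency matrix mentioned above, this swaps in a flattened adjacency list (compressed sparse row form), which is a common alternative for sparse graphs; `Graph`, `NewGraph` and `BFS` are hypothetical names. Breadth-first search then works entirely on integer node IDs:

```go
package main

import "fmt"

// Graph stores edges in compressed sparse row form: the neighbours of
// node n are Targets[Offsets[n]:Offsets[n+1]]. Node IDs are plain indices.
type Graph struct {
	Offsets []int
	Targets []int
}

// NewGraph builds the CSR arrays from a directed edge list over n nodes.
func NewGraph(n int, edges [][2]int) Graph {
	offsets := make([]int, n+1)
	for _, e := range edges {
		offsets[e[0]+1]++ // count out-degree
	}
	for i := 1; i <= n; i++ {
		offsets[i] += offsets[i-1] // prefix sum: start of each node's run
	}
	targets := make([]int, len(edges))
	next := append([]int(nil), offsets[:n]...) // write cursor per node
	for _, e := range edges {
		targets[next[e[0]]] = e[1]
		next[e[0]]++
	}
	return Graph{offsets, targets}
}

// BFS returns the hop distance from start to every node (-1 if
// unreachable). The queue holds indices where a pointer-based version
// would hold node pointers.
func (g Graph) BFS(start int) []int {
	dist := make([]int, len(g.Offsets)-1)
	for i := range dist {
		dist[i] = -1
	}
	dist[start] = 0
	queue := []int{start}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		for _, m := range g.Targets[g.Offsets[n]:g.Offsets[n+1]] {
			if dist[m] < 0 {
				dist[m] = dist[n] + 1
				queue = append(queue, m)
			}
		}
	}
	return dist
}

func main() {
	g := NewGraph(5, [][2]int{{0, 1}, {0, 2}, {2, 3}})
	fmt.Println(g.BFS(0)) // [0 1 1 2 -1]
}
```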


> agent.drive(car) should be the actual method

That's equally object-oriented so I don't see how OOP is a nonsensical way to model. A sensible model is up to the designer, not OOP.


The reason OOP kills cache locality and multithreading opportunities is walking the object graph depth-first through nested method calls and pointers.

Doesn't matter if it's car.drive() referencing driver through a private pointer or driver.drive(Car c) calling methods on Car through the provided parameter - in both cases you will jump from class Car to Driver and back and then again for the next car and the next driver.

In real life the call stack is rarely only two levels deep; I've seen stack traces that had hundreds of levels. So your code will jump 100 levels down, then 10 levels up, then another 10 levels down, and so on, and then finally back up through 100 levels of nesting only to advance to the next top-level object and do the whole ceremony again for each of them :) It boggles the mind when you think about it :)

When the object graph is big enough and doesn't completely fit in cache this slows the code by orders of magnitude each time you jump through the border.

And because dependencies are implicit and execution order is accidental (and programmer doesn't actually know what other execution orders would be correct) - you cannot easily parallelize that code.

The alternative is to specify the dependencies explicitly, split the data according to functions that use it not according to metaphysical Classes where it belongs and walk the data graph in levels - starting from the level that doesn't depend on any other code being run, completing the level first then going to the level that now has all dependencies satisfied, and so on.

Of course there might be cycles that require special treatment, but at least they are explicit so you won't introduce them unless you actually have to.

End result is basically "relational programming". In case of gamedev it's called Entity Component System.
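A minimal Go sketch of that level-by-level walk, essentially Kahn's topological sort grouped into levels, assuming each system declares by name which systems it depends on (the system names are hypothetical). Systems in the same level have no dependencies on each other, so each level is a candidate for running in parallel, and a cycle shows up explicitly as systems that never become ready:

```go
package main

import (
	"fmt"
	"sort"
)

// Levels groups systems so that each one runs only after everything it
// depends on. deps[s] lists the systems s must wait for. ok is false if
// the dependencies contain a cycle.
func Levels(deps map[string][]string) (levels [][]string, ok bool) {
	remaining := map[string]int{}       // unmet dependencies per system
	dependents := map[string][]string{} // reverse edges
	for s, ds := range deps {
		remaining[s] += len(ds)
		for _, d := range ds {
			dependents[d] = append(dependents[d], s)
			if _, seen := remaining[d]; !seen {
				remaining[d] = 0
			}
		}
	}
	var current []string
	for s, n := range remaining {
		if n == 0 {
			current = append(current, s)
		}
	}
	done := 0
	for len(current) > 0 {
		sort.Strings(current) // deterministic order within a level
		levels = append(levels, current)
		done += len(current)
		var next []string
		for _, s := range current {
			for _, t := range dependents[s] {
				remaining[t]--
				if remaining[t] == 0 {
					next = append(next, t)
				}
			}
		}
		current = next
	}
	return levels, done == len(remaining)
}

func main() {
	lv, ok := Levels(map[string][]string{
		"input":   nil,
		"physics": {"input"},
		"ai":      {"input"},
		"render":  {"physics", "ai"},
	})
	fmt.Println(lv, ok) // [[input] [ai physics] [render]] true
}
```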


The point I'm trying to make is that OO design principles are one thing, how these are implemented by various systems or languages is quite another.

OOP does not kill cache locality for the simple reason that these are orthogonal concepts.

> split the data according to functions that use it not according to metaphysical Classes where it belongs

Well, of course and as mentioned, picking the best model, and thus the best object representation, for the job is paramount. I read "split the data according to functions that use it" as "come up with objects that make the most sense for what you're trying to achieve", not "forget about OO design".


You're right. It's not OOP, but rather the way most tutorials teach it (and most programmers write it).


You might be looking for "Entity - Component - System" design, common in video games. Entities are still virtual-world objects like you might expect, but none of them would dare keep track of something like their position or temperature or whatever. Instead, they register a component with the appropriate system, which keeps all the data colocated for efficient physics and the like.


If we are speaking of C code, it's not quite so bad as it looks to have somewhat fat structs across multiple arrays, since you can fit 64 bytes in a cache line on contemporary desktop CPUs, and that sets your real max-unit-size; the CPU is actively trying to keep the line hot and it does so (in the average case) by speculating that you're going to fetch the next index of the array. Since you have multiple cache lines, you can keep multiple arrays hot at the same time, it's just a matter of keeping it easy to predict fetching behavior by using simple loops that don't jump around...which leads to the pattern parent suggests, of cascading messages or buffers in groups of same type so that you get a few big iterations out of the way, and then a much smaller number of indirected accesses.


If you lose vectorization, you might be losing a 4x, 8x, 16x ... 32x perf difference by organizing your data in such a way that memory operations and data manipulation can't be vectorized.


But you usually can't achieve vectorization just by changing your data layout; the compiler's auto-vectorization features usually don't work that well. SOA or AOSOA layout for vectorization only becomes important when you begin to explicitly write SIMD code in intrinsics or pure assembly.

And explicitly writing in SIMD is quite a hard feat in itself: it's okay when you're accelerating small, simple, and isolated algorithms in hot-code paths, but when you're doing much more complex calculations the time you need to invest in it to make it work goes out of hand pretty quickly.


> But you usually can't achieve vectorization by just simply changing your data layout, the compiler's auto-vectorization features usually doesn't work that well.

Please don't build a straw man.

You (or the compiler) can't achieve vectorization if you have the wrong data layout. Period.

How easy / hard is for you or the compiler to vectorize something depends on the application.

It can "just work", it might require a one-line `pragma simd`, it might require you to use portable `std::simd` types by hand, or use SIMD intrinsics, or write assembly manually.

But none of these are options if you have the wrong data layout.


When you say vectorize, are you referring to loop unrolling? Or SIMD or something?


I have never heard vectorization used to refer to anything other than SIMD. Loop unrolling is usually only a useful technique to enable SIMD, as far as I know (at least on modern processors, where branch prediction has greatly decreased the cost of jump instructions).


> as far as I know (at least on modern processors, where branch prediction has greatly decreased the cost of jump instructions).

What about ILP? Can't that benefit from an unrolled loop in some cases? For example if there's a fairly long dependency chain but you might still be able to go through two loop bodies at once instead.
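A small Go sketch of that idea: unrolling a reduction into several independent accumulators splits one long dependency chain into four shorter ones that an out-of-order CPU can overlap. Floating-point addition is not associative, so the unrolled sum can differ from the serial one in the last bits, and whether it is actually faster depends on the CPU and compiler, so it should be measured:

```go
package main

import "fmt"

// SumSerial is one long dependency chain: each add waits for the last.
func SumSerial(xs []float64) float64 {
	var s float64
	for _, x := range xs {
		s += x
	}
	return s
}

// SumUnrolled handles four elements per iteration with four independent
// accumulators, so up to four adds can be in flight at once.
func SumUnrolled(xs []float64) float64 {
	var s0, s1, s2, s3 float64
	i := 0
	for ; i+4 <= len(xs); i += 4 {
		s0 += xs[i]
		s1 += xs[i+1]
		s2 += xs[i+2]
		s3 += xs[i+3]
	}
	for ; i < len(xs); i++ { // leftover elements
		s0 += xs[i]
	}
	return (s0 + s1) + (s2 + s3)
}

func main() {
	xs := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(SumSerial(xs), SumUnrolled(xs)) // 55 55
}
```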


I don't know for sure at all, but I don't think it's impossible that speculative execution could also achieve the same at the processor level.


I don't see how this has anything to do with speculation? In most cases where you care about this you don't have to speculate if all the loop iterations are needed. For example in matrix multiplication all of those iterations will be needed.


What I'm thinking is that the processor has an instruction stream that looks like this:

  loop: 
    instr_1
    instr_2
    ...
    instr_n 
    jcond loop
Now, assuming the loop is not unrolled, it would need to speculate that `jcond loop` will jump to be able to execute 2 copies of instr_1 in parallel - I'm saying that it may be able to do that, though I am by no means sure.


Oh, I see what you mean -- I was talking (and thinking) about the unrolled version so it didn't make sense how speculation could help there. But I imagine that typically the kind of long chains that you might want to do in parallel in a single basic block are perhaps something that wouldn't get executed that far after a branch, if the only purpose is to not waste time after a branch misprediction. Plus from what I understand you'd still be wasting execution units here, just not by idling them but rather by speculating the "I'm done" branch repeatedly.

EDIT: I just found that the idea that I had in my head actually exists and is called "modulo scheduling".


I find in simulation codes that lack of awareness of (a) is an absolute performance killer. Generally, it's better to use a pattern where an object is a container for many things: don't have a 'Particle' object but a 'Particles' one that stores the properties of the particles contiguously. In my old magnetics research area you have at least 8, and more frequently 10+, spatially varying double-precision parameters that you'd potentially need to store per particle/cell.
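A minimal Go sketch of such a `Particles` container, assuming just position and velocity per particle (the field set is hypothetical; a real simulation would carry its 8 to 10+ parameters the same way). Each property lives in its own contiguous slice (struct of arrays) behind an ordinary object interface:

```go
package main

import "fmt"

// Particles keeps each per-particle property in its own contiguous slice.
type Particles struct {
	X, Y, Z    []float64
	VX, VY, VZ []float64
}

// Add appends one particle and returns its index.
func (p *Particles) Add(x, y, z, vx, vy, vz float64) int {
	p.X, p.Y, p.Z = append(p.X, x), append(p.Y, y), append(p.Z, z)
	p.VX, p.VY, p.VZ = append(p.VX, vx), append(p.VY, vy), append(p.VZ, vz)
	return len(p.X) - 1
}

func (p *Particles) Len() int { return len(p.X) }

// Step advances every particle; each loop streams through two slices
// linearly instead of hopping between separately allocated objects.
func (p *Particles) Step(dt float64) {
	for i := range p.X {
		p.X[i] += p.VX[i] * dt
	}
	for i := range p.Y {
		p.Y[i] += p.VY[i] * dt
	}
	for i := range p.Z {
		p.Z[i] += p.VZ[i] * dt
	}
}

func main() {
	var ps Particles
	ps.Add(0, 0, 0, 1, 2, 3)
	ps.Step(0.5)
	fmt.Println(ps.X[0], ps.Y[0], ps.Z[0]) // 0.5 1 1.5
}
```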


Quite so. There’s a false equivalence in this article between data and encapsulated state, but if that were so then the flyweight pattern and its ilk couldn’t exist.


Only in C++. Most other OOP languages do not allow controlling allocation that way.

Also, OOP only allows array-of-structs continuous data. Struct-of-arrays and hybrid forms are usually awkward or impossible. And with everything except maybe C++ and Rust, those "structs" in OOP-land do have quite an overhead compared to C structs.


OOP does not say anything about memory allocation.

OO principles are one thing, what specific languages do is quite another.


There are no real OO principles. Ask ten people and you will get ten different answers. OO is defined by the languages and tools claiming to implement it, and the set of principles derived from those is inconsistent and contradictory.


I think that, while most people can't really articulate this well enough, there is a pretty good common understanding of what style of programming is OO: it's a style of programming where code is quite deeply tied to data, especially modifications of persistent state (encapsulation), and where subtyping is commonly used to model program behavior (interfaces, inheritance, virtual dispatch, polymorphism).

This would mostly contrast with procedural code, where code and data are much more separate (procedures often manipulate and pass around complex data structures) and subtyping is not commonly used for program behavior; instead, flow control is usually explicit (e.g. switch()'ing on an enum value).

It is also commonly contrasted with Functional Programming, where data is also loosely tied to code, with functions often reading (but usually not modifying) deep parts of complex data structures, and where higher-order functions and sum types are used to achieve dynamic dispatch.
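
To make the contrast concrete, a minimal Java sketch with hypothetical types: the same area computation written once with subtyping and virtual dispatch, and once procedurally as a plain record plus a switch on an enum tag. (The FP variant would replace the enum with a sum type and pattern matching; it is not shown.)

```java
// OO style: behaviour is attached to the data and chosen by virtual dispatch.
interface Shape {
    double area();
}

final class Rect implements Shape {
    final double w, h;
    Rect(double w, double h) { this.w = w; this.h = h; }
    public double area() { return w * h; }
}

final class Tri implements Shape {
    final double base, height;
    Tri(double base, double height) { this.base = base; this.height = height; }
    public double area() { return 0.5 * base * height; }
}

// Procedural style: a plain record plus explicit flow control on a tag.
enum Kind { RECT, TRI }

final class ShapeData {
    final Kind kind;
    final double a, b;
    ShapeData(Kind kind, double a, double b) { this.kind = kind; this.a = a; this.b = b; }
}

final class Geometry {
    static double area(ShapeData s) {
        switch (s.kind) {
            case RECT: return s.a * s.b;
            case TRI:  return 0.5 * s.a * s.b;
            default:   throw new IllegalArgumentException("unknown kind: " + s.kind);
        }
    }
}
```

Adding a new shape touches one class in the first style but every switch in the second; adding a new operation is the reverse.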


It's obviously not so.

There are OO principles, which are indeed well-known, and each OO language has its own take on how to implement them.

It's not even needed to use an OO language to follow OO design principles. My day job is pure C and we follow OO principles as much as practical.


You conspicuously don't actually name any OO principles. If you did I'm sure we could find "OO" languages that don't conform to them.

My personal definition of OO has been backed down to directly connecting some concept of "method" to a data structure, and some form of polymorphism of those methods depending on what data structure you pass in to some function/method. You may note this is incredibly weak, but it does have the virtue of usefully distinguishing between two sets of languages, and that those two sets will have real differences in how you program them. Beyond that it's hard to create a definition of OO that has the second property; you may be able to split the world into "languages that implement OO visibility rules (private, protected, public) and those that don't", but you'll fail the second criterion, in that languages that just leave everything public aren't meaningfully different to program in than ones that implement the visibility rules.

I could create several different sets of "OO principles", which wouldn't be mutually exclusive necessarily but certainly would be distinguishable. Especially the distinction between the silly principle that OO objects should somehow reflect real-world entities, which was the major failure in 1980s/1990s OO principles and has, mercifully, all but died in the modern era but most certainly was at one point an "OO principle", and any of the several sets of OO principles I could name that actually function in the real world.


Well, this is HN so I did not want to sound condescending by stating the obvious.

OO Programming 101, OO principles: encapsulation, abstraction, inheritance, polymorphism, SOLID.

Whatever a specific language adheres to and how is beside the point.


That must be some kind of "new OOP", since the "old OOP" is messaging, local retention, and protection and hiding of state-process, and extreme late-binding of all things. At least according to Alan Kay, who wrote this verbatim.

To wit, encapsulation and abstraction existed outside of OOP (for example, Modula had it before), inheritance is not a necessary feature for OOP (Self doesn't have it), and the O in SOLID doesn't apply to Smalltalk and Self.


That's standard OOP as it stands today (versus the 60s when Alan Kay coined the term).

Alan Kay considers that inheritance and polymorphism are not essential, fine. He does consider encapsulation essential, though. Specific languages have their own take, fine.

The point is that there are well-known OO principles. Claiming otherwise is either disingenuous or ignorant.


> versus the 60s when Alan Kay coined the term

It was in the 70s, and his description that I quoted is from the 2000s.

> Alan Kay considers that inheritance and polymorphism are not essential, fine.

Polymorphism is a logical outcome of his requirements. So in a purely logical sense it is essential, although I imagine that saying that might be a little bit like saying that CO2 is essential for a campfire (as in that you can't get a campfire without emitting CO2, even though that is strictly a matter of consequences).

> He does consider encapsulation essential, though.

Yes, because biological cells are encapsulated.

> there are well-known OO principles. Claiming otherwise is either disingenuous or ignorant.

There surely are some "well-known principles" but whether the "known" in that phrase has the same meaning as in "knowledge" (justified true belief at a first approximation) seems debatable.


The tree under your reply proves my point. There is no one set of OO principles. This thread identifies at least two, the original Kay principles and what I wasn't sure you were going to name, which is what I'd call the outdated 1990s ideas of OO. Then there's today's idea, which is probably pretty close to what I said in my post and is exemplified by duck-typed dynamic languages and a lot of modern languages like Go and Rust. That's at least three, and that's staying fairly broad; if we start quibbling about arcane details the count only goes up.


> what I wasn't sure you were going to name, which is what I'd call the outdated 1990s ideas of OO.

I'm sorry but this is getting surreal.

I named the standard OO principles and concepts which are very much valid and alive today, though of course how they are applied (or if they are applied at all) varies from language to language. Claiming otherwise is absurd. If anything this whole article and thread show that too many people are confused by the concepts of OO principles (if they know what that means at all), programming languages (that may or may not implement some of these principles), design practices/patterns (how to come up with a model of objects): these are all different things. Certainly selecting objects that reflect real-life entities is not an OO principle, for instance, but rather a design practice (good or bad, it depends).

In my team we do C exclusively and follow OO principles as much as practical. Any software engineer worth their salt has a good idea of what that means.


Those aren't exclusive to OO, though.

C, for all its faults, has encapsulation at a module level: any functions you mark `static` and leave out of your header file aren't exported and are thus private. Go and Rust do the same thing.

Abstraction is even more common. Functions are abstractions. And any language with typeclasses (like Haskell) or multiple dispatch (like Julia) uses a form of polymorphism.

Really, the only essentially object-oriented things here are inheritance and (by extension) inheritance-based polymorphism.


> C, for all its faults, has encapsulation at a module level:

No language that supports memory access for the entire address space of the currently running program can ever support something like encapsulation: you can pass pointers to objects and functions outside the currently running module, or you could somehow derive this info from outside the module and so access functions and objects that were not declared in header files. Thus the language cannot give you the isolation guarantees that memory managed languages can. What it can do, is put up some roadblocks or barriers that require effort to cross. But there is a big difference between correctness guarantees and roadblocks.

There really is a qualitative change when you are working in a memory managed language as that allows the language to assign fine grained control over which memory addresses are available to which data structures, which is something that you cannot do with C.


Well then, C++ doesn't have encapsulation, therefore C++ isn't OO. Hell, even Java isn't OO if you allow JNI.


I don't want to get into a debate defining what language is OO and what is not. My point was rebutting the notion that C has private data structures by pointing out only languages in which memory is managed by the runtime (e.g. VM) can offer isolation guarantees. Attempts to switch the topic to OS level memory protections are not really what we're talking about here, as the OS doesn't provide language level protections. So yes, if your code leaves the VM then you lose those VM protections.


> I don't want to get into a debate defining what language is OO and what is not

I mean, that is the discussion we were having. We weren't talking about language VMs.


I was replying to a statement that C had isolation and I pointed out that it didn't. The response was a non sequitur ("So then even C++ isn't OO"), and I responded that the question is not whether C++ is OO but whether its memory is managed. Not sure how any of this is hard to follow or why these arguments should trip you up.


The statement was specifically that C had encapsulation, within the context of a discussion about whether OOP should be defined as "encapsulation, abstraction, polymorphism, and inheritance."

You interpreted that as meaning memory isolation for some reason (even though plenty of clearly-OOP languages do not implement that), and when someone asked you how that definition of encapsulation squared with the fact that C++ is generally considered object-oriented, you said you didn't want to have that conversation.

It's not hard to follow and it didn't trip anyone up; you just changed the subject out of nowhere and for no discernible reason by injecting a contextually-inappropriate definition of "encapsulation."


If we're talking about making guarantees about blocking the programmer's ability to modify parts of the address space, we're no longer discussing programming paradigms. We're discussing security proofs. The MMU does not play a core role in object-oriented programming.


Historically, this is not entirely correct. Segmented MMUs (as opposed to the more common, currently used concept of paged MMUs) were intended to provide the hardware support for the protection levels and the data/code mixture in OOP. I.e. each object would have executable, readable, r/w and inaccessible parts. Protected by the MMU, depending on the currently accessing context, that is, a subclass, friend class, other class, etc. But creating a segment descriptor for each object or even just each class was, of course, far too expensive in the end.


That's actually really interesting, I hadn't heard of that.


We're not talking about the programmer doing something, but about the code doing something, which is absolutely all about security proofs. And while the OS protects an address space, the OO runtime protects memory within that runtime, so a private variable isn't available to code running outside the class while the same cannot be said for C code. That's the benefit of offloading memory management in interpreted languages.


OOP is a set of principles. C is a language. These are not the same things.


Then OOP is no true scotsman, because no language implements all the principles.

Or in other terms, without an implementation, OOP isn't even usable, it isn't even real. Just maybe a desirable ideal somewhere.


"Abstraction" as a principle is something we've been doing since we came up with function calls. Encapsulation as a principle is something we do when writing C code. The only one of the listed OO principles which is in any sense exclusive to OOP is inheritance.


> Encapsulation as a principle is something we do when writing idiomatic C code.

That's clearly not the case. C obviously does not enforce encapsulation, and it's extremely common for devs not to follow this principle, in fact it's pretty much the default not to and it takes discipline to enforce it.

"Encapsulation at module level", as you wrote earlier, is not encapsulation. If you implement your object as a struct (which is really what objects are) then encapsulation means not accessing the content of that struct/object directly.


Encapsulation is information hiding, where the internal components of a unit of code are inaccessible to its consumers (by fiat or by convention — see python's _private methods). This includes hiding procedures, fields and types. Context objects are a form of data encapsulation, for instance, because their contents are meant to be inaccessible, and they're not uncommon in C.

I also gave the examples of rust and go, which have private struct fields but are not really object-oriented, and encapsulate at the module level. Point is, OOP does not by any stretch have a monopoly on encapsulation, and OOP should not be defined in terms of it.


Sorry but I no longer understand what you are arguing about, nor do I understand your point.

Encapsulation in the context of OOP means effectively hiding the data within an object from the external world and not allowing direct access to these data.

OOP may not have a monopoly on this but this is indeed a defining feature of OOP (which you know very well if you ever took a programming 101 course): You may have encapsulation without OOP, but in OOP you must have encapsulation. It's not OOP if there's no encapsulation.

Encapsulation is not something enforced by C (access to struct's fields is free for all). And this is not a principle generally followed in C code (most C code does directly access fields within whatever struct). Hence my rebuke to your claim of the contrary.

Now, obviously this can be done in C, this is a matter of choice. OOP can be done in any language. There seems to be confusion in many comments between OOP and specific languages.

Lastly, OOP is a set of principles. Principles are rarely followed in their entirety, and indeed many languages pick and choose which principles, if any, they implement and how they implement them. It's the same when 'practising' OOP in a language where you have to do everything "by hand", like C: you pick and choose as needed.

I'm out.


> this is indeed a defining feature of OOP (which you know very well if you ever took a programming 101 course)

Setting aside the fact that any 101 course is necessarily reductive and inaccurate, "encapsulation + abstraction + polymorphism + inheritance = OOP" is something that gets regurgitated a lot without ever really being argued in favour of.

Since the first 3 of those 4 points are not at all limited to OOP, it really doesn't make sense for them to constitute ¾ of the definition. Are they really OOP principles if basically every modern language follows them? And now that we largely agree composition > inheritance, OOP often ignores that fourth principle too.

I know you hate C as an example here, so let's use rust instead. If a rust codebase can exercise encapsulation, polymorphism, and "abstraction" (still the vaguest and weakest criterion imo), and OOP code is discouraged now from using inheritance anyway, what stops it from being OOP? Most of the rust I've seen hasn't fit with any conventional notion of OOP, but it still technically matches the definition. Doesn't that make it a bad definition?


ML supports encapsulation via modules and abstract types.


But not destructuring objects into SoA memory layouts is a "principle", since the availability of pointers to objects is rather fundamental for all OO "specific languages".


> But not destructuring objects into SoA memory layouts is a "principle"

No, not really. There are languages that allow inside-out-objects where the "object" is a tuple of (class, index). The class holds a bunch of arrays containing each object's properties at the index-position that the object indicates. Totally destructured, yet holds all the usual OO "principles" like implementation hiding, abstraction, access via objects, etc. https://metacpan.org/pod/Object::InsideOut
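
The same layout can be sketched in Java. This is not the Object::InsideOut API, just an analogue with made-up names: properties live in per-class arrays, the instance holds only an index, and all access still goes through methods. In Java each handle is still a small heap object; what changes is where the property data lives.

```java
import java.util.Arrays;

// Hypothetical inside-out class: the class owns one array per property and an
// Account instance carries nothing but its index into those arrays.
// Not thread-safe; a sketch only.
final class Account {
    private static String[] owners = new String[4];
    private static long[] balances = new long[4];
    private static int count = 0;

    private final int id;   // the "object" is just an index

    private Account(int id) {
        this.id = id;
    }

    static Account open(String owner) {
        if (count == owners.length) {   // grow every property array together
            owners = Arrays.copyOf(owners, count * 2);
            balances = Arrays.copyOf(balances, count * 2);
        }
        owners[count] = owner;
        return new Account(count++);
    }

    // Callers only ever see methods; the column layout stays hidden.
    void deposit(long amount) {
        balances[id] += amount;
    }

    long balance() {
        return balances[id];
    }

    String owner() {
        return owners[id];
    }
}
```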

This is exactly what I meant by "there are no OO principles". No one has a clear-cut set of those, and almost anything can be made to fit some set of "OO principles".


> since the availability of pointers to objects is rather fundamental for all OO "specific languages".

I don't see how that is the case. For example some implementations of Smalltalk used object tables, so there were no "pointers to objects", just numerical object IDs. The physical interpretation of such IDs could get very arbitrary.
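
A toy Java sketch of the object-table idea (hypothetical names, and nothing like a real VM's implementation): an object is identified by a numeric handle, and the table maps handles to whatever currently backs them. That single indirection is what lets an object-table design move or replace objects without fixing up every reference.

```java
import java.util.ArrayList;
import java.util.List;

// A toy object table: clients hold integer handles instead of direct
// references, so the table can change what a handle refers to without
// any client noticing.
final class ObjectTable {
    private final List<Object> slots = new ArrayList<>();

    // Allocate a new handle for an object.
    int register(Object obj) {
        slots.add(obj);
        return slots.size() - 1;
    }

    // Every access goes through one indirection.
    Object deref(int handle) {
        return slots.get(handle);
    }

    // Swap in a moved or replaced object; all holders of the handle see it.
    void relocate(int handle, Object moved) {
        slots.set(handle, moved);
    }
}
```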


This is not specific to OO and is not an OOP principle. As soon as you have data dynamically allocated in memory and you start passing pointers to them around you have to be careful.


Overuse/misuse of inheritance has triggered hatred of OOP among many software developers...


I’ll go out on a limb and posit that there are virtually no valid uses of (implementation) inheritance. Perhaps one valid use is getting rid of delegation boilerplate (e.g., normally you would compose one object inside another but you want the outer object methods to delegate to the inner object methods but you don’t want to have to write N function definitions that just call the same methods on the inner object so instead your outer object inherits from your inner object). This problem is better solved by something like Go’s struct embedding since it doesn’t do anything more than this kind of automatic delegation.
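
Java has no struct embedding, but a rough approximation of the automatic delegation described above is an interface whose default methods forward to a `delegate()`. All names here are hypothetical. Unlike implementation inheritance, the wrapped object never calls back into the wrapper: inside `SimpleCounter`, `this` is still the `SimpleCounter`, which is also how Go's promoted methods behave.

```java
interface Counter {
    void increment();
    int get();
}

final class SimpleCounter implements Counter {
    private int n;
    public void increment() { n++; }
    public int get() { return n; }
}

// Hypothetical forwarding mixin: default methods delegate every call to
// delegate(), roughly what Go's struct embedding generates for free.
interface ForwardingCounter extends Counter {
    Counter delegate();

    default void increment() { delegate().increment(); }
    default int get() { return delegate().get(); }
}

// Composes a Counter and overrides only the one method it changes;
// get() is forwarded by the default method.
final class LoggingCounter implements ForwardingCounter {
    private final Counter inner;
    int calls = 0;

    LoggingCounter(Counter inner) { this.inner = inner; }

    public Counter delegate() { return inner; }

    @Override
    public void increment() {
        calls++;
        ForwardingCounter.super.increment();   // forward to the inner counter
    }
}
```

The costs are that `delegate()` must be public and that default methods are themselves a limited form of implementation inheritance, just from an interface rather than a class.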

And if you get rid of inheritance, there is very little left to distinguish OOP from procedural programming like one would do in C or Go. And this is the semantic problem: no one really agrees on what OOP is and proponents will rebut any criticism with “that’s not true OOP”. Any definitions of OOP that aren’t easily assailable are also indistinguishable from other existing paradigms.

Downvoters: i’m very interested in your opinions about why I’m wrong and specifically when you think inheritance is appropriate. Everyone says “there’s a time and a place!” but no one articulates when/where beyond cat/dog/animal toy examples.


Alan Kay spent the last 40 years educating people on OOP and system design. His talks and research papers are now widely available on the internet. There are free, modern and easy-to-use versions of Smalltalk. Anyone who still remains ignorant about fundamental ideas behind classic OOP and the paradigm's history is willfully ignorant.


Lots of OOP proponents disagree strongly with Kay’s definition of OOP, and his definition certainly doesn’t reflect the way the most popular self-described OOP languages are written today. Notably, Smalltalk has a negligible share of the market, so why should anyone waste time debating Kay/Smalltalk’s notions of OOP when they are at best niche?

Further, and more relevant to the thread at hand: it’s not clear to me that Kay’s notion of OOP considered inheritance to be a critical feature. To quote him:

> I felt somewhat the same way about inheritance as I did about types, in that both needed to be a lot better than they were in order to pay for the overheads and pitfalls of using them.


> Notably, Smalltalk has a negligible share of the market, so why should anyone waste time debating Kay/Smalltalk’s notions of OOP when they are at best niche?

Objective-C[0] is C with Smalltalk's "notions of OOP." Objective-C has been the dominant programming language for making macOS and iOS programs since OS-X was first released. Swift[1] is taking over the role Objective-C once held alone, but Swift's roots in Smalltalk's "notions of OOP" are easily discerned.

0 - https://en.wikipedia.org/wiki/Objective-C

1 - https://en.wikipedia.org/wiki/Swift_(programming_language)


I think one problem here is that you can't really compare Objective-C to, let's say, Java, as they are used for different purposes. Swift and Objective-C have negligible market share outside of the Apple ecosystem, and Java or C# have negligible market share inside it. So it's not an Alan Kay OOP/not-Alan Kay OOP split, but a rest-of-the-world/Apple split.


I think I agree with you. However, to the parent's point: I think the implication is that we might be enlightened about why Alan defines OOP the way he does when we contextualize it with Smalltalk, the language in which he used it. That's a fair point.

But again, you're right: most of us aren't familiar with Smalltalk and so find the very idea of reading such papers daunting at best. I think I'll finally try it though ...it can't be that hard of a language to grasp and it may well lead to some insights about why OOP, as defined by Mr. Kay, is defined as such.


> I think I agree with you. However, to the parent's point: I think the implication is that we might be enlightened about why Alan defines OOP the way he does when we contextualize it with Smalltalk, the language in which he used it. That's a fair point.

I absolutely agree that understanding Kay and Smalltalk can help one become a better programmer and give context into the history of OOP. But it can't be interpreted as anything other than a semantic deflection in the context of a response to substantial criticism.


I've never heard this brought up before. What are the distinctions? The thing I've noticed and found lacking in modern OOP is that it tends to be class-based without metaclasses or metaprogramming. Is there something else? Static typing is also something that's not in Smalltalk, but that shouldn't change the network shape of objects.


Here are some of the definitions I've heard:

* OOP is about message passing (where message passing is NOT method invocations)

* OOP is about message passing (where message passing can be method invocations)

* OOP is about encapsulation (never mind that most/all paradigms make extensive, idiomatic use of encapsulation--some OOP proponents suggest encapsulation implies constructors that do lots of work, take very few arguments, and make the class virtually untestable, others argue that this is an "abuse" of OOP or "bad programming")

* OOP is about inheritance

* OOP is a Kingdom-of-nouns programming style (effectively Joe Armstrong's "You wanted a banana but what you got was a gorilla holding the banana and the entire jungle" observation)

For all of these definitions, I've heard many OOP proponents argue that these things are not true OOP (typically without rebuke from other OOP proponents in the forum, bizarrely).

In my opinion, OOP must be defined by the things that distinguish it from other paradigms. Considering encapsulation and method calls are both fundamental to other paradigms, these cannot be defining characteristics of OOP. Additionally, any defining characteristic of OOP must be shared by languages that are virtually universally recognized as OOP, which means that message passing in a non-method-call sense must be excluded. That generally leaves inheritance, "extreme encapsulation" (untestable constructors), and kingdom-of-nouns programming styles.

I don't think the "class-based" thing is meaningful because apart from inheritance there's not much to distinguish a "class" from a struct in Go or Rust (in both cases you can associate methods to the struct for interface polymorphism) which are generally not considered to be "OOP languages" (and Go certainly doesn't have metaclasses or metaprogramming).

> Static typing is also something that's not in Smalltalk, but that shouldn't change the network shape of objects.

I agree that static typing is not a defining characteristic of OOP, and I've never heard anyone argue that it is.


A thing common to OOP that's missing from this list is localizing data and behaviour together, and the tell-dont-ask way of getting things done in OOP.
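
A minimal Java sketch of tell-don't-ask, using hypothetical classes: in the "ask" version the caller reads the balance and applies the rule itself; in the "tell" version the caller just says `withdraw`, and the rule lives next to the data it protects.

```java
// "Ask" style: the caller pulls the state out and applies the rule itself.
final class AskAccount {
    long balance;
}

final class AskClient {
    static boolean withdraw(AskAccount acct, long amount) {
        if (acct.balance >= amount) {   // the business rule lives out here
            acct.balance -= amount;
            return true;
        }
        return false;
    }
}

// "Tell" style: the caller states what it wants; the object owns the rule.
final class TellAccount {
    private long balance;

    TellAccount(long opening) {
        balance = opening;
    }

    boolean withdraw(long amount) {
        if (amount > balance) {
            return false;
        }
        balance -= amount;
        return true;
    }

    long balance() {
        return balance;
    }
}
```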

I meant that the newer less-pure-OOP languages tend to be statically typed vs Smalltalk etc where objects have behaviours but not compile-time shapes.


* message passing in Smalltalk is implemented as method invocation (the same as in Java, C#, C++, ...)

* encapsulation in Smalltalk: all fields are private/hidden (but all methods are public)


This only seems true for the case of simply defined methods. Differences arising from late binding are better described on the Dynamic Dispatch wiki page[0].

[0] https://en.wikipedia.org/wiki/Dynamic_dispatch#Dynamic_dispa...
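
Java shows one side of this difference directly: it is single-dispatch, so the receiver's run-time class picks the overriding method (late binding), while overloads are resolved at compile time from the static argument types. A small sketch with hypothetical class names:

```java
// Hypothetical message types.
class Message {
    String kind() { return "message"; }
}

class Ping extends Message {
    @Override
    String kind() { return "ping"; }
}

final class Dispatch {
    // Overloads: picked at compile time from the argument's static type.
    static String describe(Message m) { return "static: Message"; }
    static String describe(Ping p) { return "static: Ping"; }

    static String[] demo() {
        Message m = new Ping();   // static type Message, run-time class Ping
        return new String[] {
            describe(m),   // resolved at compile time -> describe(Message)
            m.kind()       // late-bound at run time -> Ping.kind()
        };
    }
}
```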


In my case, I'm using Django on several projects. It uses inheritance to implement the ORM, the views, filters, etc. etc.

When you say "there are virtually no valid uses of (implementation) inheritance" ... how do you expect the thousands of us using Django to respond to that? A link to the Django framework? To defend Django? What is the point?

You're right, maybe python's object model could have instead been implemented like Go's struct but it wasn't.


IMHO there's quite a difference between what's technically inheritance in an ORM (I am assuming you mean writing "class MyModel(orm.model):", not in-database inheritance) and Java-esque tree hierarchies of classes.

The former is mostly about invoking a type operator / metaclass [1] to construct a model class from your declarative specification.

I don't know what the latter is about. I think deep inheritance hierarchies (where deep means something like "more than 2 levels") are virtually always a mistake. Stuff like toolkits that go Object>Widget>AbstractButton>PushButton might be an exception, but I'm not entirely sure; there's probably a better way.

[1] I think metaclasses aren't type operators in the strict sense, because they're not handed a finished type, but rather the declaration of a type, and then create a type. Maybe there's a word for that.


I don't know enough about Django's API to speak intelligently, but I have 15 years of Python experience and (except when APIs require it) it's perfectly easy to write Python code without using inheritance (and your code base will be better because of it). If Django or whatever requires using inheritance, you have my sympathy, and I'm not arguing that you should go to great lengths to avoid it--I'm arguing that from a language design perspective inheritance is a mistake.


And yet here I am being productive.


And I’m sure whoever told you that you couldn’t be productive in spite of inheritance feels ashamed. I hope they see this post!


I see why you use a throwaway account.


I’m not brave enough to use my legal name, unlike you Mr. Ensorceled.


> I’ll go out on a limb and posit that there are virtually no valid uses of (implementation) inheritance.

  class Iterable { ... }

  class Tree extends Iterable { ... }
  class SortedTree extends Tree { ... }

  class Map extends Iterable { ... }
  class HashMap extends Map { ... }
  class InsertionMap extends Map { ... }
  class BiMap extends Map { ... }

Need more examples of "valid uses of (implementation) inheritance"?

> And if you get rid of inheritance, there is very little left to distinguish OOP from procedural programming like one would do in C or Go.

No. To make OOP indistinguishable from "C or Go", you would also need to eliminate at least: encapsulation, composition, access control, and compiler-provided dynamic dispatching.


> Iterable

That’s easily achieved with plain interfaces. It’s not obvious to me at all why I would want to use inheritance instead of interfaces, especially because you elided the class bodies.
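
As a sketch of the interface route in Java (`SortedBag` is a made-up name): the collection implements `java.lang.Iterable` and gets for-each support without extending any class. To be fair, `Iterable` also contributes default `forEach` and `spliterator` methods, so Java interfaces still share some code, just without a class hierarchy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Gets for-each support by implementing the java.lang.Iterable interface;
// no base class is extended.
final class SortedBag implements Iterable<Integer> {
    private final List<Integer> items = new ArrayList<>();

    // Insert while keeping items in ascending order (linear scan; fine for a sketch).
    void add(int x) {
        int i = 0;
        while (i < items.size() && items.get(i) <= x) {
            i++;
        }
        items.add(i, x);
    }

    // Read-only iterator, so callers cannot remove() behind our back.
    @Override
    public Iterator<Integer> iterator() {
        return Collections.unmodifiableList(items).iterator();
    }
}
```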

> No. To make OOP indistinguishable from "C or Go", you would also need to eliminate at least; encapsulation, composition, access control, and compiler provided dynamic dispatching.

C and Go have all of those things except that C doesn’t have “compiler provided dynamic dispatch” and I’m not sure how “access control” differs from “encapsulation” (access control is just public/private/etc, right?).

* encapsulation: in C things in the header files are “public” while things in the c files are “private”. In Go we have private/public struct members.

* composition: both C and Go have structs

* compiler-provided dynamic dispatching: Go has interfaces and closures. C doesn’t have this provided by the compiler but you can implement them yourself easily enough. You lose out on some type safety, but that’s no worse than a dynamic OOP language and if you care about safety you probably aren’t using C anyway.
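
The "implement them yourself" route, sketched in Java to match the rest of the thread's code (hypothetical names): the object carries an explicit table of function values, the way a C struct would carry function pointers, and dispatch is a visible indirect call rather than a compiler-provided virtual method.

```java
import java.util.function.DoubleUnaryOperator;

// A hand-built "vtable": one shared record of function values per kind of
// shape, like a C struct of function pointers.
final class ShapeVtable {
    final String name;
    final DoubleUnaryOperator area;   // maps a size parameter to an area

    ShapeVtable(String name, DoubleUnaryOperator area) {
        this.name = name;
        this.area = area;
    }

    static final ShapeVtable SQUARE = new ShapeVtable("square", s -> s * s);
    static final ShapeVtable RIGHT_ISOSCELES = new ShapeVtable("right isosceles", s -> 0.5 * s * s);
}

// The "object": plain data plus a pointer to its vtable.
final class HandShape {
    final ShapeVtable vt;
    final double size;

    HandShape(ShapeVtable vt, double size) {
        this.vt = vt;
        this.size = size;
    }

    // Dispatch is an explicit indirect call through the table.
    double area() {
        return vt.area.applyAsDouble(size);
    }
}
```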


Some cases where inheritance may be useful: UI widget libraries, graphical/drawing objects, (very similar) stream implementations. All this can of course be done without inheritance, but with inheritance it is much more elegant.
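
For the stream case, the JDK's own `java.io.InputStream` is a concrete example: a subclass has to implement only the abstract single-byte `read()`, and it inherits working bulk methods such as `read(byte[])` and `read(byte[], int, int)` built on top of it (real implementations usually override those for speed). A minimal sketch; `CountingStream` and its `fill` helper are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Yields the bytes 0, 1, ..., n-1 (mod 256), then end of stream.
// Only the abstract single-byte read() is implemented here.
final class CountingStream extends InputStream {
    private final int n;
    private int next = 0;

    CountingStream(int n) {
        this.n = n;
    }

    @Override
    public int read() {
        return next < n ? (next++ & 0xFF) : -1;   // -1 signals end of stream
    }

    // Convenience for callers: uses the bulk read(byte[], int, int) that
    // InputStream implements on top of read().
    static int fill(InputStream in, byte[] buf) {
        try {
            return in.read(buf, 0, buf.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```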


Disagree. See reagent[0].

[0] https://reagent-project.github.io/


Thank you for responding.

I probably agree with the widget library example, but even here a widget library is just one way of implementing a GUI toolkit, and perhaps it’s just the emergent result of an OOP-based approach (and consequently there are some really brutal tradeoffs in using a widget-based approach for a toolkit). To that point, with respect to general drawing libraries, I don’t think that OOP/inheritance yields a more elegant design than a reactive or immediate-mode approach (perhaps these terms only apply to GUI, but I imagine there are parallel terms for graphical APIs in general).

Even though I concede narrowly on the widget GUI library point, these libraries are so very rare that I don’t think it’s a very compelling case for OOP proponents. I suspect they’d like to argue that there are appropriate uses for inheritance in most applications and not just the odd GUI library. Otherwise I don’t think it even merits first-class language support (why bother with an `extends` keyword if it’s only going to be useful in the rarest of libraries?).


Well, since you ask, I downvoted you for restating the false equivalence between OOP and inheritance, for the abjectly incorrect (and also, wholly unconstructive) straw-man statement that no-one agrees what OOP is, for the frankly absurd and wilfully ignorant claim that no-one articulates practical examples for the circumstances when inheritance rather than composition might actually be worth considering, and for bitching about downvotes, particularly that explicit assumption that they’re coming from people wedded to inheritance, and not from people who dislike disingenuous arguments and circular reasoning.

Read a fucking book, instead. David West’s Object Thinking, for example.


> Well, since you ask, I downvoted you for restating the false equivalence between OOP and inheritance, for the abjectly incorrect statement that no-one agrees what OOP is, for the frankly absurd and wilfully ignorant claim that no-one articulates practical examples for the circumstances when inheritance rather than composition might actually be worth considering, and for bitching about downvotes, particularly the explicit assumption that they’re coming from people wedded to inheritance, and not from people who dislike disingenuous arguments of any form. Read a fucking book. David West’s Object Thinking, for example.

Thank you for clarifying that your downvote was definitely not an emotional overreaction. :)


> Thank you for clarifying that your downvote was definitely not an emotional overreaction. :)

Smileys don’t turn snide into humour, so now you can add “naked ad hominem” to the litany of downvote attractors.


I thought it was the italics. In which ever case, this conversation has definitely been fruitful and enlightening. Thank you for your contributions.

.

.

.

.

.

:)


This line of vacuousness merely continues to highlight how brittle and indefensible the original arguments were.


Could you argue your point a bit more than directing people to a book? You've listed what you disagree with; now's the time to back up your claims.


Bad-faith trolls demanding “why did you downvote me” get at most list of their sins and references to improve. Read it, don’t read it, up to you, but I don’t spoonfeed sealions. Reading books is the best inoculation against being suckered by Barnum statements like “no one really agrees on what <concept> is” or “no-one says <thing readily found in books>”.

Further reading: Refactoring (Fowler), Smalltalk Best Practice Patterns (Beck).


Sorry I hurt your feelings friend. I hope you’re able to work through this. Best of luck.


No-one's "feelings", whatever that entails, are hurt, and we're not friends.

I recommend not making any more false statements.


Yeah, I've loved classes and OOP for a long time, but lately I've been on a project that is going into more and more contortions and complexity to make everything fit a theoretical ideal. It's revived my interest in learning FP languages to avoid the arcane complexity that some people make out of OOP.


I feel complexity creep is often inevitable regardless of the paradigm or technology you use. Programmers have varying levels of complexity they can handle and the systems generally naturally grow to match the complexity that the people that work on them can support.

Experienced programmers will manage to keep that complexity creep under control for longer and smarter programmers will manage to keep working on complex systems that lesser peers would have no chance to understand.

But eventually, unless the software has a very clear functional boundary which is often not the case for business software, software will start to become increasingly complex and dev velocity will slow down, quality will drop... I've seen this happening at all skill levels regardless of the languages and paradigms used.


> I feel complexity creep is often inevitable regardless of the paradigm or technology you use.

This. I always love toy examples that look so nice, concise, and elegant. Until you add proper error-handling and corner cases that is; then all of a sudden it doesn't look half as concise, elegant, and beautiful anymore...


The problem with inheritance-heavy OO is that complexity creep rapidly makes the code completely unreadable and incredibly hard to work with (and basically impossible to refactor), rather than just making it more branchy and complex.


> Overuse/misuse of inheritance has triggered hatred of OOP among many software developers...

"It is not the tools we use that make us good, but rather how we employ them."[0]

0 - https://en.wiktionary.org/wiki/a_bad_workman_always_blames_h...


Honestly I find ECS dogmatic and difficult. Data-oriented design as a general practice is a good thing, but there are layers to everything. OOP is a very broad category of practices and designs, and ECS usually refers to one specific architecture.

The specific architecture in question tends to be full of soft dependencies: most ECSes don’t let you simply store a piece of data without also opting in to every system that matches that data type running arbitrary code on it. So much for separation of data and behaviour. No, now they’re even more dependent than they are in the usual OOP sense, and you might not even know it.

Furthermore, usually when you want to think about in-game entities, you want to look at a single class. Now all entities’ data is split into numerous components and all entities’ behaviour is split into numerous systems and you don’t know frame by frame how they’re gonna interact, provided you even know about all systems in the first place. It’s total spaghetti code.

I’m much more inclined lately towards a shallow actor system. If I want to know how the player’s behaviour functions, I need only look at player.cpp and nothing else. Then, to reap the benefits of data-oriented design, certain objects can use an allocation scheme that makes sense for the object in question. In the general sense, any Component<T> can just have a vector<T>, or an unordered_map from actor ID to T, holding all components of that type, and the memory access is abstracted away without it being detrimental. That’s C++’s whole deal actually: zero-cost abstractions.

I wouldn’t call an entire paradigm shift in which one rewrites everything from memory allocation all the way up to ‘use WASD to move’ zero-cost in any sense.

In C++ it is trivial to overload new, or derive from some class which does, or to write a custom allocator, and frankly there is zero need for the same person who’s writing ‘use WASD to move’ to know about memory management.
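To make the "overload new" point concrete, here is a minimal sketch, assuming a hypothetical `Bullet` type and an arbitrary cap of 1024 live instances. A class-specific `operator new`/`operator delete` pair serves objects from one contiguous static buffer and recycles freed slots through a free list, so gameplay code keeps writing `new Bullet{}` unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical gameplay type; callers still write `new Bullet{}` / `delete b`.
struct Bullet {
    float x = 0, y = 0, vx = 0, vy = 0;

    static void* operator new(std::size_t size);
    static void operator delete(void* p) noexcept;
};

namespace {
constexpr std::size_t kCapacity = 1024;  // assumed upper bound on live bullets
static_assert(sizeof(Bullet) >= sizeof(void*), "a free slot must fit a pointer");

// One contiguous block holding every Bullet slot.
alignas(std::max_align_t) unsigned char g_storage[kCapacity * sizeof(Bullet)];
void* g_freeList = nullptr;    // head of the list of freed slots
std::size_t g_nextUnused = 0;  // first slot that has never been handed out
}  // namespace

void* Bullet::operator new(std::size_t size) {
    assert(size == sizeof(Bullet));  // subclasses would need a pool of their own
    if (g_freeList != nullptr) {     // reuse the most recently freed slot
        void* slot = g_freeList;
        std::memcpy(&g_freeList, slot, sizeof(void*));  // read the "next" link stored in the slot
        return slot;
    }
    if (g_nextUnused == kCapacity) throw std::bad_alloc();
    return g_storage + sizeof(Bullet) * g_nextUnused++;
}

void Bullet::operator delete(void* p) noexcept {
    if (p == nullptr) return;
    std::memcpy(p, &g_freeList, sizeof(void*));  // the dead slot stores the old list head
    g_freeList = p;
}
```

This only controls where objects live; code that still visits them in a scattered order keeps most of its cache misses.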


The way I view it is that you look at components like a database -- a bunch of tables that don't really tell you much about the business logic; the main thing they do is offer data coherence.

And like a webserver, the business logic has been entirely moved out of the data storage -- you construct an "object" out of the raw parts from the database, and you operate on that. The main thing is that you can construct multiple objects from the same raw dataset -- different views (as in MVC).

I think, however, it's a mistake when you assume most games' design -- where largely there are a very small number of entities and a small number of behaviors, and really the game design is about careful placement of these fairly rudimentary entities on the map. In that scenario, the flexibility of ECS does you no good -- the game design is itself inflexible, so making your logic flexible is largely a premature optimization. If there's only one reasonable "View" of the entity, being able to construct an infinite set of alternative views is pointless.

ECS appeals to me more when you start talking about simulation-style games, where the game is far less hard-coded. Dwarf Fortress is ridiculously flexible in its game design (at runtime), and ECS would be a natural fit for that (entities in DF are literally defined by tags, and groups of tags, and those tags get modified at runtime[0]). It's not spaghetti code then -- it's really the only reasonable way to approach the problem.

Defining each entity uniquely makes a lot of sense when your entities are largely unique (perhaps with a common base, e.g. for physics). ECS makes more sense when your entities share a lot of logic, but random subsets of it, and especially so when the game itself treats that random subset as dynamic.

[0] http://www.dfwk.ru/Creature_standard_1.txt


Dwarf Fortress is one of my favorite games if not my favorite so props for citing its internal representation. I definitely think ECS makes sense in many contexts, and I think it's at its most powerful in tandem with a more encapsulated actor system. Example -

  struct Player: public Actor {
    Component<Sprite> sprite;
    Component<Transform> transform;
    void update() override {...}
  };
Where Component<T> is a handle to some backing storage indexed by an Actor's ID. This way, you can go the traditional route of having Player update itself in its own update() method as well as being able to Component<Sprite>::iter() along with other components for the render loop ECS-style.
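One way the `Component<T>` handle described above might be backed, as a sketch only: it assumes one dense `std::vector<T>` per component type plus a sparse ActorId-to-index table, and it omits component removal. `Sprite`, `Transform`, `Actor` and `Player` are hypothetical stand-ins mirroring the snippet in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using ActorId = std::uint32_t;

// Handle to one actor's T, backed by a single dense array of every T in the game.
template <typename T>
class Component {
public:
    Component(ActorId owner, T value) : owner_(owner) {
        Storage& s = storage();
        if (s.sparse.size() <= owner) s.sparse.resize(owner + 1, kNone);
        assert(s.sparse[owner] == kNone && "this sketch allows one T per actor");
        s.sparse[owner] = s.dense.size();
        s.dense.push_back(std::move(value));
    }

    // Per-actor access, for an Actor's own update().
    T& get() const { Storage& s = storage(); return s.dense[s.sparse[owner_]]; }

    // Whole-collection access, for ECS-style passes such as a render loop.
    static std::vector<T>& iter() { return storage().dense; }

private:
    static constexpr std::size_t kNone = static_cast<std::size_t>(-1);
    struct Storage {
        std::vector<T> dense;             // all T values, packed contiguously
        std::vector<std::size_t> sparse;  // ActorId -> index into dense
    };
    static Storage& storage() { static Storage s; return s; }
    ActorId owner_;
};

// Hypothetical component types and actors mirroring the comment above.
struct Sprite { int frame = 0; };
struct Transform { float x = 0, y = 0; };

struct Actor {
    explicit Actor(ActorId actorId) : id(actorId) {}
    virtual ~Actor() = default;
    virtual void update() = 0;
    ActorId id;
};

struct Player : public Actor {
    explicit Player(ActorId actorId)
        : Actor(actorId), sprite(actorId, Sprite{}), transform(actorId, Transform{}) {}
    Component<Sprite> sprite;
    Component<Transform> transform;
    void update() override { transform.get().x += 1.0f; sprite.get().frame += 1; }
};
```

Iterating `Component<Sprite>::iter()` walks contiguous memory, while each `Player::update()` still jumps between the per-type arrays.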

My point, now and then, was that going all-in on ECS as the basis for your entire architecture, rather than taking a more principled approach that draws on the strengths of both OOP and DOD, is dogmatic and difficult.

It's interesting that you mention web services - I have a lot of respect for databases and I enjoy the process of using them, however, I wouldn't ever program a web app in SQL. That's what pure ECS feels like to me. I agree that having objects manipulate the state via upholding internal invariants is the way to go. Whether you call them Actors, or Systems, or Controllers, it's kinda one and the same. That's why I love DOD as a base architectural layer to be abstracted upon, but dislike ECS as a programming paradigm.


>I wouldn't ever program a web app in SQL

My point is that ECS isn't writing SQL -- it's storing data/state in SQL [components], but operating [systems] with whatever normal programming language, in a stateless environment (at least, stateless between HTTP requests / ECS system definitions). It specifically separates the data from the business rules, which is exactly normal anytime you use a DB, but highly unusual in the context of a self-contained program (where OOP defines an object as pretty much precisely the conflation of the two -- to benefit and detriment. A class definition stores both the data, and the logic that operates/maintains it). The ECS system's query fetches the relevant dataset to work on (à la SQL queries), and the system's definition defines the business logic (à la a JS/Python/etc. webserver).

>Where Component<T> is a handle to some backing storage indexed by an Actor's ID. This way, you can go the traditional route of having Player update itself in its own update() method as well as being able to Component<Sprite>::iter() along with other components for the render loop ECS-style.

The main problem with your model, versus e.g. bevy, I think is:

1. You lose the cache coherency gains -- objects and their handle location have no relationship to each other; so you end up hopping across the arrays randomly to find the relevant data as you call update() per object. This can probably be solved regardless, and anyways I don't care much about this -- the data modeling is more interesting, and performance just needs to approach "sufficient".

2. Defining update() per class, which happens to use the components, makes it significantly less flexible -- adding a Component<weight> to multiple classes means duplicating the logic to each class as well. You can move the logic out to a general function, and add the one call to each class that wields the component, but to make that function shareable, you're going to drop the reference to the originating class, taking only the component as input. And now you're back to systems (components alone tell you the operation to apply). Taking multiple components as input, or optional components, gives you the same structure. One difference is you can choose to not call a system despite having the components, but I think the normal ECS strategy would be to have a marker component that gets checked by the system's query for not-set.
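Point 2 can be illustrated with a small sketch. It assumes hypothetical `Weight`/`Velocity` components stored in hash maps keyed by entity id and a `Frozen` marker stored as a plain set. The system is written once, selects entities purely by which components they have, and skips anything carrying the marker.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using Entity = std::uint32_t;

// Hypothetical components for illustration.
struct Weight { float kg = 0; };
struct Velocity { float dy = 0; };

struct World {
    std::unordered_map<Entity, Weight> weights;
    std::unordered_map<Entity, Velocity> velocities;
    std::unordered_set<Entity> frozen;  // marker component: presence is the only data
};

// Written once and shared by every kind of entity: the query is
// "has Weight AND has Velocity AND NOT Frozen", not "is a Player/Crate/...".
void gravitySystem(World& w, float dt) {
    constexpr float kGravity = 9.81f;
    for (auto& [entity, vel] : w.velocities) {
        if (w.weights.count(entity) == 0) continue;  // required component missing
        if (w.frozen.count(entity) != 0) continue;   // excluded by the marker
        vel.dy -= kGravity * dt;
    }
}
```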

I think ultimately, data modeling wise, they're largely the same. The update() call should be largely defined by the components the entity has -- ECS just enforces that it must be defined by it. The loss is that in your model, you could read update() alone to tell you all the logic in play, but the gain is that the update() doesn't have to be defined repeatedly (where components/logic is shared), and changing update() is done by changing Component -- which your model prefers but does not enforce.


After studying ECS for all of a week, I was left wondering if there wasn't a way to reintroduce strong typing to an ECS system (without reintroducing all the problems of inheritance). So you'd have a player_entity factory that ensures that a player_entity only wraps entities that are actually players. Then you can pass that around to strongly typed functions, but keep the overall design reasonably inheritance-free, so it would be more like Rust/Go's strong typing.


I've thought about this in the past too, and come to the conclusion it is too difficult. Part of what I don't like about ECS is that it's too dynamic, you can't add static typing like this.

Sure, you can say that a "Skeleton" entity has an HP component, an AI component, etc. But there's no way of enforcing static typing on this.

Say you have some function that takes an Entity, validates it's a Skeleton, and then returns a SkeletonEntity which is just a wrapper around Entity for static typing purposes. Perhaps add some helper methods for fetching components, allowing you to operate on an OOP-like API with very little runtime cost. Seems like it works.
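A minimal sketch of that wrapper, assuming hypothetical `Health`/`Ai` components in hash maps. The check happens only inside `asSkeleton()`, which is exactly the weakness described next: a wrapper obtained earlier keeps claiming "skeleton" after other code changes the entity.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

using Entity = std::uint32_t;

// Hypothetical components.
struct Health { int hp = 0; };
enum class AiKind { Skeleton, Friendly };
struct Ai { AiKind kind = AiKind::Friendly; };

struct World {
    std::unordered_map<Entity, Health> health;
    std::unordered_map<Entity, Ai> ai;
};

// Typed view over a plain Entity. The only way to get one is asSkeleton(),
// so holding one proves the check passed *when it was created*, not later.
class SkeletonEntity {
public:
    Entity id() const { return id_; }
    Health& health(World& w) const { return w.health.at(id_); }  // throws if the component vanished

private:
    explicit SkeletonEntity(Entity entity) : id_(entity) {}
    Entity id_;
    friend std::optional<SkeletonEntity> asSkeleton(const World& w, Entity e);
};

std::optional<SkeletonEntity> asSkeleton(const World& w, Entity e) {
    auto it = w.ai.find(e);
    if (it == w.ai.end() || it->second.kind != AiKind::Skeleton) return std::nullopt;
    if (w.health.count(e) == 0) return std::nullopt;
    return SkeletonEntity(e);
}
```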

But there's no way to guarantee the skeleton STAYS as a skeleton for the future. You might add another thing later on that converts an entity with AI into FriendlyAI, and your SkeletonEntity implicitly relied on the AI being a SkeletonAI. Heck, you can't trust _any_ Entity. That handle to an entity you have might not even point to an Entity anymore, it could've been deleted.

ECS is very dynamic, which is great for designing open-ended games where you don't/can't plan every interaction in advance. It also has great performance, and is typically the _only_ sane way to implement games in a language without inheritance (I don't think Composition + Interfaces is very scalable). But for heavily structured games that want tight coupling between entities, it relies on you implicitly keeping your promises about what an entity means. You can't go and add new behavior that modifies existing entities later on: the compiler won't warn you, and you may end up crashing your program at runtime, unless you were very diligent about adding checks in your code for the coupling you assumed existed, and having good fallbacks for when those assumptions are violated.


> But there's no way to guarantee the skeleton STAYS as a skeleton for the future.

It seems like there needs to be some guarantees made about this. You can't have background jobs asynchronously changing players into spaceships and vice versa while there's other jobs running systems against those.

I would guess that most games made with ECS systems that are threaded would deal with this by having a queue of requests to change entity components that would get processed once at the start of an update cycle and then a consistent state would be shown to all the systems in that update.

That's really a state change in a finite state machine and I would imagine there's some work out there on systems of FSMs and concurrent updates and keeping things sane.
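The queue-then-apply idea from this comment could look roughly like the following single-threaded sketch. `World`, `Ai`, `defer` and `applyPending` are hypothetical names; a threaded engine would additionally need to guard `defer` with a mutex or give each worker its own queue.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using Entity = std::uint32_t;
struct Ai { bool friendly = false; };  // hypothetical component

struct World {
    std::unordered_map<Entity, Ai> ai;

    // Structural changes requested while systems run; never applied mid-tick.
    std::vector<std::function<void(World&)>> pending;

    void defer(std::function<void(World&)> change) { pending.push_back(std::move(change)); }

    // Called once at the start of each update cycle, before any system runs,
    // so every system in that tick sees the same component layout.
    void applyPending() {
        std::vector<std::function<void(World&)>> batch;
        batch.swap(pending);  // changes requested while applying wait for the next tick
        for (auto& change : batch) change(*this);
    }
};
```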

Deletion could similarly be scheduled until the next tick and since systems should be stateless they shouldn't be saving those handles (or if you allow them to save the handle for some reason, you require them to check if the handle has been marked as deleted every tick).
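Deferred deletion plus a "has this handle been deleted?" check is often implemented with generation counters; that is a swapped-in technique, not something the comment specifies. A minimal sketch with hypothetical `Handle`/`EntityPool` names: destroying an entity bumps its slot's generation between ticks, so every stale copy of the handle stops validating even after the slot is reused.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A handle remembers which slot it refers to and which "generation" of that slot.
struct Handle { std::uint32_t index; std::uint32_t generation; };

class EntityPool {
public:
    Handle create() {
        std::uint32_t i;
        if (!free_.empty()) { i = free_.back(); free_.pop_back(); }
        else { i = static_cast<std::uint32_t>(generation_.size()); generation_.push_back(0); }
        return {i, generation_[i]};
    }

    // Deletion is only requested here; nothing is freed mid-tick.
    void requestDestroy(Handle h) { doomed_.push_back(h); }

    // Run once between ticks: bumping the generation invalidates every copy of the handle.
    void flushDestroyed() {
        for (Handle h : doomed_) {
            if (!alive(h)) continue;  // ignore duplicate requests
            ++generation_[h.index];
            free_.push_back(h.index);
        }
        doomed_.clear();
    }

    bool alive(Handle h) const {
        return h.index < generation_.size() && generation_[h.index] == h.generation;
    }

private:
    std::vector<std::uint32_t> generation_;  // current generation per slot
    std::vector<std::uint32_t> free_;        // slots available for reuse
    std::vector<Handle> doomed_;             // deletions requested this tick
};
```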


Back when I was trying to design an ECS engine I had a class Prefab<T...> which you would inherit from using CRTP to define entities with stronger static typing. For example, Player would be

  class Player: public Prefab<PlayerController, Transform, Sprite> {};
or something like that. C++ is infinitely flexible in that regard. I was never quite able to work out what would happen if you were to add/remove components at runtime to such Prefab classes. The semantics never quite made sense to me in terms of static typing, because it's no longer static.
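A minimal sketch of what such a Prefab could look like. It assumes the derived class is passed as the first template argument (the usual CRTP shape, unlike the one-liner above) and that components live in a `std::tuple`; the component types are hypothetical. Asking for a component the prefab lacks fails at compile time. The flip side, as the comment says, is that nothing here can be added or removed at runtime.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Hypothetical components.
struct PlayerController { float speed = 5.0f; };
struct Transform { float x = 0, y = 0; };
struct Sprite { int frame = 0; };

// CRTP base: Derived is the concrete prefab, Cs... its fixed set of components.
template <typename Derived, typename... Cs>
class Prefab {
public:
    template <typename C>
    static constexpr bool has() { return (std::is_same_v<C, Cs> || ...); }

    template <typename C>
    C& get() {
        static_assert(has<C>(), "component is not part of this prefab");  // compile-time, not a runtime lookup
        return std::get<C>(components_);
    }

    // Static dispatch to Derived::update() without a virtual call.
    void tick() { static_cast<Derived*>(this)->update(); }

private:
    std::tuple<Cs...> components_;
};

class Player : public Prefab<Player, PlayerController, Transform, Sprite> {
public:
    void update() {
        get<Transform>().x += get<PlayerController>().speed;
        ++get<Sprite>().frame;
    }
};
```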


In the case of C++, you can do that via templates and static dispatch, and with C++20 it's even better, as each entity can be modelled as a concept.

something like,

    template<typename T>
    concept Player = requires (T p) {
        p.jump(); 
    };

    class Factory {
        public:

        template<Player p>
        void register_entity(p player);
    };
Just as seed for the overall idea.


You can get some pretty unbelievable performance gains out of a single writer and arrays of structs.

Bonus points if you figure out a way to have an array per type of struct pre-allocated with more elements than you will ever need. Even if you use a GC language you can almost eliminate collections with this approach.
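A minimal C++ sketch of the pre-allocated array idea, assuming a hypothetical `Particle` struct and a fixed budget chosen up front. Capacity is reserved once, spawning past the budget is refused rather than allocating, and dead entries are removed by swap-and-pop so the live ones stay packed. In a GC language, the same shape means the steady state produces no garbage to collect.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { float x, y, vx, vy, life; };  // hypothetical plain struct

// All particles live in one array whose capacity is reserved once up front;
// while size stays within that budget, the vector never reallocates.
class ParticlePool {
public:
    explicit ParticlePool(std::size_t maxParticles) : max_(maxParticles) { items_.reserve(max_); }

    bool spawn(const Particle& p) {
        if (items_.size() == max_) return false;  // over budget: drop rather than allocate
        items_.push_back(p);
        return true;
    }

    // One pass over contiguous memory; a dead particle is overwritten by the
    // last element (not yet updated this pass), which is then processed at index i.
    void update(float dt) {
        for (std::size_t i = 0; i < items_.size();) {
            Particle& p = items_[i];
            p.x += p.vx * dt;
            p.y += p.vy * dt;
            p.life -= dt;
            if (p.life <= 0.0f) { items_[i] = items_.back(); items_.pop_back(); }
            else ++i;
        }
    }

    std::size_t size() const { return items_.size(); }
    std::size_t capacity() const { return items_.capacity(); }

private:
    std::size_t max_;
    std::vector<Particle> items_;
};
```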


Even the array of structs is a non-ideal approach, as structs are usually viewed as a static collection of data.

But if you look at the hot loop, it usually boils down to a funnel, not unlike a furnace. Lots of space-hungry raw materials are gathered and passed through, to be condensed into a relatively small output.

So the ideal structure is a sort of union-struct that compresses the results down at each step of the algo, keeping it all in cache while keeping it slim.


Would love to hear more about that kind of compressing, how does that go?


People forget what the intent of OOP originally was.

OOP was envisioned as a way to manage software projects with many contributors at a time when we didn't have half the tools for hiding context that we do now.

Micro-services and micro-kernels are far far far more prevalent these days.

Garbage collection was also far less of a thing in that era, as all programmers were squeezing every last iota out of the hardware.

Hence rogue pointers were far more of a risk.

Multi-core? Haha.

I know this is not particularly relevant to the original article, but if you don't know the history and the intent behind why something exists, you are reasonably likely to misapply it.

Most of the mistakes of OOP are from a lack of understanding of why things got invented in the first place.


> People forget what the intent of OOP originally was.

Not really.

> OOP was envisioned as a way to manage software projects with many contributors at a time when we didn't have half the tools for hiding context that we do now.

No, the purpose of "OOP" is specifically for "hiding context" by encapsulating implementation logic exposed via a collaboration contract.

> Micro-services and micro-kernels are far far far more prevalent these days.

Non-sequitur.

> Garbage collection was also far less of a thing in that era, as all programmers were squeezing every last iota out of the hardware.

This literally has nothing to do with a programming paradigm.

> Hence rogue pointers were far more of a risk. > Multi-core? Haha

Again, this literally has nothing to do with a programming paradigm.

> I know this is not particularly relevant to the original article, but if you don't know the history and the intent behind why something exists, you are reasonably likely to misapply it.

This is a wise statement, one which I hope you say aloud whilst reading this reply.

> Most of the mistakes of OOP are from a lack of understanding of why things got invented in the first place.

A programming paradigm is not the source of mistakes. Its practitioners certainly can be however.


Also, the article does a really poor job of describing any drawbacks of Data Oriented Design. It's a real pet peeve of mine.

> Drawbacks of Data-Oriented Design Data-oriented design is not the silver bullet to all the problems in game development.

Ok, they don't view it as a silver bullet. This seems promising for an evenhanded discussion. I'm curious what the author thinks the drawbacks are.

> The main problem with data-oriented design is that it’s different from what most programmers are used to or learned in school.

So the first drawback is that nobody knows your silver-bullet? That's a cop out.

> Also, because it’s a different approach, it can be challenging to interface with existing code, written in a more OOP or procedural way.

And the second drawback is that code was written without using your silver-bullet? Seriously?

If the only two things you believe are drawbacks about your tech are that not enough people know it, and not enough people are using it, then it's not an even-handed discussion of your tech.

Discuss the actual trade-offs you've learned from using it. Not nonsense like nobody knows how wonderful it is, nor is using it.

And that's coming from someone who agrees that OOP has huge flaws and with the most common applications of inheritance creates many flawed program architectures.


The original intent had nothing to do with "many contributors".

The main ideas: encapsulation, message passing and late binding i.e. dynamic binding.


You're talking about solutions when they were talking about problems. What problems do encapsulation, message passing and late binding solve?


> OOP was envisioned as a way to manage software projects with many contributors at a time when we didn't have half the tools for hiding context that we do now.

> Micro-services and micro-kernels are far far far more prevalent these days.

I think that's a good analysis. If OOP was a solution to an organization problem, then microservices are the "new" way to do it. Microservices respect late binding, message passing, encapsulation. I don't really know how inheritance would fit into the equation, as I don't know exactly how companies with hundreds of microservices do it. And since we don't care about what's inside the objects (services), we're now free to write them in Java, C++, Smalltalk, Erlang, Haskell or Pascal.


I was expecting to see some example code, or some actual performance metrics to show why data-oriented design is better.

I actually have written a game that was pure functional style with a single giant state object for the game data and it worked well for me. But I'd want to see some evidence for this approach before changing the entire architecture of my game.


This graph made the rounds on Twitter last week and I think encapsulates the answer really well: https://twitter.com/eric81766/status/1407393532562841607

Most games won’t benefit. Most AAA games won’t benefit for their gameplay code and are otherwise very data-oriented already.


ECS isn’t the same thing as data-oriented programming is it? I’ve worked on AAA games and this whole discussion is quite confusing, lol.


Nope! If you want a concrete example I’d recommend looking at the Unity DOTS stuff which is their data-oriented stack and does include an ECS as part of it.


Sigh... This again.

I am going to repeat this for what seems like a ten thousandth time.

"OOP is a tool to solve a particular type of problem. It is your responsibility to know the tool, to understand its strengths and weaknesses and when it is applicable and when it is not. If the tool does not work it is not the tool that is faulty, it is you who are the problem -- by using the tool in a wrong situation or incorrectly."

In particular I detest "we are OOP shop" type of approach. This immediately advertises they have absolutely no idea how to use stuff -- by saying you know only one tool and sure you are going to use it to solve every kind of problem.

Those languages that were supposed to be "everything is an object", like Java? They are now learning that maybe that is not the most sound approach, and are trying to evolve to allow other paradigms under one roof.


This is boilerplate that can be used to defend any idea. First of all, the article is trying to explain a domain for which OOP is not well-suited. Secondly, it’s unhelpful to write off this article with “there are places for which OOP is well-suited” without any specifics about when it is the better approach, especially how it compares and contrasts with other approaches.


What I object to is this "OOP is good / bad" type of approach.

OOP is neither good nor bad. What is good or bad is your selection of technique for the problem at hand.

That's the same misguided discussion as on whether strong typing is good or bad. It is neither. What you want to select depends on the kind of project you are trying to use it for.

Also, just because this template can be used for practically any idea doesn't make it less valid. Greatest laws tend to have universal applicability.


> Also, just because this template can be used for practically any idea doesn't make it less valid. Greatest laws tend to have universal applicability.

I don’t think it’s invalid, but useless. “There’s a time and a place!” is not very enlightening. It doesn’t help anyone understand when to use it nor does it indicate how frequently it is helpful (e.g., in every application or only very rarely?).

> That's the same misguided discussion as on whether strong typing is good or bad. It is neither. What you want to select depends on the kind of project you are trying to use it for.

Isn’t this a straw man? TFA illustrates in detail the problems with OOP in game development (e.g., performance), so presumably for applications that care about those things, then you have an idea about when it should be used. This is far more helpful than “but there’s a time and a place!” protestation or the “but OOP is neither good nor bad!” protestation nor for that matter the “But that’s not true OOP” protestation.

TFA substantially criticized OOP—if we can’t rebut the criticism substantially, instead resorting to pithy sayings, then maybe we should consider the criticism more carefully?


I have not criticized TFA's arguments, only general approach.

You can be correct with details yet completely wrong about overall findings.

Of course abuse of OOP is causing memory layouts that are bad for performance. But you can't jump from this to saying that OOP is bad.

I will give you an example to show how absurd this argument is.

Python is order of magnitude more damaging to app performance than OOP. Surely, that must mean that "Python is bad" and projects should not be using it.

This is an absurd, invalid way of coming to conclusions.

> instead resorting to pithy sayings, then maybe we should consider the criticism more carefully?

I don't care much about insults, certainly not the ones coming from anonymous people.

A lot of contemporary bloggers were not even alive when I started working in development, and I have seen generations of people making the same mistakes.

This is a discussion you don't settle by getting further into the details, but rather by zooming out to understand general truths.


> Python is order of magnitude more damaging to app performance than OOP. Surely, that must mean that "Python is bad" and projects should not be using it. This is absurd, invalid way of coming to conclusions.

It is, but I don't think anyone is coming to this conclusion by way of this line of reasoning. Rather, the conclusion one can arrive at is that Python is bad for performance sensitive applications.

That said, if no one can articulate a class of applications for which OOP is indeed beneficial, but rather only says "but there's a time and a place!" without asserting those times and places specifically, then one can reasonably conclude that OOP likely isn't a very good paradigm.

> Of course abuse of OOP is causing memory layouts that are bad for performance

It seems to me that the performance criticism holds for just about any code base that is discernibly "OOP". It also seems to me that for any term that is so poorly defined as "OOP", someone could defend that term from any criticism by arguing that the criticism applies only to abuses of the term. In other words, a "no true Scotsman" deflection.

> I don't care much about insults, certainly not the ones coming from anonymous people.

I wasn't making an insult, I was claiming/observing that your argument lacks substance, that it's empty rhetoric that can be used to defend any position. I don't mean it as an affront to you personally.

> This is discussion you don't solve by getting more into details but rather zooming out to understand general truths.

I don't think you're observing "general truths" but rather making pithy statements (again, I mean this literally). In particular, you're arguing that there are use cases for OOP without proposing any, and now you're arguing that we don't demonstrate OOP's utility by demonstrating use cases but rather by "zooming out" (presumably to vapid rhetoric). It seems to me that I could say that Bigfoot exists, and on being asked for evidence, I would just argue, "you don't prove Bigfoot's existence by way of evidence" or similar. How can anyone in good faith interpret this as anything other than a dodge?


This article was "originally printed in the September 2009 issue of Game Developer." Any news since then?


Since then, it has definitely become mainstream. For games, it kind of "merged" with Entity-Component-System architecture, which is used by lots of mainstream engines and is kind of popular these days.

IMO, DOD+ECS is not only a good performance hack but also a great architectural pattern for organising game code in general, compared to more traditional techniques.


I think it is popular mainly _because_ of how it can work in game engine editors (Unity, UE, ...).

You can't really do much OOP from the graphical editor but ECS is basically drag and drop


> Since then, it has definitely become mainstream. For games, it kind of "merged" with Entity-Component-System architecture, which is used by lots of mainstream engines and is kind of popular these days.

This isn't really true, it's extremely popular on the Internet but much less so in commercial game development land. Data-oriented design on the other hand is extremely common as things need to run fast. But that's almost exclusively in the underlying engine rather than for gameplay code.

Outside of that, there are a lot of in-progress implementations in popular engines (like DOTS in Unity), lots of very early open-source general game engine projects using them (like Bevy), and loads of open-source implementations, of which I believe one has actually shipped in a commercial game (EnTT in Minecraft Bedrock Edition). The other famous shipped game using an ECS was Overwatch.


For most game developers in the scene these days, DoD + ECS is the traditional technique. The dogma also requires chanting in unison how much better than OOP it is.


As a React developer, I find the prevalence of OOP hard to tolerate when I try to learn Unity. I'm really hoping that the DOTS architecture can make things more enjoyable. I've tried it a bit, but there's a lot to learn, and from what I understand the APIs are not yet considered stable (?).

Meanwhile I've discovered react-three-fiber which feels like the way I want to build 3d stuff.


There's been a few talks: CppCon 2014: Mike Acton "Data-Oriented Design and C++": https://www.youtube.com/watch?v=rX0ItVEVjHc

CppCon 2018: Stoyan Nikolov “OOP Is Dead, Long Live Data-oriented Design”: https://www.youtube.com/watch?v=yy8jQgmhbAU

Building a Data-Oriented Future - Mike Acton [2019]: https://www.youtube.com/watch?v=u8B3j8rqYMw

And a blog/book has been created: https://www.dataorienteddesign.com/


Something I really enjoy about golang was the design decision to make structured data stay separate from functions, but then provide a simple mechanism for defining functions that work in the context of structured data. It feels like a good split between the two desired uses for structured data... On the one hand, it's easy to encapsulate the data for the common use case, and on the other hand I can generally trust that if I need to bit-bash a data structure (i.e. use only a piece of it, or introspect it, or serialize it), I can do that with a minimum of care as to any object-like metadata it might be carrying.


In my opinion, the most important aspect of data-oriented design is to always consider collections instead of so-called "objects".

It is then logical to optimize for access pattern instead of the processing of a single entity.
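A small sketch of the object-versus-collection contrast, with hypothetical `Enemy`/`Enemies` types. The "object" layout is an array of structs; the "collection" layout keeps one array per field (struct of arrays), so a pass that only moves enemies streams through contiguous floats instead of striding over whole records that also carry health and names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Object" view: one record per entity (array of structs).
struct Enemy { float x, y; float health; char name[32]; };

void moveAll(std::vector<Enemy>& enemies, float dx) {
    for (Enemy& e : enemies) e.x += dx;  // strides over whole records to touch one field
}

// "Collection" view: one array per field (struct of arrays).
struct Enemies {
    std::vector<float> x, y, health;
    std::size_t size() const { return x.size(); }
    void add(float px, float py, float hp) { x.push_back(px); y.push_back(py); health.push_back(hp); }
};

void moveAll(Enemies& enemies, float dx) {
    for (float& x : enemies.x) x += dx;  // reads and writes contiguous floats only
}
```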


Can someone please point me to an example of a well-designed (modular, maintainable) FP project (e.g. on GitHub)?

I've only had negative experiences so far and I can't imagine how to use FP in a modular way (high cohesion, loose coupling) so I'd like some examples to look at. A simple game would be nice.


Discussed at the time:

Data-Oriented Design (Why You Might Be Shooting Yourself in The Foot With OOP) - https://news.ycombinator.com/item?id=1004569 - Dec 2009 (28 comments)


What is a “flat” codebase, if you don't mind my asking? Is it the opposite of hadouken-style nested code?


Somewhat related: it's time to toss out file trees as our primary module management technique:

https://news.ycombinator.com/item?id=25347043

We've outgrown file trees.


One thing that is interesting about this, is that people sometimes end up building an (incomplete) implementation of relational algebra to achieve this, where any given system in the game logic pipeline might join over multiple components.


That's what made ECS click for me: it's an in-memory, natively typed database with a feedback loop.
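To make the "relational algebra" remark concrete, here is a sketch of a system as an inner join over two component tables. It assumes each table is a vector of (entity, value) rows kept sorted by entity id, which allows a sort-merge join; real ECS libraries use their own storage schemes (sparse sets, archetypes), so treat `Table` and `join` as hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Entity = std::uint32_t;

// A component "table": rows of (entity, value), kept sorted by entity id.
template <typename T>
using Table = std::vector<std::pair<Entity, T>>;

// Inner join of two component tables on entity id, as a sort-merge join:
// calls f(entity, a, b) for every entity present in both tables.
template <typename A, typename B, typename F>
void join(Table<A>& as, Table<B>& bs, F f) {
    std::size_t i = 0, j = 0;
    while (i < as.size() && j < bs.size()) {
        if (as[i].first < bs[j].first) ++i;
        else if (bs[j].first < as[i].first) ++j;
        else { f(as[i].first, as[i].second, bs[j].second); ++i; ++j; }
    }
}
```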


I've been "shooting myself in the foot" for 40 years already too, thank you.

OOP, FP, Data Oriented, insert your favorite here: these are just friggin' tools in the arsenal, and all are fine when used appropriately. One does not negate the other.

You just do not use the approach used when writing firmware for low power microcontroller for writing Solidworks. And it is ok to mix and match if type of software to be written benefits.

There are no silver bullets in this world, and one must learn what to use and when. Trying to convince people to stick to just one is a great disservice and looks more like religious propaganda: if I do it this way, others should do the same regardless.

Oh and btw OOP can be cache friendly as well. Nobody forces one to organize internal data representation in any particular way.


> Oh and btw OOP can be cache friendly as well. Nobody forces one to organize internal data representation in any particular way.

This is part of the argument of the article, that although nothing forces you to, there is a culture that strongly suggests a way of doing things.


>"there is a culture that strongly suggests a way of doing things"

Sorry, but I would not take for granted "strong suggestions" coming from prophets with vested interests, or from a culture rooted in general ignorance / lack of knowledge.


Isn’t the middle ground writing custom allocators? The allocators allocate blocks contiguously, and developers keep developing with standard OOP?


How you allocate data is only half of the solution. The other half is also organising access patterns.

In games, for example, DOD requires you to break up a hypothetical Update method into multiple methods that get called at different times (first do all the collision for all objects, then do all movement for all objects, then all rendering, etc). If you skip this step, you get the neatly organised memory but the same random access as before.

Changing the data structure to the way presented in the article without changing the way you access it might even degrade your performance compared to what it was with normal OOP memory organisation.
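A minimal Python sketch of breaking a per-entity `Update` into phases that each run over every entity before the next phase starts. `Bodies`, `collision_phase`, `movement_phase` and `tick` are hypothetical names, and the one-dimensional "wall" collision is a stand-in for real game logic (rendering is left out). Each phase reads only the field arrays it needs, in index order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bodies:
    # Struct-of-arrays storage: index i is entity i in every list.
    x: List[float] = field(default_factory=list)
    vx: List[float] = field(default_factory=list)
    hit_wall: List[bool] = field(default_factory=list)


def collision_phase(b: Bodies, wall_x: float) -> None:
    # Phase 1: collision for all entities, touching only x and vx.
    for i in range(len(b.x)):
        b.hit_wall[i] = b.x[i] + b.vx[i] > wall_x


def movement_phase(b: Bodies) -> None:
    # Phase 2: movement for all entities, using the flags computed in phase 1.
    for i in range(len(b.x)):
        if b.hit_wall[i]:
            b.vx[i] = -b.vx[i]
        else:
            b.x[i] += b.vx[i]


def tick(b: Bodies, wall_x: float) -> None:
    # One frame: all collision, then all movement (a render phase would follow).
    collision_phase(b, wall_x)
    movement_phase(b)
```

If you kept a single per-entity `update()` method and merely stored the data like this, each call would still hop between all the arrays for one entity, which is the "neatly organised memory but the same random access" case described above.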


To echo your point: this is why ECS isn’t data-oriented by default. Merely stuffing component data into flat arrays is not enough. You need, as far as possible, to have those arrays organised to fit access patterns, which turns out to be quite tricky. For example, organising arrays by entity archetype (the shape of an entity, defined by its components) or some other form of grouping.


I've never heard this, but it seems so obvious now!

Just to be sure, can I get an example?

Is it like: we want to group a spaceship's fuel tank and engine together, because they'll be accessed at the same time?


This is going to be a long post. :)

This is partly why a lot of ECS demos have a lot of homogeneous elements (they share all components in common). For example, particle systems have long been written in a data-oriented manner when running on the CPU. So if you implement it in the ECS style you can just run through the arrays in order and it's all good. Or Unity's city sim example. But games tend to have much more heterogeneous entities (they share few components in common).

The most obvious example I can think of to dispel the myth of ECS's inherent DoDness is an ECS wherein each component storage is a linked list with each element individually allocated. Even iterating through the homogeneous entity example is likely to be extremely slow in comparison to flat arrays. So there is nothing about the pattern that demands it be implemented in a data-oriented manner.

But back to a more heterogeneous example. I'm going to try to explain it generally, because I think a worked version would be enormous and might cloud things more. Typically component storage is indexed by the entity ID: you want to look up the component in the storage associated with a particular ID. If all your storages are flat arrays where the entity ID is just an index into the array, then the more heterogeneous your entities, the more gaps you will have to iterate over and, correspondingly, the more memory your game will take up. This isn't great for cache locality or memory usage, and we have to iterate over every entity for all systems to find the valid ones.
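The naive layout described above can be sketched in Python, assuming one slot per possible entity id and `None` marking "this entity has no such component". `SparseStorage` and `entities_with` are hypothetical names. Every system has to walk every id and skip the gaps.

```python
from typing import List, Optional


class SparseStorage:
    """One slot per possible entity id; None means the entity lacks this component."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[float]] = [None] * capacity

    def set(self, entity: int, value: float) -> None:
        self.slots[entity] = value

    def get(self, entity: int) -> Optional[float]:
        return self.slots[entity]


def entities_with(*storages: SparseStorage) -> List[int]:
    # Every id is visited, including all the gaps, for every system that runs.
    capacity = len(storages[0].slots)
    return [e for e in range(capacity) if all(s.slots[e] is not None for s in storages)]
```

Lookup by id is a plain index, but memory and iteration cost scale with the total number of entities rather than with the number that actually have the components.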

So the next step uses a dense array plus a secondary backing array that is indexed by the entity ID. That way we keep our components packed nicely but can still look them up easily. Instead of iterating over all the entities for every system, we can find the shortest component storage among the components the system uses, iterate directly over that, and look up the other components in their storages by the current entity ID. Now we potentially iterate over many fewer entities, but we essentially do a random lookup into the other component storages for each one. So we're introducing cache misses in exchange for having fewer things to iterate over.
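Here is a minimal Python sketch of the dense-plus-backing-array storage (often called a sparse set). `SparseSet` and `query` are hypothetical names, and the list with `-1` for "absent" stands in for the backing array indexed by entity ID. Removal swaps the last element into the hole so the dense array stays packed, and `query` drives iteration from the shortest storage and does the per-entity lookups into the others.

```python
from typing import Any, Iterator, List, Tuple


class SparseSet:
    def __init__(self) -> None:
        self.dense: List[Any] = []     # packed component values
        self.entities: List[int] = []  # dense index -> entity id
        self.sparse: List[int] = []    # entity id -> dense index, or -1 if absent

    def __contains__(self, entity: int) -> bool:
        return entity < len(self.sparse) and self.sparse[entity] != -1

    def __len__(self) -> int:
        return len(self.dense)

    def add(self, entity: int, value: Any) -> None:
        if entity >= len(self.sparse):
            self.sparse.extend([-1] * (entity + 1 - len(self.sparse)))
        if self.sparse[entity] != -1:
            self.dense[self.sparse[entity]] = value  # overwrite existing component
            return
        self.sparse[entity] = len(self.dense)
        self.dense.append(value)
        self.entities.append(entity)

    def get(self, entity: int) -> Any:
        return self.dense[self.sparse[entity]]

    def remove(self, entity: int) -> None:
        # Swap-remove: move the last element into the hole to keep `dense` packed.
        idx, last = self.sparse[entity], len(self.dense) - 1
        self.dense[idx] = self.dense[last]
        self.entities[idx] = self.entities[last]
        self.sparse[self.entities[idx]] = idx
        self.dense.pop()
        self.entities.pop()
        self.sparse[entity] = -1


def query(*storages: SparseSet) -> Iterator[Tuple[int, tuple]]:
    # Iterate the shortest storage; the other storages are random lookups by id.
    smallest = min(storages, key=len)
    for entity in list(smallest.entities):
        if all(entity in s for s in storages):
            yield entity, tuple(s.get(entity) for s in storages)
```

Iteration order follows the dense array of the shortest storage, not entity id order, which is the trade-off described above: fewer items to visit, more scattered lookups.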

So what we want is the benefit of blazing through arrays without the downside of them being pretty sparse, while ideally minimizing cache misses. That is why the concept of an Archetype was invented. We keep our components in flat arrays, but, crucially, we change our storage so that instead of keeping one flat array per component, we keep separate component storages for each archetype of entity that exists right now.

Going from:

AAAAAAAAAA BBBBBBBBBB CCCCCCCCCC

To:

(ABC) A B C

(AB) AAA BBB

(AC) AAAAA CCCCC

(C) CCCCC

If we have a system that just iterates C's it can find all the archetype storages and iterate straight through the C array for them one by one. So ideally we only pay a cache miss when we change archetype, have good cache locality and are iterating the minimum set. Similarly a system that uses components A and C will only iterate the archetype storage of ABC and AC and blaze straight through the A and C arrays of each. Same deal.

This comes at a cost of making adding and removing components from an entity more expensive.
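The archetype layout and its cost can be sketched together in Python. `Archetype`, `World`, `spawn`, `add_component` and `query` are hypothetical names, and components are identified by strings such as `"A"` and `"C"` to mirror the diagram. Each archetype holds one column per component, queries visit only archetypes whose component set is a superset of the request, and adding a component moves every one of the entity's existing components into a different archetype.

```python
from typing import Any, Dict, FrozenSet, Iterator, List, Tuple


class Archetype:
    def __init__(self, components: FrozenSet[str]) -> None:
        self.components = components
        self.entities: List[int] = []
        self.columns: Dict[str, List[Any]] = {c: [] for c in components}


class World:
    def __init__(self) -> None:
        self.archetypes: Dict[FrozenSet[str], Archetype] = {}
        self.location: Dict[int, Tuple[Archetype, int]] = {}  # entity -> (archetype, row)
        self.next_id = 0

    def _archetype(self, names) -> Archetype:
        key = frozenset(names)
        if key not in self.archetypes:
            self.archetypes[key] = Archetype(key)
        return self.archetypes[key]

    def _insert(self, entity: int, components: Dict[str, Any]) -> None:
        arch = self._archetype(components)
        arch.entities.append(entity)
        for name, value in components.items():
            arch.columns[name].append(value)
        self.location[entity] = (arch, len(arch.entities) - 1)

    def _take(self, entity: int) -> Dict[str, Any]:
        # Swap-remove the entity's row from every column and return its components.
        arch, row = self.location.pop(entity)
        last = len(arch.entities) - 1
        values = {}
        for name, col in arch.columns.items():
            values[name] = col[row]
            col[row] = col[last]
            col.pop()
        moved = arch.entities[last]
        arch.entities[row] = moved
        arch.entities.pop()
        if moved != entity:
            self.location[moved] = (arch, row)
        return values

    def spawn(self, **components: Any) -> int:
        entity, self.next_id = self.next_id, self.next_id + 1
        self._insert(entity, components)
        return entity

    def add_component(self, entity: int, name: str, value: Any) -> None:
        # The expensive part: all existing components are copied to another archetype.
        values = self._take(entity)
        values[name] = value
        self._insert(entity, values)

    def query(self, *names: str) -> Iterator[Tuple[int, tuple]]:
        # Only matching archetypes are visited, each one column by column, in order.
        wanted = set(names)
        for arch in self.archetypes.values():
            if wanted <= arch.components:
                cols = [arch.columns[n] for n in names]
                for row, entity in enumerate(arch.entities):
                    yield entity, tuple(col[row] for col in cols)
```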

We're also ignoring interacting with other components or the world and how that might work. For example we might want to do damage to another entity entirely. Or we might want to look up the properties of the piece of ground we're stood on. So there is a whole other layer of places we can ruin all this good work by wanting to access stuff pretty randomly. Relationships in games tend to be spatial and stuff tends to move around so it's hard to see a general case solution to the problem.

Then there are other axes to think about, like ease of creating the game, how flexible it is to change the game, iteration speed, designer friendliness and so on. Rarely, IME, has the gameplay code itself been the bottleneck outside of stupid mistakes.

In games this level of optimization is really great when you do have a big, mostly homogeneous set of things. Then it's well worth the time to structure your data for efficient memory access. City sims, games like Factorio and so on are classic examples.


Thank you for the beautiful explanation! I understand now, and am kind of excited to try this out in my next gamejam.


Before you do, I’d add that implementing this isn’t necessarily straightforward and that your game might not benefit from anything more snazzy than sparse arrays. It’s definitely intellectually satisfying though.

For example, if your game has only tens or hundreds of entities, a sparse array approach might work fine even if they’re very heterogeneous. There simply isn’t enough to iterate over to matter. That said, the overall performance difference between the two at that level is unlikely to be large, but the archetype approach introduces a lot of complexity.


Imagine if compilers could detect a use case for custom allocation via a flag and speed up the OOP mess.


What I wish to know is how to apply this in a normal CRUD-like scenario against a DB.

Will it truly help for, like, a shopping cart app?


I always hated OOP. Thank god for this.


Me too. Lots of superfluous concepts called "abstractions" to make simple things complex.

Interestingly, all discussions about other paradigms on HN very quickly end in "Oh, I can solve this somehow with [put your favorite design pattern here]" (big deal, both paradigms are obviously Turing complete). For me, "Design Pattern" was always short for "Complex workaround for a problem you would not have if you didn't use the object-oriented paradigm".

But more generally, OOP is for developers whose mental model of the world is categorizing things into object hierarchies. For them it is the most intuitive approach to model the world.

For me this is as counter-intuitive as it can be. My mental model of the world just does not work like that.


Looks like Fortran programmers have been doing data-oriented design since forever.


[flagged]


It's a meme from older media for instance "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb"


Do you know where that tradition in older media originated from? I always wondered why books would have two titles.


There's some good information about that here:

https://en.wikipedia.org/wiki/Subtitle_(titling)


You have an article that falls under some broad category and an issue that the article aims to address. There are not too many ways to put these together.

One of them is to use an established, recognisable pattern that makes an article look like it's from a newspaper or a book. I'm no expert in this area, but my suspicion is that recognisable title patterns bring higher viewership (or at least writers think so).

Another thing with recognisable patterns is that they become more firmly imprinted in the back of your head the more you see them. Perhaps it's not the views the author has pursued, but the mere use of whatever seemingly suitable pattern popped into his head first.

My vomiting reflex kicks in when I see repeating title patterns too frequently, too. They give the impression that the articles are shallow. It's up to you to use this as an indicator. I'm, for instance, rather fine with occasional false positives.


I think it's user submitted vs mods editing the title


I see I misunderstood the question


Why are we always so dogmatic?

Struct of Arrays is a great pattern if you need the speed and have enough items to make it worth it. Otherwise just use OOP. Are you going to make an array of singleton data? No, of course not.

>How many places in the code did you have only one of something?

One socket, one host, one pool, and assuming a single player game, there are a lot of ones...

Just relax and use the strengths of both patterns.


> Just relax and use the strengths of both patterns.

That's exactly what he says in the conclusion...


Agreed. I'm saying we as in the HN comments and to some degree programmers in general.


I still firmly believe the compiler should transpose your code from array-of-structs to struct-of-arrays, and optimise for temporal and spatial cache utilisation. After all, it is the tool that knows every CPU arch on the planet and can run millions of code transformations per second.

This could be achieved with C/C++ compiler directives, followed by some PGO (profile-guided optimisation) passes.


Rearranging memory accesses is very difficult due to usually not having enough info to do the alias analysis. Also, C/C++ compilers really don't know that much about their target architectures, especially not their memory performance - they barely do scheduling anymore since it's better to let the CPU reorder it.


Transposing the data structures is trivial, but transposing the code that access them might be impossible: inside your methods you're still doing random access. You'd have to not only break your "Update" method into many, but also call those broken parts separately.


Many of the claimed advantages of "data-oriented design" and especially drawbacks of OOP in this article have nothing to do with data-oriented design or OOP... They are symptoms of bad design.

For instance, good design, and in fact a key concept of OOP, will get you modularity. Especially "When you write code specifically to transform data, you end up with small functions, with very few dependencies on other parts of the code" is the objective of good OO design! Likewise for testing, I don't see how OOP is a problem if you've designed your system well and kept objects nicely encapsulated (likewise a badly-designed system will always be problematic).

Cache utilization is neither here nor there. It all boils down to memory allocation and, again, choice of objects.

Do data-oriented design to work out your dataflow, and then apply OO principles. A key issue is always to choose your objects 'wisely' and a data-oriented analysis will help towards that goal.


This is one reason that I’m always skeptical of dogma. My only tool is a hammer…

But that said, many new techniques (I clearly remember when OOP was the “new paradigm”) can offer radical improvements to the status quo.

I use OOP all the time, but I also don’t do data processing engines. Most of what I do is GUI and/or device control/communication. For these types of things, OOP is still very much a mainstay, and probably always will be.

I won’t sniff at DDD, but I get really tired of being lectured about the way I do things, just because it doesn’t involve the “buzzword du jour.”

BTW: I remember a guy at a conference, telling me about exactly this problem with OOP, in the late 1980s. That was back when OOP was a relatively new kid on the block, and he was arguing against using it.

He was correct, but it did not apply, in my use case. The massive improvement in complexity management and quality, offered by OOP, far outstripped any speed advantages of classic procedural programming (which is what he was arguing for).


> This is one reason that I’m always skeptical of dogma

Yes, this is because too often people follow buzzwords instead of trying to understand the concepts.

The issues OO design aims to solve are still valid and still the issue people want to solve. Encapsulation, single responsibility principle, even the concept of object/class (i.e. interacting with data through a set of specific methods) are good design principles, there is no reason to throw them away.

It makes sense to use data-oriented design in applications that are data processing intensive, and this is nothing new, but that is orthogonal with using OO principles.

Likewise, immutable data can have benefits. This does not mean the concepts above are no longer valid. It means using them with immutable data.


Personally, I have come to the conclusion that object-level encapsulation is not a good design principle, but rather an antipattern because it complects data with code [1].

OOP tries to manage global mutable state by partitioning and encapsulating it into objects. However, the only way to prevent two unrelated objects from manipulating the same part of the state is to create a tree, i.e., a strict hierarchy between all objects in the system. This combination of data and code in a strict hierarchy means that all data access patterns are baked into this dependency tree, making later, unforeseen changes to the software extremely difficult without taking shortcuts in the dependency tree or having to refactor the entire application.

If, on the other hand, you treat your data as just data, preferably flat and immutable, and keep the code that acts on it separate, then you won't run into this problem. You will be able to change the parts of the code that act on the data structures independently.

[1] See Rich Hickey, Simple Made Easy: https://www.youtube.com/watch?v=oytL881p-nQ


One of the aims of OO design is specifically to make future and unforeseen changes easier by hiding data and enforcing interfaces.

If you treat your data "as just data" with free for all access you revert to the mess that led to the emergence of OO design. This is much worse for maintainability.

> keep the code that acts on it separate, then you won't run into this problem. You will be able to change the parts of the code that act on the data structures independently

That's exactly what should happen with OO (that's one of the purposes of encapsulation).


OO aims at making future and unforeseen changes easier, I just don't think it achieves its aim.

I am also not sure that a discussion on HN is the best medium to discuss the problem in depth. Nevertheless, I will try to give a practical example:

Suppose you have parts (part_id, description, quantity_on_hand) and suppliers (supplier_id, name). Also, each part is manufactured by multiple suppliers and each supplier manufactures multiple parts. How do you model this? Do you let parts reference suppliers or suppliers reference parts, or do both reference each other? Or do you define a third class PartsSuppliers that manages the references? There is no formal method in OO that tells you what is a sound design choice and what is not. Let's say you chose the latter option (PartsSuppliers) and you need to write a method that computes statistics about the parts. Where do you place this method? You need to add it to PartsSuppliers, because no one else is allowed to have private references to Parts, otherwise you would break PartsSuppliers' encapsulation. No matter what design decisions you make in OOP, you will always have to make a tradeoff between encapsulation of state and extensibility.
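For contrast, here is a small Python sketch of the relational treatment of the same example, assuming flat, immutable records and a separate relation of `(part_id, supplier_id)` pairs for the many-to-many relationship. `total_quantity` and `suppliers_of` are hypothetical helper names. Code that computes statistics about parts needs only the parts relation and does not have to live inside any class that owns the references.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Part:
    part_id: int
    description: str
    quantity_on_hand: int


@dataclass(frozen=True)
class Supplier:
    supplier_id: int
    name: str


# The many-to-many relationship is just another flat relation of id pairs.
Link = Tuple[int, int]  # (part_id, supplier_id)


def total_quantity(parts: Iterable[Part]) -> int:
    # Statistics over parts read only the parts relation.
    return sum(p.quantity_on_hand for p in parts)


def suppliers_of(part_id: int, suppliers: List[Supplier], links: Set[Link]) -> List[Supplier]:
    # The association is resolved at query time by joining on ids.
    ids = {s for p, s in links if p == part_id}
    return [s for s in suppliers if s.supplier_id in ids]
```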


> There is no formal method in OO that tells you what is a sound design choice and what is not.

System architecture is hard. OO design is a set of principles that helps you design a system better by making it easier to maintain and modify. It does not tell you how you should model your objects. To come up with a good model is usually not straightforward.

In fact your example is not an issue specific to OO design. This is a general issue of relationships ('many to many') and there are a number of design principles to help (see database design principles as that's a typical scenario in databases).

> Where do you place this method? You need to add it to PartsSuppliers, because no one else is allowed to have private references to Parts

That's not true, but as you say, this is too vast a discussion.


I try to put it another way, because it is a much more general problem of OOP: If you have an object A that references an object B and object B references C and A wants to know something about C, we always have to go through B, regardless of whether we are actually interested in B or not. This is because C is part of the private state of B, and if A had a direct reference to C it could mutate C and would therefore break B's encapsulation.

> In fact your example is not an issue specific to OO design.

This is a specific problem with nested data structures. OO design leads to nested data structures to allow encapsulation. The relational answer to this problem would be to break everything up into flat sets of tuples that can be joined as needed, but if everything is just flat data, you can't have encapsulation.


But this is not a good example.

Either this should be modelled so that A can directly reference C to start with, or indeed A has to go through B but can do so to get a reference to C (this does not break encapsulation in itself, it depends on the specific relationships)

It's impossible to avoid nested data structures because these are simply the natural consequence of the system's complexity. For instance, a book is made of sheets, pages, chapters, sentences, illustrations, etc. entities with nested relationships.


> Either this should be modelled so that A can directly reference C to start with, or indeed A has to go through B but can do so to get a reference to C (this does not break encapsulation in itself, it depends on the specific relationships)

If A has a reference to C, B cannot, and if B has a reference to C, A cannot. To keep encapsulation intact, references between objects must form a tree. OOP depends on the partitioning of mutable state for maintainability reasons. There is no other way to keep this partitioning intact than to have a tree of objects.

> It's impossible to avoid nested data structures because these are simply the natural consequence of the system's complexity.

This is a purely conceptual view. But you don't have to query it that way at a logical level, or organize it that way at a physical level via memory references. HN comments, for example, are conceptually contained in their parent comments and also conceptually contained in the users who wrote them. In a relational database, on the other hand, the tuples would be contained only in their relations/tables. A query could then associate comments with users at runtime, but comments can be queried on their own, since they are not encapsulated in anything. In OO design, on the other hand, all access paths are baked into the object trees, making later, unforeseen changes to the software extremely difficult without taking shortcuts in the tree and thus destroying the encapsulation.
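The comments example can be sketched in Python, assuming comments and users are flat, immutable tuples stored in their own collections. The field names and helpers (`comments_containing`, `comments_with_author`, `replies_to`) are hypothetical. Comments can be queried on their own, joined with users at runtime, or grouped by parent, without any of those access paths being baked into an object tree.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class User:
    user_id: int
    name: str


@dataclass(frozen=True)
class Comment:
    comment_id: int
    parent_id: Optional[int]  # None for a top-level comment
    author_id: int
    text: str


def comments_containing(comments: List[Comment], word: str) -> List[Comment]:
    # Comments queried on their own: no parent comment or user is traversed.
    return [c for c in comments if word in c.text]


def comments_with_author(comments: List[Comment], users: List[User]) -> List[Tuple[int, str]]:
    # The comment -> user association is made at query time by joining on author_id.
    by_id = {u.user_id: u for u in users}
    return [(c.comment_id, by_id[c.author_id].name) for c in comments]


def replies_to(comments: List[Comment], parent_id: int) -> List[Comment]:
    # The thread structure is just another query over the same flat collection.
    return [c for c in comments if c.parent_id == parent_id]
```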


> If A has a reference to C, B cannot, and if B has a reference to C, A cannot.

That's not what encapsulation means, and again, it's up to you to come up with a model that makes sense.

> In a relational database, on the other hand, the tuples would be contained only in their relations/tables. A query could then associate comments with users at runtime, but comments can be queried on their own

Sure but they are still 'nested' by way of relationship. Of course you don't have to physically nest structures within structures. Both make valid OO implementations. Nothing in OO prevents you from querying comments on their own.


> That's not what encapsulation means

Then what does encapsulation mean? If encapsulation means that an object protects all of its private state behind methods, then another object must not be able to manipulate that private state. So if an object A holds a reference to another object B, then B's state becomes part of A's state and only A must be able to do things that change B.


OOP only strives to encapsulate private state (though don’t forget that overly strict rules never make sense. In the end, every design pattern needs an escape hatch). All the methods of the object should modify this state in a way that upholds the class invariants. In your example, B can easily be part of the “public state” of the object, or we can make even more gradual distinctions, like only B’s identity being relevant to A’s state. For example, if A only needs B as an optional cache, its modification or even removal will not be a problem.


This concept of "public state" makes no sense to me. If the behavior of A, i.e. the implementation of A, depends on B, then changing the state of B will also change the behavior of A as a side effect. Otherwise, if B were "public state", then B would be nothing more than a glorified global variable and you would have exactly the free for all access that OOP is supposed to prevent.

So the only way to prevent this is that each object must have only one parent in the object tree, which coordinates all modifications to that object.

Regarding escape hatches: Yes, it's great to have them, but it's not so great when they are used all the time, either by accident because it's so easy to break OO rules, or on purpose because the object tree gets in the way when new requirements need to be implemented. Let's be honest here: The more mature an OO-designed project becomes, the more shortcuts there will be and the cross-connections aka "escape hatches" will turn the object tree into an object spaghetti.


You're not wrong about what you describe.

But maintaining relational state between things is really annoying in general. You've mentioned one of the trickiest jobs there is, it's really hard even with purpose built databases.

Doing it in an imperative environment is just hard.


> However, the only way to prevent two unrelated objects from manipulating the same part of the state is to create a tree, i.e., a strict hierarchy between all objects in the system.

I see this claim from time to time, and perhaps it's true in typical Java, C++ or even Python, but I don't think I've ever seen anything close to a proof of it. I suspect it is false, since an interface boundary that leaks no state is possible to implement and can give freedom to the designer regarding internal state.

> If, on the other hand, you treat your data as just data, preferably flat and immutable, and keep the code that acts on it separate, then you won't run into this problem. You will be able to change the parts of the code that act on the data structures independently.

This claim is true only insofar as the underlying state being tracked doesn't change much through the lifecycle of the program in development, which is a reasonably good assumption for a game, but not for most other applications.

The problem is that "data" by definition does not capture all of its own invariants. I have personally witnessed long-lived codebases suffer from brittleness when multiple areas of code must read from and manipulate the same underlying data structure. Inevitably, some programmer on the team forgets one of the invariants, since they aren't specified in code (which would make it OO), and then we have a production bug. The solution is then usually to add another "if" statement somewhere. The solution thus makes the code harder to understand and thereby increases the likelihood of this kind of bug related to this particular structure recurring.


> [...] an interface boundary that leaks no state is possible to implement and can give freedom to the designer regarding internal state.

Your suspicion is justified. As long as the objects only send immutable messages to each other, encapsulation remains intact. But then you have something closer to an actor system than what people typically think of when they say OOP. Once you pass references to mutable objects, all bets are off.

The ability to specify which states are allowed and which are not, is not a special feature of OOP. In the functional and relational paradigms, there are types and constraints that specify in a declarative way what states should be possible. Types and constraints are enforced by the runtime and are not based on (leaky) encapsulation.
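A minimal Python sketch of the "constraints instead of encapsulation" idea, assuming immutable data checked at construction time. `Account` and `withdraw` are hypothetical names. `__post_init__` runs on every construction, including the copies made by `dataclasses.replace`, so an invalid state cannot be built, and `frozen=True` prevents mutation afterwards. In Python this is a runtime check; languages with richer type systems can move some such constraints to compile time.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    owner: str
    balance: int

    def __post_init__(self) -> None:
        # Constraint checked on every construction, including via replace().
        if self.balance < 0:
            raise ValueError("balance must be non-negative")


def withdraw(account: Account, amount: int) -> Account:
    # A free function over immutable data: it returns a new value and never mutates.
    return replace(account, balance=account.balance - amount)
```

Any number of unrelated functions can read an `Account` without being able to break its invariant, because the only way to produce a changed account is to construct a new, re-validated one.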


> But then you have something closer to an actor system than what people typically think of when they say OOP.

This is fair. I tend to think of OO per Alan Kay's description, i.e. message passing, encapsulation and extreme late-binding.[0] That does look closer to actors or even FP than it does the so-called OOP languages, which is what most might think.

> The ability to specify which states are allowed and which are not, is not a special feature of OOP. In the functional and relational paradigms, there are types and constraints that specify in a declarative way what states should be possible.

Types and constraints are still code that is tightly bound to the data it describes (since in a real sense, they are executed, either at compile time or runtime, and the most powerful type systems are Turing complete). The data-oriented advocates sometimes forget this when decrying code tightly coupled to data. As types and constraints are added to specify only the proper behavior, the data gains more signal and less noise, and thus becomes more like information and less like data.

[0]http://www.purl.org/stefan_ram/pub/doc_kay_oop_en


Agreed, but, for me, I am not really a data programmer. I tend to work in state and identity (classic UI and communications). For these, that characteristic is actually an advantage.

Nowadays, centralized data processing is the big deal for software engineering (as it was, fifty years ago). I'm just a humble app developer, and I'll license stuff that real data programmers do, if I need it.


Exactly, the biggest single advantage of OO is the ability to create and manipulate insanely complicated data structures simply and easily.

Because it's a way to define a whole bunch of methods AND assign all aspects of thinking about the state of the data to those methods to the author... it's a brilliant tool for designing some insane complexity. Which is exactly what you need for MVC paradigms.

There are multiple tools and they are for different jobs, imagine that.


I've programmed in OOP for many years and recently have been using a functional language at work. I don't see a major advantage in coupling data and methods, because something similar can be accomplished through how you organize the code.

If you put all the functions in one file and the data structures in another file, things become quite a bit more flexible to extension and composition.


Probably true for data processing programming.

The thing about UI programming (and a lot of comms programming, too), is that the ability to completely abstract all the factors relevant to an entity is pretty important.

Good UI is very complicated. As someone above pointed out, it can be insanely complicated. Comms programming is basically the same thing.

It's pretty much a requirement to be able to abstract all the particulars of an interface behind an identity/state wall. I used to write procedural programs that ran an entire GUI (and device control), and OO was like a gift from Santa. Applications that used to take weeks, and were bug farms, suddenly took days, and hardly had any bugs.

We can, arguably, do without inheritance (the part of OO that everyone loses their bottle over), but that encapsulation is pretty damn important. This means that we can add a button to an interface with a few drags on an interface description screen, and a copy/paste (if we're eschewing inheritance) of a simple class or struct definition.

I like inheritance, though. It helps me to refactor the code down to a fairly manageable (and debuggable) scope, speeds up development, and helps me to keep quality very high.

It's like everything. We need to be good at what we do, and have a good command of our tools; whatever they are. FP is certainly not for the faint of heart.

The tech industry is absolutely obsessed with taking almost completely unskilled, junior, developers, and getting them to produce release-quality code, unassisted by experienced architects. Vast resources have gone into trying to make this work. It is not new. It has been going on for as long as I have been in the field.

And it always ends in tears.


One of the cool things in C is the space it leaves for the compiler to perform optimisations.

Beyond what is observable, C compilers have almost free rein to do whatever they want, so long as the observable things are kept the same (output/memory address values, etc...).

Data-oriented design was a smart-kid anti-pattern thing back then, but I wonder if compilers have evolved enough that it is useless nowadays? By all measures it is a useless optimization, since it can be hidden behind the observable object properties (i.e. who cares if it is an array of structs or a struct of arrays, so long as the vec3 is still a { x, y, z }?)


One of the core principles (truths?) behind data oriented design is that compilers can never and will never be able to compensate for unoptimized data layouts.


This is true today, but I don't see that it must always be so. I'm not a compiler researcher, but isn't there a realistic chance that good things could happen if this became a major research focus?

Somewhat related: I believe Jonathan Blow's currently unreleased Jai programming language is meant to do some interesting new things in this area, enabling (or rather, greatly simplifying) automatic transformations relating to memory layouts.


Jon's language is not about the compiler being smart and fixing your code. It's about letting the programmer write good code without the compiler getting in the way.

I'm not aware of anything in his language that would make it easy for the compiler to automatically decide a better memory layout.

It's rather the opposite: preventing the compiler from doing such things.

For example, struct layout is always the order that you declare. There's no "feature" in his language that would re-order struct fields to minimize padding or anything like that.


All good points. To put that another way then, it's aiming to introduce language features which make it much easier for the programmer to transform their data layouts.

gnuvince mentioned that the Zig language seems to be making inroads here too: https://ziglang.org/documentation/master/std/#std;MultiArray...


The important point about Zig's implementation is that this is all userland code. No extra syntax was added to the language to enable it, which is a great display of how flexible comptime can be.


Very cool, Zig must have some powerful compile-time introspection facilities.

> this is all userland code

I imagine you meant library code here?


I meant to say code that isn't part of the compiler or special cased in any way. So, yes, code that could very well be a normal third party library.


The compiler can automate some things (take a look at Zig's MultiArrayList for example), but ultimately the programmer must understand the data in their application, how it needs to be transformed, and how to lay it out to be processed efficiently by the hardware.

The compiler is a tool, you can set the field to help it do its job, but it's no magic wand, it cannot think and understand your application: only you can do that.


> take a look at Zig's MultiArrayList for example

Thanks, I'd not seen that before, very neat. [0] I thought it was just going to be an option to be row-major or column-major, but no: Instead of storing a single list of items, MultiArrayList stores separate lists for each field of the struct.

Could this be done in C++, perhaps with template metaprogramming?
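One possible answer is yes, at least approximately. Below is a minimal C++17 sketch using variadic templates and fold expressions. MultiArray and its member functions are invented here. This is not a standard library type. Unlike Zig's version it takes a list of field types rather than introspecting an existing struct, because standard C++ up to C++23 has no direct way to enumerate a struct's members.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// A minimal struct-of-arrays container in the spirit of Zig's MultiArrayList.
// Each field type gets its own std::vector. Requires C++17 (fold expressions).
template <typename... Fields>
class MultiArray {
    static_assert(sizeof...(Fields) > 0, "MultiArray needs at least one field");

public:
    // Append one logical element by pushing each field onto its own column.
    void push_back(Fields... values) {
        push_impl(std::index_sequence_for<Fields...>{}, std::move(values)...);
    }

    // Contiguous storage for field I, e.g. for a tight loop over one field.
    template <std::size_t I>
    auto& column() { return std::get<I>(columns_); }

    template <std::size_t I>
    const auto& column() const { return std::get<I>(columns_); }

    // Reassemble one logical element as a tuple of copies.
    std::tuple<Fields...> get(std::size_t row) const {
        return get_impl(row, std::index_sequence_for<Fields...>{});
    }

    std::size_t size() const { return std::get<0>(columns_).size(); }

private:
    template <std::size_t... Is>
    void push_impl(std::index_sequence<Is...>, Fields... values) {
        // Fold over the comma operator: column Is receives value Is.
        (std::get<Is>(columns_).push_back(std::move(values)), ...);
    }

    template <std::size_t... Is>
    std::tuple<Fields...> get_impl(std::size_t row, std::index_sequence<Is...>) const {
        return std::tuple<Fields...>(std::get<Is>(columns_)[row]...);
    }

    std::tuple<std::vector<Fields>...> columns_;
};
```

With a MultiArray<int, float>, a loop over column<1>() walks one contiguous array of floats, which is the whole point of the layout.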

> The compiler is a tool, you can set the field to help it do its job, but it's no magic wand

A paraphrasing of a familiar Mike Acton quote. Fittingly it was mentioned in a blog post that mentions the Jai language. [1][2] I find it frustratingly insubstantial. Given we all presumably accept Rice's theorem and how it applies to compiler optimization, it's a rather empty quip, especially considering we're discussing future possibilities.

Today's compilers are capable of some impressive optimizations. It doesn't do to just dismiss the idea that tomorrow's compilers might be able to do significantly more with memory layouts.

Consider if, decades ago, a sceptic of optimizing compilers had said: "Compilers are useful tools, but they are not magic wands, and cannot achieve highly optimized instruction selection and register allocation. They cannot think and understand your application: only you can do that." The baseless suggestion that efficient register allocation is impossible in the absence of a strong AI would seem laughable today.

> it cannot think and understand your application: only you can do that.

Today's compilers are not strong AIs, sure enough, but that doesn't speak to the point here.

Optimizers and static-analysis tools are capable of reasoning about program behaviour. Again, you haven't justified dismissing the suggestion that future optimizing compilers might be much more sophisticated at this kind of transformation. Perhaps I'm an optimist for arguing for the sufficiently smart compiler, but it doesn't strike me as beyond the realm of possibility.

Perhaps the nearer-term answer is to adjust (or indeed replace) our languages to make them more amenable to memory layout transformations than C/C++ are. This would presumably be comparatively easy to implement.

[0] https://ziglang.org/documentation/master/std/#std;MultiArray...

[1] https://blog.royalsloth.eu/posts/the-compiler-will-optimize-...

[2] https://news.ycombinator.com/item?id=27010965


> But isn't there a realistic chance that good things could happen if this became a major research focus?

It has already been a major research focus for the past 50 years or so. We have made great strides, yes, but we are still very far from where we need to be to compete with lower-level languages.


It's not going to happen for C.


But they do, from the simplest automatic structure padding to the more complicated flow analysis and packing/vectorization.


How could a compiler possibly optimize for cache hits in an array of structures? The only way it could do so is by disobeying the programmer's intention as expressed in the described memory layout.


How can you? Do you know of a CPU that has specific instructions to handle cache allocations?


If you have an array of small structures, all you have to do is access all of them in one go with a for() loop and you're already taking advantage of the cache. This is enforced in ECS frameworks, btw: it's how a "System" is implemented.
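Here is a stripped-down sketch of such a system loop in C++. It assumes components stored in dense parallel arrays indexed by entity. Real ECS frameworks add entity-to-slot bookkeeping, which is omitted here, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical components, each stored densely in its own array and
// indexed by entity slot.
struct Position { float x, y; };
struct Velocity { float dx, dy; };

struct World {
    std::vector<Position> positions;
    std::vector<Velocity> velocities;
};

// A "system": one linear pass over exactly the component arrays it needs,
// so consecutive iterations read consecutive memory.
void movement_system(World& world, float dt) {
    const std::size_t n = world.positions.size();
    assert(world.velocities.size() == n);  // parallel arrays must line up
    for (std::size_t i = 0; i < n; ++i) {
        world.positions[i].x += world.velocities[i].dx * dt;
        world.positions[i].y += world.velocities[i].dy * dt;
    }
}
```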

However, if your code has random access, there's no point in using arrays of structures. The compiler would have to modify the order of execution of instructions inside your method to take advantage of how the data is laid out.


Sure, there are great benefits in sequential access; some of these can even be calculated to some extent.

However your reply does not answer the question.

Do you know of any CPU that has specific instructions to handle cache? How can you be sure that you are gaming cache lines when even the mnemonics are mostly virtualised through all the pipeline and jump/memory pattern predictions?


My reply answers the first question, "How can you?", not the second.


You simply take advantage of the fact that an entire cache line is fetched when reading from memory, and keep data that is frequently used together close to each other in memory.
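One common way to apply this is hot/cold splitting, sketched below in C++ with hypothetical Monster types. The figure of 16 values per cache line assumes 64-byte lines and 4-byte floats, which is typical but not universal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mixed layout: the hot field (health) shares each element with cold
// fields, so scanning health also drags string objects through the cache.
struct MonsterMixed {
    float health;
    std::string name;
    std::string description;
};

std::size_t count_alive(const std::vector<MonsterMixed>& monsters) {
    std::size_t alive = 0;
    for (const MonsterMixed& m : monsters) alive += (m.health > 0.0f) ? 1 : 0;
    return alive;
}

// Hot/cold split: health values are packed densely (16 per 64-byte line
// with 4-byte floats); rarely used data lives in a parallel array with
// the same indices.
struct MonsterCold {
    std::string name;
    std::string description;
};

struct Monsters {
    std::vector<float> health;      // hot: scanned every frame
    std::vector<MonsterCold> cold;  // cold: read occasionally, e.g. for UI
};

std::size_t count_alive(const Monsters& monsters) {
    std::size_t alive = 0;
    for (float h : monsters.health) alive += (h > 0.0f) ? 1 : 0;
    return alive;
}
```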


The compiler isn't even smart enough to swap for-loops that iterate over a grid when doing so would greatly improve cache usage. Swapping the lines `for (x=0; x<width; x++)` and `for (y=0; y<height; y++)` can give insane speedups on typical modern hardware.
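The two loop orders look like this for a row-major grid, as a C++ sketch with hypothetical function names. The sketch sums ints so both orders give identical results. With floats, reordering the additions can change rounding, which is one reason compilers are cautious about interchanging reductions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A width x height grid stored row-major: cell (x, y) is at y * width + x,
// so cells with neighbouring x values are adjacent in memory.

// Inner loop over y: each step jumps `width` elements ahead in memory.
long long sum_x_outer(const std::vector<int>& grid, std::size_t width, std::size_t height) {
    long long total = 0;
    for (std::size_t x = 0; x < width; x++)
        for (std::size_t y = 0; y < height; y++)
            total += grid[y * width + x];
    return total;
}

// Interchanged: inner loop over x, so memory is read sequentially.
long long sum_y_outer(const std::vector<int>& grid, std::size_t width, std::size_t height) {
    long long total = 0;
    for (std::size_t y = 0; y < height; y++)
        for (std::size_t x = 0; x < width; x++)
            total += grid[y * width + x];
    return total;
}
```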


Both GCC and Clang are able to do some level of loop interchange optimization.


They do have that optimization in some cases; it helps on SPECint.

It often relies on exploiting undefined behaviour to optimize well, which many people aren't keen on letting it do.


Are you sure about that? It must depend on the compiler.


Last I checked, field declaration order still mattered to structure size and cache usage because the de facto packing and padding rules preserve the order. I admit that I haven't done any C optimization lately, so I am curious. It is possible to make this optimization, but it may disrupt codebases that take shortcuts based on an assumed order. And it would be especially difficult to analyse what should take precedence in the cache.

Could you link an example of compilers accommodating these optimizations?


The clang documentation on vectorisation has a few examples

https://llvm.org/docs/Vectorizers.html#slp-vectorizer
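Here is a small example in the spirit of the SLP section of that page. This particular function is illustrative, not taken from the docs. It has four isomorphic scalar statements on adjacent elements, which an SLP vectorizer may merge into a single 4-wide vector add. Whether it actually does so depends on the compiler version, flags, target and whether it can rule out overlap between the arrays, so inspecting the generated assembly is the way to check.

```cpp
#include <cassert>

// Four independent statements of the same shape on consecutive elements:
// a candidate for superword-level parallelism (SLP) vectorization.
void add4(const int* a, const int* b, int* out) {
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```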

Cache precedence and cache-line optimisations are black magic: either you know exactly which CPU you are targeting, or you rely on hopium techniques like cache-oblivious algorithms that try to reap some benefits.
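A classic cache-oblivious example is a recursive matrix transpose. It keeps halving the larger dimension so that, at some recursion depth, each sub-block fits in whatever caches the machine has, without the code ever naming a cache size. Below is a C++ sketch. The cutoff kLeaf = 16 is an arbitrary assumption to limit recursion overhead and, like everything else here, should be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Below this block size we stop recursing and copy directly.
constexpr std::size_t kLeaf = 16;

// Transpose rows [r0, r1) x cols [c0, c1) of `src` (rows x cols, row-major)
// into `dst` (cols x rows, row-major), splitting the longer side in half.
void transpose_rec(const std::vector<int>& src, std::vector<int>& dst,
                   std::size_t rows, std::size_t cols,
                   std::size_t r0, std::size_t r1, std::size_t c0, std::size_t c1) {
    if (r1 - r0 <= kLeaf && c1 - c0 <= kLeaf) {
        for (std::size_t r = r0; r < r1; ++r)
            for (std::size_t c = c0; c < c1; ++c)
                dst[c * rows + r] = src[r * cols + c];
        return;
    }
    if (r1 - r0 >= c1 - c0) {
        const std::size_t mid = r0 + (r1 - r0) / 2;
        transpose_rec(src, dst, rows, cols, r0, mid, c0, c1);
        transpose_rec(src, dst, rows, cols, mid, r1, c0, c1);
    } else {
        const std::size_t mid = c0 + (c1 - c0) / 2;
        transpose_rec(src, dst, rows, cols, r0, r1, c0, mid);
        transpose_rec(src, dst, rows, cols, r0, r1, mid, c1);
    }
}

std::vector<int> transpose(const std::vector<int>& src, std::size_t rows, std::size_t cols) {
    std::vector<int> dst(rows * cols);
    transpose_rec(src, dst, rows, cols, 0, rows, 0, cols);
    return dst;
}
```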

The baseline is to measure, always, before and after optimisation(s). These "Data oriented design" approaches are very hard to measure and change rapidly because they have a profound impact on a codebase, rarely ever change "just one thing" and they err to the less intuitive and less readable side.


It's not that simple. Like I said in another comment, organising the data is only half the battle. This optimisation also depends on your code accessing the data in an optimal way.

If the code itself is not organised for DOD, then the "optimal" organisation is what we currently have. A per-entity Update method that's accessing different "kinds of data" will perform worse with DOD-organised data.

This is why we have an architectural pattern that automates all that, called ECS.


No, compilers cannot do that. Who's to say the data access pattern in one .cpp file isn't completely different from the one in another? For one .cpp file the most efficient layout might be an array of structs, and for another a struct of arrays. The compiler cannot possibly know this, so it has to follow the data layout that the programmer has specified.


What optimizations would hypothetically be allowed and what optimizations the compiler actually has enough information to be allowed to perform are very different.

A compiler is hypothetically allowed to convert array-of-struct into struct-of-array, but to actually be allowed to do that it would need to understand every single use of pointers in the entire program. That is extremely challenging, if not impossible.
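Here is a sketch of the kind of code that makes this hard, using a hypothetical Particle type. Once the array-of-structs layout leaks into pointer arithmetic or raw byte copies, a compiler that silently switched to separate x, y and z arrays would have to find and rewrite every such use, including ones in other translation units.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Particle {
    float x, y, z;
};

// Hands out a pointer into the middle of one struct; callers may keep it
// or pass it to code the compiler cannot see.
float* y_of(Particle* p) { return &p->y; }

// Pointer arithmetic that assumes whole Particles are laid out back to back.
Particle* next_particle(Particle* p) { return p + 1; }

// Raw byte copies (to a file, socket or GPU buffer) bake in the AoS layout.
void copy_raw(const Particle* src, std::size_t n, unsigned char* out) {
    std::memcpy(out, src, n * sizeof(Particle));
}
```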


The great mistake is to believe there is something useful called "Object-oriented", or "Data-oriented", or "Functional", or what-have-you-oriented programming.

Carpenters don't learn "hammer-oriented" building; or "saw-oriented", "screwdriver-oriented", "chisel-oriented", or "glue-oriented". They learn to use hammers, saws, screwdrivers, chisels, and glue, and use them all at different times, sometimes one more than others. Machinists don't learn "lathe-oriented", "drill-oriented", "mold-oriented", or "welding-oriented" fabrication. They do all or any of them strictly according to what they are making, and what they are making it out of.

It is as utterly stupid to design a whole language around one of them as it would be for a carpenter to try to run a screwdrivering business. A language is useful exactly to the degree that it enables building whatever you might be called upon to build.

That is not to say any generally useful language is equally good for any purpose. Wood is better for some products, metal for others. A metal violin would be weird, a wooden gun action would be stupid. But violins often have metal bits and guns often have wood stocks.


This is just a rephrasing of a boilerplate response to any criticism at all: "there is a time/place for OOP!" followed by precisely zero articulation about when one should use OOP over a rival paradigm. It is one of the weakest arguments I can conceive of (perhaps following "but that's not true OOP" with no assertion about what characterizes OOP, or a definition of OOP that is only esoteric).

Why do you feel the need to talk to programmers using a construction metaphor rather than talking about programming paradigms directly? If OOP has merit, shouldn't it be relatively easy to articulate to programmers? What is OOP good at? When should I use it rather than a data-oriented approach? Why talk in tenuous metaphor?


You just read that the whole notion of object-oriented (or anything-oriented) programming has no merit, and are now asking me for examples where it has merit.

You are usually constrained to a particular language or, sometimes, mix of languages. You have access to a set of language features. Use them in any mix that solves the problem. If runtime polymorphism would be useful, use it. If arrays would be useful, use them. If recursion would be useful, use it. If compile-time type matching would be useful, use it.


Thank you, I always appreciate a thoughtful critique.


I have to disagree with the sentiment you express here.

Your metaphor is a bit confused. You are comparing individual tools to methodologies. For example, object-oriented programming is a methodology which involves the use of a variety of tools like inheritance, composition, interfaces, dependency injection, etc. So object-oriented programming does not correspond to always using a hammer in every situation. It rather corresponds to a particular theory about which tools to use in which situations.

There have been a variety of methodologies for framing buildings. There was traditional framing, then balloon framing, and now platform framing. These different approaches to framing are what actually correspond to things like procedural programming, object-oriented programming, and functional programming, not the individual tools.

It seems to me that it is pretty clear that there is something usefully termed data-oriented programming or functional programming. Neither is appropriate to all scenarios, but that doesn't prevent them from being useful concepts.

Crucially, though, your thrust seems to be that we should use the right approach for the task at hand. That's true to a degree. But what remains is that some paradigms are either bad ideas or frequently get applied in domains they aren't suited to (I'm looking at you OO!).

The consequence is that I don't see it as particularly helpful to say "use the right tool for the job." Yes, it is true, you should use the right tool for the job. But I think it's also true that some paradigms are better than others, just as some framing techniques are improvements on others.


If you start building a skyscraper by nailing up 2x4s, you won't get very far. But that says nothing about the merits of nailing up 2x4s.

If you start a project thinking, "I'm going to make an object-oriented program", or "I'm going to make a data-oriented program", you are just working with one hand tied behind your back. A mature programmer mixes elements freely with no thought to "orientation".

If you demand simple advice: Don't Be Stupid. It's not helpful, but neither is TFA. At least mine is correct.


> A mature programmer mixes elements freely with no thought to "orientation".

Seriously, you advocate freely mixing radically different architectural approaches? Well... I don't want to share a codebase with you.


I advocate freely using language features anywhere they are useful.

Architecture has nothing to do with seminar-peddlers' "paradigms". Architecture is about organization. For each problem, some architecture is uniquely suited to addressing it. If you constrain yourself to this or that paradigm, you commit at the outset to failing to arrive at the optimal organization for the actual problem.


> Architecture has nothing to do with seminar-peddlers' "paradigms".

No, these paradigms all have ideas about how software should be organized. OO thinks you should organize everything into interacting objects. Functional thinks you should organize everything into pure functions.


OO doesn't think. Functional doesn't think. Paradigms don't have ideas about anything.

Advocates for those, and others, characteristically think badly. People who build things that work do not limit themselves with pat labels. Nature has no respect for pat labels.


Obviously, when I say OO thinks X, I mean that people working within the paradigm think that.

Given that you've decided to pretend that I meant something stupid instead, you aren't a person worth talking to.


You have presented your credentials.


There are different methods of framing a house and roof and builders do tend to specialize in just one. Balloon framing is different than platform framing which is different than mortise and tenon framing. To learn a new system, a carpenter would have to learn a whole new system of rules, measurements, suppliers, attachment methods, and seasonal tolerances. Here's an overview of some of these techniques: https://www.hometips.com/diy-how-to/house-framing.html

It's actually pretty analogous to programming methods.


Different methods of framing correspond to different materials available. Long 2x4s are too expensive now for balloon framing to be cost-effective, just as posts and beams, and lath and plaster, became too expensive (for different reasons), earlier. None is abstractly better than the other, even for identical end products.

In some places bricks are, or once were, cheaper than lumber. A carpenter who can't build a house with brick walls, or can't add a platform-framed room to one, is no carpenter at all.


The weird thing about this article's category of OOP criticism is that it makes all sorts of assumptions about class design and memory allocation which are absolutely not intrinsic to OOP.

Like, there are plenty of good points to be made here without discarding the entire idea of object-oriented design. It really just needs a little tweak.


I don't have a substantive response except to thank you for giving me a helpful analogy in future discussions on the topic.


> The great mistake is to believe there is something useful called … "Functional", or what-have-you-oriented programming.

It’s a remarkably substantial mistake — “the great mistake”, even — to blithely discard a concept so fundamental to computer science and programming language theory that we literally could not progress in the field without it.

You may as well be an earth-bound carpenter denying the existence of “gravity-oriented design”.


Yet, we did progress without it.

Learning a program organization style is helpful for beginners, if the toy problems they get are more easily solved using the style. But you progress, ultimately, by transcending it. A mature programmer mixes elements freely without labels.

Crawling-oriented locomotion is fine for babies. Adults walk, run, swim, drive, pilot, ride, sometimes even crawl. Learning to run does not make you worse at crawling, or make you run when you should crawl.


> Yet, we did progress without it.

No, we didn't, and claiming so merely demonstrates your profound ignorance regarding the most basic fundamentals of our field.


We must be working in wholly different fields. I make software that, you know, works.


Says the carpenter to the architect.


Peddle your snake oil where it sells.



