It looks as far from a data-oriented language as it can be. It's yet another Algol, specifically a JavaScript/TypeScript variation. How does it help with joins, group-bys, pivots, explodes, etc.? No mention of those, just some JavaScript-like operations.
No mention of data layout or storage, so it's just an in-memory pointer-chasing thing. If I load it with a million records, is it going to run out of memory?
It mentions null as the billion-dollar mistake. The only real problem with null is how SQL treats it in joins and rollups; otherwise there is no problem with it. Another sign that the language is targeting the wrong audience.
The syntax seems to lean on JavaScript (C-like with data literals, long keywords, dot notation for dereferencing, etc.), and the types are put at the front of an expression (like C and Java) as opposed to the back (Pascal, Go, Rust, TypeScript).
The type system seems to take inspiration from TypeScript, with structural typing, optionality and so on.
The query language looks very similar to Clojure's list comprehensions (the for macro).
But I think the interesting stuff is things like this:
Ballerina is quite interesting and its flexible structural typing seems to make it a great fit for solutions that weave together multiple services.
One thing that the article does not highlight is the interop story [1]. It is currently hosted on the JVM, but that is supposedly intended to be an implementation detail. So unlike Kotlin, explicit FFI bindings are needed. This may create some friction for early adopters, but I guess it is necessary if the type system is substantially different from the Java type system. This will hopefully also make Ballerina applications more portable if alternative implementations emerge in the future.
In C++ I very often find myself wanting the ability to define a type inline, e.g. my ideal syntax would look like:
int z = 1;
auto x = [=] { .foo = z, .bar = {4,5,6}, .baz = "" };
Combined with the ability to introspect things at compile time available in C++20, I think this would get 99% of the way there for common data-oriented designs, without the need for a completely new programming language such as Ballerina.
I imagine this creates a std::tuple, which causes a lot of template instantiations and is the main offender for debug compile-times though. But yes, that's pretty much what I'm looking for :-)
It is not a std::tuple, but the moral equivalent. I haven't tested the compile time beyond trivial tests, but it is not going to be great. The $ macro itself expands to a non-trivial amount of code.
It is called data-oriented, but most of it is about typing, a clever mixture of static and dynamic. Note that a language like C# also has a mixture of dynamic and static types. At least you can define all sorts of values without having to specify an explicit type. I do not know how far C# goes in defining methods and functions over those types.
What are the semantics of assignment? Does it create a reference, or does it 'clone' the value? In some languages it is always a reference, unless you explicitly create a clone. In some languages with immutable data structures, it is always a copy (or at least, it behaves like that).
I am asking because I have found this to be a common source of errors that are often hard to debug: somewhere in the program a reference is created where a copy was intended, and then in some other part of the program a change is made with unexpected side effects.
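For what it's worth, here is the kind of bug I mean, sketched in C# rather than Ballerina (so this says nothing about Ballerina's actual semantics): classes assign by reference, record structs assign by copy.

// Reference semantics: 'b' is an alias of 'a', so mutating b also changes a.
class MemberRef { public string FirstName = ""; }

// Value semantics: assignment copies the whole value.
record struct MemberVal(string FirstName);

class AssignmentDemo
{
    static void Main()
    {
        var a = new MemberRef { FirstName = "Ada" };
        var b = a;                               // alias, not a clone
        b.FirstName = "Grace";
        System.Console.WriteLine(a.FirstName);   // "Grace" - the surprising side effect

        var c = new MemberVal("Ada");
        var d = c with { FirstName = "Grace" };  // independent copy
        System.Console.WriteLine(c.FirstName);   // still "Ada"
    }
}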
In some relational databases it is possible to define existence constraints and cascading deletes. For example, removing an author from a database also removes all books written by that author. Or you could define the constraint that an author cannot be removed while there is still a book written by that author in the database. I feel that a 'data oriented' language should also focus on these aspects of data.
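To make the two policies concrete, here is a rough C# sketch over in-memory collections (the type and method names are just made up for illustration, not taken from any language or library):

using System;
using System.Collections.Generic;
using System.Linq;

class Library
{
    public List<string> Authors = new();
    public List<(string Title, string Author)> Books = new();

    // Cascade: removing the author also removes all of their books.
    public void RemoveAuthorCascade(string author)
    {
        Books.RemoveAll(b => b.Author == author);
        Authors.Remove(author);
    }

    // Restrict: the author cannot be removed while any book still references them.
    public void RemoveAuthorRestrict(string author)
    {
        if (Books.Any(b => b.Author == author))
            throw new InvalidOperationException($"Author '{author}' still has books.");
        Authors.Remove(author);
    }
}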
I started to develop a (still dynamically typed) data oriented language myself. It is still very premature without any implementation, but a description can be found at https://github.com/FransFaase/DataLang
edit: I should have read further. The more interesting parts happen very late in the article. A lot of what is described is still possible (including the querying) in C#, but other things only in very awkward ways.
Original post follows:
It seems this is almost possible in C#. Missing is the required property proposal [0], so currently the choice is between having everything be optional, or using explicit constructors (which loses most of what is wanted here). But that does not seem like a huge difference:
public record struct Book
{
    public string Author { get; set; }
}

public record struct Member
{
    public string FirstName { get; set; }
    public List<Book> Books { get; set; }
}
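Using it with object initializers (assuming the definitions above plus a using System.Collections.Generic;) looks roughly like this; note that nothing forces you to set any property, which is exactly the gap the required-property proposal would close:

var member = new Member
{
    FirstName = "Ada",
    Books = new List<Book> { new Book { Author = "Tolkien" } }
};

// Compiles fine today, even though everything is "missing":
var incomplete = new Member();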
This gets brought up a lot. Data orientation was a term coined in two different programming communities: game engines and information systems. The terms have overlap and some of the implications are similar. But there are also significant differences as the game programming term focuses on data layout and performance, while the information system term is more about simplicity and leverage. They both are divergences specifically from mainstream OOP.
The game programming flavour of data-oriented design is often described as an optimisation technique, but I feel this is a misconception. Performance and maintainability are not mutually exclusive.
My understanding, based on talks by Mike Acton and others, is that DOD is biased against adding unnecessary abstraction, which in turn makes the software easy to understand (i.e., maintain). If you can reason about what your program is doing (which bytes are read and which are written), adding things on top might be unnecessary and hurt readability. You don't necessarily need to care about optimising for cache hits or data alignment; if your software needs to 1. read a "name" from the network, 2. make sure it's lowercase and 3. write it to the database, then just do that. There's no need to have a "name" object with constructors, getters and setters; there's no need to split your functionality into 1000 "reusable" little modules. The only time you'd use a class is when it helps prevent a resource leak, so you'd use RAII when opening files in C++ to make sure they close when leaving the scope.
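As a caricature in code (the "network" and "database" helpers below are invented stand-ins, not any real API), the whole feature is just three lines of straight-line logic:

using System;
using System.Threading.Tasks;

static class NamePipeline
{
    // Hypothetical stand-ins for the network and the database.
    static Task<string> ReadNameFromNetworkAsync() => Task.FromResult("ALICE");
    static Task WriteNameToDatabaseAsync(string name) => Task.CompletedTask;

    static async Task Main()
    {
        // 1. read, 2. lowercase, 3. write - and nothing else in between.
        string name = await ReadNameFromNetworkAsync();
        name = name.ToLowerInvariant();
        await WriteNameToDatabaseAsync(name);
    }
}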
This pales in comparison to Pydantic in terms of elegance, flexibility, and batteries included. But pydantic is quite slow.
I discovered pydantic-core [1] which is a backend of sorts for pydantic written in Rust. It is yet to be integrated into pydantic, but once the integration is done, the author expects a potential 10x speed improvement.
This is a great idea (my own attempts at implementing them have had limited success). I do wonder if such languages are better served as either embedded libraries in a target language (like the Lua runtime) or as a front end for cross-compiling to a target language? I.e., better served as DSLs?
A hosted language that has the idea of symbiotically interweaving into existing ecosystems.
I've done a lot of Clojure in my past, and while I love the language and its high ceiling, I've found the floor not quite low enough.
I do indeed think that there is room to bring a more data-driven approach to existing ecosystems, especially when it comes to interoperability between them. Something living in the area between SQLite, protobuf and a hash table.
And I don't think there is actually much needed to achieve this.
* A dead simple binary graph representation format.
* An easy-to-implement immutable data structure to represent that graph.
* A simple conjunctive query DSL that translates between the data graph and the tree semantics (structs, maps, arrays, etc.) of the host language, implicitly providing join and filter operations (similar to GraphQL); a rough sketch follows below.
These three declarative primitives would probably be enough to pull any host language 80% of the way over to being data-driven.
All the other fragments of relational languages, like recursive queries in Datalog or negation and optionals in SQL, would be recoverable by using the capabilities of the host language.
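To make the sketch concrete, here is a toy C# version of the idea (everything here is invented for illustration, not a real library): the graph is just a bag of entity/attribute/value triples, and a conjunctive query is a join over shared variables, which LINQ gives us almost for free.

using System;
using System.Collections.Generic;
using System.Linq;

// The whole "graph" is a flat set of (entity, attribute, value) triples.
record Triple(string Entity, string Attribute, string Value);

static class GraphDemo
{
    static void Main()
    {
        var graph = new List<Triple>
        {
            new("book1",   "title",  "The Hobbit"),
            new("book1",   "author", "author1"),
            new("author1", "name",   "Tolkien"),
        };

        // Conjunctive query: titles of books whose author is named "Tolkien".
        // The join on the shared author entity is just another filter condition.
        var titles =
            from wrote in graph where wrote.Attribute == "author"
            from named in graph where named.Attribute == "name"
                                   && named.Entity == wrote.Value
                                   && named.Value == "Tolkien"
            from titled in graph where titled.Attribute == "title"
                                    && titled.Entity == wrote.Entity
            select titled.Value;

        foreach (var title in titles)
            Console.WriteLine(title); // "The Hobbit"
    }
}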