Beyond SQL: A relational database for modern applications (fauna.com)
59 points by freels on Aug 22, 2023 | 57 comments



I buy the need for something like FQL.

But I don’t buy that this should be connected to a proprietary service. Or that the alternative query language should be a proprietary language. For something long-term or critical, it seems like a pretty large business risk. We don’t have to use any imagination to see how this plays out — look at Oracle. Though it could be worse — imagine you build a successful business on it and they go belly up in ten years.

I think I’d like to see one of the various “XQL”s out there emerge as the modern query language and start getting adopted as a native alternative in existing engines.


This feels like a promising angle for pursuing alternatives: https://substrait.io/ - "Cross-Language Serialization for Relational Algebra"


One could implement an FQL -> SQL compiler, and then it could be used anywhere. The ability to safely encode logic in SQL in a single round trip would be really nice, though. FQL could be a better way of doing "ORM". You can find similar tools for SQL that help map joined results into arrays, etc., without obscuring the SQL (for example Jet for Golang).


I have to disagree with the following paragraph:

“SQL is particularly inflexible in terms of control over response: A query result is always a set of tuples. A query is incapable of returning a rich structure that is aligned with how the application needs to consume the result. ORMs solve this partially at best at the cost of hiding the capabilities of the underlying database. They also obscure the structure of generated queries, making it difficult to understand the resulting performance implications.”

ORMs, and the later-mentioned GraphQL, are not the only approaches to solving the object-relational impedance mismatch.

SQL is perfectly capable of serializing sets of tuples to XML (part of the SQL standard), and most SQL RDBMS implementations now support working with JSON data.

Serializing to XML via SQL (and, in the past couple of years, to JSON), then deserializing that XML/JSON into an object model in the programming language of choice, is something I’ve seen used fairly often in my career.
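For instance, a minimal sketch using the standard SQL/XML functions (Postgres flavor; the users/orders tables here are hypothetical):

  SELECT xmlelement(name "user",
           xmlattributes(u.id AS id),
           xmlelement(name "name", u.name),
           -- nest the user's orders inside the same document
           (SELECT xmlagg(xmlelement(name "order",
                            xmlattributes(o.id AS id),
                            o.total))
              FROM orders o
             WHERE o.user_id = u.id)) AS user_xml
    FROM users u;

Each row comes back as a single XML document with that user's orders nested inside, ready to deserialize in the application.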

Heck, I’ve even seen the entire business logic of a very complex system implemented via thousands of stored procedures that always returned XML. Prior to 2010, this was not the blasphemy it is made out to be today…

So to reiterate my point: SQL is not inherently as inflexible as it is made out to be here, and thus neither ORMs nor GraphQL are a necessity for dealing with SQL output inflexibility (both can be very useful tools; as always, it completely depends on the context).


I'm clearly biased, but at least in my experience, while this is technically true, you're still dealing with XML (and now JSON) shoehorned into a tuple-based context. In other words, there is still a (lossy) translation layer, it just happens to be in the RDBMS rather than in-app.

Fauna's advantage here is that this way of structuring queries is deeply integrated with the language (and underlying wire protocol) itself. For example, Fauna's response format supports returning independently iterable result sets (supported by cursor-based pagination under the hood), allowing you to lazily populate the result graph in your app based on further user interaction.


> In other words, there is still a (lossy) translation layer, it just happens to be in the RDBMS rather than in-app.

It's not lossy if your application can guarantee a JSON <-> datatype round trip and the JSON is validated with JSON Schema (generated by your application).

In Rust it's something like this

https://serde.rs/ to do the data type <-> json mapping

https://docs.rs/schemars/latest/schemars/ to generate jsonschema from your types

https://github.com/supabase/pg_jsonschema to validate jsonschema in your database (postgres). with this setup it's interesting (but not required) to also use https://docs.rs/jsonschema/latest/jsonschema/ to validate the schema in your application
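On the Postgres side, the pg_jsonschema piece ends up looking something like this (table and schema invented for illustration):

  -- requires: create extension pg_jsonschema;
  create table customer (
      id serial primary key,
      metadata json,
      -- reject writes whose metadata fails the application-generated schema
      check (
          json_matches_schema(
              '{"type": "object", "properties": {"tags": {"type": "array", "items": {"type": "string"}}}}',
              metadata
          )
      )
  );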


I agree with the sibling; the problem is that requires that you shoehorn all your rich data types to JSON.


> SQL query performance is opaque and dependent on an optimizer to make the right decision. The developer has imprecise control and a given query pattern's plan of execution is subject to change at unpredictable times.

This is really hilarious framing to me.

The entire point is to make the whole process opaque to the developer (unless they insist upon explaining a query plan). SQL is about maximizing productivity and focusing on the information of the business. Being able to write a SELECT statement that has never been seen by any other computer on earth and have it execute close to optimally 95%+ of the time is pretty close to pure magic.

Why would you throw away 20+ years' worth of experience baked into an existing SQL query planner/optimizer/etc.? Wouldn't you rather build & ship a working product?


SQL is one of the coolest languages ever developed and I hate the resistance to it. It's basically applying relations over (multi)sets, and set theory can be at the root of basically all of today's mathematics if you squint or translate hard enough. So much of programming is taking data from one place and putting it in another; having a consistent language to do so, such as ANSI SQL, is so powerful.

Until CloudFormation/Terraform/infrastructure-as-code came around, it was basically the only declarative language most developers would come across (unless you count markup), and I think that was the cause of some of the resistance.


> SQL is one of the coolest languages ever developed and I hate the resistance to it.

I agree, but the resistance, I feel, is due to really poor tooling.

No compilation step (so how do you know you have an error? You run it). Run your query on test, which has mostly live data, and everything works; run it on production and suddenly that one query has abysmal performance, and it's hard to determine that it's because of some missing index. The query explainer or describer is rarely understood by devs because they spend so little time with SQL itself, due to using ORMs.

There's no debugger. Hell, sometimes even the printf-equivalent is a poor substitute. You cannot step through a large SP, you cannot put in breakpoints, you cannot add watches, etc.

Unfortunately, the "tooling" for most devs is ORMs, which make SQL even more opaque and harder to debug, and reduce the pressure on popular DBMSs to develop decent debuggers.

I feel that if teams/projects insisted on "only parameterized SQL queries are allowed in the program", SQL would actually get less maligned as people got more familiar with it.


Declarative languages abound, except maybe not as sophisticated as SQL. Any math expression is declarative. A regular expression is declarative. Grammars are declarative. But although the declarative way is inherently simpler, it is also inherently limited. If it is the only way, sooner or later users will bump into these limitations.

When a developer first gets that vision of a beautiful declarative notation to handle his case, he often tries to make it the only way to use the system. But a more flexible way to design a system is to always allow supplying an imperative handler (a callback, a subclass, an interface, etc.), and then present the elegant declarative option as such a handler: built-in, but replaceable, one of many.

With SQL the situation is more complex; it is not just the elegance of a declarative notation, but also the complexity of the underlying task. The idea was to provide a kind of relational calculus, plus make it distributed, atomic, durable, and, most of all, multi-user, with an illusion that each user is the sole client of the system. It is not only that the user is not interested in setting record locks manually: even if he is, there must be no such option. It is very much like an operating system running multiple programs, where each program runs under the illusion it is the only one. I would say that it is the requirement of independent concurrency that made SQL what it is.

But although this is an important use case, it is not the only one. Sometimes there is no concurrency or all of it is under control. E.g. it can be a multi-threaded program that works with a complex model of something. Can we model that something relationally? I would love to. But in our case there is no independent concurrency and no ad-hoc queries. Given that simplification, do we still need SQL? Could we maybe arrange the same relational operations differently? Maybe imperatively? Could we gain something as a result? At the very least these are interesting questions.


That seems super interesting. Do you have any book/article recommendations that dive into this mindset?


The set theory? Look up relational algebra. These discussions get murky because people always conflate SQL the language with the underlying relational algebra, so it’s near impossible to criticize the language alone — like exactly in this case. It also gets conflated with the RDBMS engine, which doesn’t help either.

The article specifically calls out issues with the language, and specifically seeks to maintain the algebra with a different language… and somehow we’re back to talking about how SQL is great because the algebra is great.

Codd created something beautiful; IBM created something significantly less so.


> Codd created something beautiful; IBM created something significantly less so.

IBM made it practical to apply these beautiful ideas to boring-ass business.

Without IBM, we wouldn't be talking about SQL today.


One of my side ideas is to write a postgres plugin offering an alternative syntax for queries, but one that requires you to name the index you use (or to explicitly mention a table scan).

I do understand the value of SQL + the planner for adhoc querying. But so many times I find myself reworking SQL to hint at the planner to use certain indexes, or to add "spurious" filters to make sure an index is used (spurious for application logic reasons). For applications with a relatively low cardinality of queries, some extra tedium might be worth stronger performance property guarantees.

Added advantage of the tedium is it would make a lot of "accidentally quadratic" stuff much clearer. There is no magic in the planner after all!

We can write inline assembler in C or Rust; it feels like it would make sense to offer something similar in SQL. The big problem is how to offer something that isn't too tedious.

Anyways FQL's index model seems to align with my idea, just unsure if I want the rest of it.
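(For what it's worth, the pg_hint_plan extension gets partway there today, though via comments rather than a dedicated syntax. A sketch, with a hypothetical document table and index:

  /*+ IndexScan(d document_text_idx) */
  select count(*)
    from document d
   where d.user_id = 1
     and d.text ilike '%searchstring%';

It hints rather than guarantees, and nothing fails loudly if the index name is wrong, which is exactly the gap a first-class syntax would close.)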


If the query planner doesn’t want to use an index, it’s either for a good reason, or your table statistics are woefully out of date.


This is a thing that people say, but this is incorrect.

Classic example: Multitenant DB. I have a user table. I have a document table. I have a client table. documents and clients are key'd to user. In particular, there's a property of the system that document.client.user == document.user.

I do a "naive" query to count the number of documents per client, like:

  select count(*), client_id
    from document
   where document.text ilike '%searchstring%'
     and client_id in (select id from client where user_id = 1)
   group by client_id;

Now, it turns out that since I have this multitenant data, I have indexes on things like (user_id, client_id) on document. I also want to support text search, so I have a full-text-search index on (user_id, document.text).

Without properly "guiding" the filtering (usually re-repeating the "where user_id = 1" fragment on every joined table) I will not be able to take advantage of my FTS index, because Postgres decides the alternative isn't slow (and doesn't know about the relation between document.client.user_id and client.user_id). It instead tries hard to implement this either by pulling all of the client's documents into memory and then doing an ad-hoc search there, or by doing an FTS across all tenants (because I have an index on document.text for cross-tenant searching in an admin) and then scanning through that and filtering by client_id.

This is a query plan I would not write if I wrote it myself, because it would be "obviously wrong". It would be a lot of work and clearly incorrect for what I want.

Meanwhile people writing features like this add indexes specifically to support certain workflows, and are often blissfully unaware that those indexes are not being used when it makes the most sense. Of course you gotta check experimental data and measure etc etc. But I like the idea of doing more data design before you start having performance issues, and I think that explicit index usage in particular would be a huge benefit on that front.

(there is a "fix" for the original query, involving repeating the user_id filter at other levels)


If you run a prepared query a few times, postgres will switch to a generic query plan. Sometimes the generic plan is much worse. I've never been able to fix this by running ANALYZE. I have been able to fix this by adding completely new statistics or changing the query (e.g. adding an unnecessary sort or removing a filter that we can apply in the app).
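Postgres 12 and later do at least let you opt out of generic plans per session, if that's the failure mode:

  SET plan_cache_mode = force_custom_plan;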


MySQL has a "use index" hint. https://dev.mysql.com/doc/refman/8.0/en/index-hints.html

I may have used it at most once. But I'm not mr big data over here.
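For reference, it looks like this (table and index names invented):

  SELECT *
    FROM orders USE INDEX (idx_orders_customer)
   WHERE customer_id = 12;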


Fauna is really cool technology, but I'll note that it lacks interactive transactions as a core design constraint. That does enable a bunch of nice things. However, that means users lose one of the most useful parts of transactional SQL -- simple, safe read/modify/writes.

The only way is to ship all of the "modify" logic into the database via FQL. Given how many times I've seen code that does this unsafely with SQL, probably safe to say that if the bar is raised to "rewrite the logic in a foreign query language" developers will opt for thoughts and prayers instead.


While making it easy (and possible in the first place) to write procedural code directly in a transaction is a core part of FQL's design, it is certainly possible to implement safe read-modify-write via FQL; it's just less efficient.


This sales pitch sounds like they really just want to build a better ORM but don't want to admit it, so they embark on the foolish task of reinventing SQL.

> ORMs solve this partially at best at the cost of hiding the capabilities of the underlying database.

This is because all the popular ORMs target SQL instead of a specific RDBMS technology, like Postgres or SQLite. That doesn't have to be the case.

> They also obscure the structure of generated queries, making it difficult to understand the resulting performance implications.

Not quite true; Django for example makes it easy to do raw queries, and every generated queryset exposes .query and .explain() to analyze the resulting SQL. This has always been a lame excuse to dismiss ORMs.


Good luck though, you're going up against giants (as much as this article portrays SQL as outdated, anyone who tries to go against it is very much the underdog). It's hard to get people to buy in on a custom database stack that is made by only a single company. And the fact that this article reads like a one sided praise for Fauna doesn't inspire confidence either, I want to read some real negatives from people who have used the thing.


A new DB, with a new query language that's like "SQL done right"? This immediately reminded me of EdgeDB: https://edgedb.com/

Is there anyone here who knows enough about these two products to do a compare/contrast?


I tried to learn Fauna and FQL a couple of years ago; their documentation was hard to trudge through with few examples, and their pricing was hard to grasp.

I think they also recently nerfed the free tier.

I've been using CouchDB and Pocketbase.io much more successfully for my mini web apps.


Pocketbase seems like fun; I had not heard of it before. Separate, but related: whenever I contemplate transitioning from a relational DB to something like CouchDB, or a similar option, to simplify the initial setup and SQL-related hassles, I inevitably realize that, as appealing as it sounds and as much as I'd love to, the challenges and limitations down the road become more apparent and I stick with SQL. I know this is obvious, but the choice really does heavily depend on your specific use case. SQL makes some simple things hard, but the reverse is also true :( I wonder if we'll reach a stage where we can get the advantages of both ¯\_(ツ)_/¯


Getting the best of both worlds is what we're trying to achieve with Fauna, at least for OLTP use cases. I'd be curious to hear what challenges you ran into with other NoSQL databases, though.


People reinventing SQL remind me of the people who keep trying to rewrite standard staff music notation.

Yeah, it's not perfect, but the suggested replacements are marginal gains, if anything.

In order to replace SQL, you probably need something like a 10x improvement, so it sells like hotcakes. Nothing about those examples screams that sort of improvement.

I mean, cool they put some stuff on the left instead of the right. And I guess the select clause is in a little better order now. Yay.


I agree with what you say.

On the other hand, it shows how, when something gets popular enough, it gets stuck in place, unable to progress gradually toward something better, because we keep calling it good enough, and all the marginal improvements that would accumulate into something significant over time are rejected.

And this friction can be a significant problem, because it can cause systems to die over time by being suddenly replaced, disrupted, from below by something that is a major improvement but completely incompatible with the status quo. Think of it as having a devastating earthquake after a long period of complete calm.

This is not good for anyone; we would be much better served by small, gradual improvements to existing systems. This is how we ourselves came to be, through evolution.


Additions and improvements to SQL standards happen all the time though. My favourite rdbms drops a new version every few months with exciting new features. This sounds more like the slow and steady improvement you speak of than all these proprietary newfangled languages that show up every few months, never to be heard from again.


Not quite. SQL and the web are examples of standards that evolve by accumulating cruft. They try to address more and more unaddressed edge cases by constantly adding new, slightly different ways of doing the same thing, without addressing the core problems that cause this complexity in the first place. Evolution is not about adding features alone; it's also about removing and changing features. If you read the linked site carefully, it offers way, way more than a fancier way to do a SELECT.

The SQL standard is also adding features that go ignored by most SQL databases, because the client libraries won't support them and they're out of sync with the server, and so on and so on.

Entropy is eating SQL and the web.


Be that as it may, every time I see a new database product I open the page, and if it doesn't support SQL, more often than not I close it and never think about it again. I am never going to invest in some proprietary query format for marginal gains over SQL, seeing as I have yet to run into a situation where having to write SQL precludes solving a problem I had. I really don't want data stores to adopt the frenetic rate of change seen in other areas of the field (like JS, shudder).

I think Postgres does an excellent job staying up to date with ANSI SQL; all other implementations (other than sqlite) are largely irrelevant to me.


Consider the paradox. You say you won't invest in proprietary query formats, but you use a database that stands mostly alone in implementing the standards you track, and it has many proprietary extensions of its own.

I’m curious whether PG supports heterogeneous nested results (data trees); this is one of the biggest omissions from the standard in all the DBs I use, to the point that I can't use it at all.


Of all the SQL flavors/derivatives, I like PRQL the most: https://prql-lang.org/

Previous HN discussion from a month ago: https://news.ycombinator.com/item?id=36866861


I wrote my first self-taught LAMP-based mini-SaaS a few years ago. It has 34 paying companies using it, with about 400 users total daily.

I want to write another soon. Let's say a free webapp SaaS type of thing for consumers.

Is there a reason I should learn this product instead of just continuing with modern mysql methods that seem to work and seem secure enough?


If your current stack is working for you, that's great. The main advantages of Fauna vs say MySQL are going to be:

- Fauna is distributed and multi-region and therefore more resilient to hardware or regional outages (for example we barely noticed the last AWS us-east outage, except for the fact that it affected customer traffic to Fauna).

- You gain a lot of flexibility in terms of where and how you deploy your compute layer. Fauna works very well in concert with serverless platforms or edge-based compute like Cloudflare Workers. It's also possible to connect directly from your client/front-end, using Fauna to enforce end-user permissions.

- Even if you know SQL, it's worth checking out FQL. Simple queries in SQL are also easy in FQL, but more importantly, FQL gives you much greater control over the shape of your query result, meaning you don't need an ORM to reconstruct your object graph. If you have ever used GraphQL, the experience is similar. You can also see a few examples and comparisons with SQL on our FQL product page: https://fauna.com/fql


Is your primary objective to expand your experience and knowledge, or is it to ship product that delivers user value?


Obviously the answer is to deliver user value... But at the same time, better performance is valuable to a user. So if this new whatever-it-is (I don't even know) will perform at least noticeably better on some metric, then it does deliver value to the user. If all it does is make MY life easier, then it's basically worthless to me IMO.


How much of a problem is performance right now, has anyone mentioned it to you? Has it gotten worse?

I often think about perf as thresholds. It's not totally accurate but if you have reasonable perf, generally going to lightning turbo / "Please the HN crowd because we hate bloat" isn't going to move the needle.


With 400 daily users in the CRM, MySQL is not a problem at all. The only problem I have is loading huge tables sometimes without using pagination, but that's a connection-speed problem on the user's end, not a MySQL problem. Sometimes the table gets toward 10+ megabytes, and that's a bit goofy.

My question was more about whether I should learn the new tech, or any new tech, if I'm now building something I hope will have at least several thousand daily active users.

Deep down I know the answer. It's pretty obvious MySQL won't have problems even for a million users for most types of sites.


If your high end user estimate is in the thousands, scaling is less of an issue. If users haven't mentioned frustration with latency, it's probably not a problem.

That said, you still have to feed your creativity and curiosity. I try to carve out time for "not totally necessary, but skill growth and potentially impactful" work.


I happened to be sitting in front of a JavaScript book as you wrote that.

I finally opened it. I was always curious if I should learn js/node to replace PHP and maybe even get away from server side rendering in php. Right or wrong, from what I've read I think I can put that to rest for now. I see just as many issues with doing things in JS as PHP. Just spending a few hours of reading let my brain move on from that curiosity for now.

Good tip you gave. Thanks


Isn't there one of these every 6 months on here?

Dread it, run from it, SQL arrives all the same


I disagree. This is a hard problem that takes time to solve. I hope the world does not stop at SQL.


It always disappoints me when the example code (in the heading image) looks like it has a mistake:

SQL: WHERE category = "electronics"

FQL: where(.product.type == "Computers")

Also it changes case of USER to user between the two examples.

Minor mistakes like that hardly give one confidence in their product.


I have yet to come across a replacement for SQL that makes querying easier or more concise while also remaining feature-rich. Not knocking this, but SQL is a pretty solid and time-tested technology for querying structured data. Alas, if we don't experiment then I guess we won't get better.


I think like 85% of the pain points could be solved by (a) allowing statements to start with a `from` clause, (b) some kind of shorthand for field sets, and (c) having some kind of syntax for returning some things as nested rows/objects. Without thinking deeply at all:

from users u join purchases p on p.user_id = u.id select u{defaults}, u[p{defaults} as purchases] limit 100 [purchases: 10]

returning rows that make `purchases` into a row-nested collection of up to 10 items instead of creating that many more overall rows and repeating the same user info multiple times.


You can easily do nested rows/objects in SQL.

E.g. in Postgres:

  SELECT u.*,
         (SELECT JSON_AGG(p)
            FROM (SELECT p.*
                    FROM purchases p
                   WHERE p.user_id = u.id
                   LIMIT 10) p) AS purchases
    FROM users u
    JOIN purchases p ON p.user_id = u.id
   GROUP BY u.id
   LIMIT 100;

Starting queries with FROM: what is the benefit of that besides personal preference?

And you can simply use a CTE or a view instead of a field set.


I agree my off-the-cuff syntax is probably not great, but I do think there's benefit to a more native version of json_agg that doesn't return strings/JSON, but typed data.

The "start queries with from" is because you get easier and better auto complete and error detection for queries when the tables being used are declared. When you start typing "select some_field" there's basically no way to have good auto complete or correct until the from clause. IMO there are more philosophical reasons too.

Views and CTEs have their place, but they also have shortcomings that I don't think this solves. And I don't think they really fill the gap I'm thinking about either.


Look at Datomic: you get Datalog, pull-based access, and the ability to use programming-language functions directly in the query, all in one.

The typical pattern is that you find a list of IDs and then pull on them with a particular view that you want them in.


I’m having a hard time understanding this line:

“Developers cannot take advantage of the full power of the underlying database for fear that the complex, opaque nature of SQL query behavior will overwhelm their ability to understand how queries perform, and quickly address problems when they do come up.”

What does that mean? Developers are somehow sacrificing database performance (“full power”) because they’re too scared to write SQL?


Right? I’ve written tens of thousands of sql queries, some of which were thousands of lines long. I can’t remember ever feeling overwhelmed. It’s just a tool. If you are feeling overwhelmed by a tool, just practice more.


is it pronounced fackle? fuckle? fickle? feeckle?


Fecal. Like Sequel.


Thank you for stepping up while others stood back.


This bit:

> maladapted to modern cloud computing abstractions such as serverless, and is difficult to manage at scale.

Got me.

There’s little money in it, but having worked in an infra role for a while now, I don't really see rolling your own infra becoming any easier, which is so sad.


> Connection centric: SQL's session-based transaction model is maladapted to modern cloud computing abstractions such as serverless, and is difficult to manage at scale.

There's nothing connection-centric at all to the language.

> Inflexibility of result structure: SQL is particularly inflexible in terms of control over response: A query result is always a set of tuples. A query is incapable of returning a rich structure [...]

Many modern SQL-based RDBMSes provide JSON support for this. Some support row/record values, which is -in SQL- more natural than JSON.
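For example, a Postgres sketch of the row-value flavor (users/purchases tables are hypothetical); the nested rows come back typed, not as JSON text:

  SELECT u,
         -- each element is a typed purchases row, not a string
         ARRAY(SELECT p
                 FROM purchases p
                WHERE p.user_id = u.id) AS purchases
    FROM users u;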

> Complicated, irregular syntax:

Well, few like the SQL syntax -- there is that. I wouldn't say it's complicated syntax though.

> [...] and allows only limited forms of composition.

CTEs and subqueries are the main methods of composition in SQL.

> Opaque, unpredictable performance:

Yes, well, yes, of course, because SQL is declarative. That the programmer gets little control over the query planner is part of the point of SQL. The query language really needs to be like this. What could and should be done however is to have a) ways of addressing parts of the query, b) ways of specifying out of band (i.e., not in the query) hints or requirements for parts of the query's planning.

I really would like to be able to:

  - specify indexing for CTEs
  - be able to pass in hints from outside
    a query, like what table source to
    make the "first" in a query plan,
    what indices to use for specific
    table sources, where constraints, etc.

> Rigid tabular data model: Despite the theoretical adaptability of the relational model to many domains, the flat, uniform structure the traditional SQL database imposes on records is especially rigid, forcing a significant amount of upfront design.

I don't agree that this is a problem. You have to design your schema? The horrors.

> Introducing new query patterns and schema evolution is fraught with a significant amount of operational risk. Combined, these hinder iterative application development and risk locking in bad data model design decisions.

This is not exactly untrue of alternatives to SQL-based RDBMSes, either...

TFA makes very strained arguments.

There are good arguments for a new language, but there's no need to make strained arguments along the way -- it detracts from TFA.



