I agree with your colleague, and I insist on pushing my car everywhere because I...

autarch · on Sept 6, 2022

The "R" stands for "relations", as in "relations", which is a mathematical concept. SQL calls a "relation" a "table". The "relational" is RDBMS has nothing to do with relationships.

But I still agree that OP's colleague is an idiot.

hu3 · on Sept 6, 2022

That might have been how it started (https://www.ibm.com/ibm/history/ibm100/us/en/icons/reldb/).

But it's definitely not what it means for the great majority of contemporary contexts.

Relations in modern RDBMS are usually aliases to foreign keys unless otherwise specified.

goto11 · on Sept 6, 2022

To be really pedantic, tables are relations but a join between two tables are also a relation. Base tables, queries and views are all relations and therefore interchangeable in relational algebra.

mkeedlinger · on Sept 6, 2022

I agree that using foreign key constraints is the right choice, but the tone of your comment comes off as very condescending and dismissive, and I don't like it.

Spivak · on Sept 6, 2022

Eh, there are a lot of people who don't like using FK constraints, calling them all idiots is just bad faith and ignores the reasons they did it. Just because you can enforce a constraint at a specific layer doesn't mean you have to. DB people love shoving all sorts of application logic into the DB and there are good arguments to do it as well as downsides. App people sometimes prefer to do everything in the app and just let the DB be a dumb data store and there are good arguments for that too. But it depends isn't a hot take.

FlyingSnake · on Sept 6, 2022

If you're using an RDBMS and not using FK or other relational constraints, how do you plan to maintain referential integrity?

bdavisx · on Sept 6, 2022

I'm not arguing one way or the other wrt to FK, I generally use them.

You can maintain the integrity through code though.

pif · on Sept 6, 2022

> DB people love shoving all sorts of application logic

I agree that application logic goes into the application, but data integrity is NOT application logic.

Spivak · on Sept 6, 2022

Sure it is! As a thought experiment consider evil dba whose job it is to crash your application or make it do wrong things just by manipulating the data in the DB but following the constraints. Totally trivial, right? So data integrity is at all times the responsibility of both the app and the db. And the set of constraints you can enforce with the app will always be a superset of what can be enforced by the db. And for some cases (usually when the db is private to the application) it's easier to build that logic into the app and for other cases (multiple apps sharing a db) it's easier to build that logic into the db.

CRConrad · on Sept 9, 2022

> So data integrity is at all times the responsibility of both the app and the db.

OK, so what you're saying is that data integrity is never the responsibility of just the app, right?

Spivak · on Sept 9, 2022

I get where you're leading but I don't think it follows. FK constraints are an optional nicety with tradeoffs rather than something fundamental -- cascade and set null are footguns (and business logic which shouldn't live in the db), and no action exists to catch bugs in your code. If your app isn't getting errors from the db saying it's trying to delete stuff with references then you could, in theory, turn off the constraints without any need to change the code. And if a relational database simply didn't have FK constraints at all, such as vitess/planetscale, you can still maintain data integrity.

What I've done in the past with prod dbs that lack FK constraints is add them back for development, testing, and CI.

jonnycomputer · on Sept 6, 2022

Mathematically a relation is a set of tuples; which is exactly what a table is.

jongjong · on Sept 6, 2022

I think the author is talking about 'foreign key constraints' - You could have foreign keys without enforcing a constraint.

Personally, I don't use foreign key constraints because:

1. It makes schema migrations and other data-management operations more difficult.

2. On insertion, the database needs to perform an additional check to verify that the record exists at the foreign key; this carries a performance cost; IMO, this is something which should be enforced at the application layer anyway.

3. It makes it more difficult to scale the database later because you can't separate tables onto 2 different hosts if one table references another using a foreign key.

BTW, about #3, the same argument can be made against using table joins. Once you start using foreign keys or table joins, you will be forced to run those two tables on the same host in the foreseeable future; it's very difficult, error-prone and time consuming to migrate away from such architecture if you have a lot of data in a live environment. Personally I prefer to design all my tables and front end applications to not rely on foreign keys or table joins. There is a good reason why databases which are focused on scalability (like MongoDB) do not support foreign keys or joins (or at least they try to avoid them).

I prefer to assemble data on the front end as much as possible because it allows my REST API calls to be granular; each one only refers to a single kind of resource; this helps to simplify caching and real-time updates; it also uses fewer resources on the server side and I find that it makes the front-end code more maintainable. Also, I like to design my front ends to mirror the natural separation of resources within the database. When the user wants to open up a related record, they need to click on a link (the foreign key ID/UUID is used to construct the link to the related resource); this loads up the other record as a separate step. This creates a very smooth (and fast) user experience - I also like it because this approach does not overload the user with information; collections of items don't show much details, on the other hand, individual resources may show a lot of detail.

The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view... Sometimes the reason why they want to do that is because they didn't design their tables correctly; maybe the tables which they use to generate list views don't contain enough columns/detail to be useful on their own so they feel forced to do joins. I find that drawing ER diagrams helps a lot with that. It's very important to get the cardinality of relationships between the different tables exactly right. Also, I find it very helpful to represent any many-to-many relation between two tables as a distinct table.

P5fRxh5kUvp2th · on Sept 6, 2022

On point 3 it should be noted that it's almost always a mistake to optimize for scale at the start of a projects lifetime. There will be exceptions, but in general this is true.

You can always migrate that data to a more useful format if you find it starts hurting you at scale, if you start with the assumption you need the scale you're hurting yourself in the here and now for theoretical future benefit.

> The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view

This is completely, emphatically wrong. I'm somewhat miffed at the air of authority you're using here. People use joins for the normalization of data.

jongjong · on Sept 6, 2022

This perspective only makes sense of you assume that designing a scalable system requires MORE work. My experience is that designing a scalable system requires LESS work if you and your team have the right skillset.

In most cases, I can build a scalable system faster than I can build a non-scalable one with the same feature set.

It would make no sense for me to implement the lesser alternative if it requires the same or more work.

P5fRxh5kUvp2th · on Sept 6, 2022

I'm always leery of people who claim to be senior and have never spent 3-5 years on the same system, and this attitude is why.

It takes at least that long to really start surfacing the design errors that were made that kills productivity long-term in a system. As a result I very often will claim the difference between a skilled and unskilled developer is the ability for a system they built to be reasonable after 5+ years without everyone involved wanting to rebuild the entire thing from scratch.

IOW, this is a fundamental difference in perspective. I was speaking to creating systems that are maintainable over the long haul by actively trying to control complexity. You're speaking speed of initial development.

Rich Hickey went on a small rant in one of his videos (I think the one describing datomic, but could be wrong) in which he pointed out many things that are fast initially will hurt long-term. I agree with that sentiment wholeheartedly.

The fact that you called the less complex alternative the "lesser" alternative speaks volumes. It honestly feels like the whole "mongodb is webscale" devbro culture rearing its ugly head.

jongjong · on Sept 6, 2022

I tend to prefer combining data at the last moment on the client side rather than having it pre-combined on the server side (I prefer REST philosophy over GraphQL). It's probably because I'm web-application focused and so scalability and concurrency is far more important to me than raw execution time. Maybe if I was a data scientist or embedded systems developer, I would care more about execution time. I've met people like that. But IMO performant scripts tend to be the result of more optimizations which makes them harder to maintain as the underlying engines or hardware changes.

P5fRxh5kUvp2th · on Sept 6, 2022

This has nothing to do with raw execution time.

pif · on Sept 6, 2022

> I prefer to assemble data on the front end as much as possible because it allows my REST API calls to be granular

It's clear you have never work with a lot of data.

> The real reason people use joins is because they want to pack a lot of details onto the user's screen

I hate this illusion that web programming is the whole of software development.

jongjong · on Sept 6, 2022

> It's clear you have never work with a lot of data.

Sure, I only wrote an open source distributed pub/sub system with channel-based sharding which has been used by thousands of companies to support hundreds of thousands of concurrent users, but I guess 'lots of data' is a relative term.

P5fRxh5kUvp2th · on Sept 6, 2022

That has nothing to do with data or data modeling.

jongjong · on Sept 6, 2022

Well that was just my hobby and side-gig... As part of my day jobs, I also worked on many projects with different databases including MySQL, Postgres, SQLite, MongoDB. I also implemented a side project (a distributed financial transaction processing system) using RethinkDB with per-table sharding and replication which runs on Kubernetes with statefulsets for persistence with automatic deployment and autoscaling and automatic database shard re-balancing with high availability with eventual consistency; I used a 2-phase commit algorithm for certain operations to achieve reliability in the event of write failure; as not to rely on atomic database transactions. I also did a course on relational database modeling at university (focused on ER diagrams and database normalization). I worked in the blockchain sector. I wrote a stateful, quantum-resistant blockchain from scratch including the cryptographic signature algorithm which uses an improved Lamport OTS variant suggested by Ralf Merkle and which uses a Merkle Signature Tree for key reuse and I contributed to the front end too. I also wrote a deterministic, fork-resistant, idempotent, heterogeneous multi-chain, chain-to-chain decentralized exchange. I also lead a team which wrote a P2P networking library with decentralized routing and efficient propagation of messages to peers belonging to the same subnets - Nodes in the network organized themselves into an unstructured, partial mesh topology with peer shuffling to avoid eclipse attacks but still retained the ability to form subnets based on the features they supported. But still, "a lot of data" a relative term.

P5fRxh5kUvp2th · on Sept 7, 2022

None of that is about data, it's about distributed computing.

No one is saying you're not a smart guy with skills, just that you're obviously not familiar with working with lots of data.

wahnfrieden · on Sept 6, 2022

postgres has a shitload of useful features that are unrelated to relations