Ask HN: Do you use foreign keys in relational databases?
188 points by frogcoder on Sept 6, 2022 | 244 comments
I use foreign keys quite often in my schemas for data integrity, while my colleague has a no-FK policy. His main argument is the difficulty of data migrations, which he encounters frequently. He would rather have a smooth data migration process than hit an unexpected error that aborts the whole operation, of which the migration is only a small part. I suspect the errors are mainly caused by not considering data integrity in the first place, but I can feel his pain. To be fair, as far as I know, he has never had major data problems.

He is not the only one I've met who insisted on not having FKs. I've even seen large systems prohibit using JOIN statements.

Personally, I think the data integrity outweighs the inconveniences. Do you use FKs in your systems? What are your experiences?




Fear of RDBMSes is quite common. I used to suffer from it too. It’s just so annoying to have to switch your brain to a different programming paradigm every time you need to do something with the database that you start to make up all sorts of excuses as to why it’s really just better to “do it in the code”. Your coworker’s argument about FKs making data migrations difficult is one of them.

Another classic is the “joins are slow” argument, which I believe goes back to the late 1990s, when in one database that was not highly regarded at the time, namely MySQL, joins were indeed slow. But the reason “everyone” knew about this was precisely the oddness of the situation: in fact, RDBMSes are highly optimized pieces of software that are especially good at combining sets of data. Much better than ORMs, anyway, or, god forbid, whatever you cobble together on your own.

There is, in my mind, only one valid reason not to use foreign keys in a database schema. If your database is mostly write-only, the additional overhead of maintaining the indexes behind the foreign keys may slow you down a little (for reading, these very same foreign keys in fact speed things up quite considerably). Even in such a case, however, I’d argue you’re doing it wrong and there should be a cache of some sort before things are written out in bulk to a properly set up RDBMS.


> Much better than ORMs

I recently migrated to EntityFramework Core (from the non-core version) and I’m actually impressed. Most SQL is pretty much what I’d write by hand.

Now granted, if there are complex joins, subqueries and stuff, I don’t even try wrangling the ORM to somehow give me that output, but still. I feel more comfortable just using EF than I used to.


My main problem with Entity Framework is the magic underneath.

Take a simple operation:

    var x = context.Find<Customer>(xid);  // "Customer" is an illustrative entity
    x.Name = "something";
    var y = context.Find<Customer>(xid);

What is y.Name? Even though you didn't save anything to the database yet? And the second Find didn't actually refresh from the database?

Oh and the random bugs where people improperly include related entities but it somehow ends up working because they are automatically added as you're firing off other related queries, until eventually it does not (usually in production only).

It's a really really complex system designed to look simple and pave over important details with "works most of the time" defaults.


Once you move beyond trivial cases you really need to spend time understanding the principles behind the ORM you're using. They are always a very leaky abstraction, there is not really a way around that.

In this case the important part to know is that the DbContext represents the unit of work and "knows" entities you previously queried on it. That's very useful, but it can also hide bugs like the one you mentioned with the Includes. I do wish you'd get more obvious errors if you forget an Include; this can be really annoying to debug, especially if you're new to EF Core. For read queries I mostly use Select instead of Include, which I find easier and more straightforward in most cases.

ORMs are really useful for making very common operations easy and for making stuff composable. They're also very complex and to make the best use of them you do need to understand both SQL and some basics on how your specific ORM generates this SQL.


> They're also very complex and to make the best use of them you do need to understand both SQL and some basics on how your specific ORM generates this SQL.

I think the biggest pitfall is how it maps object model to SQL.

The thing people fear about SQL query generation is IMO a non-issue: when you identify hotspots, you write those queries manually. The tools for that are there, it's easy to do retroactively, and >90% of the code won't be on the critical path.

DbContext is basically shared mutable state between your entire execution scope, and worst of all it makes it non-obvious.


> Once you move beyond trivial cases you really need to spend time understanding the principles behind the ORM you're using. They are always a very leaky abstraction, there is not really a way around that.

This is why I avoid ORMs in favor of writing SQL queries manually: I only need to understand one complex system for non-trivial cases instead of two.

(To be fair, I haven’t done any database programming for a few years. ORMs may have significantly improved since I last looked at them.)


> They are always a very leaky abstraction, there is not really a way around that.

Editing in general is hard. E.g. if in a form you change a field that participates in some filter which generates a dataset to be used in that form, a naive join now returns incorrect data (because the join condition itself was edited). Complex ORMs which help with that^ are not leaky abstractions; they just try to avoid mistakes an average programmer would make anyway in “trivial” SQL tasks without blinking once.

And yes, the GP's question about x vs y shows that editing contexts were never considered. Plain old fetch-store is too low-level and doesn’t represent the model that business logic thinks in.

^ Idk about EF in particular, just assuming


This is why I like TypeORM. There is no magic, and every command maps 1:1 with a database operation.

No weird caching, no auto saves. Just an object mapper that you can use when you want and ignore when you need to.


This example is incomplete. We need to see what the enclosing transaction scope looks like.


That's sort of my point - when you see a random dbcontext read inside a function you have no idea what the fetch will actually do. It might just return an object that was already fetched elsewhere in the context and modified but not saved. It might return the first value. It will automatically plug related entities into navigation collections - even if they are queried completely independently.


I concur with these. I see colleagues who treat EF as 'black magic', who postpone or fear looking into what is happening under the hood. Because they lack insight into what it really does, they regularly cause horrendous queries. My pet peeve with EF LINQ is that your C# is not really C#, so you may write queries that compile silently but fail at runtime because the C# cannot be translated to SQL.


Funnily, one of my pet peeves is people worrying about the SQL generated from EF LINQ. If you care that much about it, I think you should just be writing the SQL by hand.


I might even go a step farther - you shouldn't care. It's the equivalent of caring whether or not your generated HTML or compiled IL or Assembly "looks nice."

If the SQL is performant and it returns the expected data, that is good enough for 99.9% of cases.


There are cases where you have to care, e.g. the difference between AsSingleQuery() and AsSplitQuery() in EF Core. This option only affects what kind of queries EF Core will create to perform the same job, but it can have pretty significant implications on performance in some cases, and it can affect the consistency guarantees you get for your results.
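
Roughly, for a hypothetical Blogs-with-Posts Include, the two options generate SQL shaped like this (a sketch, not EF Core's literal output):

    -- AsSingleQuery(): one statement; blog columns repeat on every post row
    SELECT b.*, p.*
    FROM Blogs b
    LEFT JOIN Posts p ON p.BlogId = b.Id;

    -- AsSplitQuery(): one statement per collection; less duplication, but
    -- the reads are not a single consistent snapshot by default
    SELECT b.* FROM Blogs b;
    SELECT p.* FROM Posts p INNER JOIN Blogs b ON p.BlogId = b.Id;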


The abstraction is great, until it breaks.

In this case, with ORMs, even good ones, this happens often enough in production that to actually master the tool you do need to care.


Another vote for EF Core here. It’s superb.


ORM is a very valuable tool and should be aggressively used. One can always step down to SQL as needed but otherwise, the ORM logic is easier to write and maintain.


The all-or-nothing approach is prevalent in both camps. I've worked in places where a straightforward optimization could not be implemented because it would require the developers to break from the ORM-only standard they'd set.

I've also worked places where ORMs were held as such an anathema that any ORM proposal was dismissed out of hand without any sort of discussion.


As is almost always the case, the middle ground is better. In a few of the larger .NET projects I've worked on we would use EF until it became too much of a pain (or the business rules made the query unmanageable) and then someone would eventually spend a few days transitioning the query into a stored procedure + SQL functions, updating tests, etc. The biggest complaint I ever got from that was that it was impossible to tell what was going to be LINQ and what was going to be a stored procedure. While not an insignificant concern, it certainly beats "oh we can't optimize that 20-second long query because then we'd have to drop below the ORM and 'we don't do that here'" or "no you can't use any ORM even though it objectively eliminates a lot of boilerplate work."


The all-or-nothing approach is what makes people hate ORMs and go anti-ORM.


ORM logic is not easier to write and maintain unless you're bad at SQL and writing your queries without any tool support.


Strongly agreed.

In my experience all using an ORM accomplishes is making sure the people on your team who are amazing with SQL write just as bad queries as those who suck at SQL.


Yep, Entity Framework is truly amazing. Once you have used that ORM you never go back. You still sometimes need to write your own query for perf or other needs, but it's quite rare in my experience.

Most of the time when I've had performance issues it wasn't EF. It was a missed index or a higher-level query issue.


> Another classic is the “joins are slow” argument

The only person I knew who died on that hill insisted on doing two queries to the database and then performing a client-side cartesian join.


I remember getting beers with somebody in the aughts who claimed that he saw an entire website where the url was the key and the webpage was the value in an Oracle database. Any code was SQL operations inside the value field.


I once had a coworker who dreamed of that exact setup.


Isn’t that effectively what a CMS is?


That's amazing!


Are joins in a 5NF database now as fast as querying a denormalized database?


Maybe the joins are faster? It's really hard to tell without more context/detail.

I think many were burned by MySQL back in the day - trying to use SQL as a document database - or by PHP frameworks that happily did a hundred queries per page view.

As a general rule of thumb, for a REST app - I'd say the db should be normalized, and the cache layer(s) can handle the denormalization.

I.e. when you GET /page=1, Varnish can spit out a response from RAM (which, if you squint, is a denormalized projection of your data), or it can go talk to your app, which talks to the db. And the latter is most likely fast enough (tm).


Maybe I'm missing some context, but isn't that true by definition even if the db does nothing special? You either spend time sending N queries and waiting for responses, or join and use one query. Given actually matching scenarios for both, the one with less communication overhead wins.


In a normalized database that's true, but in a denormalized database, by definition, you get a third option, which is to have tables with redundant data that can be returned in a single query (as if it were pre-joined, I suppose).
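
A sketch with hypothetical tables:

    -- normalized: the customer's name lives only in customers; reads join
    SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;

    -- denormalized: the name is copied onto every order row; no join needed
    SELECT id, customer_name FROM orders;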


Depends. Denormalized means the database contains redundant data. If a query has to scan 10x or 100x as many rows due to redundant data, it is obviously going to be slower. But it is hard to say anything general, since denormalization will make some queries faster and other queries slower.


With a good index you will not scan more rows.

But each query will use a different copy of the same data instead of joining against the same copy.

Storing both copies in memory takes more space, so you can't cache as much.

I'm not talking about Redis or memcached, but the page cache inside the SQL engine.


They have always been faster! With 5NF the database is smaller, and all the rows you join will be in memory in the SQL server's page cache.

With a denormalized database, your reads will have to go to disk.


Seq scans will be faster in a normal-form database, and if you're seq scanning and then joining other tables on an index, that might be faster too. Otherwise the denormalized table will probably be faster.


To be fair, that's a reasonable approach if the database is at its monolithic scaling limit in CPU but not IO, while the clients can scale horizontally to more machines.

Unlikely in practice, though.


Oh, so you're talking about using the database as a file system and moving all the query logic into the client.


In the early 00s we used to do large setups at companies for what were then called intranets and extranets. These were very read/write intensive, as staff and partner staff would be on them basically all the time during office hours, and the data was not great for caching since it changed a lot, especially at some organizations like large hospitals and universities. We used MySQL (I cannot remember why) and we did a lot of performance testing at that time; we removed all joins, which made everything a lot faster. This is no longer the case now, but indeed many people still believe it; not (only) because they saw or tried it back then, but also because it's less strain on the brain to just do single-table selects and use no FKs or joins.


Back in the day I was forced to ditch FKs in my MySQL application, because I needed a FULLTEXT index on one of my columns, and MySQL only supported that type of index on MyISAM tables (this was on 5.x or something). MyISAM didn't do foreign keys.

It was a pretty central table, and the inability to use FKs there kinda spread outward.


Did you consider making a 1-1 relationship on a new table that only had the FULLTEXT column? Curious how you evaluated the trade offs


I can't remember how much time I spent thinking about it, but if I were to reenact my state of mind at the time, I probably concluded something like, "Without transactions, I'll have to write more code to make sure the ID in both tables stays in sync, and I'll have to send 2 separate INSERTs (sequentially) for every record added, and if the first one fails, I need to handle that, and if the 2nd one fails, I need to handle that differently, and... fuck it. I'll just promise to be good and not use FKs"

Or something. I can't remember the details, but I was (and still am) very averse to complexity in my application code.


> Another classic is the “joins are slow” argument

Along with the "indexes slow down INSERTs and UPDATEs" argument that you touch on. I mean, it is literally true that indexes make writes slightly slower, and an excessive quantity of indexes (which I have seen) can slow down writes enough to cause problems. But - in general - the slowdown is irrelevant compared with the overhead of querying a table that contains 2 billion rows using, oh, I don't know, a table scan because you don't have even a single index (I have also seen this).


One reason to avoid FK is when your database is partitioned to multiple servers, but that's obvious, I guess, and it's not really RDBMS anymore.


> RDBMSes are highly optimized pieces of software

> Much better than ORMs

These two things are not mutually exclusive though right?

It’s entirely possible to have a lightweight and relatively transparent ORM which makes full use of the underlying RDBMS.


Yeah, I was going to say something similar. But ORMs get blamed for obscuring what's going on, to the point that a developer may end up doing some sort of inefficient 1-to-n lookup that would've indeed been much better off as a SQL JOIN.

I use JPA/Hibernate professionally, as a decision maker, but I don't think I'm in either camp entirely. ORMs aren't a magic wand, but they do help you standardize the boilerplate that you'd end up with one way or the other, in most cases.


I definitely see that, and ORMs (particularly older ones) have historically made it easy to shoot yourself in the foot.

But, everything is an abstraction, and I tend to think that if you use any abstraction, you need to have at least a little bit of knowledge about what’s happening in the layer beneath it.

So using an ORM will not be an optimal experience if you don’t know how the underlying RDBMS works.

And effectively using an RDBMS directly still requires a bit of knowledge about the layer below that level of abstraction too (eg how underlying query optimisation works etc).

It’s possible to implement both incorrectly and get bad results, and the opposite is true too.


Agreed, and there's a lot you can gain from an ORM/query builder just in terms of ergonomics or niceness for the 80% use-case.

Doing intensive string manipulation to put your query together becomes painful, fast, especially when you're dealing with optional parts like ordering, limiting, filtering, pagination, etc. It's also incredibly easy to slip in an injection vulnerability as you do that (especially if you're new to programming).

Just don't use it as a crutch: the declarative nature of SQL is vastly more powerful than an imperative wrapper, and you'll be at a loss if you only know the conventions and opinions of your ORM of choice.


> in fact RDBMSes are highly optimized pieces of software that are especially good at combining sets of data. Much better than ORMs, anyway,

ORMs are just a wrapper around RDBMSes. If your ORM is producing incredibly stupid SQL to query the DB with, you might want to check that you're not modelling your data in a stupid way.

I am by no means an expert, but in general I have found that if the ORM is doing something particularly crazy, it's because my underlying assumptions about the data model are wrong.


> Your coworkers argument about FKs making data migrations difficult is one of them.

Got any arguments to back up this bald assertion?

In particular, I'd love to hear more about how to manage schema migrations on large tables with FK's without incurring lengthy locks or downtime.

Betting the answer is going to involve some variation on "well, don't do that" which is when I'll rest my case.


There are tools for live migrations for most popular databases. Also a lot of Postgres DDL is very fast and/or capable of happening live.


A lot of it depends on the use case. For example, Facebook - one of the largest (if not the largest) deployments of MySQL - does not allow any FK constraints. There are multiple reasons, but one of them is better predictability of db operational perf: a row delete should delete just the row and not potentially trigger N cascading deletes.


I don't understand “a row delete should delete just the row and not potentially trigger N cascading deletes”. If you don't want that to happen, then define that in the database definition. It sounds like you're saying that a core piece of functionality is somehow ‘wrong’, even though that same functionality can be used to make the desired behaviour for this exact use case explicit?


Facebook's data model is a graph, where each row stores one object (a “comment”) or one association between objects (“comment X is on post Y”).

They built a query and indexing system on top of it, called TAO, to make it fast.

Without it you would need to send a distinct SQL query per parent object to get the list of associated child objects, which would be awfully slow.


Non-TAO use cases of MySQL at FB also cannot use FK constraints (or ‘triggers’).


by "no FK contraints" do you mean "no join using index" or you simply mean foreign key violation is not checked.


Cascading deletes are separate from FK constraints. You can have FK constraints without cascading deletes.
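
For example (an illustrative schema, assuming a customers table already exists):

    CREATE TABLE orders (
        id          BIGINT PRIMARY KEY,
        customer_id BIGINT NOT NULL
            -- the FK is enforced, but deleting a referenced customer
            -- is rejected rather than cascaded
            REFERENCES customers (id) ON DELETE RESTRICT
    );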


How about when the ID in a FK column has been generated outside the RDBMS but the target of the ID has not been written yet?


You can use DEFERRABLE INITIALLY DEFERRED constraints so that the check happens when the transaction is committed.
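
In Postgres, for example (hypothetical tables):

    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id)
        DEFERRABLE INITIALLY DEFERRED;

    BEGIN;
    -- customer 42 doesn't exist yet; no error is raised here
    INSERT INTO orders (id, customer_id) VALUES (1, 42);
    INSERT INTO customers (id) VALUES (42);
    COMMIT;  -- the deferred FK check runs here, and passes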


I assume the target is externally generated too, thus can be legitimately absent.


I had this when importing test data; I found it acceptable (since it was just in development) to temporarily turn off FK checking.
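
In MySQL, for example, that's a session setting:

    SET FOREIGN_KEY_CHECKS = 0;
    -- ... load the test data in whatever order is convenient ...
    SET FOREIGN_KEY_CHECKS = 1;
    -- note: rows loaded while checks were off are not re-validated,
    -- so this is only safe for data you trust (or a dev database)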


Your database is the state of your system. Guard it!

I just ran into severe data corruption at a large client because a programmer four years ago wrote an empty catch block. The system would open a transaction, hit a fault, roll back, then continue writing to the database as if it’s still in the context of the transaction.

I spent some time trying to pin down exactly what it did, and found that many writes went through because of a missing foreign key constraint.

In short: if a particular table of minor importance had a foreign key constraint, there would have been no damage whatsoever, because it would have faulted immediately after the rollback.

You can’t rig up a constraint against every dumb write. But you can rig them up against some of the dumb writes. And sometimes that’s enough.


You're exactly right. This is the crux of it. In many domains (not every domain, but every one I've ever worked in) you can delete/rewrite/change the app code without much fuss but the data is vital.

For that reason having constraints enforced by the system that stores the data, external to the app code which developers will inevitably mess up, is so useful and important.

So many issues in my experience have been similar to the one you describe. If the right constraints had been present so much work and so many headaches could have been avoided.

I feel like people who claim they're not needed or not important or "the app code will do it" need to be wrangled into maintenance work of old systems for a year or two until they repent. Rather than boshing out an ill-considered prototype and then moving on before the bugs are discovered.


Key statement: integrity of the state is much more important than some inconveniences here and there


I'd go so far as to say that integrity of state is a requirement to build robust software. If your state is ill-defined, it's pretty much impossible to write software that behaves correctly.

Sure, you can guard against some forms of bad data and fix it as it comes in (or abort with errors if your system can handle that), but in order to do anything interesting, you need to make assumptions about the data. The only operation you can perform without assumptions is the identity function, and most software requires much more than that.


Data that's malformed at capture time is often indistinguishable from outright data loss.

Since it just gets worse the longer you accept corrupted data, this is a good justification for "crash early" programming, like DB enforcement of data integrity.


A few weeks ago I had a discussion with a developer with 20+ years of experience about why you shouldn't squelch DB errors and try to continue running.

Nothing is a hard and fast rule, but in this case the only way for an error to occur is if the table schema differs from the code's select statement (column names changed). At that point, wtf are you doing trying to keep running as if nothing happened? Something is fundamentally mismatched between your application and the data.


I agree with your colleague, and I insist on pushing my car everywhere because I fear gas as it is flammable.

In other words, the world is full of idiots; and any time I start forgetting about it, I read something like your post and I get a wake-up call.

What does the R stand for in RDBMS if you don't use foreign keys and joins?

Please, keep using your FKs, stay safe and don't mingle too much with idiots.


The "R" stands for "relations", as in "relations", which is a mathematical concept. SQL calls a "relation" a "table". The "relational" is RDBMS has nothing to do with relationships.

But I still agree that OP's colleague is an idiot.


That might have been how it started (https://www.ibm.com/ibm/history/ibm100/us/en/icons/reldb/).

But it's definitely not what it means for the great majority of contemporary contexts.

Relations in modern RDBMS are usually aliases to foreign keys unless otherwise specified.


To be really pedantic, tables are relations, but a join between two tables is also a relation. Base tables, queries and views are all relations and therefore interchangeable in relational algebra.


I agree that using foreign key constraints is the right choice, but the tone of your comment comes off as very condescending and dismissive, and I don't like it.


Eh, there are a lot of people who don't like using FK constraints, and calling them all idiots is just bad faith that ignores the reasons they did it. Just because you can enforce a constraint at a specific layer doesn't mean you have to. DB people love shoving all sorts of application logic into the DB, and there are good arguments for doing that as well as downsides. App people sometimes prefer to do everything in the app and just let the DB be a dumb data store, and there are good arguments for that too. "It depends" isn't a hot take.


If you're using an RDBMS and not using FK or other relational constraints, how do you plan to maintain referential integrity?


I'm not arguing one way or the other wrt FKs; I generally use them.

You can maintain the integrity through code, though.


> DB people love shoving all sorts of application logic

I agree that application logic goes into the application, but data integrity is NOT application logic.


Sure it is! As a thought experiment, consider an evil DBA whose job is to crash your application or make it do wrong things just by manipulating the data in the DB while following the constraints. Totally trivial, right? So data integrity is at all times the responsibility of both the app and the db. And the set of constraints you can enforce with the app will always be a superset of what can be enforced by the db. For some cases (usually when the db is private to the application) it's easier to build that logic into the app, and for other cases (multiple apps sharing a db) it's easier to build that logic into the db.


> So data integrity is at all times the responsibility of both the app and the db.

OK, so what you're saying is that data integrity is never the responsibility of just the app, right?


I get where you're leading, but I don't think it follows. FK constraints are an optional nicety with tradeoffs rather than something fundamental -- CASCADE and SET NULL are footguns (and business logic which shouldn't live in the db), and NO ACTION exists to catch bugs in your code. If your app isn't getting errors from the db saying it's trying to delete stuff with references, then you could, in theory, turn off the constraints without any need to change the code. And if a relational database simply doesn't have FK constraints at all, such as Vitess/PlanetScale, you can still maintain data integrity.

What I've done in the past with prod dbs that lack FK constraints is add them back for development, testing, and CI.


Mathematically a relation is a set of tuples; which is exactly what a table is.


I think the author is talking about 'foreign key constraints' - you could have foreign key columns without enforcing a constraint.

Personally, I don't use foreign key constraints because:

1. It makes schema migrations and other data-management operations more difficult.

2. On insertion, the database needs to perform an additional check to verify that the record exists at the foreign key; this carries a performance cost; IMO, this is something which should be enforced at the application layer anyway.

3. It makes it more difficult to scale the database later because you can't separate tables onto 2 different hosts if one table references another using a foreign key.

BTW, about #3, the same argument can be made against using table joins. Once you start using foreign keys or table joins, you will be forced to run those two tables on the same host for the foreseeable future; it's very difficult, error-prone and time-consuming to migrate away from such an architecture if you have a lot of data in a live environment. Personally I prefer to design all my tables and front-end applications to not rely on foreign keys or table joins. There is a good reason why databases which are focused on scalability (like MongoDB) do not support foreign keys or joins (or at least they try to avoid them).

I prefer to assemble data on the front end as much as possible because it allows my REST API calls to be granular; each one only refers to a single kind of resource; this helps to simplify caching and real-time updates; it also uses fewer resources on the server side and I find that it makes the front-end code more maintainable. Also, I like to design my front ends to mirror the natural separation of resources within the database. When the user wants to open up a related record, they need to click on a link (the foreign key ID/UUID is used to construct the link to the related resource); this loads up the other record as a separate step. This creates a very smooth (and fast) user experience - I also like it because this approach does not overload the user with information; collections of items don't show much details, on the other hand, individual resources may show a lot of detail.

The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view... Sometimes the reason why they want to do that is because they didn't design their tables correctly; maybe the tables which they use to generate list views don't contain enough columns/detail to be useful on their own so they feel forced to do joins. I find that drawing ER diagrams helps a lot with that. It's very important to get the cardinality of relationships between the different tables exactly right. Also, I find it very helpful to represent any many-to-many relation between two tables as a distinct table.


On point 3, it should be noted that it's almost always a mistake to optimize for scale at the start of a project's lifetime. There will be exceptions, but in general this is true.

You can always migrate that data to a more useful format if you find it starts hurting you at scale. If you start with the assumption that you need the scale, you're hurting yourself in the here and now for a theoretical future benefit.

> The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view

This is completely, emphatically wrong. I'm somewhat miffed at the air of authority you're using here. People use joins for the normalization of data.


This perspective only makes sense if you assume that designing a scalable system requires MORE work. My experience is that designing a scalable system requires LESS work if you and your team have the right skillset.

In most cases, I can build a scalable system faster than I can build a non-scalable one with the same feature set.

It would make no sense for me to implement the lesser alternative if it requires the same or more work.


I'm always leery of people who claim to be senior and have never spent 3-5 years on the same system, and this attitude is why.

It takes at least that long to really start surfacing the design errors that were made that kills productivity long-term in a system. As a result I very often will claim the difference between a skilled and unskilled developer is the ability for a system they built to be reasonable after 5+ years without everyone involved wanting to rebuild the entire thing from scratch.

IOW, this is a fundamental difference in perspective. I was speaking to creating systems that are maintainable over the long haul by actively trying to control complexity. You're speaking to speed of initial development.

Rich Hickey went on a small rant in one of his videos (I think the one describing datomic, but could be wrong) in which he pointed out many things that are fast initially will hurt long-term. I agree with that sentiment wholeheartedly.

The fact that you called the less complex alternative the "lesser" alternative speaks volumes. It honestly feels like the whole "mongodb is webscale" devbro culture rearing its ugly head.


I tend to prefer combining data at the last moment on the client side rather than having it pre-combined on the server side (I prefer the REST philosophy over GraphQL). It's probably because I'm web-application focused, and so scalability and concurrency are far more important to me than raw execution time. Maybe if I were a data scientist or embedded systems developer, I would care more about execution time. I've met people like that. But IMO performant scripts tend to be the result of more optimizations, which makes them harder to maintain as the underlying engines or hardware change.


This has nothing to do with raw execution time.


> I prefer to assemble data on the front end as much as possible because it allows my REST API calls to be granular

It's clear you have never worked with a lot of data.

> The real reason people use joins is because they want to pack a lot of details onto the user's screen

I hate this illusion that web programming is the whole of software development.


> It's clear you have never worked with a lot of data.

Sure, I only wrote an open source distributed pub/sub system with channel-based sharding which has been used by thousands of companies to support hundreds of thousands of concurrent users, but I guess 'lots of data' is a relative term.


That has nothing to do with data or data modeling.


Well that was just my hobby and side-gig... As part of my day jobs, I also worked on many projects with different databases including MySQL, Postgres, SQLite, MongoDB. I also implemented a side project (a distributed financial transaction processing system) using RethinkDB with per-table sharding and replication, which runs on Kubernetes with statefulsets for persistence, with automatic deployment, autoscaling and automatic database shard re-balancing, with high availability and eventual consistency; I used a 2-phase commit algorithm for certain operations to achieve reliability in the event of write failure, so as not to rely on atomic database transactions. I also did a course on relational database modeling at university (focused on ER diagrams and database normalization). I worked in the blockchain sector. I wrote a stateful, quantum-resistant blockchain from scratch, including the cryptographic signature algorithm, which uses an improved Lamport OTS variant suggested by Ralph Merkle and a Merkle signature tree for key reuse, and I contributed to the front end too. I also wrote a deterministic, fork-resistant, idempotent, heterogeneous multi-chain, chain-to-chain decentralized exchange. I also led a team which wrote a P2P networking library with decentralized routing and efficient propagation of messages to peers belonging to the same subnets - nodes in the network organized themselves into an unstructured, partial mesh topology with peer shuffling to avoid eclipse attacks, but still retained the ability to form subnets based on the features they supported. But still, "a lot of data" is a relative term.


None of that is about data, it's about distributed computing.

No one is saying you're not a smart guy with skills, just that you're obviously not familiar with working with lots of data.


postgres has a shitload of useful features that are unrelated to relations


I use FKs for most things in an RDBMS... but not for all things.

For example, audit logs get no FKs: when a delete happens, the audit logs about the deletion shouldn't be deleted.

I always FK a large table (millions or more rows) to a small table (tens to hundreds of rows).

But I will pause and ask hard questions about FKing a large table to a large table... will this impact migrations? Do I need this FK? Is data integrity at risk without this FK even assuming a buggy app? Does the app utilise this FK for queries, or is there zero performance benefit from having the FK? If I don't have the FK are both tables always queryable by a PK? Should I have an index, potentially a UNIQUE index, in place of a FK?

Like most things... it depends. A dogmatic insistence on using them everywhere isn't always healthy, and the inverse is true that an avoidance everywhere isn't healthy.

The DB is there to store data and make it available, whilst enforcing data integrity... if it makes sense to use a FK to achieve those things do it, otherwise don't.


I should've added...

Decide on FKs based on the possible size of a table rather than its current size.

FKs are incredibly performant on a near-empty local dev database ;)


> For example audit logs get no FKs, when a delete happens the audit logs about the deletion shouldn't be deleted.

    on delete set null

?


That still modifies the audit log, which is presumably supposed to be immutable. It also means you lose information: e.g. if the column is a user ID, you could previously determine whether two audit actions were done by the same (deleted) user; with "set null", you can't anymore.

The parent's point is that the choice should depend on the situation. Sometimes a foreign key is appropriate, sometimes it is not.


> For example audit logs get no FKs, when a delete happens the audit logs about the deletion shouldn't be deleted.

Or updated. Or prevent the deletion.

But that’s configurable anyway.


Foreign keys also let the query optimizer make better query plans. This is actually a bigger deal than most people think: the optimizer can rewrite to semi-joins or even eliminate joins completely if it has better guarantees about referential integrity.
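
A sketch of the join-elimination case (hypothetical schema): if orders.customer_id is NOT NULL and has a trusted FK to customers(id), some optimizers can drop the join below entirely.

    SELECT o.id, o.total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;
    -- no customers columns are selected, and the FK guarantees exactly
    -- one match per order, so the join cannot change the result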


Could you give us a simple example?


There’s a pithy quote by someone famous in DB circles who said something like “normalise until it hurts, de-normalise until it’s fast enough” - I’m vague on the exact words used but that’s the gist of it.

I’ve never come across a scenario yet where this wasn’t sound advice.

I tend to lean heavily on my DB as well. E.g. I tend to push all state down to the DB and out of the application. I work in environments where it's common for developers to want to disable FK constraints, and I do temporarily disable them sometimes during specific bulk operations, during releases for example. The usual reasons others suggest relaxing FK constraints permanently are audit logs or soft deletes, but I have patterns for those too.

Foreign key constraints are pretty awesome, all databases I’ve worked in so far have escape hatches for when they hurt too much.


I've heard it as, "Normalize until it hurts. Denormalize until it works."

Still, in my experience, database normalization has seemed like less of a performance impediment than queries written without considering an execution plan and indexes.


Giving up foreign key constraints because they cause errors is basically the same mistake that the monk in http://thecodelesscode.com/case/115 made.


Yes, this 100%. Though, as other comments have pointed out, you need to take the application and scale into account.

Fundamentally it's a question of how one should go about handling multi-tenant situations.


I didn't remember these little stories were sometimes so cruel.


Foreign keys can be removed during migrations and added back after. You would also disable triggers and check constraints too.
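
For example (a sketch with hypothetical names; syntax varies by database):

    ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;

    -- ... run the bulk migration ...

    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id);
    -- re-adding validates existing rows by default, so any integrity
    -- damage done during the migration surfaces right here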

I like foreign keys, check constraints and tight data types. Might as well constrain it and limit the scope for errors.

Application programmers can write some buggy code. The DB should provide a line of defence.

If you don’t have foreign keys, that should be a design choice backed by a legitimate reason the entity can become orphaned.

You need a definition of what that means in real life. E.g. PERSONID 1012 exists but there is no associated record. This kinda means you need to look at another entity to know what the first entity “means”.

This might be useful for data that needs to sit in distributed databases.


To me "let's wholesale throw away this non-deprecated language feature" is a major red flag.

I've found that proponents of this usually either don't fully understand the feature in question or made some major mistake in the implementation, which in turn causes problems that manifest themselves when they try to use said feature.


You need FKs to ensure you don't delete data that's still needed. If your data is important, you have to use FKs.

You also need to index your FKs so that the database does not have to do a full table scan before you can delete a row. This is often overlooked.
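
In Postgres, for instance, declaring the FK indexes nothing on the referencing side; you add that yourself (illustrative names):

    -- the referenced side (customers.id) is indexed via its PK, but
    -- without this index, deleting a customer scans all of orders:
    CREATE INDEX orders_customer_id_idx ON orders (customer_id);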

Your friend can do his data migrations without FKs and create them afterwards. This is quite a common procedure.


I'm going to assume that by "foreign keys", you mean "foreign key constraints" where the DB itself is insisting on particular relationships.

There are a few different schools of thought. I will list them, but the important thing to remember is not to be dogmatic. They are all right or wrong depending on your circumstance.

One school of thought says "I want all data in my DB to be normalized. I want it to be right when it goes in so it never breaks the application layer." That school would say foreign key constraints are critical.

Another school of thought says "I want all the data in my DB to be retrieved and inserted quickly. I want the application layer to do any error-checking that is necessary, or, I want to be in a situation where I can always fail gracefully if there's errors in data validation."

Still another school of thought says "I don't trust those programmers to write good application code, so I will insist on normalized data for that reason," and yet another says, "I don't have control over the DB schema, that's some DBA's job, so I will just do all my validation in-app."

The point of this being, there's tradeoffs either way you go.

Personally, I typically would rather have the application layer do the validation and even the joining of data, a lot of the time, when the application is high-volume. At the volumes my organization works with, it is very difficult to write performant SQL queries that use JOINs and other relationships as a developer - even as a DBA! - and often much easier, for me, to write performant application code. The DB is also a pet with many owners, whereas the infra for my applications is owned by my team. So, it's better for me to do relationship validation in code myself. (We also do not use a heavy ORM, again for performance reasons. Just Dapper.)

At my previous job, the situation was the opposite - we weren't under such load that it mattered at most times whether the queries were performant, we had Entity Framework building relationships, and EF will blow up if you ask it to build relationships where none exist. So we needed more normalized data, and that was what we went for. But even then, not in every situation.


> At the volumes my organization works with, it is very difficult to write performant SQL queries that use JOINs and other relationships as a developer - even as a DBA! - and often much easier, for me, to write performant application code.

How can this possibly be true? Won't that result in sending unnecessary data over the wire, stressing network and SQL buffer?

What are these queries and what are these volumes? I just can't wrap my head around the performance statement. Do you know better join algorithms than SQL Server is capable of (loop/hash/merge)? Given that you have appropriate indexes in place, perhaps query hints are what you need to control SQL plan guides, if you know a thing or two more about your data and its distribution than the SQL statistics do.


Joins are severely misunderstood and often incorrectly used. I've seen code that had the wrong join and would return thousands of rows, which then had to be programmatically squished down into the data we actually wanted. Some hand-crafted SQL usually fixes this.

Also, some ORMs write dreadful SQL when it comes to joins.

A badly written join (or collection of joins) will take longer to run, and when the system is under load it will backlog other queries. If these errant queries make up a significant portion of your queries, it will hit performance significantly.

It's not the joins themselves, just the incorrect use of them.


You said what I'm trying to say way more clearly than I did.

A lot of people have the same concern but I'm just gonna reply to this comment.

The ratio of SQL-focused devs to non-SQL-focused devs at my org is not favorable. And we certainly DO write joins... just not complex ones. Likewise, we do use constraints... just not all the time against multi-billion-record tables.

But that's not all. Our biggest tables are also our oldest and most unwieldy. Here is an (admittedly outdated in the specifics) example of what it's like to add constraints to a big table in SQL server that didn't have them already: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/3eb...

So for a dev to come along and add a whole bunch of FK relationships and/or write some big fun queries against one of these tables is asking for a lot more than one realizes immediately. New devs join and run up against this all the time.

Is it a good or great situation? No. But that's not the question. The question is whether it's a real one, and why.


Joins may be severely misunderstood by some people that write front end, middleware and backend code in the same day, but not by dedicated SQL developers. If the app is big enough and important enough, having dedicated SQL devs is the solution.


I'm not GP so I don't know what they meant, but a key upside to complex logic in the application layer vs the database is that the application layer is often much easier to scale out than the db. Where I work, if I run out of memory in the app I just change a configuration variable and k8s gives me more instances instantly. But if the database is memory constrained and I'm already on the biggest server available to me...I need to re-shard or take some other more sophisticated approach (tuning, replication, other stuff I don't know about).

Of course each scenario is different, YMMV, and as always "it depends".


> key upside to complex logic in the application layer vs the database is that the application layer is often much easier to scale out than the db

I think the point of the GP is that all of these application instances are still connected to the DB, doing sub optimal data fetches taxing the database in multiples.


One example: MySQL table locks. The application can do in parallel what the database can't.


Locks ensure you get committed data back, and not data that is mid-operation and may or may not end up being persisted. If you don't care about that: SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

And SQL can do parallel operations too.


You made the parent commenter's point for them. You went over the heads of half the developers with join algorithms and index hints. That's just how it is, unless you're at a company with a very high bar for hiring and training.


But, besides index hints, the developers don't need to worry about those things if they use the database to perform joins. The database management system chooses for them and does it pretty well (counterproductive index hints are not unheard of).

If they do it in application code, then they probably ought to learn about fancy sorting and joining algorithms.

But they should really just do it in the database (using read only replicas if the load gets high).


Using read replicas isn't always an option depending on your write throughput and consistency requirements.

The database can only do so well (and will spend a lot of CPU cycles working on your crazy query plan, because getting it wrong is more expensive, so now you effectively limit capacity regardless of how good your storage engine is).

Joins are great, tons of research went into making joins work, and lots of different join algorithms and optimizations based on data sizes, indexes, etc. But you really have to be careful, verses just denormalizing data across multiple tables/collections. Most applications are read-heavy, anyway... I generally plan for things to be successful, in which case joins don't usually work in the hot path.


> At the volumes my organization works with, it is very difficult to write performant SQL queries that use JOINs and other relationships as a developer - even as a DBA!

Leaving aside my initial snark reactions¹ as they are not really relevant.

What you say may be true if the data is not arranged in a manner conducive to efficient queries of the type you are trying to make (for instance, if the DB was optimised for a different sort of output because the needs were, or were expected to be, different at design time). BUT read performance is not relevant to foreign keys: a constraint is checked at INSERT/UPDATE/other write time to maintain referential integrity and has no effect on later reads. You can do your own linking in the application if you want, but I'm keeping my foreign keys to stop bad data getting in - they won't affect your process of getting data out either way.

Also note that a foreign key does not imply an index in most DBMSs⁴, so if you are expecting the constraint to help performance when going from the parent table to its children, you may be disappointed. An index will exist on the side the key refers to, since FKs always refer to a primary key or unique index, but the referencing side is not usually indexed unless you explicitly ask for it. I've seen a few people run into this trap: expecting an index to be there because an FK constraint is, and concluding that JOINs are just slow because the index their queries would benefit from doesn't exist.

----

[1] Sorry, not a good enough person: “sounds like you need a better DBA!”²

[2] Well, a better database developer. Even the best can't get good performance from an inappropriate design. Or maybe a time machine; everyone who has worked with DBs long enough has been stuck with a bad or inappropriate design³ we have no power or time to fix…

[3] Possibly of our own making!

[4] Some DBs create one automatically, and some ORMs & other data manipulation libraries built on top do too. But it is generally not done because it is far from always necessary, so it could waste space, and you may want a compound index instead depending on other properties of the data and desired outputs.


> Personally, I typically would rather have the application layer do the validation and even the joining of data, a lot of the time, when the application is high-volume. At the volumes my organization works with, it is very difficult to write performant SQL queries that use JOINs and other relationships as a developer - even as a DBA! - and often much easier, for me, to write performant application code.

There is basically no way it is faster for "high-volume" systems to return excess data to the application rather than doing the joins on the DB and returning the record set to the application.

Even if we were talking about multiple billions of records you'd still be better off with a completely denormalized data warehouse style table and doing filtering against indexed columns db-side before sending vast quantities of data to the application.

I think people think this way because of licensing and the specialized nature of DBAs.


We are absolutely talking about billions of records.

But when you say "return excess data to the application" I'm not sure what you mean. Not doing lots of complex JOINs doesn't mean not filtering the queries at the DB at all. Nobody is pulling back a billion records at a time.

Here's an example of what I'm talking about - read the first comment on https://www.brentozar.com/archive/2015/05/do-foreign-keys-ma..., another venue for this same debate.


> We are absolutely talking about billions of records.

> But when you say "return excess data to the application" I'm not sure what you mean. Not doing lots of complex JOINs doesn't mean not filtering the queries at the DB at all. Nobody is pulling back a billion records at a time.

The question then becomes "Are you gaining anything by not using foreign keys on the database?" What is the additional speed impact of JOINs actually costing you? How denormalized is your database already, if JOINs are costly?

If you want raw speed at billions-of-records scale, you want as flat a schema as you can get and good indexes that actually fit into RAM.

By that point though, you should be able to recognize your use case is not the 90% (or even 95%) case, and your specific requirements are driving doing something different. That's very different than the vague "high-volume" statement you made at first.

My experience has been in assisting clients in migrating to data warehouses and specifically during the heyday of Hive/Hadoop/Spark years ago in seeing clients mistakenly believe they were "big data" and go down the rabbit hole of trying to scale out before it was actually necessary. The problem I have with the vague notion of "high-volume" is that I saw clients with 500m records believe they fit the bill, as well as clients with as few as 20m records who thought the same. The reality is neither of them did and they wound up wasting a lot of money in pursuit of slower systems.

> Here's an example of what I'm talking about - read the first comment on [...], another venue for this same debate.

That's not an example though, that's just some vague statements about needing to profile your query and evaluate the costs for yourself, which should be rather obvious.


I'll cop to my situation being unusual, but that's what I said in response to other posts in this thread: so is everyone's. This isn't a topic with a blanket rule. I loaded my original post with caveats like "for me" to try to make that clear. Sorry if it wasn't.

Apart from your skepticism about whether my org's DB is as efficient as could be, I don't think we actually disagree, unless your argument is that people in my situation are somehow obligated to make a specific set of choices rather than what works for them.


Foreign key constraints are easy to remove if they become a problem but almost impossible to add once data integrity problems arise (and I've never seen them not arise in projects without FK constraints).

The small number of people with high enough scale that they can't use them know who they are, the rest of us need to think carefully when performing database migrations and reason out the order of operations required to maintain data integrity (sounds like a good idea anyway?).


> sounds like a good idea anyway?

Heck, yes. If you are butting against constraints then you may have misunderstood the problem.

I don't mind people turning constraints off for a mass migration under the following conditions:

• This is not production, or if it is production you are in a maintenance window during which you have exclusive access to this DB

• The data you are piling in will be verified against the constraints once the job is complete (no turning things back on “WITH NOCHECK”; see the sketch after this list), and you have a rollback plan for if that fails (“restore from the backup taken before the maintenance window” might be acceptable, if that can be done in the timeframe of the maintenance window and you are happy taking questions from your team/clients/management if it means planned updates have to be completely postponed).

• You can explain why not getting the data to be modified in an order that allows the constraints to be kept on through the process would be significantly more complicated (simplifying migration code is a valid reason for temporarily turning off constraints if it makes maintaining the relevant code less error prone) or significantly slower (sometimes doing it most right unavoidably takes more time) or both.
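
In SQL Server, for instance, the re-validation in the second point looks like this (hypothetical constraint name):

    -- re-enables the constraint AND re-checks existing rows; the
    -- doubled CHECK is what distinguishes it from WITH NOCHECK
    ALTER TABLE orders WITH CHECK CHECK CONSTRAINT orders_customer_fk;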


RDBMSes don't do a lot of extra stuff; if they throw an error, it's usually for a reason.


There are many reasons to not use foreign keys, but it also depends on the kind of application. For small databases, foreign keys do make things simpler from a validation standpoint.

When building systems for scale where the databases may grow large, foreign keys can cause many issues -

- ORM features around foreign keys can easily bring your system down when joining large tables with incorrect/missing indexes during heavy loads

- As tables grow, not having foreign keys makes it simpler to move large tables out into big-data solutions later

- The schemas and relations are sometimes hard to understand during the initial phases of application development. Not having those relations makes changing schemas simpler and faster.

- Sharding tables is much simpler when there are no foreign keys

- It helps to move some of the reference logic into the application rather than the database. Databases are the bottlenecks when it comes to IOPS and scaling. The more processing you move to your application server, the better scalability you can achieve.


Yes and no. There are cases where you should not use FKs, mostly for things like audit logs, order logs, and stuff you never want deleted or modified in general.

In general it depends on whether you intend to use the database to drive a system, or whether you intend to use the data for reporting and data analysis.

Inconvenience in migrations is not a legit reason: if your migration would fail with FKs but doesn't because you don't have them, you just broke your data. The errors are there to protect you. In other words, your colleague is straight up wrong.


Your colleague's logic is akin to turning off the smoke detector because it starts beeping any time you barbecue indoors.


Of course I do. I think that it's crazy not to use database features to increase database integrity. I use foreign keys, I use not nulls, I use checks, I use triggers sometimes.

All my issues with databases usually stem from the fact that someone did not use enough checks and we got dirty data nobody knows what to do with.

My only non-conventional usage of database schemas is text field lengths. I have a rule: it's either 20, 200, or 2000. It should be big enough to fit any non-insane value; 200, for instance, should be enough to fit a phone number. The main point is not to let absurdly broken data in.
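
To make that concrete, a minimal sketch of the rule (column names made up):

    CREATE TABLE person (
        initials VARCHAR(20)   NOT NULL,  -- tiny values
        phone    VARCHAR(200)  NOT NULL,  -- big enough for any sane phone number
        notes    VARCHAR(2000) NOT NULL   -- free text, but still bounded
    );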


I use FKs because I have built my career on refactoring old software.

I have seen over and over firsthand the kinds of data integrity problems that come from leaving the decision to the business software and those who meddle asynchronously with data.

You can always rewrite software. Rewriting bad data is not only difficult but often impossible.


This:

"You can always rewrite software. Rewriting bad data is not only difficult but often impossible."


>You can always rewrite software. Rewriting bad data is not only difficult but often impossible.

Isn't that an argument against FKs? It's easier to rewrite the software to handle FKs than to deal with trying to set up FKs over bad data, since the data is difficult to fix.


Data often outlives the teams that input it. Often the case is, you can't recapture that domain knowledge to rebuild the missing data.


> >You can always rewrite software. Rewriting bad data is not only difficult but often impossible.

> Isn't that an argument against FKs?

No, it's an argument for FK checks.

> It's easier to rewrite the software to handle FKs

Except once you have that bad data, you don't know what to do to fix it in whatever language your app is coded in any more than you do in SQL. Once you know what to do, you can just as well do it in SQL as in any other language. Or, if it's just that you know that language better than SQL, by all means write a separate one-time fix-the-data app in that language and run it once to fix the data...

And then enable the FK checks so you won't have to do it again.

> than to deal with trying to setup FKs with bad data since it's difficult to fix.

Which is why you want FK checks in your DB from the beginning.

[Edit: Fix bad original editing.]


There is only upside to using foreign keys. When enabled, the benefits are obvious, so I won't discuss them; as for the drawbacks, the only one is performance. But if that's a concern, then guess what: on most databases, you can disable foreign keys.

Well, one might ask, what is the point of having foreign keys if they are disabled? And the answer is, there are several benefits. Here are a couple:

1. foreign keys, disabled or not, create a record of your data design that itself serves as documentation, and that can be programmatically queried, extracted, copied, modified etc, all of which reduces technical debt and is useful for other users or developers to understand your schema and work more efficiently.

2. You still have the option to enable the foreign keys. Furthermore, you can pass this job over to someone in your organization who may not have authority to create or modify foreign keys, but does have the authority to enable them and/or to fix the data however required to enable them


I have an anecdote from my experience. We had an unpleasant week of debugging in search of the cause of a slow insert query in PG on a particular table. 15 FKs on the table turned out to be the bottleneck.


Yes. I use FK constraints and cascades. Experience has taught me that doing these things in the application layer takes a long time to get right, is often error prone, and is rarely as fast.


Even if it was easy to get things right at the application layer, in a legacy system (and today's modern hotness is tomorrow's legacy system), the database is a constant.

Entire generations of application may rise and fall. New languages, frameworks, developers all lead to rot over time. Heck, some legacy projects the application code is incomplete or lost.

But the database doesn't rot. Show me a database that is 20 years old, and it is as fresh as the day it was created. If it has FK constraints, it's even healthier.

That means that whatever value is embedded in the database - FKs and of course many other constraints - will live for decades.


I have lived the rise and fall of all those things in my own projects (rewrites, you name it) using other's ORMs (for example). Handling as much as you reasonably can at the data layer is definitely the way to go.

Commenting just to draw attention to your comment.


I use FKs and have migrated/updated the schema multiple times over the years. If something goes wrong during a migration, that means I don't understand my own schema properly. Getting rid of FKs for that reason feels like disabling all compiler warnings, or wrapping every single Java method in a try-catch block with an empty catch block.


I have not worked on a system where the performance mattered so much to consider not having a FK. I personally think that the key relationships should reflect the nature of the data. It prevents a lot of errors so that new devs don't create data which should not be able to exist. Don't make invalid data have a chance to exist or be representable. Make it difficult to pollute the db because relying on your app layer is risky


FKs are what make a relational database “relational”.

It sounds like your colleagues want NoSQL (key-value) given their no JOINs policy.


"Relational" comes from https://en.wikipedia.org/wiki/Relational_model Table = relation


I followed the etymology further because that makes no sense on the face of it. It's referring to the relationship between columns. The table is a mapping from one column to another so there is a relationship between them.

This example makes it pretty clear: https://en.wikipedia.org/wiki/Finitary_relation#Example

I always thought it was referring to foreign keys too. Pretty bad name in hindsight.


Thanks for elaborating on this! I’ve learned something today.

I couldn’t make up a more misleading name if my life depended on it..

EDIT: Reading the page in more detail, it seems to me relational means both. Indeed each field of a row has a relationship (they belong to the same row after all), but there are other kinds of relationships like one-to-many achieved by foreign keys.

Eg.: each row of a class table corresponds to a class, and a class corresponds to multiple students, so the relationship between the class table and the student table is "one to many"


How is this even a question? Besides denormalization cases, FKs are essential for data integrity and management if you have even slightly relational data (data that is composed of joined tables); they guarantee you a source of truth.


No, especially on large tables with billions of records. They make online schema changes impossible. More details: https://github.com/github/gh-ost/issues/331#issuecomment-266...


Curious why no one else had mentioned this. Are they using online schema transforms besides gh-ost and Percona? Does Postgres handle this better than MySQL?


Postgres has CREATE INDEX CONCURRENTLY, which is better, plus a few other instant operations (which MySQL 8 also has), but you still need to be careful.

But some things, like adding a column with a default value, are still hard.

https://www.braintreepayments.com/blog/safe-operations-for-h... this article has a good guideline on this.


Why not just turn off foreign key checks during migrations and after validating the data, turn them back on? SQL Server lets you do this.


But you could get duplicate key errors when turning FKs back on. A customer of mine uses UUIDs as keys. That makes it almost impossible to get duplicate keys and removes any problem with moving data from one DB to another. Reading UUIDs is a minor pain though.


> But you could get duplicate key errors when turning fk on.

Why? FKs don't require uniqueness. Or do you mean that you target non-unique columns? That sounds… horrendous.

But regardless sounds like something you’d want to resolve before declaring the migration finished.


You're right. I meant the troubles we get if we turn off all constraints to ease the migration, end up with duplicate keys in the target table because of mistakes, turn on constraints, errors.


> ...turn off all constraints to ease the migration, end up with duplicate keys in the target table because of mistakes, turn on constraints, errors.

[Emphasis added -- CRC.]

But that's a good thing, not a bad one -- it shows you you've made a mistake! Might even be you wouldn't have caught that mistake at all otherwise. How could it not be better to catch a mistake right as you made it and are still in the maintenance window to fix it, than to have it lingering undetected, getting ever more corrupt data, until it rears its head possibly years later, when some vital report shows your DB is full of garbage?


> difficulties during data migrations

It's not difficult; it merely requires that you think about the order of the migration operations.
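
For example, a hedged sketch with hypothetical tables - parent rows go in first, and come out last:

    -- Going in: parents first, then children.
    INSERT INTO customer (id, name)
        SELECT id, name FROM staging_customer;
    INSERT INTO invoice (id, customer_id)
        SELECT id, customer_id FROM staging_invoice;
    -- Going out: children first, then parents.
    DELETE FROM invoice WHERE customer_id = 42;
    DELETE FROM customer WHERE id = 42;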

> He'd rather have a smooth data migration process than having an unexpected error

This should be grounds for dismissal. Data integrity problems should be dealt with immediately instead of making it someone else's problem later. He's just kicking that "unexpected error" down the road to some poor soul that will spend weeks trying to figure out how the data got so messed up.

If you don't care about data integrity then use a different data storage solution.


My other beef with no foreign keys is that it makes a database a lot harder to understand.

There are any number of tools that will generate me a pretty and useful database schema diagram if I point them at a relational database. This is incredibly handy when you're new to a database and need to figure out which tables to query and update, and which (gasp) sprocs you need to call. I've been on projects where people have been poking around in the dark, and then a good database diagram has saved us days or weeks of effort trying to figure out how to make something work. As I say, there are plenty of tools capable of generating one of these in seconds or minutes[0].

However, if you don't have foreign keys, the utility of such a tool is severely diminished because you just get a big pile of nodes representing tables clustered at the top or bottom of your diagram (depending on exactly which layout algorithm is being used and how it's been configured).

[0] Many years ago I and three colleagues built one of these: Redgate's SQL Dependency Tracker (https://www.red-gate.com/products/sql-development/sql-depend...). It was pretty neat because you could build a diagram spanning databases, or even linked servers, and unlike most other tools at the time it could handle thousands and thousands of database objects, but the product name doesn't really help get across that it's fundamentally a diagramming tool. I built the dependency engine, the graph, and radar views. We used yFiles from yWorks for the graph layout calculations, with a bit of extra hackery, but I seem to remember yFiles lost compatibility with a newer version of .NET at some point (or something along those lines), so RG ended up swapping it out for something else.


> There are any number of tools that will generate me a pretty and useful database schema diagram if I point them at a relational database.

I'd even go so far as to design database schema with these ER diagram tools in mind: if the automatic diagram is messy then that's more-often-than-not a code-smell in need of refactoring (before being hit with production data).


Besides the other reasons mentioned, FK's are also implicit documentation - even if you disable them.


Most (non-gigascale) applications that are non-trivial enough to use a relational database in the first place will benefit from using foreign key constraints (ie delegating enforcement of foreign key relationships to the database). Though, context can matter: system architecture, RDBMS limitations, etc. (eg IIRC, FK constraints complicate certain kinds of migrations in MySQL at scale).

>He rather have a smooth data migration process than having an unexpected error and abort the whole operation

That's not a good reason. It doesn't sound like he's making an educated decision based on context-- it sounds like he's sacrificing data consistency to make his own work easier. If that's the case, perhaps he should do away with all integrity constraints (primary keys, checks, et al). Then, he could enjoy a "smooth process" for adding new data, too. No more pesky errors-- just blissful, silent corruption.

>To be fair, as far as I know, he never had major data problems.

Yet, anyway. He has intentionally limited his visibility over at least one kind of problem (invalid references). His problems might not be clear until they start causing noticeable issues elsewhere.

>He is not the only one I've met who insisted on not having FK. I've even seen large systems prohibit using JOIN statements.

Allergy to the relational model. Many such cases!


In an OLTP system yes, of course. In an OLAP system no, since they're often ignored anyway, but they are nice to have defined when trying to understand the schema. We logically use them in that the id column from, say, a supplier table might be found in the orders table.


Foreign keys are an implementation detail specific to the relational model. In the stereotypical example, an Order might have several Line Items, so each Line Item has an FK back to the Order, via the FK OrderID. (Ignore for a moment Assembly/Part, where each Part can be in multiple Assemblies.) Anyway, the point is that an Order has 1..N Line Items (we can assume an Order with 0 line items is degenerate and not allowed). What matters is that we can know the Line Items that belong to an Order and the Order to which a Line Item belongs.

In a strict relational model implementation, the only way to reify that is by the FK of the Order in the Line Item, but in some other implementation, say a generic programming language, the Line Items might be an array or similar data structure that is part of the Order, and the programming language implementation keeps track of the pointers or address offsets or whatever details it cares about.

Anyway that's all beside the point.

In my experience, if the IDs are not autogenerated by the relational system, then it's relatively easy to migrate. However, when the data is full of IDs generated via some AUTO_INCREMENT behavior, migrations become an awful mess, and in any system (and this happens more often than you might expect) where a specific ID starts to have semantic meaning (oh, the customer's ID is 387437. whoops), all bets are off and you might as well just give up and accept that your auto-generated IDs are now fixed for all time and can't be changed.

Oh and just to add, I have notebooks from previous employers where I have written THE Important IDs to Know, which started life as auto-generated numbers but which now are enshrined and encased in acrylic to the extent that years later they are important tribal knowledge.


Saying you refuse to use FKs because you keep running into problems with data migrations is like saying you refuse to use arrays in your code because you keep running into problems with out-of-bounds index errors.


Yes I use constraints, because I like having CASCADE for DELETE and UPDATE.

I think trying to manage consistency in your application is probably a bad idea.
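
As a sketch (hypothetical names), the cascade is declared once at the schema level instead of being re-implemented in every code path:

    CREATE TABLE order_line (
        id       INT PRIMARY KEY,
        order_id INT NOT NULL
            REFERENCES orders (id)
            ON DELETE CASCADE   -- deleting an order removes its lines
            ON UPDATE CASCADE   -- renumbering an order follows through
    );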


"use FK, you fools!"

FKs maintain your database integrity. If you care about data, just use them!


I've worked on a couple of systems that use foreign keys (and other constraints) specifically after a migration to ensure all the migrated data is consistent, but then drop them again to reduce the performance overhead of having them checked during production use.


Do... you not turn the foreign key constraints off during migration?

In SQLite it's a pragma, in Postgres you turn triggers off, those are the ones I've personally done but surely any relational database has this ability for this specific reason?
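
Concretely, to the best of my knowledge (check your version's docs):

    -- SQLite, per connection:
    PRAGMA foreign_keys = OFF;

    -- Postgres, per table (superuser needed to touch the system triggers);
    -- note that re-enabling does NOT re-check rows loaded in the meantime:
    ALTER TABLE order_line DISABLE TRIGGER ALL;
    -- ...load data...
    ALTER TABLE order_line ENABLE TRIGGER ALL;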


Best to use tools for this, such as https://github.com/fabianlindfors/reshape or gh-ost for MySQL.


Absolutely "yes" on foreign key constraints.

There _is_ a scale, though. I have worked in enterprises where the database is the "contract" and where the business logic is implemented in the database, with triggers and all sorts of constraints and what not, in addition to foreign key constraints. But I now mostly work in service- or micro-service-oriented places where the "contract" is a REST(-ish) API, business logic is implemented in Java, and the database is exclusively owned by that one code-base, where not all business logic is replicated from the Java code into the database (still foreign keys, though!)


Your colleague doesn't understand relational databases, which, unfortunately, is fairly common - classes on RDBMSes are electives in most CS programs. It's scary/concerning this person is being allowed to operate like this - I'd suggest your company add design reviews and code reviews to rein in this sort of rogue behavior.

...Of course, if nobody understands relational databases at the company, then that won't help...

Even if data migration was difficult with foreign keys for some reason, they can be disabled at the time of migration - no need to put weird constraints on day-to-day operations.


I've joked before that I can often see which tables were created first and which ones were added later by checking where people started giving up on foreign keys. But it's kinda true too.

I see them the same as stored procedures. If you use them, you better use them everywhere and have all your data consistency model in the fk and stored procedures. Also, figure out how to disable them during the more interesting online schema migrations. If you can't commit to that, they'll only cause issues down the line.


I can relate partly to the problem. Sometimes you want to import a single table from a backup and, because of FK constraints, a very specific order of tables is required, which can span many tables depending on the complexity of the data.

As an alternative, you can use foreign keys without constraints. This way you get their conveniences without the migration problems. You can do this permanently (a bit wilder) or just before an import, and re-enable them later.


I go for constraints every time. Data Integrity is king as long as you can reasonably afford it.

Your data migration aborted on an unexpected error? Well in my view that's a problem and you need to figure out why that is happening and fix it.

But hey, you can always drop the constraint when you decide it's not worth it, and you can always add it back again after. Depends on the application too, sometimes it just doesn't matter if you make a mess.


It can be annoying doing migrations where it turns out your assumptions about FKs were wrong. However, it's absolutely worse to deal with a system where relationships aren't enforced at the schema level. You still end up validating the existence of the related rows, but it happens all over your codebase in an ad hoc manner and you'll still end up with bizarre bugs.


It is an old debate. You can get into awkward situations when you save and restore tables, everything has to happen in the right order and there's always the fear of some circular situation.

I'm remembering the time I was working at a place that had a huge number of Microsoft Access, Microsoft SQL Server and mysql databases and I was the first person they'd hired who knew how to do joins and they thought it was pretty scary.


That's not how you migrate. You disable the foreign keys, load the data, then enable them (which checks everything is ok).


> His main argument is difficulties during data migrations which he frequently encounters

He's just kicking bugs down the road. That's not engineering.


For me it's a case of pick your poison.

If you have no FKs, the day will come when your data gets corrupted. Tiny application bug, wrong manual data fix, incomplete data model communication. It's a case of when, not if. And when it happens, it can fester for weeks or months, corrupting all kinds of data in unfixable ways.

If you have FKs, you can go all-in on relational. The stuff is so powerful, you'd be dumb not to. But then comes the too-smart-optimizer problem. Just like with compilers and OpenGL shaders, the tiniest change might make you fall off the optimizer's preferred path and get a way-too-slow version. Even when no code has changed, a minor DB version bump or even a tiny shift in statistics will kill you.

Personally, I tend to consider integrity more important than performance, but both have a zone where they are good enough to make trading off worthwile.

So I do FKs and joins, and count on rigorous testing and monitoring to keep things in check. Someone else might decide otherwise. Both strategies require you to do the unsexy part of the work, or a harsh punishment will follow.


Well, it's an interesting subject. Yes, an RDBMS is a relational database, tables = relations, but alas, it does not automatically mean that just because you have a relational database you must use FKs. Most relational database theory was created at a time when databases were a few megabytes in size, so the theory starts to collapse quickly with terabyte-sized databases and larger.

I would say it truly depends on your strategy - I've seen companies remove all FKs and simplify database management, and vice versa. First and foremost, I would test how many FK violations really occur: if you have FK violations in the thousands per 24 hrs, you might need to take another look at your app; on the other hand, if you have very few FK violations, does it make sense to lace your database with FKs, significantly complicating administration?

All in all, databases are designed to store your data, and that is it; yes, you can use the database to QA bad coding practices, but that does not make it scale very well.


It would seem a bit crazy to me to ditch FKs for that reason. Why not just drop the constraint? I would much rather keep my data integrity.

The issue with managing the relationship just in code is that if you ship a bug that breaks the relationship, you now have to manually fix your data, and if you want to find out when or where the bug was introduced, you're looking at commit history instead of a migration history. Same thing when it comes to making manual updates or inserts in the DB. Even if it's just on a dev stage, if your code makes an assumption about the constraint which isn't true, you can end up with bugs or exceptions on dev, which is also annoying. If you want to remove the assumption of the relationship from the code entirely, that would be more understandable, but not if instead it means replacing what would be an efficient constraint and join with a separate query.


That's how I've always done migrations. Load the data, then create the FK constraints and indexes. It's not hard at all but slightly more work.


I prefer relational databases for most cases unless I really need to use a non-relational store. I also use foreign keys whenever possible. From what I have seen, anything to enforce referential integrity is a benefit in most cases. It is more comforting to have a delete fail than have many other queries mysteriously fail down the line.

In most cases, a well designed database covers a multitude of sins and makes life for future you and your future team much easier. Some of the biggest dumpster fires of code I have seen started as a smoldering dumpster from terrible db choices that were made far harder to fix when it had been in production for a few years.

If someone put a gun to a developer's head and threatened to pull the trigger if the database was crap, sadly they would be forced to pull the trigger most of the time.


> He rather have a smooth data migration process than having an unexpected error and abort the whole operation that the migration is only a small part of.

That's some take, because if that "unexpected error" is not reported because FK integrity was turned off, the data containing integrity errors goes right in. Now your database is corrupt. Dealing with a DB where random rows here and there don't conform to implied-only constraints is zero fun. In my own experience it looks like this: the main page of the production site is a 500 error, with a stack trace deep into some logic nobody has looked at in two years; run the identical code on staging, works fine. Fun stuff! Seems like an odd choice to let errors like that stream right into your production DB without checking.


This is one of the many problems with "senior" engineers who have been in the industry four years over three different jobs - they don't stay long enough to see the consequences of their shitty code.


Yes I do! Your colleague reminds me of a colleague I had who would copy and paste code into node_modules via a shell script to share it because they didn't like to learn about npm publish. Do things the right way if you can and your life will be easier, even during the data migrations.


I do app databases for a living, so I thought I'd dump my 2 cents here. I am very much not on a high horse about it. I'd be perfectly happy to use FK if somebody felt strongly about it. I myself have never found them helpful. My databases are small enough and I write all the SQL access code by hand, so nobody is touching the data except me, and the SQL that I write myself. Things being half-deleted isn't a problem. In fact I can't ever remember where that has happened. I have lots of other problems, but that's not one of them. PKs, indexes, views, etc. I use all of that stuff, but not so much FK. YMMV. Prohibiting joins is throwing the baby out with the bathwater.


By default, yes, FKs and joins. However, I tend to work on high scale systems generally backed (primarily) by MySQL. To scale out (TBs of data, hundreds of tables, millions of users, billions of daily interactions), we've had to remove some FKs, avoid DB triggers, and use caching heavily (including piggybacking requests, where you don't serve an identical request from the DB; you wait for the in-flight response and return the memoized version). When you do choose to remove an FK, the need doesn't go away; you just move it. Now you have to look into data sync and out-of-band integrity checks, and you have to have a plan for what to do when you go looking for data that is no longer there.


The entire point of using an RDBMS is to have constraints IMO. Foreign keys are one of the most basic forms of constraint.

"It's never gone wrong yet" is a good way to get yourself electrocuted when you're changing a light bulb.


Yes, when I developed on MSSQL, I used them all the time. They force you to think, and hard, and they saved my bottom quite a number of times. I then had to maintain, for the last few years, a system with surrogates only, and I hated it.


If you have any business app (OLTP), using FKs is a no-brainer. I know devs who fight for 99% code coverage yet don't use any constraints in the database. Fixing a bug in code is easy. Fixing data inconsistency is not.


Hello frogcoder, yes, I use FKs for non-trivial systems. Personally, I believe in enforcing FKs via DDL, as they document the schema and serve their intended purpose of maintaining data integrity according to the rules of the schema. Enforcing FKs, like any other system constraint, is part of system design. From a debugging perspective, I would rather have the RDBMS return an integrity error when the layers using it don't follow the rules, instead of chasing down where the constraint was not enforced in every layer.


Yes.

Mainly to cascade delete though, since we were robust enough to insert proper data.

This was spurred on in new projects because in a very very large older project with many many tables we had problems with orphaned records.

(PostgreSQL)


I used to be scared of it, thinking: why do I need it? Every time I deleted something it annoyed me. I read the blog post about GitHub not using FKs and tried to convince myself that FKs aren't worth it.

One day, I decided to try it out. And it turned out to be easier than I thought. When deleting data, just make sure to use cascade or ensure the associated data is deleted, which makes sense.

Once I embraced it, FKs became my friend for enforcing data consistency. I cannot imagine how I lived without them before.


It can become pretty annoying if you want to alter your database schema.


How?


Yes, but it definitely depends on the model I'm putting into the database.

Specifically, I hadn't thought through an Order model, and the OrderLines ended up depending directly on the Product via an FK. The orders wouldn't "settle" once they had been completed, since the product could be updated and change the values of the order line and order. Dumb dumb. It was one of the cases where denormalizing data very much makes sense.
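
A rough sketch of the fix (hypothetical schema) - copy the values that must not drift onto the line at purchase time:

    CREATE TABLE order_line (
        order_id   INT NOT NULL REFERENCES orders (id),
        product_id INT NOT NULL REFERENCES product (id),
        unit_price NUMERIC(10, 2) NOT NULL,  -- frozen copy, taken at checkout
        quantity   INT NOT NULL
    );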


He's talking about FK constraints right?

We do not use them for write performance concerns.

There's plenty of nice features of the DB we're not allowed to use under the excuse of "performance". But I'm told this by people who live and breathe SQL, so I trust them and I hope they have evidence to back it up. Because a lot of these features we're not allowed to use would make our lives 100x easier if we could!


If you've got a bit of time to spare, here's a 30min walk through of physical design in the database. https://youtu.be/x0P4zAptTiA

In this talk, we start at 10,000 transactions per second, and just by altering the design we get to 20,000 transactions per second... all on a 5 year old laptop.

And at no time did we ditch any constraints (primary or foreign). The claim that you can't get performance with constraints on a database is a myth


I mean, was that 20,000 transactions that are likely to have lock contention related to foreign key constraints? Because otherwise you are measuring the wrong thing.


What kind of lock contention is that? (In Oracle) you'd have to be (a) manipulating the values of a primary key and (b) choosing not to index the child FK column.

Even without (b) I'd be asking "Why are you altering primary keys?" because it pretty much aint a primary key anymore if you're doing that :-)


I can’t imagine not using FKs. Data integrity is more important than smooth migrations and if that’s the issue then he needs to learn more.


I'm removing them because I'm moving to a fully async realtime / offline-first architecture and pushing conflict resolution to the clients or to async server processes where automatic resolution is possible. I don't want to show errors to users during async background sync processes for data they may have populated days or weeks ago.


I'm currently working on a database that has "foreign keys": the relational fields are there, but no foreign key relationship is set up in the RDBMS. Also, there are multiple relations on one table instead of just 1-2, so it's a bit like everyone has their hands in the cookie jar. It's a pain in the ass.


Of course. Foreign keys are elementary. Try recursive CTEs. You can get ridiculous performance gains by not having to marshal data into the application space. It’s all about using the right tool for the job. If you’ve done solid system-level design then running migrations probably shouldn’t be the primary driver of your schema.
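
For anyone who hasn't tried them, a minimal sketch (table and id made up) that fetches a whole tree in one round trip:

    WITH RECURSIVE subtree AS (
        SELECT id, parent_id, name FROM category WHERE id = 42
        UNION ALL
        SELECT c.id, c.parent_id, c.name
        FROM category c
        JOIN subtree s ON c.parent_id = s.id
    )
    SELECT * FROM subtree;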


At work, we use foreign key constraints when it makes sense - that is, in over 50% of cases considered. The factors considered are risk and performance: what is the risk of bad data getting into the tables, and what is the impact if that happens, versus the cost of extra processing time or extra hardware to compensate.


> He is not the only one I've met who insisted on not having FK.

A) The world is full of idiots.

> I've even seen large systems prohibit using JOIN statements.

B 1) WTF is wrong with these people? What do they think an RDBMS is for?!?

B 2) See above, exhibit A.


Is there a way to have foreign keys without the index and integrity constraint? I would love to be able to keep the relational mapping at the DB layer. (I almost always actually want the integrity and index, but I'm just curious if it's possible)


Yes, just have columns for all the primary key fields of the referred-to table in the dependent table.

Then it's up to all your apps to guarantee that they fill those fields in correctly, and do the reference check when inserting or updating rows in the dependent table.

You also need to write checking programs that report foreign key errors on a timetable that suits (every minute, every hour, every day, whatever).

Report the errors to someone who understands the significance of them, and can fix them.
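
The checking program usually boils down to an anti-join like this (hypothetical names):

    -- Child rows whose parent no longer exists:
    SELECT c.id, c.order_id
    FROM order_line c
    LEFT JOIN orders p ON p.id = c.order_id
    WHERE p.id IS NULL;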


Isn't that just storing the value in a normal column?


What I'm specifically looking for is a foreign key relation so that you could, for example, still generate a chart of relations. For an audit log, say, you wouldn't want the consistency check, but it would still be nice to know it links to X tables.


In the RDBMSes I'm aware of, you can disable foreign key checks while still adding the constraints themselves---they won't do anything as far as the DBMS is concerned, but can presumably still be picked up by whatever tool you are using to generate this link information.


You usually can for maybe a table, or just overall. But I don't think you can specifically for one FK. This is mostly just a small pipe dream I think :')


No, in all RDBMSes I know of, it's absolutely the other way around: Enabled or disabled (usually CHECKED / NOT CHECKED) is an attribute you set for each and every constraint, either at creation (where it usually has one or the other as default if you don't explicitly set it) or in an ALTER... statement.


This works in Oracle -

alter table ${table name} disable constraint ${constraint name};
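
And, if I remember Oracle's syntax right, to turn it back on - either re-validating the existing rows or skipping them:

alter table ${table name} enable validate constraint ${constraint name};

alter table ${table name} enable novalidate constraint ${constraint name};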


I use foreign keys with postgres, I used not to in mysql ~10 years ago. I feel it depends on the level of support of your db / how painful it is to override things if needed.

Performance can be another reason for skipping them but modern dbs are pretty good.


I'd love some way to prohibit joining on just any old key, but Snowflake, and Redshift to a lesser extent, just don't care.

In fact, Snowflake really really doesn't care to the extent that I've seen non unique primary keys in tables.


> His main argument is difficulties during data migrations which he frequently encounters. He rather have a smooth data migration process than having an unexpected error and abort the whole operation that the migration is only a small part of.

NUTS!

> I suspect the errors might be mainly caused by not considering data integrity at all at the first place

Spot on.

----

I deal with interfacing with MANY ERP-like software packages, with more "database designs" than you can imagine, and have faced the fun of interfacing with many systems in my long career.

Among DOZENS only 2 major packages have zero issues and are the only ones with competent schema designs.

Let me tell you how bad it is:

One of my interfaces is with a package that somehow manages to mix all the ways to be wrong about RDBMSes:

- MySQL, not a good start

- Only string and int datatypes, and not many int types at that

- This means ALL data is suspect (dates, money, and yes, strings too)

- The tables are named `UNCC_00001, UNCC_00002...`

- The fields are named `FT_0001, FT_0002...`, and obviously the first is the PK. This PK is purely advisory; other fields can be part of the PK, but you can't tell by looking at the schema

- NO FKs at all, so ALL TABLES have integrity issues.

- NO indexes, because why make joins or lookups faster?

- Weird normalization, like yes, no, maybe?

- And no option to let me add some sanity to the DB, because the database is (in my niches) "property" of the software makers so can't be touched, only queried.

And I won't tell the rest, because this is only the query side; the CRUD side is nuts, and I don't want to remember much about it.

---

An RDBMS is one of the simplest ways to get, FOR FREE and with minimal effort, a sane, well-behaved, and performant system.

Refusing to use them is NUTS. Refusing to use the abilities they have is NUTS.

I've only encountered ONE actual company (in 20+ years) with a real reason to go NoSQL, and even that was only for a fraction of the system (in fact, the data was mirrored to an RDBMS for reporting!). And maybe that was true 5-10 years ago; today I think modern SQL engines catch up more and more on those scenarios (today I used Timescale just to speed up log processing, and it was so much nicer than dealing with weird log storage engines).


Fear of having integrity issues raised is NOT the reason not to use Foreign Keys. Performance at scale is. Foreign key constraint checks can be costly on write intensive applications with many tens of thousands of QPS.


> He rather have a smooth data migration process than having an unexpected error and abort

Personally, I would prefer errors to become immediately obvious, while I'm active and there's a roll-back ready to be applied.


I tend to use foreign keys everywhere. The only time that I would skip it is when I do not need to cascade deletes. These are mostly metadata tables that will be archived on a periodical basis.


Hard to believe so few here mention why FKs are a big no-no in MySQL. Go figure out how to use any online schema modification tool short of a blue-green deployment or, God forbid, triggers.


from https://thedailywtf.com/articles/directive-595

Dear Database Architect,

Directive 595 Part 2 is as follows.

  "Foreign and Primary Key constraints give lack of flexibility, more 
   costly evolution, inhibit the use of the database acting as a 
   service to applications and make it an inhibitor to evolution."
As such, please remove from all production databases.

Sincerely, Chief Architect Gerald


I don't use foreign key constraints on the database.

I create all fields as NOT NULL and use empty string in place of NULL.

Last time I tried, foreign key constraints couldn't work in an environment like this.


> I create all fields as NOT NULL and use empty string in place of NULL.

...but why?


Don't have a use case for a field being NULL instead of "".

Say, if I want to check how many records have no value for "ref", I don't want the count(*) query to show

    count(*) ref
    12000 (null)
    17030 ""
I want both added together. That's for example one simple reason out of many others.
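
For context, that output would come from something like this (table name hypothetical):

    SELECT count(*), ref
    FROM orders
    GROUP BY ref;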


And how do you differentiate between an absence of value and empty value?

it sounds like you're just using the database wrong.


That's the fundamental argument supporting the use of NULL.

Extending this will require you to use "infinite types of NULLs". Decades ago there was an article written explaining it, and it turned into a meme for a while.

For example: say you set up a "collector database" that collects data from other databases. It might not know what the other database's field value is (call this situation "NULL", the classic case). Or it might know the other database's field value, and that value is NULL in the other database (now call it NULL-KNOWN, or NULL-TYPE2).

And then do the same thing for a program that now reads from this "collector database". NULL = the program hasn't read the database yet and doesn't know. NULL1 = the program has read the database and the value is null. NULL2 = the value was null in the database this collector read from.

See where that's going?

Use a separate field if you want to distinguish.


> NULL = Program haven't read the database yet and doesn't know. NULL1 = Program have read the database and it's null. NULL2 = it was null from where this collector database read this data.

It's not a valid computing problem. In your example there's no difference whatsoever between NULL1 and NULL2, because it doesn't matter where it came from, and as for the difference between NULL and NULL1 - it's irrelevant, because if we know that the field exists, we have the database.

> "decades ago"

I'm fairly sure nobody needed more than 1 type of NULL decades ago, and certainly nobody needs more than 1 type of NULL now.


Indeed.

Lesson learned.

Thanks.


Yes. You can always disable enforcement when you need to run a data migration. It’s no reason to avoid having them.

Referential integrity saves hours of pain from weird DB issues down the line.


I use FKs in databases that I have designed.

I also interact with VAX applications that use Rdb (now owned by Oracle) that avoid constraints of all types for performance reasons.


Christ Jesus, yes.

I mean, there can be reasons why not, but why use an RDBMS at all if you’re not going to take advantage of its features?


Looks like your colleague ultimately wants to manage a NoSQL server, and doesn't fully understand RDBMSes.


Yes - is the alternative even a thing? I feel like I'm in Wonderland suddenly.


Prohibiting JOIN statements is really cargo cult :-)

A join is simply an index scan. And it's guaranteed to be cheaper than the ORM having to send many requests to the server.


Nah.

Everyone here seems to be super diligent about FKs. I wonder if that's a sampling bias.

I've worked on a few projects where they were considered an unnecessary hassle, especially when the RDBMS had some performance or functionality limitations when using them. Also, "on delete cascade" seems scary. Plenty of systems even just set deleted=true instead of actually deleting (at least before GDPR).

Not every CRUD app treats data integrity as a holy grail. A social network for hamsters can lose a comment, no big deal. Some applications treat the database as a bag of key-value pairs, and the inconvenience of migrations ends up with a JSON in an "everything_else" column. Move fast and break relational integrity.


I always felt slightly ashamed for the fact that I don't use foreign keys. I feel better having read this thread!


> I always felt slightly ashamed for the fact that I don't use foreign keys.

Good, that's the correct reaction.

> I feel better having read this thread!

Don't.


I think that this would make a good HN poll, for example: https://news.ycombinator.com/item?id=21231804

Though the answers would probably vary and there's most likely lots of nuance per individual case (which might matter more than just yes/no), personally I can think of the following as examples:

  - yes, we use foreign keys
  - yes, but we use them in testing environments and turn them off in prod
  - no, we don't use them because our database doesn't support them (e.g. distributed like TiDB)
  - no, we don't use them and check integrity and orphaned data ourselves
  - no, because our system design doesn't allow us to use them meaningfully (e.g. OTLT and EAV)
  (also, talking about whether to cascade or not might be useful, e.g. whether you want to manually clean up related data, or not)
Someone else mentioned varying schools of thought, which rings true. Personally, my opinions about database design in general are along the lines of:

  - avoid EAV and OTLT outside of very specific cases, have multiple tables over few (e.g. employees, employee_contact_information, employee_vacations, employee_notes instead of employee_fields and/or employee_field_values)
  - have foreign key constraints across your tables, so that you might not end up with orphaned data, *consider* cascading the constraints (depends on requirements)
  - use views liberally, especially for complex requirements in regards to selecting data, so that your app (or ORM in it) can map against it in a simple manner
  - outside of batch operations, prefer to modify data through the app, instead of procedural SQL, since that's easier to debug; I'm yet to see someone use breakpoints/watches for stored procedures successfully
Though my ideal database design probably looks way different and scales slightly differently (which hasn't mattered as much yet) than someone else's.

There are people who want to build their entire database around a "classifier" system, about which I wrote previously here: https://news.ycombinator.com/item?id=32416093 (this also makes the DB hard to visualize as ER diagram because of meaningless links, and sometimes makes the DB hard to use without the app, e.g. type_enum_value vs table_name).

There are people who want to do everything in procedural SQL (I've seen application views call stored procedures to fetch all data and validate forms), there are those who don't want to touch it with a 10 foot pole.

It really varies a lot, though in my experience it's invaluable to be able to feed a database into something like DbVisualizer and get an overview about how the different tables are related to one another, basically like documentation: https://www.dbvis.com/


Yes, until performance prohibits. But that’s usually in the billions of records or more.


Are you trolling?


Yes, always.


Never did. Never saw any need for it.



