Prisma doesn't do JOINs [0], it just SELECTs everything and joins them in its own engine. It should not be taken seriously.

[0]: https://github.com/prisma/prisma/discussions/12715
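
For the curious, a minimal sketch of what this looks like (hypothetical schema and client instance; the emitted SQL is paraphrased from the linked discussion):

    // One logical query with a nested relation...
    const users = await prisma.user.findMany({
      include: { posts: true },
    });

    // ...but instead of a single JOIN, the query engine issues
    // separate SELECTs and stitches the results together in memory:
    //   SELECT ... FROM "User" WHERE 1=1
    //   SELECT ... FROM "Post" WHERE "authorId" IN (...)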




WTH? I'm completely gobsmacked. Just tested this and you're right. I had just assumed that it would do JOINs. I haven't seen any mention of this in the docs.

Maybe it's time to test the above package or maybe knexjs (I like up/down migrations and Prisma doesn't do 'down'). I'm certainly considering not using Prisma going forward.

Thanks for bringing this up.


Are there no times this would be preferable? Just curious. It seems like the DX Prisma provides could potentially outweigh the need for joins, especially if you don’t have a ton of referentiality. I’ve been a very happy Prisma user for a couple years now and take it very seriously fwiw.


Take this with the bias of a grumpy SRE/DBRE.

The only reason not to support JOINs is the added complexity. However, since ORMs are supposed to (among other things) reduce the complexity for the user, not the DB, this seems like a poor decision.

There are other ORMs for the JS world. I can't imagine that they're all so much worse than Prisma as to render them non-choices.

> outweigh the need for joins

At toy scale, sure, it doesn't matter. Nothing matters - you can paginate via OFFSET/LIMIT (something else I've seen Prisma unexpectedly do) and still get moderately acceptable query latency. It's embarrassing, though, and will not scale to even _moderate_ levels. A simple SELECT, even one with a few JOINs (if they're actually being performed in the DB), should execute in sub-msec time against a well-tuned DB. In contrast, I've seen the same take 300-500 msec via Prisma due to this behavior.
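
For comparison, here's roughly what keyset (a.k.a. cursor) pagination looks like through Prisma's raw escape hatch - it seeks past the last-seen row instead of scanning and discarding OFFSET rows (schema hypothetical):

    const lastSeenId = 1000; // id of the last row on the previous page

    // Keyset pagination: an index on id makes this a cheap range scan,
    // where OFFSET would read and throw away all the preceding rows.
    const page = await prisma.$queryRaw`
      SELECT id, title FROM "Post"
      WHERE id > ${lastSeenId}
      ORDER BY id ASC
      LIMIT 20`;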

> especially if you don’t have a ton of referentiality

Then frankly, don't use an RDBMS. If what you want is a KV store, use a KV store. It will perform better, be far simpler to maintain, and better meet your needs. There's nothing wrong with that: be honest about your needs.


I think for the access pattern that Prisma uses (for a given record from X, give me its relations to Y, then give me Y's nested relations to Z in a unified data model), using multiple queries may be the _best_ way (if not necessarily the most efficient) to produce the resulting object model.

If you were to do this with joins, you'd have a ton of duplicated data returned (via returning every column from every table to have all the relevant data) and you'd have to manually reduce the rows into the data structure you'd want, either in the db or in biz logic. You may be able to swing it somehow using Postgres's json functions, but then it gets _super_ messy.
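
For illustration, that manual reduction step might look something like this (row shape invented):

    // Flat rows from a JOIN: the user columns repeat once per post
    type Row = { userId: number; userName: string; postId: number; postTitle: string };

    // Collapse the duplicated user columns into a nested structure
    function nest(rows: Row[]) {
      const byUser = new Map<number, { id: number; name: string; posts: { id: number; title: string }[] }>();
      for (const r of rows) {
        let user = byUser.get(r.userId);
        if (!user) {
          user = { id: r.userId, name: r.userName, posts: [] };
          byUser.set(r.userId, user);
        }
        user.posts.push({ id: r.postId, title: r.postTitle });
      }
      return [...byUser.values()];
    }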

Prisma avoids that by just requesting the relevant level(s) of data with more discrete queries, which, yes, results in _more_ queries, but the bottleneck is really the round-trip latency between the Rust Prisma engine and the DB. Again, we're sacrificing some speed for DX, which imo has made things much cleaner and easier to maintain.

You can also use cursors for pagination, which is definitely in their docs.
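
From the docs, it looks roughly like this (model and variable names are mine):

    const lastId = 42; // id of the last record from the previous page

    const nextPage = await prisma.post.findMany({
      take: 10,
      cursor: { id: lastId },
      skip: 1, // skip the cursor record itself
      orderBy: { id: 'asc' },
    });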

I see your points, but unless there's some extreme network latency between your app(s) and the db (which can be minimized by colocating things efficiently), 300-500ms seems extreme. I would be curious whether, if you logged Prisma's queries and ran them independently of the client, you'd see the same latency.


> If you were to do this with joins, you'd have a ton of duplicated data returned (via returning every column from every table to have all the relevant data)

ARRAY_AGG for Postgres or GROUP_CONCAT for MySQL does what you’re asking without duplicating rows.
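
Something like this, sketched through Prisma’s raw escape hatch (schema hypothetical):

    // One row per user; post titles collapse into an array instead of
    // the user columns repeating once per joined post
    const rows = await prisma.$queryRaw`
      SELECT u.id, u.name, ARRAY_AGG(p.title) AS post_titles
      FROM "User" u
      JOIN "Post" p ON p."authorId" = u.id
      GROUP BY u.id, u.name`;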

Re: JSON, don’t. RDBMS were not designed to store JSON; by definition you can’t even get 1st Normal Form once it’s involved.

IMO, claims of DX regarding SQL just mean “I don’t want to learn SQL.” It’s not a difficult language, and the ROI is huge.


After writing a lot of SQL over the past 10 years, I'm ready to not do it anymore lol. If I could never worry about syncing models with schema changes, hand-writing migrations, etc., I would be happy. Most SQL work is CRUDy and monotonous to write. Prisma allows you to break out of the query builder with their "rawX" variants too and write plain SQL if you ever need to (and I do occasionally).

Again, not saying you _can't_ do it with current db constructs, but Prisma deals with all that for you while allowing escape hatches if you need them. Just like with anything related to software engineering, there are footguns aplenty. Being aware of them and taking the good while minimizing the bad is the name of the game.


I’m all for using a library to help you with migrations, and even a full ORM like Django’s has its niceties. As a DBRE, I just want the end result to be performant, scalable, and to encourage good SQL habits.

Knowing SQL can help inform the choices a dev makes in the ORM - for example, knowing about semi-joins may let you write code that would cause the ORM to generate those, whereas if you didn’t, you may just write a join and then have to deal with the extra columns.
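
For example (names hypothetical), Prisma’s relation filters can express the semi-join directly:

    // "Users with at least one published post": a relation filter like
    // this can translate to an EXISTS-style subquery, rather than a
    // JOIN that duplicates each user row per matching post.
    const authors = await prisma.user.findMany({
      where: { posts: { some: { published: true } } },
    });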



Just because RDBMS have added JSON support (MySQL has had it since 5.7 as well) doesn’t mean it’s a good fit. The language and its implementations are designed for relational data that can be normalized; JSON is neither.

Have you ever tested JSON vs. normalized data at scale? Millions+ of rows? I have, and I assure you, JSON loses.

Last I checked, Postgres does a terrible job of collecting stats on JSONB - the default type - so query plans aren’t great. Indexing JSON columns is also notoriously hard to do well (even more so in MySQL), as is using the correct operators to ensure the indices are actually used.


Millions of rows with a JSON column? Yes, indeed I have. Recently, in fact. When all normalized fields are populated, you're absolutely right. However, that wasn't our dataset. We had entries with sparse keys, due to the nature of the data we were ingesting.

We ended up with multiple partial expression indexes on the JSON column due to the flexibility it provided. Each index ended up relatively small (again, sparse keys), we didn't need a boatload of null values in our tables, the approach stayed flexible as new data came in from a client with "loose" data, we didn't have to run schema migrations every time the "loose" data popped in, and we got the job done.

In another case, a single GIN index made jsonpath queries trivially easy, again with loose data.
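
Roughly what those looked like, with invented identifiers (DDL sketched through Prisma's raw escape hatch):

    // Partial expression index: stays small because it only covers
    // rows where the sparse key is actually present
    await prisma.$executeRaw`
      CREATE INDEX event_customer_idx
      ON "Event" ((payload->>'customerId'))
      WHERE payload ? 'customerId'`;

    // GIN index over the whole document, for jsonpath (@?, @@)
    // and containment (@>) queries
    await prisma.$executeRaw`
      CREATE INDEX event_payload_gin
      ON "Event" USING GIN (payload jsonb_path_ops)`;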

I would have loved to have normalized, strict data to work with. In the real world, things can get loose through no fault of my team or even the client. The real world is messy. We try to assert order upon it, but sometimes that just isn't possible with a deadline. JSON makes "loose" data possible without losing excessive amounts of development time.


This just feels like throwing the baby out with the bathwater. For one, they're slowly adding more JOINs to queries, and for two, a lot of the time there's almost no performance difference.

This has been much discussed in many GitHub issues. People get stuck on JOINs because it's how they've always done it. Same with how Vercel convinced everyone they need SSG and not just a cache.


People have always done it that way because it’s to date the most performant and efficient way to do so. RDBMS have enormous amounts of logic devoted to turning your query into the most efficient possible access pattern.

If what you want is NoSQL, then use NoSQL.


I thought the webdev startup shtick was to build fast and optimize where your bottlenecks are later. If Prisma helps you write 90% of your queries faster, you can still just drop down to SQL for efficient JOINs the rest of the time.


IME it’s more like “build fast and then blame the DB later when it’s slow.”

I’m begging devs to learn the absolute basics of SQL and relational algebra. It isn’t that hard.



