More

stephen · 2024-10-12T20:12:03 1728763923

Using raw SQL directly is doable, but it means you're responsible for maintaining the business logic & validation rules of every single hand-written INSERT, UPDATE, and DELETE query in your codebase.

Personally I don't trust myself to remember to do that :-) hence preferring entity-based ORMs:

https://joist-orm.io/docs/modeling/why-entities

(That said, I definitely "know SQL" and use raw SQL queries for the ~5% of queries in a CRUD/SaaS app that are actually unique/non-boilerplate, instead of forcing them to go through an obtuse query builder DSL.)

stephen · 2024-09-13T21:04:01 1726261441

I mean, of course they are a good idea, what we need is more examples of actually doing them in practice. :-)

I.e. quoting from the post:

- monolithic databases need to be broken up - Tables must be grouped by module and isolated from other modules - Tables must then be migrated to separate schemas - I am not aware of any tools that help detect such boundaries

Exactly.

For as much press as "modular monoliths" have gotten, breaking up a large codebase is cool/fine/whatever--breaking up a large domain model is imo the "killer app" of modular monoliths, and what we're missing (basically the Rails of modular monoliths).

stephen · 2024-09-12T02:32:53 1726108373

We built Joist, an ActiveRecord-style ORM in TypeScript, on top of Facebook's dataloader library, and haven't had a single N+1 in ~4 years:

https://joist-orm.io/docs/goals/avoiding-n-plus-1s

Any "query in a loop", whether a manually-written `for` loop, or an "emergent for loop" from 100 `after_update` lifecycle hooks firing at once, are always auto-batched, every time.

Joist is also likely why we actually like GraphQL, vs. it generally having a reputation for being a pita to implement (N+1s everywhere w/o tedious, manual batching).

stephen · 2024-07-23T16:01:58 1721750518

Maybe its Stockholm syndrome after ~4-5 years of TypeScript, but I like knowing "this method call is going to do I/O somewhere" (that its red).

To the point where I consider "colorless functions" to be a leaky abstraction; i.e. I do a lot of ORM stuff, and "I'll just call author.getBooks().get(0) b/c that is a cheap, in-memory, synchronous collection access ... oh wait its actually a colorless SQL call that blocks (sometimes)" imo led to ~majority of ORM backlash/N+1s/etc.

Maybe my preference for "expressing IO in the type system" means in another ~4-5 years, I'll be a Haskell convert, or using Effect.ts to "fix Promise not being a true monad" but so far I feel like the JS Promise/async/await really is just fine.

CuriousSkeptic · 2024-07-24T13:12:05 1721826725

I seem to recall a similar argument being made a while ago: https://scholar.harvard.edu/files/waldo/files/waldo-94.pdf

goatlover · 2024-07-23T23:34:01 1721777641

Don't you generally know when you're making an I/O call?

t-writescode · 2024-07-24T03:02:02 1721790122

Not when I'm using some library function that's "helpful"

stephen · 2024-05-29T05:29:13 1716960553

> So if you access a1.b then it will also fetch a2.b and > cache it if you loaded a2 within that same transaction.

Ah ha! I'd implemented the same trick in this older Java ORM:

https://github.com/stephenh/joist-java/blob/master/domain/sr...

It worked well!

stephen · 2024-05-29T05:25:32 1716960332

Definitely appreciate your "sometimes ORMs/sometime raw SQL" pragmatic stance. I agree!

> then if you join on both you now are loading A * B * C rows

I'm realizing it's not actually written up in our proper docs, but the "cross join lateral" trick is pretty neat:

https://joist-orm.io/blog/nextjs-sample-app#join-based-prelo...

This will return A number of (author) rows, where each author row as a `_b` column that is a JSON array of that author's books. And if you asked for C comment rows, then one of the "columns"/indices in each of those book arrays will itself be a nested array of the book's comments.

So all the data gets returned, but with no repetition/Cartesian product.

Hat tip to PostGraphile where I first saw the "cross join later" + "json_agg" technique used. Disclaimer we're not actually running this Joist feature in production yet--just haven't had time to run it through the test suite.

superice · 2024-05-29T05:54:32 1716962072

I wasn't aware of Joist until I read a comment of yours somewhere else in this comment section, interesting stuff! I think Joist is the closest thing I've seen to an actual fix to this problem. The way you use the JS event loop as your 'scope' is pretty clever, and satisfies most use-cases I think.

It aligns pretty closely to what we do internally. We don't use classes or annotations to do entity definitions, instead we'll have something like this:

    const User = {
        fields: {
            id: defineIdField(),
            email: defineField('string', false), // false = NOT NULL
        },
        manyToOne: {
            role: { entity: 'Role', nullable: false },
        },
    };

Then the way to query and save is something like this:

    datasource.inTransaction(async (orm) => {
        const users = await orm.findMany('User', { where: sql`...`, orderBy: sql`email asc` });
    
        // Await in this loop might look bad, but only blocks for the first user
        for (const user of users) {
            console.log(user.email, (await user.role).name);
        }

        const adminRole = await orm.findOne('Role', 1); // by ID, throws if not found
        return orm.saveMany('User', users.map(u => ({ ...u, role: adminrole }));
    });

Which looks different from what Joist is doing, but the fetching mechanics under the hood are very similar I think. Our 'scope' is just somewhat more explicit, since we don't use the event loop mechanics of JS but rather use the inTransaction callback as the boundary.

(The reason we haven't open sourced this yet is that I'd like to iron out some of the ugly parts first, like the string entity identifiers, and the forced usage of an id column, and the lack of type safe where clauses right now)

stephen · 2024-05-29T00:50:03 1716943803

I can assure you that brandur knows SQL. :-)

(I don't know him personally, but have been following his blog for years.)

What these "just write SQL" rants are missing is encapsulation--let's say you've got a business logic, like "if the account is disabled, render the account name as 'Foo (disabled)'".

You want to write this logic in your preferred backend language, Go/TS/C/etc.

This works fine, in the /account/X endpoint (just load the account, and apply the logic).

But now what about the /accounts/client:Y endpoint (load all accounts, all the logic for all of their accounts)

As time goes by, you end up having 10-20 endpoints that all "return some part of account" as their payload, and you want the same "Foo (disabled)" business logic.

Your options are:

1. Build a single, giant SQL statement that strings together every snippet of business logic applicable to this endpoint (bulk friendly b/c the db is doing all the cross-entity work w/joins)

2. Push the business logic down into the db layer (simple for this, just string concate with an if, but what about anything that is a loop? doable in SQL but tedious)

3. Use your language's abstractions, like functions, to organize the business logic (what Brandur is attempting to do).

Nearly everyone wants to do 3, because 1 doesn't scale as your business logic becomes more & more sophisticated (copy/pasting it around every endpoint's single-giant SQL statements, or pushing it down into the db as views/stored procedures).

sgarland · 2024-05-29T12:02:42 1716984162

> because 1 doesn't scale as your business logic becomes more & more sophisticated (copy/pasting it around every endpoint's single-giant SQL statements, or pushing it down into the db as views/stored procedures).

I think the idea that doing business logic inside the DB is an anti-pattern is cargo culting. You can quite easily store schema definitions, stored procedures, etc. inside version control. You can document it just as you would any other code. You don’t have to see what precisely the call is doing at all times, provided you understand its signature.

Letting the DB do more than just be a dumb data store is a great idea, IMO, and one that is too often eschewed in favor of overly-complicated schemes.

bbkane · 2024-05-29T13:38:31 1716989911

I know putting business logic in the DB has been used very successfully, but it also has some large downsides.

It's harder to express some things as SQL and your team will be less familiar with SQL than your preferred language. SQL language tooling (code autocomplete, testing libs, etc) is also far less mature than tooling for most languages.

It's harder to debug when things go wrong. Where do you put a breakpoint?

It's likely your team ends up splitting your business logic between your app and your DB, and then you have to figure out which part is writing incorrect data when there's a problem.

For my apps, I'm trying to set up the database to reject incorrect information (with types and constraints and sometimes triggers), but have the app code do all the business logic and writing data.

Honestly, it's too soon to tell whether my apps will be successful with this approach. They haven't yet gone through years of real world usage and requirements changing. But so far it's gone well!

sgarland · 2024-05-29T16:05:53 1716998753

> It's harder to express some things as SQL

This is true, but then, it's also harder to write Rust than Python. There are tradeoffs which are made for every choice.

> and your team will be less familiar with SQL than your preferred language.

Also true, but given how SQL is practically everywhere and is not likely to go away any time soon, I strongly feel that nearly every dev could benefit from learning it.

> SQL language tooling (code autocomplete, testing libs, etc) is also far less mature than tooling for most languages.

Again, true. Personally I dislike autocomplete functions (they break my flow, needing to parse the suggestion and accepting it, rather than just typing from muscle memory), but I get that others like them. Re: testing, I have an unpopular opinion: this matters very little compared to most other languages. Declarative languages like SQL or Terraform generally do not produce surprises. If you get an unexpected output, it's probably because you have incorrect logic. There are some testing frameworks that exist for various flavors of SQL should you wish, but IMO as long as you have some rigorous E2E tests, and your dev/staging environments are accurate and representative of prod, you'll be fine.

> For my apps, I'm trying to set up the database to reject incorrect information (with types and constraints and sometimes triggers), but have the app code do all the business logic and writing data.

Hey, you're doing better than most! I've never understood people's unwillingness to use things like length constraints. Sure, your app should ideally catch basic issues so they never have to make it to the DB, but I'd rather have the guarantee. The last thing you want is for someone to find a way to inject the entirety of Moby Dick into a `name` field.

d0100 · 2024-05-29T01:43:05 1716946985

Since I mainly use Postgresql, I mix 1, 2 and 3: I create CTE's in code which are then reused in queries

    sqrl.Select("id", "cte1.*", "cte2.*").From("mytable").Join(CTE1{}).Join(CTE2{}).Where(etc)

xarope · 2024-05-29T01:54:19 1716947659

for 3, you could write a stored proc to handle the various situations, and call that stored proc appropriately, letting the DB engine optimize appending "(disabled)".

however, I do wish Go had a map function ala python map() as opposed to just verbosely writing more for loops or a complicated thread-safe goroutine. Seems almost strange that Google, who popularized the mapreduce model, doesn't want it in Go.

sgarland · 2024-05-29T11:16:45 1716981405

Stored procedures are also screeched at as being anti-patterns, somehow. I don’t get it; you can store them in version control just like anything else, and add comments where necessary in the code base indicating what the DB is doing.

jimbokun · 2024-05-29T02:34:22 1716950062

Now that Go has generics, you can write “map_slice” in a few lines then use it everywhere. I’ve done it myself.

Not sure why they haven’t added it to the standard library.

stephen · 2024-05-28T23:59:08 1716940748

Ah, I look forward to every brandur post! :-)

If he can give up Go, we've got a TypeScript ORM that will de-N+1 basically everything* that is not a paginated/limit-offset query:

https://joist-orm.io/docs/goals/avoiding-n-plus-1s

This works even in adhoc loops, i.e. if you have a lifecycle hook** of "after an author changes, do x/y/z logic", and you update 100 authors, every SQL operation invoked by those ~100 individual hooks is auto-batched.

We've been running this in production for ~4 years at this point, and haven't had an N+1 since then (although we didn't initially support auto-batch find queries; that came later).

Of course kudos to dataloader.

*everything --> any queries our "find" API supports, which doesn't do aggregates, sums, havings, etc.

**lifecycle hooks --> yes, a blessing and a curse; we're always attempting to find better/higher-level abstractions for declaring the intent of business logic, than raw/imperative hooks.

stephen · 2024-05-29T00:41:19 1716943279

Besides pitching Joist, going through OP, I'm not following how verbose the nested examples would get, i.e. ProductLoadBundle is loading "products & widgets".

But what if I need "products & widgets & widget orders". And like sometimes I want "just products & widgets" (2 levels) and sometimes I want "products & widgets & widget orders (3 levels)"?

Would these be the same "ProductLoadBundle" with some conditionals to the Product.LoadBundle method? Would the result of those conditionals be seen in the type-safe as, i.e. sometimes the 3rd level is available, sometimes it is not?

Or would it be two separate bundles, a 2-level Product.LoadTwoLevelBundle and a 3-level Product.LoadWidgetsAndOrdersBundle, which has the pro of better type-safety, but a con of repeating the bundle boilerplate for each unique shape/tree of data your app needs to load.

My guess/hope is that it's the 2nd, if only because I assume brandur also values type-safety over verbosity.

It took me awhile to find again, but this OG scala library in theory handled these adhoc shapes (called "enrichments" in their readme) somewhat elegantly:

https://index.scala-lang.org/getclump/clump

Mea culpa another Joist pitch, but TypeScript makes this "give me an adhoc type based on the call-site specific tree of data" super-pleasant, i.e. Loaded<Product, "widgets"> vs. Loaded<Product, { widgets: "orders" }> gets you the two different 2-level vs. 3-level types.

...granted, you just have to accept non-Go/non-native levels of performance., but tradeoffs. :-)

stephen · 2024-04-28T03:29:42 1714274982

I worked in a Rails app in ~2018 & hunting down N+1s by sprinkling `includes` in just the right places was tedious, despite being a regular occurrence.

Since then I've been building Joist, which is written in TypeScript, and has dataloader integrated into every operation (even `find`s) to basically never N+1:

https://joist-orm.io/docs/goals/avoiding-n-plus-1s

stephen · 2024-03-23T22:37:47 1711233467

Love a tiling WM post! Reading the "twos days to get setup" reminds me of fighting a blank xmonad setup back in the day :-), and makes me all the more appreciative of Regolith Desktop, a super-minimal Ubuntu+i3wm (and sway) setup; everything just works, Zoom sharing, multi-window, etc. Recommended!

gchamonlive · 2024-03-23T22:57:01 1711234621

And putting effort into adding every feature you need makes me appreciate it even more. I don't mean to say that humans derive value from suffering, but looking back at some of the headache and seeing that they are mostly solved now gives me this cool peace of mind. Plus, if anything breaks I know how to solve it. It is akin to moving from some batteries included distro like pop_os to arch. Nothing against pop, it is a wonderful effort, but reading through the documentation and putting your distro together gives you way more insight into the inner workings which is invaluable for when it breaks.

stephen · 2024-03-24T02:50:05 1711248605

> reading through the documentation and putting your distro together gives you way more insight into the inner workings which is invaluable for when it breaks.

I definitely feel that way about the core libraries/frameworks I'm building my applications/codebases on top of, so can relate!

But for my WM/desktop, I'm pretty happy to a) not know it's innermost workings, and b) just have it not break in the first place! :-)

I guess we can each have our own yak-shaving preferences. :-)

gchamonlive · 2024-03-24T11:29:12 1711279752

I was exactly like that until kde decided it knew what was best for me and killed all the kwin scripts I relied upon with plasma 6 :(

But yeah! We can't know everything, it is very important to consciously decide what to be ignorant about. What is relevant to me might not be relevant to you and that is the beauty of it. If everyone got interested in the exact same thing, just imagine the chaos.