Have to say, if you like to write bare-metal SQL and use the nice features of Postgres, working with this lib is a total joy!
No “fancy”, leaky abstractions like Prisma, no convoluted type annotations like other TS libs; just plain SQL statements in the best JavaScript way of doing it.
I started doing raw Postgres a couple of years ago after getting sick of trying to hack around Prisma. I tried out several others, and after a couple of minutes using this, I knew I wasn’t going back.
I'd love to see a library like Ecto (Elixir) for JavaScript. It's easily the best toolkit for working with SQL I've ever found: simple, powerful, and easy to reason about.
The full list would merit its own dedicated blog post, but to highlight a few nice parts: Ecto lets me write SQL code in Elixir. It does not try to wrap it all into objects, but I do get schemas. So, for instance, I can create a schema `Player` which maps to a table "players".
I can then get them via
```
from(p in Player)
|> where([p], p.id in ^player_ids)
|> select([p], p)
|> Repo.all()
```
need to join with an association?
```
from(p in Player)
|> join([p], s in assoc(p, :scores), as: :score)
|> where([p], p.id in ^player_ids)
|> preload([p, score: s], scores: s)
|> select([p], p)
|> Repo.all()
```
this gets me all players with their scores.
need to get the sum of scores for all players?
```
from(p in Player)
|> join([p], s in assoc(p, :scores), as: :score)
|> where([p], p.id in ^player_ids)
|> select([p, score: s], %{
  player_id: p.id,
  score: sum(s.value)
})
|> group_by([p, score: s], [p.id])
|> Repo.all()
```
As you can see, you can unwind this to SQL in your head pretty fast. But you can also let the system just grab the kitchen sink when you just need to get something working quickly. And it's all very easy to modify and decouple.
This is hardly exhaustive though; I could spend a day talking about Ecto. I spent 8 years developing in Node.js but now I work exclusively in Elixir. Unless what you're doing is 90% done by existing libraries, it's just a better tool for building highly performant backends.
This is probably great to use, but it also highlights in comparison the beautiful simplicity of SQL. Doing something simple is hard, and SQL is so simple that I think most people find little value in abstractions built on it.
Their migrations tool is proprietary and not open source. The project is basically an open-source bait-and-switch riding the edge-computing hype (since Prisma and other major JS ORMs have poor support for edge environments).
There's One Weird Trick™ for managing joins in this case -- create views that implement the joins and then just treat each view as yet another table. Bonus feature: it gives you a consistent way of working with that relation using native DB tools. ;-)
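For example (a sketch with hypothetical `players`/`scores` tables, using a Postgres.js-style `sql` tag):
```
// Define the join once as a view...
await sql`
  CREATE VIEW player_scores AS
  SELECT p.id AS player_id, p.name, s.value AS score
  FROM players p
  JOIN scores s ON s.player_id = p.id
`

// ...then query it like any other table.
const rows = await sql`SELECT * FROM player_scores WHERE player_id = ${42}`
```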
That works great till your database gets big. Then you will watch your memory use skyrocket to perform the joins behind that view and it will F@#$@# your shit up. Eventually it will take so much time to construct your view that you'll time out your connection pool.
To be clear, don't use views in production queries. You WILL regret it.
I love layers above SQL because they can make a lot of domain logic nice to model. But dropping down when it makes sense and finding a rock-solid foundation is nice too.
<3 Entity Framework Core, but I also love the new interfaces that let me cast a DB connection to an Npgsql connection and use its advanced features and raw performance.
It would be great if Postgres.js could underpin Knex and MikroORM.
Yeah, it is totally fun to work with code like this after the responsible developer leaves the company. No types, no annotations, no abstractions, no migrations, just plain SQL.
I can't read if you're being snarky or if you are being earnest.
If snarky, is this based on something that actually happened to you? If so, I would love to hear how that actually played out, and what it is you are unable to grasp when things aren't bogged down by complexity?
If earnest, I'm glad more people prefer code that isn't littered with premature abstractions, redundant types, and useless comments expressing what can be more clearly read in the actual code.
This library is proper fast. Don't get distracted comparing this to other libraries just on the basis of similar string template syntax; it doesn't have a dependency on pg because it implements its own low level client.
I've built systems on this that load tens of thousands of records at a time, and it crushes.
It can load 10-20k records (YMMV) in 20 ms or under where it would otherwise take 150 ms. That alone can be a game changer depending on the use case, but if you're used to thinking about what's viable in Ruby etc., it's even more inspiring.
It also supports logical decoding, which opens a lot of doors for processing transactional events.
Perhaps not as full-featured as OP’s, but I’ve found https://pgtyped.dev/ useful. Queries are written as SQL files, and PgTyped creates typed functions you call with query parameters.
>> Prepared statements will automatically be created for any queries where it can be inferred that the query is static
What does this mean in practice? Like, actual prepared statements are created against the DB session if there are no bound variables, even if that query is only made once?
If so, it's an interesting but highly opinionated approach...
If you're using parameterized queries, then you _have_ to use PostgreSQL's "Extended Query" flow, which is what most people would think of as a "prepared statement". This is hardly opinionated.
But normally, you use an unnamed prepared statement and/or portal, which PG will clean up for you, essentially only letting you have one of those per session (what we think of as a connection).
I agree that sentence didn't make any sense. So I looked at the code (1) and what they mean is that they'll use a named prepared statement automatically, essentially caching the prepared statement within PG and the driver itself. They create a signature for the statement. I agree, this is opinionated!
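Conceptually it's something like this (an illustrative sketch only, not the library's actual internals):
```
// Cache: query text -> server-side statement name. The first time a query
// shape shows up, the driver sends a Parse with a fresh name; after that it
// only sends Bind/Execute against the cached name on that connection.
const statements = new Map()

function statementName(queryText) {
  let name = statements.get(queryText)
  if (!name) {
    name = 'stmt_' + statements.size
    statements.set(queryText, name)
  }
  return name
}
```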
In most SQL databases, once you have a connection handle, you can allocate an arbitrary number of statement handles, and each SQL "statement" can be executed independently, has an associated query, parameters, a result set, and so on.
It is entirely ordinary with an API like that to prepare a statement, bind parameters and columns, and execute and fetch the results. You can then reuse a statement in its prepared state, but usually with different parameter values, as many times as you want within the same session.
The performance advantage of doing this for non-trivial queries is so substantial that many databases have a server side parse cache that is checked for a match even when a client has made no attempt to reuse a statement as such. That is easier if you bind parameters, but it is possible for a database to internally treat all embedded literals as parameters for caching purposes.
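In JS land, node-postgres exposes exactly that prepare-once / execute-many pattern via named queries (a sketch, with a hypothetical `users` table):
```
import pg from 'pg'

const client = new pg.Client() // connection settings come from the PG* env vars
await client.connect()

const userId = 42

// The first call parses and prepares "fetch-user" on this session; later calls
// with the same name reuse the prepared statement with new parameter values.
const { rows } = await client.query({
  name: 'fetch-user',
  text: 'SELECT * FROM users WHERE id = $1',
  values: [userId],
})
```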
Thanks... I haven't got the mental energy to follow their code ATM but yeah, it seems weird to buffer a static query as a prepared statement if it's only going to be used once.
Maybe that kind of goes with the Node.js philosophy, though? It seems like an assumption that in most cases a static query will recur... and maybe that's usually accurate with long-running persistent connections. I'm much more used to working in PHP without persistent connections, and so to sparing the DB any extra prepare call if I don't have to, unless it's directly going to benefit me later in the script.
I believe it's typical for database libraries with auto-prepare to also have batch statement sending (and use it for prepare-executes), so there's no real cost to doing the prepare. The database has to parse/plan the statement regardless, and there's no round-trip delay with statement batching.
My own, very opinionated way of doing this in Node.js was to wrap node-mysql in promises and then force myself to make it explicit, when preparing a query, whether I want it to set up a prepared statement that returns a reusable handle or run directly. That requires tracking prepared statement names and key/values for each one.
you'll probably find this bad code too, but it was more of an experiment... I still don't feel safe using nodejs in deployment.
Postgres supports two kinds of plans for prepared statements: "generic" plans (parameter-independent and reusable across executions) and "custom" plans (built for a specific execution, so the parameter values are baked in).
I feel obliged to namedrop Zapatos here, another postgres JS client with a remarkably similar design (sql in template strings), but fully typesafe. If TypeScript is your thing, you might appreciate it: https://jawj.github.io/zapatos/
Personally I feel it’s one of the best designed (and documented) TS libraries out there and I’m sad it’s not very well known.
I don’t know if this has changed recently, but I’m using multiple schemas with Zapatos just fine. The type you can import looks like `auth.user.Selectable`, where auth is the schema and user the table.
I didn’t know support for schemas had landed, but the docs still seem obtuse for joins and upserts. Compared to plain SQL files in PgTyped (which admittedly need some annotations to work, and lack transactions), I find the lateral joins in Zapatos really verbose and ugly. Lots of extra parameters too.
Which isn’t to say it’s not a great tool! You pick what you like :)
Seconded. The lateral join features in combination with the shortcut functions make it exceptionally easy to build backends with actual SQL, without compromising on ease of use.
I've been using it in production (Deno) for a while. It's pretty solid, but their documentation is sorely lacking, so you often have to feel your way to getting more complex things to work.
Could you comment on what you're using Deno in prod for? I'm curious to learn more about Deno for production cases. I'm a big TS and Node.js user, and Deno has always seemed interesting to me. However, I'm afraid of being an early mover, so I like to learn more about use cases people have validated.
We use it for processing real-time UDP streams, as well as storing the data in Postgres (as well as hosting a small API for exporting said data). We also use it to re-publish the UDP streams in a WebSocket format, for consumption by web applications.
My search results are always excellent. Google and DuckDuckGo deliver the documentation as first result. Bing delivers a stackoverflow answer about pg as the first result but the documentation is the second result.
I don’t really see any other problems or rather the problems are common and most developers have developed habits (like the search pattern above) to solve them.
This reminds me of a friend of mine. He got stuck on a programming-related task. When he gave up, he turned to Stack Overflow, where he found the exact answer, ready to copy/paste. Ironically, it turned out that he was the one who had answered it a few years back :)
Oh, that happens to me quite a bit (I'm a top 500 SO user). I find some great question or answer and then try to upvote it and then get confused why it won't let me.
Totally agree. Slonik has had typed queries for some time now, but the integration with zod means that those types are verified at runtime. Incredibly useful with Typescript.
Test node_version:18 test_only:postgres-integration is taking 3 minutes.
Test node_version:18 test_only:pg-integration is taking 38 seconds.
It is possible that this is an issue with https://github.com/gajus/postgres-bridge, but I was not able to pinpoint anything in particular that would explain the time difference.
Author here - I'm actually a happy previous pg-promise user :)
If you check out the documentation I think you should be able to see if it's worth it for you to switch. My personal top reason is the safe query writing using tagged template literals, and the simpler more concise general usage. Then there's a lot around connection handling which is simpler, but also handles more cases. Then there's the performance. Postgres.js implicitly creates prepared statements which not only makes everything faster, but it also lowers the amount of work your database has to do. Pipelining also happens by default. As you can see by the benchmarks listed elsewhere, Postgres.js is quite a bit faster, so you can either get more oomph out of your current setup, or scale down to save some money ;)
Even so, if pg-promise works for you as it is, it might not make much sense to switch, but if you have the connection slots for it, you can run them side by side to get a feel for it too.
Another thing I remember when starting with pg-promise was the return value helpers, which I used a lot. Now - I think it's much nicer to have the simple API, and simple return value being an array, like Postgres.js does[1]. Especially now that we have destructuring, it just looks like this:
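Something along these lines (assuming a hypothetical `users` table and `id`):
```
const [user] = await sql`select * from users where id = ${id}`
```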
Tags are functions that receive an array of strings and then a series of arguments corresponding to values passed in the interpolation slots, allowing you to use template literal syntax while the tag function does something much more complex under the hood, potentially including returning non-string values.
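A minimal sketch of such a tag (a toy `sql` for illustration, not any particular library's implementation):
```
// `strings` holds the literal chunks, `values` the interpolated expressions.
// A real client can turn this into "...where id = $1" plus bound parameters.
function sql(strings, ...values) {
  const text = strings.reduce((query, chunk, i) => query + (i ? `$${i}` : '') + chunk, '')
  return { text, values } // a non-string return value
}

const id = 42
const query = sql`select * from users where id = ${id}`
// query => { text: 'select * from users where id = $1', values: [42] }
```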
Also, template strings like sql`foobar`, graphql`foobar` and others enable syntax awareness in IDEs.
I have a big TypeScript / SQL mixed codebase (no ORM), and it's very nice to have VSCode syntax-color and code-format TypeScript, GraphQL and SQL in the same file.
replace how? it can complement it, probably, in edge functions, but replace?
postgrest-js is something you can put in code running in a client's browser or phone, or in any place where you can't trust that the code won't be tampered with, thanks to the many quirks and limitations of PostgREST.
How to make an actual Postgres connection that is simultaneously limited enough to be secure and safe for your data and other users' data, and still useful for anything, I have no idea. Is it possible at all? Or, OK, how many different users could it handle?
For people who would like a safer and more structured approach (but almost as powerful, and with the option to fall back to SQL strings if needed), I highly recommend checking out Prisma: https://www.prisma.io/
WTH? I'm completely gobsmacked. Just tested this and you're right. I just assumed that it would do that. I haven't seen any mention of this in the docs.
Maybe it's time to test the above package or maybe knexjs (I like up/down migrations and Prisma doesn't do 'down'). I'm certainly considering not using Prisma going forward.
Are there no times this would be preferable? Just curious. It seems like the DX Prisma provides could potentially outweigh the need for joins, especially if you don’t have a ton of referentiality. I’ve been a very happy Prisma user for a couple years now and take it very seriously fwiw.
The only reason to not support JOINs is because of the added complexity. However, since ORMs are supposed to (among other things) reduce the complexity for the user, not the DB, this seems like a poor decision.
There are other ORMs for the JS world. I can't imagine that they're all so much worse than Prisma as to render them non-choices.
> outweigh the need for joins
At toy scale, sure, it doesn't matter. Nothing matters - you can paginate via OFFSET/LIMIT (something else I've seen Prisma unexpectedly do) and still get moderately acceptable query latency. It's embarrassing, though, and will not scale to even _moderate_ levels. A simple SELECT, even one with a few JOINs (if they're actually being performed in the DB) should be executed in sub-msec time in a well-tuned DB with a reasonable query. In contrast, I've seen the same taking 300-500 msec via Prisma due to its behavior.
> especially if you don’t have a ton of referentiality
Then frankly, don't use RDBMS. If what you want is a KV store, use a KV store. It will perform better, be far simpler to maintain, and better meets your needs. There's nothing wrong with that: be honest with your needs.
I think the access pattern that Prisma uses (for given records from X, give me their relations to Y, then give me Y's nested relations to Z in a unified data model), using multiple queries, may be the _best_ way (not necessarily the most efficient) to produce the resulting object model.
If you were to do this with joins, you'd have a ton of duplicated data returned (via returning every column from every table to have all the relevant data) and you'd have to manually reduce the rows into the data structure you'd want; either in the db or in biz logic. You may be able to swing it somehow using Postgres's json functions but then it gets _super_ messy.
Prisma avoids that by just requesting the relevant level(s) of data with more discrete queries, which yes, result in _more_ queries but the bottleneck is really the latency between the Rust Prisma engine and the DB. Again, we're sacrificing some speed for DX, which imo, has made things much cleaner and easier to maintain.
You can also use cursors for pagination, which is definitely in their docs.
I see your points, but unless there's some extreme network latency between your app(s) and the DB (which can be minimized by colocating things efficiently), 300-500 ms seems extreme. I would be curious whether, if you logged Prisma's queries and ran them independently of the client, you'd see the same latency.
> If you were to do this with joins, you'd have a ton of duplicated data returned (via returning every column from every table to have all the relevant data)
ARRAY_AGG for Postgres or GROUP_CONCAT for MySQL does what you’re asking without duplicating rows.
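For example (hypothetical `players`/`scores` tables, written here with a Postgres.js-style `sql` tag):
```
// One row per player, with the scores collapsed into an array instead of
// the player columns being repeated once per score.
const rows = await sql`
  SELECT p.id, p.name, ARRAY_AGG(s.value) AS scores
  FROM players p
  JOIN scores s ON s.player_id = p.id
  GROUP BY p.id, p.name
`
```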
Re: JSON, don’t. RDBMSs were not designed to store JSON; by definition you can’t even get 1st Normal Form once it’s involved.
IMO, claims of DX regarding SQL just mean “I don’t want to learn SQL.” It’s not a difficult language, and the ROI is huge.
After writing a lot of SQL over the past 10 years, I'm ready to not do it anymore lol. If I could never worry about syncing models with schema changes, hand-writing migrations, etc., I would be happy. Most SQL work is CRUDy and monotonous to write. Prisma allows you to break out of the query builder with its "rawX" variants too and write plain SQL if you ever need to (and I do occasionally).
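For example (a sketch using Prisma's `$queryRaw` tagged template, with a hypothetical `prisma` client, `playerId`, and tables):
```
// Escape hatch: hand-written SQL through Prisma's tagged-template API.
// Interpolated values are sent as bound parameters, not concatenated strings.
const totals = await prisma.$queryRaw`
  SELECT p.id, SUM(s.value) AS total
  FROM players p
  JOIN scores s ON s.player_id = p.id
  WHERE p.id = ${playerId}
  GROUP BY p.id
`
```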
Again, I'm not saying you _can't_ do it with current DB constructs, but Prisma deals with all that for you while allowing escape hatches if you need them. Just like with anything related to software engineering, there are footguns aplenty. Being aware of them and taking the good while minimizing the bad is the name of the game.
I’m all for using a library to help you with migrations, and even a full ORM like Django has its niceties. As a DBRE, I just want the end result to be performant, scalable, and to encourage good SQL habits.
Knowing SQL can help inform the choices a dev makes in the ORM - for example, knowing about semi-joins may let you write code that would cause the ORM to generate those, whereas if you didn’t, you may just write a join and then have to deal with the extra columns.
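As a sketch (hypothetical `players`/`scores` tables; the SQL is the point, shown here in a tagged template):
```
// Semi-join: keep players that have at least one score, without pulling
// any score columns (or duplicate player rows) into the result.
const playersWithScores = await sql`
  SELECT p.*
  FROM players p
  WHERE EXISTS (SELECT 1 FROM scores s WHERE s.player_id = p.id)
`
```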
Just because RDBMSs have added JSON support (MySQL has had it since 5.7 as well) doesn’t mean it’s a good fit. The language and its implementations are designed for relational data that can be normalized; JSON is neither.
Have you ever tested JSON vs. normalized data at scale? Millions+ of rows? I have, and I assure you, JSON loses.
Last I checked, Postgres does a terrible job of collecting stats on JSONB - the default type - so query plans aren’t great. Indexing JSON columns is also notoriously hard to do well (even more so in MySQL), as is using the correct operators to ensure the indices are actually used.
Millions of rows with a JSON column? Yes, indeed I have. Recently in fact. When all normalized fields are populated, you're absolutely right. However that wasn't our dataset. We had entries with sparse keys. This was due to the nature of the data we were ingesting.
We ended up with multiple partial expression indexes on the JSON column due to the flexibility it provided. Each index ended up relatively small (again, sparse keys), didn't require a boatload of null values in our tables, was more flexible as new data came in from a client with "loose" data, didn't require us to make schema migrations every time the "loose" data popped in, and got the job done.
In another case, a single GIN index made jsonpath queries trivially easy, again with loose data.
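Roughly what that looks like (a sketch, with a hypothetical `events` table whose `data` column is jsonb):
```
// Partial expression index: stays small because it only covers rows where
// the sparse key actually exists.
await sql`
  CREATE INDEX events_customer_idx
  ON events ((data ->> 'customer_id'))
  WHERE data ? 'customer_id'
`

// GIN index over the whole column for containment and jsonpath queries.
await sql`CREATE INDEX events_data_gin ON events USING GIN (data)`
```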
I would have loved to have normalized, strict data to work with. In the real world, things can get loose without fault to my team or even the client. The real world is messy. We try to assert order upon it, but sometimes that just isn't possible with a deadline. JSON makes "loose" data possible without losing excessive amounts of development time.
This just feels like throwing the baby out with the bath water. For one, they're slowly adding more joins to queries, and two, a lot of times there's almost no performance difference.
This has been discussed at length in many GitHub issues. People get stuck on joins because it's how they've always done it. Same with how Vercel convinced everyone they need SSG and not just a cache.
People have always done it that way because it’s to date the most performant and efficient way to do so. RDBMS have enormous amounts of logic devoted to turning your query into the most efficient possible access pattern.
I thought the webdev startup shtick is to build fast and optimize speed where your bottlenecks are later. If Prisma helps you write 90% of your queries faster, you can still just drop down to SQL for efficient joins the rest of the times.
Prisma is great for schema management, but its migration stuff broke for us, and their updateMany implementation is just plain dumb.
We just use db push because prisma migrate fucked up in some unknown way that is impossible to recover from. The process in the docs for re-baselining doesn't work, and we're not about to reset our database.
The problem with ORMs is always that if something goes wrong you need to understand why, but then you need to also understand what the orm is doing. So it doesn't really abstract the database away, it just makes it easier to use. Or if you don't understand your database you're SOL. I mean, there was a big transaction isolation bug that you now have to deal with...which if you knew anything about databases you would have dealt with anyway.
Also, they pack a bunch of prebuilt binary handlers with their stuff, which makes using it awkward in serverless code.
We use prisma's db push to change schemas. Anything higher level we do with our own patch functions. I think this is how it works in ruby too, or how one place or another did it.
The db push deals with constraints, indexes, types, etc.; we do everything else.
Sorry, but you’ve been fooled by the web and functional programmers’ propaganda against performance.
Also, what the author seems to mean is that this is faster than specific Python, JS and Go clients.
Anyway. At the IO boundary, there’s a lot of syscalls (polls, reads, mallocs especially), and how you manage syscalls can be a major performance sticking point. Additionally, serialization and deserialization can have performance pitfalls as well (do you raw dog offsets or vtable next next next). How you choose to move memory around when dealing with the messages impacts performance greatly (ie, creating and destroying lots of objects in hot loops rather than reusing objects).
If that was true, then there wouldn’t be database clients that are 100% faster than competitor clients and web servers that are several hundred % faster than slower ones, don’t you think?
“IO therefore slow” is a “myth”. IO is slow, but the 1,000-millisecond request times from the server you’re sitting beside are not slow because of IO.
When experts say “IO is slow”, they’re not saying that you should therefore disregard all semblance of performance characteristics for anything touching IO. They are saying that you should batch your IO so that instead of doing hundreds of IO calls, you do one instead.
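Concretely, the difference is something like this (a sketch with a Postgres.js-style `sql` tag, a hypothetical `users` table and `ids` array, assuming the driver serializes JS arrays to Postgres arrays):
```
// Hundreds of IO calls: one round-trip per id.
for (const id of ids) {
  await sql`select * from users where id = ${id}`
}

// One IO call: batch the whole lookup into a single query.
const users = await sql`select * from users where id = any(${ids})`
```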
Be very careful with JSON and Postgres.js; there are strange bugs. Also, there are various gotchas at the edges with TypeScript. AFAIK the code uses a lot of regexes and that breaks JSON in particular.
Uff. Author here. Care to share more? There's no regexp related to JSON handling; maybe you're thinking of the dynamic query helpers? If not, it'd be great if you could point to an issue or create one on the repo.
I think you should just keep a thing like this to yourself. It's attitudes like this that keeps people from daring to share their code with the world.
What you might find ugly, others find beautiful. I'm actually pretty proud of this lib, and I enjoy my style of coding. Your comment doesn't change that, it just kills a lot of energy.
It’s the giant nested ternaries that kill me. In nearly all circumstances I can think of (Terraform being a notable exception), ternaries could be replaced with an if/else if block and be far more readable.
In my head I’m having to cast these to an if to figure it out anyway - why not just write it like that to begin with?
It's all about familiarity. To me, ternaries give fewer ways in which the code can behave, so it takes less time to write and less time to read. The obvious reason is that a ternary is an expression and not a statement. So the reason for me not writing it as an if to begin with is that I prefer ternaries over ifs. So much so that I sometimes rewrite other code from ifs to ternaries to better grasp it. But that's me.
You aren't writing this for public opinions, presumably.
I've also written a Postgres wire protocol implementation and a blog post on how to do so, which people are welcome to say "It's terrible" about if they wish.
Most people care to some degree what other people say about them / their work.
The fact that you've also dabbled in this area could lead to a far more interesting discussion. Wouldn't mind seeing your blog post. Who knows, maybe I've already read it at some point?
btw the reason for extending eg Array is to make for a much better surface API when using the library.
Being able to do
const [user] = await sql`...`
leads to some very readable concise code.
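A rough sketch of why that surface is convenient (hypothetical `users` table):
```
// The result is an array of rows, so the usual array tools apply directly.
const users = await sql`select id, name from users where active = ${true}`
const names = users.map(u => u.name)

// And a single-row lookup destructures cleanly.
const [admin] = await sql`select * from users where role = ${'admin'} limit 1`
```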