Hacker News
How We Went All In on sqlc/pgx for Postgres and Go (brandur.org)
293 points by conroy on Sept 8, 2021 | 155 comments



From the article:

> I’ve largely covered sqlc’s objective benefits and features, but more subjectively, it just feels good and fast to work with. Like Go itself, the tool’s working for you instead of against you, and giving you an easy way to get work done without wrestling with the computer all day.

I've been meaning to write a blog post about sqlc myself, and when I get to it, I'll probably quote this line. sqlc is that rare tool that just "feels right". I think that feeling comes from a combination of things. It's fast. It uses an idiomatic Go approach (code generation, instead of e.g. reflection) to solve the problem at hand, so it feels at home in the Go ecosystem. As noted in the article, it allows you to check that your SQL is valid at compile-time, saving you from discovering errors at runtime, and eliminating the need for certain types of tests.

But perhaps most of all, sqlc lets you just write SQL. After using sqlc, using a more conventional ORM almost seemed like a crazy proposition. Why would someone author an ORM, painstakingly creating Go functions that just map to existing SQL features? Such a project is practically destined to be perpetually incomplete, and if one day it is no longer maintained, migration will be painful. And why add to your code a dependency on such a project, when you could use a tool like sqlc that is so drastically lighter, and brings nearly all the benefits?

sqlc embraces the idea that the right tool for talking to a relational database is the one we've had all along, the one which every engineer already knows: SQL. I look forward to using it in more projects.


> code generation, instead of e.g. reflection

"Hey man, we noticed there's not enough compiler in your compiler, so we made a second compiler for your compiler."


The one thing that kept annoying me with compile-time ORMs like this is the chicken-and-egg situation.

Your migrations will likely be run by your application, but your application won't compile until you've run your migrations.


"Your compiler compiles A, but you also want to compile B, so we added an extra compiler to help you compile it"


Makes sense, Go can't compile SQL and vice versa.


Nah, I love SQL, but if I have to choose between SQL and ORM, I will go with ... something in between, like the core layer of SQLAlchemy.

Just take a look at that monstrosity of an UPDATE statement in the article. In my first job, I had to do something similar in SQL, but with a SELECT statement to allow users to search dynamically using any combination of the columns. I picked up SQLAlchemy in my second job, and have never looked back.


I get this, but personally I love using jOOQ because of how composable and dynamic query building is with a good query builder pattern. Adding optional filters, joins, etc. doesn't turn into a string-concatenation abomination; it's something you can do in native, readable Java

(and then translating back to generated Java objects is also great compared to the alternative of unpacking some generic result set, casting to the right type, etc.)


Working with SQL in X (any language) usually has a poor developer experience, which is why ORMs and query builders are popular. You lose things like proper syntax highlighting and type safety (I remain to be convinced that sqlc can really check validity at compile time; usually that only works in specific basic cases).

You just have to choose your tools wisely, for sure, but most of the code you write needs to be rewritten every X years anyway.


> Working with SQL in X (any language) usually has a poor developer experience, which is why ORMs and query builders are popular.

At least when it comes to Postgres, I don't understand why more developers don't create their own user-defined functions with PL/pgSQL. It's a very robust and powerful procedural language. In my opinion, ORMs like SQLAlchemy add a completely unnecessary layer of abstraction. ORMs might be convenient for quick/simple queries, but when you're creating or troubleshooting a moderately complex query, it's far better to do it with native SQL than with a chained mess of ORM methods.

At my company, we employ UDFs for every SQL query we make and every UDF returns a declared type. The end result is far easier to debug and maintain when compared with the ad hoc ORM query alternatives. Plus it has the side benefit of separating out application code from database code which allows our DBAs to more effectively code review SQL code written by developers.

> You just have to choose your tools wisely, for sure, but most of the code you write needs to be rewritten every X years anyway.

Not if you stick with plain SQL and/or Pl/pgSQL :-)


I did this on a recent project, and it worked really well.

I had each function definition in its own .sql file, with a preceding "drop function" call, and a Makefile clause to run them all. Which meant managing versions was easy (coupled with migration .sql files). I also got to find out if any of my SQL was broken right up front, and testing the SQL was simple - call the function and check the return.

I also defined views for return types, so mapping the return values to the structs in the Go code was easy (yes it's boilerplate, but it really is not as painful as the author makes out). Query functions always returned the relevant view type (or a set of them).

The author's approach seems to go to great lengths to avoid a relatively small amount of boilerplate.


Could you please give 2 specific examples on how this would work? Are the functions only for UPDATE/INSERT, or also for reading data? I'm using views to simplify queries, but still via ORM.


Sure, why not.

Functions are for both reading and modifying data, because the actual SQL for the query might be complex (and therefore better managed as a function than as a string in Go code).

Example: User CRUD

    create view vw_user as select id, name, email from users;

    create function fn_get_user(p_user_id uuid) returns vw_user as $$
        select * from vw_user where id = p_user_id;
    $$ language sql;

    create function fn_change_user_name(p_user_id uuid, p_name text) returns vw_user as $$
        update users set name = p_name where id = p_user_id;
        select * from vw_user where id = p_user_id;
    $$ language sql;

    create function fn_create_user(p_name text, p_email text) returns vw_user as $$
        insert into users (id, name, email) values (gen_random_uuid(), p_name, p_email)
        returning id, name, email;
    $$ language sql;
The advantage here is that there is only one return type, so you only need one ScanUser function which returns a hydrated Go User struct. If you need to change the struct, then you change the vw_user view, and the ScanUser function, and you're done.

Each function maps 1:1 to a Go function that calls it, though it's also possible/easy to have more complex Go functions that call more than one db function. Or indeed, meta-functions that call other functions before returning a value to the Go code.
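
To make the Go side concrete, here's a minimal sketch under the same assumptions (database/sql; the struct and function names are just illustrative):

    package store

    import (
        "context"
        "database/sql"
    )

    // User mirrors the vw_user view (id, name, email).
    type User struct {
        ID    string
        Name  string
        Email string
    }

    // ScanUser is the single scan helper: anything that returns a vw_user row
    // can be hydrated with it.
    func ScanUser(row *sql.Row) (User, error) {
        var u User
        err := row.Scan(&u.ID, &u.Name, &u.Email)
        return u, err
    }

    // GetUser is the 1:1 wrapper around fn_get_user.
    func GetUser(ctx context.Context, db *sql.DB, id string) (User, error) {
        return ScanUser(db.QueryRowContext(ctx, `SELECT * FROM fn_get_user($1)`, id))
    }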

The problem with ORMs is always that eventually the mapping between struct and database breaks, and you end up having to do some funky stuff to make it work (and from then on it gets increasingly complex and difficult). The structures in the database are optimised for storing the data. The structures in the Go code need to be optimised for the business processes (and the structures in the UI are optimised for that, so they will be different again). An ORM ignores all of this and assumes that everything maps 1:1. Maintaining those relationships manually (rather than in an ORM) does involve some boilerplate, but it allows you to keep the structures separate.

For example: User UI data:

    create view vw_ui_user as select users.id as user_id, users.name, users.email, sessions.id as session_id, max(sessions.when_created) as last_logged_in from users left join sessions on users.id = sessions.user_id group by 1,2,3,4;

    create function get_ui_user(p_email text) returns json as $$
        select to_json(u) from vw_ui_user u where u.email = p_email;
    $$ language sql;
The json data generated from this function can be returned direct to the UI without the Go code needing to do anything to it. If the UI needs different data, the view can be changed without affecting anything else.

Caveat: I didn't bother checking this for syntax or typos. I have probably made several errors in both.


Thanks so much Marcus, it makes more sense now!


> I had each function definition in its own .sql file, with a preceding "drop function" call, and a Makefile clause to run them all.

why drop instead of CREATE OR REPLACE ?


CREATE OR REPLACE requires that the new function has the same signature as the old one [0]. I understand why, but I needed to change the signature sometimes. It was easier to do a Drop and then a Create as standard. Though this did mean having to manage dependencies between files myself (I prefixed the .sql files with 00_, 01_, 02_, etc. to indicate order of dependencies). It sounds like a lot of work, but in practice it was easy - it broke very quickly and very loudly if I got it wrong at all ;)

[0] https://www.postgresql.org/docs/13/sql-createfunction.html


Two problems

1. How do you handle versioning? Like if you want to try a development branch on a non-branch/shared db. Creating different version of stored procedures creates a recursive problem. A calls B, now A’ has to call B’

2. Sometimes we still need to programmatically decide to include a table in the join or not or get creative on a filter. Pl/pgsql is less flexible in this regard. You get the benefit of syntax checking only when query is verbatim and not dynamically constructed.


> 1. How do you handle versioning? Like if you want to try a development branch on a non-branch/shared db. Creating different version of stored procedures creates a recursive problem. A calls B, now A’ has to call B’

Writing UDFs and using Pl/pgSQL has no impact on how you do versioning. At my company we follow standard Gitflow and use golang-migrate for schema migrations (or Phinx for our PHP code bases).

If you're working at a company where developers are all forced to use the same shared database, then you're going to have a lot of development challenges that are unrelated to UDFs and Pl/pgSQL. Multiple devs sharing the same database always requires some team coordination to ensure that each member isn't stepping on another's toes -- whether that be prefixing your UDFs with your initials during development or agreeing not to work on the same UDFs at the same time.

> 2. Sometimes we still need to programmatically decide to include a table in the join or not or get creative on a filter. Pl/pgsql is less flexible in this regard. You get the benefit of syntax checking only when query is verbatim and not dynamically constructed.

That's just untrue. Pl/pgSQL fully supports conditional logic, dynamic query string construction, multi-query transactions, storing intermediate result sets in a variable or temp table, etc. The use case you described is actually a great example of when you would decide to use Pl/pgSQL. The language is extremely robust.
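
For example, a dynamic filter can live entirely inside a function via format() and EXECUTE. A rough sketch (defined from Go here just to show the shape; the table, columns, and names are invented, not from this thread):

    package store

    import (
        "context"
        "database/sql"
    )

    // createAuthorSearchFn builds its query at runtime inside PL/pgSQL:
    // each filter is appended only if the caller supplied it, and format()
    // with %L quotes literals safely.
    const createAuthorSearchFn = `
    CREATE OR REPLACE FUNCTION fn_search_authors(p_nationality text, p_only_dead boolean)
    RETURNS SETOF authors AS $$
    DECLARE
        q text := 'SELECT * FROM authors WHERE true';
    BEGIN
        IF p_nationality IS NOT NULL THEN
            q := q || format(' AND nationality = %L', p_nationality);
        END IF;
        IF p_only_dead THEN
            q := q || ' AND dead';
        END IF;
        RETURN QUERY EXECUTE q;
    END;
    $$ LANGUAGE plpgsql;`

    func defineAuthorSearch(ctx context.Context, db *sql.DB) error {
        _, err := db.ExecContext(ctx, createAuthorSearchFn)
        return err
    }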


I get that pgsql can construct dynamic queries, but I was assuming you were talking about the benefit of install time / compile time verification of query syntax. This is true for most regular stored procedures except when query is dynamic. Obviously the exact query isn’t known until runtime. I agree it is not a major downside.


> dynamic query string construction

Is this the same as concatenating strings or is there some special PL/pgSQL support for this?


Yep, close to 100% of my data manipulation is done in pl/pgsql. It’s awesome. At least 50% fewer LOC, and 10-100x faster than the equivalent code written in Java or Go, due to all the round trips.


Oh yeah, I completely forgot to mention that. The performance gains are immense when you use Pl/pgSQL to eliminate round-trips to the database. That's easily one of the most important reasons to use Pl/pgSQL.

The vast majority of data-heavy web apps today must have the database running on the same server or within the same datacenter -- they can't tolerate any kind of latency between the application and the database server because they failed to reduce round-trips. Ever tried deploying MediaWiki on an application server with >20ms latency to the database server? It just doesn't work -- each page takes several seconds to load.

If you minimize round-trips to the database, it gives you more flexibility on how you can deploy/host your database server. That's flexibility you want when you're designing failover/disaster recovery schemes.


Yep, I realised I needed to give plpgsql a shot when I started thinking, from first principles, about all the effort I was wasting. Not just machine cycles - the buffer copying, context switches, network switches, latency - but also, as I was working in Java at the time, there was the immense weight of the ridiculous JPA ORM sitting on top of it all, making it worse. When I took a step back and realised what we had done, the minimalist in me went into cardiac arrest.

With plpgsql you define your schema once, in SQL alone; you don't write a million duplicate "entity objects" in your language of choice, there is no friction or "impedance mismatch", no need to catch network errors for each DB call -- you just write SQL and return values like any other Go function. Because my functions are generally self-contained, I rarely even need to bother with transaction management, which eliminates even more round trips.

It's true that I needed to write some supporting code to manage schema upgrades (one day I hope to open source it), and I'm really intrigued to see if I can use sqlc to create Go stubs for my PG functions. But my SQL code sits next to my Go code in my IDE, its syntax and correctness are checked by the GoLand IDE, and life with an SQL database is super enjoyable!

I'm looking forward to integrating plpgsql_check into my build chain.


I’d love to hear how you handle code coverage, dynamic queries (custom filters in reports, etc.), live schema migrations (we aren’t brave and don’t do that at all - and don’t use functions either).

I’m coming from a sqlalchemy background if that helps to set the context.


All my test code runs against sqlite. I'll only need a postgres instance for one or two geospatial queries that use postgis, and for now that's manual, so this speeds up my builds. SQL functions tie you in.


JetBrains IDEs help with this actually.

When I was working with Go, all SQL longer than one or two lines went into the package level sql.go file, each query being a multi-line string. If you preface it with a comment like `// language=PostgreSQL`, you get syntax highlighting for PGSQL. What's more, if you have databases configured in your IDE, it tries to cross-reference the tables, columns, etc, and validates the queries for you.
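
For example (a minimal sketch; the query and names are made up):

    package queries

    // language=PostgreSQL
    const listActiveUsers = `
        SELECT id, name, email
        FROM users
        WHERE deleted_at IS NULL
        ORDER BY name;`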

I never did manage to get the latter part working 100%, but I think that's because we used multiple top-level databases, while in the DB setup we had just one connection that we then switched around to look at things.


With Go 1.16, you can embed SQL files into your Go app instead of using a multiline literal.

I actually am curious whether a sqlc competitor should be written using embed + generics once generics drop in 1.18. I haven't totally worked out what it would look like yet, but it's an interesting idea.
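
A minimal sketch of the embed approach (the file name is made up):

    package queries

    import _ "embed"

    //go:embed list_authors.sql
    var listAuthorsSQL string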


I think generics will change a lot of the ORM space for Go, yeah. Will be interesting to see.


> Working with SQL in X (any language) usually has a poor developer experience, which is why ORMs and query builders are popular.

I believe that you're partly right. In my experience there's also a large number of developers who simply have no SQL training, and don't actually care to learn.

We frequently have customers who complain about the performance of a managed database (either managed by us or by a cloud provider). When we look at the queries it's clear that they use an ORM without giving the schema it will generate much thought. It can be extremely hard to help make optimizations, because you need to figure out how to wrangle something like Hibernate into generating efficient queries and schemas, while not breaking the object model for the developers.

For my own Go projects I normally just stick to sqlx. I like that I can design the schema the way I need, and just create the queries that will map into my structs. Then it's just updates that are annoying.
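
Roughly what that looks like with sqlx (the struct, table, and query here are made up):

    package store

    import "github.com/jmoiron/sqlx"

    type User struct {
        ID    int64  `db:"id"`
        Name  string `db:"name"`
        Email string `db:"email"`
    }

    // ListUsers maps the rows straight into the structs via the db tags.
    func ListUsers(db *sqlx.DB) ([]User, error) {
        var users []User
        err := db.Select(&users, `SELECT id, name, email FROM users ORDER BY name`)
        return users, err
    }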


A few years ago I had spent a year working with a Go project that made heavy use of one of the (then) popular Go ORMs. Learned my lesson, never again. Magic=Bad.


What ORM was that?


Likely Gorm, I'm using that at the moment and it's eeehhhh.


That would've been GORM v1, then. Nowadays we're at v2. It's no Hibernate, but it has been solid in my experience.


I've used similar tools (which I'm not going to recommend, as they have not aged well), and found it great to be able to rely on the compiler for checks. With ORMs or SQL I realized I would be better off working in Python, as without the compile time checks I got none of the benefits of Go and all the down sides. I haven't used sqlc, but do like that you just feed it queries. Other tools rely on using a templating language to generate the Go code from database schema introspection, and it is just awful to work with. Generics should do away with needing the templates, so maybe the database schema introspection approach will improve.


It does indeed look promising!

Can it help with migrations? Seeing it has the field definitions right there it should at least be possible.

Otherwise I can see how a system like this could become quite complex over time as the database structure changes.


Based on just this article, it looks like it needs a complete table definition to work with. Unless it queries the database for the definitive schema, but that would mean you need a database up and running at compile time.


I took for granted that sqlc could run all these CREATE TABLEs for you, perhaps that's not the case. It probably should, shouldn't it?


Hopefully they'll get SQLite working on it soon and I'll be all over it.


I'm certainly eager to see official SQLite support as well. In the mean time, the overlap between postgres and SQLite syntax is enough that a lot of my "postgres" definitions in sqlc work just fine in SQLite. The recent addition of "RETURNING" (https://www.sqlite.org/lang_returning.html) to SQLite was a big help since that was a standard part of sql used for postgres with sqlc if you wanted the last insert ID.
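
As a rough sketch, the RETURNING pattern for grabbing the new ID looks like this with plain database/sql (the $1 placeholder shown is postgres-style; the table and names are made up):

    package store

    import (
        "context"
        "database/sql"
    )

    // CreateAuthor inserts a row and scans the generated ID from RETURNING,
    // instead of relying on driver-specific LastInsertId support.
    func CreateAuthor(ctx context.Context, db *sql.DB, name string) (int64, error) {
        var id int64
        err := db.QueryRowContext(ctx,
            `INSERT INTO authors (name) VALUES ($1) RETURNING id`, name).Scan(&id)
        return id, err
    }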


sqlc definitely uses reflection

But don't let that stop you, it looks like a nice solution and reflection isn't really all that bad anyway :)


Are you sure? Well, it uses "reflection" in a general sense of introspecting your SQL code, but not in the Go sense of using type information at runtime via the "reflect" package. sqlc compiles your SQL at build time to statically typed, non-reflect-using functions, as shown here: https://docs.sqlc.dev/en/stable/howto/select.html


Technically, reflection is used by Go’s database Scanner interface, but it’s not what most people think of when they complain about reflection.


Good point. I had assumed Rows.Scan() would have just used type switches for efficiency -- it looks like it does for common cases (https://github.com/golang/go/blob/d62866ef793872779c9011161e...) but then falls back to reflect. I wonder why it doesn't just do all of that with type switches? Maybe there are just too many cases and it ends up slower than reflect for the rest of the cases.

Scanner.Scan() is actually just called via a type assertion, though I guess implementations of Scan() might use reflection.


It needs reflect in order to translate struct field names into column names at the very least


sqlc doesn't quite work that way (though "sqlx" and other packages do). sqlc generates code at build time that avoids reflect by using database/sql's Rows.Scan() with pointers to fields (see the link above):

  var i Author
  if err := rows.Scan(&i.ID, &i.Bio, &i.BirthYear); ...
This generated code is exactly what you'd write by hand if you were using database/sql directly.

As earthboundkid points out, database/sql itself may use reflection under the hood to convert the individual fields (though the common cases are done without reflect, using ordinary type switches: https://github.com/golang/go/blob/d62866ef793872779c9011161e...).


> Why would someone author an ORM, painstakingly creating Go functions that just map to existing SQL features?

The answer to this question lies in the assumption you make in this statement:

> the one which every engineer already knows: SQL.

Not every engineer knows, or wants to learn, SQL. I've met very competent engineers, SMEs over their particular system, who were flummoxed by SQL. And many more just want to work in their preferred language. I don't like ORMs either but, like, half the reason why they exist is so the programmer can talk to the RDBMS in Java, JavaScript, etc. and not touch SQL.


If you are using an ORM and need to write a query that will run on a SQL server you *need SQL knowledge and ORM knowledge*. If you are missing one of the two, you have probably just written something with big performance penalties. This has been seen over and over in the Rails community.


Exactly my experience too.

The amount of SQL hatred from rails learning resources is unjustifiable.

If you're dealing with a relational database with SQL as its primary interface, you'll end up learning SQL eventually because all abstractions leak!


> Not every engineer knows, or wants to learn, SQL.

Which is bizarre because you pretty much need some form of RDB in most apps. And because of ANSI SQL, the syntax/concepts are relatively the same across different databases too. No point in not making this investment.


Agree. This is the weird kid down the street who gets by on mayonnaise and saltines. To engage persistent storage is to engage SQL for the first 75% of all work. Hey, you gotta have some competence in the domain of work. "I like Jordache (ORM) but not Calvin Klein (SQL)" isn't wisdom; it's merely personal predilection.


No, the main benefits of ORM are not to avoid to learn SQL, they are:

• syntactic sugar and language/tooling integration for very common operations;

• a centralized data access API to build upon, that you would have to create anyway with raw SQL;

• a single introspectable source of truth you can use to generate web API, models, data validation/migration and so on.

Eventually, even using an ORM, you will need to learn SQL if you go beyond the toy project.


ORMs often introduce their own flavor of Domain-Specific Language, or their own language (Entities instead of e.g. rows). I'd posit that learning an ORM is just as much work as learning SQL, but with an extra layer of indirection.

It's like learning HTML via React.


Or just to save time and better yet, save you from massive security issues down the line.

I've seen raw SQL queries full of fatal SQL injection bugs littered in codebases; very cringeworthy.


Most languages have a way to avoid SQL injection attacks, and linters that enforce usage of that.

For bad workplaces just using an ORM is a lot safer though, I agree. Performance can quickly become an issue when people stop thinking entirely about the DB level operations happening, and this comes up much quicker at workplaces where not enough people care.


I agree whole-heartedly that writing SQL feels right. Broadly speaking, you can take the following approaches to mapping database queries to Go code:

- Write SQL queries, parse the SQL, generate Go from the queries (sqlc, pggen).

- Write SQL schema files, parse the SQL schema, generate active records based on the tables (gorm)

- Write Go structs, generate SQL schema from the structs, and use a custom query DSL (proteus).

- Write custom query language (YAML or other), generate SQL schema, queries, and Go query interface (xo).

- Skip generated code and use a non-type-safe query builder (squirrel, goqu).

I prefer writing SQL queries so that app logic doesn't depend on the database table structure.

I started off with sqlc but ran into limitations with more complex queries. It's quite difficult to infer what a SQL query will output even with a proper parse tree. sqlc also didn't work with generated code.

I wrote pggen with the idea that you can just execute the query and have Postgres tell you what the output types and names will be. Here's the original design doc [1] that outlines the motivations. By comparison, sqlc starts from the parse tree, and has the complex task of computing the control flow graph for nullability and type outputs.

[1]: https://docs.google.com/document/d/1NvVKD6cyXvJLWUfqFYad76CW...

Disclaimer: author of pggen (https://github.com/jschaf/pggen), inspired by sqlc


Dang unfortunate name clash. I wrote a very similar tool of the same name and just open sourced it. I stumbled on your tool right after publishing https://github.com/opendoor/pggen.


I'm the author of xo. I appreciate the reference here, but that's not really what xo does. It generates all boilerplate for you directly based on whatever is defined by the database. It doesn't do anything with YAML, nor does it generate a SQL schema. xo does have 2 templates that _generate_ YAML and _generate_ SQL queries to recreate the schema, but the only input you can give xo is an active database.


It seems you forgot sqlboiler [1] which is something like "write SQL schema, parse SQL schema, generate Go structs and functions to build queries using SQL primitives". Not quite like generating activerecords, I think.

[1]: https://github.com/volatiletech/sqlboiler


> - Write custom query language (YAML or other), generate SQL schema, queries, and Go query interface (xo).

I've also tried to unify this approach with gRPC/Protobuf messages and CRUD operations: https://github.com/sashabaranov/pike/


One thing I’m pursuing with pggen is serializing DB query results directly to a proto. It’d be super cool to write a sql query and have it spit out a proto and avoid all the boilerplate struct copying.


I like this design! Asking the database to tell you the schema of your result does seem like the simplest, most reliable option.

However it does require you to have a running database as part of your build process; normally you'd only need the database to run integration tests. Doable, but a bit painful.


Yep, that’s the main downside. pggen works out of the box with Docker under the hood if you give it some schema files. Notably, the recommended way to run sqlc also requires Docker.

I check in the generated code so I only run pggen at dev time, not build time. I do intend to move to build time codegen with Bazel but I built out tooling to launch new instances of Bazel managed Postgres in 200 ms so not that painful.

More advanced database setups can point pggen at a running instance of Postgres meaning you can bring your own database which is important to support custom extensions and advanced database hackery.


> Notably, the recommended way to run sqlc also requires Docker.

This isn't accurate. sqlc is a self-contained binary and does not have any dependencies. Docker is one of the many ways to install and run it, but it is not required. (author of sqlc)


Oh hey! Apologies, not trying to throw shade. Last time I used sqlc was a while ago and I ran it in docker.


No worries! Just wanted to make sure people didn't think Docker was a necessity.


What does your bazel tooling look like? We use postgres and bazel together in automated tests and I'm curious about how others start and manage postgres instances in similar scenarios. Currently we're driving postgres from Python (py.test) while running tests.

The issue with our current approach is that we need to keep the bazel tests fairly coarse (e.g., one bazel test for dozens of python tests) to keep the overhead of starting postgres instances down.


I uploaded most of our Workspace setup here: https://github.com/jschaf/bazel-postgres-sketch. The tooling is a bunch of Go to manage the Postgres process. Basically, give it schema files and receive a running database with a tear down command.

We make temp instances of Postgres quickly by:

- avoiding Docker, especially on Mac

- keeping the data dir on tmpfs

- Disable initdb cleanup

- Disable fsync and other data integrity flags

- Use unlogged tables.

- Use sockets instead of TCP localhost.

For a test suite, it was 12x faster to call createdb with the same Postgres cluster for each test than to create a whole new db cluster. The trick was to create a template database after loading the schema and use that for each createdb call.
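
In SQL terms the template trick is roughly this (a sketch using database/sql against the cluster's maintenance database; the template and test database names are illustrative):

    package pgtest

    import (
        "context"
        "database/sql"
        "fmt"
    )

    // CreateTestDB clones a template database that already has the schema
    // loaded, which is much cheaper than running initdb or replaying migrations.
    // CREATE DATABASE takes no bind parameters, so the identifiers are
    // interpolated directly; they're assumed to be trusted test-only names.
    func CreateTestDB(ctx context.Context, db *sql.DB, name string) error {
        _, err := db.ExecContext(ctx,
            fmt.Sprintf(`CREATE DATABASE %q TEMPLATE schema_template`, name))
        return err
    }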


Cool, thanks for the link.

For what it's worth, we use rules_nixpkgs to source Postgres (for Linux and Darwin) as well as things such as C and Python toolchains, and it's been working really well. It does require that the machine have Nix installed, though, but that opens up access to Nix's wide array of prebuilt packages.

https://github.com/tweag/rules_nixpkgs


- generate code from running database and use a type-safe query builder (https://github.com/bokwoon95/go-structured-query)


Author of sqlc here. Just wanted to say thanks to everyone in this thread. It's been a really fun project to work on the last two years. Excited to get to work on adding support for more databases and programming languages.


Thanks a lot for this great project. I looked in the issues for Sqlite support and saw the merge of PR to "Add three new experimental engines, including SQLite" [0] and there's major architecture changes involved. That merge was 1.5 yr ago though, and I am curious what the plans are to take that further.

[0] https://github.com/kyleconroy/sqlc/pull/331


I haven't written up a public roadmap yet as I'm still focused on improving the MySQL and PostgreSQL support. While there is technically a SQLite parser in the main tree, it's substantially lower quality than the others. This is due to the fact that it's generated using Bison and not used by anyone else in production.

SQLite uses a custom parser generator called lemon[0] to parse SQL queries. Sadly that parser is deeply entwined with SQLite itself; it's not trivial to extract a full AST.

My current plan (still a work-in-progress and by no means final) is to use sqlparser-rs[1] via wasmtime. The AST produced by this crate is very high quality and it supports multiple dialects of SQL.

[0] https://www.sqlite.org/lemon.html [1] https://github.com/sqlparser-rs/sqlparser-rs


You could also see whether https://pkg.go.dev/modernc.org/sqlite could help.


Oh wow, thank you for this information!


I've used https://github.com/xo/xo, extended it with some custom functions for templating, extended the templates themselves, and can now generate CRUD for anything in the database, functions for common select queries based on the indices that exist in the database, field filtering and scanning, updates for subsets of fields including some atomic operations, etc. The sky is the limit honestly. It has allowed me to start with something approximating a statically generated ORM and extend it with any features I want as time goes on. I also write .extra.go files alongside the generated .xo.go files to extend the structs that are generated with custom logic and methods to convert data into response formats.

I like the approach of starting with the database schema and generating code to reflect that. I define my schema in sql files and handle database migrations using https://github.com/golang-migrate/migrate.

If you take this approach, you can mostly avoid exposing details about the SQL driver being used, and since the driver is mostly used by a few templates, swapping drivers doesn't take much effort.


That's cool. As per the other comment, you should share it with the community.


Sounds like a masterful work from you. Planning on open-sourcing your extensions?


As an aside - for anyone working with databases in Go, check out https://pkg.go.dev/modernc.org/sqlite

It's a drop-in replacement for SQLite in pure Go - no cgo or anything required for compilation, while still having everything from SQLite implemented.

Insert speed is a bit lacking (about ~6x slower in my experience compared to the CGO sqlite3 package), but it's good enough for me.
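
Usage is the normal database/sql dance; a minimal sketch (I believe the registered driver name is "sqlite", and the DSN is just an example):

    package main

    import (
        "database/sql"
        "log"

        _ "modernc.org/sqlite" // registers the pure-Go "sqlite" driver
    )

    func main() {
        db, err := sql.Open("sqlite", "file:app.db?cache=shared")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()
    }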


It's not really pure go, it's transpiled using https://gitlab.com/cznic/ccgo

Just about all the code looks like this:

  // Call this routine to record the fact that an OOM (out-of-memory) error
  // has happened.  This routine will set db->mallocFailed, and also
  // temporarily disable the lookaside memory allocator and interrupt
  // any running VDBEs.
  func Xsqlite3OomFault(tls *libc.TLS, db uintptr) { /* sqlite3.c:28548:21: */
   if (int32((*Sqlite3)(unsafe.Pointer(db)).FmallocFailed) == 0) && 
  (int32((*Sqlite3)(unsafe.Pointer(db)).FbBenignMalloc) == 0) {
    (*Sqlite3)(unsafe.Pointer(db)).FmallocFailed = U8(1)
    if (*Sqlite3)(unsafe.Pointer(db)).FnVdbeExec > 0 {
     libc.AtomicStoreNInt32((db + 400 /* &.u1 */ /* &.isInterrupted */), int32(1), 0)
    }
    (*Sqlite3)(unsafe.Pointer(db)).Flookaside.FbDisable++
    (*Sqlite3)(unsafe.Pointer(db)).Flookaside.Fsz = U16(0)
    if (*Sqlite3)(unsafe.Pointer(db)).FpParse != 0 {
     (*Parse)(unsafe.Pointer((*Sqlite3)(unsafe.Pointer(db)).FpParse)).Frc = SQLITE_NOMEM
    }
   }
  }


Being translated means it doesn't have the normal cgo calling overhead. It also means you can cross compile it for every platform that the Go toolchain supports without any external compilers.


OP mentioned that the pure-Go version is ~6 times slower, so the cgo calling overhead is clearly made up for by C. Also, I've heard that sqlite is the rare piece of C software that is actually bulletproof, so I don't think the pure-Go version can make the usual boasts about correctness and security in this particular case.

Not needing extra external compilers is still a nice proposition, however.


Especially for portability/cross compiling.


Nope, note their readme says:

These combinations of GOOS and GOARCH are currently supported

darwin amd64, darwin arm64, freebsd amd64, linux 386, linux amd64, linux arm, linux arm64, windows amd64

and if you look at their source tree https://gitlab.com/cznic/sqlite/-/tree/master/lib you can see they have

sqlite_darwin_amd64.go sqlite_darwin_arm64.go sqlite_freebsd_amd64.go sqlite_linux_386.go sqlite_linux_amd64.go sqlite_linux_arm.go sqlite_linux_arm64.go sqlite_linux_s390x.go sqlite_windows_386.go sqlite_windows_amd64.go


That's nice and impressive as far as it goes for portability; however, on the other side, we have C.


I hadn't realized it was now ready for general use...

    SQLite 2020-08-14 13:23:32 fca8dc8b578f215a969cd899336378966156154710873e68b3d9ac5881b0ff3f
    0 errors out of 928271 tests on 3900x Linux 64-bit little-endian
Whee, I shall have to give it a go - thanks for the heads-up :-)


I was really really excited when I saw the title because I've been having a lot of difficulties with other Go SQL libraries, but the caveats section gives me pause.

Needing to use arrays for the IN use case (see https://github.com/kyleconroy/sqlc/issues/216) and the bulk insert case feel like large divergences from what "idiomatic SQL" looks like. It means that you have to adjust how you write your queries. And that can be intimidating for new developers.

The conditional insert case also just doesn't look particularly elegant and the SQL query is pretty large.

sqlc also just doesn't look like it could help with very dynamic queries I need to generate - I work on a team that owns a little domain-specific search engine. The conditional approach could in theory work here, but it's not good for the query planner: https://use-the-index-luke.com/sql/where-clause/obfuscation/...


Arrays are nicer for the IN case because Postgres does not understand an empty list, i.e “WHERE foo IN ()” will error. Using the “WHERE foo = ANY(array)” works as expected with empty arrays.
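
With pgx that looks roughly like this (a sketch; a Go slice is passed straight through as a Postgres array, and the table/columns are made up):

    package store

    import (
        "context"

        "github.com/jackc/pgx/v4"
    )

    // AuthorsByIDs works the same whether ids has zero, one, or many elements;
    // an empty slice simply matches no rows, unlike "IN ()" which is a syntax error.
    func AuthorsByIDs(ctx context.Context, conn *pgx.Conn, ids []int64) (pgx.Rows, error) {
        return conn.Query(ctx, `SELECT id, name FROM authors WHERE id = ANY($1)`, ids)
    }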


Works as expected? Wouldn't that WHERE clause filter out all of the rows? Is that frequently desired behavior?


I could imagine that you're building up the array in go code and want the empty set to be handled as expected.


I think I'm missing something, but I don't get sqlc. Let's say I want to "get a list of authors", so in sqlc I would write:

    -- name: ListAuthors :many
    SELECT * FROM authors
    ORDER BY name;

and in Go I can then say `authors, err := queries.ListAuthors(ctx)`. This is cool. Now, if I want to "get a list of American authors" I would write:

    -- name: ListAuthorsByNationality :many
    SELECT * FROM authors
    WHERE nationality = $1;

and in Go I can then say `americanAuthors, err := queries.ListAuthorsByNationality(ctx, "American")`. Now, if I want to "get a list of American authors that are dead", I would have to write:

    -- name: ListDeadAuthorsByNationality :many
    SELECT * FROM authors
    WHERE nationality = $1 AND dead = 1;

... I like the idea of getting Go structs that represent table rows, but I don't want to keep a record of every query variation I may need to execute in Go code. I want to write in Go:

    deadAmericanAuthors, err := magic.GetAuthorsBy(Params{
         Nationality: "American",
         Dead: true
    })
without having to write manually the N potential sql queries that the above code may represent.


> without having to write manually the N potential sql queries that the above code may represent.

The article provides an alternative by using conditionals inside of the SQL, but honestly it's not an improvement.


this is the reason why I chose upper/db over pgx/sqlc for my current cockroachdb side project

while upper/db is not as type safe, with proper testing infrastructure, it felt most similar to django due to its simplicity/composability/query building support

i'm also excited to see how upper/db grows after generics land in Go later this year

https://github.com/upper/db

https://upper.io/


I'm still waiting for a compile-to-sql language in the vein of coffeescript or typescript. It seems like there is so much that could be improved with some very simple syntax sugar: variables, expression fragments and even real basics like trailing commas.


For me that's ecto (an elixir dsl for performing queries). The single defining improvement is I can define reusable building blocks. (This is also why I like react-style frameworks over raw js).

    entry_of(record)
    |> select_basic_info()


    def entry_of(record) do
        Entry
        |> where(record_id: record.id)
    end


    def select_basic_info(query) do
        query
        |> select([entry], BasicEntry.new(entry.foo, entry.bar))
    end


Tagged template combinators work surprisingly well [0] for injection sanitization and intuitive fragment generation, while giving access to the full feature set of the underlying database.

[0] https://github.com/appliedblockchain/tsql


Have you tried LINQ to SQL?


If you are looking for a way to map SQL queries to type safe Go functions, take a look at my library Proteus: https://github.com/jonbodner/proteus

Proteus generates functions at runtime, avoiding code generation. Performance is identical to writing SQL mapping code yourself. I spoke about its implementation at GopherCon 2017: https://www.youtube.com/watch?v=hz6d7rzqJ6Q


sqlc is a great code generator that seems to work miracles.

It uses the official postgres parser to know all the types of your tables and queries, and can generate perfect Go structs from this.

It even knows your table and field types just from reading your migrations, tracking changes perfectly, no need to even pg_dump a schema definition.

I also found it works fine with cockroachdb.


How are migrations defined?

I ask because I'm still trying to find a good solution for my project.


it supports the migration files of several different Go migrator modules. Usually just a series of text .sql files with up/down sections.


We just use something like github.com/Masterminds/squirrel in combination with something like github.com/fatih/structs (it's archived, but it's easy code to write) to help with sql query generation, and use github.com/jmoiron/sqlx for easier scanning. I guess it's a little trickier when trying to use postgres specific commands, but we haven't run into many problems.
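
Roughly what that combination looks like (a sketch; the table, columns, and filter are made up):

    package store

    import (
        sq "github.com/Masterminds/squirrel"
        "github.com/jmoiron/sqlx"
    )

    type Author struct {
        ID   int64  `db:"id"`
        Name string `db:"name"`
    }

    // AuthorsByNationality builds the SQL with squirrel and lets sqlx handle
    // scanning the rows into the structs.
    func AuthorsByNationality(db *sqlx.DB, nationality string) ([]Author, error) {
        query, args, err := sq.Select("id", "name").
            From("authors").
            Where(sq.Eq{"nationality": nationality}).
            PlaceholderFormat(sq.Dollar).
            ToSql()
        if err != nil {
            return nil, err
        }
        var authors []Author
        err = db.Select(&authors, query, args...)
        return authors, err
    }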


This looks really cool to me, because I love to write SQL.

Except for that `UPDATE` statement. That... is a problem.

Looks like there is an open discussion about this on the project: https://github.com/kyleconroy/sqlc/discussions/1149


I attempted to make something similar to this except the opposite direction at a previous job. It was called Pronto: https://github.com/CaperAi/pronto/

It allowed us to store and query Protos in MongoDB. It wasn't perfect (lots of issues), but the idea was that rather than specifying custom models for all of our DB logic in our Java code, we could write a proto and any code could automatically import that proto and read/write it to the database. This made building tooling to debug issues very easy and made it very simple to hide a DB behind a gRPC API.

The tool automated the boring stuff. I wish I could have extended this to have you define a service in a .proto and "compile" that into an ORM DAO-like thing automatically so you never need to worry about manually wiring that stuff ever again.


I was expecting the article to contain a note about SQLBoiler (https://github.com/volatiletech/sqlboiler) and why they didn't use it, but it doesn't. So I was expecting SQLBoiler to be heavily mentioned in the comments, but it's not the case.

If you want to see a (slightly heated) debate about `sqlc` versus SQLBoiler with their respective creators: https://www.reddit.com/r/golang/comments/e9bvrt/sqlc_compile...

Note that SQLBoiler does not seem to be compatible with `pgx`.

[edit: grammar]


I have been working with ORMs and plain SQL for the past 10 years in a bunch of languages and libraries (PHP, Java, JavaScript, Go). The issue I have with ORMs and other libraries that supposedly reduce the work for you is that they're black magic. You will find yourself one day having to dig into the source code of the library to tackle nasty bugs or add new features. It's exhausting. When I started using Go I mostly used plain SQL queries. I took on the manual endeavour of mapping and hydrating my objects. Sure, it's more manual work, but abstractions have a cost too. That bill might have to be paid one day, one way or the other. Personally, I am never looking back. But every one of us has a different use case, therefore YMMV.


This looks better than typical ORMs, but still not giving me what I want.

I want query objects to be composable, and mutable. That lets you do things like this: http://btubbs.com/postgres-search-with-facets-and-location-a.... sqlc would force you to write a separate query for each possible permutation of search features that the user opts to use.

I like the "query builder" pattern you get from Goqu. https://github.com/doug-martin/goqu
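
A sketch of the kind of composition I mean with goqu (the table, fields, and filters are invented):

    package search

    import "github.com/doug-martin/goqu/v9"

    type Filters struct {
        City     string
        MaxPrice int
    }

    // BuildListingQuery only adds the WHERE clauses for filters the caller
    // actually set, and returns placeholder SQL plus its arguments.
    func BuildListingQuery(f Filters) (string, []interface{}, error) {
        ds := goqu.From("listings").Select("id", "title").Prepared(true)
        if f.City != "" {
            ds = ds.Where(goqu.Ex{"city": f.City})
        }
        if f.MaxPrice > 0 {
            ds = ds.Where(goqu.C("price").Lte(f.MaxPrice))
        }
        return ds.ToSQL()
    }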


So they are using a full blown relational database to use it like they are reading files on a share. Amazing indeed.

From the docs and online comments, SQLC doesn't support joins. I am amazed by the number of comments with nobody pointing this out.


Can you provide some resources about the lack of support for joins in sqlc? Because I wasn't able to find in official documentation and actually there's a discussion on github containing queries with join statements: https://github.com/kyleconroy/sqlc/issues/213


Yes, I was about to reply to myself with a link to this issue as I couldn't see anything in official docs unless looking at the github issues.

https://github.com/kyleconroy/sqlc/issues/1157


But it seems like this discussion is about the lack of support for null values in enum types in joins (not joins in general). Am I missing something?


I got it wrong because of the lack of examples in the official documentation. One would only be aware of the feature if they read the issue tracker, which is dumb; joining two (or more) entities is like the first thing you want to do with a database.


It's not the main thrust of the article, but this snippet of code stuck out at me:

> err := conn.QueryRow(ctx, `SELECT ` + scanTeamFields + ` ...)

Why are we, as an industry, still okay with constructing SQL queries with string concatenation?


As an alternative I suggest people to look at https://github.com/go-jet/jet. I had a good experience working with it and the author is quite responsive.

It really feels like writing SQL but you are writing typesafe golang which I really enjoy doing.


We're using a very similar lib for typescript: https://github.com/adelsz/pgtyped

Would love to hear if any others of comparable or better quality exist for js/ts


Personally, I tried pgtyped in a greenfield project but ended up switching to Slonik. Both are brilliant packages - if you ever peek at the source of pgtyped, it basically parses SQL on its own from what I could understand. The issues I had were: 1) writing queries sometimes felt a little contrived in order to prevent multiple roundtrips to the database and back, and 2) pgtyped gave weird/non-functioning types for some more complex queries. I've had better luck with Slonik and just writing types by hand.


His codeblocks have broken horizontal scroll on mobile.

Other than that I like it a lot. I built some codegen stuff in the past for test automation and it's really quite nice because it reduces a lot of user errors.


Looks very similar to the annoyingly named "mybatis". I approve of the principle: SQL should be separated from code, because SQL needs to be written, or at least tuned, by someone with database expertise. There are often deeply subtle decisions on how to phrase things that affect which indexes get used and so on, and can make orders of magnitude difference to how quickly a query executes.

This is also why ORMs that write the query for you are unhelpful. You're stuck trying to control how a machine makes SQL.


>ORMs also have the problem of being an impedance mismatch compared to the raw SQL most people are used to, meaning you’ve got the reference documentation open all day looking up how to do accomplish things when the equivalent SQL would’ve been automatic. Easier queries are pretty straightforward, but imagine if you want to add an upsert or a CTE.

How does this make sense? Most ORMs will give you a way to execute raw SQL which you marshal into a struct the way you would with a lower-level library.


Well yeah, but one of the motivations of using an ORM is that you don't have to write (database engine specific) SQL. I mean the database agnosticism is generally speaking not an issue, but still.


> However, without generics, Go’s type system can only offer so much

I was reading the whole article waiting to see this line, and the article did not disappoint. This is still the main reason I will stick with Rust or Crystal (depending on the use-case) and avoid Go if I can for the foreseeable future. Generics are just a must these days for non-trivial software projects. It's a shame too because Go has so much promise in other respects.


They're really not a `must`. What a silly comment - Docker and Kubernetes and substantial parts of Google wouldn't be classed as trivial.

For the thousands of devs shipping non-trivial code, keep going!


Kubernetes does a lot of code generation to work around some of Go’s shortcomings.

https://cloud.redhat.com/blog/kubernetes-deep-dive-code-gene...


> Docker and Kubernetes

Both of these projects have had to go way out of their way to make things work without generics, but they are large enough projects and have enough resources that they can do this. Both are actually really great examples of why Go is a bad choice until generics are added and have first-class support. Even C would be preferable over something where there are no generics.


You're incorrect but that's OK.

I've been writing Golang for 5 years and have never had need for generics.

It's nothing to with resources, it's just understanding how to build software.

Your retreat to C is a bit sad - I'd be happy to help you if you've got any Go you're struggling with?


Kubernetes does in fact need generics. There are types that all work the same but are different types all over the place. For example, you might make a reflector for a v1.Node, and that will call methods on your cache that are like Add(v1.Object), when the signatures should really be v1.Node. Instead, you have to cast the API to your internal implementation. It's not a big deal, but it's a solid example of something that generics would clean up.


I wrote Go for 3 days at my new job and found several places where people opted for interface{} over generics.


There's plenty of non-trivial code written in C as well. That's not a good argument for the benefit of a programming language. You can work around any limitation with enough work -- this article is a perfect example. It's an ugly solution to a simple problem, but it works.


C has few enough restrictions though that you can for example make a struct and then make an array of that struct. In Go this is like rocket science.


We're never going to get to Mars if `arr := [100]myStruct` qualifies as rocket science.



Sorry, meant to say some other data structure, say, a hash table where the key is some custom type / struct and the value is some custom type / struct. Or a binary search tree. Or a linked list.


>This is still the main reason I will stick with Rust or Crystal (depending on the use-case) and avoid Go if I can for the foreseeable future

Generics are to be added to Go 1.18 (Feb 2022)


For a full featured "go generate(d)" ORM, try https://entgo.io/. It seems rather similar, with the main difference being that you define your schema in a specific Go package, from which the ORM is generated. The nice thing is that you can import this package later again to reuse something like default values etc.


> A big downside of vanilla database/sql or pgx is that SQL queries are strings

What's wrong with strings? The argument in the article is that they cannot be compile-time checked, but I'm confused as to the solution to that problem ("you need to write exhaustive test coverage to verify them"). Is this saying that if they weren't strings you wouldn't need test coverage?


I like SQL queries as strings but I also like my IDE to syntax check them ... Since there are already so many links to projects in this thread I'll happily introduce fileconst which provides the best of both worlds - https://github.com/PennState/fileconst.


Maybe the key word here is 'exhaustive'

What did stand out for me from that section was:

> This is fine for simple queries, but provides little in the way of confidence that queries actually work.

Why not just paste the string first to psql to make sure the query actually works?


There is a significant number of developers who think requiring a database to run tests is abhorrent, even in the age of containers. Instead, they'd rather write tests and validations in their app code against the query syntax, which ends up being more work and not as comprehensive.


I stumbled across this library a few months ago and also really liked the approach. Unfortunately I had to drop it temporarily due to this issue (https://github.com/kyleconroy/sqlc/pull/983)... which I now see is solved. Taking another look. :)


How does it deal with mapping relationships? For example, a Many-to-Many between Posts and Tags, or a Many-to-One like Posts and Comments?


Relationships are a concern of the SQL you provide, not the tool's processing. It is really concerned with only:

1. the inputs used to execute a query

2. the type of output

So whether your query is "SELECT * FROM comments" or "SELECT * FROM comments WHERE comments.postid=$1", the result is still []Comment.


If you want a code generator like this that has support for that kind of thing, https://github.com/opendoor/pggen can automatically infer these kinds of relationships based on foreign key relationships and emit slices of pointers to connect the records together in memory. It can even figure out 1-1 relationships if there is a UNIQUE index on the foreign key. There is a little mini-DSL for specifying exactly how much of the transitive closure of a given record you want to get filled in for you.


That's pushing into full-on-ORM. I get the sense that these kind of transforms are not liked by this OP.


Talk about JIT on target article ... I'll play with this at the office tomorrow. I've got plans for it


Hmm that was a compliment. Translating: talk about a just in time article (with respect to me) which targets the problem I was thinking about just today ... I'll look at applying it tomorrow ... Now fix your down votes to some net positive value. Geez!


Unfortunately this tool only does static analysis on your SQL. I prefer tools like sqlx (the Rust one), which gets types by running your queries against an actual database. It feels more bulletproof and futureproof than the approach that sqlc is taking.


I wouldn’t say I’m all in, but this is what I use at work, and I think it’s better than the existing alternatives I could be using instead.


I'm a big fan of the database first code generator approach to talking to an SQL database, so much so that I wrote pggen[1] (not to be confused with pggen[2], as far as I can tell a sqlc fork, which I just recently learned about).

I'm a really big partisan of this approach, but I think I'd like to play the devil's advocate here and lay out some of the weaknesses of both a database first approach in general and sqlc in particular.

All database first approaches struggle with SQL metaprogramming when compared with a query builder library or an ORM. For the most part, this isn't an issue. Just writing SQL and using parameters correctly can get you very far, but there are a few times when you really need it. In particular, faceted search and pagination are both most naturally expressed via runtime metaprogramming of the SQL queries that you want to execute.

Another drawback is poor support from the database for this kind of approach. I only really know how postgres does here, and I'm not sure how well other databases expose their queries. When writing one of these tools you have to resort to tricks like creating temporary views in order to infer the argument and return types of a query. This is mostly opaque to the user, but results in weird stuff bubbling up to the API, like the tool not being able to infer nullability of arguments and return values well and not being able to support stuff like RETURNING in statements. sqlc is pretty brilliant because it works around this by reimplementing the whole parser and type checker for postgres in go, which is awesome, but also a lot of work to maintain and potentially subtly wrong.

A minor drawback is that you have to retrain your users to write `x = ANY($1)` instead of `x IN ?`. Most ORMs and query builders seem to lean on their metaprogramming abilities to auto-convert array arguments in the host language into tuples. This is terrible and makes it really annoying when you want to actually pass an array into a query with an ORM/query builder, but it's the convention that everyone is used to.

There are some other issues that most of these tools seem to get wrong, but are not impossible in principle to deal with for a database first code generator. The biggest one is correct handling of migrations. Most of these tools, sqlc included, spit out the straight-line "obvious" go code that most people would write to scan some data out of a db. They make a struct, then pass each of the fields into Scan by reference to get filled in. This works great until you have a query like `SELECT * FROM foos WHERE field = $1` and then run `ALTER TABLE foos ADD COLUMN new_field text`. Now the deployed server is broken and you need to redeploy really fast as soon as you've run migrations. opendoor/pggen handles this, but I'm not aware of other database first code generators that do (though I could definitely have missed one).
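
To make that failure mode concrete, the naive generated code is essentially this (a sketch, not any particular tool's actual output):

    package store

    import (
        "context"
        "database/sql"
    )

    type Foo struct {
        ID    int64
        Field string
    }

    func FoosByField(ctx context.Context, db *sql.DB, field string) ([]Foo, error) {
        rows, err := db.QueryContext(ctx, `SELECT * FROM foos WHERE field = $1`, field)
        if err != nil {
            return nil, err
        }
        defer rows.Close()
        var foos []Foo
        for rows.Next() {
            var f Foo
            // After `ALTER TABLE foos ADD COLUMN new_field text`, SELECT *
            // returns three columns and this positional Scan starts failing
            // until the code is regenerated and the server redeployed.
            if err := rows.Scan(&f.ID, &f.Field); err != nil {
                return nil, err
            }
            foos = append(foos, f)
        }
        return foos, rows.Err()
    }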

Also the article is missing a few more tools in this space. https://github.com/xo/xo. https://github.com/gnormal/gnorm.

[1]: https://github.com/opendoor/pggen [2]: https://github.com/jschaf/pggen


Really appreciate the in-depth comment. Also, thanks for open-sourcing pggen. I love having more projects in this space. I'll have to take a look at how it works.

I did want to address the last point in your comment.

> This works great until you have a query like `SELECT * FROM foos WHERE field = $1`

When sqlc sees a query with a *, it rewrites the query in the generated code to have explicit column references. For example, if you have an authors table with three columns (id, name, bio), the following query:

  SELECT * FROM authors WHERE name = $1;
will have the * replaced in the final output.

  SELECT id, name, bio FROM authors WHERE name = $1;
You can see it in action here: https://play.sqlc.dev/p/2ea889b6d14ae7a91afdcdf4eebe7d100408...


Thanks for the correction! When I read the generated code, I just focused on the scan call. Since pggen doesn't parse the SQL, there is no great way to detect `SELECT *` and rewrite the query, so I hadn't considered this as a possibility. Parsing the SQL really opens up a lot of cool possibilities for you.

You already know this, but in case anyone else is reading: another super cool thing that sqlc can do is infer good names for query arguments in go code by looking at what they are compared to in the SQL code. Thus for a query like `SELECT * FROM foos WHERE created_at > $1`, the generated go wrapper would have a `createdAt` arg instead of having it be named something like `arg1`. Since opendoor/pggen doesn't parse the SQL, you need to explicitly override the argument names if you want to provide better names. Of course the names won't be perfect with sqlc's approach, but they will be better than `arg1` and it's still a very cool detail. It might not be obvious how neat this is if you haven't had to implement it, which is why I mention it.


Oh, another issue worth mentioning specifically when you are using the jackc/pgx driver is that pgx maintains a prepared statement cache under the hood for performance reasons, but the statements in the cache can be invalidated by schema migrations. So if you do `SELECT * FROM foos WHERE field = $1` and then run `ALTER TABLE foos DROP COLUMN bar`, the next time you run that query you will get an error. This used to just be broken until you restarted the process to clear the cache, but I've patched pgx to make it invalidate the cached prepared statement when it sees one of these errors (it still returns the error though, since jackc thought a retry would be too complicated). I added an automatic retry to opendoor/pggen to make it so users don't need to worry about this kind of low level detail. You may want to consider that for sqlc as well. The other option is to advise users to disable the cache by setting its size to zero (I forget the exact config option, but jackc said there is one when I first brought up the issue).


sqlc looks very interesting and compelling. A similar library I like but haven't had the chance to really use is goyesql: https://github.com/knadh/goyesql

It also allows just writing SQL in a file, reminds me a bit of JDBI in Java.


wow, thanks for mentioning sqlc (and pggen below). The ergonomics is exactly what I have been looking for.

I've dreamt of writing such libraries but alas, never found the time to actually do it. But, now I can just use one of them!


Does anyone have a similar recommendation for Rust?


Have not used, but I believe it would be sqlx (https://lib.rs/crates/sqlx). Note that this uses async.


Java is awful and slow.

Invent Go.

Waiting for generics....

5 year later

Go looks like Java. Back to square one :)



