Hacker News new | past | comments | ask | show | jobs | submit login
SQLBolt – Interactive lessons and exercises to learn SQL (sqlbolt.com)
324 points by karimf on July 15, 2021 | hide | past | favorite | 84 comments



My favorite book for learning SQL is “The Art of PostgreSQL”. https://theartofpostgresql.com/

I found the combination of real-world problems, general SQL advice, and the broad range of topics to be a really good book. It took my SQL from “the database is not much more than a place to persist application data” to “the application is not much more than a way to match commands to the database”. It’s amazing how much bespoke code is doing a job the database can do for you in a couple of lines.


I'm slowly moving in that direction. Recently started using (pg-typed)[1] in our projects and its amazing, as it gives you the types from the database into typescript, and not the "general types from tables", but the exact specific types for each individual query.

Coupled with the same thing going the other direction where we get types from our api contracts (OpenAPI/Swagger) with (laminar)[2] means that our app is very close to the "if it compiles it will run" territory.

ORMs do give you a lot of convenience though. Things like "run this additional query every time you request this entity" thing for example like for logical delete, which is unpleasant to replicate in your database. But Postgres is so freaking powerful its more of the fact that we don't know how to do it properly than it not offering a good solution.

[1] https://github.com/adelsz/pgtyped [2] https://github.com/ovotech/laminar


The blog post[1] by the author about ORMs was what convinced me to purchase the book. I've had too many discussions with colleagues and tech friends struggling with N+1 problems and processing too much stuff in the application when they use mainstream ORMs to think it is a good idea. ORMs make the regular CRUD stuff simpler, but seem to make some more complex queries and transactions harder. There also seems to be an impedance mismatch between SQLs relational model and the OOP object relationship model.

Edit: I know there are ways to avoid N+1 problems with ORMs, but it seems to more easily sneak into code when your SQL queries look just like your application level code and you could easily enumerate over some SQL result, perform some action, and think that it builds an efficient query.

I've recently been working with a hobby project where I use the Clojure HoneySQL[2] library which essentially lets you build SQL queries as you normally would, but in Clojure's EDN syntax. It treats SQL queries as data. You can super easily evaluate them to get the resulting raw SQL query strings. There is no magic behind it and it encourages you to use the full power of your db.

[1] https://theartofpostgresql.com/blog/2019-09-the-r-in-orm/ [2] https://github.com/seancorfield/honeysql


Anyone who's interested in pgtyped may find this library comparison helpful: https://phiresky.github.io/blog/2020/sql-libs-for-typescript...


I haven’t read this yet but I love your take.

Far too often do I see developers doing analytics by slurping an entire table across the network and performing calculations on it in the application. Of course it appears to work in development with tens of kilobytes of records and a local database, but as soon as it’s deployed to production it unleashes chaos.

I consider myself very good at SQL but I do prefer to use ORMs for most things. However there’s no arguing that many of them have done a lot to obscure the incredible power inside a relational database, and perhaps go too far in hiding exactly when data is computed remotely vs. being pulled and operated on locally.


I’m very guilty of all those sins. Select all, then use array map, reduce and filter in the app code. I could draw a line in my life where I discovered window functions.


If the dataset is big enough, those operations actually do make sense because they can be translated to e.g. a Hadoop job - map / reduce / filter are all operations that don't depend on the sorting of a dataset, so in theory the dataset can be sharded across many machines.

Of course, that is a solution to a problem that people wish they had. Relational databases can run multi-million row queries in seconds, and then there's BigTable / BigQuery which scales SQL to improbable scales.


Exactly. I'm talking about the "select 5000 rows, serialize them to JSON, and send them over HTTP to the browser, only to render them in a table that never changes" type code.


One argument I've often heard is that since it's easier to scale your application compared to your database, pushing the calculations on your application is a way to get a better scalability in the long term. I wonder how true that is though.


I’ve seen applications go down hundreds of times from pulling entire tables locally to compute on them. I’ve seen maybe once where an application grew so successful that scaling the database became an extremely challenging topic.

And even in that case, I/O and contention were inevitably the problems. Not CPU.

I’ve similarly heard myths that you should be judicious when writing indexes because they can affect insertion performance. I’ve again seen hundreds of cases where under-indexing killed performance and zero where over-indexing caused problems.

99.9% of the time, you’re not the crazy special case. And if somehow you are, the solutions required are going to be nuanced and involve a ton of specific measurement. It’s widely unlikely you’ll accidentally avoid these problems through something like this.


That depends a lot on the workload, of course. But quite often the DB is not constrained on CPU, but rathen on IO or memory (or even networking), and calculations mostly consume CPU.

Moreover, calculations usually require just a small subset of columns in a table, and thus can use indexes efficiently, whereas grabbing all the columns to then filter them in application (because that's how many ORMs work, at least by default) becomes not only worse in terms of memory and networking, but also in terms of IO and CPU required on the DB side.

So overall I'm sceptical of that argument, unless there's clear proof from profiling that it's indeed the case.


Depends a lot. You need to consider the requirements more in detail, and especially if we are considering reads vs writes, and also what are the methods to write the data that needs scalable reads.


I have not used ORM, but used some basic SQL. Is ORM more performant? Why should I learn SQL if I can do all the things using ORM?


Quite the contrary, ORMs in general are less performant. Using ORMs boils down to two things, convenience and the ability to switch the underlying RDBMS. (For example all the OSS which says you can choose between MySQL, Postgres or SQLite.)

But if you support all RDBMS you only can support the smallest intersection between them and can't use advanced features like CTE, window functions, JSON support etc.


> But if you support all RDBMS you only can support the smallest intersection between them and can't use advanced features like CTE, window functions, JSON support etc.

That's not true at all, any more that it’s true of supporting all browsers with JS. You can use the advanced features where available, and implement logically (if not performance)-equivalent functionality using more basic functions where the advanced features aren’t available.

Or, if you are lazy, just have a reduced feature set available with less capable RDBMS engines. But, on any case, its simply not the case that an ORM that supports engines of varying capacity is limited to using only the least-common-denominator feature set.


Yeah, sure, you can always write raw SQL queries, but that has nothing to do with your ORM, does it?

I was talking about ORMs in general, not about the programmer in particular. So yes, you are right that if you have enough engineering resources you can support everything. But most ORMs don't help you with this, so it is not a feature of ORMs. You can always bypass it though.

The comparison with JS doesn't fit realy well, because JS is the tool you have to use. You don't have to use ORMs, plain SQL works fine as well.


When GP says 'you can use the advanced features', 'you' refers to the person developing the ORM, not to a developer using the ORM.

So the ORM author can do the hard work of allowing advanced features to work across databases. And it's transparent to the developer using the ORM.


Except, how many ORMs do that?


I use Postgres window functions with Hibernate and Doctrine, plus lots of other Postgres-specific functionality.

The real benefit of the ORM to me are that:

* some query results are cached

* the "unit of work" pattern allows me to distribute changes to an entity and then commit it as a transaction.

Anyway, the ability to suddenly switch RDBMS only works out in practice if you actively maintain support for multiple engines.


Solving a programming problem is usually finding the right data structure, so in that sense is not surprising. Most of the time I find that the real work is simply performed by the DB, and it's not just persistence, it's mantaining state and managing communication in concurrent applications.


I found that book really useful too, and it led me towards leaning more heavily on Postgres. For a long time using Postgres I just treated it as a ‘dumb’ data store with an ORM in front of it, but learning about CTEs, window functions, user defined functions, views, extensions, etc. has made me reconsider where a lot of application logic should go


That book is heavily biased toward doing everything you can in the database and boasts it as the only true way. Which IME is hardly true for many type of applications and phases of development. Many times having more brains in your application code leads to a simpler less decoupled solution.


Definitely going to take a look at this. I'm one of the few traditional developers in a company that's got a lot more analytics and database people than application or web people. In such a situation the company is loathe to use ORMs and the old adage of "never do your business logic in the database" doesn't really apply in practice. I've managed to learn enough SQL to get by and fulfill the software's requirements out of sheer necessity and I've actually got to the point I prefer to use it over an ORM, but I'd love to have a more systematic education in the language.


I'm thinking now about buying this because of your comment, but I'm wondering if full edition is worth extra 50$ compared to standard


It really depends how much $50 is worth to you. The book is brilliant by itself, but the datasets and unit test examples are a really good tool for playing with. Maybe if you have some data from a real-world project you could use that instead.


That's one of the most expensive technical books I've ever seen.


Holy smokes, $100?!?! You’re not kidding!

I’m always willing to buy technical books. I think it’s valuable to have material from different authors because they each have different perspectives and styles. E.g. CLRS vs Sedgewick vs Skiena. I also like to support the authors. However I’ll take a hard pass at this one.

My ability to level up is rarely due to the quality of the material, it’s more a function of how much time and effort into studying and learning. Time is the limiting factor in almost everything, not learning material.

In other words, learning isn’t about choosing “book A” vs “book B” but rather studying any books vs scrolling through HN, watching YouTube, or any of the other million blackholes of time.


I look at it that I spent $100 on this book, and got a $20K payrise. At that order of magnitude, the difference between spending $10 and $100 didn't matter.


You didn’t get the pay raise because spent $100, you got the pay raise because you spent hours, days, months.

It’s time, not money, you are spending.


From that point of view the $100 for the book is even less relevant.


I've been looking on brushing up on my SQL lately and have been going through these resources.

- Practical SQL, No Starch Press. ($30)

https://nostarch.com/practicalSQL

- Use The Index Luke ($15)

https://use-the-index-luke.com/

- Database Systems Concepts & Design by Georgia Tech on Udacity (free)

https://www.udacity.com/course/database-systems-concepts-des...

I can easily pay the $100, I cannot easily find more time, especially when there are a bunch of other things I'm spending time on. At a lower price I would likely buy the book "just because".


They sound good, too. Be good to hear a review if you get any of them.


The eBook bundle is $49 which is perhaps a little higher than average but doesn’t seem unreasonable.


For a pdf that doesn't come with the data for exercises it's borderline insulting.


Oh wow, used to work with Kris Jenkins but had no idea he was a SQL-guru, I always went to someone else on the team with queries!


He's a comedic genius too!


SQL is both slightly ugly (which I attribute to the historical decision to use natural language words instead of the mathematical symbols of the relational algebra notation - the symbols make the expressions more readable once you learn them) and very beautiful. I'm sad most of the people have forgotten it and prefer to use ORMs today. Most often (except when coding a stupid low-feature phonebook/todolist tutorial) it seems to me writing SQL queries manually is a way more efficient (not only in terms of execution speed but also for developing) solution than using an ORM, except you have to have some fluency in SQL which the most of the today coders seem to lack. Designing a normalized entity-relation model as well as writing the queries for it also feels great, very satisfying.



> For people that complain about SQL being hard to read/write... it could have been worse.

Thank you for the link. To me, however, this sounds like writing "ADD a TO b" instead of "a + b" and saying using the latter would be worse. Surely the former can be easier for a totally uneducated person to pick up immediately but as soon as you invest some humble time into learning the notation the latter becomes much easier to read than the former.


SQL lets you do "a + b". The issue was with the notation, really hard to do that in a text editor.


> SQL lets you do "a + b".

When it's about actually adding 2 integers - sure. But when it's about the relational algebra - does it? Can you actually just write ⟖ instead of RIGHT OUTER JOIN when querying a real database?

Also, I don't really understand why is it supposed to be hard to do in a text editor.


I'm not sure what arbitrary symbols* as operators gets you. "It's built on concepts from relational algebra" does not mean "it must literally be a 100% compatible relational algebra mathematical engine." That is not the purpose of an RDBMS, and the purpose of relational algebra isn't to find a use for sigma, pi, and rho. Nevermind how eye-bleeding the difference between ⟖, ⟕, ⟗, ⋈, ⋉, ⋊, and ▷ will be in a sea of table and field names.

(* Yes, I know those symbols are the conventional relational algebra symbols. They were still chosen arbitrarily as notations built on top of the multiplication symbol borrowed as the Cartesian product symbol.)


I just really hate seeing/typing RIGHT OUTER JOIN when it could be just ⟖. It feels like it takes me a microsecond to read the symbol, translating the glyph straight into its meaning in my head instead of subvocalizing it first and thinking what does this mean then. This takes way more time and struggle with the words. Also the whole query consisting of many such words becomes too big to read easily or split into logical parts visually and also looks ugly from the purely aesthetical point of view.

To me sea of the SQL language elements represented with words intermixed with table names and other words into one uniform ocean of words seems at least no better than the names intermixed with the distinct kind of symbols (each of which I recognize instantly).


From what I remember, I used to write a lot of JPQL, when I had to to do some Hibernate stuff. isn't it standard way to use ORM? (or do people really traverse their connected objects, creating a load of n+1 select?)


https://youtu.be/LEZv-kQUSi4?t=483 This talk goes into many anti-patterns, but he gets to the real problem with SQL, that it is ONLY a DSL and lacks either an API or a fundamental information model to program against. Composability when working with a DSL is a nightmare.

I also have been moving away from ORM to directly writing SQL, but lets be honest about the reasons that programmers are wary of working directly with SQL as they are legitimate issues.


I had to work with TypeORM at my last place and hated it. Prior to that I was at a place with a lot of hand written SQL and while that wasn’t perfect it was much easier to work with


I'm surprised, no one mentioned https://www.sql-ex.ru/ It has few hundred graded SQL exercises (some of them are very hard) + there is a book http://www.sql-tutorial.ru/ that goes over most common errors from submissions. On top of that, there are competition problems with the leaderboard. On top of that, I highly advise everyone to read T-SQL Fundamentals, by Itzik Ben-Gan (and few other books by same author) and do the all exercises in the book.


I guess we were just a few minutes apart in our comments. I have gone through the website and yes I highly recommend it.


I'm of the firm opinion that you can have a very successful career purely as an SQL genie.


We are trying to revive the art in our shop. Just got done building a really clean 3NF model for our problem domain and we are in the process of moving 100% of our custom configuration needs to SQL queries.

From a biz strategy perspective, we can scale up a lot faster if all we need to do is find people who know (or can be taught) SQL.

Consider the amount of time it would take to ramp someone on 1 SQL schema vs the entire C# ecosystem.

For us, the application is quickly turning into a dumb funnel that just gets data and requirements (queries) into SQLite databases for eval. We put a web interface around all this so it can be easily managed on a per-customer basis.

When everything is a SQL query, you can trivially export/import/clone customer configurations to rapidly bootstrap new ones.


This is great if you have highly-connected client apps. Its not as doable when you can't guarantee connectivity. In those cases, you need to ship/duplicate business logic to the clients to work around the network issues. Its a hard problem.


We do 100% of our processing server-side. All the hard stuff happens there. Spreading biz state across multiple computers would be a massive mistake for our application. Our client devices are effectively dumb terminals that just send user events to the server and display whatever view model is returned.


SQL wizards are a rare bunch. It's not like the golden era of DBAs is going to come back, but as an upside SQL is the lingua franca of data engineering. So, yes.

Disclosure: I write SQL queries just like Tarzan spoke English.


Not purely, I do other data related dev & admin work, and other infrastructure & security stuff[†], but I do fairly well out of being the senior[‡] around here who gets passed the nasty SQL problems (which usually turn out to be a missing index, or complex statements entirely consisting of non-sargable predicates that can be easily rearranged with a little thought, but are sometimes more interesting).

[†] I used to be considered completely “full stack” but that was numerous years ago, and I enjoy my non-techie hobbies too much to have time to keep up-to-date with everything!

[‡] “senior” as in citizen…


I feel fairly confident that I can write SQL queries for basically any result set (On a well designed DB). But the project I am working on is close to breaking me (VERY bad DB design, in progress)

Also: I think knowing your way around triggers and such is quite important too though. I should learn them properly some day.


Triggers can be an absolute nightmare to debug, and basically impossible to retroactively determine why a particular trigger did or didn't fire.

I've had the displeasure of working with databases with triggers that fire other triggers, and all that logic should've been moved into the application itself. They're a powerful tool to have in your toolbox, but should be used sparingly, and should be kept as simple as possible.


I remember looking at a MS Dynamics AX database, not hugely complex by ERP standards with about 5000 tables, and realising it didn't use foreign keys. That was fun.


I can, sadly, one up you on that:

The same table stores: Addresses (primary, secondary), sex, names (up to 3), date of birth, what currency, and many MANY more values.

Yes, that's one table.

Also: Another table literally stores full tables in it. (Basically some kinda key with which to identify the subtable so you can select on it.)

Progress has no real concept of set based queries, instead it accesses all tables like a cursor.

And that's not even scratching the surface. The DB is bad and should feel bad. Just yesterday I went into a 2 hr rant about it with some people I often talk to.

Foreign keys, too, are a foreign concept to progress. But hey, work is work.


In some cases, we aren't in a position to normalize the database due to legacy situations.

However, one thing that could be done is to use views (maybe scoped in their own schema) to make the database look normalized (Facade the database). This would help with writing future SQL.

It is then possible to iteratively normalize the underlying tables by pointing legacy code at the normalized views until the denormalized tables are no longer in use. Finally, "convert" the views into tables and drop the denormalized tables.


That is actually basically what we are doing. Except the whole application gets replaced too.

(I have to write the sync code from the new system to the old one)


> The same table stores: Addresses (primary, secondary), sex, names (up to 3), date of birth, what currency, and many MANY more values.

Some denormalization is sometimes warranted. As for the rest I agree with you. Sounds like madness.


Denormalization implies having it thought out

This DB is many things, but most definitely not thiught out.

Basically nothing is normalized.


I distinguish that practice as “Abnormalization” (HaHaOnlySerious).


DBA's definitely used to have - still have? - a reputation as being paid big. They likely have a decent job in modern-day "big data" as well.

Personally, unless there's a compelling reason, I'll stick to SQL for 'core' data storage. NoSQL is cute, but changing data structures over time is horror.


One of the best paid courses here (if u want to train interns in ur co) is

https://www.udemy.com/course/postgresqlmasterclass/ (created by Adnan Waheed - Founder of KlickAnalytics)

https://www.udemy.com/course/postgresql-from-zero-to-hero/ (created by Will Bunker - founder of Match.com)

Not too expensive ..but very very good.


Nice and interactive, seems like a good place to start.

Although the second question ("Find the director of each movie") does accept "SELECT title as director FROM movies;" as an answer.



The exercises are great, and I like that the feedback is quick, as soon as you solve it you get a checkmark and it's already ready for you to do the next exercise.


This might be one of the best example of Bret Victor - Inventing on Principle talk [0]. I wish there are a lot of tutorial or course with the same quality as this one.

[0] https://www.youtube.com/watch?v=8QiPFmIMxFc.

Original link on Vimeo is dead: https://vimeo.com/36579366.


Folks at Stripe recommend this internally and get a lot of people up and running doing analytics (even some lawyers...). It's pretty good.


I recommend trying out https://www.sql-ex.ru which has some good exercises which can trip up a lot of people. I have used quite a few questions from here as an interviewer. Going through it completely is a solid foundation for tackling SQL at work.

As a baseline, if one can solve https://www.jitbit.com/news/181-jitbits-sql-interview-questi... then you can be confident you can nail entry-mid level positions involving SQL.


If you can add a simple way to do proctored SQL questions for interviewing purposes that might be a good way to monetize this. Though, not everything needs to be monetized but if you wanted to it's an idea.


In the same vein, https://selectstarsql.com/


Another good site to learn the basics and have some fun in the process .. has been on HN before .. is the SQL Murder Mystery ! https://mystery.knightlab.com/


very useful - immediate feedback indicates a fresh approach to an old problem!


Thanks! Immediately sent to my friends who are learning SQL!


Do those lesson allow to learn only the relational tricks of SQL? or does it cover also for example hierarchical, recursive queries, etc?


Thanks a lot for this. It uses sqlite underneath?


Apparently yes, sqlite running locally in your browser.

https://old.reddit.com/r/webdev/comments/34h9i3/sqlbolt_inte...


You can build your own site like this with CourseMaker[1] (disclaimer: I'm the founder). We don't have SQL support yet, but you can create interactive exercises with JS, Python, Go, Rust, C# and Java.

I learned to code through these kinds of sites (codeacademy and code school especially), I think being able to tinker in the browser with no setup is great.

[1] https://coursemaker.org


Fun tutorial! Some form of autocomplete would be nice.


Lesson 14.3 accepts update without any where clause (set all rows) https://sqlbolt.com/lesson/updating_rows


Great tool, definitely will come in handy


old but gold




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: