Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: What is the best online resource to learn advanced SQL?
448 points by rorykoehler on Jan 17, 2017 | hide | past | favorite | 97 comments
I too often find myself getting stuck when it comes to more advanced SQL queries. I have never actively learnt SQL and realise that I don't have a great mental model of how it works. I would like to improve this as it is killing my productivity from time to time.



http://use-the-index-luke.com/

Is really good, though its more related to indexing than SQL specifically.


I second this. I prefer reading on pdf, so I bought the ebook[1]. The book explains indexing perfectly, and it shouldn't take you more than a day to finish. I can't recall a book having a better benefit/time ratio. I wish the author would release more books in this vein, but he hasn't. So right now I'm looking at Designing Data-Intensive Applications to learn more about different kinds of databases [2].

[1] http://sql-performance-explained.com/

[2] http://dataintensive.net/


He's started work on his next book[0] which looks to be interesting.

[0] http://modern-sql.com/


There's a lot of good content here online for free as well.


Sightly tangential ... which device do you read PDFs on?


Original iPad (series 1).

I had a Samsung Galaxy Note 12.2 for a while, but the built in reader (iBooks) in the iPad is far superior to everything I tried on Android over a period of months.


I used to have a high res inexpensive tablet that I loved to bits, but I dropped it carelessly and I'm finding it impossible to replace the screen. Right now I use Preview on macOS. Since I'm on my laptop anyway, I make notes in a text editor. That helps me remember what I read.


Kobo Aura One. Bit more expensive yet high quality.

Preview on macOS is also good.

EDIT: I don't know, when that review was written the device was just released. The device received various firmware updates since then. I convert everything to EPUB with Calibre.



I really wish there were at least one mass market device that wedded a good eink screen to a reasonable CPU / RAM. There was a crazy expensive Sony PDF reader. But I really just want a Kindle DX (Aura One okay too) sized screen that doesn't struggle to render PDFs.


Can recommend also. Very good and clear explanations.


Not the parent poster, but I consume a huge amount of writing in PDF form.

* Most modern browsers have it built in

* Preview on MacOS is good

* Kindle and iBooks import PDF files

* SumatraPDF and Skim, depending on your OS


Thanks for the link. This is a very nice resource. Just started reading but I am already finding it very useful.


[self plug]

On https://pgexercises.com I focus quite a bit on developing a mental model for SQL. I would suggest not skipping the easier exercises - even if you can do the SQL, you might find the explanations useful.


I worked through many of these when discussed here 7 months ago https://news.ycombinator.com/item?id=12022953 ... really enjoyable, great refresher, and learned some things I never knew before.


Thanks :-)


Man, thank you so much for creating this. You should find some way to monetize it. I thought I had a decent grasp of SQL before, but this is what really helped me grok it ... especially the consistent 'everything is table' lesson.

Oh and I had an idea for practicing deletions/insertions/updates & other behavior that you couldn't do on web: create an Electron app that uses a local db. I'd buy that in a heartbeat.


No problem - it's really gratifying to know that it helped you. I'm not so interested in monetizing - I've received an awful lot from the wider OSS community, and I did the project as a way to give something back.

That is a really good idea on the app. I do have some ideas for making DDL and DML available via the web interface, but sadly I'm extremely short on free time in the last couple of years.


> extremely short on free time in the last couple of years

I wish you and the flexboxfroggy guy has a patreon account so I could pay y'all to make games/exercises that improved my mental model of things.


I'm fortunate in my career - while more money is always nice, I'm at the point now where it would have to be an awful lot to persuade me to take more time away from my (very young) family. I suspect all a patreon would accomplish would be my still producing slowly on the site but now feeling guilty about it :-).

Maintaining the site is in many ways its own reward - it gives me a great deal of joy to see how much it's used now. Improvements are actually a reasonably high priority for me, it's just that toddlers have a way of creating their own special priority levels :-).

All that said, I very much appreciate the sentiment!


I've been looking for stuff like this - thanks.

I'm trying to start a career as a BA and expect that it will be handy to know SQL, which is fine, but I haven't been able to find anything out there that simulate the kind of scenarios and tasks SQL would be used in. I suppose your site is a step closer to that.


Hows the step into BA going for ya? What kind of things besides SQL do you have to learn?


Not great! Plenty of vacancies for BAs and junior BAs alike, but they all stipulate experience. I've been looking for graduate BA vacancies, but - outside of London - they're few and far between. A few BA roles are also quite heavy on the numerate side of things, which I have to cast aside.

Aside from SQL, I've got UML and Visio on my to-do list. There's also the BABOK reference, but it sounds like it's something that only really starts to make sense when you have something to apply it to and observe.


You could also add some ads, I'd gladly deactivate my adblocker, a site that produces so much value should get something in return.

Great site, thanks


I second this. A few months ago, I went through all the exercices (completing everything but 3 of the most advanced).

Very easy to get going and very hands-on.


Thanks, appreciate it :-)


This looks great. Exactly what I'm looking for. Thanks.


No worries - hope you find it useful.


(Top menu broken on IOS iPad)


Thanks - I'll try to get hold of an ios device to test it on. Sadly my webdev skills are fairly lol-worthy ;-)


Maybe this is obvious, but the PostgreSQL manual [1] is excellent. The first 100 or so pages (at least up to and including chapter 7) will give you a good overview. I suggest you read through them at least once. Then you can dig into the rest of that manual at leisure, exploring topics that interest you. It's a great reference too.

[1] https://www.postgresql.org/docs/manuals/


I've found the Mode Analytics course to be quite useful : https://community.modeanalytics.com/sql/tutorial/introductio...

The HackerRank SQL challenges were also helpful in getting some extra practice: https://www.hackerrank.com/domains/sql/

Finally, this Quora post will also point you to some useful resources and has some great tips that I'm working through now: https://www.quora.com/How-do-I-learn-SQL



Lukas Eder has a fantastic blog about SQL, Java, FP, and his library jOOQ: https://blog.jooq.org/.

One of the things I sometimes find tricky with SQL is pattern matching a problem to a solution. It can be tricky to describe what you want and sometimes direct human help can't be beat so I do recommend Stackoverflow (normally I have mixed feelings about SO). There are few power users on SO like Craig Ringer and the horsesomethingsomething (can't recall the actual handle) that are helpful and friendly.


https://www.postgresql.org/docs/current/static/index.html

Postgres has an excellent manual. It has an internal scripting language called PL/pgSQL which is (to put it politely) not intuitive at all. The manual was enough to help me write a query to implement a binary tree search.


I really recommend the books by Stephane Faroult, especially "The Art of SQL".

Unfortunately I could never find anything online by him, except these videos: https://tonguc.wordpress.com/2008/01/29/good-sql-practices-v... and not sure if this is what you are looking for.


+1 The Art of SQL is pretty old now, but is still the most lucid explanation of how queries should be constructed and why.


I created a set of videos for Udemy, but they complained about audio quality so I added them to YouTube. I cover a variety of basics (joins) but also get into subqueries, correlated subqueries, window functions and all along use a freely available database. https://www.youtube.com/playlist?list=PLImyDqSBQbdm09n4BhGwK...


I just finished learning Postgres! I used codecademy's SQL session for the intro, but then just decided to play around with queries. This diagram was my most useful resource, and one that I still look at on a daily basis when trying to create queries - I hope it helps! https://www.sqlite.org/images/syntax/select-stmt.gif

I also look at the postgres docs very frequently for syntax and format and those help me a lot. Stack Overflow is also a great resource!


I found that course excellent as well - although these days if you don't have the "pro" tier of Codecademy you miss out on the quizzes and other things. Still, it was good and pretty advanced. Learned a lot.


That's very true, I actually ended up subscribing to pro because I was also learning JavaScript and HTML/CSS at the time (new programmer), so I found the pro stuff useful. However, if you're not wanting to drop the $20 or so for the month, I'd say that you can find plenty of fantastic online resources.


There are a lot of good resources to learn the "how" of SQL, but not the "why".

The first reason is that the relational model is a combination of logic and set theory, which happens to be great for a lot of business applications.

The second reason is that people have adapted SQL surprisingly well to other kinds of data, like JSON. Even if you try to build a database system specialized to JSON, it will still probably come out worse because getting things like storage, replication, administration, etc. right takes a long time.

The third reason is that data has more value when combined seamlessly with other data. So if you have business data (which everyone has) and JSON, you are better off with a single system that is great at business data and OK at JSON, than two specialized systems.

Keeping these things in mind makes it easier to understand SQL in my opinion.


I'm not sure if this is the level you're looking for, but if you need a better mental model of SQL, you should take a look at Jennifer Widom's course on Stanford Lagunita on Databases: https://lagunita.stanford.edu/courses/Engineering/db/2014_1/...

Starting from the relational model and going through SQL really makes the language make sense.

If this is too basic for you, then, at worst, you'll spend a few hours reviewing the basics and strengthening your foundation.


That is an excellent course, and a good jumping off point. It is somewhat technical and expounds ideas that most DBAs probably aren't aware of.


One of the useful things that I learned as I began to use SQL is to learn how to use and analyze query plans.

This will take you from the theory of indexes to actually seeing how it is used by the SQL engine.

This will go a long way to building the right indexes and SQL tuning in general.

I would suggest learning how to generate the query plans for your SQL engine of choice.


This!

This is key. I cannot count how many times I go to a client that is complaining that their software is slow and ends up that is because they don't have the right indexes


The difference between a poorly performing product that uses an RDBMS backend and a highly performant one is purely down to SQL query tuning and finding the right indexes.

It is the lowest hanging fruit to improve performance.

Even if you don't have the flexibility to change the SQL on a project, you may still have the ability to create/rebuild the indexes to make the query faster.


I dont know if this is the right place to ask it, but I will go:

Sometimes I see job offers that ask for experience with large websites, large databases, large servers, etc, etc, etc (you get the idea).

How do you land those jobs if getting in that kind of subject is impossible alone? You can't simulate that kind of things at your home, so unless your side project grow and you must learnt it the hard way, or you had luck to be at a company where they allowed you to be involved, how do you learn that? Thank you


You don't need to work for a company with large datasets to start learning and playing with large datasets. There are freely available repositories of data you can play with. Checkout the AWS public datasets - https://aws.amazon.com/public-datasets/. There should be more then enough there for you to start playing with "big data" tools and getting familiar with how to store, ETL, and query large datasets.


Yes, but when a company asks for experience they aren't thinking in playing with large datasets unless you create an open source tool that becomes useful for a lot of people. Just playing with them wouldn't be suffice.


A "large dataset" is basically one that doesn't fit in memory. It doesn't need to be large in absolute terms.




I love the Periscope Data blog. https://www.periscopedata.com/blog/


https://data.stackexchange.com/stackoverflow/queries is a fairly good resource for practising queries and for looking at others queries against a relatable dataset.



Props to Joe Celko's books. I found they helped me reconceptualize how to get SQL to solve problems that I thought required another language.


I'll also thumbs up Celko's books. He takes hard problems and solves them with SQL. His books do assume you already know SQL, and are now looking for better ways to apply that knowledge.


I found that looking at the query plans for the SQL I wrote really improved my SQL skills. It is amazing how it builds a mental model in your head. You learn very quickly what looks good and what actually works, plus you get a really good feel when the optimizer is going to walk off the cliff.


Gamify! Play the SQL game, The Schemaverse [1]. I don't know if this matches the request for "more advanced queries", but it is entertaining.

[1] https://schemaverse.com/


With the ability to use pl/pgsql within the game, it can get as complicated as you want :)

It doesn't necessarily teach each advanced technique, but it does give you a good place to practice what you've learned.

(Thanks for the mention!)


Code School has 3 courses on databases, 2 on SQL and 1 on NoSQL. https://www.codeschool.com/learn/database

The Try SQL course is free, the other 2 you have to pay for. https://www.codeschool.com/courses/try-sql


I would start with the Mode Analytics SQL tutorial. It's fairly robust and gets into advanced topics: https://community.modeanalytics.com/sql/tutorial/introductio...

That will get you pretty far but the advanced SQL topics will require the study of what the underlying database provides and how it works. Every database is different in terms of how it implements advanced features, if at all. For example, MySQL doesn't have window functions. For postgres related topics, the documentation is excellent, postgresguide.com gives a high level overview, or you can follow craig's blog (http://www.craigkerstiens.com/) that provides a gentle introduction to many of these topics as well.


Thanks for the mention, I also have an article that walks through not what I'd call advanced but it's at least intermediate SQL here as well - https://www.citusdata.com/blog/2016/09/12/fun-with-sql-compu...

Whenever I do come across blogs that have good articles we aim to feature them in Postgres Weekly, so if you want a regular stream of that type of content it's worth checking out - http://www.postgresweekly.com


The Postgres Weekly newsletter is a really valuable service. You not only aggregate information but filter for quality. Thank you for providing this! I've said this before but it warrants re-mentioning :)


Not online, but SQL Antipatterns is the book that had the largest impact on my thinking:

https://www.amazon.com/SQL-Antipatterns-Programming-Pragmati...

But as you said, you haven't actively learned SQL, so probably need to find some free data sets to work with.

You can probably start with Data is Plural. That will, at least, give you some raw data sets so you can get started on learning how to build up a database from unorganized data first:

https://tinyletter.com/data-is-plural

Edit to add: First and foremost, you have to learn normalization. Without that, you aren't doing any SQL.


A little bit unorthodox, but check out http://sql-ex.ru/ (even though it is in .ru , website itself is in English) especially their advanced exercises.


+1 Also present at http://www.sql-ex.com/

Not shiny, but very functional. It's been around 5+ years and has lots of great practice for advanced queries.


This was quite a gem when I was practicing some advanced SQL. I find the problems become legitimately hard and take me quite a while to solve. I recommend it to anyone interested in seeing how certain datasets can require quite a bit of creativity to extract using SQL.


The thing about advanced is I keep learning there's further levels of advanced.


I learned in college that the words "Introductory" and "Advanced", when appearing in technical book titles, often don't mean what you think.


I took the Stanford Database MOOC course. A lot of work, but you get a pretty deep understanding from it: https://lagunita.stanford.edu/courses/Engineering/db/2014_1/... https://henrikwarne.com/2011/12/18/introduction-to-databases... My review of the course:


Oh Goodness, please don't suggest the Stanford Database MOOC. They teach in outdated standards.


Everyone teaches outdated standards (eventually). The important thing is to come away with an understanding of the principles (which is what I interpret the OP to be asking for).

I took the Stanford course when it was first offered online, and I regard it as one of the best things I ever did. You're absolutely right in that there's a lot of newer stuff that it didn't teach me, but the database's documentation is usually sufficient to plug that gap.


Two of the best resources I've ever seen/read:

Database Design for Mere Mortals - https://www.amazon.com/Database-Design-Mere-Mortals-Hands/dp... (not an affiliate link)

Legend of the Drunken Query Master - http://www.joinfu.com/2008/09/slides-from-drunken-query-mast...



Others have posted some really good resources. I paticularly like Periscope Data's blog ( ex : https://www.periscopedata.com/blog/effective-revenue-chart.h... ) , they have covered a wide range of topics. Get a public dataset and start querying for unique insights.

I would also encourage you to learn a bit of Relational Algera if you really want to improve your mental model.


> learn a bit of Relational Algera [sic]

This. SQL is just a tool for expressing your thinking and learning it doesn't help you to problem solve. The theory, both relational algebra and set theory, will teach you how to think about the problems, not how to express your thinking. That's what's necessary to solve difficult SQL problems.


You should try to understand how databases in general work, it will help you with your query writing.

One thing you have to realize is that once you get a little advanced, you have to get to the details of the single SQL implementations, it's not about SQL but about Postgres.

I've found these books really valuable

# SQL Performance Explained Everything Developers Need to Know about SQL Performance

https://www.amazon.com/Performance-Explained-Everything-Deve...

This book fundamentally talks about how to effectively use and leverage the SQL indices. Talks about all the important implementations (Postgres, MySQL, Oracle, SQL Server).

# Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

https://www.amazon.com/Designing-Data-Intensive-Applications...

This book gets mentioned a bunch around here and for a good reason. There aren't too many concrete resources on making your systems "webscale" and this one is really good.

# PostgreSQL 9.0 High Performance

https://www.amazon.com/PostgreSQL-High-Performance-Gregory-S...

Discusses all the different settings and tweaks you can do in Postgres. It's crazy how much of a perf gain you can get just by twiddling the parameters of the database, i.e. all the tricks you can do when the single instances are bottle necks.

There's a similar book for MySQL https://www.amazon.com/High-Performance-MySQL-Optimization-R...

# PostgreSQL 9 High Availability Cookbook

https://www.amazon.com/PostgreSQL-9-High-Availability-Cookbo...

Discusses how do you go from 1 Postgres instance to 1+ instance. Talks about replication, monitoring, cluster management, avoiding downtime etc i.e. all the tricks you can do to manage multiple instances. Again there's a similar book for MySQL https://www.amazon.com/MySQL-High-Availability-Building-Cent...

Last but not least check out the postgres documentation, people consider it a standard of what good documentation looks like https://www.postgresql.org/docs/9.6/static/index.html

Also last but not least, read up on relational algebra (the foundation of SQL) https://en.wikipedia.org/wiki/Relational_algebra. I've always found SQL to be extremely verbose (the syntax reminds me of idk COBOL or smth) but there's another query language called Datalog, that's for our purposes similar to SQL but the syntax is much more legible.

E.g. check out these snippets from these slides (page 29) (and check out the whole class too)

https://pages.iai.uni-bonn.de/manthey_rainer/IIS_1617/IIS201...

Datalog:

s(X) <- p(X,Y).

s(X) <- r(Y,X).

t(X,Y,Z) <- p(X,Y), r(Y,Z).

w(X) <- s(X), not q(X).

SQL:

CREATE VIEW s AS (SELECT a FROM p)

UNION

(SELECT b FROM r);

CREATE VIEW t AS

SELECT a, b, c

FROM p, r

WHERE p.b = r.a,

CREATE VIEW w AS (TABLE s)

MINUS (TABLE q);


To get a good mental model of a SQL database, there is this pretty technical but great write up from researchers from Berkeley, MIT, and Microsoft research:

http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf


If I could ask something related to this question - what is the best way to grok the OVER clause, such as RANK() OVER? I'm having a bit of difficulty trying to really understand how it works and how to use it effectively.


Best way is to see it in action and experiment. A few videos I made (b and c relate to common queries written for BI or CLTV stuff).

a) row_number over() and partition: in the abstract: https://www.youtube.com/watch?v=-X3eIyZV728

b) applied to customer value analysis: https://www.youtube.com/watch?v=iHxJvF0tZOA

c) applied to time differences (also uses materials from b above) https://www.youtube.com/watch?v=5f8tF4U70Ic

After a while you can write these from scratch and they generally work the 1st or 2nd time, but it takes lots of trial and error at first when setting up the partition and ordering clause within the OVER() expression.


Someone already mentioned the JOOQ blog .. this post should help you out https://blog.jooq.org/2014/08/12/the-difference-between-row_...



Once I understood what a Cartesian product was and how it really relates to joining tables in a query, then I felt everything clicked and building complex queries was no longer daunting.


For Microsoft SQL server,

I recommend reading all things what Brent Ozar [1] has to say

There is also another blog from Dr. DMV on SQL Server performance [2]

[1] https://www.brentozar.com/ [2] http://www.sqlskills.com/blogs/glenn/category/dmv-queries/


Awww, thanks! For the record, BrentOzar.com is also written by my excellent staff, too.


I recommend Database Design for Mere Mortals https://www.amazon.com/Database-Design-Mere-Mortals-Hands-eb...


I'm super late to the game on this. But I made a website http://selectstarfrommovies.com/ with the intent to help people have motivation to learn SQL. Check it out.


You can learn through SQL Injection too, and it's way more fun :) https://redtiger.labs.overthewire.org/


Kind of depends on the db and engine being used, but for MySQL, this book has been very good in helping me optimize sensitive queries:

High Performance MySQL by Jeremy D. Zawodny


To be blunt, the best online resource is a production server being hugged to death. Human alertness, survival, and performance emerges at its very best.


Any great resources explaining normalization, ideally with real-world examples of when to stop normalizing?


For examples of not normalizing, I would look at books on data warehousing


Joe Celko's books !


This's what I want to know


Amazon.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: