Hacker News new | past | comments | ask | show | jobs | submit login

> At the volumes my organization works with, it is very difficult to write performant SQL queries that use JOINs and other relationships as a developer - even as a DBA! - and often much easier, for me, to write performant application code.

How can this possibly be true? Won't that result in sending unnecessary data over the wire, stressing network and SQL buffer?

What are these queries and what are these volumes? I just can't wrap my head around the performance statement. You know better join algorithms that SQL Server is capable of (Loop/Hash/Merge)? Given that you have appropriate indexes in place, perhaps query hints is what you need to control sql plan guides if you know a thing or two about your data and it's distribution more than the sql statistics.




Joins are severely misunderstood and often incorrectly used. I've seen code that had the wrong join and would return 1,000s of rows which then had to be programatically squished down into the data we actually wanted. Some hand crafted SQL usually fixes this

Also some ORMs write dreadful SQL where it comes to joins

A badly written join (or collection of joins) will take a longer time to run that will, when the system is under load, backlog other queries. If these errant queries make up a significant portion of your queries then it will hit performance significantly

It's not the joins themselves just the incorrect use of them


You said what I'm trying to say way more clearly than I did.

A lot of people have the same concern but I'm just gonna reply to this comment.

The ratio of SQL-focused devs to non-SQL-focused devs at my org is not favorable. And we certainly DO write joins... just not complex ones. Likewise, we do use constraints... just not all the time against multi-billion-record tables.

But that's not all. Our biggest tables are also our oldest and most unwieldy. Here is an (admittedly outdated in the specifics) example of what it's like to add constraints to a big table in SQL server that didn't have them already: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/3eb...

So for a dev to come along and add a whole bunch of FK relationships and/or write some big fun queries against one of these tables is asking for a lot more than one realizes immediately. New devs join and run up against this all the time.

Is it a good or great situation? No. But that's not the question. The question is is it a real one and why.


Joins may be severely misunderstood by some people that write front end, middleware and backend code in the same day, but not by dedicated SQL developers. If the app is big enough and important enough, having dedicated SQL devs is the solution.


I'm not GP so I don't know what they meant, but a key upside to complex logic in the application layer vs the database is that the application layer is often much easier to scale out than the db. Where I work, if I run out of memory in the app I just change a configuration variable and k8s gives me more instances instantly. But if the database is memory constrained and I'm already on the biggest server available to me...I need to re-shard or take some other more sophisticated approach (tuning, replication, other stuff I don't know about).

Of course each scenario is different, YMMV, and as always "it depends".


> key upside to complex logic in the application layer vs the database is that the application layer is often much easier to scale out than the db

I think the point of the GP is that all of these application instances are still connected to the DB, doing sub optimal data fetches taxing the database in multiples.


One example: MySQL table locks. The application can do in parallel what the database can't.


Locks ensure you get committed data back and not some data that is in-operation and may or may not end being persisted. If you don't care about it: SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ;

And SQL can do parallel operations too.


You made the parent commenter's point for them. You went over the heads of half the developers with join algorithms and index hints. That's just how it is, unless you're at a company with a very high bar for hiring and training.


But, besides index hints, the developers don't need to worry about those things if they use the database to perform joins. The database management system chooses for them and does it pretty well (counterproductive index hints are not unheard of).

If they do it in application code, then they probably ought to learn about fancy sorting and joining algorithms.

But they should really just do it in the database (using read only replicas if the load gets high).


Using read replicas isn't always an option depending on your write throughput and consistency requirements.

The database can only do so well (and will spend a lot of CPU cycles working on your crazy query plan, because getting it wrong is more expensive, so now you effectively limit capacity regardless of how good your storage engine is).

Joins are great, tons of research went into making joins work, and lots of different join algorithms and optimizations based on data sizes, indexes, etc. But you really have to be careful, verses just denormalizing data across multiple tables/collections. Most applications are read-heavy, anyway... I generally plan for things to be successful, in which case joins don't usually work in the hot path.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: