We used a third-party tool for mailers, and it developed severe performance problems after the server was moved to a cloud-based solution.
We confirmed what the DBAs said: the round-trip latency was only about 34 milliseconds and the database itself was fast. I installed Wireshark to see what was going on. The tool was querying a table with 80,000 entries, and then for each entry it would issue another "select ... limit 1" query to fetch that record's metadata. The loop was abstracted away inside an ORM, and the third party had very few clients with tables our size for that particular table.
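Reconstructed from the capture, the traffic looked something like this (a sketch only; the table and column names are invented, since the real schema was the vendor's):

  -- one query for the whole table:
  SELECT id, recipient, subject FROM mailer_queue;   -- ~80,000 rows

  -- then, once per row returned above:
  SELECT * FROM mailer_metadata WHERE queue_id = ? LIMIT 1;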
When it ran on-prem it didn't take too long, but in the cloud it took 45 minutes: 80,000 round trips at ~34 ms each is roughly 45 minutes of pure network waiting. An insignificant amount of time spent tens of thousands of times ends up being significant, and this can creep up on you as more things move to serverless / API-driven architectures. A lot of money and time was spent just finding the source of the issue.
I'm kind of curious about what data the select limit 1 queries were returning that the ORM couldn't get in a single query and had to go back per record. Can you shed any light on this?
ORMs have a degree of flexibility, and it's possible for developers to write N+1 queries accidentally, especially when they're new to the ORM. It's also often possible to address those issues, sometimes trivially, sometimes not.
The other thing that comes to mind here is that maybe the query was hitting a non-covering index and that was triggering the lookups? In that case a covering index would have fixed the issue?
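For illustration, assuming Postgres 11+ syntax and invented names, a covering index lets each of those probes be answered from the index alone:

  -- hypothetical: if each LIMIT 1 probe only needed a couple of columns,
  -- an INCLUDE index can serve them without touching the table itself
  CREATE INDEX idx_metadata_covering
    ON mailer_metadata (queue_id)
    INCLUDE (status, sent_at);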
Maybe that's because the problem isn't caused by ORMs directly, just by very poor usage of an ORM. The same problem would have existed if someone had hand-written a loop performing the same query.
Sure, you can argue one way or another about whether a hand-written block of SQL would have the same flaw, but if an experienced DBA or developer writes it, I'd bet on their output over anything an ORM generates.
If you consider the software development process as a whole, the explicit SQL approach intrinsically guarantees additional scrutiny of the actual SQL statements. If you hide all of this behind an ORM, all that is seen during code review time is some beautiful POCO model with a few extra IEnumerables thrown on it. No one is paying attention to the man behind the curtain and the horrible join that was just created automagically.
Perhaps the answer is to just log and profile all the ORM-generated SQL. Sure, but if you could look at the ORM's output (which in some cases is abusively large) and quickly determine whether it's good or not, why not just write it yourself and be 100% sure from the start?
You literally need the same piece of knowledge to avoid the N+1 problem in both raw SQL and ORMs: you have to use a join.
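Sketching the fix (names invented, as a minimal illustration):

  -- one round trip instead of 80,001; assumes at most one metadata row
  -- per queue entry (otherwise you'd need DISTINCT ON or a lateral join)
  SELECT q.id, q.recipient, m.status, m.sent_at
  FROM mailer_queue q
  LEFT JOIN mailer_metadata m ON m.queue_id = q.id;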
Developers were making N+1 mistakes with raw SQL for years before ORMs became popular.
It doesn't matter if the "DBA or experienced developer" writes a better select query than the ORM. If that query sits inside a loop, all bets are off already.