Hacker News new | past | comments | ask | show | jobs | submit login

In the project I'm currently working on, objects are fetched from the database using ORM queries and then the results are iterated over, filtered further and additional queries are made to fetch more data. This is then all sent to the frontend, where only about two or three of the couple dozen fields are actually used.

I found this code because I wanted to fix an issue where sorting the list in the UI only sorted the current page (because the sorting happened in the frontend only). So I wanted to change the query so that sorting and paging are both done in the database. What a rabbit hole that was! I was able to refactor the code to not over-fetch quite as badly, and to move the filtering into the initial query (so only additional data fetching happens after the initial query now, but it still over-fetches fields, and that's not easy to change since this function is called by many endpoints and the set of fields they each want varies). I had to log all of the queries so that I could construct a new query and then convert that back to ORM code. It was a very tedious task, but I managed to get the query down from 9 seconds to under a second.

In a previous project, I was tasked with optimising the database because requests were incredibly slow. The issue was that no care had been taken with 1) indexes -- indexed fields were too large -- and 2) locking. Locking was the big one: some requests locked entire tables or large numbers of rows, which in turn prevented other transactions from completing, so otherwise fast queries became slow too, which in turn... you get the idea. This was a large, well-established codebase, so it was not easy to fix, and it still wasn't fully fixed when I left that team.
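The cascade is easy to reproduce in miniature. A sketch using stdlib SQLite as a stand-in (the mechanics differ from a server database, but the shape of the problem is the same): one transaction holds a write lock while doing slow application work, and every other writer queues up behind it.

```python
# Sketch: a long-running write transaction blocks other writers.
# SQLite stands in for the real database; table/column names are made up.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
# isolation_level=None = manual transaction control; timeout=0 = fail
# immediately instead of waiting for the lock.
a = sqlite3.connect(path, timeout=0, isolation_level=None)
b = sqlite3.connect(path, timeout=0, isolation_level=None)

a.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT)")
a.execute("INSERT INTO jobs (state) VALUES ('new')")

a.execute("BEGIN IMMEDIATE")                 # take the write lock...
a.execute("UPDATE jobs SET state = 'working'")
# ...and "forget" to commit while doing slow application work.

blocked = False
try:
    b.execute("BEGIN IMMEDIATE")             # second writer can't get the lock
except sqlite3.OperationalError:
    blocked = True                           # "database is locked"

a.execute("COMMIT")                          # short transactions release locks fast
b.execute("BEGIN IMMEDIATE")                 # now the second writer succeeds
b.execute("COMMIT")
```

Keeping transactions short and touching as few rows as possible is what breaks the chain.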

Many of these issues could be avoided by considering the data and access patterns when designing your database model (which doesn't have to be identical to your application model -- the database should IMHO be modelled around query patterns and the application model around domain features), with some care taken over indexing and locking. You want to make sure that different users' usage patterns overlap as little as possible to avoid lock contention, that your indexes are small, and that your queries fetch only what you need (in terms of both columns and rows).
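As one concrete illustration of "model around query patterns, keep indexes small": if a hot query filters and sorts on a few narrow columns, a composite index on exactly those columns can answer it without touching the table at all (a covering index). A sketch in SQLite (table and index names invented for the demo; `EXPLAIN QUERY PLAN` is SQLite's way of showing the chosen plan):

```python
# Sketch: a narrow composite index matching the access pattern
# "open orders for a customer, newest first" lets the database answer
# from the index alone (a covering-index scan).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        status TEXT,
        created_at TEXT,
        notes TEXT          -- wide column the hot query never needs
    )
""")
conn.execute(
    "CREATE INDEX ix_orders_cust_status"
    " ON orders (customer_id, status, created_at)"
)

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT created_at FROM orders
    WHERE customer_id = ? AND status = 'open'
    ORDER BY created_at DESC
""", (42,)).fetchall()
plan_text = " ".join(row[-1] for row in plan)
# SQLite reports something like:
# "SEARCH orders USING COVERING INDEX ix_orders_cust_status (...)"
```

Had the query also selected `notes`, the index would no longer cover it and every matching row would need a table lookup.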

None of these issues are the ORM's fault per se, but an ORM encourages developers to think in terms of their application model and their primary programming language's OO facilities. So people (in my personal experience, at least) tend to mix database queries and application logic too much (like what I described above: pull data from the DB, do some filtering that maybe should have been in the query, do more queries that maybe should have been joins), tend to over-fetch (often the ORM pulls in too many columns unless you tell it not to, which often isn't done), treat indexes as an afterthought, and don't consider locking at all. I think because ORMs make it look like "just some more application code", these things are often overlooked, while with SQL queries it's a little more obvious that they're executed differently. I would have said it's down to junior developers who just haven't learned to take care when writing queries, but I've only ever worked with two people who actually did this, and I can't believe that everyone else on all of those teams was just inexperienced.
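The "filter in application code vs. filter in the query" distinction looks like this in miniature (plain SQLite, invented table, no ORM, to keep the two versions side by side):

```python
# Sketch: the anti-pattern (fetch everything, filter in Python) vs.
# pushing the filter into the query. Both return the same rows; only
# one ships every row out of the database first.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, active INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO users (active, name) VALUES (?, ?)",
    [(1, "ada"), (0, "bob"), (1, "cy")],
)

# Anti-pattern: fetch all rows, filter in application code.
everything = conn.execute("SELECT id, active, name FROM users").fetchall()
active_app_side = [row for row in everything if row[1] == 1]

# Better: let the database do the filtering (and use its indexes).
active_db_side = conn.execute(
    "SELECT id, active, name FROM users WHERE active = 1"
).fetchall()

assert active_app_side == active_db_side  # same result, very different cost
```

With three rows it makes no difference; with three million, the first version drags the whole table through the driver before throwing most of it away.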

I'm no database expert, but I've managed big wins with small amounts of care and by understanding what queries I'm running.

PS: The book SQL Performance Explained is quite good, in my opinion, and if you use Postgres, then Mastering PostgreSQL 11 is great (and it starts off talking about transactions and locking).




This fits my experience as well. Tracing the queries SQLAlchemy builds from the models of our Python application, the DB is returning all of the columns for every model instead of just the columns that will then be returned via the API to the client. It's not even just the queries that compute the relation: all of the columns from every relation involved are fetched, and only at return time is a small subset of them used. It's terrible for query performance, as we almost never get index-only scans because we also have to pull data from the other columns. It's a waste. I'm not sure how to fix that yet without breaking the ease of use. An API like model.select_one_or_none()... is, I can easily tell, quick and painless and easy to train people on, but it doesn't do the database justice. And then I hear "the DB is slow"! It's a battle I'm losing, but I hope to improve things where I can.


This sounds exactly like my current project. The ORM is pulling all fields for the model objects and related objects instead of the two or three it actually needs, and there's no easy fix because too many unrelated features use the same query code but with different fields. I'll probably change it slowly over to multiple single-purpose queries, but it's going to be a tedious task.

As an experiment, I changed one query to fetch only the fields it needs, and it ran in approx. 15% of the time. That's a pretty big improvement! Sigh.





