Hacker News new | past | comments | ask | show | jobs | submit login

Django's prefetch_related mechanism essentially implements the pattern you describe here for you: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#...



Thanks. Yeah, I think I used that years ago when I first ran into this problem, and it worked well. Whether one uses an ORM or not, one needs to know how to use one's tools. My problem (not just with Django, but with ORMs in general) is how they make bad code look good. Like the following (I don't know Django well anymore, but something like this):

  users = User.objects.all()
  for u in user:
    print(u.name, len(u.comments))
To someone who doesn't know the ORM (or Python) well, u.comments looks cheap ("good"), but it's actually doing a db query under the hood each time around the loop ("bad"). Not to mention it's fetching all the comments when we're only using their count. Whereas if you did that in a more direct-SQL way:

  users = select('SELECT id, name FROM users WHERE id IN $1', user_ids)
  for u in user:
    num_comments = get('SELECT COUNT(*) FROM comments WHERE user_id = $1', u.id)
    print(u.name, num_comments)
This makes the bad pattern look bad. "Oops, I'm doing an SQL query every loop!"

The other thing I don't like about (most) ORMs is they fetch all the columns by default, even if the code only uses one or two of them. I know most ORMs provide a way to explicitly specify the columns you want, but the easy/simple default is to fetch them all.

I get the value ORMs provide: save a lot of boilerplate and give you nice classes with methods for your tables. I wonder if there's a middle ground where you couldn't do obviously bad things with the ORM without explicitly opting into them. Or even just a heuristic mode for development where it yelled loudly if it detected what looked like an N+1 issue or other query inside a loop.


I agree that this stuff can definitely be handled better.

https://github.com/django-query-profiler/django-query-profil... has a neat option for detecting likely N+1 queries. I usually use the Django Debug Toolbar for this.

Django's ".only()" method lets you specify just the columns you want to retrieve - with the downside that any additional property access can trigger another SQL query. I thought I'd seen code somewhere that can turn those into errors but I'm failing to dig it up again now.

I've used the assertNumQueries() assertion in tests to guard against future changes that accidentally increase the number of queries being made without me intending that.

The points you raise are valid, but there are various levels of mitigations for them. Always room for improvement though!


Thanks for the thoughtful responses and the link -- Django Query Profiler looks nice!


    users = User.objects.all()
    for u in user:
        print(u.name, len(u.comments))
This is fine if you are working with a small data set. It is inefficient, but if it's quick enough, readability trumps efficiency IMHO.

Django ORM has a succinct way of doing the "SELECT COUNT(*)" pattern:

    users = User.objects.all()
    for u in user:
        print(u.name, u.comments.count())
And you can use query annotations to get rid of the N+1 query issue altogether:

    users = User.objects.annotate(n_comments=Count("comments"))
    for u in user:
        print(u.name, u.n_comments)


> This is fine if you are working with a small data set. It is inefficient, but if it's quick enough, readability trumps efficiency IMHO.

And this is how you end up with the problems the parent is describing. During testing and when you setup the system you always have a small dataset so it appears to work fine. But when it’s real work the system collapses.


A good habit when writing unit tests is to use assertNumQueries especially with views. It's very easy even for an experienced developer to inadvertently add an extra query per row: for example if you have a query using only() to restrict the columns you return, and in a template you refer to another column not covered by only(), Django will do another query to fetch that column value (not 100% if that's still the case, but that was the behavior last time I looked). The developer might be just fixing what they think is a template issue and won't know they've just stepped on a performance landmine.


Sounds like a good habit, yeah. But the fact that an attribute access in a template (which might be quite far removed from the view code) can kick off a db query is an indication there's something a bit too magical about the design. Often front-end developers without much knowledge of the data model or the SQL schema are writing templates, so this is a bit of a foot-gun.


I've wanted a Django mechanism in the past that says "any SQL query triggered from a template is a hard error" - that way you could at least prevent accidental template edits from adding more SQL load.


My SQL is a bit rusty, but, to reduce the queries even further, I think you can do:

> SELECT user_id, COUNT(*) FROM comments WHERE user_id IN $1 GROUP BY user_id


> Or even just a heuristic mode for development where it yelled loudly if it detected what looked like an N+1 issue or other query inside a loop.

Fwiw, ActiveRecord with the Bullet gem does exactly that. I'd guess there's an equivalent for Django.


select_related and prefetch_related are absolutely essential when coding Django.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: