JSONField has been an absolute godsend in combination with Django's ORM. I had been using it with Postgres and will likely keep our backend the same, but I cannot recommend it enough. You will have to write some validation and schema code on top if you want your data to have guarantees similar to (though weaker than) the usual typed fields; the benefits from the flexibility you get are immeasurable though.
In projects I've been involved with, storing JSON in the database has often turned out to be a mistake.
Django models create a well-defined self-documenting structure for your schema, are easy to evolve using migrations, and there's a wealth of tooling built on top. IMHO, these far outweigh the perceived convenience of simply storing some stuff in a JSON field.
If you find yourself implementing your own validation and schema code for JSON fields, I'd say it's a sign that you should probably stop and migrate the data to Django models instead.
There are some cases when storing JSON is fine, of course, but in my experience they are few and far between.
> If you find yourself implementing your own validation and schema code for JSON fields, I'd say it's a sign that you should probably stop and migrate the data to Django models instead.
Don't agree. Where JSON fields with validators are useful is when you would otherwise have inherited models for different variants of the same type of thing. It's a lot easier to understand and query one model with 50 Cerberus (or whatever) schema validators than it is to understand and query 50 Django models. The JSON approach leaves one part of the codebase that's complicated to understand, but you mostly shouldn't need to understand it. Whereas when you have tons of different models, and each one needs to be used in different places and is queried in different ways and has its own serializers, the mess tends to spread throughout the entire codebase.
I am skeptical (with an open mind) of the JSONField pattern. We’ve used it in a few places where you have something like inheritance, so you want to handle all Event objects the same at the top level, and then dispatch differently based on Type. This works fine.
My concern is that this is a pattern which experts advocate, but less precise/experienced engineers are likely to break when it comes time for a data migration.
With normal SQL fields, if you change the schema you need to create a migration, and that gives an easy point to check for data migrations too. If any commit can silently change the write-schema, it’s a lot harder to police.
So I suspect if your team skews towards very experienced engineers the JSONField pattern probably gains in value. If that’s right it’s probably best to caveat the recommendation.
Interested to know your experiences with schema changes though.
I’m also interested to note that you are pointing out a general issue with Django I’ve experienced - the Models spread through the whole system. You can solve that in other ways, say by having a repository layer that used normal (non-JSONField) models and maps them through to a POPO. I’m not sure you _need_ the JSONField pattern to get the separation you’re looking for.
> If any commit can silently change the write-schema, it’s a lot harder to police.
In addition to schema validators, you really also need a data integrity async task that runs once a day to catch these kinds of issues. They're not that hard to fix if you catch them early, it's if you realize there's a data integrity issue years later and the original team isn't even there anymore that you get bigger problems. If you have a lot of data, a reasonable solution is just having it run on everything from the last 24 hours plus a random sampling of data older than that.
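Concretely, the daily check might look something like this sketch (assuming Celery and the jsonschema package; the model, its fields, and the schema registry are all made-up names):

    # Hypothetical sketch of the daily integrity task described above.
    # Assumes Celery and the jsonschema package; model/field/schema names are illustrative.
    import logging
    import random
    from datetime import timedelta
    from itertools import chain

    from celery import shared_task
    from django.utils import timezone
    from jsonschema import ValidationError, validate

    from myapp.models import QuizQuestionModel      # illustrative model
    from myapp.schemas import QUESTION_SCHEMAS      # illustrative: {question_type: schema}

    logger = logging.getLogger(__name__)

    @shared_task
    def check_json_integrity(sample_size=1000):
        cutoff = timezone.now() - timedelta(hours=24)
        recent = QuizQuestionModel.objects.filter(updated_at__gte=cutoff)
        older_ids = list(
            QuizQuestionModel.objects.filter(updated_at__lt=cutoff).values_list("id", flat=True)
        )
        sampled = QuizQuestionModel.objects.filter(
            id__in=random.sample(older_ids, min(sample_size, len(older_ids)))
        )
        bad = []
        for row in chain(recent, sampled):
            try:
                validate(instance=row.data, schema=QUESTION_SCHEMAS[row.question_type])
            except (ValidationError, KeyError):
                bad.append(row.pk)
        if bad:
            logger.error("JSON integrity check failed for %d rows: %s", len(bad), bad[:50])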
> Interested to know your experiences with schema changes though.
Haven't done a ton, so admittedly there could be complex cases I'm not seeing. I think it helps though, in cases like the example I gave of using it for form-like things, to set it up like:
QuizModel
QuizQuestionModel (FK to QuizModel)
QuizResponseModel (FKs to QuizModel, UserModel)
QuizQuestionAnswerModel (FKs to QuizResponseModel, QuizQuestionModel)
This way you have four models rather than an indefinite number. But because it's not just one big model with lists of JSON objects for questions and responses, this makes running migrations and data integrity checks much simpler to write and easier to reason about.
The idea being also that you have JSON schema validators for the different types of JSON blobs that get stored in QuizQuestionModel and QuizQuestionAnswerModel.
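Roughly this shape, as a sketch only (field names are guesses; the clean() methods would call the per-type JSON schema validators mentioned above):

    # Sketch of the four-model layout described above; fields are illustrative.
    from django.conf import settings
    from django.db import models

    class QuizModel(models.Model):
        title = models.CharField(max_length=200)

    class QuizQuestionModel(models.Model):
        quiz = models.ForeignKey(QuizModel, on_delete=models.CASCADE)
        question_type = models.CharField(max_length=50)
        data = models.JSONField()   # validated against a per-type JSON schema in clean()

    class QuizResponseModel(models.Model):
        quiz = models.ForeignKey(QuizModel, on_delete=models.CASCADE)
        user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)

    class QuizQuestionAnswerModel(models.Model):
        response = models.ForeignKey(QuizResponseModel, on_delete=models.CASCADE)
        question = models.ForeignKey(QuizQuestionModel, on_delete=models.CASCADE)
        data = models.JSONField()   # answer blob, validated the same way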
Like I said, there are some cases where using JSON fields makes sense. Good examples would be cases where you wish to retain the structure of the data, but don't have control over it. Sounds like you're describing a case like that.
I’ve used it more for generating forms, e.g. when there are different types of questions or modules you can ask of or show the user: multiple choice, free response, upload an image, watch a video, etc. So I’ve used JSON here both for storing the questions and responses, and then have used validators for each question type and response type. The downside is that it’s hard to do migrations. But in most cases you can just save the rendered HTML each user is shown and that covers you for having an audit trail, rather than maintaining backwards compatibility indefinitely. And if you do need to maintain compatibility forever, the schema validators enable that.
Having done or seen almost the exact same use for JSON at multiple startups, for whatever downsides there are I think it’s 10x better than needing to try to understand a bunch of models, each with their own queries and logic.
100% agreed, JSON fields are good when you want to dump a bunch of data you don't care about the shape of, but if you need a schema, pull the data into actual fields.
For a database like postgres, or even for something like SQLite, this mostly becomes a distinction without a whole lot of difference, since the database can index and access JSON structures just like regular columns.
JSONB columns are incredibly useful if you have data that doesn't fit well into a strict relational schema. And they are very powerful in terms of query ability in Postgres, but still quite far from a conventional relational schema.
Queries that act on the insides of a JSONB column are much harder to write than the equivalent conventional queries. They are also much, much slower in some cases, since the DB often has to read the entire JSONB blob, and because it doesn't have proper statistics for it. The performance can range from slightly slower to completely pathological query plans that take ages.
I'm a big fan of JSONB in Postgres, but it is no replacement for a relational schema, there are far too many problems you inflict on yourself if you try to use it like that.
And it's just harder to use. I agree with it being generally a 'mistake' (in the 'you will regret this later' sense) - it's better than a text field, but I'm in the camp of as much structure as possible ('you will thank yourself later').
As others have mentioned, it definitely does have downsides when it comes to writing queries and especially when using frameworks/libraries to interface with the db.
I've been working on a project using schema.org objects (hundreds of types of objects) in Postgres in a graph/tree-like structure. While structured, the objects are not consistent enough for me to want separate schemas for them. Using JSONB in Postgres allows me to index separate fields within the JSON while still having all the objects in a single graph.
If I couldn't use JSONB with indexes (and triggers for custom foreign key checks) it'd be much harder to do this in a relational database (and the data is relational, it just isn't consistent across all types).
Perhaps this would be better in a Triplestore but there are a lot of features of postgres that are not available in them.
I used to believe in these database antipatterns (e.g. avoid EAV, avoid attribute tables, avoid denormalization, avoid overnormalization, avoid inheritance, avoid wide tables, only use synthetic keys, only use natural keys, never use autoincrement, always use autoincrement), but these days I'm almost certain all of them are wrong, because they are independent of business domain needs. Business domains have antipatterns for their data, data in general decidedly does not. For example, storing physical addresses is subject to a number of antipatterns, including stuff like "natural keys are not a good idea for this".
GP cited EAV as an antipattern, yet it is one of the most reasonable approaches to user-defined fields (especially when suitably augmented), which are frequently a business requirement.
Not even just user-defined fields, any time where the fields need to be defined as data values rather than part of your concrete data model. Product attributes on a big multi-category e-commerce site being another example. You could just stick them in JSON, but it's still essentially EAV, just with a different storage mechanism.
Another advantage of EAV as a pattern is that it makes it straightforward to add metadata to values, like "where did this value come from?". For systems with audit trails, just having keys and values isn't enough.
EAV wouldn't be the first tool I'd jump to -- I've definitely seen it go wrong -- but it feels like it's going too far to call it an anti-pattern.
I think there was a bit of a confusion on the implied use case of our project. As another commenter pointed out, when the data is user supplied and is not always uniform in nature, creating migrations for each adjustment to the custom fields they can modify becomes virtually impossible. I ended up settling for JSON fields because I don't have to migrate the data or the database schema; I can just do a best-guess conversion for the data when it hits the UI, and supply some validation warnings if it really doesn't make sense (e.g. a non-date string somehow makes it into a "date" field).
> In projects I've been involved with, storing JSON in the database has often turned out to be a mistake.
I mostly agree. The only JSON I store in the database is UI settings, because it's easy enough to parse it with JavaScript (and fall back to a default if the JSON doesn't contain that prop). It also decouples your database schema from the UI, the latter of which can evolve much more rapidly.
This is the best comment I've read on HN in a year, thank you. Some great stuff in here I didn't know existed, will save me a ton of time and frustration in the future.
We've been using it pretty heavily and the ability to do deep queries like Person.objects.filter(data__family__spouse__name__istartswith="oli") is great for one-offs but the performance stinks.
As it well should (this isn't a complaint); it's just a reminder that keeping denormalised, first-party data that can be properly indexed is invaluable when you're talking about huge datasets.
It'd also be nice to layer JSON Schema directly into the modelling, but there is a draft-7-compliant Python project that you can use for this in the model's clean() method.
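If that project is jsonschema, the wiring might look something like this (a sketch only; the schema and field names are illustrative):

    # Sketch of layering JSON Schema validation into clean(); assumes the
    # jsonschema package. PERSON_SCHEMA and the data field are illustrative.
    from django.core.exceptions import ValidationError
    from django.db import models
    from jsonschema import Draft7Validator

    PERSON_SCHEMA = {
        "type": "object",
        "properties": {
            "family": {
                "type": "object",
                "properties": {
                    "spouse": {
                        "type": "object",
                        "properties": {"name": {"type": "string"}},
                    },
                },
            },
        },
    }
    _validator = Draft7Validator(PERSON_SCHEMA)

    class Person(models.Model):
        data = models.JSONField(default=dict)

        def clean(self):
            errors = [e.message for e in _validator.iter_errors(self.data)]
            if errors:
                raise ValidationError({"data": errors})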
Postgres allows you to create an index on keys within a JSONB column[0]. It may be the case that Django doesn't offer support for this but I would expect you can probably get around that with raw SQL.
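For the raw SQL route, an expression index on a single key can be created from a migration, something along these lines (table, key, and migration names are illustrative); Django's own GinIndex covers the whole-column case:

    # Illustrative migration: a Postgres expression index on one key inside a
    # JSONB column, created with raw SQL as suggested above.
    from django.db import migrations

    class Migration(migrations.Migration):
        dependencies = [("myapp", "0002_person_data")]   # illustrative
        operations = [
            migrations.RunSQL(
                sql=(
                    "CREATE INDEX person_spouse_name_idx ON myapp_person "
                    "((data -> 'family' -> 'spouse' ->> 'name'));"
                ),
                reverse_sql="DROP INDEX person_spouse_name_idx;",
            ),
            # For indexing the whole column, django.contrib.postgres.indexes.GinIndex
            # can instead be declared in the model's Meta.indexes.
        ]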
Yeah, I've not had too much luck with GinIndex. I can't rule out user error, but they just didn't work well. Maybe it was the volume (the index generation wasn't keeping up with the ingress of data, "only" around 2w/s), or it's the 50M rows...
This was WORM data so denormalising on write didn't need update triggers, and was the difference between 100ms and 10s queries. That was still too slow for us so we started caching expected data outside postgres.
My intuition says this is one of those tradeoffs that is fast and easy in the short run, but over time becomes technical debt. I haven't used these over the long term, though.
The bonus with this is that at the start you are probably only interested in a few of the fields in the webhook, but as your application develops, you may need to extract more. If you've stored the JSON of previous webhooks, you have an easy migration path.
For example, if there are 1000 possible attributes, only 5% are populated for any given row. If you have 1000 columns you are going to have a ridiculously wide and mostly empty table.
In Postgres, NULL values take one bit of storage as far as I understand. So a table has to be very, very wide before this becomes a benefit. Of course if you have attributes that you don't control but are e.g. user-specified, JSONB column are a good choice.
The traditional approach to efficiently modelling data like that with relational databases is to use the EAV[1] model, which works quite well in practice.
Product data on a webshop. If you have a wide selection of product categories you’ll have a huge number of attributes that aren’t relevant across your categories.
There’s a number of ways to deal with the issue, e.g. a model per category or an EAV database pattern (which is what platforms like Magento do). None are really ideal, but storing the product attributes as JSON works pretty well - even more so when your database supports querying the JSON blob.
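For instance, with Django's JSONField lookups it might look something like this (model and field names are made up):

    # Illustrative: filtering on keys inside an attributes blob with Django's
    # JSONField lookups. Model and field names are made up.
    from django.db import models

    class Product(models.Model):
        category = models.CharField(max_length=100)
        attributes = models.JSONField(default=dict)

    # only shoe-specific attributes exist on shoes, but they're still queryable:
    red_shoes = Product.objects.filter(category="shoes", attributes__color="red")
    big_screens = Product.objects.filter(attributes__screen_size__gte=15)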
Storing arbitrarily complex Boolean queries (think Elasticsearch's query syntax). While this could be done in SQL tables, there are no gains with referential integrity, and while there would be some in consistency, I don't think they're worth it.
Relational schemas are flexible and allow an optimizer to figure out how best to fetch your data, based on the data you are querying and filtering for and on stored statistics about your data set.
Document oriented storage is great when you don't need the optimizer because you already know how the data is written and read. This means you can bundle it up into small, single fetch documents. No statistics or optimizer necessary. This is great if you understand your use cases really well, and they never change (good luck with that) or you have a large distributed data set that would be tough on an analyzer.
Typical use case:
There are a number of required fields that your app won't work without. Those should be columns, probably non-nullable columns.
Then there are a bunch of optional fields. Traditionally, a number of nullable columns would be created. But they're ultimately messy (you need to null-check every access) and unneeded, since you can replace them with a single JSON item. Keys are always optional, so you always need to check for presence or absence, and modern DBs support JSON natively in many clauses.
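Roughly this shape (a sketch; names are made up):

    # Illustration of the split described above: required data as real,
    # non-nullable columns, optional extras in a single JSON blob.
    from django.db import models

    class Order(models.Model):
        # required: the app doesn't work without these, so they're real columns
        customer_email = models.EmailField()
        total_cents = models.PositiveIntegerField()
        # optional: gift message, delivery notes, attribution data, ...
        extra = models.JSONField(default=dict, blank=True)

    # usage (order being some Order instance) - every optional key is checked
    # for presence the same way:
    # gift_message = order.extra.get("gift_message")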
All JSON keys are optional, so you ALWAYS need to check for their presence or absence. The mess exists when some keys are nullable and some aren't; checking for nullability on a key that can never be null isn't exactly elegant.
I'm surprised the discussion has derailed into whether or not to use JSON fields... Certainly you don't have to, but they've been in Postgres contrib for a long time.
Async views have dropped, and that is genuinely exciting. It's taken a lot of work, and there's still a ways to go before Django is truly async native, but this is one of the big steps, and being able to do some concurrent IO on a few routes could be a huge saving for projects that call out to external services.
Otherwise a lot of stuff has to be done in event queues to avoid blocking the main thread, and sometimes that means a bad UX: users take actions and aren't given any guarantee that they have completed, even in cases where waiting might be the best option were you not risking blocking the thread.
As an FYI: Using a gevent monkey patch has been a way to get async Django for years. Overhead is inefficient in CPU cycles and you need to stay away from doing CPU bound things like Numpy manipulations, but for an app server that’s bound by external API call latency, it practically gives infinite concurrency compared to a thread-per-request model. And no need to worry about event queues. You can feel free to synchronously call requests.get with a 10 sec timeout and still serve prod traffic from a small handful of threads. And most days we don’t even worry about async from a coding perspective. Anyone can email me with questions.
I'd add that crucially, DB ORM operations just work with gevent. It will be a while before async database operations are supported natively by Django. For me that is a complete blocker.
My impression was that django+gevent requires some care for the db part. The psycopg2 docs state, for instance: "Psycopg connections are not green thread safe and can’t be used concurrently by different green threads." [1,2]
In addition, gevent cannot monkey patch psycopg2 code because it is C and not python. This is handled by calling psycopg2.extensions.set_wait_callback() [1,3,4]
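In practice that setup is usually just a couple of lines at worker startup, roughly (psycogreen wraps the set_wait_callback() call for you):

    # Typical gevent + psycopg2 startup, roughly: monkey-patch the stdlib, then
    # let psycogreen register a gevent-aware wait callback for psycopg2 via
    # psycopg2.extensions.set_wait_callback().
    from gevent import monkey
    monkey.patch_all()

    from psycogreen.gevent import patch_psycopg
    patch_psycopg()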
I just now realize that you're probably referring to making the ORM itself async capable which is on the roadmap https://code.djangoproject.com/wiki/AsyncProject in which case I totally agree.
In practice this is just a one-time setup. We use https://github.com/jneight/django-db-geventpool which uses psycogreen under the hood. It isn't perfect (we're currently tracking down some rare cases where connections aren't properly returning to the pool, though we believe this to be error on our side) but it's more than sufficient.
While making the ORM async capable would be great, I'm not sure it will ever make sense to migrate towards sprinkling "async" explicitly across our codebase. We'll see, of course.
Last time I tried deploying Django (with Channels) on ASGI around 2017 or 2018, it was a total clusterfuck and fell over with about 20 websocket connections, and we had to scramble to revert that deployment. That’s not ASGI’s fault though, as there are several reasonably performant frameworks using it.
Have a production Django site running with Gunicorn / Uvicorn worker and we haven't hit any snags yet. Mind you we're only beginning to use it for anything in particular. The test was to check performance was acceptable, and stable - which has been our experience so far.
As someone who has an active app still written in Django 1.6: the larger and more complicated my project got, the more I wanted to ditch the model/view/template separation, and attach methods to the model so that they could be called from everywhere, returning HTML code in a string directly.
Other times, I wished Django had something analogous to a component, where everything is just encapsulated in a single file. I don't want to separate javascript/html/css view/template. I want a single file I need to update and an easy way to call to render.
The template system is also difficult to use, if you get complicated scenarios.
I needed to show event details in a modal if that was the user preference. But the page could also be embedded.
This led to me having to render differently depending on the device, whether it was embedded, and whether it was a popup or not. That led to an explosion of possibilities: a total of 2x2x2 = 8 different views for the same data in the same template.
The most practical way was with if/then statements, but that still led to me repeating HTML code, and it was difficult to reason about and test.
I also got into a situation where the template wasn't parsed and updated, probably because of too many includes or too much inheritance in the template. For example, I wanted to write a custom CSS file that would use the Google Fonts the user selected in the UI. The only way I found to make it work was to bypass the template and write a function in the view that would render_to_string the CSS file.
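i.e. something along these lines (the template and profile field are illustrative):

    # Roughly what that workaround looks like: render the CSS template by hand
    # and return it with the right content type (names are illustrative).
    from django.http import HttpResponse
    from django.template.loader import render_to_string

    def user_fonts_css(request):
        css = render_to_string("fonts.css", {"font": request.user.profile.font_choice})
        return HttpResponse(css, content_type="text/css")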
Honestly, for complex UI like this, I prefer to write it as a separate frontend app in React/TypeScript. Django's template system is great for normal websites (e.g. news/blog/ecommerce sites), but for complex application UI you'll start encountering pains like you mentioned. Implementing complex UI with a lot of interactivity is just easier in React. The drawback is that now you have to implement an API layer to bridge Django and React, which is actually not that bad using Django REST Framework or even vanilla Django views.
I've been using 3.1 alpha/beta for a while and it's got some pretty nice improvements. "JSONField for all databases" is a huge deal IMO, as it simplifies a lot of fairly common use cases.
I've been running uvicorn with gunicorn in production, and it's been solid. Django 3.0 enabled this with ASGI support. It'll still be a while before django is fully async capable, but these are steps in the right direction.
Pardon the rant, but I feel that the advantages don't outweigh the fact that with each release some of my stuff gets broken and I need to adjust. Django puts DeprecationWarnings basically everywhere and it's hell to maintain projects that have been alive for a few years. God forbid you do anything with the interfaces they expose. The problem only gets worse when you consider your dependencies, which in many cases don't keep up with Django's release cycle. It's a mess.
My experience has been good, with a Django app maintained for over 8 years now. We've only updated Django when a new LTS version is out. That means once every two years there's an update that may break something, but I don't remember it ever being more than a few hours of work.
We do take some care to (mostly) only use documented interfaces, so maybe that helps?
I'm sorry, but this is just programming. You have two choices: 1. Never upgrade and keep everything stable but go through hell when you inevitably must update something or 2: keep up with the latest and go through some minor pain with every release.
It's a lot like taking care of a house or car.
The world does not stand still, much less programming frameworks.
Hmmm, with well-chosen, thought-through abstractions that presume only the minimum they need to about their usage, the frequent changes you describe are not a given.
The question is what it is that frequently needs to change, and why no appropriate abstraction has been found to reach stability.
We're talking about a huge web framework here. Your comment might apply to small isolated pieces of code. But Django is built on top of thousands if not millions of abstractions. It's absolutely unreasonable to think that it won't change.
I tend to keep one CI pipeline with all dependencies unpinned (I made pip-chill for that reason) and Django set to the latest version to get early warnings of the impending doom.
That said, it takes a while for the Django folk to end support for a release so, at least, there is not much of a hurry to update a running app.
This seems like exactly the way that deprecations should be handled. Are you suggesting they should maintain backwards compatibility for much longer? They already have a pretty long deprecation policy.
Can you elaborate on some of the things you've been frustrated with?
I find Django has a pretty solid deprecation policy, with the goal being that if you have no deprecation warnings then upgrading is simple. Third party dependencies do get in the way of that sometimes though.
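One common way to enforce that in CI is to escalate deprecation warnings to errors in the test settings, so the suite fails as soon as one appears - something like:

    # e.g. in test settings / conftest: fail the test run on any deprecation
    # warning, so the suite is clean before attempting the next upgrade.
    import warnings

    warnings.simplefilter("error", DeprecationWarning)
    warnings.simplefilter("error", PendingDeprecationWarning)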
I'm only a beginner with Django, but I tend to agree to a point - the reason being that I've found old code that I want to use (GitHub, etc.), but it targets Django 1.x (later versions, say 1.10) and uses a load of stuff which is now removed. The issue I've had is that it's difficult to find these removed elements in the documentation - such as last week, looking for the {% ssi (file) %} tag, where it took me the best part of an hour of searching to find out which version it came from - and indeed that it had been a standard tag!
A lot of the imports seem to move around (which is easy to fix, once you know where they now are), and some things have just vanished.
I know documentation is a nightmare to produce, but there must be a way to automatically produce it for either import relocations or deprecations so you can find things like this easily?
For me, upgrading Django for a series-A sized startup (e.g. 25KLOC) takes maybe 2 - 4 hours on average, assuming there is good integration test coverage. What parts of the upgrade process are you finding take a long time?
I feel like the last Django release that was at all complicated to upgrade to was maybe 1.9, in terms of getting the test cases to run properly in parallel if they hadn't been properly isolated. And even that was more of a Python3-like situation, where it really only exposed things that people had done incorrectly previously.
Compared to how fast things move in the frontend world, Django feels ultra stable to me. Upgrading an old Django project to the latest version typically doesn't affect any fundamental things. Maybe some classes have been moved into contrib or spun off as non-core packages, or some methods have been removed and replaced with others, but never fundamental changes that require you to rewrite the majority of your code.
How does HN feel about the recent craze to make web python asynchronous?
To me, the performance gains are dubious in many cases and the complexity overhead of handling cooperative multitasking just seems like a step back for a language like python.
Long term Python user: if I want to do something asynchronously I reach for Go or Elixir (unless it's just way more practical to do it right there in Python). Adding function colors [1] might have been a practical decision, but was IMHO a mistake.
Why do I have to decide if my function is synchronous or not when I write it? I don't want to do that, I want to write only one function that can be called synchronously or asynchronously. In Go or Elixir, the decision is made by the caller, not by the API designer.
Which leads me to a parallel universe: Go-like asynchronicity should have been introduced with Python 3, when backward compatibility was explicitly broken. The gain of switching to Python 3 would then also have been a much easier sell than "just" fixing strings.
Of course, there are probably a thousand things that I'm overlooking, but this is my feeling...
> Why do I have to decide if my function is synchronous or not when I write it?
I feel that sync and async functions are fundamentally different. In Python a coroutine is really just a function you can pause, and while it might seem like the same thing as a normal function, it's actually very different algorithmically speaking, as you can include a lot of low-level optimizations - which is really what async code is all about: getting rid of that IO waiting.
I love async python but after working with it for better half of a year now it is often a bit tiring as you've pointed out. It feels a bit like a patch to the language rather than a feature even with the newest 3.9 version.
Is that because the Python interpreter has to explicitly handle coroutines differently from normal functions?
One difference for Elixir/Erlang (which GP mentioned) is that the BEAM VM can interrupt any function by default. (There's a few exceptions when you deal with native code and other things)
99% of Django apps are CRUD apps with zero need for this. It's easy to get sucked into new-hammer-ism where you have a new hammer and start seeing nails where there aren't any.
The 1% where this is needed does exist, but I suspect that there are far more people using the new async features than actually have need for them. And if you don't need them, you're introducing a lot of complexity, without mature tooling around it to reduce that complexity.
Probably 5 years from now there will be mature tooling around this stuff that lowers the complexity so that it is a good tradeoff for average websites. But for now, I don't need to be an early adopter.
Well, if you have external API calls in your Django app and you are running sync (which I would absolutely advise; when running async it is really easy to get unpredictable performance that is sometimes hard to track down), having the ability to run some views async is really crucial.
Otherwise your application might be humming along smoothly at one point and then come to a sudden, complete standstill, or performance plummets, when a random external API endpoint starts to time out. Yes, I have been bitten by this :-)
To fix this while running sync, I have dedicated separate application processes to the views that make external calls, but this makes the routing complex. Alternatively you can juggle timeouts on the external API calls, but this is hard to get right, and you need to constantly check whether calls are being timed out just because the external endpoint is a bit slower at some point.
So I think this solves a very real-world challenge.
> Continuously failing external requests should not make each one of your responses slow.
It is not really a matter of the responses becoming slow; the problem is that if you are running sync with, say, 6 application server processes and you get just 6 hits on an endpoint in your app that is hung up on an external API call, your application stops processing requests altogether.
Exactly this. I see the whole Django async stuff being far more relevant for applications with lots of traffic or high request rates, where you are already running on beefy infrastructure with a ton of workers and any small improvement in performance translates into huge real-world cost savings. Your standard blog, not so much.
Isn't that why gunicorn(+gevent) was implemented, doing the switching behind the scenes without waiting for that API call to finish? Is there a good reason I should manually "await" network calls from now on?
Yes; gevent does also fix this problem. But it also gives you a lot of new problems when running all requests async. In my experience mostly with views that (in some specific calls, i.e for a specific customer) keep the cpu tied up, for example serializing a lot of data. Random other requests will be stuck waiting and seem slow while it is a lot more difficult to find out which view is the actual problem.
I have deployed applications both under gevent and sync workers in gunicorn and would personally never use gevent again, especially in bigger projects. It makes the behavior of the application unpredictable.
It does have the same problem. Standard library CPU-heavy functions are not generally async-friendly. You'll be stuck blocking for that 500kb JSON file to serialize.
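The usual escape hatch (plain asyncio, nothing Django-specific) is to move the blocking work off the event loop; a rough sketch:

    # Rough sketch: keep the event loop free by pushing the heavy serialization
    # into an executor. A thread helps for blocking I/O; for genuinely CPU-bound
    # work a process pool avoids GIL contention (the object must be picklable).
    import asyncio
    import json
    from concurrent.futures import ProcessPoolExecutor

    _pool = ProcessPoolExecutor(max_workers=2)

    async def serialize(big_object):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(_pool, json.dumps, big_object)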
I've been looking at whether this would be appropriate for something like server-side mixpanel event tracking. Or for sending transactional emails or text notifications using a 3rd party service like mailgun or twilio.
From what I can tell it is not intended for that purpose, and outright will not work.
90% of the time I don't want it. Database, cache, etc, not really that bothered. Web requests take 100-300ms to complete, tying up a worker for 300ms isn't much of a problem.
10% of the time I'm calling an API that takes 3s and tying up a worker for 3s _might_ be a problem. Being able to not do that would be really handy sometimes.
Not web servers, but I also do a lot of web scraping and Python is definitely the best tool I've used for that job (lxml is fast with great XPath support, Python is very expressive for data manipulation), using async for that could dramatically improve our throughput as it's essentially network bound, and we don't really care about latency that's in the 10s of seconds.
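For that kind of network-bound scraping the concurrency can be as simple as something like this (aiohttp here purely as an example HTTP client; the URLs are placeholders):

    # Sketch of concurrent, network-bound scraping: fetch pages with aiohttp,
    # parse with lxml. The URLs are placeholders.
    import asyncio
    import aiohttp
    import lxml.html

    async def fetch_title(session, url):
        async with session.get(url) as resp:
            doc = lxml.html.fromstring(await resp.text())
            return doc.xpath("//title/text()")

    async def main(urls):
        async with aiohttp.ClientSession() as session:
            return await asyncio.gather(*(fetch_title(session, u) for u in urls))

    titles = asyncio.run(main(["https://example.com/a", "https://example.com/b"]))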
I'd love the site to be faster, but it's very hard to do this. For an API called while serving a user request, 100ms is slow, but for the frontend that a user hits directly, it's fairly typical.
As a point of comparison, Amazon's time to first byte for me is 270ms, with a 15ms round-trip time to their servers, so they're looking at about 255ms to serve a page.
To get significantly faster than this, a site must be engineered for speed from the ground up, and the productivity hit would be huge. We've got ~250k lines of Python, which would probably translate to ~750k lines of Go (which is fast, but not that fast), or probably >1m lines of C++. Engineers don't tend to produce that much more or less in terms of line count, so this would likely take ~4-6x the time to create (very rough ballpark). Plus, with a codebase so much larger there's a greater need for tooling support, maintenance becomes harder, more engineers are needed, etc.
When speed is the winning factor, like it sometimes is for a SaaS API that does something important in a hot code path (e.g. Algolia) then this is all worth it. When you're a consumer product where reacting to consumer demand is the most important thing, the speed difference really isn't worth it.
So two examples off the top of my head where it’s the request latency and not python at fault
1. In an incident database, we allow full text search with filtering. Depending on the complexity of the query and the contents of the database, this can take 10ms or 10,000ms. This isn't something easily changed. It's Lucene's fault.
2. Querying the physical status of a remote site has variable latency because the sensors are on Wifi and it’s flakey. We can’t easily move the sensors, or make wifi coverage in some warehouse perfect.
Right now, we circuit break and route potentially slow requests to their own cluster via the router, but it’s a poor solution.
That depends on what that request is doing. It could be fetching a single record from the database and serializing it, or it could be running a complex analysis.
Five years later, there are some new frameworks, but much of the ecosystem is still sync-only. This is actually one of the things that is pushing me towards Go lately. Python just doesn't seem to mature fast enough, and tools heavily disagree on conventions.
If I look at the database queries on the vast majority of pages in a typical Django project, I see a big list of operations being executed sequentially that could actually be done in parallel.
Additionally, (this is my pet use case) if you implement a GraphQL server on top of Django (using one of the many libraries), you tend to get subpar performance because GraphQL lends itself really well to parallelised data fetching, which is hard to take advantage of at the moment.
Adding new SQL support to the Django ORM is tricky. There isn’t a nice low level abstraction for generating and composing SQL. It’s mostly a bunch of lists containing different kinds of data (strings, Col, Expression) and they don’t compose well.
On top of that, you’d need to come up with a decent frontend syntax that aligned with the existing methods.
I think Django made a mistake when first defining the language of aggregates by overloading the values() method and not using group(). To support rollup, values() would need to support it but only when used after an annotate. Not nice.
I often think about what it’d take to use SQLAlchemy for the SQL building beneath the Django frontend. That would open up so many more possibilities and features.
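For contrast, this is the kind of low-level composition SQLAlchemy Core gives you (a toy sketch; table and column names are made up):

    # Toy sketch of SQLAlchemy Core composition (1.4-style select): the
    # statement is an object you can keep extending before it's rendered to SQL.
    from sqlalchemy import Column, Date, Integer, MetaData, Table, func, select

    metadata = MetaData()
    events = Table(
        "events", metadata,
        Column("id", Integer, primary_key=True),
        Column("created", Date),
    )

    month = func.date_trunc("month", events.c.created).label("month")
    stmt = select(month, func.count(events.c.id))
    stmt = stmt.group_by(month).order_by(month)   # composed incrementally
    print(stmt)                                   # renders to SQL on compile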
Yea I started to bring up the existing ticket for grouping on the dev list and ask what people thought about it.
I also ended up writing a very tiny transformer function and using that directly, because core only has a couple of supported casts and I needed Postgres timestamp types so I could extract and roll up on the year/month/day. That gave me some insight into how the patterns in use in "lower level" Django differ from the expressiveness/composability of SQLAlchemy.
SQLAlchemy has a great architecture, but it comes at the expense of being tied to the use case of accessing a relational database.
Django models are more abstract and anemic in querying, but it means you can use the same API and write access layers for non-SQL databases. At work, we have an in-memory database and Elasticsearch, all queried in an identical way to the Postgres models.
I work in a C# shop that has added Python to our development, because it runs alongside the PowerShell scripts our operations guys build in the managed Azure services.
I think Django is just a good package. It’s really productive, and the ORM is better and easier to use than Entity Framework. The real winner for me is its stability though. In the time we’ve gone from classic ASP to WebForms to MVC to modern MVC to .NET Core, and soon the new .NET version, Django has remained relatively stable and relatively easy to upgrade.
And make no mistake, we absolutely still operate web-forms applications that will likely never get rewritten in something else.
At the end of the day, tech stacks aren’t really that important though.
Curiously, while Django has not changed much, Asp.net has evolved. The new endpoint routing system is very flexible. The DI system is good and templating is excellent.
Additionally, it has amalgamated into a hybrid framework, where adding a high-performance API to an existing MVC app has become trivial.
Not to mention, the excellent language and support for tech like gRPC. On the whole, Asp.net core looks poised to evolve and adapt to changing tech landscape.
I do agree that the stability of Django has made it extremely easy to get an MVP off the ground, especially for a seasoned developer.
I have used both and somehow, the strong typing in C# puts enough constraints on me to reason about my web app as a proper app.
In Django and flask, I would often settle into thinking everything in terms of pieces of data, moreso because of the dynamic nature of Python.
If the team/project is huge typing has benefits. If not, python type hints are generally good enough. In the end using MS tooling means their needs come first and so that is to be avoided.
The Django Admin is Django's "killer app." If a significant portion of your application is a back-office CRUD admin interface, Django is perhaps the most productive way to build something like that without writing a bunch of forms or using something even more out-of-the-box (like Drupal). If you become proficient with Django you can build custom CRUD admins very quickly.
For other use cases, there are better frameworks out there.
PHP on its own was more of a templating system plus arbitrary code when I used it. Django and ASP.NET have ORMs and templating systems.
Lots of people love ORMs, although I’ve found complex queries slow with a hefty object model system, to the point where I’d rather write parameterised SQL queries than work with an ORM optimization strategy.
The templating system is nice although these days I mostly use javascript talking to json endpoints, with nearly zero need for a templating system.
Honestly when your need is: “javascript talks to endpoint and endpoint talks to database” I don’t see a greater need than python, nodejs, golang or whatever language you prefer plus a couple of libraries. Most server frameworks add more stuff you probably don’t need, unless you can’t work without an integrated ORM.
PHP has Laravel which is very competitive with Django (and Rails), and IMO quite a bit ahead of ASP.NET in terms of what it provides (although ASP.NET is far ahead of both in terms of performance).
We actually had to make that choice 10 years ago. .NET was out because we didn’t want to deal with Windows Server, but that’s no longer an issue. Now I think I would skip ASP.NET because everything is confusing as all hell. You don’t know which .NET supports which feature, or even what you’re supposed to be running. It’s the fastest of the three, and Visual Studio is still an awesome IDE.
We dropped PHP due to the lack of any really good frameworks, but now we have Laravel. PHP is still a solid choice and it’s fast.
In the end we went with Django because we liked Python and Django is really well documented and easy to learn.
If you use a CMS to build something of any complexity, you will almost inevitably wind up writing some custom code for it. In that case, I think the most important thing, much more important than what language the developers chose, is how extensible and well-documented it actually is.
Then ASP.NET is probably the superior choice.
C# is a better language by far and the .NET runtime is a better runtime by far.
It only ever makes sense to use Python for data-science one-offs and munging if you have competency in Java/C#/etc. It's not a language well suited to application development.
I work mostly in Python and I like it. However, C# is a great, and for some reason underrated, language that is certainly better along a number of dimensions.
I didn't downvote, but IMO it's only half true: C# and the runtime really are excellent. But the C# library ecosystem is much smaller than the Java, JavaScript, PHP, and Python ecosystems (and even Go/Rust for a lot of things), and there are even quite a few commercial (paid for) libraries! This may not matter for a particular project, but for me it's significant enough to not make C# a universal recommendation.
Hopefully it changes now that C# is open source, but for now it's quite a big downside.
Try rewriting your comment more like an engineering discussion than an opinion piece: share that experience with specific details about things which worked better or worse, using enough detail for anyone else to be able to decide whether your environment is enough like their own to be applicable.
Since this is a Django thread, that could be covering things about, say, why you prefer .Net forms or trade offs from various ORMs.
I think C# is a quite respectable language so you could cover, say, why its typing is more productive than something like mypy or how package management compares.
C# is a powerful language that is highly expressive, has a great type system, excellent compiler and great IDE, excellent async support, fast HTTP stack in standard library, blessed application development framework and object relational mapper, etc. Basically everything you could ask for in an ecosystem for developing a long-lived web application.
Python is a scripting language which is great for one-offs and has great data-science libraries, but it lacks static typing and has a poor IDE experience, slow runtime performance, poor portability because of its reliance on FFI, a poor long-term application-maintenance track record, no blessed frameworks, and competing, incompatible async solutions.
Tbh I don't think there is much contest. Yes, you can write a bunch of Python quickly that will do something, but if you already know C#, which is much better suited to the task, it's clearly the better choice.
Maybe, but you should always ask "for what?" - that's the main point. Sometimes Java or even JavaScript/Node is the best stack. Many, many points play into a decision for a web stack; maybe he wants that "quickly", maybe he works in a C# shop, and so on...
Something I wish Django did was user-defined functions in the template. It has for loops, which is good, but it forces you to write the HTML in a top-to-bottom procedural manner.
It would be far better to be able to define a function that you can call for bits of html code that might repeat in the same template.
Since I stopped using Django at 1.6 does the new version let you define functions in the templates?
That isn't as nice as having a function inside the same file that you can call over, IMHO.
Also, let's say you define a block for rendering a button with a different color, then use 20 different buttons on one page.
Does that mean that the block code would need to be loaded from a file 20 times and parsed each time? That seems like a huge performance hit.
Is there a technical reason why you can't have a function in the same file?
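For what it's worth, the closest thing Django offers is a custom inclusion tag - still a separate template file, but with the cached template loader it's only parsed once, not per call. A rough sketch (names are illustrative):

    # templatetags/ui.py - a reusable template "function" as an inclusion tag (sketch).
    from django import template

    register = template.Library()

    @register.inclusion_tag("components/button.html")
    def button(label, color="primary"):
        # renders components/button.html with this context wherever
        # {% button "Save" color="green" %} appears (after {% load ui %})
        return {"label": label, "color": color}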
The Django ORM mainly hides the SQL from the developer (which is kind of the idea of any ORM). But if a developer does not understand the underlying SQL concepts, they will soon write code that is horrific performance-wise. But if you remove the ORM from that equation, I don't see how that same developer doesn't make the same mistakes. So in the end, I don't think that the Django ORM (at its core) can help much in this area.
There are of course tools that can help with that (django-debugger and others).
This is a hot-take from Aaron Patterson, Rails and Ruby core team dev, in his keynote this year where he addressed a very similar idea on a perhaps related query generation topic...
To give you the tl;dr (he goes through profiling and a great deal of data to help show what a core dev needs to do in order to help us solve this one specific case, and...)
Aaron comes to the conclusion that there is a "performance pro-tip" which many Rails devs have learned, similar to what y'all are discussing in this thread for Django, and in this one specific case he outlines in some detail, Aaron considers that there is a bug where the programmer needs to know this "pro-tip."
There was really no reasonable way for the Rails engine to interpret this instruction from the programmer as meaning "do it the slow way" when there's a better way to do the same thing with no drawbacks, and it would have been possible for the engine to safely do the smarter thing (IIRC, to precompile some better-shaped SQL that brings back the same result set in fewer queries, reclaiming most of the runtime by spending less time in the SQL compiler overall in the worst-performing pathological case - all of this is shown with data). And having this behaviour built in is actually much better on a novice programmer's behalf, because the better performance happens without requiring a developer to manually build such hints into the application code, at abstraction layers where they really shouldn't have to be thinking about those things anyway.
While a good point, I would say the problem is caused by ORMs existing. Somehow somewhere we decided that a person who thinks SQL is too difficult should be using a database.
An ORM saves experienced SQL users from having to write boilerplate SQL and hack on their own garbage ORM, which is what any large project ends up doing, attempting to compose queries and filters in vain.
I have never ever heard of ORMs as an argument to avoid learning SQL, and AFAIK no author of well-known ORMs holds that opinion.
Yeah, my SQL is admittedly rusty due to not hand-writing many queries these days, but my motivation for using an ORM isn't "I can't write SQL", it's that I don't want to be engaging in string building when there are better abstractions available for supporting this kind of operational composition.
The problem with ORMs, as I see it, is that their abstraction is typically too high level and rarely offers you an intermediate layer to let you work around the leaks.
So you often have:
ORM -> SQL
Which is implemented as:
ORM -> private query-builder API -> SQL
When what I want is:
ORM -> public query-builder API -> public SQL-like abstraction -> SQL
Anecdata: in every single company I worked, tiny startups and giant corporations alike, “not everyone knows SQL well” was exactly the argument used to justify the ORM. Every single time, the people who do not know SQL, unsurprisingly, proceeded to write pathological N+1 queries.
I don’t mind much - I get to look like a hero to my manager making their queries orders of magnitude faster. But if someone can’t be bothered to write SQL, they for sure will not bother looking up ways to hint the ORM.
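For reference, the classic shape of the problem and the hint (Django syntax, assuming a Book model with an author foreign key):

    # The classic N+1 shape, and the one-line ORM hint that fixes it.

    # N+1: one query for the books, then one more query per book for its author
    for book in Book.objects.all():
        print(book.author.name)

    # hinted: a single JOIN brings the authors along
    for book in Book.objects.select_related("author"):
        print(book.author.name)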
Literally every pro-ORM dev that I've ever worked with was so incredibly weak with basic SQL fundamentals that I must conclude that they preferred ORMs simply due to a reluctance to learn SQL.
There is a strong argument for conceptual compression: the idea that every component has an optimum abstraction layer to understand it at and, with that, less-optimum layers for understanding it. An ORM that does its job well allows you to take the complexity and pack it away for another day. We can unpack it later when we need to understand those details, but most of the time we'd rather be focused on the details of the domain-specific problem we're trying to solve - intricacies specific to the domain that ideally shouldn't become compounded with problems introduced separately by, for example, a persistence layer such as a database, since that makes things harder for the domain experts by adding another layer of complexity they must grok in order to build or spec an appropriate solution.
I would be very hesitant to turn an ORM into a "smart" SQL generator. Depending on the type/distribution of data, there are sometimes very different paths to optimizing a query. The ORM as a "stupid" SQL generator (straightforward mapper) is a great way to allow for the flexibility to control how a query is generated.
The DB engine might be the place for those sort of improvements.
In this case the DB engine was not the place for those improvements, as the majority of the extra time was being spent in the "Compile" method, which merely composes SQL. If you're worried about turning the ORM into a smart SQL generator, then in Rails and ActiveRecord I'd have to say that ship has already sailed. The ORM is very smart and usually generates pretty good SQL for you. It's never a bad idea to review what it generated and double-check based on your own DB knowledge, which might always be better informed than a robot's, but...
Early in the profiling part of the talk, Aaron shows how he nearly got hoodwinked into thinking that differences in the query itself were what was causing the slowdown. While in the end two different queries are still generated by the two different versions of the ORM code, the bulk of the performance is reclaimed without any impact from the query difference; at least a ~30% performance discrepancy comes solely from the object side of the equation, in Ruby.
(Is it a string or integer parameter? That might make a difference in the query performance... is it one bound array parameter, or one parameter binding per array element? That could only make a difference on the Ruby side, as bound SQL params are always mapped into a query as individual values, at least in this example. These factors are all in play.)
It's a long talk but it's really interesting, (I set a time index in the link to get you past the most frivolous and off-topic parts, which you usually find in a tenderlove talk... that part is not for everyone) I think this talk probably has something for everyone, even if you're not a Rails dev.
I'm closer to a hobbyist than a professional dev, but the async views seem like a big piece of functionality. Having done some Django apps, getting a synced-up view for some changing variable was always a bit painful.
Perhaps django-reactor would be of interest to you? [1] It's targeting LiveView-like functionality for Django. I haven't used it (so I can't vouch for the claim).
Async is cool for I/O-bound operations. You don't have to wait for the request/response to finish in order to start processing another request.
Talking to a DB or doing HTTP requests are IO operations, so instead of blocking the process, Django can now start processing another request. When the IO operation is done, it continues where it left off.
In the Python world you have a few Django processes spawned by the WSGI/ASGI app container, and those processes spawn Python threads to handle requests. Because Python uses the GIL, no more than a single thread is executed at a time per process. At specific times the GIL stops running bytecode for one thread and moves to the next one (e.g. when you do IO). This means the other threads keep running, but in theory you can run out of threads.
In async you use tasks instead of threads for HTTP requests, so a single process can handle multiple requests at the same time, only spawning threads in a limited number of situations.
Yes, you can add more processes (or threads), but more processes (or threads) blocked by IO means more context switches, which means a drop in performance. Not a big deal if your app is distributed and you scale horizontally, but most apps don't need scaling that way, and async fits the bill without adding unneeded resources.
If your workload is IO bound (and most of web development is) async improves performance in any case.
It would be useful if your code has to execute async code and await its result from within a view. For example: create 10 tasks and return await asyncio.gather(*tasks) as JSON.
I don't know much about Python's async tools, but why couldn't that be done in a non-async view?
In pseudo code I would think it would look like this:
    def my_view(request):
        name = async_get_name()  # returns a promise
        city = async_get_city()  # returns a promise
        waitUntilResolved(name, city)
        return HttpResponse('Hello ' + name + ' from ' + city)
It sounds like your waitUntilResolved() function is effectively asyncio.gather(), which due to the nature of Python’s async implementation is not something that could be used in a Django view prior to this release.
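With 3.1's async views, that pseudocode maps onto something roughly like this (async_get_name / async_get_city standing in for whatever coroutines you actually have):

    # What the pseudocode above becomes as a Django 3.1 async view;
    # async_get_name / async_get_city are stand-ins for real coroutines.
    import asyncio
    from django.http import HttpResponse

    async def my_view(request):
        name, city = await asyncio.gather(async_get_name(), async_get_city())
        return HttpResponse(f"Hello {name} from {city}")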
To run async code, you use 'await' in python. That right there tells you that you have to wait for it to finish. You want to receive the result. The view will wait for a result before returning the http response.
Celery and all task queues are fire-and-forget. You tell it to do something for which you don't want the result, so you can continue executing your code right away. The task queue can even take an hour to process, for example. Or you can schedule it to run later.