If I make the change in code, I can roll it out, test it in prod, and then slowly migrate traffic over in very standard and well defined ways.
If I deploy to the database and something goes wrong, I need to really trust my rollback scripts. If the rollback scripts go haywire, you're in a tough spot. If that happens to code, you can literally just move all the traffic to the same thing that was working before, you don't quite have that luxury with the database.
You can have a bunch of servers, but you really only have one database. This means you should be very careful about changes to the database because it's a single point of failure. I don't like modifying or changing single points of failure without very good and well tested reasons.
Could you version your procs and have the new version of the code call the new procs? Sure, but now you have to manage deployment of both a service and the database, and have to handle rollover and/or A/B for both. If my logic is in the service, I only have to worry about rolling back the service.
Database logic saves you a little bit of pain in development at the cost of a ton of resilience and maintenance overhead, and it's not worth it in the long run IMO. Maybe it's because the tooling isn't that mature, but until I can A/B test an atomic database, this doesn't work for a lot of applications.
Not sure I follow in the context of the post. The post is about a database that can be wiped and effectively recreated from scratch. There wouldn't really be a need for rollback scripts when you can just rerun the entire workflow from scratch. Even easier if you have a hot/warm/cold setup.
In this sort of setup, changing a db function isn't unlike making a change to db-interfacing code in that it's not changing the underlying data, just the interface to it.
I've built similar systems that instead of making changes to db functions, changes are made to materialized views. The underlying data doesn't change, but its presentation does. No huge need for rollbacks. It works really well!
No that's a fair critique, if your application is dealing with a large amount of pretty darn static data and "refreshing" the database is as simple as switching to a backup, you could make the argument it's a lot more attractive.
I just think that most applications need stuff to be ACID and deal with a lot of writes (or may want to deal with some of those things later), which makes this a bad idea.
If changes are made to views, you have the problem of how to roll out/back those changes. You could obviously make that configuration on the server side, but then you still need to roll out both the service's configuration changes and the database, whereas if you put all the logic in the service, you only need to handle that for the service.
I don't mind abstracting the views into another mental layer, but it's a harder sell to convince me that reduces a ton of risk, because doing operations on the database (even if you have a perfect permission setup) is risky: you can, by definition, impact the machine that's serving live customers.
It works well until you've got an issue, and then you're completely fucked.
I think the sense of the comment was "how can I deploy or test anything once it goes in production". You know, where starting again from scratch is never an option.
Using the database as a service layer is dumb for non-trivial scenarios where you need guarantees that your service keeps running, or when you want to test new code with the production data.
It's interesting, and probably useful, to have a set of functions within the database, but there are real problems once your database stops being a dumb store.
How would you be completely fucked when the rollback is to just rerun the workflow? Worst case scenario here is downtime, which is a far cry from "completely fucked". It very much sounds like the databases that you work with and the databases that the article writer works with are used for two very, very different purposes. The author made no mention of SLO-type requirements, so I think it's fair to say that SLO isn't a huge priority. But that can be mitigated, like I said, with a hot/warm/cold standby.
Do you think it's a bad idea to use something like PostGIS because it uses functions that live on the database?
For years I've wanted to try the idea in the article, particularly for a data intensive but not high scale project. It's never been worth the hassle though.
Still, thinking about it has led me to see how we are not really leveraging the capabilities of feature-rich RDBMSes like Postgres. A judicious use of user defined functions can help, particularly for polyglot companies/microservices where the database can ensure certain routines are standardized across all apps.
Presumably you can revert to a previous tag/label in source control and reload the previous procs/funcs to roll back versions.
The apps I have been working on are monoliths which run at about 2% CPU on the server, so there are no A/B comparison runs in production (e.g. on January 1, some law goes into effect and you WILL implement it). I would happily see ORM usage curb stomped.
> If I make the change in code, I can roll it out, test it in prod, and then slowly migrate traffic over in very standard and well defined ways.
Advanced methods (like blue/green deploys) are invented by people with high-traffic requirements and slowly filter down to everyone else until they are normal practice. 15 years ago server configurations were ordinarily painstakingly handcrafted and somewhat unique.
We can reimagine those techniques around a database without too much difficulty.
> You can have a bunch of servers, but you really only have one database. This means you should be very careful about changes to the database because it's a single point of failure. I don't like modifying or changing single points of failure without very good and well tested reasons.
This is a strong argument; I agree that very good reasons are required. I have seen scenarios where the reasons are IMO quite good, but agree this isn't a widely-applicable technique.
> Could you version your procs and have the new version of the code call the new procs? Sure, but now you have to manage deployment of both a service and the database, and have to handle rollover and/or A/B for both. If my logic is in the service, I only have to worry about rolling back the service.
It has become common to have (and version/rollback/etc) each of web-frontend, web-backend code, web-backend engine, database schema, database engine.
Cutting that back to web-frontend, database http plugin, database schema & database engine is IMO a plausible win some of the time.
I've been noodling around for a while with the concept of a precompiler for stored procedures which alters their names to include a hash of their code.
This would give you a safe way to have multiple versions sit alongside one another.
* The frontend build process would pull in the hashes so it knew which version to call
* A cron-job could delete the old ones once they're (manually marked?) no longer in use - this would be the most hairy bit since it's got to be 100% sure it's not in use anymore.
Rolling back then constitutes putting the old version of the frontend live so it'll refer to the previous stored procedures.
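To make it concrete, the output of such a precompiler might look something like this (Postgres syntax; the function names, hashes, and table are all made up for illustration):

    -- Build N: the hash of the procedure's source is appended to its name,
    -- so two versions can safely coexist in the same database.
    CREATE FUNCTION get_order_total_9f2c1a(p_order_id int) RETURNS numeric
    LANGUAGE sql AS $$
        SELECT sum(quantity * unit_price)
        FROM order_lines WHERE order_id = p_order_id;
    $$;

    -- Build N+1 compiles to a different name; the old one stays callable
    -- until the cleanup job is sure no deployed frontend still references it.
    CREATE FUNCTION get_order_total_4b7d0e(p_order_id int) RETURNS numeric
    LANGUAGE sql AS $$
        SELECT sum(quantity * unit_price * (1 - discount))
        FROM order_lines WHERE order_id = p_order_id;
    $$;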
I’m struggling with a lot of things in this. Why couldn’t you test your stored procedures? Assuming you use an ORM, do you test its logic? Because if you don’t, then what’s the difference?
Why wouldn’t you have more than one database? What if you’re global?
Why would a database ever be a single point of failure?
Why wouldn’t you be capable of rolling it back, or test in anyway that’s different from what you do with your other code? Is it perhaps because you don’t know the tools to do so?
Why would database logic save you pain in development? Typically utilising stored procedures would make development less straightforward because you split your logic up, but you do it because database servers are a lot better and a lot more efficient at giving you the data you need.
Don’t get me wrong. I only use stored procedures, or even any SQL tooling, when they are the only way to get efficient or persistent data, but it’s not because of any of the reasons you list. It’s because 95% of the developers we hire don’t really know how to use databases.
A good example is an issue one of our juniors had recently. We have a lot of data organised in what is essentially a tree structure where the nodes only know their parents. He needed to pick any node and get its children + related rows from a range of tables in three databases. The task itself was simple enough, except the tree has millions and millions of nodes and even more millions of related rows, and he needed the data fast. He had hacked around, trying different stuff like loading it gradually, caching it between uses and so on, but he couldn’t get it fast enough. We made a recursive stored procedure that was capable of giving him the data almost instantly, because that’s what database servers do. In a couple of years, when this needs tweaking, however, it’s very likely that someone like me will be needed to look at it, because recursive SQL isn’t something a lot of our developers do, so in any other case, doing it in C# would’ve been better.
> Assuming you use an ORM, do you test its logic? Because if you don’t, then what’s the difference?
You don't test the ORM, you test your models. If you're writing db procedures you'd test the procedures, not the db's procedure engine.
But yeah, you should be doing your testing in a test database already. The only problem I see is that you might have a tighter coupling between db schema version and software version with procedures (although then again, you could avoid that if you're careful and make it less coupled)
Recursion in SQL is one of the questions I always ask Data Engineers during interviews. The answer in MS T-SQL is to use a recursive CTE. It is not a common problem, but it comes up once in a while. Using a CTE is a better solution than calling an SP recursively because you can only nest 32 calls deep. In the MS Azure SQL Data Warehouse product you are simply fucked if you need to do recursion.
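For example, fetching a whole subtree from a parent-pointer table looks roughly like this (T-SQL; the nodes table and starting id are made up):

    -- All descendants of node 42 in a hypothetical nodes(id, parent_id) table.
    WITH subtree AS (
        SELECT id, parent_id
        FROM nodes
        WHERE id = 42                            -- anchor member: the starting node
        UNION ALL
        SELECT n.id, n.parent_id
        FROM nodes n
        JOIN subtree s ON n.parent_id = s.id     -- recursive member: one level deeper
    )
    SELECT * FROM subtree
    OPTION (MAXRECURSION 0);                     -- lift the default 100-level cap for deep trees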
> SP recursively because you can only nest 32 calls deep
T-SQL only gives you 100 calls deep. Also, having CTEs calling other CTEs can create a very slow query plan, which takes unrolling each CTE into either a table variable or temp table to resolve.
I think it depends on the developer's style of how they reason. I use recursion a lot when dealing with loose data, to clean it up. Recursion is one of my go-tos for deduplicating lists.
The way I've seen this done is you version the namespace of the stored procedures (and include the git commit). So at any given time you're definitely only calling the exact function you want.
There's a trend of versioned databases that could address these issues. I've seen at least Dolt [1] for relational:
"Dolt is a data format and a database. The Dolt database provides a command line interface and a MySQL compatible Server for reading and writing data."
... and TerminusDB [2] for graphs:
"TerminusDB is an open-source knowledge graph database that provides reliable, private & efficient revision control & collaboration."
I don't know if Dolt supports stored procedures, but in any case versioning a database would be an amazingly powerful and convenient thing, if it can be done without sacrificing too much DB performance.
Worth investigating and has the potential of changing the way we write apps.
There can be many databases. They can be running in parallel with the same SQL inputs. You can have provisions for this in your middleware, but there is also stuff for this in the db replication. Replaying SQL logs on test instances. It is possible to run quite sophisticated testing in this sort of setup.
Relying on ORM doesn't make a difference, since you are still sending DDL commands to a live database.
Using stored procedures, views, and triggers makes it easier to add abstraction between your middleware and the database structure.
But on the whole, I agree it's probably not recommendable to anyone feeling hesitant about it.
I fully agree and treat my databases the same way. However, there’s no reason a stored procedure couldn’t be treated the same way with a random weight determining which version of a stored procedure is run.
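Something like this would do it in Postgres (a rough sketch only; get_price_v1/get_price_v2 and the 10% weight are hypothetical):

    -- Wrapper that routes ~10% of calls to the new version of the logic.
    CREATE OR REPLACE FUNCTION get_price(p_item_id int) RETURNS numeric
    LANGUAGE plpgsql AS $$
    BEGIN
        IF random() < 0.10 THEN
            RETURN get_price_v2(p_item_id);   -- canary version
        END IF;
        RETURN get_price_v1(p_item_id);       -- stable version
    END;
    $$;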
Put your functions in a schema e.g. v1, v2 etc.
You get same thing you have with code.
To test on prod data you take a snapshot and start an instance from a snapshot.
As a bonus, your whole deployment is now just running a single SQL script, no fancy tools needed. You scale out by adding sync read replicas and segregating mutations to the master and reads to the replicas. You can autoscale by taking a snapshot and spinning up a replica, and spinning it down when no longer needed.
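Roughly like this (made-up names; the data lives in its own schema and never moves):

    -- Each code release gets its own schema; data stays in the "data" schema.
    CREATE SCHEMA v2;

    CREATE FUNCTION v2.active_customers() RETURNS SETOF data.customers
    LANGUAGE sql STABLE
    AS $$ SELECT * FROM data.customers WHERE active $$;

    -- New app servers call v2.*, old ones keep calling v1.* until upgraded.
    -- Rollback is pointing clients back at v1; cleanup later is:
    -- DROP SCHEMA v1 CASCADE;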
I have taken various runs at stored procedure versioning and agree with you. The database is the ultimate sacred resource. Change things at your peril and test like crazy.
I have worked on an application that heavily used sprocs for business logic. I would not wish it on anyone. It will seem fine (and actually it will seem better - faster and easier to manage) while your project is small.
In a few years when you need to make some major changes you will see why it was a bad idea, and it will be too late. Have fun modifying hundreds or thousands of sprocs when you need to make a large-scale change to the structure of your data, because SQL won't compose. Have fun modifying dozens of sprocs for each change in business logic, because SQL won't compose. I guarantee you will have mountains of duplicated code because SQL won't compose.
21 years ago, I joined a young team that wrote an ERP, with all business logic in PL/SQL. Customers are still amazed how fast these guys are going. 10 years ago I joined Oracle to work on APEX (created in 1999), which is a web app development and runtime environment, written in PL/SQL. We estimate that there are about 260K developers world-wide who are developing apps with APEX. All of their business logic is also on the server. Call me biased because I am working for the man, but my data points are completely different.
I worked with two other ERP class systems that did this, one using Oracle (with an Oracle Forms/Reports front-end), the other PostgreSQL.
I can confirm your experience. There's no doubt that stored procedure/function based business logic requires a certain discipline, knowledge set, and the ability to get a team in marching in the same direction. But if you can achieve the organizational discipline to make it work there are definite advantages, especially in the ERP space.
And that may well be differentiator... those of us working with more traditional business systems have trade-offs that you won't find in a startup. For example, a COTS ERP system is more likely to not really be a single "application", but a bunch of applications sharing data. This means a lot of application servers, integrations, etc., not all from the people that made the ERP, needing access to the ERP data. The easiest integration is often at the database, especially since you can have technology from different decades (and made according to the fashions of their time) needing to access that one common denominator. Since many ERP systems are built on traditional RDBMSs, and talking to those doesn't change much over long periods of time... having the logic be there to make these best of breed systems work sensibly with the ERP can be very helpful. All that said, in this context I'm an advocate of the approach.
Now, take a team that maybe has a high churn, sees the database as some dark and frightening mystical power that can only safely be approached with an ORM, or a team that thinks their "cowboy coding" is their core strength (if perhaps not that thought directly) and database-based business logic certainly has many foot-guns. But, there are really few technologies that don't suffer without a good well-rounded, consistent approach.
It's posts like this that make HN great (again). Thank you.
I want to add that your talent pool will also influence where to create a hardened boundary: inside the database server (if you have lots of SQL people), or inside the application server (if you have lots of C#/Java/Python people).
Back in 2009 I helped to put together a small app in APEX, which then was quite new, in the two days, or day and a half, before the lead developer left for an overseas vacation. I had never before used APEX; she had been introduced to it at an Oracle tech day, but that was the extent of either's experience with it. The application is long gone, but got a lot of use over several months.
PL/SQL is an excellent tool for implementing business logic, and I really like APEX largely because I can call PL/SQL at any of several points.
I agree, Apex is awesome. Oracle Apex being written in stored procedures was actually a big inspiration for a low code tool project I did myself, which while not being written in stored procedures took a lot of other ideas from Apex (I was an Oracle Apex developer for 4 years)
Oracle used to push the idea that there is no need for an application layer, just use the database. Then they stopped being silly. I used APEX as a user, while at Oracle. It was a decent reporting tool but was only really used because the CRM had terrible latency.
If you write your views, tvf and stored procedures to use common elements defined once and once only (this starts with once and once only tables), maintenance is easy peasy. If you let every developer create their own view for the same business logic, then you are creating an unmanageable creature.
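As a trivial illustration of "defined once and once only" (hypothetical names):

    -- The business rule is stated in exactly one place...
    CREATE VIEW billable_orders AS
    SELECT *
    FROM orders
    WHERE status = 'shipped' AND cancelled_at IS NULL;

    -- ...and everything downstream builds on it instead of restating it.
    CREATE VIEW billable_revenue_by_month AS
    SELECT date_trunc('month', shipped_at) AS month, sum(total) AS revenue
    FROM billable_orders
    GROUP BY 1;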
This also applies to "composable" code written in the application layer outside the database.
The issue is when central ideas about how your application works change. Suppose you’re a US bank and after a few decades you want to open a branch in Europe and suddenly every dollar value needs to be associated with a specific currency.
At its core that’s a very basic change, but the issue with this stuff is how you’re adapting to such changes without introducing massive technical debt. Databases that map elegantly to your data are easy to work with. However, it’s always tempting to use the lazy solution, which is why systems relying on stored procedures tend to age poorly. In effect they tend to corrupt the purity of your schema.
I don't see how that has anything to do with stored procedures. For such large changes, your schema and application logic both need to be changed anyway. So if you're already changing the schema, why is it easier to change the application logic if it lives in a separate codebase?
If anything, it seems like it would go the other way around: With stored procedures, you're guaranteed that any application logic is at least consistent with the current database schema, whereas with separate services, you need to maintain that consistency yourself. Not that I'm advocating for stored procedures in general because they come with a mountain of other drawbacks, I just don't buy this particular argument.
> why is it easier to change the application logic if it lives in a separate codebase?
Because stored procedures don’t include the ability to do things like use inheritance to easily add extra layers of abstraction and let the compiler detect when you’re messing up. Adding a column is easy enough, but enforcing that it means something and every one of your prior calculations are now meaningless without taking it into account is difficult. Making such changes elegant and enforcing them to avoid future bugs is even harder.
Remember this example is at its core a minor difference. If you’re changing something like amounts being measured by volume instead of weight, then basic assumptions no longer hold, making things much worse.
> Adding a column is easy enough, but enforcing that it means something and every one of your prior calculations are now meaningless without taking it into account is difficult.
But again, I don't see how this is any easier to solve in application code than it is at the DB level. I don't know of any programming language or framework that allows you to encode rules such as "this DB column must be used". And even if such a thing existed, it wouldn't prevent you from making mistakes in using that column.
On the other hand, using stored procedures at least prevents errors in the other direction: referencing columns in a way that no longer makes sense (because the type was changed, it no longer exists, etc).
“This DB column must be used” is kind of a complex topic here. OOP for example is about building internal APIs which in theory force users of those APIs to account for these kinds of changes. On the other hand those APIs are generally used by the people creating them so it’s more about assistance than forced compliance.
Stored Procedures have all kinds of other issues, but this is an obvious failure.
Sounds like the issue is a type system then, no? So stored procedures should be no worse than all of that Python, JS, Ruby, Elixir, Clojure, etc etc etc that’s running web backends all over, right?
In fact, stored procedures (in relational DBs at least) are statically typed, so they will error if you change the type of a referenced column to something that no longer makes sense. This gives you more safety than a dynamically typed application language can, at least out of the box.
The parent was arguing that stored procedures are worse than conventional programming languages because the compiler won't warn them about any errors. I responded by pointing out that lots of web application backends are written in dynamically typed languages such as Ruby where there is also no compiler to warn of type errors. I'm definitely not arguing that stored procedures should look more like Ruby.
I have seen large database heavy apps. They are the cleanest I've ever worked with. I've had no trouble composing SQL. I can't think of one example where this has been an issue. You need to rely on views, functions and tables to store your business rules and look up the rules.
The big down side I see is the learning curve. Writing business logic in the database is not easy and not easy to maintain. It can take half a day to grow a large function. Even longer to debug sometimes. But you learn to structure things well.
Say for example you need two queries, one for fetching a paged list of customers ordered by birth date, and one for fetching a paged list of customers who are male.
You will find you'll need two queries with a huge amount of duplicate logic - the SELECT clauses, even if they're pulling the same columns, will need to be duplicated (don't forget to keep them in sync when the table changes!), because there's no reasonable way to compose the query from a common SELECT clause. The paging logic will be duplicated for the same reason. So although the queries are similar, only sorted and filtered differently, you will need to duplicate logic if you want to avoid pulling the entire data set first and performing filtering and sorting in separate modules.
These problems become significantly worse when you're talking about inserting, updating or upserting data that involves validation and business rules. These rules are not only more difficult to implement in SQL, but they change more often than the structure of data (in most cases) so the duplication becomes a huge issue.
I haven't tried this on a system that uses Postgres functions so I could be way off base here, my experience was pure sprocs in MS SQL.
I'm trying to figure out what kind of code duplication that you'd be dealing with that you can get around by avoid stored procedures. Unless you're relying on an ORM to construct your queries for you, you're going to have SQL queries somewhere in your code. It's either going to be stored in application code or in the database but either way you're going to have the same problem.
MSSQL can certainly do all of that. panarky offered one way of doing it, but you can also write a single SQL Function that takes a parameter for filtering by gender (or not), and encapsulates the pagination logic. You will then be able to select from that function passing different parameters for filtering by gender and only adding a sort at the end.
Likewise the business rule validation could be encapsulated in triggers or in a stored procedure that would validate all changes, much like your C# code would.
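In Postgres terms (MSSQL is analogous), a single function covering both of the paged queries above could look something like this; the customers table and its columns are hypothetical:

    -- One definition of the column list and the paging; filter and sort vary by parameter.
    CREATE FUNCTION customers_page(
        p_gender   text DEFAULT NULL,          -- NULL means "don't filter by gender"
        p_order_by text DEFAULT 'birth_date',
        p_limit    int  DEFAULT 50,
        p_offset   int  DEFAULT 0)
    RETURNS SETOF customers
    LANGUAGE sql STABLE
    AS $$
        SELECT *
        FROM customers
        WHERE p_gender IS NULL OR gender = p_gender
        ORDER BY CASE WHEN p_order_by = 'birth_date' THEN birth_date END, id
        LIMIT p_limit OFFSET p_offset;
    $$;

    -- SELECT * FROM customers_page();                    -- paged, ordered by birth date
    -- SELECT * FROM customers_page(p_gender => 'male');  -- paged, males only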
Because of the lack of tooling there is tons of duplicate logic in SQL based systems; I see the same logic repeated across queries which already exists in some view, but no one knows about it so it ends up being repeated.
Lack of abstractions like modules and namespaces for logically grouping stored procedures makes it hard to understand where everything is.
Thanks, that is very interesting. I would imagine having all the data in one schema, and then a few other schemas containing views and functions grouped together logically.
I agree. To me this sounds more like bad programming (high coupling and low cohesion) and like it would be a problem even if a separate application acted as the backend.
I believe this in the abstract, but would be very interested in some concrete explanation for why sprocs compose worse than other programming languages. How would the OP's situation have improved if instead of sprocs he had a similarly complex webapp and had to deal with the schema changes there? I'm not challenging the OP; I don't have enough experience with sprocs to have an informed opinion, but this information is necessary to make sense of the OP's criticism.
The best argument against sprocs that I've heard is that you really don't want any code running on your database hosts that you don't absolutely need because it steals CPU cycles from those hosts and they don't scale horizontally as well as stateless web servers. This is a completely different argument than the OP's, however.
Procedures don't have arrays/structs/classes/objects. This means they can't share data except through parameters.
Polymorphism is possible, but not really encapsulation. One could argue that arrays/objects would cause consistency problems, and therefore don't belong in a data language.
Someone expecting python-like OO programming will go mad.
The performance argument depends on whether the database is limited by CPU or bandwidth and what the load looks like. It's shown in benchmarks that the round-trip between db/network/orm/app takes many orders of magnitude longer than the procedures themselves.
Databases are engineered for high reliability - data integrity above all else. That means they develop more slowly and are perpetually behind what you expect from a modern programming language.
If someone created a database with the intention of it being a good development framework, it would probably be more pleasurable to code against, but would you trust it with your data?
Relative to most RDBMS it evolves quickly, but from a language perspective it's barely different to the SQL I was writing at the start of my career. Meanwhile Go didn't exist then and does now, and has better tooling for any normal development workflow.
My impression of Postgres development is markedly different. It seems like each release since 8 or 9 has brought significant, exciting new features. I don’t pay close attention, but IIRC, a lot of investment has gone into JSON, window functions, and lots of miscellaneous but important things (e.g., upsert). Meanwhile Go’s language changes have all been minor and boring (not a bad thing!).
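A few of those features side by side, with made-up tables:

    -- Upsert (9.5+), assuming a unique constraint on page
    INSERT INTO page_views (page, views)
    VALUES ('/home', 1)
    ON CONFLICT (page) DO UPDATE SET views = page_views.views + 1;

    -- Window function: rank pages by traffic without collapsing rows
    SELECT page, views, rank() OVER (ORDER BY views DESC) FROM page_views;

    -- jsonb: pull a field out of a semi-structured column
    SELECT payload->>'user_agent' FROM events WHERE payload ? 'user_agent';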
> Databases are engineered for high reliability - data integrity above all else.
So should be most of your (micro)services. I have seen more instances of sloppy system design (non-transactional but sold as such) than coherent eventually consistent ones.
Surely that's situation dependent? Databases are built to handle anything up to and including financial and health data.
On the other hand, does it matter if my social network updoot microservice loses a few transactions? With code running outside of the database, you get to decide how careful you want to be.
Yes it does. Are you mentioning in the internal services documentation that they can lose data under unknown conditions? And on your product hunt page that "your content might be randomly lost"? If you do not then you are lying. Everybody using the service expects transactional consistency.
Note that what you described is not eventual consistency but rather "certain non-determinism", there is an abyss of difference.
I don't mean specifically with transactional consistency (I don't think I even mentioned eventual consistency), but just generally the level of engineering required to write a system as well tested as postgres inevitably will slow down its feature development. This means databases don't have the latest features typically enjoyed within application development environments.
However I believe you CAN tolerate some level of failure and defects in your app code, knowing the more battle hardened database will - for the most part - ensure your data is safe once committed. Yes, there will always probably be bugs and yes some of those bugs may cause data loss in extreme cases, but if you're saying you perform the same level of testing and validation on a product hunt style app as you would on a safety critical system, or as postgres do on their database, I find that extraordinary and very unrepresentative of most application development.
I'm not saying defects are good or tolerated when found, but from an economic perspective you have to weigh up the additional cost of testing and verification against the likely impact these unknown bugs could have. Obviously everyone expects any given service to work correctly - but when is that ever true outside of medical, automotive and aerospace which have notoriously slow development cycles?
Personally I'd pick rapid development over complete reliability in most cases.
Spreading your business logic across all of your code base is a bad idea in any language. Not doing it as a primary design pattern is language agnostic. Your argument has nothing to do with whether the code executes in the database, a JVM, or native to the OS.
Back in the mid-90s, this was how many of the high-end web apps were built, using Oracle. Despite its limitations, and Oracle license fees (but back then you were paying for Sun servers anyway), it worked surprisingly well and was a reasonable model for many database-driven web applications. Some of these systems were used for mission-critical global operations handling enormous amounts of money and complex business. It had a lot of advantages versus other web app stacks at the time (commonly a mountain of Perl).
Oracle was happy to sell database licenses but made zero effort to develop this way of using the database as a first-class capability. They were astonished that it even worked as well as it did, they did not expect people to take it that far.
For some types of applications, something along these lines is likely still a very ergonomic way of writing web apps, if the tooling was polished for this use case.
Not surprising. Databases are built to be really efficient, and with enough hardware, they can forgive a lot. Mediocre developers can produce a lot of "working" code if they stand on the shoulders of giants. (Not saying you or your team was mediocre, I'm just saying that I've seen really bad code run a lot of business processes. At the end of the day, it worked and the company made money.)
Just like I keep seeing plenty of mediocre teams doing for loops over GB of downloaded data and pegging Webserver CPUs for stuff that should have stayed on the DB.
Usually that is one of the things I end up fixing when asked for performance tips; the other is getting rid of ORMs and learning SQL.
I did a take-home exercise earlier this week. They provided access to a database and said << do thing >>. Implied was that I should << do thing >> with a program rather than a query.
One of my solutions used Golang. It pulled data, looped a lot, then put results back. It took approximately 4 minutes.
I included a second solution, using SQL. It selected into the results table with a little bit of CTE magic. It took approximately 600ms.
Interesting, and I've seen and implemented similar optimisations plenty of times. The right query in the right place at the right time can make all the difference.
Did you hear back from them with a positive response?
Oh don't get me wrong, there is a time and a place for everything. There are many things that should be in the database, and there are a lot of people that avoid it because they don't want to learn SQL. You can have really good reasons and elegant code in SQL. I'm merely saying that when someone immediately says, "that's dumb, you shouldn't do that," they possibly aren't looking at the whole picture, and it was the choice that got the program running which is important.
It should be noted too that the 90s was really a different world back then. Processing capabilities, and development tools were vastly different.
There was just so much stuff that you had to manually build you don't have to now. (There are other complexities that we get to worry about now.) One thing I noticed back then is that putting stuff in the database just made things easier. In some instances, it really did act as like an amplifier for getting work done. Need to sort data? The database has things in place to do that and really efficiently too. I remember working with people, and if they had to make things like a linked list or a binary tree, they were lost (Yes, these were CS people too. We can argue about their education, but yes, they did graduate with a CS degree). There was nothing really like Stack Overflow to ask for help. You were really on your own for a large portion of it. Database code really took care of a lot of that for you. (Truthfully, the companies I worked for used MSSQL Server back then, so your results might have been different.)
Another unusual usage for SQL databases (today!) is as a data type server. SQL databases have a very rich set of data types with rigorous specification and myriad operators, conversions, etc for all the types, more so than just about any other programming environment. If you need correct support for a complex data type operation in a development environment with weak or non-existent type support, you can use a SQL runtime to do that computation for you over a local client connection.
I've used PostgreSQL this way a few times to great effect. The connection is over local IPC and never touches storage, so the operation throughput is quite high.
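For example, all of these are evaluated entirely by the server without touching a table:

    SELECT date '2020-01-31' + interval '1 month';          -- calendar-aware arithmetic
    SELECT inet '192.168.1.7' << inet '192.168.1.0/24';     -- "address is inside this network"
    SELECT numrange(1.5, 3.0) && numrange(2.9, 4.0);        -- numeric range overlap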
Yeah nobody in the 1990s with a real CS degree didn't know what a linked list or binary tree was. They just didn't want to bother coding it and testing it, when the database was there and being used anyway.
Using a database as a backend comes up time and again and developers are sharply divided over this concept. I feel that developers who started their careers before the advent of NoSQL DBs/MEAN stack/etc had to go through the rigor of fine tuning their SQL queries and witness the powers (and quirks) of a performant SQL DB. These devs are usually a fan of using DB-as-a-backend. Personally, I am a big fan of PostgREST and I apply it whenever I find a use case that suits it. I find it quite pragmatic and it just shines when it plays to its strengths. Pragmatic developers should take a pragmatic look at such approaches and try to use the powerhouses that SQL DB engines are.
Shameless plug: I've written an article to showcase its philosophy and how easy it is to get started with PostgREST.
This is a wonderfully in depth tutorial, thank you for taking the time to write this up! I’ve debated using PostgREST for projects before and I think this will push me to use it on my next one.
"“Database should store data not logic!” — well, no. This is called data-base, not data-bag. Databases are specialized engines to deal with, you guessed it, data. And they are extremely efficient at it. Not to mention - majority of the job backends do, is dealing with data. This is good for prototyping. Unfortunately more often than not — prototypes lands on production with updated connection string and debug=False."
Or, as I like to frame the issue, there's a difference between a datastore and a database. Datastores/persistence engines store your data in a manner that lets you get it back later. Think of a key-value store, for example. A database, however, assists you with the management of your data, including helping to ensure correctness of the data, tracking changes, among other things. For most systems a lot of work is already managed in-database whether you like it or not.
To add to this, if you need your data to be consistent at all times, it probably makes sense to let the database take care of that instead of managing it at a higher level (and hoping you don't make mistakes). This is where keeping your data-related logic in-DB is really advantageous.
And even the most conservative use of databases usually relies on them enforcing particular application logic. E.g., foreign keys or non null constraints. Or, to be even more radical, maintaining structured data itself, with separate tables comprised of different columns with different data types.
Everything old is new again. This was a common way to write client-server software in the pre-web world, clients talking directly to a central rdbms.
I guess the one true thing in software dev is the cycle/pendulum keeps rotating/swinging. Often without people realizing it's swinging back instead of brand new!
I almost can't believe we are talking about this now. I mean that in the sense of, I have built applications every possible way - when you work in a big company and inherit one of everything, you have to do that often.
In my past, I built one application that was fully database driven, and at the time (2004-2005) it would have been difficult to pull off without stored procedures and table driven logic - fully maximizing the power of SQL - especially in the timeframe I had to do it (<3 months). I mean, I pushed the technology HARD (expert system for fraud detection that worked hand-in-hand with basic machine learning).
I will never forget how I was derided by people for that choice - even though, that system is still running today and working well, in the bowels of an acquirer. I mean, literally, I was derided to the point of getting imposter syndrome for feeling that I made a choice that others regarded as so limiting.
The truth is, I learned everything else about distributed applications and databases because of being derided in that way. Ultimately, I now know how to architect things many different ways and can choose when I feel it is appropriate. I also know not to let the negativity of others prevent success.
I am not sure what the moral of this story is. If you can build a system, and it serves its purpose well and for a long time, and it works and provides the needed value.. it really may not matter. But, you can choose sometimes, and if there is one truth it must be that there is not always only one way to do something. Try to pick the right tool for the job as best as you are able.
Don't think you're dumb just because other people don't like your idea. Keep your mind open, and be willing to learn, but if you can make it work, and prove it works, you are just as right as anyone else.
Really anything can be a good or a bad idea depending on the problem domain and how it's implemented. He doesn't really go into how he made it work too closely, so it's easy to say it was a success. The real interesting thing to understand would be, "Why was it a success versus other approaches?" What really made it a better choice vs other approaches? What was the team make up? He says that it made the code easy to understand. To whom? Interesting the team made it work, but not enough to say, "Sure let's try it without a lot more forethought."
Exactly! If you're writing a C compiler, it would probably be a bad idea to have that logic in a database, though it could be fun to try. On the other hand, if your application is summarizing a lot of data by different dimensions, it would probably be a bad idea to transfer all that data over the network only to summarize it in an application layer. Most applications have a little bit of both though and so will need both an application layer and a database layer.
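For the summarizing case, the difference is shipping back a few hundred summary rows instead of millions of raw ones (hypothetical table):

    SELECT region, product_category,
           sum(amount) AS revenue,
           count(*)    AS orders
    FROM sales
    GROUP BY region, product_category;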
I'm currently part of a team trying to UNDO this very concept. It sounded great in the beginning, but after years of headaches and wasted cash; we're building the backend in something else. I wasn't part of the original team. Nor was I part of the decision to migrate off of the system. I just know that for my employer, it was a bad decision those many years ago...
We're staying away from stored procs as far as possible. For us the main driver is that it would make it harder to run older versions of our program alongside newer versions.
We rely on this to be able to bring updates quickly to our customers when their needs change.
We also found the tooling lacking for our database server.
On the other hand, we don't use ORMs, instead mostly relying on handwritten select queries, with "dumb" insert/updates being handled by library and others by hand. So we normally don't pull more data over the wire than we need.
This takes a balance to get right. For some data yes drag it over and keep it on the 'client'. But for other data just join it out and let the SQL handle it. However, for some reason SQL seems to scare a lot of people. So they end up making lots of microservices/functions/methods that just drag the data together and mash it together in a language they understand. That comes at a cost of network bandwidth.
I found that if you are 'happy' with the style of lots of stored procs and data lifting on the server side you usually have a decent source delivery system in place. If you do not have that you will fight every step of the way changing the schema and procs as this monolithic glob that no one dares to touch.
What I've found to be the problem with this approach is that SQL does not lend itself very well to composability. Sure, you can make functions, views and stored procedures and compose them. But when you start actually composing queries out of these parts, it can lead to different execution plans with wildly different performance characteristics.
Also tooling around SQL, i.e. refactoring tools and debuggers, is not great - if even available at all.
sql is not code, like c#; you don't debug it, you inspect results, or inspect the execution plan/traces.
how would you debug HTML, for example? Open it in browser, right?
same for sql: run it and see if it works as you wanted
same goes for composability: you can compose SQL same way you can compose HTML, but you still need to understand the big picture
explicit control flow in SQL (cursors, for loops and stuff) is absolutely an antipattern. You need to think in terms of relational algebra and functional programming in order to write clean SQL
For those who looked at list item #9 ("no code versioning") and thought the author's makeshift solution to migrations seems awful, I highly recommend sqitch[0], an extremely simple but well thought out migrations tool that is platform- and database-agnostic. I'm using it on a side project with success, and haven't run into any major hurdles yet.
I'm using this in production for our core Postgres database and am quite happy with it. We've used it to "devops the database" in that all migrations are deployed via CI jobs triggered by git commits.
I do sometimes wonder if Liquibase or Flyway would be easier, but I love the encouragement to write _migration tests_, which do not always look like the unit tests you might write with pg_unit.
Something we struggle with is naming our migrations; the name you assign a migration is the only organizational control you have over it, and it becomes essentially impossible to introspect the current state of the database through the migrations alone.
The one problem this technique has is that data and software normally have two very different speed and correctness requirements. That means that data and software should evolve following different procedures, which heavily implies that you want different people working on them, or at least you want people to look at them at different times.
For that, you really want independent access controls and CI tooling.
Of course, you can't separate much if you are a 1 or 2 people team at early stages of some project. And it may help you move faster.
But:
> "Rebuilding the database is as simple as executing one line bash loop (I do it every few minutes)"
This betrays a very development-centric worldview where operations and maintenance don't even appear. You can never rebuild a database, and starting with the delusion that you can will harm you in the future.
For this reason you should always separate data and code and put them into separate schemas. In such a setup, deploying new code essentially means dropping and re-creating the code schema in a single transaction.
I work with one codebase similar to that (DBMS only), and that's exactly what I do there (it's not that clear-cut, some code validates data, some data is short-lived). There is more than one "code" schema, with different lifecycles too.
There are plenty of interesting problems with it. I would prefer to completely separate code from data if it wouldn't impact elsewhere. As it is, separating them would severely harm us, so it's kept this way. It brings a lot of productivity, but we have a very small team working on it, and completely separated procedures for them.
> You can never rebuild a database, and starting with the delusion that you can will harm you in the future.
That seems like a silly assumption. Who's to say you can't use databases that you can rebuild from scratch whenever you want? What about append-only logs built with CRDTs and the other ways?
"rebuilt from scratch" means "as if the production server just fell into a woodchipper and I'm taking a new one out of the crate now"
You cannot do that with databases because the data needs to come from somewhere.
I'm sure you could implement some kind of echo server in SQL somehow and yeah that is a "database" that could be rebuilt from scratch with no loss but that's obviously not the kind of thing we are talking about here.
The primary issue I have seen with stored procedures is how you update them. I would be curious how they manage that.
Generally when releasing new code you want to do a gradual release so that a bad release is mitigated. It would be possible by creating multiple functions during the migration and somehow dispatching between them in PostgREST but I would be interested to see what they do.
The other obvious concern is scaling, which was only briefly mentioned. In general the database is the hardest component of a stack to scale, and if you start using it to do more of the computation you are just adding more load. Not to mention that you may have trouble scaling CPU+RAM+Disk separately with them all being on a single machine.
Many scaling problems in databases are because the computation is not done in the database. Outside of exotic cases, good database engines are rarely computation-bound in a competently designed system. By pushing computation outside of the database, it tends to increase the consumption of database resources that actually are bottlenecks, typically effective bandwidth (both memory and I/O).
As a heuristic, moving computation to the data is almost always much more scalable and performant than moving the data to the computation. The root cause of most scalability issues is excessive and unnecessary data motion.
That's not been my experience at all. Quite the opposite.
I actually can't remember a single time in 15 years where I've ever seen that. Maybe it's happened, but I don't remember, but I can easily recount tons of poorly performing SQL queries though. Sub-selects, too many joins, missing indexes, dead-locks, missing foreign keys...
I'm not saying it wouldn't cause a problem, I'm saying I've never, ever seen production code where someone dumped tons of data out and then processed it.
I have seen people try and make extremely complex ORM calls that were fixed by hand-coded SQL, but that's a different problem.
That only works for SQL-optimized cases/queries; aggregation with indexed keys is one of them. However there are cases where the application level is more optimized, such as cases where a more complex data type is required, or array processing. Furthermore SQL logic cannot be easily encapsulated and reused, which leaves you with a lot of duplication without proper planning.
> The primary issue I have seen with stored procedures is how you update them. I would be curious how they manage that.
1. You put your stored procedures in git.
2. You write tests for your stored procedures and have them run as part of your CI.
3. You put your stored procedures in separate schema(s) and deploy them by dropping and recreating the schema(s). You never log into the server and change things by hand.
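A sketch of what step 3 can look like as a deploy script (Postgres, made-up names); because the DDL runs in one transaction, the swap is atomic, which is also what the "separate code schema" comments elsewhere in the thread rely on:

    BEGIN;

    DROP SCHEMA IF EXISTS api CASCADE;           -- code only; data lives in other schemas
    CREATE SCHEMA api;

    CREATE FUNCTION api.customer_orders(p_customer_id int) RETURNS SETOF orders
    LANGUAGE sql STABLE
    AS $$ SELECT * FROM orders WHERE customer_id = p_customer_id $$;
    -- ...the rest of the generated code objects...

    GRANT USAGE ON SCHEMA api TO web_user;
    GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA api TO web_user;

    COMMIT;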
Wouldn't that lose all the cached query plans every deploy? Could be wrong, bit rusty on that stuff now, been a while since I worked on something where we had to worry about that.
rebuilding query plan takes less than 50ms and is routinely done by the engine itself without you ever realizing it.
what you probably wanted to mention is table statistics - but they are cleared only if you truncate your table, but then again - once you populate your table - the engine will recalc statistics by itself.
overall RDBMS does a lot of stuff behind the scenes for you, and you should take advantage of it, and instead think about more important stuff - business logic, data modeling, schema evolution, etc
> Generally when releasing new code you want to do a gradual release so that a bad release is mitigated
Make each stored procedure not depend on any other procedure. If you change one procedure, make sure that every existing procedure can work with both the current and the future version. Once compatible, start upgrading procedures one by one. Add in monitoring and automatic checking of the values during deployment, and you have yourself a system that can safely roll out changes slowly.
I'm not advising anyone to do this, it sounds horribly inefficient to me, but it's probably possible with the right tooling.
I think read replicas scale easily, so if it’s views and such then there is no problem. If you have to compute stuff during writes, then it’s hard, but probably solved by normalizing the data.
Many commentators seem to be getting hung up on "stored procedures." That's understandable given the title of the original post, but it's also unfortunate because it mixes independent concerns: business-logic in the database, and using stored procedures. You can do the former without the latter, and in my experience it works better:
* custom data types and domains
* use schema and namespaces
* foreign key constraints
* check constraints
* default values
* views
* functions (not stored procedures)
* triggers
In my experience, you can get a clean, efficient, easily-understood data model and application with those ingredients, without ever having to touch a stored procedure, a looping construct, or a conditional statement.
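A condensed sketch of several of those ingredients working together (made-up schema):

    -- custom data type with validation baked in
    CREATE DOMAIN email AS text
        CHECK (VALUE ~ '^[^@]+@[^@]+$');

    CREATE TABLE users (
        id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        address    email NOT NULL UNIQUE,               -- the domain does the validating
        created_at timestamptz NOT NULL DEFAULT now()   -- default value
    );

    CREATE TABLE orders (
        id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        user_id  bigint NOT NULL REFERENCES users (id), -- foreign key constraint
        total    numeric NOT NULL CHECK (total >= 0)    -- check constraint
    );

    -- a view as the read interface
    CREATE VIEW user_order_totals AS
    SELECT u.address, coalesce(sum(o.total), 0) AS lifetime_total
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.address;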
Is it just me or is this literally the worst architecture idea that's popped up here? I've seen this done in a few large projects, including some extremely famous products, and it produced endless grief for the engineering team - and ultimately the organization. Among other things, it makes performance scaling much harder to achieve, makes software deployments really scary, and makes it almost impossible to migrate to a different database vendor.
It is true that it can be done. But I really don't think you want to.
I'm very curious about this pattern, but does the tooling exist to make this feasible across a team?
One of the big problems with SQL views is their lack of discoverability.
The lack of tooling also plagues SQL based systems, I would miss goto definition, find all references and sensible autocompletion.
I am curious why no one has built suitable tooling around all this, because it is a great idea for lots of scenarios
You can set up a view (accessed by postgrest) that returns an error when there are no (correct) filters in the query, so that takes care of the select-* problem (also you can have the same logic in the proxy (nginx) layer).
In my experience I actually got A LOT more from this approach using postgraphile/Hasura in front of my DB and later moved to dosco/super-graph in order to add custom functionality that was kind of a pain to do on the DB.
I really liked the mixed approach of have the DB do everything that it can and have a light backend that can handle http requests, api callbacks, image manipulation and whatever else.
Out of interest (as the PostGraphile maintainer) did you look into https://www.graphile.org/postgraphile/make-extend-schema-plu... for extending the PostGraphile schema to do whatever you need, or were you specifically looking to implement the extensions in Go?
I can't think of an example where CLR integration would be beneficial.
Probably integrating ML model inference written in ML.NET could be a use case, but we have SQL Server with ML Server and R/Python support now, so.
The problem with CLR is that you need to know and understand sql engine internals in order to write good C# code for CLR integration, otherwise your clr code will be blocking the sql engine
At a previous job I was at from 2007-2014, we were doing classified ads[1]; we had a lot of logic in stored procedures in our Postgres database written in pl/pgsql. It made sense at the time, but after a while these got really unwieldy and deploying things got tricky, since we needed to update both appserver code and database stored procs at the same time. There also wasn't a great way to update stored procs at the time - we built this into an RPM that just ran a postinstall script to do an atomic swap.
I wouldn't build this into the database were I to build it today.
At a previous job I was at, we put each release of the sprocs into a different schema. Upgraded app servers called sprocs in schema "r102" while the not-yet-upgraded app servers continued to call sprocs in schema "r101" until they were eventually upgraded.
I'm basically doing the same thing with Hasura, but instead of using their Events webhooks, listening to the WAL for changes with Elixir (I stole the listening code from Supabase realtime: https://github.com/supabase/realtime).
I've never written so little backend code for such high performance in so little time. If I were starting today, I'd probably use Supabase for the API as well (which is PostgREST under the hood).
Oracle really pushed this idea too, because they knew that it would mean more load in the database so you’d need a bigger box with more CPUs. Oracle charged per tiered CPU so their incentives were not aligned with the dev’s. IMHO this is a pretty damning reason to generally avoid using the db this way. It may be true that a db has lots of spare cycles to utilize but so do your hot spare servers, and I’d hope that you’re not trying to wring performance out of those without thinking long and hard as to what hot spare means.
Source: migrated code from oracle to mysql to avoid insane licensing fees once upon a time. Luckily we only had limited stored procedures at the time, most of which I could ignore.
Many large Banks (including Bank of New York, where I worked) and ERPs have stored procedures as their lifeblood, and have invested in and built huge tooling over it – namespacing, dependency tracing of who's using what, monitoring, etc. It's excellent at having a tightly integrated, closed system where the Database is the source for authorization, module ownership, etc.
But there were reasons why it was not successful:
* Severe vendor restrictions. e.g. For a long time, Oracle could not do distributed compile-time PL/SQL checks. If you changed a function definition, other procedures that depended on it would break. But if you change a procedure's definition, it wouldn't tell you anything until you deploy it at runtime and something somewhere will fail, or recompile every other procedure. I don't know if it's fixed yet.
* Testing, or the severe lack of tooling that we have taken for granted – code coverage, easy local machine unit tests, etc
* Lack of modules (availability of an ecosystem or community of modules to solve recurring problems), and the subsequent dependency management and versioning overhead that comes with it
* Debugging, i.e. putting breakpoints, remote debugging, logging, good exception handling, etc
* Change management, graceful rollouts, rollbacks, etc. There is no so-called "Router" for stored procedures, they're all tightly coupled with each other, so there is no easy way to rollout and rollback new versions quickly without recompiling each time
* Connection limits. There is a tradeoff on a database being good at all the above – and how many concurrent connections it can support. It was impossible to simply "pass through" entire web traffic over to the underlying connection, without investing in quite heavy tooling
Overall, if you already have put all your eggs and tooling into one giant, well designed, well governed database – then PSPs are a natural evolution forward.
Have a customer with a 25 y/o Oracle PL/SQL system that basically operates their business and they can barely iterate on it. Every year a new consultancy comes in with their flavour of microservices and tries to subsume some functionality and they choke on the monolith to end all monoliths.
My previous employer did this, but it was fraught with problems.
It can be done properly, but you need your entire development team to have a good understanding of declarative programming, and you need to have as much business logic embedded in the stored procedures as possible.
Instead, we had developers who continued to think imperatively. Whenever they were implementing a new feature, they would write several small stored procedures to obtain data, and then process it iteratively inside the application server.
The result was far more chatter between the application server and the database server than was actually necessary; loading a web page in the application would take seconds with a single concurrent user, and this was a product that was supposed to scale to tens of thousands of concurrent users!
I'm very interested in this, as I come from the web world yet find myself in a company that has a 30-year legacy of putting business logic into the database. It has served them really well, and they run quite lean in terms of hardware. But they're built on Microsoft tech ($$$), and moving things to a traditional database=datastore cloud architecture would require a substantial change. Does anyone have resources they can point me to on the wisdom or folly of keeping this logic-in-database approach? It seems like the tradeoffs in the different approaches are all about where the smarts are, and whether you can horizontally scale those smarts. Does that sound accurate?
> It seems like the tradeoffs in the different approaches are all about where the smarts are, and whether you can horizontally scale those smarts. Does that sound accurate?
The real question is whether you are going to ship your code to your data or your data to your code.
I actually had to complete a dBase project for my last year in high-school (it was math/comp-sci oriented, so it made sense), but man I did not like it at all :) - I was all about Turbo Pascal then!
Love it. I am working on a similar but different concept for a prototype I am working on. The project requires many document/NoSQL style micro services, and I had the idea to do this directly with the file system and JSON serialisation, with thread locking on appropriate read/write calls (some calls that are read only don’t require locks). It is extremely fast and performant, and given the way I’ve delineated the block storage that the file system runs on, my hope is it will also scale reasonably well.
I have also implemented a message queue using the same design/infrastructure.
Er ... SQL Server Management Studio is based on Visual Studio shell, and it does a fine job debugging stored procedures. I also code in C# and agree it also works well for that.
IME I've never needed one. You get a copy of prod and test against that. SPs are typically short; even when they're long, their functionality tends to be well defined. It's never been a problem for me.
You certainly can and do add logging, this can be useful.
I don't know what industry you are in or what kind of data your users entrust in you, but "just get a copy of prod" is not an idea that would EVER fly in any organisation I have worked in. Developers do not get access to production customers' financial records, medical data or PII, ever.
Hands up, it's a fair point. I've worked with data sets that are mainly large aggregates of public data so security isn't an issue. It's the code that's carefully protected.
One exception is having access to that data because I was working ON the prod server. Read that again. It was deeply uncomfortable to work like this. Very worrying every day.
In those cases you have a set of test data that you use for development and testing. It's not like you can attach a Visual Studio debugger to production either.
Nice work; PostgREST seems neat. However, the aim of this stack is simplicity, yet it is incompatible with serverless products, which I now vastly prefer for ease of use and lower effort – traits I would consider part of "simplicity".
But I'm sure there is a valid argument contrasting simplicity with a managed nature of a tech stack, not to mention a cost comparison.
> However, the aim of this stack is simplicity, yet it is incompatible with serverless products, which I now vastly prefer for ease of use and lower effort – traits
The issue with this statement is that you're comparing things that work at different layers. PostgREST is for someone to host the database without any backend; serverless as you describe it is just a SaaS service you use. If there were a SaaS that exposed PostgREST as a service, it'd be a similar experience.
So unless you're comparing running a serverless platform yourself with running PostgREST yourself (where it would be pretty obvious which one is easier), I feel like your argument misses the mark here.
I’m talking about the stack as a whole, which happens to be built on PostgREST.
The OP’s goal was to simplify their stack, which they did do. However, they cannot run this stack without managing a database, as the database and REST API are run in the same container. My point is that another way to lower a project’s required effort is to offload database management.
Sure, OP now only has two layers in the stack instead of three. But now they are locked into managing the database entirely, including updates, backups, and scaling.
Okay, I think this idea has legs. There's a proven market for Serverless offerings, so how would people react to a SaaS product that:
a) Did database management for you, a la Amazon RDS (perhaps later supporting BYO Cloud Database)
b) Provided a software programming layer that simplified coding, versioning, deployment and rollbacks of Stored Procedures
c) Perhaps provided a language environment that lets you program with an SDK in your language of choice, and then "lifted" that into stored procedures
My first thought when faced with this is, how do I do application monitoring, logging, exception handling, paging? Perhaps this is a solved problem, but I'm curious.
Managing business logic in applications is a nightmare to control. That's why things like constraints and unique keys exist database side. If I have triggers that manipulate the data and update another table, you'll know that's always going to happen regardless of what application is writing to it.
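For example (table and trigger names invented for illustration), the invariants below hold no matter which application writes the rows:

    CREATE TABLE accounts (
      id    bigserial PRIMARY KEY,
      email text NOT NULL UNIQUE          -- uniqueness enforced by the DB, not the apps
    );

    CREATE TABLE account_audit (
      account_id bigint NOT NULL,
      changed_at timestamptz NOT NULL DEFAULT now()
    );

    -- Every insert/update is audited regardless of which application did the write
    CREATE FUNCTION log_account_change() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
      INSERT INTO account_audit (account_id) VALUES (NEW.id);
      RETURN NEW;
    END;
    $$;

    CREATE TRIGGER account_change
      AFTER INSERT OR UPDATE ON accounts
      FOR EACH ROW EXECUTE FUNCTION log_account_change();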
The first web application I did, which went into production 22 years ago, was in its entirety based on stored procedures. This was the main idea of Oracle's web application services at that time.
But, you have a very big tradeoff - you will be stuck with that database. Hardly portable.
What about web applications without using javascript?
Could you send out Content-Type text/html and react on FORM-Posts?
I still develop web applications where javascript is only an addition. The webapp by itself is completely usable with javascript turned off (the way it should be!)
I would add that if you have SQL Server, and you want to use it other than in SQL Injection Mode, you more or less have to use sprocs. Anyway, you used to have to--I haven't checked in lately.
I personally would not make this choice for new projects. I only use technology that is appropriate for the domain I'm using it in. Although, I am hoping this trend encourages better tooling and more modernization of SQL.
As others stated, my concerns are primarily that SQL lacks a lot of general-purpose language features (modules, namespacing, composability, no easy way to debug, etc.) which seem to be ideal for writing applications.
I would be interested to see a project written with PL/Python + PL/SQL stored functions in Postgres. PL/Python functions would be the interface between API calls and the database, and SQL would do what it is currently designed to do: straight-forward data manipulation.
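A rough sketch of what that split could look like, assuming a hypothetical users table (plpython3u is the untrusted Python 3 language, so installing it needs superuser rights; without the jsonb transform extension the jsonb argument arrives as a plain JSON string):

    CREATE EXTENSION IF NOT EXISTS plpython3u;

    -- PL/Python as the interface layer: parse/validate input, shape the response
    CREATE FUNCTION api_create_user(payload jsonb) RETURNS jsonb
    LANGUAGE plpython3u AS $$
    import json
    data = json.loads(payload)
    if "email" not in data:
        return json.dumps({"error": "email is required"})
    # plain SQL does the actual data manipulation
    plan = plpy.prepare("INSERT INTO users (email) VALUES ($1) RETURNING id", ["text"])
    row = plpy.execute(plan, [data["email"]])[0]
    return json.dumps({"id": row["id"]})
    $$;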
Use as many "base" schemas (namespaces) as needed, and only one `api` schema which has only views & functions you want to expose.
I take this further by having an `api` schema per major version (e.g. `api_v2`). The downside is it becomes impossible to maintain more than 2 major versions at once, so you have to get everyone from v1->v2 before you can make much progress on v3, but the upside is that you should really try to avoid hard breaking changes in the DB very often.
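A sketch of that layout with invented names (web_anon follows the role-naming convention in the PostgREST docs; only api_v2 would be listed in the exposed-schemas setting):

    -- Internal tables live in a "base" schema the API never exposes
    CREATE SCHEMA app;
    CREATE TABLE app.orders (
      id          bigserial PRIMARY KEY,
      customer    text   NOT NULL,
      total_cents bigint NOT NULL
    );

    -- Only the api_v2 schema is exposed; views decide the public shape
    CREATE SCHEMA api_v2;
    CREATE VIEW api_v2.orders AS
      SELECT id, customer, total_cents / 100.0 AS total
      FROM app.orders;

    GRANT USAGE  ON SCHEMA api_v2 TO web_anon;
    GRANT SELECT ON api_v2.orders TO web_anon;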
This won't answer this question in full but perhaps some part of it.
Projects like PostgREST (and PostGraphile) use schema introspection to generate the API. When the schema changes it's automatically reflected in the API. Sure one has to keep the changes synced on the frontend but the premise is you don't want to decouple.
Well it’s come full circle for me. 40 years ago I started as a professional programmer. The big issue at the time was how to represent business logic, other than as stored procedures.
This is basically how Q/kdb+ applications work in finance. Data and application exists on the same server. Of course, the programming model is much richer than pure SQL.
Using a throwaway because I don't want anyone to guess the product/company.
I'm CTO at a company whose product makes heavy use of stored procedures for business logic. It's constantly the cause of all the biggest headaches. I inherited this architecture; I didn't design it myself. I believe the original rationale behind the design boils down to 'SQL is better at the type of set-based operations this product does a lot', i.e. if you've got an array of 100k values and you want to perform an operation against a second array of 100k operands, you can do that much faster in SQL with a join than you can by loading them all into memory/code, looping over them to do the operations, and saving them again. This is kind of true in the simple cases, but once it grows to require lots of logic and conditionals in those operations, and then chained operations, you start to lose the benefit.
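For the simple case that rationale describes, the whole thing really is one statement (tables invented for illustration):

    -- Pair up two ~100k-row sets and compute the operation in a single pass
    SELECT a.id, a.value * b.operand AS result
    FROM   values_a a
    JOIN   operands_b b USING (id);

versus pulling both sets into the app, looping, and writing 100k rows back over the wire.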
Upgrades and versioning are generally a bit awkward. We've got it fairly smooth now, but it still causes pain reasonably frequently when something in the upgrade process breaks. It was worse when I started, as many upgrades were a bodged-together folder of scripts with lots of 'if exists' type checks in them. Now at least we use a mostly automated diff-based upgrade process for the database. Some types of changes still require manual upgrade scripts though. The article's solution of a folder of numbered scripts doesn't really look viable if you need to manage upgrades from different versions to the current latest version. Re-creating customer databases doesn't really go down well when they tend to like to keep their data.
Debugging is awkward. There are tools, but none of them compare to code debuggers.
SQL isn't easily composable, so we have repeated SQL all over the place (or nearly repeated with small changes, which is kind of worse because you don't spot the differences). Finding a bug in one of these repeated blocks means spending the next few hours hunting down any similar SQL to check for the same bug.
Performance is unpredictable and all over the place. Stored procs that run fine in one release will suddenly start performing like a dog in the next release. We often never discover the actual 'cause', the thing that changed that made it slow down. We just end up finding a way to make it fast again by adding a new index, or changing the way a join is composed, or splitting something up. I've not yet got evidence, but I'm convinced that we've made performance improvements in one release only to reverse the exact changes several releases later, also as a 'performance improvement'. It feels like playing whack-a-mole. Because of query plan caching, parameter sniffing and other optimizations in the DB, we have had scenarios where the performance of feature x depends on whether you used feature y beforehand in the same session or not. We have some exact duplicates of stored procedures that are only there to ensure that two different code paths that use them don't share the same plan cache, because when they do we get problems. Performance characteristics are often very different between dev setup and customer setup. Performance characteristics are different for customers that choose to host on cloud servers compared to customers that host on physical hardware. I don't just mean cloud is slower, I mean it's different. Some things are faster on cloud servers, but it's never predictable what will be what. It makes testing for performance very hard.
The article's statement that 'a database spends 96% of its time logging and locking' is totally irrelevant. So what if that's what it spends 96% of its time on? It's still spending that time. And as soon as your database has multiple users, all those locks are going to start getting in the way of each other and causing delays or deadlocks.
It doesn't scale at all. Our DB servers are powerful and we can't realistically go much bigger (CPU & RAM wise), yet better performance is probably one of our customers' biggest requests.
Deadlocks are not uncommon, hard to defend against, hard to fix, and half the time introduce other deadlocks in other places.
Maybe it's good in some scenarios, but once you have a growing evolving product being built by a team, it's far far harder to manage if a large chunk of the logic is in SQL.
Performance basically. As SQL is a set based language you can often replace cursors (which are basically loops) with joins and that is orders of magnitude quicker for larger workloads.
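A small, hypothetical before/after of that rewrite:

    -- Cursor/loop version: one statement per row
    DO $$
    DECLARE r record;
    BEGIN
      FOR r IN SELECT id, price FROM products LOOP
        UPDATE order_lines SET unit_price = r.price WHERE product_id = r.id;
      END LOOP;
    END $$;

    -- Set-based version: one join does the same work
    UPDATE order_lines ol
    SET    unit_price = p.price
    FROM   products p
    WHERE  p.id = ol.product_id;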
Not necessarily. You can make it so that cross-replicated databases function independently and then sync the data together. This of course assumes that you don't have something like an ordering system where you absolutely need to make sure that you don't sell more than your inventory.
That makes sense, yeah. I admittedly haven’t studied the pieces of the stack, but maybe something like PostgREST allows routing certain requests to master vs any replica, where you could make the atomic transactions needed for inventory based updates.
This is all completely outside the realm of things I’ll ever get into haha. IMO so what if the current app/db server boundary is “slow” or “adds too many moving parts” I already spent my entire career learning it, there is value in that to me.