Hacker News new | past | comments | ask | show | jobs | submit login

Can you elaborate a bit more on duplicated code/composition issue? I would think that functions would ensure that there isn't much duplicated code.



Say for example you need two queries, one for fetching a paged list of customers ordered by birth date, and one for fetching a paged list of customers who are male.

You will find you'll need two queries with a huge amount of duplicate logic - the SELECT clauses, even if they're pulling the same columns, will need to be duplicated (don't forget to keep them in sync when the table changes!), because there's no reasonable way to compose the query from a common SELECT clause. The paging logic will be duplicated for the same reason. So although the queries are similar, only sorted and filtered differently, you will need to duplicate logic if you want to avoid pulling the entire data set first and performing filtering and sorting in separate modules.

These problems become significantly worse when you're talking about inserting, updating or upserting data that involves validation and business rules. These rules are not only more difficult to implement in SQL, but they change more often than the structure of data (in most cases) so the duplication becomes a huge issue.

I haven't tried this on a system that uses Postgres functions so I could be way off base here, my experience was pure sprocs in MS SQL.


Views help with this.

Write one view for selecting customers, with all the joins, sub-queries and case-when logic.

Then select from this view order by birth date, and select from the same view by sex.

Yes, the limit/offset paging clause will be duplicated, but all the joins, sub-queries and case-when stuff can be DRY'd in a single view.


I'm trying to figure out what kind of code duplication that you'd be dealing with that you can get around by avoid stored procedures. Unless you're relying on an ORM to construct your queries for you, you're going to have SQL queries somewhere in your code. It's either going to be stored in application code or in the database but either way you're going to have the same problem.


MSSQL can certainly do all of that. panarky offered one way of doing it, but you can also write a single SQL Function that takes a parameter for filtering by gender (or not), and encapsulates the pagination logic. You will then be able to select form that function passing different parameters for filtering by gender and only adding a sort at the end.

Likewise the business rule validation could be encapsulated in triggers or in a stored procedure that would validate all changes, much like your C# code would.


We do something very similar is mssql using defaults for the input vars and an OR in the WHERE clause.


Becuase of lack of tooling there is tons of duplicate logic in SQL based systems, I see the same logic repeated across queries which already exists in some view but no knows about it so it ends up being repeated. Lack of abstractions like modules and namespaces for logically grouping stored procedures makes it hard to understand where is what


Every database I’ve used has namespaces: most application code ignores SQL schemas, but that’s what they’re for.


Thanks that is very interesting, I would imagine having all the data in one schema, and then few other schemas containing views and functions grouped together logically.


but they're just one level, no?


Yeah, but conventional naming is almost as good: foo_bar_baz.users and foo_bar_other.users


I agree. To me this sounds more like bad programming (high coupling and low cohesion) and like it would be a problem even if a separate application acted as the backend.


Well some systems make good programming really difficult. Stored procedures, especially back then.... Yeah.


I believe this in the abstract, but would be very interested in some concrete explanation for why sprocs compose worse than other programming languages. How would the OP's situation have improved if instead of sprocs he had a similarly complex webapp and had to deal with the schema changes there? I'm not challenging the OP; I don't have enough experience with sprocs to have an informed opinion, but this information is necessary to make sense of the OP's criticism.

The best argument against sprocs that I've heard is that you really don't want any code running on your database hosts that you don't absolutely need because it steals CPU cycles from those hosts and they don't scale horizontally as well as stateless web servers. This is a completely different argument than the OP's, however.


Procedures don't have arrays/structs/classes/objects. This means they can't share data unless through parameters.

Polymorphism is possible, but not really encapsulation. One could argue that arrays/objects would cause consistency problems, and therefore don't belong in a data language.

Someone expecting python-like OO programming will go mad.

The performance argument depends on the whether the database is limited by CPU or bandwidth and what the load looks like. It's shown in benchmarks that the round-trip between db/network/orm/app takes many orders of magnitude longer than the procedures themselves.


Databases are engineered for high reliability - data integrity above all else. That means they develop more slowly and are perpetually behind what you expect from a modern programming language.

If someone created a database with the intention of it being a good development framework, it would probably be more pleasurable to code against, but would you trust it with your data?


Postgres evolves much more quickly than, for example, Go. I’m not sure I buy this argument.


Relative to most RDBMS it evolves quickly, but from a language perspective it's barely different to the SQL I was writing at the start of my career. Meanwhile Go didn't exist then and does now, and has better tooling for any normal development workflow.


My impression of Postgres development is markedly different. It seems like each release since 8 or 9 has brought significant, exciting new features. I don’t pay close attention, but IIRC, a lot of investment has gone into JSON, window functions, and lots of miscellaneous but important things (e.g., upsert). Meanwhile Go’s language changes have all been minor and boring (not a bad thing!).


> Databases are engineered for high reliability - data integrity above all else.

So should be most of your (micro)services. I have seen more instances of sloppy system design (non-transactional but sold as such) than coherent eventually consistent ones.


Surely that's situation dependent? Databases are built to handle anything up to and including financial and health data.

On the other hand, does it matter if my social network updoot microservice loses a few transactions? With code running outside of the database, you get to decide how careful you want to be.


Yes it does. Are you mentioning in the internal services documentation that they can lose data second unknown conditions? And on your product hunt page that "your content might be randomly lost"? If you do not then you are lying. Everybody using the service expects transactional consistency.

Note that what you described is not eventual consistency but rather "certain non-determinism", there is an abyss of difference.


I don't mean specifically with transactional consistency (I don't think I even mentioned eventual consistency), but just generally the level of engineering required to write a system as well tested as postgres inevitably will slow down its feature development. This means databases don't have the latest features typically enjoyed within application development environments.

However I believe you CAN tolerate some level of failure and defects in your app code, knowing the more battle hardened database will - for the most part - ensure your data is safe once committed. Yes, there will always probably be bugs and yes some of those bugs may cause data loss in extreme cases, but if you're saying you perform the same level of testing and validation on a product hunt style app as you would on a safety critical system, or as postgres do on their database, I find that extraordinary and very unrepresentative of most application development.

I'm not saying defects are good or tolerated when found, but from an economic perspective you have to weigh up the additional cost of testing and verification against the likely impact these unknown bugs could have. Obviously everyone expects any given service to work correctly - but when is that ever true outside of medical, automotive and aerospace which have notoriously slow development cycles?

Personally I'd pick rapid development over complete reliability in most cases.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: