
By Blizzard you mean Rockstar?

edit: An issue was opened[0] but there has been no response. I find it hard to believe a C&D comes with a gag order but IANAL.

I find it frustrating that there's no transparency on this; I hope I'm not being overly dramatic, but I would have expected more openness here.

There could be a very reasonable explanation for keeping mum on this, I guess.

[0] https://github.com/openai/universe/issues/126


A lot of this content originates from 4chan and other image boards.


Don't forget Turntable.fm, which was ground zero for much of the original Vaporwave movement. Sadly, the site itself is no longer with us.


Mathematica not being open or free is the only complaint I buy. As for it missing features, I'd wager adding them to Mathematica would be less work than building a CAS from scratch that has them, in which case I'm sure you'll be missing features from the original.


We already have open source CASs (a handful in fact). So no one is "building a CAS from scratch".


Exactly. When I started SageMath in 2004, by design it built heavily on Pari, Maxima, GAP, Singular, etc., which were all started in the 1990s (or earlier).


We were writing web-based GIS in 1996, using ESRI's MapObjects IMS. Tiling, projections: yes, these were solved problems by then, using available commercial software.

We were storing data in Oracle or DB/2 with ESRI's SDE sitting on top, because at that time no mainstream RDBMS had spatial types.

I don't recall if SDE supported SQL Server back then. That aside, you could have built TerraServer with COTS tech in 1997.

TerraServer was impressive for being publicly available. Nothing we built at the time was available to the public, as far as I remember. But it existed.

And that's just the web-based stuff, which really is the easy part. Large-scale GIS had been around for decades prior.


Reminds me of Career Move, in Martin Amis' 1998 collection Heavy Water and Other Stories. In it, screenwriters struggle for their art while poets are optioning treatments of their poems for "mid six figures".


I don't know whether or not judging programming ability is hard. But judging ability in computer science, which is what we're talking about, is not at all "very hard".


> There's also the "demarcation" issue.

This is probably just as significant as vendor independence or technical issues. But I remember it a little differently.

In the old days you'd have the programmers who wrote the code and the DBAs who were tasked with making sure the data wasn't broken. They would write SPs and use constraints, triggers, etc. Depending on the team, programmers might do these as well or instead, but it was ultimately the DBAs' domain.

Then programmers wrote ORMs and started doing all that stuff in code and somehow managed to seize control of data validation and semantics from the DBAs.


This is good work and if I ever did web development, it would be like this. Why people in the web world don't use stored procedures and constraints is a mystery to me. That this approach is seen as novel is in itself fascinating.

It's like all those web framework inventors didn't read past chapter 2 of their database manuals. So they wrote a whole pile of code that forces you to add semantics elsewhere, in another language, one that makes the impedance mismatch stark. PostgreSQL is advanced technology. Whatever you might consider doing in your CRUD software, PostgreSQL has a neat solution. You can extend SQL, add new types, write procedures in a bunch of different languages, use background workers, triggers, constraints, permissions. Obviously there are limits, but you don't reinvent web servers because Apache doesn't transcode video on the fly. Well, you do if you're whoever makes Ruby on Rails.
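
For instance, a minimal sketch of pushing one validation rule into the schema (table and column names invented for illustration):

    -- a reusable type with the rule baked in
    CREATE DOMAIN percentage AS numeric
        CHECK (VALUE >= 0 AND VALUE <= 100);

    CREATE TABLE exam_results (
        student_id integer NOT NULL,
        score      percentage NOT NULL  -- no app code can sneak 250 in here
    );

    -- rejected at the source of truth, whatever language the client is written in
    INSERT INTO exam_results VALUES (1, 250);  -- ERROR: violates check constraint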

The argument that you don't want to write any code that locks you to a database shows a stunning lack of awareness, as you decide to lock yourself into the tsunami of unpredictability that is web frameworks to ward off the evil of being locked into a 20-year-old database product built on some pretty sound theoretical foundations.

Web developers really took the whole "let's make more work for ourselves" idea and ran with it all the way to the bank.

You'd have to pay me a million dollars a year to do web development.


You are speaking from ignorance with the voice of authority.

I worked on a Rails app that handled a billion requests per day. The problem isn't the performance of the web framework; that's easy to load balance, split out into C, or cache when you need it. The problem is scaling your database, keeping your data secure, and iterating to meet business goals with a growing codebase and infrastructure. A mess of stored procedures would restrain you from doing all three.

And I know, I worked on a codebase in 1999 that did this because of the "performance gains". It ended up bricking the project due to inability to iterate.


> The problem is scaling your database, keeping your data secure, and iterating to meet business goals with a growing codebase and infrastructure. A mess of stored procedures would restrain you from doing all three.

Your argument has a non sequitur right here. A mess of [foo] is a mess; the layer it lives in does not matter, and the language it is written in does not matter. A mess of application-layer code is equally effective at preventing scale, security, and iteration.

The original post is right. Web developers treat their databases poorly[1]. A database is an interface to your data that maintains integrity. Maintaining integrity almost always means stored procedures, as some validation is not expressible through relational integrity and basic type validation alone.

Now, if you are at the point where your database fully guarantees integrity of data going in and coming out, a REST interface is a small step away. This project is very welcome.

[1] The typical web developer treats a database as a data store. It is also a data store, but a well-designed database is much more than that.


A mess is a mess, true, but some are easier to clean up than others.

GP is correct. Methods for scaling/optimizing the application layer are clear and well-known. Scaling the data layer is a huge challenge. This is why the market is filled with snake oil databases promising linear scalability and perfect consistency/reliability, etc.


Scaling the data layer is a huge challenge. No doubt. But calling databases that are designed for solving these problems "snake oil" undermines the huge amount of work that serious engineers have invested in this. No one has ever promised linear scalability and perfect consistency/reliability. No one.

Cassandra, HBase, CouchDB, etc., even MongoDB, have had scalability built in as a first-order priority from day one and have been largely successful at it, e.g. iCloud, EA Online, PSN. Databases like this are a nightmare to work with for smaller datasets but work incredibly well with larger ones.

It's always a shame to see HN act like you scale vertically and magically every problem is solved.


> It's always a shame to see HN act like you scale vertically and magically every problem is solved.

When this is seen (and IME it's a pretty minority opinion) I think it's there as a reaction to the massive overuse and hype regarding a lot of newer-gen DBs. There's absolutely no doubt that there are good uses for them, but those cases are pretty niche compared to the level of their uptake.


You should read "The Innovator's Dilemma".


Is that comment intended to imply that companies will go under if they fail to deploy new technology that doesn't target their business needs?


MongoDB heavily implied the linear scalability, consistency, reliability bit in its early material [particularly in their marketing].

It's only in the past couple of years that they really started mentioning the "tunable consistency" fact blatantly, rather than burying it in a couple of places in the manual.


I purposely chose a non sequitur in the interest of tightening the prose. It's an acceptable device, frequently used in language. It roughly translates to:

"Than (what will certainly be given the expressiveness and level of abstraction they provide) a mess of stored procedures."


"The problem is scaling your database, keeping your data secure, and iterating to meet business goals with a growing codebase and infrastructure. A mess of stored procedures would restrain you from doing all three."

Perfectly expressed.


I'm always a little confused that people seem desperate to use the wrong tool, and then blame the tool. If you need to store normalized data and maintain integrity -- you'll end up with the equivalent of an SQL datastore (or, more likely, a system that is faster, but subtly broken).

Sure, it's difficult to scale ACID. But if what you need is a way to serialize objects, you'll probably be better off with something like GemStone/GLASS, a document store, or some other kind of object database.

If your problem domain actually fits working with structured data, then using an SQL system makes a lot of sense. The obvious example for "web scale" here is Stack Overflow. Sure, their architecture has grown a little since it was 2x IIS + 2x SQL Server -- but they got pretty far on just that.


The bigger issue is this idea that everything needs to live in one place. For the bulk of an application handling a billion requests / day I'd wager that most of that traffic is isolated to certain types of data.

I'd wager that because in almost every case I've ever seen it's true. You just don't tend to see every table in a normalized dataset bearing the traffic load.

If that is the case, rolling that particular piece of data out to a more easily scalable store will largely fix the problem, if caching, async writes and buffered writes didn't already.

Everything else can very easily sit in PostgreSQL, avoid race conditions, maintain data integrity, have permissions controlled, and be accessed from multiple languages directly without requiring an API layer. Then you can use a foreign data wrapper to let PG query that other data source (Mongo, Couchbase, Redis, whatever) and join the results with the other data in the database just like it's all one big happy dataset.

As another poster said, a mess is a mess, and honestly I don't know why he takes a shot at Rails, since Rails has some of the best first-class support for leveraging PostgreSQL features these days.

Wrote an entire post about it: http://www.brightball.com/ruby-postgresql/rails-gems-to-unlo...


We are exactly there. We're having to remove vast swathes of stored procedures and rewrite everything.


> The problem is scaling your database

There is only one database for everything in the business? Of course it doesn't scale. The problem you describe stems from solving every business request by adding yet another table to 'the' database.

It's a monolithic solution. It doesn't matter if you use database features or not. There is no difference between a mess of stored procedures and a mess of business logic classes. It's still a mess.


Web servers usually scale better than (traditional) databases, so it makes sense to not offload computation to the database, even if it means that there's an overhead.


That's very situational. Read-scaling a database is easy. Write-scaling a database is harder, and doing computational logic while write-scaling a database is harder still. "Computational" is still a very broad word, though, and the intensity of those computations is a huge defining factor.

The problem boils down to the "the database" idea described earlier. There are very, very few normalized datasets that I've ever seen that have write-scaling concerns on more than one or two tables.

Move those to a separate datastore that is built for it and you've largely solved your problem. Postgres can even connect to outside datastores to run queries against them for the sake of reporting.
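
As a sketch of that last point (postgres_fdw ships with Postgres 9.3+; the server, table, and credential names here are placeholders, and third-party wrappers exist for Redis, Mongo and friends):

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER metrics_box FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'metrics.internal', dbname 'metrics');

    CREATE USER MAPPING FOR CURRENT_USER SERVER metrics_box
        OPTIONS (user 'reporter', password 'secret');

    CREATE FOREIGN TABLE page_views (user_id int, viewed_at timestamptz)
        SERVER metrics_box OPTIONS (table_name 'page_views');

    -- the remote table now joins like a local one, e.g. for reporting
    SELECT u.email, count(*)
    FROM users u
    JOIN page_views pv ON pv.user_id =
    GROUP BY u.email;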


Web server codebases are typically also way easier to modify, unit test, with better tools and languages.


There is even PL/Brainfuck, so as far as choice of languages goes, PG has you covered.


Once you get rid of your N+1s, the bottleneck in my experience (working with Rails since v1.2) has always been Rails / Ruby itself. It is so incredibly slow, even using just Metal (even Sinatra, for that matter). The slowdown at the view level is significant.

I always have a caching strategy (usually Varnish in front of nginx) with Rails unless it's literally only supporting a handful of users, and anytime I need to support non-cacheable hits like writes from more than 50 or so concurrent users, I consider swapping in Node or Go or something reasonably performant to handle just those writes.

Lately I've been looking into Elixir as a Rails alternative for APIs for performance and scalability. I am very intrigued by a PostgreSQL based REST API.


The point is that when Rails gets too slow, it is very easy to switch to something like caching or C (or Go, or whatever). Even if you just split it off at nginx or use a worker pool in a faster language. Or if you need lots of concurrency, use Go. Or even replace the Ruby code with one fairly nasty SQL statement or a single stored procedure.

The other 95% of your code can be slow Rails. You know those pages where a user adds another email address, or where they report a comment as being hateful, or where they select what language they want the app to be in, or where you have your teams page, or your business partners page, or your API docs and key registration / invalidation.

The database doesn't scale without pain, though. You have joins? You're going to need to get rid of them. You have one table on a machine? You're going to need to split it. You have speedy, reliable writes? You're going to have to either make do with inconsistency (and possibly have a whole strategy to clean up the data after the fact) or lose the speediness.

I'm intrigued about shuffling the serialization of JSON to Postgres, but that is different than what the OP was talking about.


By the same logic though, at the point that heavy write load becomes a reality it's just as feasible to move the heavy write table to an isolated datastore and leave 95% of your data (structurally) in the PG. Even use a PG foreign data wrapper to connect to that new datastore to allow PG to continue any necessary queries against it.

I'm not ever going to argue for heavy stored procedure usage, but there are definitely times when it makes sense, and more times still when it makes sense to use the features in your database instead of setting up multiple standalone systems for pubsub, search, JSON data, etc. when your database can do it all.

It's very similar to the "you can always switch the slow parts" point with Rails to move a part to Go. You can do it all in PostgreSQL and then when you actually reach a point where you've grown it into a bottleneck, move it out.

Postgres isn't SQL Server and it isn't Oracle and it isn't MySQL. It's Postgres. It's a tool that you choose because it fits your needs, not because somebody told you it was a good database. You choose it as part of your stack. If you are using PostgreSQL because you wanted a dumb datastore then you chose the wrong database and should probably re-evaluate your options. That's like getting a Lamborghini to make grocery runs.

http://www.brightball.com/postgresql/why-should-you-learn-po...


I am a PostgreSQL novice, but I've used the JSON serialization and it is indeed fast. But here's my question:

When you do a 1-to-many join and return the same fields very many times, do the binary drivers optimize that, or are those fields returned many times? With JSON serialization (or serializing to arrays), you only get the one row.
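
To illustrate what I mean, a sketch with hypothetical authors/books tables:

    -- plain 1-to-many join: the author columns repeat once per book row
    SELECT, b.title
    FROM authors a
    JOIN books b ON b.author_id =;

    -- JSON serialization: one row per author, titles packed into an array
    SELECT, json_agg(b.title) AS books
    FROM authors a
    JOIN books b ON b.author_id =
    GROUP BY,;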


If Facebook uses MySQL and PHP there is some truth in the comment.


To say that Facebook uses PHP and MySQL is to leave out most of the truth, honestly. They are a part of the stack, yes, but they aren't what makes the application scale to billions of requests. It would be like saying the local coffee shop's website using WordPress with a MySQL backend is using the same tech as Facebook. It's laughable.


They chose MySQL over a lot of other alternatives for some reason, and that reasoning can be applied to your use case.

> They are a part of the stack, yes, but they aren't what makes the application scale to billions of requests.

These are not just part of the stack, these are critical components within the stack.


To say that PHP/MySQL is just a "part of Facebook's stack" is laughable.

They are the core components of Facebook. Normal people understand that the characteristics of Facebook's architecture are unique to Facebook. They can get away with sharding/colocating data in a way that nobody else can. The rest of us have a tonne of integrated data that requires complex joins (whether at the application or database layer).


They are edge components of Facebook. Just from a brief interaction with FB recruiters, I learned they use a lot of Vertica in their back-office. Please don't propose that they are using MySQL for their main business when it's only powering app nodes which are just POPs fed by their real (internal) services. Approximately speaking.


I never mentioned performance or scaling as reasons for using a database's features- though they might be worth considering. The fact I never said those words and it fired you up says more about your experience than mine I imagine.


That is quite frankly horse poop.

We use stored procedures. Not by choice; this is a legacy we're stuck with. It is nothing but an unadulterated disaster of a technology regardless of what you use it for. I'm talking 45,000 stored procedures here. 2000 tables. TiB of data. 50,000 requests/second across SOAP/web/desktop etc. It's hell.

Problems with stored procedures:

1) Performance. More code running in the hard-to-scale-out black box. You're just hanging yourself with hardware and/or license costs in the long run.

2) Maintenance. All database objects are stateful, i.e. they have to be loaded to work. The sheer complexity of managing that on top of the table state results in massive cost. Add tooling and version control costs to that as well. Have you tried merging a stored procedure in a language which has no compiler and very loose verification?

3) Orthogonality. Nothing inside the relational model matches the logical concepts in your application. Think of transactions, domain modelling, etc.

4) Duplication. You still have to write something in the application to talk to every single one of those stored procedures and map it to and from parameters and back to collections.

5) Transaction scoping. Do you know how expensive it is to introduce distributed transactions? Well it's a ton more cash when EVERYTHING is inside that black box.

6) Lock in. Your stored procedures aren't portable. Good luck trying to shift vendor in the future when you hit a brick wall.

Now I know it's popular to bash on Rails and I wouldn't use it personally but there are people using the same model on top of other platforms, like us.

Sorry but databases are just a hole to put your shit in when you want it out of memory. If you start investing too much in all of the specific functionality you're hanging yourself.


> Sorry but databases are just a hole to put your shit in when you want it out of memory.

You've got to be joking, right?

Data is an enterprise's single biggest asset. A robust, consistent and performant store is vital. SPs can be written as garbage like any other logic, but in the right hands they are a perfectly valid tool for providing useful access to complex data.


No that's about it. Granted they give you efficient paths to get it back again and store it on the right shelves but that's about it.

The key benefit is probably familiarity.

I'm not saying they're an invalid tool, merely a single tool in a batbelt of a million solutions.


It is, but NO database has the chops to help you handle the business logic.


Despite many attempts by people to prove otherwise. Literally every "enterprise" product I've seen tried to do this and fucked it up royally.


Interestingly, some very enterprise products don't even try to use database features like foreign key relationships - it can be a bit of a shock to open a database with many thousands of tables and realise that there is no obvious way to work out how they relate without looking at application-level structures.


Yes that's true as well. JIRA does that in some configurations. Does my head in.


I'd definitely give you that. SPs are the last place you want business logic.


I'm sorry to hear your anecdote. I would have to see your particular situation to understand exactly what you mean by 1, 2 and 5, because those don't seem unsolvable, but in general:

> 3) Orthoganality

You've introduced that by treating a relational store as a "hole to put your shit in". It's not fair to blame the database for that.

> 4) Duplication

Not with the project that is the topic of this thread you don't.

> 6) Lock in

As I mentioned in my original comment, you can be locked to your database or you can be locked to your ORM/DAO/ActiveRecord/DB client library or whatever it is you're using.

Not using database features isn't the key to heaven any more than using them is the key to hell. I just meant to point out that, in my experience, they are massively underused.


Certainly not an anecdote. I was an Oracle and SQL Server DBA for a number of years, amongst other hats, on stupidly big datasets and loads. Add to that 25 years' experience getting companies out of deep trouble that everyone else has given up on. I know my shit.

Orthogonality: I haven't introduced anything here. Very rarely does any conceptual model of reality fit into the relational model. It's more imperative than that. Everything is usually crudely shoehorned into it because it's a compromise that people are barely willing to make or because they don't understand how to model a system properly.

Duplication: there is duplication in there. The versioning is very inadequate, and API stability is the key to success on this. Plus, this is a minor part of the application to consider. It's no different to issuing SQL. The protocol is different, that is all.

Lock-in: There is no ORM lock-in past the platform. If you isolate everything properly, i.e. use command-query separation, then this is a non-issue. It's trivial to replace the ORM. You can even do it piecemeal. We've done it. I yanked bastardised ADO and EF out and stuck NHibernate in. If you couple all your logic into the database, no banana. That luxury goes out of the window.

They may be underused, but when your vendor pulls a 26% price hike on half a million quid's worth of kit, can you afford to bend over and take it?

It's a tradeoff, but not one I'm willing to make on medium to large scale systems where there is a capital risk.


Interesting, though as this is a way to interface with Postgres I'm not sure your warnings about licensing costs are useful. Still, there are the other reasons you mentioned.


Some organisations won't allow use of a product unless there's a support option. I can understand this after a complete system failure a few years back on an unsupported open source product. I was hired to fix it! :) Mostly, though, in comes EnterpriseDB, and then support costs. The problem here in the UK is always staff availability, so we always end up with SQL Server and Oracle bodies.


How can you compare being locked into a DB versus locked into an ORM?

One of the main features of an ORM has always been abstraction from the intrinsic properties of the database. The ORM was a concept popularised by the original Obj-C/EOF/WebObjects back in the day, which supported retrieving data from any database you pointed it at. And it fully supported you enhancing its access layer with database-specific features.


This is a much longer topic but it boils down to 2 things:

1. Switching your database is not easy with or without stored procedures because it will involve down time for the application while the data is migrated, then verifying that it works as expected in the new database with that ORM. You hope for the best, but it's always more complicated to switch a database.

2. The ORM tends to lock you into the application stack. Switching a part of your application from something like Rails to Go when you need to performance tune is significantly easier and more common than switching the entire database backing the whole system.

Beyond those two are the harsh realities of working with large datasets. As soon as a dataset is non-trivial in size, relying on the application to do core work on it becomes self-destructive by adding network latency and, in many cases, object creation costs (check some Rails benchmarks on object creation). It becomes a big deal.

This is not to say that doing the bulk of work in the ORM is bad or that everything should be done in the database, it's a matter of balance. The only dangerous opinions on the matter are the "purist to the detriment of all else."

Verifying uniqueness, exclusion and maintaining data integrity should be the job of the database in most cases. That is what it's good at. Performing actual business logic on that data should not unless there is a significant performance based reason for it in most cases.

In Postgres the "stored procedure" thing is a little bit different because they're significantly more valuable thanks to the volume of functionality built into PG. Everything is basically a function in PG.

In PG, you can use functions to create indexes, and when the function is used in a WHERE clause that index will be used. You can use functions to create constraints and unique indexes, and even notify outside processes that are listening for changes in the database with pubsub.
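
A sketch of both ideas (the users table is hypothetical; expression indexes and pg_notify are standard Postgres):

    -- expression index: queries filtering on lower(email) will use it
    CREATE INDEX users_email_lower_idx ON users (lower(email));

    -- pubsub: tell any listening process about changes
    CREATE FUNCTION notify_user_change() RETURNS trigger AS $$
    BEGIN
        PERFORM pg_notify('user_changes', NEW.id::text);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER user_change AFTER INSERT OR UPDATE ON users
        FOR EACH ROW EXECUTE PROCEDURE notify_user_change();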

PG is a heck of a lot more than just a "datastore" and that's why these discussions are important. If you want a generic dumb datastore...there are databases built for that. PG is built for a whole lot more than that.

Here's a very incomplete summary: http://www.brightball.com/postgresql/why-should-you-learn-po...


1) Makes no sense whatsoever; stored procedures are typically going to be much faster than the equivalent mix of application-level code written in a scripting language that needs to communicate with a database and is likely vastly slower than PL/pgSQL or PL/SQL. The hardware licensing costs have nothing to do with stored procedures and apply just as much to anything else (there are free-as-in-beer options and costly proprietary ones in either case).

2) This is not a reasonable objection; I could replace "loaded" with "compiled" and your non-argument would make just as much sense. The alternative does not make the "complexity" go away, it just distributes it across multiple languages in your application and database.

3) No.

4) Another non-argument against stored procedures. For example, suppose I have a table "time_series(series_id INT, tstamp TIMESTAMP, val NUMERIC)". A common need would be to accumulate all points (tstamp, val) associated with a series_id. Following your logic, you either end up with tons of code on the application side sending similar variations of a query that looks like "SELECT tstamp, val FROM time_series WHERE series_id = $x ORDER BY tstamp", or you create one application-level module that acts as an abstraction around a query like that. In the first case, you're doing massive duplication. In the second case, you've essentially made a stored procedure that is distributed across your database and application, and all the issues you raised about having to write something to talk to it apply just the same (see the sketch after this list).

5) Again, no. DBs are precisely the place to deal with issues like this, as they have means for dealing with things like foreign tables. The application-level alternative just means re-inventing it all yourself, and you're probably going to make a lot more mistakes and write a lot more code that way.

6) Non-argument (applies just as well to Rails, Python, Linux, etc.)
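
To make the point in 4) concrete, here is roughly what that centralised abstraction looks like as a database function (a sketch using the time_series table above):

    CREATE FUNCTION series_points(p_series_id int)
    RETURNS TABLE (tstamp timestamp, val numeric) AS $$
        SELECT ts.tstamp, ts.val
        FROM time_series ts
        WHERE ts.series_id = p_series_id
        ORDER BY ts.tstamp;
    $$ LANGUAGE sql STABLE;

    -- every client shares one definition instead of scattering near-identical queries
    SELECT * FROM series_points(42);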


You had me until your last sentence. Databases are very important for (and very good at) storing your data. And data is important (duh). All the issues you described above are related to working with the data, which should not happen in the database.


That's a poor argument, because we're not discussing the arbitrary boundary of storing versus working with data; the lines between those are very blurred. This is even more the case when you use stored procedures, which work with data close to the storage.

While we're on this subject, RDBMSs are no better at storing data than any other technology out there[1]. In fact, when you start thinking abstractly like this, other tech such as Riak makes sense for a lot of workloads.

The only real benefits of RDBMSs are a fixed schema, fungibility of staff, the ability to issue completely random queries and get a result in a reasonable amount of time, and the proliferation of ORMs.

[1] Caveated on insane design decisions like MyISAM storage engine and MongoDB as a whole.


Riak makes sense for some workloads. For the vast majority of workloads out there, you will save time and money with an ACID system. Good ACID systems are mostly RDBMSs, so I would say, for now at least, that typical RDBMSs do have a leg up on other technologies out there.


Definitely. Which is why we still use one. It's cheap, already there and does the job.


It seems to me that, at least in part of your argument, you are confusing the deficiencies of particular implementations of server-run code with the idea itself. Stored procedures are bad because your RDBMS has a crappy compiler? Hmm...

(One might argue, of course, that SQL in itself turned out to be a lousy interface protocol for relational data processing, and that it caused a lot of pain to begin with. But that's a different topic.)


I'm actually arguing that my RDBMS is no place for a compiler. I want to compile everything externally, test it and then deploy it. Not deploy it, compile it, cross fingers.


Some counter-points I've heard made:

1) Performance. Stored procedures are fast, meaning it will be longer before you need to scale out.

2) Security. If you only use stored procs, you're a lot less exposed to SQL injections etc.

I don't really have a firm opinion either way, but it's not as clear cut as you are making out.


Counter-counter-points:

1) Stored procedures aren't that much faster than issuing plain SQL over the connection. The main performance bottlenecks in a RDBMS are cache and IOPS. Regardless of where you execute those, they are all inside that black box after the query is parsed. You also get the added pain of cached query plans which fuck up performance when the query optimiser makes assumptions about table statistics and stores them with a stored procedure. (SQL Server and Oracle at least are guilty of this).

2) The only place I've had SQL injection attacks in the last few years is where people have used dynamic SQL inside stored procedures (sp_executesql) and not escaped them properly. Outside of that, both ORM and proper parameter binding make this entirely irrelevant.

It's completely clear cut IMHO.


> Performance. Stored procedures are fast, meaning it will be longer before you need to scale out.

That was once the case, but every major DB now caches execution plans for commonly run queries.


>If you only use stored procs, you're a lot less exposed to SQL injections etc.

How does that help you vs. prepared statements in any typical language?

I've seen SQL statements in SPs that are concatenated (|| in Oracle) with varchar fields from a table, and I thought that would be just as vulnerable?
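
i.e. something like this inside a PL/pgSQL body (a sketch; the same distinction applies to Oracle's dynamic SQL and sp_executesql):

    -- vulnerable: input is concatenated straight into the statement text
    EXECUTE 'SELECT * FROM accounts WHERE owner = ''' || p_owner || '''';

    -- safe: the value is bound as a parameter, never parsed as SQL
    EXECUTE 'SELECT * FROM accounts WHERE owner = $1' USING p_owner;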


For one, stored procedures are hard to test, debug, maintain and check into source control.

But don't let that get in your way of ignorantly generalising about web developers.


Stored procedures may be all of those things, but they don't have to be - it's just that most of the time, developers don't really care, so they have fancy versioning, deployment and continuous integration for all their code, except for stored procedures.

Here is an interesting talk about database migrations, stored procedures and unit tests: http://www.pgcon.org/2013/schedule/events/615.en.html

Also, DB procedures are not easy to "debug" in the traditional way, but an SQL client is basically the first REPL every programmer becomes familiar with. You can easily step through a stored procedure by running its commands one by one, unless it's a fancy Oracle FORALL loop with a cursor or something (and even then, the cursor's SELECT can still be run as normal).

Also, databases tend to have stronger data types than programming languages in general, so putting constraints in the DB means that bad data is not savable in the system.


> so they have fancy versioning, deployment and continuous integration for all their code, except for stored procedures.

Exactly, because those things are extremely important to how code gets shipped and delivers value to the business. Stored procedures become a huge risk to future development, which ultimately means a risk to the business's ability to deliver value. What happens when you need to change DB vendors because the business has been so successful that you've outgrown a relational database? You have to rewrite the ENTIRE MC portion of your MVC application. Why would someone ever do this?


Hmm no.

Avoid state at all costs. Stored procedures are stateful. Schema and migrations are pain enough already.

Write me a check constraint that validates an email address being put in a varchar column and reports back a sensible message which can be bound to an entry field with metadata about the error.

Write me a constraint and key arrangement which is unique across two and three columns in separate groups.

No. You're wrong.


> Avoid state at all costs. Stored procedures are stateful. Schema and migrations is pain enough already.

What do you mean by that? How is having a bunch of queries in a stored procedure more "stateful" than having the same queries in the application?

> Write me a check constraint that validates an email address being put in a varchar column and reports back a sensible message which can be bound to an entry field with metadata about the error.

Postgres gives you metadata about the error, though the error message will still be a generic "CHECK constraint violated" or some such.

> Write me a constraint and key arrangement which is unique across two and three columns in separate groups.

I'm not sure what you want to see based on that description, but surely you're not advocating enforcing unique constraints in the application?
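
For what it's worth, the email half of the challenge is easy enough to sketch (the regex is deliberately naive; the constraint name is the hook an app can use to attach a friendly, field-level message):

    CREATE TABLE users (
        email varchar NOT NULL,
        CONSTRAINT users_email_format
            CHECK (email ~* '^[^@[:space:]]+@[^@[:space:]]+\.[^@[:space:]]+$')
    );

    INSERT INTO users VALUES ('not-an-email');
    -- ERROR: new row for relation "users" violates check constraint "users_email_format"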


Stateful:

If I have to load the stored procedure into the persistence engine, then that step is required. This is no more stateful than queries in the application, but it means that the relevant state in both the application and the database engine needs to be reloaded and constantly synchronised. Ergo, two times the work.

"CHECK constraint violated" is no good for humans. Prevention is better than cure here.

Why shouldn't I enforce unique constraints in the application?

1. Open a transaction

2. Get a user by name from the ORM.

3. Exists? Tell user that the username is already registered.

4. Doesn't exist? Save new User instance.

5. Commit transaction.

Steps 2 and 3 can be as arbitrarily complicated as you need them to be, are fully testable and cheap with anything that uses MVCC.


"Why shouldn't I enforce unique constraints in the application?"

You should do both. For all the reasons you mention, it's often cleaner to also do it in the application, especially when you can use a framework with a simple "validate_uniqueness" flag.

But what you're describing is also the very definition of a race condition. It's the same reason you don't increment counters by retrieving them, adding 1, and then saving the number back to the database, and instead pass in an increment command.
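
i.e., a sketch with a hypothetical counters table:

    -- race-prone: two clients can both read 5 and both write back 6
    SELECT hits FROM counters WHERE id = 1;
    UPDATE counters SET hits = 6 WHERE id = 1;

    -- atomic: the database serialises the increment itself
    UPDATE counters SET hits = hits + 1 WHERE id = 1;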

Check it in the application but let the database make sure it doesn't get violated in a race condition. There's a significant amount of either/or in this entire conversation (not just you, the whole thread) when the database absolutely can and should be leveraged for certain things.

It's extremism and purism where the problems get introduced (in both directions).


I do most of what you say and still think both of you are right. I mean, stored procedures have some use cases, but I've seen people using them EVERYWHERE and I've seen people (including myself) NEVER use them.

I mean, currently my dataset is so small I don't need stored procedures; I barely do anything more than CRUD. Okay, I have a bigger GROUP BY query, but that is all, and at one point I load a HUGE dataset into my application memory (1000 rows), but that works REALLY REALLY fast in Scala. I tried to create a stored procedure around it, but I failed, and the application code uses the dataset to generate a big calculation. Currently I just have a Map<String, Map<String, List<Row>>>, which is easily accessible and usable for my calculation. I mean, I could've done something similar with stored procedures, but the performance gains are really low.


For what you're describing it doesn't sound like stored procs are worth it. Avoid introducing them unless you find that they are necessary or beneficial, but don't avoid them entirely on principle.

Preserving data integrity tends to be a much more worthy use case for database logic than retrieval display.


> Why shouldn't I enforce unique constraints in the application?

This tightly couples your database to your application. You can no longer guarantee that your database is reliable when used otherwise.


Or you could do:

1. Insert new User instance.

2. Unique constraint violation? Tell user that the username is already registered.
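
Sketched out (table and index names invented):

    CREATE UNIQUE INDEX users_username_key ON users (username);

    -- the loser of any race gets SQLSTATE 23505 (unique_violation),
    -- which the app maps to "username is already registered"
    INSERT INTO users (username) VALUES ('alice');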


There is a good chance that your proposed algorithm to enforce a uniqueness constraint in the application won't work. As in, you've left out enough details that would be critical for getting it right, and in my experience, a lot of programmers would only get this right by accident if they get it right at all.

First problem is that the SQL standard provides no way to make this work portably on any standards-compliant database. So right there you are going to have to code to the database to one degree or another.

So, let's say you want to make this work in Postgres. Now, you'll need to be using Postgres 9.1 at least; otherwise your uniqueness constraint won't be a uniqueness constraint.

Try this, in any version of Postgres. Open up two psql sessions. In one, run a `create table uniq (x text);`. Then run `begin isolation level repeatable read; select * from uniq where x = 'foo';` in one of the sessions. Repeat those two commands in the other session.

Neither session sees 'foo'. So now both can go ahead and run `insert into uniq values ('foo'); commit;`. Both transactions will succeed, and you can confirm that there are now two instances of 'foo' in the table.
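
As a transcript (A and B being the two psql sessions):

    -- A: begin isolation level repeatable read;
    -- B: begin isolation level repeatable read;
    -- A: select * from uniq where x = 'foo';        => 0 rows
    -- B: select * from uniq where x = 'foo';        => 0 rows
    -- A: insert into uniq values ('foo'); commit;
    -- B: insert into uniq values ('foo'); commit;   -- also succeeds
    -- afterwards: select count(*) from uniq where x = 'foo';   => 2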

In fact, `begin isolation level serializable` in PostgreSQL 9.1 or later is the minimum isolation level to make this work. And, you will need retry logic around the transaction in case of a serialization failure. (Perhaps your DB access layer or language would hide this latter detail from you, or perhaps not.)

In PostgreSQL 9.0 and before, serializable and repeatable read were equivalent, and both were still SQL standards compliant. In PostgreSQL 9.1, the repeatable read isolation level stayed the same, while the serializable isolation level was strengthened.

Unless you can accept a certain level of degraded accuracy by using a probabilistic construct such as a Bloom filter, by far the biggest cost of maintaining uniqueness is the index. And you'll need that index whether you use the database or the application to enforce uniqueness.

And, judiciously pushing computation onto a database can actually be cheaper for the database as well as its clients. This scenario is likely to be one of those situations.


> CHECK constraint violated is no good for humans.

Well, sure, an application should respond to DB errors by presenting appropriate messages on the UI, just like any other errors it encounters. You should only see "CHECK constraint violated" if you are bypassing the app and using the DB. Otherwise, you should see something nice provided by the app.

> Why shouldn't I enforce unique constraints in the application?

Because you should do it in the database whether or not you do it in the application, and then once you have, well, DRY.


Not enforcing as the final line, but reporting the error back to the user in a way that can be handled/translated/etc.


> Avoid state at all costs. Stored procedures are stateful.

Stored procedures are no more state than application code is.

> Write me a constraint and key arrangement which is unique across two and three columns in separate groups.

What does "unique across two and three columns in separate groups" mean? I get that its something more complex than a simple multicolumn uniqueness constraint, but not what it is supposed to do.

I suspect that whatever it is can be done with PostgreSQL -- possibly using the (relatively) new exclusion constraints -- but I can't quite be sure without more clarity on what you mean.
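
Guessing at the requirement, here are sketches of the two tools I mean (table and column names invented; the exclusion example needs the btree_gist extension):

    -- multicolumn uniqueness, one constraint per column group
    ALTER TABLE t ADD CONSTRAINT t_ab_key  UNIQUE (a, b);
    ALTER TABLE t ADD CONSTRAINT t_cde_key UNIQUE (c, d, e);

    -- exclusion constraint: generalised uniqueness,
    -- e.g. no overlapping bookings for the same room
    CREATE EXTENSION btree_gist;
    CREATE TABLE bookings (
        room   int,
        during tsrange,
        EXCLUDE USING gist (room WITH =, during WITH &&)
    );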


"hard to test, debug, maintain and check into source control."

Why? I have never had a problem with any of these. SPs are just imperative code like any other imperative code.


Well, you cannot isolate the code, you cannot unit test it, you cannot use a debugger, cannot set breakpoints, have no stack traces, etc. You are tied to the database at all times.

In my experience, every stored procedure that is larger than 2-3 lines is a headache.


All of those are no problem with the right tools. Any database IDE can debug SPs. You can unit test SPs like any other code; just use a test runner.


Yeah. I suspect that a lot of the hate for logic in the DB is because of bad Oracle setups many years ago, kind of like how a lot of people think SQL is useless because MySQL is.


I'm not a huge fan of stored procedures - but I'm pretty sure you can debug stored procedures in SQL Server pretty easily from Visual Studio - I think you can "step into" the call to SPs while debugging client code.


I tell that to our DBAs, who test, debug, maintain, and check our SPs into source control.


I find it strange when people adhere to these extremely dogmatic ideas about stored procedures. They appear to be at either one end of the scale or the other, i.e. they either put all their logic into stored procedures, or refuse to use them at all.

Of course, the reality is that people who use them reasonably do exist and are probably in the majority. You just rarely hear them talk about it, because I suspect they hold the same view of stored procedures, constraints and any other DBMS feature as they do of any other software development tool, i.e. use the right one for the job.


Yeah. A lot of the time, for a CRUD app, the whole layer between rendering and data storage doesn't really do much besides validations, and those can live in the database, so the whole middle layer can be unnecessary.

Sometimes, the logic has to be in the database, because it is the single point of truth and because many application servers are hard to synchronize with regard to "you get max 3 attempts at login" or "you have to have enough balance to do a bank transfer".

Sometimes, a lot of your logic is in the database, because the database can do a lot of things in a really fast and practical way, like aggregation, reporting and various data exploration tasks.

Sometimes, the database is just a dumb store of object-oriented data.

It depends on the application.


> Why people in the web world don't use stored procedures and constraints is a mystery to me.

You can blame MySQL 4.1 for that :(

Most people who call themselves "web developers" haven't even heard of PostgreSQL, or even if they've heard of it, have no use for it because their usual clients are stuck with MySQL-only web hosts who have only just managed to upgrade to PHP 5.3.


Most people, really?


Yes. HN is a bubble. There are ~700 PHP questions on SO a day and ~150 for node.js. This is just one pair of numbers; you can mine whatever you like, but you'll realize there are massive amounts of "web developers" with a ... low amount of knowledge.


Thinking most web developers are still shipping PHP 5.3 apps on shared hosts is also a very outdated view.


Globally, it's not.


The latest version of WordPress is still compatible with PHP 5.2.4 and above, so anyone who builds a WordPress site is effectively shipping a PHP 5.2 app.


They do recommend PHP 5.4, though, and AFAIK they do try to push the community to upgrade.


It still means they can't depend on any feature introduced in PHP 5.3 or later. No closures, no namespaces.

Ditto for any theme or plugin that tries to be compatible with all versions of PHP that WordPress itself supports.


While I do agree with you, I want to make a distinction between shipping and building an application.

IMHO the term "developer" should not be applied to those who can just ship, but rather to those who can also build.

It doesn't matter if they are web, desktop, or low-level systems developers, actually.

Most self-described web developers are just "web masters".


Now you're just redefining "developer".

No True Scotsman puts sugar on his porridge, and No True Developer just installs WordPress.

On the other hand, even Rails and Django encourage you to use the ORM whenever possible, so even a "developer" who builds apps on a modern framework is unlikely to be familiar with advanced SQL features.


For me, if you had to write even a single line of a Turing complete language to a file (so shell scripts yes, but one-time shell commands no) to install WP, that would count as development. Otherwise, it's just installation. Note: I have never installed WP.

Do people really consider ./configure && make && make install and its equivalents to be development now?


Not really. Most web developers are scared of using the CLI.

Last time I checked the WP install process was something like...

1. download zip, extract

2. change some config file

3. upload whole folder using FTP

4. go to /install or something, and from there..

5. click, click, click, edit text, click, click, click ...

That was only if the web developer was in hardcore mode; otherwise it was just _log into cPanel to use the one-click installer_.


I'm guessing you are not a developer, because that kind of comment wouldn't come from someone who's thinking logically.

We don't need another flame war here.

And yes, most of the web is built on WordPress/PHP - but you don't need to be a developer to install WordPress.


What does logic have to do with it?

I'm just stating what I believe to be a fact: that the majority of web developers in this world never think of PostgreSQL as an option. I don't care whether that's a logical thing for them to think. It's just a fact, whether I like it or not.

If you think I'm wrong about the facts, please feel free to open a phone book in any part of the world other than the Bay Area, call up a decent sample of people who self-identify as web developers, and find out what percentage of them have ever heard of, let alone used, PostgreSQL.


So you start off ok here:

    I'm just stating what I believe to be a fact... 
But then you move on to say:

    It's just a fact, whether I like it or not.
So which is it? Do you believe it to be a fact, or is it a fact? And if it is, where's your evidence?

(I happen to agree with your opinion, but the semantics here bug me.)


Sorry for the loose use of language. Everything that follows the colon after "I believe to be a fact", until the end of that paragraph, is the content of what I believe, including the statement "It's just a fact." I believe that it's a fact.

Anecdotal evidence: I've interacted with dozens of other people who call themselves web developers over the years, and most of them (outside of Silicon Valley) have never used PostgreSQL, nor any advanced features of SQL in any other RDBMS.

Objective evidence: the large market share of WordPress, Drupal, and other content management systems that don't use any advanced database features; as well as the large market share of frameworks such as Rails, Django, and Laravel that encourage developers to stick with the ORM and not care about advanced database features.


Thanks for clarifying. It does seem all too common to pretend the database is a black box (via ORM) in the most popular frameworks.


I don't think using stored procedures should be the focus of the project.

PostgREST is great because it lets you kickstart a CRUD application with ease.

I'm mainly a node.js developer nowadays and I'm using some frameworks to kickstart APIs for my clients - and then I jump in and add features.

What I really want is a solution for building an API server which deals with authentication, exposing my models through REST, and other boring and repetitive stuff. That way I don't have to focus on everything, but just on the specific problem I'm solving.

I don't think there is a valid solution out there right now.

That's why I'm contributing to PostgREST, and I hope to see even more features coming out of it (e.g. better authentication, maybe with third-party logins).



I'm using hapi + sequelize or loopback actually. Zazler looks interesting, thanks for the link!


Around 2005 I worked for a fairly large company that did exactly what you're suggesting with PostgreSQL, and it was a complete disaster. Have you ever tried implementing sharding with all of your business logic in stored procedures? Have you ever tried hiring people who understand PL/SQL and WANT to work with it? I have done both, and it is a nightmare. Once you get to the point of having to shard data, you end up in one of two places:

1.) The sprocs become insanely complex because they have to be shard aware.

2.) You slowly start moving more of your code that was in sprocs to your application so now you've got two problems.

As for hiring, put out an ad for an engineer with PL/SQL knowledge, or better yet, put out an ad for someone who wants to learn and use PL/SQL. Good luck finding enough of those people to get any significant work done.


> This is good work and if I ever did web development, it would be like this. Why people in the web world don't use stored procedures and constraints is a mystery to me. That this approach is seen as novel is in itself fascinating.

Regardless of the other points people brought up...

Sharding a database with stored procedures and constraints as you advise is a nightmare because you now have a completely separate deployment process [deploying stored procedures, if you think this doesn't require a deployment process across a sharded infrastructure...I have no words].

Using an internal web framework is much, much easier than maintaining two separate deployment processes. Especially when one of those processes has to take down nodes to avoid some shards having different stored procedures than other shards.


...and it's not just databases. OS capabilities (e.g. VM tuning, bumping up the default sysctl limits, etc.) are ignored, and the problems arising from such disregard are then dealt with by adding layers to the application, like distributed caches and other *scaling* solutions.


Because databases are a very poor fit for APIs? This is one of my biggest problems with high holy REST: it generally just means reimplementing your SQL API in HTTP semantics.

APIs should be about encapsulating business logic. Databases should be about storing data in a reliable, predictable way.


That's what views are for.
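
i.e., a sketch in the PostgREST style (the api schema and web_anon role are placeholders): expose a view shaped around the business concept and keep the raw tables private:

    CREATE SCHEMA api;

    CREATE VIEW api.active_customers AS
        SELECT, c.full_name, sum(o.total) AS lifetime_value
        FROM customers c
        JOIN orders o ON o.customer_id =
        WHERE c.deactivated_at IS NULL
        GROUP BY, c.full_name;

    -- consumers see only the view, never the underlying tables
    GRANT USAGE ON SCHEMA api TO web_anon;
    GRANT SELECT ON api.active_customers TO web_anon;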


> Why people in the web world don't use stored procedures and constraints is a mystery to me.

We do. At least some of us, and honestly, it's not something I think about as being exceptional. I don't always use them, but I much prefer having a nice API of SPs to use rather than having to have custom SQL all over the place. DRY applies to writing queries just as much.


Consider the sequence

1, 3, 5, 7

what comes next? 9, right? Or is the sequence generated by 2n − 1 + (n − 1)(n − 2)(n − 3)(n − 4) for n ∈ N? Then we've got 33, since for n = 5 the formula gives 9 + 4·3·2·1 = 33.

"among all hypotheses consistent with the observations, the simplest is the most likely"

33 is correct, but it's less likely to be the basis for the generation of the sequence.

Your answer of (x, 2x, 4x) proves the puzzle illustrates the confirmation bias, at least in your case.

Does the unit test that confirms your function returns the expected result given one set of arguments prove it correct?


I had a rant about this on Twitter yesterday[0].

Google makes money precisely because the web is centralised. If we moved to P2P systems they could still provide an index, but I'd wager far less data would ultimately pass through them. Not to mention that if people were more in control of their data, rather than LinkedIn, Facebook, Flickr, or YouTube, I bet they'd be less inclined to have it indexed publicly, simply because they'd have a choice.

There's all sorts of network effects and shitty incentives at play, and it's a shame.

My twitter rant (not the whole thing even)

> though they have their place, centralised systems reinforce the role of the middle man, which is prime for rent-seeking and lopsided value

> in reality networks exist on a continuum between centralised/decentralised. The web makes it difficult to choose the correct degree per case

> both centralised and decentralised systems have trust issues, but they are different in kind not magnitude

> the incentives are wrong for innovation. Google requires centralised, so HTTP is fine. Ubiquity, so HTML is fine

> web developers have spent years becoming skilled in their corner and are incentivised to defend and perpetuate the platform

> if you think discoverability, zero-install and sandboxes are only possible on the web, I invite you to consult the literature

> we could have decentralised, secure, simple, efficient primitives, but network effects and incentives steer us away

> tech solutions are moot unless they incentivise behaviour that leads to better returns for everyone. layers on HTML/HTTP will never do that

[0] https://twitter.com/benkant


"Google makes money precisely because the web is centralised"

I see it exactly the other way: Google makes money precisely because the web is decentralised, and hence you can create an invaluable service by crawling it and creating a centralised search index.


That's another way to look at it, but you're using decentralised to indicate there are many large nodes; I would say that's simply distributed. I'm using centralised to indicate that communications go via those large nodes, which mostly don't communicate with each other.

It's client/server on a massive scale. Just because the servers are public, doesn't make it decentralised.


In what way do you think the web is centralized?


That's a good question which deserves an honest answer but this comment box is really too small for that and essay sized comments are frowned upon.

For starters: navigating the web in the beginning consisted of clicking links which caused you to go from one website to another. This all worked well when (a) the web was small and (b) there were hardly any trash pages.

Search engines changed that, and once they got 'good enough' the link graph became a mere starting point for crawling the web rather than the way we navigated from site to site. For a little while the link graph was used as a popularity measurement but this too changed (because of the huge number of low value links).

Then we got silos. A 'silo' is a bunch of data locked up under a trade between users and large web properties. The trade is 'you give us your content and a bunch of information about yourself and we'll use that content to attract others and to sell ads'.

Examples of such silos are Google, Yahoo and Facebook.

Finally, where the web (and the internet itself) was originally strung together by a peer-to-peer approach, it turned more and more into a division between producers and consumers, with the producers on the 'server' side and the consumers on the 'client' side.

Mobile devices accessing the net further accelerated this trend; right now the only internet (not web) applications that are still peer-to-peer are torrent applications. For the most part the division on the web is complete, and hosting a web server on your very powerful cable modem or DSL line would be grounds for termination of your access.

Servers are hosted centrally and are operated by companies whereas clients are simply terminals that access the content stored on those servers.

I hope that answers your question in enough detail, you could easily write a book about this.


The internet is, by its nature, peer to peer and decentralized. Cut a cable, or take out a large network, and the internet will route around it, either quickly (routes converging on a new peer) or slowly (a poorly connected network finding a new upstream to purchase connectivity through). That companies then build on top of this and implement services where they are in the middle of both connections does not change this fundamentally; it just adds an optional layer.

The assumption that our connections' upstream bandwidth is never or rarely used is false. I would argue that we generate more content per person than ever in history. The sheer amount of pictures, videos, webcams, posts and comments is much higher than ever before. Are people hosting it directly from their connections? Usually not, but that's as much a case of being efficient and reaching an audience as it is of companies wanting control over the data. Even then, there are services which are decentralized from that, such as email. It's not efficient to host content yourself. Even the large networks use dedicated CDNs. For the end user, Facebook is a CDN.

That said, I agree there is a clear move towards our data and services being handled by fewer, and larger entities, such as Google, Yahoo, Microsoft, Apple, Amazon. But they aren't a single entity, and I don't consider that centralized. Any one of those providers could implode today, and very little of their services could not be picked up by some competitor easily. I don't consider that centralized.


We call them datacenters for a reason. When I received mail in '95 or so the machine receiving it was the workstation I wrote the reply on.

Your peer-to-peer view of the internet died roughly in '98.


> We call them datacenters for a reason.

And there are many of them, some owned by companies that use them exclusively, some conglomerations of many different providers but owned by yet another party. How is this centralization? I still think you're just arguing that we've compartmentalized certain services to sets of companies, for the most part, but even that isn't centralization, because there are multiple distinct companies using multiple distinct networks and in many cases they are presenting multiple distinct capabilities. Not having something handled at the end point does not mean it's centralized, there's a very large middle ground here, and that's where we are currently at. I'm not sure I see any evidence that we are moving away from that towards actual centralization.

> When I received mail in '95 or so the machine receiving it was the workstation I wrote the reply on.

And many people that used POP3 continued to do so well into the 2000s. It's silly to run a mail server on your workstation. I know, I did it for years myself. You run into all sorts of stupid problems related to your workstation not being always on, badly configured backup MX servers, and other issues. We don't do it anymore not because we were forced out of it (you can still do it now), but because there are solutions that are better for most use cases, and we opt for those.

We don't all wash our own cars, or do our own plumbing, or even clean our own houses. Some people do, some people pay others to do that work. The fact they pay others doesn't mean we've moved towards centralizing those services. There isn't some national bureau of plumbing that is our only recourse when the toilet is clogged and we don't want to fix it ourselves.


Ok. So you say we're not trending towards a more centralized internet because you discard all proof that that is exactly what is happening. That's fine with me but it really doesn't help to move the discussion forward.

The reasons why we are moving to a more centralized internet are what is interesting, such as the ones you rightly identified: stuff isn't always powered up, keeping a mail server up and running is work, and so on.

But none of that changes that centralization is happening.

Multiple distinct companies != peer-to-peer internet. That's what a decentralized network infrastructure used to mean, where the 'peers' were equals.

Nowadays it means clients in one camp and servers in another, and large-scale consolidation of those servers in the datacenters of a relatively low number of companies serving up the bulk of the data. If that trend continues it's not a bad or a good thing per se, but it would be good to stop and think about how desirable that is.

So from that point of view a lot of centralization has already happened.

Everybody running their own mailserver could be a good thing, presuming they can be made easy to set up and easy to maintain (I don't see any technical reason why not). Ditto web hosting: why should Facebook (or Google, or Yahoo) host all your content?

In the end, convenience won over 'peer-to-peer', there are many reasons besides convenience (firewalls, for one) but the results are here and we'll have to live with it (except for a couple of die-hard hold-outs).


What I've tried to make clear, and either failed in or you disagree with this as well, is that I don't think saying we are "centralizing" or moving towards a "centralized" internet is correct, largely because that implies we are approaching, or even still moving towards, the end-point of that spectrum, which is centralization, and that implies a single authority.

I think it is correct to say we are, or at least were, decentralizing, to a degree. I think it's correct to say that we are not fully decentralized, which we were close to initially, but I don't think it's entirely constructive to say we are moving in a direction that leads to a centralized internet, and what that implies (a single authority, even if for a single service). I think we are moving towards, or have arrived at, what we see in many markets. Large dominant players that the majority use, but with a large market of smaller players that provide for the niche needs. Take the automotive industry, for example.

I think we are largely arguing over semantics, which is something I don't want to do, but at the same time it's hard to be sure I'm not just reducing your arguments to the point there's no difference and ignoring important points at the same time.

> But none of that changes that centralization is happening.

I think it's cyclical, and there will be periods where we move along the spectrum back and forth, but I doubt we'll get as close to the decentralized end as we started at, but for many reasons. I don't think we'll get all that close to the decentralized end either though.

> Multiple distinct companies != peer-to-peer internet.

My argument has not been "we are decentralized", it's been "we are not centralized". To that effect, peer-to-peer is irrelevant to my argument, and I've tried to make that clear.

> Everybody running their own mailserver: could be a good thing, presuming they can be made easy to set up and easy to maintain (I don't see any technical reason why not). Ditto webhosting, why should facebook host all your content (or google, or Yahoo).

Because it's very, very inefficient. There are upsides to centralization (e.g. discoverability), just as there are downsides (e.g. homogeneity). I think the sweet spot that maximizes the upsides and minimizes the downsides is somewhere between decentralization and centralization.


I think the accurate statement of your opinion is not "the web is centralized" but rather, "Zipf's law sucks."

In decentralized networks there end up being accumulation points, and Zipf's law (which shows up in piles of different contexts, originally noticed in rank of words used in languages) gives a pretty good idea of how that accumulation plays out in basically an L-shaped curve. Point being that it might have a lot more to do with the structure of human networks and attention than with choice of wire protocols...
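
To make that concrete, here's a toy sketch in Python (assuming numpy is installed; the exponent 2.0 and the sample count are just illustrative), drawing "attention" from a Zipf distribution and counting where it lands:

    import numpy as np

    # Draw 100k "attention events" whose rank follows a Zipf law with
    # exponent 2, then count how often each rank shows up.
    samples = np.random.zipf(2.0, size=100_000)
    ranks, counts = np.unique(samples, return_counts=True)
    for r, c in zip(ranks[:5], counts[:5]):
        print(f"rank {r}: {100 * c / samples.size:.1f}% of all hits")

Even with no protocol favouring anyone, rank 1 ends up with the lion's share, which is the L-shaped curve in question.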


The nature of HTTP and websites makes the web centralized: there's always a server; users don't really serve data, it's always stored somewhere.

It's true that it's decentralized in the sense that it's easy to create websites, but in practice, if you shut down the DNS servers, you shut down 99% of the internet, which includes HTML websites.

And I think that a decentralized web might be easier to index (proof-of-work systems, etc).


That's not centralized, it's just less decentralized. Centralized and decentralized are on opposite ends of the spectrum. It's possible to be less decentralized and still be very far from centralized. There are many, many different entities providing all sorts of services, so I'm not sure how that portion can be seen as centralized at all. DNS, as you note, is probably the most centralized single point that everything relies on, but the root servers simply have authority because we give them authority. If DNS server administrators decided to use different root servers, there's not a lot the current authorities could do about that. But I'll concede that authoritative DNS is fairly centralized, given it requires checking with a single authority, but even then, many entities (TLDs) have a say in what that authority says (but not the ultimate say).


Well, you're right: in nature and architecture the internet is decentralized, but the use most users make of it is centralized.

If you look at what internet.org attempted to do, that's actually how the internet is used most of the time. For consumers and most small businesses, the internet is centralized. Technically, most of the internet is just HTTP requests, meaning that there will always be this duality of servers and clients. Without web servers and their admins there is nothing, and that's a form of control in my opinion: you can easily shut down a website.


I still don't see that. A centralized internet, or even a centralized "web" as has been distinctly defined elsewhere here, implies a single authority. That doesn't exist, and I don't see it existing in the future. Which email provider do you want to use? Pick from hundreds. Which social network do you want to use? Pick from the tens of candidates. Which blog platform do you want to use? Pick from hundreds again.

> Without web servers and their admins, there is nothing, and that's a form of control in my opinion: you can easily shut down a website.

There are webservers, and admins. That hasn't changed. There's been a shift to larger sites, but there are still plenty of small ones. You still have the option to put your site at many different locations, or use a platform such as Facebook, Blogger or Wordpress.


Look, here Google is trying to solve the problem of government surveillance and security. Web servers are a very weak point because you can shut them down if you have the law on your side, and recently the law has been abusive. And even if you can change your DNS, the root servers are still an important part of the internet, and they're subject to control and legal issues. Control and authority make those aspects of the internet centralized. This applies to your hundreds of mail and web providers, which are not free, by the way (datacenters). Decentralized technologies are entirely free.

What I'm talking about is protocols that make services impossible to shut down, like BitTorrent or Bitcoin. That's what I mean by a decentralized internet. Those technologies are different and were made especially with the goal of avoiding control, and they are exactly the solution to breaches of privacy. Here every computer is equal, and that's a true decentralized internet, in terms of hardware AND software. What I was talking about is generalizing Bitcoin and BitTorrent to messaging or even hosting databases.

Such software would run on many domestic computers that want to use it and host chunks of data in a redundant manner. The issue is authenticity and signing of data. But other than that, that's where the future is.
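
For the signing part the building blocks already exist; here's a minimal sketch in Python (assuming the third-party `cryptography` package; the chunk contents are made up):

    from cryptography.hazmat.primitives.asymmetric import ed25519

    # The publisher signs each chunk once; after that any untrusted
    # domestic computer can host it, and any peer can check it.
    private_key = ed25519.Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    chunk = b"a replicated chunk of somebody's data"
    signature = private_key.sign(chunk)

    public_key.verify(signature, chunk)  # raises InvalidSignature if tampered
    print("chunk is authentic")

The hard part isn't the crypto, it's distributing the public keys without reintroducing a central authority.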

I'm sorry, but I can't trust the HTML/HTTP web one bit. HTML and JavaScript are awful technologies which are slow to parse, building web browsers has been a race that resulted in no interesting progress, and web 2.0 has been a joke. All those techs are the base Google has been making its money on, and they also make data easy to mine, so to me centralization is a privacy issue.


Also, DNS servers are a pretty good example of the centralized internet. Without 8.8.8.8 your browser turns clueless pretty quickly :)


No, my browser doesn't. Google's public DNS has little bearing on how I reach sites, unless I've specifically configured it that way. Either you really don't understand how DNS works, or you are simplifying to the point of just plain being wrong.

You could argue that the root servers are too centralized, and that their control constitutes centralized DNS control, but since the only reason they have control is that all the different DNS servers use them as authorities, an argument could also be made that their control is more by convention than anything else, and all it would take is a competitor to ICANN that added some value, and eventually we could have multiple authorities. Whether that would be beneficial or detrimental is another discussion.
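
To illustrate how thin that convention is, here's a minimal sketch in Python (assuming the third-party dnspython package; 1.1.1.1 is just an example): a stub resolver happily asks whichever server you point it at:

    import dns.resolver  # dnspython

    # Build a resolver from scratch instead of reading /etc/resolv.conf,
    # then point it wherever you like; there's nothing special about
    # 8.8.8.8 beyond being a default people copy around.
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = ["1.1.1.1"]
    for record in resolver.resolve("news.ycombinator.com", "A"):
        print(record.address)

The chain of authority only matters because that server, in turn, chooses to chase referrals from the ICANN root.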


You could probably argue that Google's near-monopoly on search is a form of centralized control.


As much as people like to bandy that term about, I don't think of (less than) 68% of all searches as a monopoly. Two out of three people is a lot, but it's not nearly enough to force some sort of information control (whether that information is results, or other people exclaiming how much better their search engine works).


It's distributed, but on the continuum of centralised - decentralised it is definitely centralised. How did my comment get from my computer to yours?


Through a complex interrelationship of distinctly controlled networks that advertise routes and addresses and allow traffic based on complex business relationships (peering). The only case where that's not happening is where we both have the same ISP, and ycombinator happens to be hosted there as well. Running a traceroute from myself to news.ycombinator.com, I count two distinct networks, not including my local one and not including Cloudflare. If those networks stopped talking to each other, my packets to Hacker News would find another route, assuming my first hop had access to other peers (given time for the networks to determine a new route).
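
For anyone who wants to reproduce that beyond running plain `traceroute news.ycombinator.com`, here's a bare-bones TTL walk in Python (a rough sketch: Linux-flavoured, needs root for the raw ICMP socket, no retries or per-hop timing):

    import socket

    def trace(dest, max_hops=30, port=33434):
        dest_addr = socket.gethostbyname(dest)
        for ttl in range(1, max_hops + 1):
            # Send a UDP probe with a capped TTL; the router where it
            # expires answers with ICMP "time exceeded", naming the hop.
            recv = socket.socket(socket.AF_INET, socket.SOCK_RAW,
                                 socket.IPPROTO_ICMP)
            send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                                 socket.IPPROTO_UDP)
            send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            recv.settimeout(2.0)
            recv.bind(("", port))
            send.sendto(b"", (dest_addr, port))
            try:
                _, addr = recv.recvfrom(512)
                print(ttl, addr[0])
                if addr[0] == dest_addr:
                    break
            except socket.timeout:
                print(ttl, "*")
            finally:
                send.close()
                recv.close()

    trace("news.ycombinator.com")

Mapping each hop's address to its owning AS (e.g. with whois) is what lets you count the distinct networks.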


We're talking about the web as an application layer protocol. By your definition everything that happens on the internet is decentralised. That's not untrue if you look at it from the point of view of TCP/IP, but that's tangential to the conversation we're having.

You seem to be conflating the web with the internet.


But even by that definition, the web isn't a single application, it's many applications, some of them compartmentalized (search, social), some of them not (email), and some in between (websites/blogs). If an application were centralized, I would expect a single provider you had to use, but instead, where it's at least compartmentalized, you have a group of providers. Can you name a single service/application that you expect more than 5% of people use that has only a single provider? For search, you have Google, Yahoo, Bing, and other smaller players. Google is dominant here, but still has less than 68% of the market. For social, Facebook is the dominant player, but you yourself used a different social network to communicate on this subject, and there are many other providers whose popularity ebbs and flows. It's the same with anything I can think of. I'm not sure how this is considered centralized under any definition.


Yep, the web is a distributed system. Yep, the web offers many services, and many providers offer the same class of service.

However, each and every one of those services is centralised in a technical sense on account of HTTP. Why might an alternative be useful? Consider the solution the Google service we're addressing is putting forward, cf. Content Addressable Networking systems[0]. I can't spend any more time explaining, sorry. This might help- note the levels of centralisation in each generation of P2P systems:

https://www.cs.cmu.edu/~dga/15-440/F12/lectures/p2p-approxim...

[0] http://en.wikipedia.org/wiki/Content_addressable_network
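
The core idea fits in a few lines. Here's a toy consistent-hashing sketch in Python (simpler than CAN's d-dimensional coordinate space, and the node names are made up; real systems like Chord or Kademlia add routing and replication): the hash of the content is both its name and its position on the ring:

    import hashlib
    from bisect import bisect

    def content_id(data: bytes) -> int:
        # The address IS the content: same bytes, same name, any host.
        return int(hashlib.sha256(data).hexdigest(), 16)

    # A toy ring of eight hypothetical nodes, placed by hashing their names.
    ring = sorted((content_id(f"node-{i}".encode()), f"node-{i}")
                  for i in range(8))

    def owner(data: bytes) -> str:
        # The first node at or after the content's position, wrapping
        # around the ring, is responsible for storing that content.
        positions = [p for p, _ in ring]
        i = bisect(positions, content_id(data)) % len(ring)
        return ring[i][1]

    print(owner(b"hello, decentralised web"))

No node is special: lose one, and its arc of the keyspace falls to a neighbour.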


So, I think I'm starting to understand your argument, which is that the web is composed of many services, each of which is implemented relying on an underlying centralized authority, and you want that to change? If that's the case, then I understand the need, and agree with that point of view. But I think to say "the web" or "the internet" is centralized is a very big stretch. I wouldn't call a bunch of decentralized services with little shared infrastructure and ownership "centralized".


I'm definitely not saying the internet is centralised! Perish the thought. I never mentioned it- the discussion was to do with the web specifically.

Forget the web as a whole and consider a single service such as HN. That graph has |clients| >> |servers|. And beyond the cardinality, the client and server nodes are different in kind.

I consider a decentralised architecture to be one where the nodes can in principle participate equally.

You are arguing that the web is decentralised because there are many services to choose from. I don't disagree, but that's above the application layer protocol- which is what I thought we were discussing. In that case decentralisation happens above the application layer. So in humans? By that definition BBS's were decentralised because I could call a different one.

In other words, yes, the web is decentralised because I can choose from many Forex APIs. But at the logical application layer of HTTP, OANDA is a centralised service. HTTP addresses point to specific nodes, which may or may not be individual servers at the network layer, but from the point of view of HTTP that's what you address. In a decentralised application layer protocol I would expect that not to be the case.

That Google is proposing this service is proof that individual web services are centralised. There's a single point of failure.

We're talking at different layers. It's just semantics from here on in.


No, I'm not arguing the Web is decentralized, at least not as you are using the term. I'm arguing it's not centralized. That's an important distinction, which I tried to cover in a response in a different thread[1]. We wouldn't be having this conversation if you had said the web needs to be more decentralized, but you stated the web is centralized. not(decentralized) != centralized. This problem was then compounded by our discussion about services, where you are referring to services as individual protocol definitions, and I'm referring to them as implemented in the wild. While a protocol definition may call for it to be implemented in a centralized manner (an n-1 client-server relationship across direct communication), I'm referring to the ecosystem, which provides many, many instances of this, adding a layer of redundancy and decentralization to the service as it exists in reality. That's not as good as a well-defined decentralized protocol definition, but it is a manner of decentralization. So again, I think we were arguing points that are, for the most part, correct, but using confounding terms.

I think you would have communicated your intent better if you had said the web is not decentralized enough. I've been arguing the web is not centralized, you've been arguing the web is not decentralized (but by saying the web is centralized), and the problem is that both are true. The current situation is in between those two extremes. Arguing that the web is centralized, when it isn't unless you define your scope so narrowly that it no longer encompasses what most people think of as the "web", is counterproductive when your point is a good one and whether the web is "centralized" is irrelevant. What matters is whether there are benefits to being less/more centralized (or more/less decentralized) relative to the current state.

Edit: As a suggestion for how to refine your original statements so they are more accessible and understandable to those reading them, I suggest changing "the web is centralized" to "the protocols the web relies on require single centralized authority". It's more verbose, but it doesn't require cognitive leaps in just one of multiple possible directions to get what you are trying to express.

1: https://news.ycombinator.com/item?id=9682206


To take Google as an example: 92% market share in Europe in 2014 [1], 81% of the global market for smartphones (Android) [2] - 96% if you also add the single relevant competitor, iOS. None of this is technically centralisation. (And won't ever be, as you could always "decentralize" the web by running your own personal search engine on your home box. As long as someone is using it, Google doesn't have 100% market share.) However, it doesn't make much of a difference when you want to develop an app that doesn't get accepted into the iOS or Android app store.

But all of this is obviously beside the point that the OP made. Even if you don't want to develop a search engine or a phone app, you still have to tie your users to a central "cloud" service and web site so you can get discovered by Google. That's a huge disincentive for p2p services.

[1] http://uk.businessinsider.com/heres-how-dominant-google-is-i...

[2] http://www.idc.com/getdoc.jsp?containerId=prUS25450615


That's a great argument for how dominant Google is in the smartphone OS category, but that doesn't really say anything for whether the web is centralized. Even with 100% market penetration, there are people that opt to not use Google's included apps (such as Facebook and their messenger app).

