
The author is using SQL in the colloquial sense, where it stands in for an ACID-compliant relational database. He's not confused.



@zapher: Could you please explain this comment:

"SQL doesn't have the notion of "distribution" built into the language. This can be added into the database, but it isn't there in the language."

Are we expecting SQL itself to need to support "distribution"?

I write database software on a very small scale, with low transaction volume in the classic sense (select, update, etc.), and I've never needed sharding. But one thing that has recently come up in my world is the need for data to be mirrored at geographically separate locations, for both selects (reporting) and updates.

I used software and database technologies to overcome these, but never any crafty SQL calls, like inserting into two or more databases to keep them in sync...
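
For illustration, here's roughly the kind of thing I mean, sketched with Postgres's postgres_fdw extension (newer Postgres only; all names are made up, and this is no substitute for real replication):

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER mirror_site
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'mirror.example.com', dbname 'app');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER mirror_site
        OPTIONS (user 'app', password 'secret');

    CREATE FOREIGN TABLE orders_mirror (
        id    bigint,
        total numeric(10,2)
    ) SERVER mirror_site OPTIONS (table_name 'orders');

    -- the "crafty SQL" part: push rows to the remote copy by hand
    INSERT INTO orders_mirror
    SELECT id, total FROM orders WHERE id > 1000;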

Are the DB engineers of the world trying to evolve SQL to handle these things at the SQL level?

One example: Oracle and Postgres both have streaming replication that has nothing to do with SQL (yes, I know you can use SQL commands in Postgres to start and stop it, but that's Postgres-specific). As far as I know, these are replication technologies that sit below SQL.
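
A rough sketch of what that Postgres-specific, SQL-level control looks like (names are made up; the logical replication commands only exist in newer releases):

    -- on a streaming-replication standby (Postgres-specific functions;
    -- newer releases call these pg_wal_*, older ones pg_xlog_*)
    SELECT pg_is_in_recovery();       -- am I a replica?
    SELECT pg_wal_replay_pause();     -- temporarily stop applying changes
    SELECT pg_wal_replay_resume();    -- resume applying changes

    -- newer Postgres also exposes logical replication through commands,
    -- but again this is Postgres-specific, not part of standard SQL
    CREATE PUBLICATION app_pub FOR TABLE orders;      -- run on the primary
    CREATE SUBSCRIPTION app_sub                       -- run on the mirror
        CONNECTION 'host=primary.example.com dbname=app'
        PUBLICATION app_pub;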

So I was genuinely curious and thought I was missing something; I still kind of do, as I haven't stayed very current with the state of the art.


I'm certainly no expert but I can take a stab at it.

I think the author was talking less about the syntax and more about the semantics of SQL there, specifically the query plan. For instance, a SQL statement gives no indication of which fields are indexed; you don't even have to filter on a primary key to find the data. So the database has to combine the query's AST with metadata about the tables and fields you reference to figure out where your data lives in a distributed database.

Contrast that with the Redis example, where the client must specify a key that Redis then knows how to map to a shard. That logic is much simpler and easier to implement.

Redis has sharding built into the way you query it, which lets it keep the logic of distributed queries simple.

SQL has no notion of sharding built into it, which complicates the query-planning logic when data is distributed.
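
To make that concrete, here's a rough Postgres-flavored sketch (newer declarative-partitioning syntax, made-up table names) of where the distribution information lives versus what the query actually says:

    -- the table definition (catalog metadata) carries the hash key...
    CREATE TABLE users (
        id    bigint,
        email text
    ) PARTITION BY HASH (id);

    CREATE TABLE users_p0 PARTITION OF users
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    -- ...and users_p1 .. users_p3 likewise.

    -- ...while the query itself names no shard or key; since the filter
    -- isn't on the hash column, the planner may have to visit every
    -- partition to answer it
    SELECT * FROM users WHERE email = 'a@example.com';

    -- contrast with Redis Cluster, where the client's key picks the shard:
    --   GET user:42   ->  slot = CRC16('user:42') mod 16384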


Streaming: it really depends on the use case and its scale: denormalized data warehousing vs. financial transactions vs. MMORPG user state, etc. For some apps, a subscribe/notify pattern for responding to streams of messages (à la MQTT) might be more workable.
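
(Postgres's LISTEN/NOTIFY is one SQL-level example of that kind of pattern, roughly:)

    LISTEN order_events;                       -- in the subscriber session
    NOTIFY order_events, 'order 42 shipped';   -- in the publisher session
    SELECT pg_notify('order_events', 'order 43 shipped');  -- same, as a function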

An explanation of Postgres / Oracle similarities:

Postgres Plus Advanced Server (the commercial product) aims to be a drop-in replacement for Oracle, so it's no surprise that Postgres functionality tends to overlap with Oracle's. That also makes it safer to choose Postgres as a default starting point, because the migration paths to Oracle / Postgres Plus / etc. are there.

http://www.enterprisedb.com/products-services-training/produ...

Disclaimer: I'm unaffiliated; I'm plugging Postgres Plus because it likely subsidizes Postgres OSS development. (The Postgres OSS code is generally well-engineered.)


SQL has three parts (at least):

- the description of your data (CREATE TABLE basics; this is called the Data Definition Language, or DDL, in the standard)

- the queries, which are the first thing people think of when they hear 'SQL'

- all kinds of extra information that controls efficiency, resilience, etc. ((partial) indices, choice of storage engine, whether to materialize views, how to grow database files, and so on)

That last part could easily include ways to specify automatic replication (though it probably wouldn't be standardized any time soon).
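
A rough, Postgres-flavored illustration of the three parts (table and column names are made up):

    -- 1. DDL: describe the data
    CREATE TABLE orders (
        id          bigserial PRIMARY KEY,
        customer_id bigint NOT NULL,
        total       numeric(10,2),
        created_at  timestamptz DEFAULT now()
    );

    -- 2. Queries: what most people mean by "SQL"
    SELECT customer_id, sum(total)
    FROM orders
    WHERE created_at > now() - interval '30 days'
    GROUP BY customer_id;

    -- 3. Efficiency / physical concerns (largely vendor-specific)
    CREATE INDEX big_orders_idx
        ON orders (customer_id)
        WHERE total > 100;                       -- partial index
    CREATE MATERIALIZED VIEW monthly_sales AS    -- materialized view
        SELECT date_trunc('month', created_at) AS month, sum(total) AS revenue
        FROM orders
        GROUP BY 1;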


Some non-SQL query languages have explicit constructs that deal with distributed cases. For example, SPARQL can do federated queries using the SERVICE keyword. [1] gives an example.

[1] http://stackoverflow.com/questions/14980386/connecting-linke...



