Hacker News new | past | comments | ask | show | jobs | submit login
Compiled or Bust? (codinghorror.com)
49 points by fogus on March 19, 2010 | hide | past | favorite | 23 comments



It's interesting to me that he's got a top 1000 site with standard tools. No "NoSql" or the other such echo chamber flavor of the day fads.

Can get away with this because c#, .net, asp.net and sql server are pretty darn fast?

<fud> It's been my experience that the standard 2 tier app with a compiled language is usually pretty fast out of the box. Databases are an _almost_ solved problem. The db folks have been at it a long time and you can wring some pretty amazing performance out of sql server|oracle|etc.

Problems: php is usually slow as sh!t and requires something like eAccelerator to provide decent performance.

Java could be good, but the App Server/Broker/N-tier architecture usually kills performance rather than enhancing it.

Just me experience. No hard science here. </fud>


Yeah, the MS stack is surprisingly fast out of the box, so long as you avoid using pretty much everything that comes with ASP.NET in the "Rich Controls" department.

We accidentally launched Twiddla and survived our first two Redditings while serving the site from a desktop behind a Business DSL line. Surprisingly, it never batted an eye handling the load.

Probably shouldn't admit it, but up until we finally moved to a real box at a datacenter, the production server doubled as a gaming machine. Even with FarCry running, we never had an issue keeping up with load!


Some domains suit the relational database model better than others. If he was doing a huge amount of machine learning-type stuff he might have a different architecture.

Then again, given that he (like most people, myself included) has a lot of RDB experience, he designed his application around what RDBs do well. Perhaps if he had 10+ years of experience with distributed hash tables he might have designed a site with a different feature set.


Yeah, but services these days that are announcing the usage of a NoSql storage aren't doing any machine learning, they are just storing comments, tags and relationships.


There's real confusion in this post.

Rico's statement "Each execution builds the expression tree, and then builds the required SQL." is talking about turning a LINQ expression tree into a SQL string before SQL Server ever gets a chance to reach into its query cache.

Jeff's conclusion "The computational cost of pre-processing a given query is only paid the first time the database sees the new query....Not so in Linq to SQL." is just plain talking about the wrong thing.

SQL Server doesn't care what abstraction generated the SQL query that it's receiving, it'll still use the same old cache.

Rico is legitimately discussing the time taken for the web server to compile a query to SQL, not the time for a SQL Server to look up a cached or uncached query.


"In short the problem is that the basic Linq construction (we don’t really have to reach for a complex query to illustrate) results in repeated evaluations of the query if you ran the query more than once."

It sounds to me like Rico is advocating the prepare/execute model. With ODBC, it's possible to generate a query plan on the client side. What Rico is saying is that it generates a query and then sends the raw query to the database. This would have been a valid point before SQL Server 2005, but according to Microsoft "the prepare/execute model has no significant performance advantage over direct execution, because of the way SQL Server reuses execution plans."

http://msdn.microsoft.com/en-us/library/ms175528.aspx


I was a bit confused about this as well. The DB builds a query plan from the SQL. Linq builds an SQL query, so they're two parts in a chain, not interchangeable.

The confusion seems to come from "preprocessing" the query twice. LINQ adds another layer of preprocessing.


how does a sql engine cache [results and plans given] queries?

i would naively assume that there are two caches (ignoring compiled queries - see comment below). one, up front, is a literal cache: repeat a query text exactly and bang, you get the previous results.

but then it would make sense for a second cache based on the structure of the ast after parsing a query. if the structure is the same, but some parameters have changed, then it should be possible to re-use much of a previous [plan].

if i understand the article correctly, it's saying that this doesn't happen. the only way to get this is to use a "compiled query" that you then call with a variety of parameters.

is that right? if so, why? seems like my second level cache wouldn't be that hard to implement...

[edit: i realise this wouldn't avoid the needless passing from linq to text and back to ast, and i agree with the general tone of the article that you need to justify worrying about optimisations - i'm just curious about how sql engines work]

[edit2: clarified by explicitly mentioning plans]


What happens is this: There is something called a "query plan." For trivial queries, it is the same as the query. But for more complex queries it works out how to use indices, joins, and so forth. It's kind of a lower-level query language. Figuring out the query plan can be costly for complex queries.

The database maintains a cache of query plans, so when you give it the same query, you get the query plan back right away and the database runs it.

There are data caches for other reasons, but the problem Jeff is describing is that the C# VM must interpret all the steps of building a SQL query in text every time it's run. The database may have the query plan cached, but even though the code never changes, building the SQL query string happens every time.


i thought he was saying more than that.

i thought that he was saying that a new plan is always constructed. what i was saying is that a new plan only needs to be constructed for "sufficiently different" queries, where "sufficiently different" can be something much more stringent than the hash of the literal text.

if we're only worrying about the parsing time (as opposed to the time to build and optimize a plan) then it's even more pointless to worry about this (i'm assuming parsing is significantly faster than actually building a plan).

[edit: i'm confused who is who here; have swapped person twice, hope this is correct]


I think the database is smarter than a simple literal text for query plans. I think it reverse-engineers literals so that if you query WHERE t.initials = 'ac' and then later query WHERE t.initials = 'rb', it re-uses the cached query plan.

But the point of the article is that the problem is not in the database, it's over in the CLR which is spending a lot of time composing the SQL text that gets sent to the database.


UPDATE:

In Oracle, when you use literals any change to the text results in another hard parse on the database side. However, you can use bind variables instead of literals and it can re-use the query plan even when you are searching for different values.

http://www.akadia.com/services/ora_bind_variables.html

The OP is talking about LINQ. I assume that its Oracle implementation is smart enough to use bind variables, but I am not going to look that up right now...


the article gives an example of how to do compiled queries. which strongly implies that "ordinary" linq queries are not compiled as you seem to expect.


In the real world, this probably doesn't matter. Literals always go in as parameters, so the query text itself stays the same.

(usually it's simply "exec User_GetBySandalCount @sandalCount", 4)


not if linq is constructing the queries, according to the article, which i think is what we were discussing.


You started this thread by asking about SQL Server itself (and the way it handles query plans in particular). That's what we've been talking about ever since. Not Linq.

You might want to go back and read the responses to your question again to make sure you've got your context right.


i see your point, and it was wrong of me to describe the context in that way.

on the other hand, your previous statement was qualified with "in the real world", and to some extent the "real world" includes people using linq.

having said that, i agree that the consensus - if i haven't misunderstood something else - seems to be that sql engines don't do what i expected because it's assumed that people will use compiled statements. i don't use linq myself so am having to infer a lot from the article, but at some point i plan to check django and sqlalchemy to see when they use precompiled statements and when not.


ah, ok. thanks.


I know the article is about a MS stack, but this is not true for MySQL. It's text based so even an extra space causes a miss on the query cache.


What happens in MySQL when the queries are character-for character identical except for literals, e.g WHERE t.foo = 'bar' vs. WHERE t.foo = 'blitz'?

update: NVM!

http://dev.mysql.com/tech-resources/articles/mysql-query-cac...


From the conclusions: Only identical queries may be serviced from the cache. This includes spacing, text case, etc.

Which is disappointing. I will go look at Postgres... and it seems to cache only at a lower level (indices, tables etc).


If you cache on the level of SQL-statement strings you can skip the transformation into the AST. It's very cheap to quickly hash the string and look it up. If you have a site that is going to use the same queries over and over (e.g. for front-page data or popular categories) then the query will likely be exactly the same and the string hash is cheaper. If you had to build an AST for every single query you're just adding an extra layer of complexity that won't gain you much.

The only reason I can think of for a query with different literals in the conditionals to have a different plan would be if a histogram of frequencies was actually kept such that different literals created large differences in result sizes which then get fed into a join and affect the join logic. But that's going to be pretty rare and is a level of optimization that is unnecessary. It's also not going to make a huge difference unless the query is very complex, because the plans for simple queries are... simple.


either i've misunderstood, or you're missing the point.

if you cache the statement then you get a miss if the different literals change. remember that (again, unless i have misunderstood) we are not talking about compiled statements, but about linq generate the "complete text".

i agree that the same plan would normally be used; the problem is hitting the cache to find that plan if the value of a variable changes.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: