Hacker News new | past | comments | ask | show | jobs | submit login

Having written a fairly significant amount of SQLite virtual table module code, I think it isn't quite sophisticated enough to be useful. I regret building my application atop that virtual table functionality. Many parts of queries that are vital for efficient execution are not pushed down to the virtual table provider; for instance there is no way for "SELECT COUNT(*) FROM table" to do anything but retrieve every field of every row and return them to the SQLite engine, because the interface does not push the "COUNT" down to the virtual table provider. Even if your underlying data store can trivially answer the question directly, you have to pull all the data and let SQLite compute the answer. So on and so forth.



Out of curiosity, why would you ever run a SELECT COUNT(*)? What use case demands that versus, say, SELECT COUNT(col) where col is the name of a column in the table?


That was just a simplified example. "SELECT COUNT(col) FROM ..." has the same problem: from the virtual table provider's point of view, the query you submitted is "SELECT col FROM ...". There's no way for it to avoid retrieving the data even if it has a faster way just to count it. This is just one small example -- the virtual table interface is just too simplistic to support a lot of real-life scenarios.

Another good example is joins -- they aren't pushed down, and SQLite can only do joins as nested loops. All you can do is stream raw data into the SQLite engine and allow it to perform the join, even if you can do it more efficiently yourself.

In my case I was passing queries to another database engine, so every single thing that could be pushed down to the virtual table provider, I would have been able to take advantage of. But not enough stuff is.


eh? if you want to know the number of rows in a table select count(*) is the idiomatic way


No. That’s the way one may see the first time they’re learning what a SELECT statement is, but it’s most certainly not what you’d ever do in anything near production code.

The parent made this statement:

> ... for instance there is no way for `SELECT COUNT(star) FROM table` to do anything but retrieve every field of every row and return them to the SQLite engine.”

The only response here is, “Of course! That’s what you told it to do! Why would it do anything else—and why would you ever tell it to do that?!”

Edit: To be clear and not sound as though I’m just being dismissive, you can of course use `COUNT(star)`, but the parent made it sound like this was something non-trivial. There are better ways to write such a query when you care about the exact count of rows and, assuming you can’t depend on grabbing the max primary key due to deletions, you’re complaining about the database wasting cycles pulling back irrelevant info. Get to know your database, as it often has features to query such metadata in a much smarter and more performant way.

PS: I replaced the asterisk with the word “star” because it’s making the formatting of this comment atrocious. Apologies for all the edits. I give up now.


I'm sorry, you've misunderstood. I'm fully aware that you can query it more efficiently, and in fact that's the point. If you're writing a virtual table provider, SQLite does not provide enough information about the actual user's query for you to know that you can do it more efficiently. There's no way to know that the user wants a count at all; the count function is not pushed down into the virtual table provider interface. SQLite only tells you that it needs you to retrieve every field in every row. You can't know that, in actuality, SQLite only wants to count the rows.


Ah. My apologies. I did misunderstand what you were calling attention to.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: