Hacker News new | past | comments | ask | show | jobs | submit login
What Is a Query Planner? (planetscale.com)
86 points by samlambert on Dec 15, 2022 | hide | past | favorite | 48 comments



Is this different from 'compiling sql'?


A query planner will consider the idiosyncratic properties of your data to determine the most efficient way to execute your query, whereas a compiler is generally blind to the data your program will be processing.

So if you execute the query "SELECT * FROM (a, b) WHERE a.foo = b.bar", if you have many rows in `a` but few rows in `b`, then it's much for efficient to scan `b` than `a`. The query planner will keep track of properties like this & come up with tricks to speed up execution.

But in the sense that "everything is a compiler", yeah you could totally think of a query planner as a compiler that takes in your query's AST and a statistical description of your data and lowers it to a query plan.


This seems pretty similar to profile guided optimization that say Java or .Net or Javascript can do since since it has access to runtime data during JIT Compilation.


I hadn't made that connection but it's an excellent one!


I think I'd say yea because AFAIK/IIRC query planners tend to use additional environment informations to make better plan. Maybe it would be closer to JIT?


Query planners often choose the algorithm based on data statistics (as described in the article).

Compilers generally just make a lot of constant-factor improvements without changing the algorithm. One exception might be the tail-call optimization, which changes the space complexity of an algorithm. And that's one of the optimizations where the developer needs to know for sure whether it will br applied or not.


I've always thought of it as closer to an optimizing JIT compiler than just a regular translation of instructions to instructions.


Not quite. A query plan is usually represented as a tree of steps to output the desired result. Each node would be a high level operation (e.g. a sort) or source of data (e.g. read rows from table), possibly pulling from other nodes beneath it.

The actual compilation of the plan to machine code is possible and a few database systems do exactly that. But most process then nodes themselves or JIT specific node types represent simpler or more tightly defined operations.


Don't most planners also take table and/or index statistics into account? AFAIK for most of the commonly used DBMSes, the same query will result in radically different query plans if the table contents are different enough.


I think they meant it in a more abstract sense; also see relevant sister threads.


> Anyone that has worked with large databases can testify how slow queries can get. This is often due to the necessary indexes not being there, or something in the query that stops the database system from using the index. Choosing the right indexes to use, and the right order to fetch data in, proves to be the difference between a 10ms and 5s query.

This seems incredibly intuitive. Anyone reading it would get the impression that query speed is very important for database users and that they would be eager to investigate any new technology that promised a substantial improvement in query performance.

Yet, I have found the opposite. I created a new kind of database that shows incredible performance gains when compared with mainstream databases like SQLite (https://www.youtube.com/watch?v=Va5ZqfwQXWI) for simple and complex queries against tables of all sizes. It is able to do this without needing to create separate indexes on all the columns in the WHERE clause. The software is in open beta and available for free download at (https://www.Didgets.com) but it is like pulling teeth to get people to try it out.

If I post something like this on HN, I usually get lots of comments about how my claims are impossible (or at least improbable) but few are willing to actually test it themselves on their own machine with their own data set. I have several videos that demonstrate how quick and easy it is to load and analyze data, but interest has been minimal so far.

I feel like the inventor of a new kind of car engine that claims to get 2x the MPG while delivering the same horsepower, but can't get anyone to take the free test drive and prove me right (or wrong).


Query execution speed is an important aspect to consider when picking a database, but by no means the be-all and end-all for most applications' needs. Your page is incredibly light on details about what the product can and can't do which makes it impossible to evaluate whether it's worth my time to install it (which I can't anyways since I don't have a Windows machine) and look. If a database product website can't tell me basic things about it like: does it support transactions? What SQL syntax is supported? Is it row or column oriented? What wire protocol is used and what clients are supported? It makes it difficult for me to imagine where the product could be useful.


There are several demo videos on the youtube channel (https://www.youtube.com/@didgetsdemos) that should give you a good feel for what it can do without having to install it.


I'm not sure who you think your target audience is, but I can tell you that multi-minute screen recordings are probably about the worst possible medium for conveying information about your software to developers.


I think this is overly harsh. Software developers do sometimes enjoy watching software project videos as a kind of entertainment/edutainment. It takes some skill to make those videos draw someone in, though.

For getting a quick overview of the advantages across it's probably easier to craft something compelling in writing.


I don't really want to have to watch 10+ minutes of screencasts to understand what a product can do and how it should be used. Here's a product that likely has some feature overlap with yours: https://duckdb.org/ Its homepage instantly conveys its value proposition


Well, I can find exactly 0 information on how you've implemented this on your very own website.

It's natural to be skeptical of someone who is afraid to share their implementation details but claims best-in-class results, isn't it?


Didgets is a little startup trying to get some traction. I am definitely not 'afraid to share implementation details', but I have not done a great job so far of prominently posting a lot of technical specifications and details on the website.

So...are you promising to try it out if I tell you how I did it?


>So...are you promising to try it out if I tell you how I did it?

Think of it this way - that kind of information is required for me to even consider something like this.


Didgets is a new technology. While the core technology has proven to be very flexible and performant, my resource constraints have limited what the current product can do.

It is not yet a 'drop-in replacement' for an enterprise, distributed database. It can't yet do everything a file system can do. Instead, I have built a tool on the technology that can do some very interesting analytics against large DB tables. It can create pivot tables in seconds. It can help you find anomalies and clean your data quickly. But some other features like file indexing and log management features still need a lot of work.

If you have an immediate need for the polished features, then it can be very useful as it is. If you have a need for features that are only partially finished, then I need feedback, testing, and resources to get them working properly.


> analytics against large DB tables. It can create pivot tables in seconds.

There are dozens, if not hundreds of products in this space. A popular example is Power BI, where even the desktop edition can produce extremely fast pivot table reports over about 10 GB of data.

The server edition can produce sub-second reports on terabytes of data, not a mere couple of gigabytes like in your demos.

You'd have to demonstrate a significant improvement over the existing state of the art for anyone to be interested.


I never make promises.

I also never use tech that doesn't share "how it works".

Clear?


Crystal! I have have never met someone who knows all the details about every piece of technology they use. You must be very smart.


I think it’s more of

If someone tells you they’re faster than their well-funded, well-developed peers, then either:

1. They’re dramatically better engineers

2. They’ve constrained their data model to make it faster, likely at the cost of some usecases

3. They’ve developed some novel theory or set of tricks

4. Their benchmarks are limited/biased, or plainly wrong

5. They’ve gotten to drop backwards compatibility constraints their competitors couldn’t

With usually the answer being 2 or 5 if they’re legitimately faster.

In your case, based on glancing at videos

1. You seem to have widened the data model significantly

2. You’ve got no claims to novelty except the notion of didgets, which is an abstraction I’d expect to cost a pretty perf penny

3. You’ve got no claims to particular competency

So… I’m instinctively betting on 4.


I don't blame you at all for betting on 4. There are plenty of snake-oil salesmen out there. That is why I invite you to download the software; load in your own data set; and perform whatever benchmark test you like. Don't trust me. Trust your own eyes.


The problem is that I need to be first convinced it’s worth bothering with (though you won’t convince me personally, because I don’t have any need for a better-performing DB and I’m lazy).

But if my theory is right, you’ll have a lot more success getting people to test it if you resolve the other items. Which I think you could do in probably two paragraphs on the site.

Even if I could convince myself that it’s faster, I still wouldn’t be comfortable with it if I couldn’t explain, even at a high level, why it’s faster. And I would never assume it’s faster for anything except exactly what I tested (which inevitably is only somewhat representative of what I actually use it for)


> someone who knows all the details about every piece of technology they use

> "how it works" in quotes

Well, see, I can reason about SQL semantics via knowledge of B-trees, and I can understand how there's a time vs space tradeoff made in SQL implementations around indexing - add indexes to save time, but they write to disk.

Didgets... I don't even know what it is. Do I query it with SQL? Does it only work from your UI DB browser? I could watch videos to find out but I find you a bit combative, so my desire to bridge the gap from my end is pretty low atm


My impression when browsing your site is that your product is in the same space as MS Access. It may be that your target audience is typically not on Hacker News.

I don't want to be critical but do want to give some advice. The fact that most information is embedded in videos detracts from the ability to understand your product. Kicking the tires on technology may be free but it's also a time investment thats competing against other priorities.


I have very limited resources so I have devoted everything I have to make the code as performant as possible and make it easy to import data in a variety of formats.

I have little sales and marketing experience so I haven't done enough yet to promote it with proper documentation and other literature. I thought that the videos were the best way to show its speed and flexibility. People can see with their own eyes how to do something and also see it isn't just vaporware. Clearly, by little startup needs a lot of help to compete with the 'big boys' even if the underlying technology is superior.


Well, you've got the performant code part down. Now for the part which may seem harder: the somewhat harsh reality is other people won't like what you've done until they can understand what you've created.

Duck DB's landing page has a lot of great lessons about how to pitch your idea so that it will land well. For one, notice that it is much more concrete than Didgets':

"DuckDB is an in-process SQL OLAP database management system" vs. "A WHOLE NEW WAY TO MANAGE ALL KINDS OF DATA: Beta testers wanted"

Any creator will like their idea because they can see the novel value. But other people probably aren't as interested that it's new. They want to know what it is in the most concrete terms possible.

One other thing: is anything stopping you from putting the results you showcase in your videos into a table on the homepage?


> Yet, I have found the opposite

No, you haven't. You have found that so far, you have not convinced people to look into your product.

For reasons you are explaining yourself here: https://news.ycombinator.com/item?id=34008598

No matter how good your code is, it will go unnoticed without proper communication. Written. Especially if we cannot read the code.

> you don't have the time and skills necessary to import it into a big database system, then maybe we can help.

You are specifically telling people "This is for noobs, if you use a real DB you are not the target", so of course people with DB-related perf issues (and who most probably already have a "big database system") will pass. This is far from the only blocking issue on your website, in your HN comments and your HN description though.


The intent was to say that the code is not yet ready to fully replace your current database. Other systems generally have decades of development and although Didgets does many things, it has not replicated full database functionality at this point.

But it does allow you to quickly import a data set into a table and do a variety of analytic or cleaning operations on it and it is easy enough to do that even a novice without any SQL skills could do it.

Our hope is that enough people will notice its strengths that we can get the resources needed to finish the remaining pieces.


"Prove me wrong" is a phrase you've used in previous HN posts as well. It has the opposite effect of what you hope it has.

Elsewhere on HN you mention that you're looking for "innovators" and "early adopters", but your target market seems to be "database neophytes". Is there a market at the intersection of those two sets?

In other posts and comments across the internets, you don't seem to have decided if you're creating a database or a filesystem replacement. Once you have internal clarity on that, you'll be better at talking about it with people who might find value in what you're building.


The system is a general purpose data management system. It was originally designed as a file system replacement where you could store a few hundred million files in a single container; put meta-data tags on each of them; and then be able to find subsets of them (e.g. all photos with more than 10MB of data and with tag .person.FirstName = Robert) in just a second or two (even if the query produced a few million results).

The objects I invented to store the meta-data tags turned out to be very performant Key-Value stores that could also be used for other purposes. It turns out they could also be used as columns in a relational database table. I think I was surprised as anyone when queries against my tables that use them turned out to be substantially faster than for other databases with highly indexed tables.

I am sorry if the term 'prove me wrong' came across in a bad way. I guess I just got tired of people telling me my system couldn't possibly work and refuse to put forth any effort to try it out. In some ways I feel like a Tesla salesman surrounded by people who are sure it couldn't possible be better (or even different) than a regular car, but won't get in and take it for a spin around the block.


Seem to me like an interactive Demo would be a good next step.


The core thing here is that people simply do not care. There is nothing that incentivizes me to spend one hour figuring out how to clone your repo, compile it, read the benchmarks, and run the benchmarks. There would be much less doubt if your website had any details on what your product is, how it works, and why it's faster, how to migrate to it, that someone can read in 30 seconds while sitting on the bus.


I asked you some technical questions on this forum or another, it was some time ago. You were answering others around me so I got the distinct impression you really didn't want to answer. Wish I could recall the question haha. I vaguely remember you had a benchmark on the site or videos that really didn't show what you thought it showed.

I don't want to poopoo your product, but the material you've provided does not give me confidence to try it. Claims are all very well, you need to show your working and get it right.

Something like postgres has a lot of people working on it, plenty who understand it and the various performance tradeoffs. Remember someone demoing how postgres could achieve much better performance than some competing models by turning off all the important stuff that makes it transactional and atomic and reliable.


All the videos are still available. I would be very interested to know which one you think did not show what I thought it showed.

> Claims are all very well, you need to show your working and get it right.

What does this mean? I have un-doctored videos that show various features in action. How do you suggest that I otherwise 'show your working and get it right'?


I agree with the other commenters; I honestly don't think you've figured out how to productize this or even how to communicate it. The database market is saturated with all sorts of solutions, and you don't seem to have found any kind of sweet spot.

As a fairly seasoned database application developer, I'm perplexed by your web site. It's very sparse and requires the reader to watch a bunch of videos. Even then, there's a lack of clarity about what the product does. Apparently this is some kind of Windows-specific desktop application that can load data and let you query it? Is that the only way? Is there a CLI version I can run on Mac or Linux? Is there a server version? What kind of data can it deal with? How many columns and rows does it scale to? And so on.

On the surface, a slick tool to quickly explore data "ad hoc" — as opposed to being the backend for an application running many types of queries — would be useful, but what exactly is the value proposition? It can't be the GUI, which seems rudimentary. You focus a lot on performance, but if I load a bunch of data into SQLite, it doesn't really matter to me if one query takes 100ms or 2000ms to run a query. That's a "one sip of coffee" level of annoyance that is not going to drive me to investigate alternatives. On the other hand, if I have gigabytes — or even terabytes or petabytes — then the problem becomes less tractable, but there are good solutions (PostgreSQL, BigQuery) that I can use to run complex queries in seconds, using my full set of SQL skills. Your web site gives me zero confidence that it's built for that kind of scale and that I would be productive using it in such a case. So what kind of sweet spot does it fill?

Honestly, I think you would get more traction by making it open source and spending more time just describing its possible utility, rather than complaining that people don't understand. The hacker community tends to judge by merit alone, not claims. It is your job to convince people; it's not people's job to convince themselves. As a developer, I'm mildly piqued by the prformance claims, and I would love to learn more about how it organizes and queries data, but I get really suspicious when all that's offered is a magical black box. Even commercial, closed-source databases like Oracle and SQL Server describe their internals in a fair amount of technical detail.

With all due respect, you also come across as combative and even somewhat rude here on Hacker News. If you're not getting the responses you want, maybe there's a reason for that.


Thanks for your feedback.

The database engine is cross-platform and is capable of running in a number of environments, but so far we only build the Windows version of the browser application. The engine has an API that the browser app calls. Any other app could also call it, but so far we just have a few test programs that run outside the browser GUI. We built a Linux version and tested it, but at this point we just don't have the resources to build and support multiple platforms for every revision.

I have personally built tables with over 100 million rows and over 2000 columns, so it scales pretty well.

I definitely need to work on presentation materials so that developers can better understand what it is and where it can go. I really don't mean to come across as combative, it is just a bit frustrating when people criticize something that they have not personally tried.


If performance is your biggest selling point, consider running some of the industry standard TPC benchmarks on your database. They have a bunch of variations and database people are used to looking at those benchmarks.


This description and some benchmarks on standard TPC-H or TPC-DS setups should help promote ots advantages. See for example: https://www.databricks.com/blog/2021/11/02/databricks-sets-o...


SQLite is very far from what I would call a “mainstream” database, it is only mainstream in being well-known but architecturally has not much in common with MySQL, Postgres, Oracle, MSSQL, etc. which are the more traditional dbs.


Then you should like this video (https://www.youtube.com/watch?v=OVICKCkWMZE) that does a direct comparison with Postgres.


What guarantees can you give me that your database won’t eat my data? Query speed is important, but I’m conservative with my choice of data store for a reason: its the worst possible thing to find a bug in.


Sorry, but I can't guarantee that there isn't a bug in it that could corrupt data. While I have run thousands of tests against it, I don't think any database startup can offer you a solid guarantee that you won't find a bug (especially in the first few versions).

I take data integrity very seriously and it sounds like you do as well. You would make a good beta site.


So, I feel ya. I'm inventing a new thing, and competing in the data space is hard. Can you focus on a user/b2b product to drive your DB?


I agree that competing in the data space can really be hard. I have worked with data (file systems, databases, etc.) for 40 years. Some of the products I worked on were successful (PartitionMagic, Drive Image) but others were not as impactful.

Whenever you offer an alternative to a mature technology like database, it can be a real challenge. Your product is expected to do everything that the well-funded and long developed product can do, and more.

Good luck on your venture.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: