
Hey Simon! Thanks for sharing this. I have long admired your work on Datasette :) We will check out your posts for ideas on coping with prompt injection.

I just came across your recent post on the ChatGPT SQL function in SQLite [1]. We recently added a ChatGPT-based UDF to EVA [2]. I would love to hear your thoughts on the differences between the two approaches.

Another coincidence is that EVA uses SQLite for managing structured data by default. Could EVA's SQLite database make an interesting use case for Datasette?

[1] https://simonwillison.net/2023/Apr/29/enriching-data/

[2] https://github.com/georgia-tech-db/eva/pull/655


The approaches look pretty similar. My chatgpt() function is pretty much the most basic possible implementation of that pattern - it's just a SQLite custom SQL function written in Python.
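
Roughly, the whole thing looks like this - a sketch rather than my exact code, assuming the pre-1.0 openai Python package with an OPENAI_API_KEY environment variable set (the database and table names are made up):

    import sqlite3

    import openai  # reads OPENAI_API_KEY from the environment

    def chatgpt(prompt):
        # Minimal UDF: send the prompt to the chat API, return the reply text
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return response["choices"][0]["message"]["content"]

    conn = sqlite3.connect("data.db")  # hypothetical database
    conn.create_function("chatgpt", 1, chatgpt)

    # Then in SQL, e.g.: SELECT chatgpt('Summarize: ' || body) FROM articles;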

You should absolutely try pointing Datasette at that SQLite database, I imagine it would work really well!


Thanks so much for sharing your thoughts! I also felt that the two approaches are pretty similar. But I am guessing that SQLite (like most relational database systems) does not automatically cache the results of functions, do non-trivial cost-based optimization of function calls in queries, or reorder function-based predicates based on the estimated cost of running them.

Edit: I have shared more details on the function-aware optimization in EVA in this post (in case you are interested) -- https://news.ycombinator.com/item?id=35764355#35773608

Sure, we will try it out and keep you posted :)


You can cache function results yourself in Python if you want to - my implementation also sums up the tokens used by the calls to the functions.
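Something like this would do both - again a sketch along the lines of the function above, with the token total kept in a global:

    import functools
    import sqlite3

    import openai

    total_tokens = 0  # running total across all API calls

    @functools.lru_cache(maxsize=None)
    def chatgpt(prompt):
        # Repeated prompts hit the cache: no second API call, no extra tokens
        global total_tokens
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        total_tokens += response["usage"]["total_tokens"]
        return response["choices"][0]["message"]["content"]

    conn = sqlite3.connect("data.db")
    conn.create_function("chatgpt", 1, chatgpt)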

Influencing optimization isn't possible using regular Python-based custom SQL functions, though. I think you can influence that in SQLite if you create more complex virtual table functions, but those aren't exposed through the regular Python sqlite3 module yet.


Thanks for the clarifications. Token summation is a cool optimization :)

Query optimizers in SQL database systems typically cost a function by the time it takes to execute on a local server. Token summation generalizes that time-based costing of local functions to dollar-based costing of remote functions.

Time-based cost model: cost(FunctionFoo(input 1)) = 2x cost(FunctionFoo(input 2))

Dollar-based cost model: cost(ChatGPT(prompt with 100 tokens)) = 2x cost(ChatGPT(prompt with 50 tokens))
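
Since tokenization is deterministic, the dollar cost of a prompt can be estimated before calling the API. A sketch using OpenAI's tiktoken library (the per-token price is a placeholder, and this counts only prompt tokens, not completion tokens):

    import tiktoken  # OpenAI's tokenizer library

    PRICE_PER_1K_TOKENS = 0.002  # placeholder; check current OpenAI pricing

    def estimate_cost_dollars(prompt, model="gpt-3.5-turbo"):
        # Count tokens the same way the API bills them
        enc = tiktoken.encoding_for_model(model)
        n_tokens = len(enc.encode(prompt))
        return n_tokens / 1000 * PRICE_PER_1K_TOKENS

    # A 100-token prompt costs an estimated 2x a 50-token prompt,
    # matching the dollar-based cost model above.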

We are also exploring dollar-based optimization in EVA, and will check out your openai-to-sqlite tool for ideas [1].

[1] https://datasette.io/tools/openai-to-sqlite