Strict Tables – Column type constraints in SQLite - Draft (sqlite.org)
331 points by thunderbong on Aug 21, 2021 | 107 comments



I've created ~five new projects over the past year using SQLite and I've got into the habit of creating tables like this:

  CREATE TABLE tIssue (
    id   INTEGER PRIMARY KEY NOT NULL CHECK (typeof(id) = 'integer'),
    col1 BLOB NOT NULL                CHECK (typeof(col1) = 'blob'),
    col2 TEXT                         CHECK (typeof(col2) = 'text' OR col2 IS NULL)
  );
This has saved me from a lot of errors and has been useful, but it's just so easy to let things drift, e.g. removing a `NOT NULL` constraint and forgetting to also update the check constraints. I'm also horrified at the auto-conversions that I've just learned about from this link that I'm not protected from.
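
For anyone who hasn't seen those auto-conversions in action, here's a minimal sqlite3-shell demo of what an unconstrained INTEGER column accepts; note that the second insert succeeds and silently stores TEXT:

    sqlite> CREATE TABLE demo(n INTEGER);
    sqlite> INSERT INTO demo VALUES ('42');    -- coerced to integer 42
    sqlite> INSERT INTO demo VALUES ('oops');  -- no error: stored as TEXT
    sqlite> SELECT n, typeof(n) FROM demo;
    42|integer
    oops|text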

I'm very much looking forward to using strict tables.


> I'm also horrified at the auto-conversions that I've just learned about from this link that I'm not protected from.

Most of the common databases that I'm aware of (PostgreSQL, MySQL, SQL Server) do implicit type conversion on insertion (as well as in many other places). I haven't checked this, but it wouldn't surprise me if that's actually ANSI SQL standard behavior.


PostgreSQL has only very limited implicit type conversion (e.g. only between different text types and between different integer types), so I presume that what you refer to is how single-quoted SQL literals are untyped and their content will be parsed depending on e.g. what you try to insert them into.

    $ CREATE TABLE t (x int);
    CREATE TABLE
    $ INSERT INTO t (x) VALUES (42);
    INSERT 0 1
    $ INSERT INTO t (x) VALUES ('42');
    INSERT 0 1
    $ INSERT INTO t (x) VALUES (CAST('42' AS text));
    ERROR:  column "x" is of type integer but expression is of type text
    LINE 1: INSERT INTO t (x) VALUES (CAST('42' AS text));
                                           ^
    HINT:  You will need to rewrite or cast the expression.
    $ INSERT INTO t (x) VALUES ('42a');
    ERROR:  invalid input syntax for type integer: "42a"
    LINE 1: INSERT INTO t (x) VALUES ('42a');

    $ INSERT INTO t (x) VALUES (CAST(42 AS bigint));
    INSERT 0 1


ANSI SQL provides “CREATE CAST […] AS ASSIGNMENT”, which will define a cast that gets used implicitly to get from type X to type Y when you have a tuple-field (e.g. table-row column, composite-value member field, stored-proc parameter) of type Y, and an expression-value being passed into it of type X.

Quoting Postgres docs (because ain’t nobody paying for the ANSI SQL standard just to quote it):

> If the cast is marked AS ASSIGNMENT then it can be invoked implicitly when assigning a value to a column of the target data type. For example, supposing that foo.f1 is a column of type text, then:

    INSERT INTO foo (f1) VALUES (42);
> will be allowed if the cast from type integer to type text is marked AS ASSIGNMENT, otherwise not. (We generally use the term assignment cast to describe this kind of cast.)

Presumably, in any ANSI SQL-compliant DBMS, you could redefine whatever assignment casts are annoying you to be non-assignment casts. (AFAIK, there’s no ANSI SQL standardization on what casts must be assignment casts; so lowest-common-denominator SQL-standard-compliant queries shouldn’t be relying on the presence of any defined assignment casts.)
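
If you did want to do that, the shape of it would be something like the following (a sketch with hypothetical type and function names, untested; recreating the cast without AS ASSIGNMENT makes it explicit-only):

    DROP CAST (source_t AS target_t);
    CREATE CAST (source_t AS target_t)
        WITH FUNCTION target_t_from_source_t(source_t);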


Isn’t what they are referring to more that quoted literals are untyped and the db engine will attempt to interpret them based on the target / context, with various levels of flexibility (discarding or munging data) depending on the engine?


> This has saved me from a lot of errors

Are your projects in a dynamically typed language? I struggle to think why a statically typed language would have this problem, because the data being transferred at the boundary between application and database is already well-defined and strictly typed.


What if you manipulate the data using SQL, e.g. UPDATE t SET col1 = SomeFunc(col2)? That would expose you to possible type issues too.


Say you’re using the C API (not really relevant which language, any statically typed language would work) and had the line

    sqlite3_bind_int(stmt, 3, 45);
to bind the value 45 to the third item in whatever INSERT/UPDATE statement you’re doing. But that was actually wrong: the third item is supposed to be a string, you meant to bind the int to item 4.

A strictly typed database would error out here, a dynamically typed one will just assign the int to the string column, potentially corrupting data.

This is exactly equivalent to messing up types for variables in a dynamically typed language, all pros and cons of typechecking apply equally for SQL. The fact that the host language C is statically typed is irrelevant.


> A strictly typed database would error out here, a dynamically typed one will just assign the int to the string column, potentially corrupting data.

I would assume Postgres is considered "strictly typed", yet it will never error on the correct version of this: all parameters are necessarily passed as the `char*` equivalent of "normal" untyped literals/strings.

The best you can do is provide an explicit type (via `paramTypes`), which skips inference and is equivalent to typed constants[0].

[0] https://www.postgresql.org/docs/current/sql-syntax-lexical.h...


If you used SQLite like the grandparent described (adding check constraints to the CREATE TABLE statement), it would error out when running the statement with the incorrectly bound types. The proposal here is basically to automate that process (if you opt in to it).

I have no idea how the postgres C API looks, but the SQLite API has different functions for binding different kinds of data types to prepared statements, like `sqlite3_bind_int` for ints, `sqlite3_bind_blob` for binary blobs, `sqlite3_bind_text` for strings, and so forth. So SQLite "knows" the source type you're binding.
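
To make that concrete, here's roughly how the CHECK approach catches a wrongly bound value (sqlite3 shell; the exact error text varies by version):

    sqlite> CREATE TABLE t(n INTEGER CHECK (typeof(n) = 'integer'));
    sqlite> INSERT INTO t VALUES ('not a number');
    Error: CHECK constraint failed: t

One caveat: on a TEXT-affinity column, a bound integer gets coerced to text before the CHECK runs, so this guard is most reliable on INTEGER and BLOB columns.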


I don't know much about SQLite, but I have used other relational DBs' client libraries. And I don't see how static typing in the programming language helps here? The DB's type system and the language's type system are not hooked up to each other. So there is still effective dynamism here even with a statically typed client language. If the SQLite library has various sorts of built-in content coercions, it would still be a problem.


This problem can definitely happen in statically typed languages too. E.g. if you write one data type in one place and read a different type in another place.

You need more than just a statically typed language to prevent it. Either you need something like an ORM or you need some clever hackery like sqlx that interrogates your schema at compile time.


It is possible that the DB is shared between many processes with potentially different types.


This is really welcome. One idea for an approach I had a while back (but haven't attempted writing code for yet) was to see if "personalities" could be implemented.

For example a "PostgreSQL" personality (set via PRAGMA maybe?) which dynamically imposes the same type names and "limitations" on SQLite as PostgreSQL.

Ditto for other SQL dialects (MySQL, etc).

SQLite already has the flexibility to allow the data types, so this idea is a way of putting constraints in place that are really beneficial for a _lot_ of common use cases.

eg "You're using SQLite as a local test for a [PostgreSQL/MySQL/etc] prod database"


Honestly in that case you should just use Postgres locally. SQLite was never meant to be a stand-in for a true RDBMS. As they say, it's intended to be a replacement for fopen(). The fact that it works for local development is a fluke.


You say that like a networked RDBMS isn't meant to be a replacement for fopen() at the end of the day.

Yeah it does a lot more than fopen(). So does SQLite. It's still, bottom line, just secondary storage with a fancy API.


Exactly! There are numerous applications where SQLite is replacing an ad hoc file-based solution with nebulous-at-best durability and atomicity, let alone scalability. Like an application that would otherwise persist state in XML/JSON/protobuf and have to manage the challenges of regularly persisting and reading state; challenges which become unwieldy as the data size grows.

This is not a challenge commonly faced on the enterprise server side, which is why many of us cannot imagine a use case for SQLite. Our workloads involve multiple readers/writers for both scaling and availability. An RDBMS cluster is the obvious choice.

But for mobile/embedded, all the data is local to a single device and commonly also a single process. In this case, an RDBMS would add unnecessary overhead. Additionally, there's desktop software that has to work with relatively large datasets, which require reliable persistence and efficient searching/reading. Adding an RDBMS would complicate installation and support.

Further, researchers and data analysts benefit from SQLite databases when the data is too large to hold in memory, yet not large enough to warrant a proper data warehouse. Even ~100 MB datasets can benefit from SQLite if you're performing a lot of random writes/reads or want to execute complex queries. There are other alternatives such as Apache Arrow, but SQLite is a tried and tested option. It can be populated and queried similarly to SQL-based data warehouses, and it also includes secondary indexes, an efficient query engine, and fast random writes with durability and atomic transactions.

There's a reason why SQLite can credibly claim to be the "Most Widely Deployed and Used Database Engine", with an estimated one trillion SQLite databases in use. [0]

[0] https://www.sqlite.org/mostdeployed.html


There are also many use cases where a network service would use per-user SQLite files, which can support concurrent reads from multiple processes (but global write locks). If all processes accessing the file are on the same host, you can also enable a WAL but that has its own pros/cons.
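
Enabling the WAL is a one-liner in the sqlite3 shell, for reference, and the setting is persistent per database file:

    sqlite> PRAGMA journal_mode=WAL;
    wal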

SQLite is a beast; there's a lot you can do with it. But there's no shared buffer cache or strategy for keeping disk pages in memory across transactions and files, so you pay for it with random disk I/O proportionate to your client load. But for many workloads it's a really low-maintenance option with lots of low-tech, low-effort backup/disaster-recovery options, and depending on how chatty your clients are you can get a lot out of modern hardware/clouds.

You could also get clever, and provide pure in-memory access when pinning clients to hosts and treat it like a write-through cache.


Not many people are willing to make PostgreSQL a dependency of their program.


How about DuckDB:

https://duckdb.org/

It keeps databases in a single local file like SQLite, and it supports PostgreSQL syntax (not sure which version). It does a load of funky stuff to be really fast at analytics workloads, but I would guess it's fast enough at transactional workloads for small-scale use.


Not a dependency in the sense of not a statically linked binary? Sure.

Not a dependency in the sense of not requiring a specific version of a Postgres db provider, or a specific version of Postgres itself, or able to do any work without Postgres existing? Laughably, no.

DB agnostic software is very uncommon, even though ODBC is well-established and mature.


I always found this very interesting. With the exception of enterprise software built to run in customer datacenters with whatever infrastructure that may entail, practice seems to dictate that the overhead of making something DB agnostic isn't a worthwhile trade-off.

I haven't spoken to too many folks who've had to make this decision, but off the top of my head I think there are a number of considerations that aren't obvious at first glance that factor into this equation:

1. You need authors of queries to understand the least common denominator of supported SQL features and syntax (probably ANSI SQL, but there may be cases where it isn't)

2. Such projects would need to test or otherwise verify that all SQL queries execute in the ways they're expected to. This can be tricky when you have regressions in query performance from one database flavor to the next.

3. Your application can't have features that are made or broken by capabilities specific to a particular database. I can see this being a very limiting constraint in a world where business intelligence and data analysis at scale have become the norm.

Are there other factors here that would compel projects to stick to something like ODBC (or your database abstraction layer of choice), rather than push them away from it?


The most important factor for me has always been simplicity. If I know my app is going to hit Postgres, and Postgres has a feature that perfectly fits my problem, why would I spend a bunch of time and effort building it again, just so my app can run on a database it's never going to run on?

For me it's about using the full power of my underlying technologies. Just like I don't restrict myself to a subset of C that would compile with any C compiler ever, I don't restrict myself to generic SQL.


ODBC, like JDBC, isn't really a database abstraction layer. It lets you connect to multiple databases in a standard way. That's it. Your queries still need to be database specific, or stick to some limited syntax. I've never seen a real project that doesn't eventually require some vendor specific SQL syntax.


Not a dependency in the sense that nobody is going to ship PostgreSQL with their 20MB mobile app. SQLite is absolutely ubiquitous in mobile and desktop software. It also works well as a file format, you can download a DB full of data from the internet or a git repo.

It's designed to be embedded into software and used to replace raw files. That's why the database is a single file, which is a feature in every situation except when you have a server doing many concurrent writes.

The only reason that SQLite even works as a stand-in for an RDBMS is because they did such a good job implementing the SQL standard.


It's actually disappointing that PostgreSQL doesn't support an embedded mode like MySQL does. I guess it's because of its concurrency model of one process per connection (MySQL uses threads AFAIK), but for some embedded use cases like unit/integration tests, a single-user single-connection mode would go a long way already.

Testcontainers are the proper solution for automated tests, but it is not trivial to set up CI as AFAIK it requires access to the Docker socket.


No, that's specifically not the use case.

A common use case is using SQLite as a local stand-in replacement for "whatever production database" people are using.

Commonly PostgreSQL, but not always. So, my idea was just a way to make that work better, nothing more.


With WAL2 mode, SSDs everywhere, and machines with gigabytes of RAM, SQLite makes sense in more situations than you might think.


Yeah, I've never understood people's aversion to running Postgres or MySQL locally. They're super easy to install, and they'll run in the background using minimal resources ready for you when you need them.


Yeah for development Postgres is easy to install. For deployment it comes at a complexity cost.

I have small scripts that do things like send an email on a given business event. These scripts pull data from the main ERP. To keep the system upgradable we avoid changes to the ERP's database, so these scripts need their own DB to store state, such as when the email was sent. At first this involved one Postgres instance with databases for each script or application. If that Postgres server moves IP or has a version upgrade, often every script or app needs a config change or library upgrade; everything gets coupled together. By using an SQLite DB per script, a range of problems go away. Of course anything that has multiple users or needs to be reported on has to go into a central DB, but a surprising amount can happily run on its own independent SQLite DB.

To move hosts or backup the application and DB you take a copy of the folder. It's so easy.


Oh sure. Nothing against SQLite. If your needs are simple then it's a great solution. It's the "Postgres/MySQL in production, SQLite for development/testing" pattern that puzzles me.


Ah right yeah, I'm not sure that is a great idea. If you don't develop against the same system you want to deploy with you are asking to find bugs in production.


The H2 embedded database has compatibility modes for other databases, which seems helpful for testing, but ends up being incompatible in various ways for anything non-trivial. You end up fighting the compatibility and skipping useful native features of the target database, while still not having confidence that the code works on the target database.

We now use the awesome Testcontainers library (available for multiple languages) that makes it easy to launch the real database in a Docker container. Testing applications against multiple databases is now trivial.


Thanks, that's interesting and a really useful data point. :)


It looks like this check-in from yesterday contains the implementation and tests: https://sqlite.org/src/info/f9c1d3441b87ee29

As always the TCL tests are worth admiring:

    # STRICT tables have a limited number of allowed datatypes.
    #
    do_catchsql_test strict1-1.1 {
      CREATE TABLE t1(a) STRICT;
    } {1 {unknown datatype for t1.a: ""}}
    do_catchsql_test strict1-1.2 {
      CREATE TABLE t1(a PRIMARY KEY) STRICT, WITHOUT ROWID;
    } {1 {unknown datatype for t1.a: ""}}
    do_catchsql_test strict1-1.3 {
      CREATE TABLE t1(a PRIMARY KEY) WITHOUT ROWID, STRICT;
    } {1 {unknown datatype for t1.a: ""}}
    do_catchsql_test strict1-1.4 {
      CREATE TABLE t1(a BANJO PRIMARY KEY) WITHOUT ROWID, STRICT;
    } {1 {unknown datatype for t1.a: "BANJO"}}
    do_catchsql_test strict1-1.5 {
      CREATE TABLE t1(a TEXT PRIMARY KEY, b INT, c INTEGER, d REAL, e BLOB, f DATE) strict;
    } {1 {unknown datatype for t1.f: "DATE"}}
    do_catchsql_test strict1-1.6 {
      CREATE TABLE t1(a TEXT PRIMARY KEY, b INT, c INTEGER, d REAL, e BLOB, f TEXT(50)) WITHOUT ROWID, STRICT;
    } {1 {unknown datatype for t1.f: "TEXT(50)"}}

    do_execsql_test strict1-2.0 {
      CREATE TABLE t1(
        a INT,
        b INTEGER,
        c BLOB,
        d TEXT,
        e REAL
      ) STRICT;
    } {}
    do_catchsql_test strict1-2.1 {
      INSERT INTO t1(a) VALUES('xyz');
    } {1 {cannot store TEXT value in INT column t1.a}}


I listened to a podcast yesterday with the creator of SQLite who said they have billions of test cases running against the project at all times across hardware and software dating back decades at this point.

It really is a great project.


The Untold Story of SQLite With Richard Hipp - CoRecursive Podcast

https://corecursive.com/066-sqlite-with-richard-hipp/


I wouldn't go as far as to say I'm aghast at the lax typing¹, but I look forward to this change hitting a release. Adding some sugary support for a strict DATETIME type in a future release would tick every box I want from sqlite.

In a couple of projects I work on the main reason for plopping an ORM between sqlite and the application is to implement type strictness, and removing that need would be excellent.

It would also neatly shut down a discussion that comes up far too often when someone wants to use sqlite as data store, which is surely a good thing as it is often just a stop motion argument in my experience.

¹ I am closer to that camp iff I have to choose an option from the article.


I think "other developers are aghast" is a bit strong for technical documentation. I was certainly never aghast: knowing that SQLite started its life aimed at TCL, the decision made sense, and the commitment to backwards compatibility is admirable. Nonetheless I certainly welcome this change.

The "doesn't enforce foreign key constraints by default" is much more surprising / annoying to me.


Scan this HN thread to see comments from devs who are aghast. :-)


And even more, scan the HN thread from 16 days ago:

https://news.ycombinator.com/item?id=28050198

"Aghast" feels like an appropriately accurate characterization to me. ;)

I'm actually curious if this change was a direct (and incredibly fast) reaction to that thread, or if the timing was just coincidental?


coincidence


It's not completely clear to me how this will impact backwards compatibility: if I create a table with STRICT will versions of SQLite prior to the one that ships the feature still be able to open that database file and read from and write to that table?

If they can, and they insert an invalid row, what will happen when a modern version of SQLite attempts to read from the table?


I'm an SQLite fan and not a developer, so I don't know for sure, but I think that they will do the same thing as with generated columns.

Link: https://www.sqlite.org/gencol.html#compatibility

Quote:

> Generated column support was added with SQLite version 3.31.0 (2020-01-22). If an earlier version of SQLite attempts to read a database file that contains a generated column in its schema, then that earlier version will perceive the generated column syntax as an error and will report that the database schema is corrupt.

> To clarify: SQLite version 3.31.0 can read and write any database created by any prior version of SQLite going back to SQLite 3.0.0 (2004-06-18). And, earlier versions of SQLite, prior to 3.31.0, can read and write databases created by SQLite version 3.31.0 and later as long as the database schema does not contain features, such as generated columns, that are not understood by the earlier version. Problems only arise if you create a new database that contains generated columns, using SQLite version 3.31.0 or later, and then try to read or write that database file using an earlier version of SQLite that does not understand generated columns.

I can't find it easily in the documentation, but I've read somewhere that SQLite stores the schema in the database as text; the reasoning behind this was that it allows the SQLite devs to change the internal representation of the schema without worrying about backwards compatibility too much.


Yeah, I seem to have made the mistake of actually reading the documentation. I think that's mostly your fault though, for making the documentation so interesting and accessible.


I suspect if I had to answer as many questions/rants about type strictness as the authors do I'd probably choose much stronger language. It definitely wasn't intended, but I guess my earlier comment could be read as the ten-thousandth entitled complaint of the day too.


I'm so happy to see this.

Personally I don't particularly care - SQLite's loose typing has never caused any problems for me - but I've seen SO many programmers dismiss SQLite as an option because they find its loose typing distasteful.

Eliminating one of the most common reasons that people reject SQLite feels like a big win to me.


I inherited a PHP/SQLite project years ago. I found timestamps with three different formats in a column because different PHP pages had different ideas of what a timestamp is. OK, the problem was the original developer that wrote all of that code but the database made it too easy to write that horror.

I wish they'd add a real strict type for dates and times too.


Ouch, that sounds nasty!

There's a pattern for enforcing datetime formats using a CHECK constraint which I've not tried myself yet but which looks like it could work: https://sqlite.org/forum/forumpost/4f4c96938f4bef32?t=h
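
Something along those lines could look like this (my own untested sketch, not the forum post's exact recipe; IS is used instead of = so that an unparseable value, for which strftime() returns NULL, fails the constraint instead of slipping through):

    CREATE TABLE evt(
      created TEXT CHECK (created IS strftime('%Y-%m-%dT%H:%M:%S', created))
    );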


That's hardly unique to SQLite. I'm a huge fanboy of SQLite, though I have to use MySQL in pretty much all of my client work since I more often get hired to work on existing projects than to start new ones. There are three common ways to store timestamps in MySQL and they all stink:

- Use an integer column with a Unix timestamp. Not human-readable, and no sub-second accuracy if you need it (unless you store the time in milliseconds instead of full seconds, requiring larger int columns, or just get completely crazy and use a float field of some sort).

- Use a `datetime` field. Human-readable, supports milliseconds natively, and supports any time between years 1000 and 10,000, but doesn't store a time zone - so if you define the column with something like `'created' TIMESTAMP NOT NULL DEFAULT current_timestamp`, who knows what will actually be inserted in that field when the row is created?

- Use a `timestamp` field. Like datetime fields, you get a human-readable field with sub-second precision, but the range is only from the Unix epoch to Y2K38, which is seventeen years away now. These fields store the date unambiguously as UTC - hooray! But… when you retrieve the date, the time will be adjusted "back from UTC to the current time zone for retrieval. (This does not occur for other types such as DATETIME.) By default, the current time zone for each connection is the server's time. The time zone can be set on a per-connection basis." You can make an unwieldy cast to get the date as UTC - `SELECT CONVERT_TZ(created, @@session.time_zone, "+00:00") AS created_utc` - but obviously this is easy to forget about and performs a wasteful double conversion.

All three of these options suck in their own distinct ways.

All this rant is to say that date storage in at least two of the most common databases is a bit of a mess right now (it's been long enough since I've used Postgres that I can't recall if things are any better there, but I'm sure one of its fanboys will let us know soon enough if it is). So don't be too hard on your previous developer or blame SQLite in particular.


PostgreSQL is much better, but there is a footgun. If you want to store timestamps in PostgreSQL you should use the timestamptz type, not the timestamp type. Both types are very well designed and work excellently but the names are confusing.


And there's the Postgres fanboy I knew was coming eventually. :)

Could you elaborate on why timestamptz should be used? The name would imply it stores a time zone; is that correct, and the normal timestamp type does not as in MySQL?


Yes, the name does imply that, but it does not actually store any time zone. The timestamptz type stores a point in time in the UTC time zone with microsecond resolution, while the timestamp type stores a point in time in an unknown time zone (which time zone is left up to the application/user), also with microsecond resolution. These types are really useful and closely match the LocalDateTime (=timestamp) and Instant (=timestamptz) classes in Java, where they have better names.

You almost always want to use timestamptz in columns, and only use timestamp when doing certain types of timestamp arithmetic.
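
A quick psql illustration of the difference, assuming the session timezone is set to UTC:

    SET timezone = 'UTC';
    SELECT timestamptz '2021-08-21 12:00:00+02';  -- 2021-08-21 10:00:00+00
    SELECT timestamp   '2021-08-21 12:00:00+02';  -- 2021-08-21 12:00:00, offset discarded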


> There's three common ways to store timestamps in MySQL and they all stink

I used to think that, but now I realize those problems would disappear if you set server time to UTC and only work with UTC values. It's not an excuse for MySQL's absolutely flawed approach to datetime values, but it's a pretty sane workaround.

- it doesn't matter that DATETIME lacks timezone information, since you implicitly treat all timezones as UTC.

- TIMESTAMP would still add the timezone offset, but that's a no-op since the timezone is UTC+00.

With server timezone set to UTC, TIMESTAMP functions identically to DATETIME but with massively reduced capacity, so I don't see any reason to use TIMESTAMP anymore. Compared to BIGINT unix epoch, DATETIME has the advantage of being compatible with MySQL's datetime-manipulating functions (such as ON UPDATE CURRENT_TIMESTAMP).
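
For reference, the per-connection form of that workaround is a single statement:

    SET time_zone = '+00:00';

(To make it the server-wide default, set default-time-zone in the server configuration.)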


Explicit timezone information cannot be resurrected once lost. "Select all employees around the world who left after 18:00 too often, to advise them to care more about their health." Of course in practice most employers could not care less, so most time libraries feel free to lose it, and developers are aware of that.


You don't drop the timezone information. You manually normalize the timezone to UTC time, application side, before storing it in the database. This is to get around MySQL's crap handling of datetime types:

- TIMESTAMP 'stores' timezone, but it has a limit of year 2038.

- DATETIME has a limit of year 9999, but it does not store timezone at all.

There really is no better way. Pick DATETIME for its higher capacity, then only ever use UTC time in the database to get around the lack of timezone storage in DATETIME.


> There really is no better way. Pick DATETIME for its higher capacity, then only ever use UTC time in the database to get around the lack of timezone storage in DATETIME.

From what I understand, this would mean that you would always have to set the value of datetime columns manually (with a UTC-adjusted value) instead of being able to use 'ON UPDATE current_timestamp'.

Of course, that's unless you set the server time to use UTC (as you suggested above), which strikes me as an extreme and not-very-portable approach.


I also struggle with datetimes. One idea, which is much worse space-wise, is to store UTC only, as a string in this format: a date-time without a time-zone in the ISO-8601 calendar system, such as "2007-12-03T10:15:30". (See: https://docs.oracle.com/javase/8/docs/api/java/time/LocalDat...)

I'm pretty sure that format is also string comparable for sorting.
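
It is, as long as every field is zero-padded to a fixed width, since byte-wise string comparison then matches chronological order:

    sqlite> SELECT '2007-12-03T10:15:30' < '2007-12-03T10:15:31';
    1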


Same, I think as long as you are strict with types in your business logic and unit tests, SQLite's loose typing is not an issue.


Awesome. The weird column type behavior is possibly the only thing I dislike about SQLite3. Having to add the word "strict" at the end of each "create table" statement is a slightly annoying thing I might forget, but it's a reasonable concession to compatibility, and I can write a schema test to catch omissions.


Nice to see. I almost never want mixed values in a column, so being able to turn on a strict mode that prevents inserting a string into an int column is very welcome.

Also the per-table opt-in is reasonable.

SQLite is really becoming this bedrock thing in the computing landscape; it's silently everywhere, and all agree that it is kind of a good thing.


Great news: domain constraint enforcement is a welcome new feature. However, let's not throw the baby out with the bathwater; being able to specify an ANYTYPE/VARIANT column on a STRICT table would make this feature more useful. The canonical use case for ANYTYPE is a BIGTABLE or Entity–Attribute–Value (EAV) model.

I haven't checked recently, but I was unable to determine the type of ? parameters in SQLite prepared statements. Maybe this is something that can also be accommodated with STRICT tables.


In something like this:

    CREATE TABLE t1(a INT, b TEXT);
    INSERT INTO t1(a,b) VALUES(1,'2');
    SELECT * FROM t1 WHERE a=?;

The type of the ? is ambiguous. You can say that it "prefers" an integer, but most RDBMSes will also accept a string literal in place of the ?:

    SELECT * FROM t1 WHERE a='1'; -- works in PG, MySQL, SQL Server, and Oracle


I'm interested in the API side, while the string literal example is more about the SQL language. JDBC has long provided prepared statement metadata that is used extensively by numerous tools.


I also once proposed something like "PRAGMA pedantic", where any type conversion would be reported to stderr. See the changes and a sample output:

https://sqlite.org/forum/forumpost/692ae69237f2537b?t=h

https://sqlite.org/forum/forumpost/103d7294464fcde8?t=h


Here is a sample to test the "pedantic" patch:

    select "23 % 3", 23 % 3;
    select "12.3 % 3", 12.3 % 3;
    select "12 % 2.5", 12 % 2.5;
    select "23 / 3", 23 / 3;
    select "12.3 / 3", 12.3 / 3;
    select "12 / 2.5", 12 / 2.5;

    create table ta(a text, b integer, c float);

    select 'insert declared types == value types';
    insert into ta(a,b,c) values('a', 1, 2.0);
    select 'insert declared types != value types text';
    insert into ta(a,b,c) values('b', '1', '2.0');
    select 'insert declared types != value types';
    insert into ta(a,b,c) values('c', 1.0, 2);

    select 'update declared types == value types';
    update ta set a = 'a', b = 1, c = 2.0 where a = 'a';
    select 'update declared types != value types text';
    update ta set a = 'a', b = '1', c = '2.0' where a = 'a';
    select 'update declared types != value types';
    update ta set a = 'a', b = 1.0, c = 2 where a = 'a';
    select 'update one value declared types != value types';
    update ta set b = 1.0 where a = 'a';
    select 'update one value declared types != value types';
    update ta set a = 49 where a = 'b';

Output default sqlite:

    23 % 3|2
    12.3 % 3|0.0
    12 % 2.5|0.0
    23 / 3|7
    12.3 / 3|4.1
    12 / 2.5|4.8
    insert declared types == value types
    insert declared types != value types text
    insert declared types != value types
    update declared types == value types
    update declared types != value types text
    update declared types != value types
    update one value declared types != value types
    update one value declared types != value types

Output of sqlite with "pedantic":

    23 % 3|2
    FP Remainder received non integer values 3.000000 :: 12.300000
    12.3 % 3|0.0
    FP Remainder received non integer values 2.500000 :: 12.000000
    12 % 2.5|0.0
    23 / 3|7
    FP Division received non integer values 3.000000 :: 12.300000
    12.3 / 3|4.1
    FP Division received non integer values 2.500000 :: 12.000000
    12 / 2.5|4.8
    insert declared types == value types
    insert declared types != value types text
    Affinity applied on make record 1 : 1 : D
    Affinity applied on make record 2 : 1 : E
    insert declared types != value types
    Affinity applied on make record 1 : 2 : D
    update declared types == value types
    update declared types != value types text
    Affinity applied on make record 1 : 1 : D
    Affinity applied on make record 2 : 1 : E
    update declared types != value types
    Affinity applied on make record 1 : 2 : D
    update one value declared types != value types
    Affinity applied on make record 1 : 2 : D
    update one value declared types != value types
    Affinity applied on make record 0 : 3 : B


This is nice. Very nice. But to be honest, having used SQLite in many hobby/toy projects for ~15 years, I have never ever encountered a type error with it. Stricter handling of types is a good idea, IMHO, but if the alternative had been, say, a reasonable type to represent date/time values, I would have chosen the latter.


Where I ran into an error was when I developed a new technique by running Python from a Jupyter notebook, and then moved everything over to Golang once I'd got the basics of the algorithm down. Python didn't care that it occasionally got back a 'float' rather than an 'int'; Golang did. (So I had to make a custom type unmarshaller that could handle both incoming types.)

EDIT: The fact that Golang is type-checked should make most of the issues here moot (since the only way to get the wrong type would be to have a malformed INSERT query, and Golang would choke the next time one of those rows showed up in a query); but it would mean I wouldn't have to be quite so paranoid when frobbing about manually with the sqlite3 command-line tool, for instance.


> Golang did

That really depends on how “golang” fetches data: https://www.sqlite.org/c3ref/column_blob.html

The C API makes it clear that if you request an int, you get an int, no matter what. The only way it could care is if a client/wrapper library looked at a column type, called a specific accessor function, and then tried to push that value into an incompatible Golang struct field(?). It doesn't make the database incompatible; it's just a wrapper with poorly thought-out type consistency. At the C level, there is no such issue.


Same here, haven't run into any problems with the loose typing. I personally love SQLite's dead simple type system though, and its date handling compared to MySQL & Postgres; a lot of power in a few simple primitives. I would hate to see it add a million new types.


> I have never ever encountered a type error with it

The problem is the opposite: not getting a type error. The simplest example is probably an import from CSV with an incorrect column order. Using another DB you'd get an error; SQLite will just import everything.
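
For example (sqlite3 shell; swapped.csv here is a hypothetical file whose columns are in the wrong order):

    sqlite> .mode csv
    sqlite> .import swapped.csv t1

Most databases' bulk loaders would reject the mistyped values; SQLite just stores them as-is.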


Is your usual language strictly typed?


These days, I am a Go person, so mostly yes, but I used to be a Python, Perl, and Ruby person, and I did not encounter any problems there, either.

I think if you expect your database to use certain types, you write your code accordingly. Why would you ever want to insert a BLOB into an INTEGER column? Some kind of data validation/sanitizing is necessary anyway, the database engine being strict about it is just an additional layer of protection, but it shouldn't be one's first line of defense against mistakes. IMHO.


You and I might not, but bugs happen. And SQLite being so permissive can aggravate the issue.


Like I said, I welcome the option to enforce stricter type checking, and I definitely will make use of it once it becomes available.

OTOH, if type safety is a concern, one can already enforce it using CHECK constraints or triggers. But yeah, it's like a seat belt in a car - in order to benefit from it, you have to actually put it on.


Side note: Richard Hipp was on this week's Changelog podcast. Super interesting as always: https://player.fm/series/the-changelog-software-development-...


This makes SQLite even better than it already was. I am 100% onboard with this effort.


This looks like a reasonable proposal to me.

What sort of performance impact could we expect to see when accessing strict tables?

Also, is there going to be an ALTER TABLE variant?


It would be nice if the entire database could be set into strict mode, so that you wouldn't have to specify it on every table.


Ignoring the sideshow of "wait, it doesn't have types" and "lol, it's bad because it's lax on types"...

Is having a "STRICT" keyword appended to the end of the CREATE TABLE syntax the best option? I'd have thought this would be a PRAGMA option if ever implemented.


A PRAGMA doesn't enable you to incrementally adopt the feature.


How so? I'd have thought a PRAGMA is sort of a sticky, sometimes persistent flag; and this only affects CREATE TABLE statements, so you could toggle it in between such statements if your refactoring for it were large enough to necessitate doing so in stages?

I use SQLite a fair bit but admittedly know little about the layers beneath, and it feels like tagging this feature onto the end of CREATE TABLE just means more "magic invocation boilerplate" that might be straying from SQL syntax?

Again, I'm not too clued into the deep minutiae of SQL, so maybe suffixes to CREATE TABLE are "valid"? Would this count as similar to engine or charset configuration in a MySQL statement?


It sounds like you're asking why a PRAGMA can't be used to affect the table only during its creation. The main reason for this with SQLite is that SQLite stores the table definition as text. If you run `SELECT sql FROM sqlite_master WHERE type='table'` you can see all the statements that were used to create the tables. Therefore, in order to specify which tables should be strict, you have to have something in that statement that indicates it. The other option would be to have the flag control all tables (even ones created previously), which prevents incremental adoption.
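
You can see this for yourself, which is also why the STRICT flag travels with the table (using the draft syntax):

    sqlite> CREATE TABLE s(n INT) STRICT;
    sqlite> SELECT sql FROM sqlite_master WHERE name = 's';
    CREATE TABLE s(n INT) STRICT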


Wondering if this is going to bring some noticeable performance optimizations apart from the other benefits?


Probably not - the on-disk storage format still includes the types in the rows (in a very space-efficient form) so this would just add an error check / message rather than accepting a differently-typed value.


I'm very happy to see this! I'll still keep my CHECK constraints but having the table opt-in STRICT mode is unloading a boulder off of my shoulders.

Very grateful to the SQLite team! Kudos for not digging your heels in, and for trying to accommodate your diverse community.

Big respect.


Nice to see this. Curious what cases people have encountered where sqlite's loose typing has proved really useful or simplified a design?

I always knew about it but basically never used it, just treated tables as though they were strongly typed.


I tried a quick search for “sqlite [dynamic] typ{e,ing} pitfalls”, but got mostly irrelevant results.

Can anyone please share common pitfalls of $subj issue? (Apart from obvious “you can insert a mess into a column and get hit by it later”)


For me, mixed value types killed an index, which drastically affected performance.


YES YES YES PLEASE. I would also like a mode where no type coercion happens.


One disadvantage is that inspecting the column datatype will no longer hint to the application whether the column contains a date.


It would be nicer if this were a per-column constraint rather than a per-table one, but it's very welcome all the same.


Interesting that values are still coerced if possible. I'd expect STRICT to be absolute in this case.
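
If I'm reading the draft right, a value is still accepted when it can be losslessly converted to the column's type, and rejected otherwise:

    sqlite> CREATE TABLE s(n INT) STRICT;
    sqlite> INSERT INTO s VALUES ('42');   -- accepted: losslessly converted to 42
    sqlite> INSERT INTO s VALUES ('42x');
    Error: cannot store TEXT value in INT column s.n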


I hope you realize that most of the common databases (PostgreSQL, MySQL, SQL Server) do implicit type conversion on insertion.


Only in a quite limited fashion as I mentioned in another comment.


I am not surprised that many comments show surprise that column types are flexible by default. I guess most people don't read the official documentation before starting to use it.

This happens because most tutorials do not mention this either. I wouldn't be surprised if most tutorial authors don't know this either.

I did read some webpages on the official website, but I don't remember seeing this either. It's possible that I read past this information but did not pay attention to it.


SQLite's flexible datatype approach is mentioned in "Distinctive Features Of SQLite"[1], "Quirks, Caveats, and Gotchas In SQLite"[2] and detailed in "Datatypes In SQLite"[3].

I find it surprising that apparently many devs use SQLite without knowing any of this. Pretty much the first thing I try to find out about languages or databases new to me that I might want to use is the datatype support.

[1] https://www.sqlite.org/draft/different.html

[2] https://www.sqlite.org/draft/quirks.html

[3] https://www.sqlite.org/draft/datatype3.html


I think many people who use SQLite come from an analysis/statistics background, not a programming background. For them, strict is the obvious default: so obvious that they never consider the existence of the alternative.


This STRICT feature is brand new (currently marked as DRAFT) and hopefully scheduled for the next release of SQLite. I believe this document only showed up on the website today.


Flexible is the default. That's what my comment is about: flexible, not strict.



Huh, I thought columns already had strict datatypes. I've only used Postgres before.

Very scary that SQLite doesn't do that already...


I had no idea SQLite was so fluid with its column types - that's kind of alarming. I guess it's fine if you still have an ORM in between you and it, and this STRICT syntax is probably the most reasonable workaround, but it just seems like a band-aid on top of not following the sensible pattern of every other database engine.


It's designed to integrate with Tcl with minimal fuss over typing. The world latched on to it because of its dependability and relative light weight.


I find in practice this limitation is not an issue and column flexibility does not get in the way.


The problem is not that it gets in the way. The problem is that some expect it to get in the way when the data is not right, and assume it will when in fact it won't.


That's kind of the problem. The database absolutely should get in your way if you try to put nonsense data in it.

For the kinds of uses SQLite sees, the loose approach to data integrity may not have caused much grief, but I can't imagine many situations where it would have helped either.

Having support for enabling sane behaviour is welcome even if it has to be opt-in for compatibility.



