Hacker News new | past | comments | ask | show | jobs | submit login

I've created ~five new projects over the past year using SQLite and I've got into the habit of creating tables like this:

  CREATE TABLE tIssue (
    id   INTEGER PRIMARY KEY NOT NULL CHECK (typeof(id) = 'integer'),
    col1 BLOB NOT NULL                CHECK (typeof(col1) = 'blob'),
    col2 TEXT                         CHECK (typeof(col2) = 'text' OR col2 IS NULL)
  );
This has saved me from a lot of errors and has been useful, but it's just so easy to let things drift, e.g. removing `NOT NULL` constraint and forgetting to also update the check constraints. I'm also horrified at the auto-conversions that I've just learned about from this link that I'm not protected from.

I'm very much looking forward to using strict tables.




> I'm also horrified at the auto-conversions that I've just learned about from this link that I'm not protected from.

Most of the common databases that I'm aware of (PostgreSQL, MySQL, SQL Server) do implicit type conversion on insertion (as well as in in many other places). I haven't checked this, but it wouldn't surprise me if that's actually ANSI SQL standard behavior.


PosttgreSQL has only very limited implicit type conversion (e.g. only between different text types and between different integer types) so I presume that what you refer to is how single quoted SQL literals are untyped and their content will be parsed depending on e.g. what you try to insert them into.

    $ CREATE TABLE t (x int);
    CREATE TABLE
    $ INSERT INTO t (x) VALUES (42);
    INSERT 0 1
    $ INSERT INTO t (x) VALUES ('42');
    INSERT 0 1
    $ INSERT INTO t (x) VALUES (CAST('42' AS text));
    ERROR:  column "x" is of type integer but expression is of type text
    LINE 1: INSERT INTO t (x) VALUES (CAST('42' AS text));
                                           ^
    HINT:  You will need to rewrite or cast the expression.
    $ INSERT INTO t (x) VALUES ('42a');
    ERROR:  invalid input syntax for type integer: "42a"
    LINE 1: INSERT INTO t (x) VALUES ('42a');

    $ INSERT INTO t (x) VALUES (CAST(42 AS bigint));
    INSERT 0 1


ANSI SQL provides “CREATE CAST […] AS ASSIGNMENT”, which will define a cast that gets used implicitly to get from type X to type Y when you have a tuple-field (e.g. table-row column, composite-value member field, stored-proc parameter) of type Y, and an expression-value being passed into it of type X.

Quoting Postgres docs (because ain’t nobody paying for the ANSI SQL standard just to quote it):

> If the cast is marked AS ASSIGNMENT then it can be invoked implicitly when assigning a value to a column of the target data type. For example, supposing that foo.f1 is a column of type text, then:

    INSERT INTO foo (f1) VALUES (42);
> will be allowed if the cast from type integer to type text is marked AS ASSIGNMENT, otherwise not. (We generally use the term assignment cast to describe this kind of cast.)

Presumably, in any ANSI SQL-compliant DBMS, you could redefine whatever assignment casts are annoying you to be non-assignment casts. (AFAIK, there’s no ANSI SQL standardization on what casts must be assignment casts; so lowest-common-denominator SQL-standard-compliant queries shouldn’t be relying on the presence of any defined assignment casts.)


Isn’t what they are referring to more that quoted literals are untyped and the db engine will attempt interpret them based on the target / context, with various levels of flexibility (discarding or munging data) depending on the engine?


> This has saved me from a lot of errors

Are your projects in a dynamically typed language? I struggle to think why a statically typed language would have this problem, because the data being transferred at the boundary between application and database is already well-defined and strictly typed.


What if you manipulate the data using SQL eg. UPDATE t SET col1 = SomeFunc(col2). That would expose you to possible type issues too.


Say you’re using the C API (not really relevant which language, any statically typed language would work) and had the line

    sqlite3_bind_int(stmt, 3, 45);
to bind the value 45 to the third item in whatever INSERT/UPDATE statement you’re doing. But that was actually wrong: the third item is supposed to be a string, you meant to bind the int to item 4.

A strictly typed database would error out here, a dynamically typed one will just assign the int to the string column, potentially corrupting data.

This is exactly equivalent to messing up types for variables in a dynamically typed language, all pros and cons of typechecking apply equally for SQL. The fact that the host language C is statically typed is irrelevant.


> A strictly typed database would error out here, a dynamically typed one will just assign the int to the string column, potentially corrupting data.

I would assume posgres is considered "strictly typed", yet it will never error on the correct version of this: all parameters are necessarily passed as `char*` equivalent to "normal" untyped literals/strings.

The best you can do is provide an explicit type (via `paramTypes`), which skips inference and is equivalent to typed constants[0].

[0] https://www.postgresql.org/docs/current/sql-syntax-lexical.h...


If you used SQLite like the grandparent described (adding check constraints to the CREATE TABLE statement), it would error out when running the statement with the incorrectly bound types. The proposal here is basically to (if you opt in to it) automate that process so it just happens automatically.

I have no idea how the postgres C API looks, but the SQLite API has different functions for binding different kinds of data types to prepared statements, like `sqlite3_bind_int` for ints, `sqlite3_bind_blob` for binary blobs, `sqlite3_bind_text` for strings, and so forth. So SQLite "knows" the source type you're binding.


I don't know much about SQLite, but I have used other relational DB's client libraries. And I don't see how static typing in the programming language helps here? The DB's type system and the language's type system are not hooked up to each other. So there is still effective dynamism here even with a statically typed client language. If the SQLite library has various sorts of built-in content coercions, it would still be a problem.


This problem can definitely happen in statically typed languages too. E.g. if you write one data type in one place and read a different type in another place.

You need more than just a statically typed language to prevent it. Either you need something like an ORM or you need some clever hackery like sqlx that interrogates your schema at compile time.


It is possible that the db is shared between many processes with potentially different types




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: