> temporality will become important as well because it matters how information changes over time. Current systems are not able to account for this because of the large overhead of storing extracted information
This quote happens to be in the context of video frames but I think temporal indexing in general is widely under-utilised, both for providing consistent queries (i.e. the database-as-a-value) and for integration of late-arriving data (i.e. bitemporality). It seems particularly relevant when considering how best to synchronise information across the "omnipresent Internet of Things" (not to mention inter-planetary devices, which also get a mention in the post!).
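To make the bitemporal point concrete, here is a minimal sketch using SQL Server's system-versioned tables (dbo.Readings and all of its columns are invented for illustration): the engine maintains transaction time via the SYSTEM_TIME period, while valid time is modelled as ordinary columns, so a late-arriving correction can be recorded now yet stamped with the period during which it was actually true:

    CREATE TABLE dbo.Readings (
        SensorId  INT       NOT NULL,
        ValidFrom DATETIME2 NOT NULL,  -- valid time: when the reading held in the real world
        ValidTo   DATETIME2 NOT NULL,
        Value     FLOAT     NOT NULL,
        SysStart  DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,  -- transaction time,
        SysEnd    DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,  -- maintained by the engine
        PERIOD FOR SYSTEM_TIME (SysStart, SysEnd),
        CONSTRAINT PK_Readings PRIMARY KEY (SensorId, ValidFrom)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ReadingsHistory));

    -- "As of what we knew on 10 June, what was sensor 7's value at noon on 3 June?"
    SELECT Value
    FROM dbo.Readings
        FOR SYSTEM_TIME AS OF '2019-06-10T00:00:00'  -- transaction time
    WHERE SensorId = 7
      AND ValidFrom <= '2019-06-03T12:00:00'         -- valid time
      AND ValidTo   >  '2019-06-03T12:00:00';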
Unfortunately, Temporal Tables (TT) in the latest ISO SQL spec and in MS SQL Server have issues, at least in their current implementations.
First off, there's no history of schema changes. While you can easily go from a NOT NULL to a NULL column, you can't go from NULL to NOT NULL. This is a deal-breaker when using TT for line-of-business (LoB) data, where schema changes happen somewhat regularly. TT should have been designed with schema versioning from the start.
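For example (a minimal sketch; dbo.Orders, its Status column, and dbo.OrdersHistory are hypothetical names), widening is allowed but tightening is blocked by existing history:

    -- Assume dbo.Orders is system-versioned with history table dbo.OrdersHistory.
    ALTER TABLE dbo.Orders ALTER COLUMN Status NVARCHAR(20) NULL;     -- NOT NULL -> NULL: works, propagates to history
    ALTER TABLE dbo.Orders ALTER COLUMN Status NVARCHAR(20) NOT NULL; -- NULL -> NOT NULL: fails if any current
                                                                      -- or historical row holds a NULL
    -- Backfilling those NULLs means ALTER TABLE dbo.Orders SET (SYSTEM_VERSIONING = OFF)
    -- first, because the history table is read-only while versioning is on.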
The second main issue is the continued lack of tooling and ORM support for TT. While EF6 is compatible with TT (it will safely ignore the SYSTEM_TIME period columns), it doesn't let you directly query the history table.
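So to read history you end up dropping down to raw SQL yourself, e.g. (again with the hypothetical dbo.Orders):

    -- Current row plus every historical version, oldest first:
    SELECT Id, Status, SysStartTime, SysEndTime
    FROM dbo.Orders
        FOR SYSTEM_TIME ALL
    WHERE Id = 42
    ORDER BY SysStartTime;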
Third - any UPDATE statement, even when it doesn't actually change any data, causes a new history row to be added, including a full copy of any nvarchar(max) values. Copy-on-write does not seem to be used. That's a huge waste of space.
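This is easy to demonstrate (sketch; a Notes nvarchar(max) column is assumed on the hypothetical dbo.Orders):

    -- No values actually change, yet a complete copy of the old row,
    -- Notes blob included, lands in dbo.OrdersHistory:
    UPDATE dbo.Orders
    SET Status = Status,
        Notes  = Notes
    WHERE Id = 42;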
Finally, you cannot exclude columns from temporal tracking - so if you have frequently-updated columns containing inconsequential data (e.g. display sort-order columns) then you’ll end up with history table spam.
I don't know why the SQL Server team invests massive amounts of engineering effort into features like these when other major priorities exist - like modernising T-SQL's syntax to be less verbose (parameterised object identifiers and optional predicate clauses, please!) or adding support for array column types, which would greatly improve the platform.
I also don't see why a history table needs to duplicate the transaction timestamps on every row.
Other data models with schema-after-write semantics (for instance trees or graphs) don't have this problem of having to store schema changes.
As I'm working on a temporal storage system myself in my spare time (as well as during my student days at the University of Konstanz), I'd say that you should instead index each revision, storing the timestamp of the transaction commit in, for instance, a RevisionRootPage, which is the root and main entry point for deserialising a revision. Unchanged pages should be shared amongst the different revisions.
Furthermore, due to the fast random access of flash drives, and even parallel I/O with PCIe SSDs, you can store page fragments instead of whole pages. The SSD might fetch too much information due to misalignment, but I'd say that access to persistent storage will also become more fine-grained in the future (byte-level granularity instead of block-level, or smaller block sizes -- I think even 512 bytes is not that unusual nowadays).
Storing not only the incremental or differential changes to a database page, but also the changes that fall out of a sliding window, makes intermediate full dumps of pages unnecessary: write and read peaks don't occur, and a predictable number of page fragments can be read to reconstruct a page in memory, preferably fetching the fragments in parallel where possible.
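Here's a toy relational sketch of that sliding-window idea (purely illustrative - the table, @N, and the layout are invented, and a real page store would of course not be relational): each revision writes one fragment per touched page, containing the changed records plus any record whose newest version would otherwise fall out of the N-revision window, so the latest version of every record is guaranteed to sit within the last N fragments:

    -- One row per (page, revision, record) version written.
    CREATE TABLE dbo.PageFragments (
        PageNo   INT            NOT NULL,
        Revision INT            NOT NULL,
        RecordId INT            NOT NULL,
        Payload  VARBINARY(MAX) NOT NULL,
        PRIMARY KEY (PageNo, Revision, RecordId)
    );

    -- Reconstruct page @PageNo as of revision @Rev by reading at most @N fragments,
    -- keeping the newest version of each record. (Deletions would need tombstone
    -- records; omitted here.)
    DECLARE @PageNo INT = 1, @Rev INT = 100, @N INT = 4;

    SELECT RecordId, Payload
    FROM (
        SELECT RecordId, Payload,
               ROW_NUMBER() OVER (PARTITION BY RecordId ORDER BY Revision DESC) AS rn
        FROM dbo.PageFragments
        WHERE PageNo = @PageNo
          AND Revision BETWEEN @Rev - @N + 1 AND @Rev
    ) AS latest
    WHERE rn = 1;

Because the window bounds the number of fragments, both reads and writes stay flat instead of spiking at full-dump revisions.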
Linux support for SQL Server is my canary in the coal mine.
It makes zero sense.
Meanwhile, the Windows version, which is used for 99.999% of installs, still does not support Vista-era APIs like elliptic-curve certificates because of legacy code nobody has touched in 10+ years.
There's a crazy amount of low-hanging fruit in that product that just isn't being addressed.