Ok, first you need to decide: is this going to be a purely transactional database (for business processes) or do you also plan to do data analysis straight inside this database (meaning you won’t be extracting, transforming and loading data into another database and analyzing it there).
If it’s transactional, I recommend keeping calculations only if you need to access summarized data frequently. For example, if you are tracking inventory by storing the history of transactions that occur into and out of inventory, it’s trivial to find out how much of each item you have in stock at any point in time by doing a sum of the change in quantities for each item type up to that point.
If you were usually interested in the “current” count, it would be expensive to perform this sum every time, so instead you could keep a separate table for calculating the running total of inventory per item and referring to that. Keep this table up to date through the use of triggers on insert events (Note that your log of inventory transactions would thus be an immutable stream of events)
An example of something not worth storing is derived data that is a combination of separate columns in a table. For most queries it’s probably trivial to be lazy and wait to perform such a calculation until you actually need it. If you still want to have a ready made “table” that has all the computations you want already entered as columns for easy querying, use a view. If you find yourself making liberal use of views, you’re on the right track.
If you are using a separate data warehouse for data analysis, then precompute and denormalize as much as you can.
*Disclaimer: most of what I’m talking about is from a Postgres perspective.
If it’s transactional, I recommend keeping calculations only if you need to access summarized data frequently. For example, if you are tracking inventory by storing the history of transactions that occur into and out of inventory, it’s trivial to find out how much of each item you have in stock at any point in time by doing a sum of the change in quantities for each item type up to that point.
If you were usually interested in the “current” count, it would be expensive to perform this sum every time, so instead you could keep a separate table for calculating the running total of inventory per item and referring to that. Keep this table up to date through the use of triggers on insert events (Note that your log of inventory transactions would thus be an immutable stream of events)
An example of something not worth storing is derived data that is a combination of separate columns in a table. For most queries it’s probably trivial to be lazy and wait to perform such a calculation until you actually need it. If you still want to have a ready made “table” that has all the computations you want already entered as columns for easy querying, use a view. If you find yourself making liberal use of views, you’re on the right track.
If you are using a separate data warehouse for data analysis, then precompute and denormalize as much as you can.
*Disclaimer: most of what I’m talking about is from a Postgres perspective.