>If you’re running a query like “give me the average price of all transactions over the past 5 years”, a relational database would have to load all the rows from the previous 5 years even though it merely wants to aggregate the price field; a columnar database would only have to examine one column — the price column.
Columnar databases can also have indices. If there is an index on date (or time), then your DB will know the row range from the index and will read only the price column within that range. If there is no index on date, it would read two columns, but that is still much less than a full row with many columns.
Maybe "past 5 years" means "over everything", in which case it might technically be one column.
Or if you're using something like S3 with Parquet as the "columnar database" and it is partitioned by date, then the date values are stored in metadata, so you would logically read two columns but physically read only one from storage. Same story for something like Redshift with date as the sortkey.
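To make the column-pruning point concrete, here's a toy sketch in plain Python (no real database; the table, column names, and values are all made up) contrasting a row layout with a columnar layout for the average-price query:

```python
# Toy illustration of row-store vs column-store scans (hypothetical data).
# A row store keeps each transaction together: to average one field it
# still touches every field of every row it reads.
rows = [
    {"id": 1, "date": "2021-03-14", "customer": "a", "price": 10.0},
    {"id": 2, "date": "2022-07-01", "customer": "b", "price": 20.0},
    {"id": 3, "date": "2023-11-30", "customer": "c", "price": 30.0},
]
avg_row_store = sum(r["price"] for r in rows) / len(rows)

# A column store keeps each field in its own contiguous array: the same
# query scans only the price column (plus date, if it must filter on it).
columns = {
    "id": [1, 2, 3],
    "date": ["2021-03-14", "2022-07-01", "2023-11-30"],
    "customer": ["a", "b", "c"],
    "price": [10.0, 20.0, 30.0],
}
avg_col_store = sum(columns["price"]) / len(columns["price"])

print(avg_row_store, avg_col_store)  # same answer, very different I/O
```

Both layouts return the same average; the difference is how many bytes the query engine has to pull off disk to get there.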
If the number of transactions is small enough to query quickly, a simple SQL query will give you that. If the number of transactions is very large, querying a random sample of perhaps 1 in 1000 will give you reasonably accurate results. You can calculate the probable error from the sample size.
There are questions which require examining all the data, but often you don't really need to.
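A minimal sketch of the sampling idea, with synthetic data (the million-row table and the 1-in-1000 rate are just for illustration). The "probable error" here is the standard error of the sample mean: the sample standard deviation divided by the square root of the sample size.

```python
import random
import statistics

random.seed(42)

# Synthetic "transactions": a million prices we pretend are too big to scan.
prices = [random.uniform(1, 100) for _ in range(1_000_000)]

# Sample roughly 1 in 1000 rows instead of reading everything.
sample = [p for p in prices if random.random() < 0.001]

mean = statistics.mean(sample)
# The error shrinks with sqrt(n): standard error of the sample mean.
stderr = statistics.stdev(sample) / len(sample) ** 0.5

print(f"estimate {mean:.2f} +/- {stderr:.2f} (n={len(sample)})")
```

With ~1000 sampled rows the estimate typically lands within a couple of standard errors of the true mean (50.5 for this uniform distribution), while reading 0.1% of the data.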
Wouldn't that be two columns?