Storing as parquet, and using hive path partitioning, you can get passable batch performance assuming you queries are not mostly scans and aggregates across large portions of the data. On the order of ten seconds for regex matching on columns for example.
I thought it was a great way to give analysts direct access to data to do adhoc queries while they were getting familiar with the data.
I thought it was a great way to give analysts direct access to data to do adhoc queries while they were getting familiar with the data.