And now for the great thing ... Prophet uses Stan underneath [1] and thus is built on foundations of 'regular' Bayesian statistics. Andrew Gelman has written about Prophet as well [2].
After reading this blog I'm tempted to get the ML for time series book, though. I'd love to try to compare some non-trivial examples with covariates involved.
Basis function regression is a very under-appreciated method for producing time series forecasts. I've found that it beats most of the methods described in this article. Maybe I should make a blog post...
Just by looking at the graph, you can see that fitting a linear combination of a constant term, t, t² and sin(w*t) will give you a very accurate model that has five tuning parameters (four weights and one frequency).
Yes, it's an extension of linear regression. You can incorporate different basis functions to model trend, seasonality, the effects of external regressors, different frequency components, etc. It gives you a lot more control over the forecast model.
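For anyone curious, here's a minimal Python sketch of that exact basis (constant, t, t², sin(w*t)) on made-up data: the four weights come from ordinary least squares, and the one frequency, which enters non-linearly, is handled with a crude grid scan.

    import numpy as np

    # Synthetic example series (all values made up): quadratic trend plus one cycle.
    rng = np.random.default_rng(0)
    t = np.arange(200, dtype=float)
    y = 5.0 + 0.3 * t + 0.002 * t**2 + 4.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

    def fit_basis(t, y, w):
        """Least-squares fit of y ~ [1, t, t^2, sin(w*t)] for a fixed frequency w."""
        X = np.column_stack([np.ones_like(t), t, t**2, np.sin(w * t)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ coef) ** 2))
        return coef, sse

    # The frequency enters non-linearly, so scan a grid and keep the best fit;
    # the four weights fall out of ordinary least squares at each candidate.
    candidates = np.linspace(0.05, 1.0, 500)
    best_w = min(candidates, key=lambda w: fit_basis(t, y, w)[1])
    weights, _ = fit_basis(t, y, best_w)
    print("weights:", weights, "frequency:", best_w)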
This is a less studied scenario for time series research: contextual forecasting. With enough contextual information, forecasting doesn't need to squeeze information out of the series' own history quite so hard.
Normally the method for dealing with 'time series' is really just finding ways to turn a non-stationary distribution into a stationary one, to which you can then apply classic statistical methods. So you're just finding ways to factor out the time component in the data so that you can use standard, non-time-sensitive regression models on the transformed data.
I don't think it's until you get to the NN-based models that they start treating time as a first-class component in the model.
* If I'm wrong please explain why instead of downvoting
A (weakly) stationary distribution has no trend, no seasonality, and no changes in variance. With these properties, predictions are independent of the absolute point in time. The transformations that turn non-stationary series into stationary ones are reversible (usually differencing, log transformations and the like), so the predictions can be mapped back to the original time series.
Treating time as a first-class component really just means factoring the absolute point in time into the model at training time. This only makes sense if the absolute time changes properties of the distribution in ways that cannot be accounted for with the usual transformations. If that's the case, then we assume these changes cannot be modeled, and are thus either random or follow a complicated systematic pattern we can't grasp. In the first case a NN wouldn't help either; in the second case we either need to always use the full history of the time series to make a prediction, or hope that a complex NN like an LSTM might capture that pattern.
In any case, I think one of the more compelling reasons to use a NN is not having to do the preprocessing. The trade-off is that you end up with a complicated solution compared to the six or so easy-to-understand parameters a SARIMA model might give you. And the latter might even give you some interpretable intuition about the behavior of the process.
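To make the reversibility point concrete, here's a tiny Python sketch on made-up data: log to stabilize the variance, first difference to remove the trend, a stand-in "forecast" on the stationary scale, and then both transforms inverted to get back to the original units.

    import numpy as np

    # Made-up non-stationary series: multiplicative growth with noise.
    rng = np.random.default_rng(1)
    y = 100 * np.cumprod(1.0 + rng.normal(0.01, 0.02, 300))

    # Stabilize the variance with a log, remove the trend with a first difference.
    z = np.diff(np.log(y))              # approximately stationary "returns"

    # Stand-in forecast on the stationary scale (here: just the sample mean).
    z_hat = np.full(12, z.mean())

    # Invert both transforms to map the forecast back to the original scale.
    y_hat = y[-1] * np.exp(np.cumsum(z_hat))
    print(y_hat)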
> I don't think it's until you get to the NN-based models that they start treating time as a first-class component in the model.
They've been bad at prediction across the past several M1-M4 univariate time series competitions. The best entry for M4 is a combination of a NN and traditional statistical (time series) regression, but it is often deemed too tailored to the data.
And I don't think differencing out the trend, seasonality, etc. means we're treating the time component as second class. It's just that stationary data is what we currently understand best. There are GARCH/ARCH methods too. The non-stationary methods aren't used as often, and judging from the competitions, the current set of time series methods is the best we have so far.
So I think this comment is misleading.
There are also longitudinal analysis, survival analysis, etc., and they all keep time in mind.
I'm not sure I agree; in a sense the exponential smoothing approaches deliberately factor in time by looking back by the seasonality stride. But yes, the reason it's called triple exponential smoothing is that you're repeatedly transforming the data into a form amenable to classical methods. Zero-centering and detrending are exactly the kind of thing you're describing. But at the end of all this, you still use these components to produce a prediction.
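If anyone wants to poke at that, here's a rough sketch using statsmodels' Holt-Winters (triple exponential smoothing) implementation on synthetic monthly data; the settings are just illustrative.

    import numpy as np
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # Made-up monthly series with a trend and yearly seasonality.
    rng = np.random.default_rng(2)
    t = np.arange(120)
    y = 50 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, t.size)

    # Triple (Holt-Winters) exponential smoothing: level, trend, and a seasonal
    # component that looks back by the seasonality stride of 12.
    fit = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
    print(fit.forecast(12))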
Here are some old references for the problem of the OP:
David R. Brillinger, Time Series Analysis: Data Analysis and Theory, Expanded Edition, ISBN 0-8162-1150-7, Holden-Day, San Francisco, 1981.

George E. P. Box and Gwilym M. Jenkins, Time Series Analysis: Forecasting and Control, Revised Edition, ISBN 0-8162-1104-3, Holden-Day, San Francisco, 1976.

Brillinger was a John Tukey student at Princeton and was long at Berkeley.
Time series problems are indeed temporal, but not all temporal problems are time series problems. Time series analysis deals with time-specific features of a discrete sequence like autocorrelation, trends, seasonality, etc., whereas frequency-domain methods deal with, well, frequency.
Suppose you're looking at sales over a long period of time, which have certain patterns. FFTs are unlikely to tell you much that is useful or to predict much, whereas time series methods can reveal patterns where t is the independent variable.
FWIW, I've used it all the time, along with wavelets, cepstra, lifters and Hilbert transforms. For time series with nonlinearities and lots of data points, it's the way to go. 9 times out of 10 I'd rather hire an EE with a signal processing background than a statistician or data scientist for time series work.
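As a toy illustration of the frequency-domain view, here's a short numpy sketch on synthetic data that pulls the dominant periods out of a series with an FFT:

    import numpy as np

    # Made-up evenly sampled series with a weekly and a yearly cycle plus noise.
    rng = np.random.default_rng(3)
    n = 4096
    t = np.arange(n)
    y = np.sin(2 * np.pi * t / 7) + 0.5 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 0.3, n)

    # Real FFT of the demeaned series; peaks in the spectrum reveal the cycles.
    spec = np.abs(np.fft.rfft(y - y.mean()))
    freqs = np.fft.rfftfreq(n, d=1.0)

    top = freqs[np.argsort(spec)[-3:]]          # three strongest frequencies
    print("dominant periods:", np.sort(1.0 / top[top > 0]))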
I mostly feel these methods are quite overkill for most applications. As a purist, I'd recommend starting out with a simple linear regression and then moving on to adding methods to cover the letters of SARIMA by showing the need for each. It may not be as flashy, but linear regression is a stupidly powerful and very cheap tool for all kinds of situations.
As a complete non-purist I’d suggest chucking the data into Auto.arima and seeing where that gets you. Not only will it save a lot of time, in my experience it tends to produce better models.
I saw an article, which I can't find anymore, that warned about contexts in which auto.arima might not work. But most of the time auto.arima outperforms the old Box-Jenkins methodology.
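For the Python crowd, pmdarima's auto_arima is (as far as I know) a port of R's forecast::auto.arima; a rough sketch on made-up monthly data might look like this, with all settings purely illustrative:

    import numpy as np
    import pmdarima as pm  # assumed installed; Python port of forecast::auto.arima

    # Made-up monthly series with trend and yearly seasonality.
    rng = np.random.default_rng(4)
    t = np.arange(144)
    y = 100 + 0.8 * t + 12 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, t.size)

    # Stepwise search over (p, d, q)(P, D, Q, m) orders by information criterion.
    model = pm.auto_arima(y, seasonal=True, m=12, stepwise=True, suppress_warnings=True)
    print(model.summary())               # shows the selected SARIMA orders
    print(model.predict(n_periods=12))   # 12-step-ahead forecast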
For the multivariate time series forecasting problem I found that fine-tuned xgboost (and its variants) performs much better than fbprophet, sarimax, and RNN variants. Predicting time series with an RNN is like killing a bird with a bazooka.
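In case it helps, here's a minimal sketch of the usual reduction to supervised learning: lag features plus an external regressor fed to XGBRegressor. The data and hyperparameters below are entirely made up for illustration, not the exact setup from my experiments.

    import numpy as np
    from xgboost import XGBRegressor  # assumed installed

    # Made-up target driven by its own lags plus one external regressor x.
    rng = np.random.default_rng(5)
    n = 500
    x = rng.normal(size=n)
    y = np.zeros(n)
    for i in range(2, n):
        y[i] = 0.6 * y[i - 1] + 0.2 * y[i - 2] + 0.5 * x[i] + rng.normal(0, 0.1)

    # Reduce forecasting to supervised learning: lag features plus the covariate.
    lags = 3
    X = np.array([np.r_[y[i - lags:i], x[i]] for i in range(lags, n)])
    target = y[lags:]

    split = int(0.8 * len(target))
    model = XGBRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
    model.fit(X[:split], target[:split])
    print(model.predict(X[split:])[:5])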
[1] https://peerj.com/preprints/3190/
[2] https://statmodeling.stat.columbia.edu/2017/03/01/facebooks-...