Time Series Prediction – A short introduction for pragmatists (liip.ch)
393 points by makaimc on Nov 9, 2019 | 55 comments



And now for the great thing ... Prophet uses Stan underneath [1] and thus is built on foundations of 'regular' Bayesian statistics. Andrew Gelman has written about Prophet as well [2].

After reading this blog I am tempted to get the ML for time series book though. I'd love to try and compare some less than trivial examples with covariates involved.

[1] https://peerj.com/preprints/3190/

[2] https://statmodeling.stat.columbia.edu/2017/03/01/facebooks-...


A good explanation of Prophet that made it click for me was its recreation in Python with PyMC:

https://www.ritchievink.com/blog/2018/10/09/build-facebooks-...


Facebook Prophet is really impressive and has given us better precision with shorter iterations, even on large datasets, compared to CNN+LSTM.

We run it over millions of inventory items with years of data, and it has given satisfactory results in the majority of cases.


Have you tried XGBoost? In our problems, none of the LSTM/Prophet/ARIMA models performed better than a fine-tuned XGBoost model.


Just looked it up. Looks super interesting, I will surely give it a shot. Thanks


Which book are you referring to?



Basis function regression is a very under-appreciated method for producing time series forecasts. I've found that it beats most of the methods described in this article. Maybe I should make a blog post...


Works if you have enough domain knowledge to prescribe the functions (priors). Not so good if it's a poorly understood or niche domain.


Sorry for my ignorance. Did you mean simple linear regression or something else? Do you have a reference for that?


Not OP, but as an example, imagine you want to fit the time series of measured monthly CO2 at Mauna Loa:

https://www.esrl.noaa.gov/gmd/ccgg/trends/

Just by looking at the graph, you can see that fitting a linear combination of a constant term, t, t^2 and sin(w*t) will give you a very accurate model that has five tuning parameters (four weights and one frequency).
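
A rough sketch of that fit in Python (synthetic data standing in for the CO2 series, with the nonlinear frequency handled by curve_fit; every number here is made up):

    # Sketch: fit a + b*t + c*t^2 + d*sin(w*t) by nonlinear least squares.
    # The data below is synthetic, standing in for the monthly CO2 series.
    import numpy as np
    from scipy.optimize import curve_fit

    def model(t, a, b, c, d, w):
        return a + b * t + c * t**2 + d * np.sin(w * t)

    t = np.arange(0, 40, 1 / 12)                      # 40 years of monthly steps
    y = model(t, 315.0, 0.8, 0.012, 3.0, 2 * np.pi)   # "true" curve...
    y += np.random.normal(scale=0.3, size=t.size)     # ...plus noise

    # w is the one nonlinear parameter, so start the optimizer near the
    # annual frequency; the four weights are easy.
    params, _ = curve_fit(model, t, y, p0=[300, 1, 0, 1, 2 * np.pi])
    print(params)                                     # recovered [a, b, c, d, w]
    print(model(t[-1] + 1 / 12, *params))             # one-step-ahead forecast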


Yes, it's an extension of linear regression. You can incorporate different basis functions to model trend, seasonality, the effects of external regressors, different frequency components, etc. It gives you a lot more control over the forecast model.


Are you referring to Generalized Additive Models (GAM)?


I think GAMs are more for the case where you have many underlying input variables and a single resulting response.


This is a less studied scenario in time series research: contextual forecasting. With enough contextual information, forecasting doesn't need to squeeze information out of the series' own history quite so hard.


Here's an explanation: https://www.youtube.com/watch?v=rVviNyIR-fI

It's basically fitting on non-linear transformations of x.


You can fit a basis function regression model with least squares.


Normally the method for dealing with 'time series' is really just finding ways to turn a non-stationary distribution into a stationary distribution, where you can then apply classic statistical methods on them. So you're just finding ways to factor out the time component in the data so you can use the standard non-time sensitive regression models on the transformed data.

I don't think it's until you get to the NN based models that they start treating time as a first-class component in the model.

* If I'm wrong please explain why instead of downvoting


A (weak) stationary distribution has no trend, no seasonality, and no changes in variance. With these properties, predictions are independent of the absolute point in time. The transformations to turn non-stationary series into stationary ones are reversible (usually differencing, log-transformations and alike), thus the predictions can be applied back to the original time series.

Treating time as a first-class component really just means factoring the absolute point in time into the model at training time. This only makes sense if the absolute time changes properties of the distribution that cannot be accounted for with regular transformations. If that's the case, then we assume that these changes cannot be modeled, and are thus either random or follow a complicated systematic pattern we can't grasp. In the first case, a NN wouldn't improve things either; in the second case, we either need to always use the full history of the time series to make a prediction, or hope that a complex NN like an LSTM might capture that pattern.

In any case, I think one of the more compelling reasons to use NN is to not have to do preprocessing. The trade-off is that you end up with a complicated solution compared to the six or so easy-to-understand parameters a SARIMA model might give you. And the latter even might give you some interpretable intuition for the behavior of the process.
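
For concreteness, a tiny sketch of those reversible transformations (toy numbers; log to stabilize variance, first differences to remove trend, then the inverse to map a forecast back):

    import numpy as np

    y = np.array([112., 118., 132., 129., 121., 135., 148., 148., 136., 119.])

    log_y = np.log(y)                # log stabilizes multiplicative variance
    stationary = np.diff(log_y)      # first difference removes a linear trend

    # Pretend some model produced a forecast for the stationary series:
    next_diff = stationary.mean()    # placeholder "forecast"

    # Invert the transformations to get back to the original scale.
    next_y = np.exp(log_y[-1] + next_diff)
    print(next_y)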


> I don't think it's until you get to the NN based models that they start treating time as a first-class component in the model.

They've been bad at prediction across the past several M1-M4 time series competitions for univariate forecasting. The best entry for M4 is a combination of a NN and traditional statistical methods, but it is often deemed too tailored to the data.

And I don't think differencing out the trend, seasonality, etc. means we're treating the time component as second class. It's just that stationary data is what we understand best at the moment. There are GARCH/ARCH methods too. The nonstationary methods aren't used as often, and judging from the competitions, the current set of time series methods are the best we have so far.

So I think this comment is misleading.

There are also longitudinal analysis, survival analysis, etc., and they all keep time in mind.


I'm not sure I agree; in a sense the exponential smoothing approaches deliberately factor in time by looking back by the seasonality stride. But yes, the reason it's called triple exponential smoothing is that you're repeatedly transforming the data into a form amenable to classical methods. Zero-centering and detrending are exactly what you're describing. But at the end of all this, you still use these components to produce a prediction.
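
A minimal Holt-Winters sketch with statsmodels, assuming additive trend and an additive yearly season on monthly data (synthetic numbers, just to show the level/trend/season components at work):

    import numpy as np
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    rng = np.random.default_rng(0)
    t = np.arange(120)               # ten years of monthly observations
    y = 50 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

    fit = ExponentialSmoothing(
        y, trend="add", seasonal="add", seasonal_periods=12
    ).fit()
    print(fit.forecast(12))          # prediction built from level + trend + season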


Here are some old references for the problem of the OP:

David R. Brillinger, Time Series Analysis: Data Analysis and Theory, Expanded Edition, ISBN 0-8162-1150-7, Holden-Day, San Francisco, 1981.

George E. P. Box and Gwilym M. Jenkins, Time Series Analysis: Forecasting and Control, Revised Edition, ISBN 0-8162-1104-3, Holden-Day, San Francisco, 1976.

Brillinger was a John Tukey student at Princeton and long at Berkeley.


My very first move with time series data is to get to frequency space as fast as I can.


Genuinely curious -- how do you create predictive time-series models in the frequency domain?

(background: control systems)


speech recognition immediately jumps to mind


Speech recognition is generally not considered a time series problem though.


what about speech is nontemporal?


Time series problems are indeed temporal, but not all temporal problems are time series problems. Time series analysis deals with time-specific features of a discrete sequence, like autocorrelation, trends, seasonality, etc., whereas frequency-domain methods deal with, well, frequency.

Suppose you're looking at sales patterns over a long period of time, which follow certain patterns. FFTs are unlikely to tell you much that is useful or to predict much, whereas time series methods can reveal patterns where t is the independent variable.


In the spirit of the article, how can frequency domain representation be used for prediction / forecasting? Any examples you could share?


+1. FFT is under appreciated.


What do you mean? It's one of the most used algorithms in the history of mankind.


Under appreciated by people just now getting into time series regression, who may be really excited about throwing deep learning at the problem.

Edit: I literally saw this happen two weeks ago, with a PhD in electrical engineering.


So much this! Also under-appreciated: plots and OLS.

Start plotting before firing up the GPUs, then compare to standard OLS - like in this article!

Suggested reading if you don't know OLS: ML from scratch.
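
Something like this is usually where I'd start (toy data; numpy's polyfit with degree 1 is plain OLS on a linear trend):

    import numpy as np
    import matplotlib.pyplot as plt

    t = np.arange(100)
    y = 2.0 + 0.5 * t + np.random.normal(0, 5, t.size)    # toy series

    slope, intercept = np.polyfit(t, y, deg=1)             # OLS fit of y ~ t
    plt.plot(t, y, label="data")
    plt.plot(t, intercept + slope * t, label="OLS trend")
    plt.legend()
    plt.show()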


Well to be fair if you use a convolutional recurrent architecture, you're doing it without realizing it.


Is that true? I would think that would be more like a wavelet transform where the wavelets are learned?


Yeah, that's true.


and not a single DS/ML person seems to have ever heard of it.


Because most folks in DS/ML are working on predicting the likelihood that someone presses a button. You don't need Fourier transforms for that.


CNN is essentially a FT with learnable kernels.


well, it’s certainly usually convolution. Hardly anyone knows why or how that connects to the FT though.


you do, depending on your feature preprocessing.


Most DS/ML people are useless chimps.

FWIW I've used it all the time, along with wavelets, cepstrums, lifters and Hilbert transforms. For time series with nonlinearities and lots of data points, it's the way to go. Nine times out of ten I'd rather hire an EE with a signal processing background than a statistician or data scientist for time series work.


Can you give an example pls?


“bandpass with a butterworth then plot the FFT and phase”
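
In SciPy terms, roughly this (the sample rate and band edges are invented; it's just the shape of the recipe):

    import numpy as np
    from scipy.signal import butter, filtfilt
    import matplotlib.pyplot as plt

    fs = 100.0                                           # assumed sample rate (Hz)
    t = np.arange(0, 10, 1 / fs)
    x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(t.size)

    b, a = butter(4, [1, 10], btype="bandpass", fs=fs)   # 1-10 Hz Butterworth
    filtered = filtfilt(b, a, x)

    spectrum = np.fft.rfft(filtered)
    freqs = np.fft.rfftfreq(filtered.size, d=1 / fs)
    plt.plot(freqs, np.abs(spectrum), label="magnitude")
    plt.plot(freqs, np.angle(spectrum), label="phase")
    plt.legend()
    plt.show()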


If you would indulge me, what sorts of insights would this get you? (genuinely curious)

All I can think of is identification of frequency modes.


I mostly feel these methods are quite overkill for most applications. As a purist, I'd recommend starting out with a simple linear regression and then moving on to adding methods to cover the letters of SARIMA by showing the need for each. It may not be as flashy, but linear regression is a stupidly powerful and very cheap tool for all kinds of situations.
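
For instance, a sketch of that progression with statsmodels (synthetic monthly data; the SARIMA orders are illustrative, not tuned):

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(1)
    t = np.arange(144)
    y = 10 + 0.1 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

    # Step 1: the simple linear-regression baseline.
    slope, intercept = np.polyfit(t, y, 1)
    baseline = intercept + slope * t

    # Step 2: add the SARIMA letters once the residuals show
    # autocorrelation and seasonality.
    sarima = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
    print(sarima.forecast(12))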


As a complete non-purist I'd suggest chucking the data into auto.arima and seeing where that gets you. Not only will it save a lot of time, in my experience it tends to produce better models.

https://cran.r-project.org/web/packages/forecast/index.html
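
(If you're in Python rather than R, pmdarima ships a port of auto.arima; a rough sketch, assuming monthly seasonality and a placeholder series:)

    import numpy as np
    import pmdarima as pm

    y = np.random.default_rng(0).normal(size=120).cumsum()   # placeholder series
    model = pm.auto_arima(y, seasonal=True, m=12, suppress_warnings=True)
    print(model.summary())
    print(model.predict(n_periods=12))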


I saw an article, which I can't find any more, warning about the contexts in which auto.arima might not work. But the majority of the time auto.arima outperforms the old Box-Jenkins methodology.


I really recommend Prophet as an easy to use option like the article says.

I needed anomaly detection for prometheus metrics integrated with grafana for marking "anomalous" regions so the model doesn't learn them.

It took me a week to set it all up, including packaging it up as a microservice and deploying it.
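
The core of it was roughly this kind of sketch (not the production code; ds/y and yhat_lower/yhat_upper are Prophet's own column names, everything else is a placeholder):

    import pandas as pd
    from prophet import Prophet   # older releases ship as `fbprophet`

    df = pd.DataFrame({
        "ds": pd.date_range("2019-01-01", periods=200, freq="H"),
        "y":  range(200),         # replace with your metric values
    })

    m = Prophet(interval_width=0.99)
    m.fit(df)
    forecast = m.predict(df[["ds"]])

    # Flag points that fall outside the predicted uncertainty interval.
    merged = df.merge(forecast[["ds", "yhat_lower", "yhat_upper"]], on="ds")
    anomalies = merged[(merged.y < merged.yhat_lower) | (merged.y > merged.yhat_upper)]
    print(anomalies)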


In multivariate time series forecasting problems I've found that a fine-tuned XGBoost model (and its variants) performs much better than fbprophet, SARIMAX, or RNN variations. Predicting time series with an RNN is like killing a bird with a bazooka.
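
The trick is just reframing the series as a supervised problem with lag features, roughly like this (lag count and hyperparameters are placeholders, not a tuned model):

    import numpy as np
    from xgboost import XGBRegressor

    rng = np.random.default_rng(2)
    y = np.cumsum(rng.normal(size=500))          # toy series

    n_lags = 12                                  # each row holds the previous 12 values
    X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
    target = y[n_lags:]

    model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X[:-50], target[:-50])             # hold out the last 50 points
    preds = model.predict(X[-50:])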


I really appreciate the philosophy of defining a metric and measuring performance going from simple to complex methods.

I've also used the Prophet library, and find it works well out of the box.


Gaussian processes seem to be pretty popular; you might want to include them in the comparisons.


Gaussian processes are marvelous. Demos like https://statmodeling.stat.columbia.edu/2012/06/19/slick-time... just seem magical.
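
A small sketch with scikit-learn's GP regressor, in the spirit of that demo (the kernel choices and hyperparameters are illustrative guesses):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

    rng = np.random.default_rng(3)
    t = np.linspace(0, 10, 200)[:, None]
    y = np.sin(2 * np.pi * t).ravel() + 0.05 * t.ravel() ** 2 + rng.normal(0, 0.1, 200)

    # Periodic kernel for seasonality, RBF for the slow trend, white noise term.
    kernel = ExpSineSquared(periodicity=1.0) + RBF(length_scale=5.0) + WhiteKernel(0.01)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(t, y)

    t_future = np.linspace(10, 12, 40)[:, None]
    mean, std = gp.predict(t_future, return_std=True)   # forecast with uncertainty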


Anyone patched prophet into graphite? Curious if the two are easily combined.


This is one type of time series, but how about audio time series?



