Time Series Prediction – A short introduction for pragmatists (liip.ch)
393 points by makaimc on Nov 9, 2019 | 55 comments



And now for the great thing ... Prophet uses Stan underneath [1] and thus is built on foundations of 'regular' Bayesian statistics. Andrew Gelman has written about Prophet as well [2].

After reading this blog I am tempted to get the ML for time series book though. I'd love to try and compare some less than trivial examples with covariates involved.

[1] https://peerj.com/preprints/3190/

[2] https://statmodeling.stat.columbia.edu/2017/03/01/facebooks-...


A good explanation of Prophet that made it click for me was its recreation in Python with PyMC:

https://www.ritchievink.com/blog/2018/10/09/build-facebooks-...


Facebook Prophet is really impressive and has given us better precision with shorter iterations, even on large datasets, compared to CNN+LSTM.

We run it over millions of inventory items with years of data, and it has given satisfactory results in the majority of cases.


Have you tried XGBoost? In our problems, none of the LSTM/Prophet/ARIMA models performed better than a fine-tuned XGBoost model.


Just looked it up. Looks super interesting, I will surely give it a shot. Thanks


Which book are you referring to?



Basis function regression is a very under-appreciated method for producing time series forecasts. I've found that it beats most of the methods described in this article. Maybe I should make a blog post...


Works if you have enough domain knowledge to prescribe the functions (priors). Not so good if it's a poorly understood or niche domain.


Sorry for my ignorance. Did you mean simple linear regression or something else? Do you have a reference for that?


Not OP, but as an example, imagine you want to fit the time series of measured monthly CO2 at Mauna Loa:

https://www.esrl.noaa.gov/gmd/ccgg/trends/

Just by looking at the graph, you can see that fitting a linear combination of a constant term, t, t^2 and sin(w*t) will give you a very accurate model that has five tuning parameters (four weights and one frequency).
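
A rough sketch of that fit in Python (synthetic data standing in for the CO2 series, with the nonlinear frequency handled by curve_fit; every number here is made up):

    # Sketch: fit a + b*t + c*t^2 + d*sin(w*t) by nonlinear least squares.
    # The data below is synthetic, standing in for the monthly CO2 series.
    import numpy as np
    from scipy.optimize import curve_fit

    def model(t, a, b, c, d, w):
        return a + b * t + c * t**2 + d * np.sin(w * t)

    t = np.arange(0, 40, 1 / 12)                      # 40 years of monthly steps
    y = model(t, 315.0, 0.8, 0.012, 3.0, 2 * np.pi)   # "true" curve...
    y += np.random.normal(scale=0.3, size=t.size)     # ...plus noise

    # w is the one nonlinear parameter, so start the optimizer near the
    # annual frequency; the four weights are easy.
    params, _ = curve_fit(model, t, y, p0=[300, 1, 0, 1, 2 * np.pi])
    print(params)                                     # recovered [a, b, c, d, w]
    print(model(t[-1] + 1 / 12, *params))             # one-step-ahead forecast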


Yes, it's an extension of linear regression. You can incorporate different basis functions to model trend, seasonality, the effects of external regressors, different frequency components, etc. It gives you a lot more control over the forecast model.


Are you referring to Generalized Additive Models (GAM)?


I think GAMs are more for the case where you have many underlying input variables and a single resulting response.


This is a less studied scenario in time series research: contextual forecasting. With enough contextual information, forecasting doesn't need to squeeze information out of the series' own history quite so hard.


Here's an explanation: https://www.youtube.com/watch?v=rVviNyIR-fI

It's basically fitting on non-linear transformations of x.


You can fit a basis function regression model with least squares.


Normally the method for dealing with 'time series' is really just finding ways to turn a non-stationary distribution into a stationary distribution, where you can then apply classic statistical methods on them. So you're just finding ways to factor out the time component in the data so you can use the standard non-time sensitive regression models on the transformed data.

I don't think it's until you get to the NN based models that they start treating time as a first-class component in the model.

* If I'm wrong please explain why instead of downvoting


A (weak) stationary distribution has no trend, no seasonality, and no changes in variance. With these properties, predictions are independent of the absolute point in time. The transformations to turn non-stationary series into stationary ones are reversible (usually differencing, log-transformations and alike), thus the predictions can be applied back to the original time series.

Treating time as a first-class component really just means factoring the absolute point in time into the model at training time. This only makes sense if the absolute time changes properties of the distribution that cannot be accounted for with regular transformations. If that's the case, then we assume that these changes cannot be modeled, and are thus either random or follow a complicated systematic pattern we can't grasp. In the first case, a NN wouldn't improve things either; in the second case, we either need to always use the full history of the time series to make a prediction, or hope that a complex NN like an LSTM might capture that pattern.

In any case, I think one of the more compelling reasons to use NN is to not have to do preprocessing. The trade-off is that you end up with a complicated solution compared to the six or so easy-to-understand parameters a SARIMA model might give you. And the latter even might give you some interpretable intuition for the behavior of the process.
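
For concreteness, a tiny sketch of those reversible transformations (toy numbers; log to stabilize variance, first differences to remove trend, then the inverse to map a forecast back):

    import numpy as np

    y = np.array([112., 118., 132., 129., 121., 135., 148., 148., 136., 119.])

    log_y = np.log(y)                # log stabilizes multiplicative variance
    stationary = np.diff(log_y)      # first difference removes a linear trend

    # Pretend some model produced a forecast for the stationary series:
    next_diff = stationary.mean()    # placeholder "forecast"

    # Invert the transformations to get back to the original scale.
    next_y = np.exp(log_y[-1] + next_diff)
    print(next_y)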


> I don't think it's until you get to the NN based models that they start treating time as a first-class component in the model.

They've been bad at prediction across the past several M1-M4 time series competitions for univariate forecasting. The best entry for M4 is a combination of a NN and traditional statistical methods, but it is often deemed too tailored to the data.

And I don't think differencing out the trend, seasonality, etc. means we're treating the time component as second class. It's just that stationary data is what we understand best at the moment. There are GARCH/ARCH methods too. The nonstationary methods aren't used as often, and judging from the competitions, the current set of time series methods are the best we have so far.

So I think this comment is misleading.

There are also longitudinal analysis, survival analysis, etc., and they all keep time in mind.


I'm not sure I agree; in a sense the exponential smoothing approaches deliberately factor in time by looking back by the seasonality stride. But yes, the reason it's called triple exponential smoothing is that you're repeatedly transforming the data into a form amenable to classical methods. Zero-centering and detrending are exactly what you're describing. But at the end of all this, you still use these components to produce a prediction.
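
A minimal Holt-Winters sketch with statsmodels, assuming additive trend and an additive yearly season on monthly data (synthetic numbers, just to show the level/trend/season components at work):

    import numpy as np
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    rng = np.random.default_rng(0)
    t = np.arange(120)               # ten years of monthly observations
    y = 50 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

    fit = ExponentialSmoothing(
        y, trend="add", seasonal="add", seasonal_periods=12
    ).fit()
    print(fit.forecast(12))          # prediction built from level + trend + season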


Here are some old references for the problem of the OP:

David R. Brillinger, Time Series Analysis: Data Analysis and Theory, Expanded Edition, ISBN 0-8162-1150-7, Holden-Day, San Francisco, 1981.

George E. P. Box and Gwilym M. Jenkins, Time Series Analysis: Forecasting and Control, Revised Edition, ISBN 0-8162-1104-3, Holden-Day, San Francisco, 1976.

Brillinger was a John Tukey student at Princeton and long at Berkeley.


My very first move with time series data is to get to frequency space as fast as I can.


Genuinely curious -- how do you create predictive time-series models in the frequency domain?

(background: control systems)


speech recognition immediately jumps to mind


Speech recognition is generally not considered a time series problem though.


what about speech is nontemporal?


Time series problems are indeed temporal, but not all temporal problems are time series problems. Time series analysis deals with time-specific features of a discrete sequence, like autocorrelation, trends, seasonality, etc., whereas frequency-domain methods deal with, well, frequency.

Suppose you're looking at sales patterns over a long period of time, which follow certain patterns. FFTs are unlikely to tell you much that is useful or to predict much, whereas time series methods can reveal patterns where t is the independent variable.


In the spirit of the article, how can frequency domain representation be used for prediction / forecasting? Any examples you could share?


+1. FFT is under appreciated.


What do you mean? It's one of the most used algorithms in the history of mankind.


Under appreciated by people just now getting into time series regression, who may be really excited about throwing deep learning at the problem.

Edit: I literally saw this happen two weeks ago, with a PhD in electrical engineering.


So much this! Also under-appreciated: plots and OLS.

Start plotting before firing up the GPUs, then compare to standard OLS - like in this article!

Suggested reading if you don't know OLS: ML from scratch.
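
Something like this is usually where I'd start (toy data; numpy's polyfit with degree 1 is plain OLS on a linear trend):

    import numpy as np
    import matplotlib.pyplot as plt

    t = np.arange(100)
    y = 2.0 + 0.5 * t + np.random.normal(0, 5, t.size)    # toy series

    slope, intercept = np.polyfit(t, y, deg=1)             # OLS fit of y ~ t
    plt.plot(t, y, label="data")
    plt.plot(t, intercept + slope * t, label="OLS trend")
    plt.legend()
    plt.show()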


Well to be fair if you use a convolutional recurrent architecture, you're doing it without realizing it.


Is that true? I would think that would be more like a wavelet transform where the wavelets are learned?


Yeah, that's true.


and not a single DS/ML person seems to have ever heard of it.


Because most folks in DS/ML are working on predicting the likelihood that someone presses a button. You don't need Fourier transforms for that.


CNN is essentially a FT with learnable kernels.


well, it’s certainly usually convolution. Hardly anyone knows why or how that connects to the FT though.


you do, depending on your feature preprocessing.


Most DS/ML people are useless chimps.

FWIW I've used it all the time, along with wavelets, cepstrums, lifters and Hilbert transforms. For time series with nonlinearities and lots of data points, it's the way to go. Nine times out of ten I'd rather hire an EE with a signal processing background than a statistician or data scientist for time series work.


Can you give an example pls?


“bandpass with a butterworth then plot the FFT and phase”
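
In SciPy terms, roughly this (the sample rate and band edges are invented; it's just the shape of the recipe):

    import numpy as np
    from scipy.signal import butter, filtfilt
    import matplotlib.pyplot as plt

    fs = 100.0                                           # assumed sample rate (Hz)
    t = np.arange(0, 10, 1 / fs)
    x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(t.size)

    b, a = butter(4, [1, 10], btype="bandpass", fs=fs)   # 1-10 Hz Butterworth
    filtered = filtfilt(b, a, x)

    spectrum = np.fft.rfft(filtered)
    freqs = np.fft.rfftfreq(filtered.size, d=1 / fs)
    plt.plot(freqs, np.abs(spectrum), label="magnitude")
    plt.plot(freqs, np.angle(spectrum), label="phase")
    plt.legend()
    plt.show()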


If you would indulge me, what sorts of insights would this get you? (genuinely curious)

All I can think of is identification of frequency modes.


I mostly feel these methods are quite overkill for most applications. As a purist, I'd recommend starting out with a simple linear regression and then moving on to adding methods to cover the letters of SARIMA by showing the need for each. It may not be as flashy, but linear regression is a stupidly powerful and very cheap tool for all kinds of situations.
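
For instance, a sketch of that progression with statsmodels (synthetic monthly data; the SARIMA orders are illustrative, not tuned):

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(1)
    t = np.arange(144)
    y = 10 + 0.1 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

    # Step 1: the simple linear-regression baseline.
    slope, intercept = np.polyfit(t, y, 1)
    baseline = intercept + slope * t

    # Step 2: add the SARIMA letters once the residuals show
    # autocorrelation and seasonality.
    sarima = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
    print(sarima.forecast(12))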


As a complete non-purist I'd suggest chucking the data into auto.arima and seeing where that gets you. Not only will it save a lot of time, in my experience it tends to produce better models.

https://cran.r-project.org/web/packages/forecast/index.html
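
(If you're in Python rather than R, pmdarima ships a port of auto.arima; a rough sketch, assuming monthly seasonality and a placeholder series:)

    import numpy as np
    import pmdarima as pm

    y = np.random.default_rng(0).normal(size=120).cumsum()   # placeholder series
    model = pm.auto_arima(y, seasonal=True, m=12, suppress_warnings=True)
    print(model.summary())
    print(model.predict(n_periods=12))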


I saw an article, which I can't find any more, warning about the contexts in which auto.arima might not work. But the majority of the time auto.arima outperforms the old Box-Jenkins methodology.


I really recommend Prophet as an easy to use option like the article says.

I needed anomaly detection for prometheus metrics integrated with grafana for marking "anomalous" regions so the model doesn't learn them.

It took me a week to set it all up, including packaging it up as a microservice and deploying it.
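
The core of it was roughly this kind of sketch (not the production code; ds/y and yhat_lower/yhat_upper are Prophet's own column names, everything else is a placeholder):

    import pandas as pd
    from prophet import Prophet   # older releases ship as `fbprophet`

    df = pd.DataFrame({
        "ds": pd.date_range("2019-01-01", periods=200, freq="H"),
        "y":  range(200),         # replace with your metric values
    })

    m = Prophet(interval_width=0.99)
    m.fit(df)
    forecast = m.predict(df[["ds"]])

    # Flag points that fall outside the predicted uncertainty interval.
    merged = df.merge(forecast[["ds", "yhat_lower", "yhat_upper"]], on="ds")
    anomalies = merged[(merged.y < merged.yhat_lower) | (merged.y > merged.yhat_upper)]
    print(anomalies)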


In multivariate time series forecasting problems I've found that a fine-tuned XGBoost model (and its variants) performs much better than fbprophet, SARIMAX, or RNN variations. Predicting time series with an RNN is like killing a bird with a bazooka.
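
The trick is just reframing the series as a supervised problem with lag features, roughly like this (lag count and hyperparameters are placeholders, not a tuned model):

    import numpy as np
    from xgboost import XGBRegressor

    rng = np.random.default_rng(2)
    y = np.cumsum(rng.normal(size=500))          # toy series

    n_lags = 12                                  # each row holds the previous 12 values
    X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
    target = y[n_lags:]

    model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X[:-50], target[:-50])             # hold out the last 50 points
    preds = model.predict(X[-50:])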


I really appreciate the philosophy of defining a metric and measuring performance going from simple to complex methods.

I've also used the Prophet library, and find it works well out of the box.


Gaussian processes seem to be pretty popular; you might want to include them in the comparisons.


Gaussian processes are marvelous. Demos like https://statmodeling.stat.columbia.edu/2012/06/19/slick-time... just seem magical.
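
A small sketch with scikit-learn's GP regressor, in the spirit of that demo (the kernel choices and hyperparameters are illustrative guesses):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

    rng = np.random.default_rng(3)
    t = np.linspace(0, 10, 200)[:, None]
    y = np.sin(2 * np.pi * t).ravel() + 0.05 * t.ravel() ** 2 + rng.normal(0, 0.1, 200)

    # Periodic kernel for seasonality, RBF for the slow trend, white noise term.
    kernel = ExpSineSquared(periodicity=1.0) + RBF(length_scale=5.0) + WhiteKernel(0.01)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(t, y)

    t_future = np.linspace(10, 12, 40)[:, None]
    mean, std = gp.predict(t_future, return_std=True)   # forecast with uncertainty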


Anyone patched prophet into graphite? Curious if the two are easily combined.


This is one type of time series, but how about audio time series?



