Sorry ARIMA, but I’m Going Bayesian (stitchfix.com)
173 points by stared on April 23, 2016 | 34 comments



Thanks for this. If you want to get started doing this kind of analysis in Python, check out the examples for the maturing PyMC3:

https://pymc-devs.github.io/pymc3/stochastic_volatility/


I second PyMC3 - a great library. And just recently I went through https://github.com/markdregan/Bayesian-Modelling-in-Python.


I found it quite interesting to compare HN and /r/statistics in terms of the tone difference in the comments when discussing this article: https://www.reddit.com/r/statistics/comments/4fw1bn/sorry_ar...


Because "data science".


This is a really interesting post. I have to say the data science team at Stitchfix are clearly doing really good, applied work that is central to their business. It's so cool to see.

Here's a tip for any R users who read through the code and (like me) are pained by repetition. Instead of using

    library(lubridate)
    library(bsts)
    library(...)
Just use lapply!

    packages <- c("lubridate", "bsts", "...")
    lapply(packages, library, character.only = TRUE)


or

    library(pacman)
    pacman::p_load("lubridate", "bsts", ...)
with the added benefit of installing missing packages (I find this especially useful because my school's computer lab deletes user-installed packages weekly).


That doesn't seem to me to be a huge efficiency win.


In terms of managing packages in session, readability and keystrokes, I think it definitely wins out over successive library calls. It's common to have >5 packages in any one script, especially if you avoid base R like many do.


it's clever but is it more readable?

these sorts of discussions, people who write blog posts about 'library' vs. 'require', kind of feel like 'R smell'.

wouldn't it be better for a language to just take a list of libraries for import, maybe with a readable syntax?

so... have we got to where Julia can re-use R packages yet?


I understand where you're coming from, but these two lines are exactly 'taking a list of libraries for import'.

packages is a character vector of package names, lapply is by definition 'list apply'. We're taking a list of packages and applying the library function on them.

This seems complicated if you're not used to it, but R is a functional language. Approaching R from this perspective makes it powerful and flexible.
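
To make the "apply a function over a vector of names" idea concrete, here is a minimal sketch in base R. The helper name load_packages and the made-up package name are assumptions for illustration only; they come from neither the article nor this thread. It attaches each package it can find and reports the ones that are missing instead of stopping at the first error.

```r
# Hypothetical helper (not from the thread): attach every package named in a
# character vector and report which ones could not be found.
load_packages <- function(packages) {
  vapply(packages, function(pkg) {
    # requireNamespace() returns FALSE for a missing package instead of erroring
    if (!requireNamespace(pkg, quietly = TRUE)) {
      return(FALSE)
    }
    # character.only = TRUE tells library() that pkg holds the name as a string
    suppressPackageStartupMessages(library(pkg, character.only = TRUE))
    TRUE
  }, logical(1))
}

# "stats" and "utils" ship with R; the last name is deliberately made up
result <- load_packages(c("stats", "utils", "notARealPackage123"))
print(result)
```

The result is a named logical vector, so the packages that failed to load can be listed with names(result)[!result]. Unlike pacman::p_load, this sketch does not try to install anything.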


Check out RCall.jl


Like others, I enjoyed this.

I work with time series data every day in the domain of commercial real estate. One of my constant struggles is to extract an underlying long-term trend from the real estate cycle. I would love to try this here.

Can anyone suggest some Bayesian learning resources for a non-statistician?


"Statistical Rethinking" by Richard McElreath is really nice, light on maths but heavy on insights. He uses the book in a course, and recordings of the lectures are available on YouTube.

http://xcelab.net/rm/statistical-rethinking/

Edit: neither this book nor Kruschke's are going to help you with time series in particular. Bayesian time series methods are based on state space models and are relatively complex (http://www.eurasip.org/Seminars/Tutorials/EUSIPCO2014%20Tuto...). You might want to read some introductory text on time series covering state space models first.

Edit again: The full text of "Bayesian Filtering and Smoothing" by Simo Sarkka is available online: http://users.aalto.fi/~ssarkka/pub/cup_book_online_20131111....
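
For orientation, the simplest state space model is the local level model. This is a sketch in common textbook notation; the symbols are my own choice and are not taken from the linked tutorial or book.

```latex
\begin{aligned}
y_t       &= \mu_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, \sigma_\varepsilon^2) \\
\mu_{t+1} &= \mu_t + \eta_t,        & \eta_t        &\sim \mathcal{N}(0, \sigma_\eta^2)
\end{aligned}
```

Here y_t is the observed series and \mu_t is the unobserved level that drifts as a random walk. A Bayesian treatment puts priors on the two variances and infers the latent path of \mu_t along with them.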


Having read many books/online courses, I strongly recommend this: http://www.amazon.com/Doing-Bayesian-Data-Analysis-Second/dp...

It gives a gentle but thorough introduction from first principles; lots of good intuition and 'why'.

It works well with "Probabilistic Programming & Bayesian Methods for Hackers" also mentioned, but I'd start with this. It is much more accessible than many other introductory books, IMO.



Another open source Bayesian book/course all in python: http://www.greenteapress.com/thinkbayes/t


If you can program, this is the best probability book. I think you messed up the URL, though:

http://greenteapress.com/wp/think-bayes/


Thanks a ton to everyone who posted these book names and links. They helped a lot.



Thank you!


Really nice story. All the Bayesian stuff now gets swamped by big data methods. There must come a time when machine learning people will come back to the Bayesian womb. :-)

I'm currently using it to define priors on measure spaces. I think it's awesome to have so few abstractions in a discipline and be able to do inference anyway. I'd definitely recommend looking into Dirichlet processes if you haven't before. It's a nice entry point.


Wharton's Pete Fader would agree with you somewhat that big data-style methods are overblown: http://www.datanami.com/2012/05/03/wharton_professor_pokes_h...

He also has some very cool uses of Dirichlet processes (modelling entire competitive industries).


I was very surprised when I went to the root domain to see what stitchfix is. Completely not what I was expecting! I guess my model didn't predict a clothes delivery website to be writing about cool stuff like this. But now a Bayesian update to this belief is in order :)


I like the content.

Does anyone else have a _really_ hard time reading the light gray-on-white font?


It is ironic that your off-topic comment led to down-votes, reducing the contrast of your comment. I can't imagine the nightmare you must be living right now.


Mostly, I'm unconcerned about karma. But at present, the contrast is fine. I suppose the vote count has increased.


There definitely isn't sufficient contrast.


I liked it on iPhone 6 plus


Excellent walkthrough. I'll be sure to give this a try on my next project.

Any good references for an intro to the traditional ARIMA models?



This book is extremely practical - I would definitely recommend it for doing actual time series analysis. That said, it's not a learn time series book. It's a do basic time series in R book.


Hyndman's online forecasting book https://www.otexts.org/fpp has an ARIMA section that covers pretty much everything a practitioner would need.


Try one of the forecasting books on Frank Diebold's page: http://www.ssc.upenn.edu/~fdiebold/Textbooks.html

I use the Elements of Forecasting book in an undergrad forecasting class that I teach. Forecasting will be better, but I'm not sure if it's completed.


This was very interesting to read, especially as I'm currently taking a course in time series analysis. The course doesn't touch on any Bayesian treatment of the topic at all, and barely goes beyond non-stationary models (e.g. ARIMA).



