Hacker News

Saying that linear regression is easier to do properly than more complex methods like random forests, DL, boosting etc is like saying that people should code assembly instead of python



This is a false dichotomy. Both OLS regression and, say, random-forest regression have the same objective (predict values) and achieve it by similar means (fitting a predictive model, i.e. a function from features to target). They solve the same problem. By contrast, assembler and Python are broadly aimed at completely different use cases.

Broadly, whether you should move from OLS to random-forest regression comes down to a ratio: the increase in signal you can extract, divided by the increase in man-hours and money spent.


It is actually much easier to apply a random forest (or really a gradient-boosted decision tree, which almost strictly dominates random forests) than a linear regression. Decision-tree methods require far less data preprocessing than linear regression, because the model can infer nonlinear feature relationships on its own. Obviously, if your features are linearly related to your target, then linear regression is much more viable.
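To make the preprocessing point concrete, here is a minimal pure-Python sketch on made-up data (all names and numbers are illustrative, not from the comment above): on y = x², a straight-line fit on the raw feature learns nothing, while a small greedy regression tree fits the curve with no feature engineering at all.

```python
# Hypothetical data: y = x^2 on a symmetric grid. OLS on raw x gets a
# slope of ~0 (the line learns nothing), while a tiny depth-4 regression
# tree fits the curve with zero feature engineering.

def ols_fit(xs, ys):
    """Closed-form simple OLS: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    slope = num / den
    return slope, my - slope * mx

def sse(pts):
    ys = [y for _, y in pts]
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def grow(pts, depth):
    """Greedy CART-style regression tree on one feature."""
    ys = [y for _, y in pts]
    mean = sum(ys) / len(ys)
    if depth == 0 or len(pts) < 2:
        return ('leaf', mean)
    # pick the split point minimising total squared error
    best_i = min(range(1, len(pts)), key=lambda i: sse(pts[:i]) + sse(pts[i:]))
    thr = (pts[best_i - 1][0] + pts[best_i][0]) / 2
    return ('split', thr, grow(pts[:best_i], depth - 1), grow(pts[best_i:], depth - 1))

def tree_predict(node, x):
    while node[0] == 'split':
        node = node[2] if x < node[1] else node[3]
    return node[1]

xs = [i / 10 for i in range(-30, 31)]
ys = [x * x for x in xs]

slope, icept = ols_fit(xs, ys)           # slope is ~0: the line is useless here
tree = grow(sorted(zip(xs, ys)), depth=4)

lin_sse = sum((slope * x + icept - y) ** 2 for x, y in zip(xs, ys))
tree_sse = sum((tree_predict(tree, x) - y) ** 2 for x, y in zip(xs, ys))
print(slope, lin_sse, tree_sse)          # tree_sse is far below lin_sse
```

The point is not that trees are magic, but that the linear model only works here if someone remembers to hand it the engineered feature x² first.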


This is absolutely true; the one caveat is that with linear regression you can explain the significance of the features, and their relationship to the response variable, in simpler terms.


Technically, it is an incorrect analogy, not a false dichotomy. A false dichotomy is the incorrect assertion that you must choose between X and Y in a situation where other options exist.

The GP compares Python-vs-assembler with random-forests-vs-linear-regression, but the analogy breaks down: Python ultimately gets executed as machine code while increasing the programmer's general certainty about what they are doing, whereas random forests do not make their user more certain of the results. Basically, Python is a relatively "unleaky" abstraction, whereas complex ML algorithms are very "leaky" abstractions.


Simple regression has a lot going for it beyond simply being a simpler model. For example, it produces models that are easy to interpret. That is an enormous advantage if you're looking to use data science to help drive strategic decision-making.
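The interpretability advantage in miniature, as a hedged sketch on made-up data (the "ad spend" framing and every number here are invented for illustration): fit y = a + bx by OLS where the true effect is known, and read the effect size straight off the coefficient, something a forest never hands you directly.

```python
# Hypothetical data: conversions = 5 + 3 * (ad spend in $k), with the
# true effect (+3 conversions per $k) known by construction. The fitted
# coefficient IS the business answer, stated in the units of the problem.

xs = list(range(1, 11))           # made-up ad spend, $k
ys = [5 + 3 * x for x in xs]      # made-up conversions

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
print(b, a)   # slope 3.0, intercept 5.0: "each extra $k buys ~3 conversions"
```

A random forest fit to the same data would predict just as well, but the "+3 per $k" statement, the thing a strategy meeting actually wants, has to be extracted indirectly (partial dependence, SHAP, etc.) rather than read off the model.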

My personal suspicion is that, in a market full of people using ever more sophisticated algorithms to ratchet up their customer-conversion classifiers' F1 scores by .001 per iteration, the leader will be the company that decides to steer clear of that quagmire and spend its time and money on identifying new business opportunities instead.


I think calling linear regression simple is misleading. While the algorithm is simple, interpreting the results and not falling into one of the many traps is quite difficult!


His comment was relative to ML, not an absolute.


Mine was too. I am just pointing out that even with a simple algorithm like linear regression you can run into all kinds of issues, and it's not always obvious. Sometimes a slightly more complex method is easier to get right in the long run. It depends on many factors.
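One classic example of such an issue, sketched in a few lines of pure Python on invented data: a single high-leverage point. On clean data with y = x the OLS slope is exactly 1; add one bad record and the fitted trend flips negative, with no error or warning from the algorithm itself.

```python
# High-leverage outlier trap: ten clean points on y = x, plus one bad
# record at (100, 0). The slope goes from exactly 1.0 to below zero.

def ols_slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

clean_x = list(range(1, 11))
clean_y = list(range(1, 11))          # perfect y = x

s_clean = ols_slope(clean_x, clean_y)             # 1.0
s_dirty = ols_slope(clean_x + [100], clean_y + [0])   # negative!
print(s_clean, s_dirty)
```

The fit succeeds either way; spotting the problem takes diagnostics (residual plots, leverage statistics), which is exactly the "interpreting the results is the hard part" point.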


This seems to be implying that it's easier to code in assembly than Python, which I'd disagree with


You might be being really subtle about python there ! I realize that one can create lines of code like :_ = ( 255, lambda V ,B,c :c and Y(VV+B,B, c -1)if(abs(V)<6)else ( 2+c-4abs(V)-0.4)/i ) ;v, x=1500,1000;C=range(vx );import struct;P=struct.pack;M,\ j ='<QIIHHHH',open('M.bmp','wb').write for X in j('BM'+P(M,vx3+26,26,12,v,x,1,24))or C: i ,Y=_;j(P('BBB',(lambda T:(T80+T9 i-950T 99,T70-880T18+701 T 9 ,Ti(1-T452)))(sum( [ Y(0,(A%3/3.+X%v+(X/v+ A/3/3.-x/2)/1j)2.5 /x -2.7,i)*2 for \ A in C [:9]]) /9) ) ) (stolen from : http://preshing.com/20110926/high-resolution-mandelbrot-in-o...)

which could be seen as retrograde vs. assembler (though not really for the very funny and brilliant code above: you have to see it formatted nicely and run it to realize that there are some great people out on the web!). Perhaps I would in fact agree with this dig; some people do write horrid bits of Python, and Python seems to facilitate (or enable) this behavior rather more than other modern languages like Julia.

But taking your comment more at face value, reading it to say that more complex methods represent an evolution and should be preferred by users because they are easier or better, I would disagree. It is easy to screw things up with a random forest or a booster: overfitting, focusing on the method rather than the features, and not understanding what the fitted model is telling you about the data.

Often a regression model or a decision tree can reveal that a few simple things are going on, which says more about how a process or system has been implemented than about the domain that process or system operates in. That can be gold dust. So I think the simpler methods can be easier to use and simpler to understand; of course, when they don't do the job, better model generators are required.
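The overfitting failure mode mentioned above, in miniature, on synthetic data (everything here is made up; 1-nearest-neighbour stands in for any unconstrained memorizing model, such as a fully grown tree): training error looks perfect while error on fresh draws from the same noisy process is large.

```python
# A memorizing model (1-NN) on noisy data: zero training error, large
# error on new samples from the same process, because it learned noise.
import random
random.seed(0)

def noisy(x):
    return x + random.gauss(0, 1)     # true signal is y = x, noise sd = 1

train = [(x / 10, noisy(x / 10)) for x in range(100)]
test  = [(x / 10 + 0.05, noisy(x / 10 + 0.05)) for x in range(100)]

def nn_predict(x):
    # predict the y of the closest training point: pure memorization
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(data, f):
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)

train_mse = mse(train, nn_predict)    # 0.0: "perfect" fit
test_mse = mse(test, nn_predict)      # roughly 2x the noise variance
print(train_mse, test_mse)
```

A regularized or simpler model would do worse on the training set and better on the test set; the trap is that nothing in the training metrics warns you.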


You might want to do code formatting for that. Look at the help[0].

[0] https://news.ycombinator.com/formatdoc


Yup, sorry timed out on edit...


Can you elaborate?



