How should a software engineer with no machine learning background get started on the subject?
Do you think that getting started by learning a framework like TensorFlow is a good idea or should I gain a background knowledge first?
If you want to jump right in with "hello world" type TensorFlow (a tool for machine learning), see https://news.ycombinator.com/item?id=12465935 (how to fit a straight line using TensorFlow)
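If it helps, here is roughly what that straight-line fit looks like with a current (2.x) TensorFlow API; a minimal sketch of the same idea, not the exact code from the link:

    import numpy as np
    import tensorflow as tf

    # Toy data: y = 3x + 2 plus a little noise
    x = np.random.rand(100).astype(np.float32)
    y = 3 * x + 2 + 0.1 * np.random.randn(100).astype(np.float32)

    w = tf.Variable(0.0)
    b = tf.Variable(0.0)
    opt = tf.keras.optimizers.SGD(learning_rate=0.1)

    for _ in range(500):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(w * x + b - y))
        opt.apply_gradients(zip(tape.gradient(loss, [w, b]), [w, b]))

    print(w.numpy(), b.numpy())  # should approach 3 and 2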
Unlike some of the other complicated tools, sklearn is just a "pip install" away and includes all sorts of examples of different problems. Classification? Regression? Clustering? Representation learning? Perceptual embedding? Odds are, some part of sklearn covers all of that.
The scikit-learn tutorials are great. Another nice thing about scikit-learn is that the API for a lot of different ML algorithms is very similar, almost identical.
This means that you can set up a train and test set and swap in and out random forests, SVMs, naive Bayes, logistic regression, and various others.
Read about them one by one, try to understand the algorithms generally, test them out, see how they perform differently on different data sets.
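To make the swapping concrete, here's a minimal sketch using a built-in scikit-learn dataset (the breast cancer set is my placeholder; any classification data works). Note the identical fit/score interface across models:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Swapping algorithms is literally one line per model
    for model in [RandomForestClassifier(), SVC(), GaussianNB(),
                  LogisticRegression(max_iter=5000)]:
        model.fit(X_train, y_train)
        print(type(model).__name__, model.score(X_test, y_test))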
It all depends on how you like to approach a new subject, but I think this is more fun and motivating than going straight into the mathematics behind the algorithms right away (which is more along the lines of Andrew Ng's excellent course). I'd say once you're into it and using the algorithms, dig deeper into the core mathematics; you'll have better context for it.
I would be pretty hesitant to start talking about TensorFlow and Deep Learning before confirming, for example, at least a rudimentary understanding of Linear Algebra.
What is a good "hello world" project for machine learning? That is, what problem can I solve or question can I answer with minimal ceremony, and ideally with multiple techniques / technologies so that I can compare them? Is it this house price estimation like in your last link, or is there something better than that?
The Iris data set [1] is very famous and a popular way to test out classification techniques. It's not "big data", but can be used to familiarize yourself with some basic data mining techniques.
Kaggle has a number of starter challenges. See https://www.kaggle.com/c/titanic for one related to predicting the survival of passengers on the Titanic.
Lol. Predicting the survival of passengers on the Titanic is meaningless and misleading - there is literally no connection to reality, despite the framing of the task suggesting one. There is absolutely nothing that could be predicted. It is just a simulation of an oversimplified model, one which describes nothing but an oversimplified view of a historical event. It is as meaningless as the ant simulator written by Rich Hickey to demonstrate features of Clojure - it has that much connection to real ants.
If you work through the data, you'll find things like women, children and first class passengers had a higher survival rate than men with lower class tickets[1].
This matches exactly the stories of what happened: Staff took first class passengers to the lifeboats first, then women and children. Then they ran out of lifeboats.
So the data shows correlation, and eyewitness accounts show causation. That's close to the ideal combination: eyewitness accounts alone can be unreliable because we can't know how widespread they are, and correlation alone doesn't show causation.
But the combination of them both is pretty much the best case for studying something which can't be replicated.
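For anyone who wants to check those rates themselves, they are a few lines of pandas away, assuming you have the Kaggle train.csv (whose columns include Survived, Sex, and Pclass):

    import pandas as pd

    df = pd.read_csv("train.csv")  # the Kaggle Titanic training data

    # Survival rate by sex, by ticket class, and by both together
    print(df.groupby("Sex")["Survived"].mean())
    print(df.groupby("Pclass")["Survived"].mean())
    print(df.groupby(["Sex", "Pclass"])["Survived"].mean())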
This is only one of many aspects of that event. The data reflects that the efforts of organized evacuation in the beginning were efficient.
But any attempt to frame it as a "prediction", an accurate model of the event or adequate description of reality is just nonsense.
To call things by their proper names (precise use of language) is the foundation of the scientific method. This is a mere oversimplified, non-descriptive toy model of one aspect of a historical event, made from statistics of a partially observable environment. A few inferred correlations reflect that there was not total chaos, but some systematic activity. No doubt about it. But it is absolutely unscientific to say anything more about the toy model, let alone claim that any predictions based on it have any connection to reality.
There is clear correlation between gender and survival rates. Given the data, a decent prior would absolutely take that into account.
Yes, there are other factors. But the foundation of statistical models is simplification, and descriptive statistics are an important foundation of that.
In any case, it isn't exactly clear that there are magical hidden factors which predicted survival. It appears you may be unfamiliar with the event, because basically those who got into a lifeboat survived, and those who didn't, didn't survive.
To quote Wikipedia:
Almost all those who jumped or fell into the water drowned within minutes due to the effects of hypothermia... The disaster caused widespread outrage over the lack of lifeboats, lax regulations, and the unequal treatment of the three passenger classes during the evacuation... The thoroughness of the muster was heavily dependent on the class of the passengers; the first-class stewards were in charge of only a few cabins, while those responsible for the second- and third-class passengers had to manage large numbers of people. The first-class stewards provided hands-on assistance, helping their charges to get dressed and bringing them out onto the deck. With far more people to deal with, the second- and third-class stewards mostly confined their efforts to throwing open doors and telling passengers to put on lifebelts and come up top. In third class, passengers were largely left to their own devices after being informed of the need to come on deck.
Even more tellingly:
The two officers interpreted the "women and children" evacuation order differently; Murdoch took it to mean women and children first, while Lightoller took it to mean women and children only. Lightoller lowered lifeboats with empty seats if there were no women and children waiting to board, while Murdoch allowed a limited number of men to board if all the nearby women and children had embarked.
All this behavior matches exactly what the model tells us about the event.
I'd be very interested if you can point to something specific that is wrong about it.
I think you are making the point for him. If you look at the predictive models people make on these, they make a big deal about your sex and status being the main indicators of who survived. The reality is that the main causal indicator for survival was access to a lifeboat.
Now, it so happens that lifeboat access correlated heavily with class - but not as much as with sex. Though there were some places where being male hurt your chances (as you point out with the one officer not allowing men on boats), by and large these factors were secondary, correlated with success rather than predictors of it.
The Titanic's sister ship (the Britannic) was torpedoed during WW1 and sunk. However, the lesson of the Titanic (too few lifeboats) had been learnt, and only 26 people died.
I don't know what point you are trying to make - yes, I agree that history never repeats, but lessons can be learnt from it, and they can be quantified and they can be useful.
>> The Titanic's sister ship (the Britannic) was torpedoed during WW1 and sunk. However, the lesson of the Titanic (too few lifeboats) had been learnt, and only 26 people died.
This happened because they made a _statistical_ model of the Titanic disaster, and learned from it? Like, they actually crunched the numbers and plotted a few curves etc, and then said "aha, we need more boats"?
I kind of doubt it, and if it wasn't the case then you can't very well talk about a "model", in this context. It's more like they had a theory of what factor most heavily affected survival and acted on it. But I'd be really surprised to find statistics played any role in this.
> This happened because they made a _statistical_ model of the Titanic disaster, and learned from it?
No - statistics as the discipline that we think of today wasn't really around until the work of Gosset[1] and Fisher[2] which was done a few years after this.
I'm sure you noted that I was very careful with what I claimed: "the lesson of the Titanic (too few lifeboats) had been learnt".
These days we'd quantify the lesson with statistics. Then, they didn't have that tool.
Instead, we have testimony[3] relaying the same story: "Just one question. Have you any notion as to which class the majority of passengers in your boat belonged? - (A.) I think they belonged mostly to the third or second. I could not recognise them when I saw them in the first class, and I should have known them if there were any prominent people. (Q.) Most of them were in the boat when you came along? - (A.) No. (Q.) You put them in? - (A.) No. Mr. Ismay tried to walk round and get a lot of women to come to our boat. He took them across to the starboard side then - our boat was standing - I stood by my boat a good ten minutes or a quarter of an hour. (Q.) At that time did the women display a disinclination to enter the boat? - (A.) Yes."
So yes, I agree - it was a theory, which our modern modelling tools can show matched well with what the statistics showed happened.
My whole point is that this is very useful, unlike the OP who dismissed it as useless.
OK, tell me, please, what is it that you can predict? That some John Doe, holding a first-class ticket in a cabin next to the exit, would survive the collision of the next Titanic with a new iceberg? That being a woman gives you better chances to secure a seat in a lifeboat? What is the meaning of the word "predict" here?
There is a very important notion from The Sciences of the Artificial book by Herbert A. Simon, that the visible (to an external observer) behavior of an ant (its tracks, if you wish) is not due to its supposed intelligence, but mostly due to the obstacles in the environment.
Most of the models mimic and simulate (very naively) that observable behavior, not its origin.
When people cite "the map is not the territory" they mean this. Simulation is not even an experiment. It is merely an animation of a model - a cartoon.
It is swarm intelligence: How does the system keep finding successful paths in a changing environment? Can we take inspiration from this behavior to create better optimization algorithms?
Why not. I remember a paper comparing the behavior of foraging ants (they send out more or fewer foragers according to the rate at which ants return with food) to the adjustment of the window size based on the data rate in TCP.
Simulations are not experiments. A simulation is an animation of formalized imagination, if you wish.
Because otherwise it should be called machine hallucinations?
The process of learning could be defined as the task of extracting relevant information (knowledge) about reality (a shared environment), not the mere accumulation of fancy nonsense or false beliefs.
So knowledge like: Did the passenger have kids on board? Was the passenger nobility? Was the passenger travelling first class? Where was the passenger located on the ship after boarding? And how do these factors influence survivability?
And reality like: The actual sinking of the Titanic?
If your model concludes that nobility, traveling first class, close to the exits, without family, has a higher chance of surviving, then this is fancy nonsense or a false belief?
Correlation does not imply causation. There were many more relevant but "invisible" variables, probably related to genetic factors: the ability to withstand exposure to cold water, the ability to calm oneself down to avoid panic, self-control in general, a strong survival instinct to literally fight the others, etc. The variables you have described, except the age of a passenger, are visible but irrelevant. And pure luck must have a far bigger weight, and it is obviously related to the favorable genetic factors, age, health and fitness.
This challenge is not about causal inference. I do agree it is more of a toy dataset, to get started with the basics, and that there are a lot of other variables that go into survivability. But to say these variables, except for age, are irrelevant is mathematically unsound: You can show with cross-validation and test set performance that your model using these variables generalizes (around 0.80 ROC AUC). You can do statistical/information theoretical tests that show the majority of these variables is a significant signal for predicting the target.
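For the curious, that check is a short exercise; a sketch under the assumption that you have the Kaggle train.csv at hand (exact numbers will vary with the model and feature choices):

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("train.csv")  # the Kaggle Titanic training data
    X = pd.get_dummies(df[["Sex", "Pclass", "Age", "Fare"]], columns=["Sex"])
    X["Age"] = X["Age"].fillna(X["Age"].median())  # crude missing-value fill
    y = df["Survived"]

    # 5-fold cross-validated ROC AUC from just these few variables
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                             cv=5, scoring="roc_auc")
    print(scores.mean())  # typically lands around the ~0.80 mentioned above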
In real life it is also very rare to have free pickings of the variables you want. Some variables have to be substituted with available ones.
The Titanic story is to make things interesting for beginners. One could leave out all the semantics of this challenge, anonymize the variables and the target, and still use this dataset to learn about going from a table with variables to a target. In fact, doing so teaches you to leave your human bias at the door. Domain experts get beaten on Kaggle, because they think they need other variables, or that some variables (and their interactions) can't possibly work.
Let the data and evaluation metric do the talking.
>> Domain experts get beaten on Kaggle, because they think they need other variables, or that some variables (and their interactions) can't possibly work.
That sounds a bit iffy. A domain expert should really know what they're talking about, or they're not a domain expert. If the real deal gets beaten on Kaggle it must mean that Kaggle is wrong, not the domain expert.
Not that domain experts are infallible, but if it's a systematic occurrence then the problem is with the data used on Kaggle, not with the knowledge of the experts.
I mean, the whole point of scientific training and research is to have domain experts who know their shit, know what I mean?
> Since our goal was to demonstrate the power of our models, we did no feature engineering and only minimal preprocessing. The only preprocessing we did was occasionally, for some models, to log-transform each individual input feature/covariate. Whenever possible, we prefer to learn features rather than engineer them. This preference probably gives us a disadvantage relative to other Kaggle competitors who have more practice doing effective feature engineering. In this case, however, it worked out well.
> Q: Do you have any prior experience or domain knowledge that helped you succeed in this competition? A: In fact, no. It was a very good opportunity to learn about image processing.
> Do you have any prior experience or domain knowledge that helped you succeed in this competition? I didn't have any knowledge about this domain. The topic is quite new and I couldn't find any papers related to this problem, most probably because there are not public datasets.
> Do you have any prior experience or domain knowledge that helped you succeed in this competition? M: I have worked in companies that sold items that looked like tubes, but nothing really relevant for the competition. J: Well, I have a basic understanding of what a tube is. L: Not a clue. G: No.
> We had no domain knowledge, so we could only go on the information provided by the organizers (well honestly that and Wikipedia). It turned out to be enough though. Robert says it cannot happen again, so we’re currently in the process of hiring a marine biologist ;).
> Through Kaggle and my current job as a research scientist I’ve learnt lots of interesting things about various application domains, but simultaneously I’ve regularly been surprised by how domain expertise often takes a backseat. If enough data is available, it seems that you actually need to know very little about a problem domain to build effective models, nowadays. Of course it still helps to exploit any prior knowledge about the data that you may have (I’ve done some work on taking advantage of rotational symmetry in convnets myself), but it’s not as crucial to getting decent results as it once was.
> Oh yes. Every time a new competition comes out, the experts say: "We've built a whole industry around this. We know the answers." And after a couple of weeks, they get blown out of the water.
Competitions have been won without even looking at the data. Data scientists/machine learners are in the business of automating things -- so why should domain knowledge be any different?
Ok, sure it can help, but it is not necessary, and can even hamper your progress: You are searching for where you think the answer is -- thousands are searching everywhere and finding more than you, the expert, can.
How does this not violate [1]? That is, this seems specifically anti-statistical. The best you can come up with on this is a predictive model that you then have to test on new events. In this case, that would likely mean new crashes.
Because we are not doing hypothesis testing, we are doing classification on a toy dataset. Sure, one could treat this as a forecasting challenge, but then one would need another Titanic sinking in roughly the same context, with the same features... That demand is as unreasonable as calling this modeling knowledge competition meaningless.
And if you see classification as a form of hypothesis testing, then cross-validation is a valid way of testing whether the hypothesis holds on unseen data.
I think that is the rub. With the goal just being to find some variables that correlate, it is a neat project. But it is ultimately not indicative of predictive classification, if only because you do not have any independent samples to cross-validate with - all samples are from the same crash.
This would be like sampling all coins from my pockets and thinking you could build a predictive model of year printed to value of coin. Probably could for the change I carry. Not a wise predictor, though.
You are right, but only in a very strict, not-fun, manner :). Even if we had more data on different boats sinking, the model would not be very useful: We don't go with the Titanic anymore and plotted all icebergs. Still, if a cruise ship were to go down, I'd place a bet on ranking younger women of nobility traveling first class higher for survivability than old men with family traveling third class, wise predictor or no.
> You can show with cross-validation and test set performance that your model using these variables generalizes (around 0.80 ROC AUC).
It shows only that the given set of variables (observable and inferred) can be used to build a model. The given data set is not descriptive, because it does not contain the more relevant hidden variables, so any predictions or inferences based on this data set are nothing but a story, a myth made from statistics and data.
I don't know anything about TensorFlow except the very tip of the iceberg.
Can you know nothing about ML, AI, data analysis, and stats, then give TensorFlow some input and have it give you some output you can pretty much apply to your app?
Or do you have to know these subjects before even starting TensorFlow?
It's OK to jump in and try it without having background information. See how far you get and start researching when you hit a wall or find sudden interest.
I recommend:
- using Python in the interactive environment Jupyter Notebook,
- starting with classical machine learning (scikit-learn), NOT with deep learning; first learn logistic regression (a prerequisite for any neural network), kNN, PCA, Random Forest, t-SNE, and concepts like log-loss and (cross-)validation (see the sketch after this list),
- playing with real data,
- it is cool to add neural networks afterwards (here bare TensorFlow is a good choice, but I would suggest Keras).
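As an illustration of the classical items on that list, here is a small sketch (the digits dataset is my placeholder) comparing two of them by cross-validated log-loss:

    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)

    # Log-loss (lower is better), estimated with 5-fold cross-validation
    for model in [LogisticRegression(max_iter=5000), KNeighborsClassifier()]:
        scores = cross_val_score(model, X, y, cv=5, scoring="neg_log_loss")
        print(type(model).__name__, -scores.mean())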
Instead, learn decision trees and more importantly enough statistics so you aren't dangerous.
Do you know what the central limit theorem is and why it is important? Can you do 5-fold cross validation on a random forest model in your choice of tool?
Fine, now you are ready to do deep learning stuff.
The reason I say not to do neural networks first is because they aren't very effective with small amounts of data. When you are starting out you want to be able to iterate quickly and learn, not wait for hours for a NN to train and then be unsure why it isn't working.
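For reference, the second litmus test above really is only a few lines in scikit-learn (iris is my placeholder dataset):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)

    # 5-fold cross-validation on a random forest
    print(cross_val_score(RandomForestClassifier(n_estimators=100), X, y, cv=5))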
I don't think it's a good strategy to discourage people from diving right in. There are many courses and books out there that are suitable even for a beginner who wants to learn about NN.
Of course it's important to get a broad horizon eventually but starting with the theory without the applications is not how most humans learn best. Learning by doing is.
The problem with diving into neural networks is that they are slow to train (with large amounts of data anyway), and difficult to debug. This means it isn't really a great place to start.
There are some good, short, MOOC courses on statistics and probability on Coursera these days. I've been working my way through the Duke sequence with Mine Çetinkaya-Rundel and have found them very helpful. The courses correspond with the material in this OpenIntro text:
2. Yaser Abu-Mostafa's Machine Learning course, which focuses much more on theory than the Coursera class but is still relevant for beginners. (https://work.caltech.edu/telecourse.html)
This book is great, but if your stats background isn't quite up to snuff, it can be an intimidating first-read.
Personally, I studied Duda & Hart's pattern recognition [1] and Casella & Berger's statistics text [2] simultaneously. This took about the equivalent of 2 semesters. Duda's text gets the main ideas across without being as heavy on the probability theory / stats.
Afterwards, I studied "Elements ..." by Hastie et al., which was far more readable after going through Casella & Berger's text. Now Hastie et al. is my go-to reference. I also should note that this all assumes that you also have the requisite math background: up to calc 3, linear algebra, and maybe some exposure to numerical methods (in particular, optimization).
Everyone keeps linking ESL, but really ISLR is much easier to understand, provides more important clarifying context, and covers more or less the same information.
ESL is more like a reference and a prototype for ISLR.
I took the summer off to learn enough ML to transition from a career in software engineering & product / leadership type roles to ML. I suggest, for a first round, learning practical tools and techniques so you can start applying supervised learning right away, while also starting to build a more solid foundation in probability & statistics for a future deeper understanding of the field. I've written about my curriculum, with lots of specific resources, here:
Yeah, I recently started as a research engineer in a lab at the University of Michigan doing self-driving car stuff; I will update the website with more info and post-summer reflections within a couple weeks.
Ha, well it was a publicly listed job, but I got pointed to it and eventually introduced to the profs in the course of networking with ML folks in town. I can't speak to how many applicants.
I very much like Michael Nielsen's book Neural Networks and Deep Learning. It has a great introduction with examples and code you can run locally. Really nice to get started. http://neuralnetworksanddeeplearning.com
It depends on what your goals are. If you'd like to become an ML Engineer or Data Scientist, TensorFlow should be the last thing you learn. First, develop a solid foundation in linear algebra and statistics. Then, familiarize yourself with a nice ML toolkit like scikit-learn and The Elements of Statistical Learning (which is free online). The rest is a distraction.
In addition to the linear algebra and statistics MOOCS mentioned, I'll also add:
Gain background knowledge first, it will make your life much easier. It will also make the difference between just running black box libraries and understanding what's happening.
Make sure you're comfortable with linear algebra (matrix manipulation) and probability theory. You don't need advanced probability theory, but you should be comfortable with the notions of discrete and continuous random variables and probability distributions.
If you like books, "Pattern Recognition and Machine Learning" by Chris Bishop is an excellent reference of "traditional" machine learning (i.e., without deep learning).
This online book is a very good resource to gain intuitive and practical knowledge about neural networks and deep learning:
http://neuralnetworksanddeeplearning.com/
I think you should start with a real-world problem that is really important to a company you work for. The problem might be one common to many businesses but unique to that business. For instance, demand forecasting: every business is different, as are the signals needed for accurate demand forecasting.
So you could start with some really simple example code for demand forecasting, but where you put in your data and your signals. In this way you learn what you need to solve a particular problem, 'getting lucky' in that you only have to adapt examples. Sure, it might be nice to learn all the fundamentals first, but it is sometimes nice to scratch an itch - every company has plenty, so choose one, see how far you get, and learn along the way.
Has some great links if you already have some knowledge about software engineering and want to get into Machine Learning
Josh Gordon from Google also has an extremely nice hands-on "how to start with Machine Learning" course on YouTube featuring scikit-learn and TensorFlow:
I'd be more interested in real life results on a small scale first.
I too felt like ML is something new to try, but the lack of real-world use cases on a small scale (not Google, Microsoft, ...) has kept me from trying.
The only example I've seen so far is the farm using image recognition to sort vegetables.
I come from finance, so for me it is always market prediction (however, the important thing is to approach this as a learning opportunity, not as a way to make profits -- for that, there are many orthogonal technical issues to solve).
Numerous ML competitions also provide enough fun to get started.
Get some background knowledge; I think with a topic like machine learning it's important to understand why certain algorithms work better than others on different kinds of data. I would recommend following a structured course. Andrew Ng's, or the UC Berkeley one, are good. Tom Mitchell's Machine Learning book is a great intro too, to supplement the online course of your choice.
If you're a python dev, maybe download scikit-learn and see what kinds of things you can put together after a few lectures.
By all means get some "background knowledge" (linear algebra, statistics, calculus etc), play around with libraries and follow some MOOC, but primarily I'd suggest you go get yourself a post-graduate degree from a brick-and-mortar university, and in a course called "Data Science" or "Artificial Intelligence" and the like.
You can learn on your own, of course, but a university course will focus your learning, provide rich feedback, and give you a strong foundation on which to build. You'll also get to learn from other students, which is not often the case in MOOCs. And there's nothing like having a teacher on your payroll (which is essentially what paying for a course is) to answer your questions, clarify obscure areas in books and generally support you throughout the course.
For the record- I did exactly what I say above. After five years working in the industry as a dev, I took a Masters part-time, sponsored by my employer. I think I got a good foundation as I say above, and I certainly didn't have the time, or the focus, to learn the same things on my own.
And I did try on my own, with MOOCs-and-books for a while. I did learn useful stuff (the introductory AI course from Udacity for instance, was really helpful) but after starting the Masters it felt like all this time I'd been crawling along without aim, and now I was running.
Skimmed through this and didn't see Kaggle. They have a great intro competition to take part in. Great community and great way to get stuck in.
https://www.kaggle.com
While some people might not agree with me, I'd say focus on the Math. Machine learning may be easy to use with these toolkits but doing something useful with it will require deeper understanding.
This specific topic/question comes up frequently enough that I feel like we should either make this thread the canonical answer or have another pointer that we can generally agree upon to point people in that direction.
I think it's important for people to know where to go for good resources, but this exact question keeps coming up incessantly.
Take a class on linear algebra. Learn how to use MATLAB or Octave. Knowing these two interdependent subsets of knowledge before diving into machine learning is absolutely indispensable as far as I can tell. I would've gotten so much more out of Ng's class if I had known this stuff beforehand.
To get intuition and the right foundation, read Society of Mind. For me the book is more about thinking in terms of computation, which (IMO) is what ML is about, rather than statistics (which is important to know too!).
Now practical: I think the best way to learn is pick an algorithm & representation and implement it in your favorite language. Bonus if you have your own language to work with.
I would start by looking into decision trees: implement them, then implement some use cases that follow from the implementation. Do this for other approaches, like ANNs, which you can have beat you at checkers - strangely satisfying.
But keep in mind Minsky. I think he is like Archimedes, doing "Calculus"-type approaches without fully realizing it. Maybe you could be Newton?
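In that spirit, a toy "decision stump" (a one-split tree) is a good first self-implementation; real decision-tree learners split on criteria like information gain rather than raw error, but the shape of the algorithm is the same:

    import numpy as np

    def best_stump(X, y):
        """Find the single split 'predict p if X[:, f] > t' with fewest errors."""
        best = (None, None, None, np.inf)  # feature, threshold, polarity, errors
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for p in (0, 1):
                    pred = np.where(X[:, f] > t, p, 1 - p)
                    err = np.sum(pred != y)
                    if err < best[3]:
                        best = (f, t, p, err)
        return best

    # Tiny synthetic check: the label is 1 exactly when feature 1 exceeds 0.5
    rng = np.random.default_rng(0)
    X = rng.random((200, 3))
    y = (X[:, 1] > 0.5).astype(int)
    print(best_stump(X, y))  # should pick feature 1, threshold near 0.5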
A good start in "classical" methods (i.e., before deep learning and convolutional neural networks) is the old standby, the Weka data mining library [1]. Along with the textbook, it will make you comfortable with methods like k-nearest neighbors, support vector machines, decision trees, and the like.
How is your programming background? Do some contests on HackerRank and gauge your skill, because machine learning uses lots of algorithms from math + computer science (e.g. computational geometry).
Machine learning is basically writing some math in code, running experiments, and statistically reasoning about the results. If you really want to do that, then you need a background in math + statistics + software development.
Hello, I am a Python programmer, and in my job I use Python daily to extract, analyze and assemble data sets. I am trying to study machine learning on my own, using Udacity and the book Programming Collective Intelligence as my study materials. What do you recommend for understanding and learning the math and statistics used in machine learning concepts?
Slight tangent, so bear with me. Every other week, posts such as this come up, asking how to learn X, so I was wondering if there is any GitHub repo or website that keeps track of all the resources posted here?
There are nice examples of machine learning with Python and R on Analytics Vidhya, among other tutorials; also, ISLR (An Introduction to Statistical Learning with R) gives you an overview of some standard methods.
It depends on what you really want to do in the future. Learning a framework could be useless if you don't know how to correctly do basic things like creating train, test and validation sets.
There are basic things I think you must know before jumping into a framework or into any specific algorithm. The first thing you will probably have to do is collect the data and clean it. In order to do this correctly you need some basic statistics: for example, you need to know what a Gaussian distribution is, and you need to collect samples in a way that is representative of your problem. Then you may need to clean the samples, remove outliers, fill in missing data, etc. So it is essential to know some statistics to do this right. I have seen people with a lot of knowledge of the tools who were still unable to create a train/test/validation split correctly, and the experiment is completely invalid from there on, no matter what you do next (http://stats.stackexchange.com/questions/152907/how-do-you-u..., https://www.youtube.com/watch?v=S06JpVoNaA0&feature=youtu.be). You also need to know how you are going to test your results, so again you need to know how to use a statistical test (F-test, t-test). So first, jump into statistics to understand your data.
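As a minimal sketch of the split itself (scikit-learn, assuming X and y are already loaded): a 60/20/20 train/validation/test split, where the validation set drives model choices and the test set is touched only once at the end:

    from sklearn.model_selection import train_test_split

    # First carve off 40%, then split that half-and-half into validation/test
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, random_state=0)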
The next step, I think, is to learn some common machine learning concepts: the no-free-lunch theorem, the curse of dimensionality, overfitting, feature selection, how to select the correct metric to assess your model, and common pitfalls. I think the only way to learn this is to read a lot about machine learning and to make mistakes on your own. At least now you have some things to search for on Google to start learning.
The third step would be to understand some basic algorithms and get a feeling for the types of algorithms, so you know when a clustering algorithm is needed, or whether your problem is about classification or prediction. Sometimes a simple random forest or logistic regression is enough for your problem and you don't need TensorFlow at all.
Once you know the landscape of the algorithms, I think it is time to improve your math skills and try to understand better how the algorithms work internally. You might not need to know how a deep network works completely, but you should understand how a neural network works and how backpropagation works. The same goes for algorithms like k-means, ID3, A*, Monte Carlo tree search, and the other popular algorithms you are probably going to use in day-to-day work. In any case, you are going to need to learn some calculus and algebra: vectors, matrices and differential equations are almost everywhere.
You will probably have seen some examples while learning all the stuff I talked about; then it is time to move on to real examples. Go to Kaggle and read some tutorials; read articles about how the Kaggle community has approached and won competitions. From here it is just practice and reading.
You can jump directly into a framework, learn to use it, get 99% accuracy on your test set and 0% accuracy on real data. This is the most probable scenario if you skip the basics of machine learning. I have seen people do this and end up very frustrated because they don't understand why their awesome model with 99% accuracy doesn't work in the real world. I have also seen people use very complex tools like TensorFlow for problems that can be solved with linear regression. Machine learning is a very broad area and you need math and statistics for sure. Learning a framework is useless if you don't understand how to use it, and it may well lead to frustration.
You should have the equivalent of an undergraduate degree in mathematical statistics (calculus, linear algebra, et al). It should take about 4 years of full time study to achieve that.
Forget about the code part. It's the least difficult part.
I think that this is horribly impractical advice, and I keep seeing it everywhere.
With modern tools and frameworks you can start learning and applying what you know in practice almost immediately.
Check out Keras and the book "Deep Learning with Python"[1]. They enabled me to train my first ANN in 2 days and get to the point of building an MNIST recognizer in a month (and I was reading it pretty slowly).
Sure, if you're coding it from scratch and must understand every single detail, you do need like 10 years and 3 PhDs. But that's not a wise way to learn.
I recommend taking the simplest tools and applying them to practical projects immediately. That will give you a general overview of how things work, and then you can learn the details as needed.
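For a sense of scale, an MNIST recognizer in Keras really is this short; a minimal sketch in the spirit of the book's early chapters, not its exact code:

    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, validation_split=0.1)
    print(model.evaluate(x_test, y_test))  # roughly 97-98% test accuracy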
If you don't understand how it works, you won't understand how to optimize things, how to do error analysis, how to implement better features and weights out of the box, how to choose the right algorithm from the start, how to do good cross-validation...
Yes, you can take a library and implement it in 10 minutes, but then you're really not learning machine learning, are you?
I will argue you do not need four years of math by any stretch, though. The stumbling block will be notation more than anything else. Relatively basic calculus and linear algebra will suffice.
They were right about one thing: the code is the least important part.
In practicality the OP is right. You won't be on the same level as people with a PhD in a corporate or applied setting. The hardest parts are feature engineering, researching and statistical analysis (presenting research to team). It's hard to gain all those skills without years of experience researching in an academic setting.
As an undergrad, I was doing all those easy ML tutorials and took an undergrad-level ML course. I thought I would be useful in actual practice, but knowing the whats and hows of neural nets/clustering/etc. is not enough. Feature engineering/math is the most difficult part. In a corporate setting, if it were a straightforward solution, you wouldn't be doing that work, because the solution would be trivial and already implemented.
As an engineer with only a bachelors on an ML team full of PhDs, there is a definite difference in skill. I've been reduced to a monkey (a content one) that works on the data pipeline. Learning to deal with real-world ML problems would take me years of work that I am not sure I would be willing to do, especially when the pay increase per effort expended learning ML is much lower than with regular software/distributed systems/etc.
On the interest part, you're right that I would never have tried to learn ML if I had known the amount of work that is required to actually be good or if I tried learning the math first. That's the real world though. The useful ML engineers did learn the math. The efficient way to learn ML is to learn the math/statistics first.
IMO the best way to get started (like with anything) is by getting started. I think the way you make progress is going to come down to you personally as an individual and what your motivations are. Before learning ANYTHING new I would invest some time in learning how to learn. There is a good Coursera course on this https://www.coursera.org/learn/learning-how-to-learn and the book by the course authors is incredibly useful for putting a framework, with some techniques, around the approach to learning any new skill. This is not meant to be condescending advice, but for me personally it's changed the way I go about learning any new skill.
I think as well it really depends where you are coming from / what your background is. The reason I say this is that I have recently gone through a similar transition into machine learning 'from scratch', except once I got there I realised I knew more than I thought. My academic background is in psychology / biomedical science, which involved a LOT of statistics. From my perspective, once I started getting into the field I realised there are a lot of things I already knew from stats under different terms in ML. It was also quite inspiring to see that many of the eminent ML guys have backgrounds in psychology (for instance Hinton), meaning I felt perhaps a bit more of an advantage on the theoretical side that many of my programming peers don't have.
I realise most people entering the field right now have a programming background, so they will be coming at things from the opposite angle. For me, the vast majority of the tests and data manipulation is pretty standard undergraduate stuff (using Python / scikit-learn is incredible because the library does so much of the heavy lifting for you!). Where I have been struggling is with things that an average programmer probably finds very basic - it took me 3 days to get my development environment set up before I could even start coding (solved by Anaconda - great tool, and lessons learned). Iterating over dictionaries was a nightmare for me (at first anyway; again, getting better).
I think (though I may be biased) it's easier to go from programming to ML rather than the other way around, because so much of ML is contingent on having decent programming skills. If you have a decent programming skill set, you can almost 'avoid' the math component, in a sense, thanks to the libraries available and support online. There are some real pluses to ML compared to traditional statistics - e.g., the tests normally run in stats to check you are able to apply a test (the shape of the data: skewness / kurtosis, multicollinearity etc.) become less of an issue, as the algorithm's role is simply to deliver an output given the input.
I would still recommend some reading on the stats side of things to get a sense of how data can be manipulated to give different results, because I think this will give you a more intuitive feel for parameter tuning.
This book does not look very relevant but it's actually a really useful introduction to thinking about data and where the numbers we hear about actually come from
In conclusion, if you can programme, have a good attitude towards learning, and are diligent with your efforts, I think this should be a simple transition for you.
Contrary to the other advice around here, I would strongly advise NOT taking a course. I think it is a good idea at some point, but it is not the first thing you should be doing.
The very first thing you should do is play! Identify a dataset you are interested in and get the entire machine learning pipeline up and running for it. Here's how I would go about it.
1) Get Jupyter up and running. You don't really need to do much to set it up. Just grab a Docker image.
2) Choose a dataset.
I wouldn't collect my own data first thing. I would just choose something that's already out there. You don't want to be bogged down by having to wrangle data into the format you need while learning NumPy and Pandas at the same time. You can find some interesting datasets here:
And don't go with a neural net first thing, even though it is currently in vogue. It requires a lot of tuning before it actually works. Go with a gradient-boosted tree. It works well enough out of the box.
3) Write a classifier for it. Set up the entire supervised machine learning pipeline. Become familiar with feature extraction, feature importance, feature selection, dimensionality reduction, model selection, hyperparameter tuning using grid search, cross-validation, and so on (a minimal version is sketched at the end of this comment).
For this step, let scikit-learn be your guide. It has terrific tutorials, and the documentation is a better educational resource than beginning coursework.
4) Now you've built out the supervised machine learning pipeline all the way through! At this point, you should just play:
4a) Experiment with different models: Bayes nets, random forests, ensembling, hidden Markov models, and even unsupervised learning models such as Gaussian mixture models and clustering. The scikit-learn documentation is your guide.
4b) Let your emerging skills loose on several datasets. Experiment with audio and image data so you can learn about a variety of different features, such as spectrograms and MFCCs. Collect your own data!
4c) Along the way, become familiar with the SciPy stack, in particular, NumPy, Pandas, SciPy itself, and Matplotlib.
5) Once you've gained a bit of confidence, look into convolutional and recurrent neural nets. Don't reach for TensorFlow. Use Keras instead. It is an abstraction layer that makes things a bit easier, and you can actually swap out Tensorflow for Theano.
6) Once you feel that you're ready to learn more of the theory, then go ahead and take coursework, such as Andrew Ng's course on Coursera. Once you've gone through that course, you can go through the course as it has actually been offered at Stanford here (it's more rigorous and more difficult):
I will also throw in an endorsement for Cal's introductory AI course, which I think is of exceptionally high quality. A great deal of care was put into preparing it.
I hope this helps. What I am trying to impart is that you will understand and retain coursework material better if you've already got experience, or better yet, projects in progress that are related to your coursework. You don't need to undergo the extensive preparation that is being proposed elsewhere before you can start PLAYING.
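Since step 3 packs in a lot of terms, here is a minimal version of steps 2-3 under my own placeholder assumptions (a built-in dataset standing in for whatever you chose, and a gradient-boosted tree per the advice in step 2):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Hyperparameter tuning via grid search with 5-fold cross-validation
    grid = GridSearchCV(GradientBoostingClassifier(),
                        param_grid={"n_estimators": [100, 300],
                                    "max_depth": [2, 3]},
                        cv=5)
    grid.fit(X_train, y_train)
    print(grid.best_params_, grid.score(X_test, y_test))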
Newton's method and other numerical methods are the hello world of machine learning.
Why numerical methods?
* They might produce the right answer
* They frequently do
* They are easy to visualize or imagine
* You get used to working with a routine that is fallible but quite simple, and remarkably able to work in a wide variety of situations. This is what machine learning does too, just with more sophisticated routines. (A minimal example follows.)
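To make that concrete, a minimal Newton's method in Python - fallible (it can diverge from a bad starting point) but simple, and it works in a wide variety of situations:

    def newton(f, df, x, tol=1e-10, max_iter=100):
        """Find a root of f using the update x <- x - f(x)/f'(x)."""
        for _ in range(max_iter):
            step = f(x) / df(x)
            x -= step
            if abs(step) < tol:
                return x
        raise RuntimeError("did not converge")

    # Example: the square root of 2 is the positive root of x^2 - 2
    print(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0))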
At some point you need to decide whether to go down the road focused on analysis & modelling or the one focused on machine learning & prediction. It's not that the two are exclusive, but they address a big fork in the problem space of using a computer to eat up data and either give you predictions or give you correct answers.
Google needs lots of prediction to fill in holes where no data may ever exist. Analysis and modeling can really fall down when there is no data to confirm a hypothesis or regress against.
An engineer needs a really good model, or the helium tank in the Falcon 9 will explode one time in twenty instead of one time in a trillion. The model can predict, based on simulating the range of parameters that will slip through QA, how many tanks will explode. Most prediction methods are not trying to solve problems like this and provide little guidance on how to set up the model.
On the prediction side, you will learn all the neural net and SVM stuff.
On the analysis and modelling side, get ready for tons of probability and Monte Carlo stuff.
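To illustrate the flavor of that Monte Carlo work (all numbers here are hypothetical, invented purely for illustration - not real tank parameters):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000  # simulated tanks

    # Hypothetical manufacturing spread that slips through QA
    wall_mm = rng.normal(5.0, 0.05, n)         # wall thickness
    strength_mpa = rng.normal(900.0, 20.0, n)  # material strength
    pressure_bar = rng.normal(300.0, 10.0, n)  # operating pressure

    # Made-up failure criterion: hoop-style stress exceeds strength
    stress_mpa = pressure_bar * 10.0 / wall_mm
    print("estimated failure fraction:", np.mean(stress_mpa > strength_mpa))

Note that if the true rate really is one in a trillion, a naive million-sample run like this just prints zero - which is exactly why this side of the field leans on heavy probability theory (e.g., importance sampling for rare events) rather than off-the-shelf prediction methods.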
> Newton's method and other numerical methods are the hello world of machine learning.
Newton's method and other similar numerical methods are the hello world of a branch of mathematics known as 'numerical analysis' and scientific computing. This is not Machine Learning.
Everybody learns differently, but I would suggest starting with the how, not the what. Compare: How do I sort a list? With: What is exactly happening when I sort a list? Application before theory.
Start with a tutorial/pre-made script for one of the Kaggle Knowledge competitions. Move on to a real Kaggle competition and team up with someone who is in the same position on the learning curve as you. Use something like Skype or a Github repo to learn new tricks from one another.
If you like to study/read: the famous Coursera Andrew Ng machine learning course: https://www.coursera.org/learn/machine-learning
If you just want course materials from UC Berkeley, here's their 101 course: https://news.ycombinator.com/item?id=11897766
If you want a web based intro to a "simpler" machine learning approach, "decision trees": https://news.ycombinator.com/item?id=12609822
Here's a list of top "deep learning" projects on Github and great HN commentary on some tips on getting started: https://news.ycombinator.com/item?id=12266623
If you just want a high level overview: https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec...