Reading articles like this, written by people who want to share their fabulous domain knowledge free of charge, really is the reason why I read Hacker News. Thank you. I hope I will have the time to read through it all thoughtfully and, later, use it in my own projects.
And people like you, who take the time to read and learn, are exactly the reason why people like me write such articles! Absolutely thrilled that people liked it and that I will be contributing to people learning this beautiful field of science I do research in :)
These are the tutorials that depict the reality of a machine learning career. Everyone broadly understands that data preparation is the key, but few realize what that involves. Half of this tutorial is just about getting and prepping data for training. Kudos!
To quote one of the greatest professors in ML, Pedro Domingos: "First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and pre-process it, and how much trial and error can go into feature design... Learning is often the quickest part of this, but that’s because we’ve already mastered it pretty well! Feature engineering is more difficult because it’s domain-specific, while learners can be largely general-purpose."
This is so, so helpful. It would take me months to gather resources to learn this stuff, and I wouldn't even know what to look for. To the author: please share more content if your valuable time permits.
Working on them already! The next one is going to be on word embeddings for natural language processing: basically, how do we convert words and sentences to numbers so that a computer can work with them? Applications like text classification and sentiment analysis all depend on this one fundamental backbone!
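For readers wondering what "converting words and sentences to numbers" can look like in practice, here is a minimal sketch. It uses plain bag-of-words count vectors, which are a much simpler stand-in for learned word embeddings and are not taken from the author's upcoming tutorial. The function names and the two sample sentences are made up for illustration.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Assign each distinct lowercase word an integer index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def vectorize(sentence, vocab):
    """Turn a sentence into a list of word counts, one slot per vocabulary word."""
    counts = Counter(sentence.lower().split())
    vector = [0] * len(vocab)
    for word, count in counts.items():
        if word in vocab:  # words outside the vocabulary are ignored
            vector[vocab[word]] = count
    return vector


# Hypothetical sample data, chosen only to illustrate the idea.
sentences = ["the movie was great", "the movie was not great"]
vocab = build_vocabulary(sentences)
vectors = [vectorize(s, vocab) for s in sentences]

for sentence, vector in zip(sentences, vectors):
    print(sentence, "->", vector)
```

Count vectors like these ignore word order and meaning, which is the gap that learned embeddings are meant to close, but they are enough to feed text into a simple classifier.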
Great write-up, especially the fact that half of it was about finding, cleaning, and structuring data! You can tell someone isn't applying ML if they aren't spending most of their time getting their data organized. It's the "sharpening the axe" part of the hour Lincoln describes.
For example, they never introduce you to how you can run the same algorithm on your own dataset
I actually think the TensorFlow tutorial on CNNs runs through training and classification on your own set with Inception pretty well.
You mention you're a CV student. Any particular area of focus?
While much of this goes over my head, detailed write-ups like this, by people who have no direct way of gaining financially from all their hard work, are the cornerstone of why the internet is fantastic.
I have been searching for exactly this type of tutorial for months. Your explanation of the state of online "10 minute introductions" for machine learning is spot on. I understand the concepts, and have a thorough background in programming, yet there always was a gap in my knowledge base. Thank you for sharing this!
This is wonderful. I just became interested in this subject but had difficulty finding resources that weren't simple copy/paste examples, as you mentioned, or semester-long courses. Thank you!
I saw this tutorial of yours somewhere, Spandan, and then found it here on HN. I have yet to explore it, but I have already marked your Git repo. Thanks for the hard work.
Are there more great resources like this for learning how to find, clean, and structure data? I would greatly appreciate it if someone could point me in the right direction.
I couldn't find much, which is why I stressed it in the tutorial. Scraping is a fun hobby, and it's extremely useful too. I strongly suggest spending time with Python's Selenium and Beautiful Soup libraries. The former is good for automating pages with JavaScript elements, and the latter for parsing HTML!
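As a rough illustration of the HTML-parsing half of that advice, here is a small sketch using Beautiful Soup (the `bs4` package) on an inline HTML string, so it runs without any network access. The markup, class names, and the `parse_movies` helper are all hypothetical and only show the general shape of turning messy HTML into clean records. For pages that build their content with JavaScript, Selenium would be used first to render the page before handing the HTML to a parser.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <div class="movie"><h2> The Matrix </h2>
    <span class="genre">Action</span><span class="genre">Sci-Fi</span></div>
  <div class="movie"><h2>Amelie</h2>
    <span class="genre">Romance</span></div>
  <div class="movie"><h2></h2>
    <span class="genre">Drama</span></div>
</body></html>
"""


def parse_movies(html):
    """Extract title/genre records, skipping entries that have no title."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib-backed parser, no extra install
    records = []
    for block in soup.find_all("div", class_="movie"):
        heading = block.find("h2")
        title = heading.get_text(strip=True) if heading else ""
        if not title:
            continue  # basic cleaning step: drop incomplete rows
        genres = [g.get_text(strip=True)
                  for g in block.find_all("span", class_="genre")]
        records.append({"title": title, "genres": genres})
    return records


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie["title"], movie["genres"])
```

Even in this toy case, most of the code deals with whitespace and missing fields rather than with parsing itself, which matches the point made throughout this thread about where the time goes.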