Machine Learning Crash Course (developers.google.com)
1926 points by matant on March 1, 2018 | 222 comments



Looking through the topics covered, the standard AI-course caveats (https://news.ycombinator.com/item?id=16247629) apply.

Yes, AI/ML MOOCs teach the corresponding tools well, and the creation of new tools like Keras makes the field much more accessible. The obsolete gatekeeping by the AI/ML elites who say "you can't use AI/ML unless you have a PhD/5 years research experience" is one of the things I really hate about the industry.

However, contrary to the thought pieces that tend to pop up, taking and passing a crash course doesn't mean you'll be an expert in the field (and this applies for most MOOCs, honestly). They're very good for learning an overview of the technology, but nothing beats applying the tools on a real-world, noisy dataset, and solving the inevitable little problems that crop up during the process.

Reviewing the Keras documentation (https://keras.io) and examples (https://github.com/keras-team/keras/tree/master/examples) is honestly a much better teacher of AI/ML than any MOOC, in my opinion.

(Of course, Keras is now a part of TensorFlow, so there's a neat Google vertical integration with this crash course!)


> The obsolete gatekeeping by the AI/ML elites who say "you can't use AI/ML unless you have a PhD/5 years research experience" is one of the things I really hate about the industry.

It is absolutely true that you do not need a graduate degree to apply AI/ML to vanilla problems.

It is also absolutely true, in my experience, that you need a graduate-level education or years of hands-on experience to troubleshoot cases where AI/ML fails on a deceptively-simple problem, or to tweak an AI/ML algorithm (or develop a new one) so it can solve a novel problem.

That said, I think these MOOCs are good enough to get someone to a place where they can create nice /r/dataisbeautiful-style visualizations, or pair with a senior-level DS to deliver something.

(Edited to add folks who have worked on problems for years and add a final note.)


> It is also absolutely true, in my experience, that you need a graduate-level education or years of hands-on experience to troubleshoot cases where AI/ML fails on a deceptively-simple problem, or to tweak an AI/ML algorithm (or develop a new one) so it can solve a novel problem.

How much of that is critical domain-specific knowledge, and how much of that is just general engineering debugging/problem-solving experience, though? Certainly the person who does have the master's/PhD and a few years of applying that to real-world ML problems will have the edge, but an experienced developer who's got a knack for maths (though no direct ML experience) may be able to get up to speed quicker than you think. Part of that will be experience with knowing how and when to ask the right questions when you get stuck.


> How much of that is critical domain specific knowledge and how much of that is just general engineering debugging/problem solving experience though?

It's both, right? You pick up problem-solving techniques as a researcher or engineer; as the former, those techniques lean towards scientific problems. Your average engineer doesn't need to know about contrasting.

Again: it's possible to learn the necessary math in your spare time! I agree!! However, it's far easier to do it in a graduate program as a full-time job for 2-5+ years.


The knack for maths is the important bit.


The math necessary for ML/AI (statistics/vector calculus) is mostly taught at undergrad level though isn't it? So most engineers should already have it covered.


I can't help but think in 3-5 years how quaint our tools of the day will seem.


I think about this constantly.

Not to sound like I walked uphill in both directions back in my day or something, but I remember building models in numpy without pandas. It was tedious -- and that's just a nice API wrapping ndarrays!


> Not to sound like I walked uphill in both directions back in my day

Local minima?


Most likely, gradient descent with momentum.


Oh boy, that and perturbation.


I’m not so sure.

You can make an argument that current tools haven't really surpassed a Lisp Machine for developer productivity, or a Smalltalk environment.


Not really. I see things like leftpad and npm fails and CEOs mailing private keys they've stored from customers. I see the same lessons we have to re-learn year after year.


What's an example of a problem that needs that troubleshooting? (Curious)


Honestly? The exact problem I'm dealing with at work right now.

We're trying to re-write our recommender for artist music stations at iHeartRadio (aka "I'll listen to Drake or Kendrick Lamar's station at the gym today"). Just today, I tried adding negative sampling to the matrix I'm factorizing, hoping it encourages spread in the embeddings learned for artists in certain types of genres.

I have a MS, but not a lot of research experience. It would have taken me a while to find this solution on my own. However, the moment I described this problem to my manager - a PhD graduate with several years of research and industry experience - he immediately suggested negative sampling.

What I learned during my MS helped me grok the math immediately. We're adding noise to the training set and penalizing vector lengths to avoid overfitting. Easy! Identifying a solution worth exploring? Not easy, at least without a degree or significant experience.

(There's also the chance I should know this, in which case I have some reading to do. ¯\_(ツ)_/¯)
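For the curious, here's a rough sketch of the idea in plain numpy (a toy illustration of negative sampling plus an L2 penalty, not our actual recommender): for every observed user-artist pair, also sample an unheard artist as a negative example, and shrink the embedding vectors a little on each update.

    import numpy as np

    # Toy implicit-feedback matrix: rows = users, cols = artists (1 = listened).
    R = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 0, 1, 1]], dtype=float)

    n_users, n_items, k = R.shape[0], R.shape[1], 2
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, k))   # user embeddings
    V = rng.normal(scale=0.1, size=(n_items, k))   # artist embeddings
    lr, reg = 0.05, 0.01                           # learning rate, L2 penalty

    for _ in range(200):
        for u, i in zip(*np.nonzero(R)):
            j = rng.choice(np.where(R[u] == 0)[0])  # negative sample: an unheard artist
            for item, target in ((i, 1.0), (j, 0.0)):
                u_vec, v_vec = U[u].copy(), V[item].copy()
                err = target - u_vec @ v_vec
                # SGD step; the reg term is the "penalizing vector lengths" part.
                U[u] += lr * (err * v_vec - reg * u_vec)
                V[item] += lr * (err * u_vec - reg * v_vec)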


Aren't you sort of glossing over the fact that he is in high up machine learning position at a company that specializes in recommender systems? Doesn't that by itself increase the likelihood that he deeply understands implicit and explicit matrix factorization?

I am a good ways through my master's (second CS degree, first specializing in ML), and the more I learn, the more I realize that on any given topic, there is no guarantee the PhD in the room has the most expertise. Machine learning is a broad field that contains many subfields, methodologies, and applications. It is a bit like computer systems or software engineering: nobody knows it all; people who are experts have intimate knowledge of a specific subset of the field. Of course, you can move around over time, but it takes years to build up expertise in even two or three subfields of machine learning.

Side note: sounds like we do similar work. I work at Vevo, also do a lot of matrix factorization to learn latent factors of items such as artists, videos, etc.


> Aren't you sort of glossing over the fact that he is in high up machine learning position at a company that specializes in recommender systems? Doesn't that by itself increase the likelihood that he deeply understands implicit and explicit matrix factorization?

Sure thing, but someone in that position needs years of experience in recommender systems, as well as working with researchers.

Folks are hanging on to the PhD part of my claim, instead of the "PhD or experience" part. The fact is, a PhD or equivalent prior industry work means the person has close to a decade of relevant background, grad degree or not. They will unstick a co-worker far faster than an experienced backend developer with, say, a year of Keras experience.

> Side note: sounds like we do similar work. I work at Vevo, also do a lot of matrix factorization to learn latent factors of items such as artists, videos, etc.

Seems like it! Email me if you'd like to chat some more offline (it's in my profile).


That has little to do with a PhD, it's the kind of thing you get with experience leading to a deeper understanding.

3D programming started as a field where only PhDs had any deep understanding of what was going on, simply because they had experience when nobody else did. You see this pattern repeated frequently, in any complex domain.


Yeah, I expected this reply.

The PhD is sufficient but not necessary here, right? A PhD researcher's job description is basically "learn necessary math, become a domain expert, and publish papers advancing that domain." It's difficult (but possible) to gain the same experience in industry if you don't have a graduate degree. Which company would pay you to work through Bishop or Goodfellow for a few months? Even a principal DS doesn't get that deal, much less a junior/associate.

Also remember: my comment addressed non-vanilla cases. In your example, this is the difference between a researcher advancing 3D programming and someone using Unity or Unreal.

(Also, sorry for all the edits. Done now!)


I would say a PhD is sufficient to advance the field. That's no small thing, but it only really overlaps at the start, when just about anything advances the field and you need a broad focus.

Machine learning for sorting peas at high speed is a very well-trodden area at this point, with a lot of industry-specific domain knowledge. I expect self-driving cars, for example, to reach a similar state in ~10-25 years.

The risk with a PhD is you miss the specific wave. But if you want to stay on the bleeding edge, it's probably well worth it.


> I would say a PhD is sufficient to advance the field.

Yep! We’ve now made our way back to my initial point in response to OP. :)


You can spend many months working through papers and books without a company paying you for that. That's something that I continually do and have always done, in my own time (and many different fields). Sufficient and not necessary indeed.


It's definitely easier to do when it's your primary job.


ali rahimi alludes to the problem of google engineers simply needing to tweak models that were previously tuned by google researchers who do have well-developed intuition [0]. because the intuitions in explicit form are at best heuristic and not necessarily even consistent, signing up to improve a model without them might result in spending indefinite time and compute resources without guarantee of positive results. which is a terrible perf-theoretic strategy...

[0] http://www.argmin.net/2018/01/25/optics/


Model divergence, nonsense predictions. The whole black art of ML (specifically neural nets) is coaxing them into working.

If you take some sophisticated deep neural net and try to train it on a binary classification where tails occurs 99% of the time - unless you specifically take measures to correct for this bias - the net will just learn to predict tails.
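One common correction (of several) is to reweight the loss so the rare class counts for more; a minimal sketch in Keras, assuming made-up data and a toy network:

    import numpy as np
    from tensorflow import keras

    X = np.random.rand(10000, 20)
    y = (np.random.rand(10000) < 0.01).astype(int)   # ~1% "heads", 99% "tails"

    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # Weight each class inversely to its frequency so the net can't win
    # by always predicting the majority class.
    weights = {0: 1.0, 1: (y == 0).sum() / max((y == 1).sum(), 1)}
    model.fit(X, y, epochs=5, class_weight=weights)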


Fairness and fighting adversarial examples come to mind.


Unless you work for a company obviously known for their ML the "expertise" out there right now is brutal. People are building recommendation engines without knowing the very, very, very basics like Jaccard indexes, ROC Curves, or topic drift. I've even had to explain type two error to someone working on one of these before.

I agree with your general thrust, and you're right, messy data is often 95% of the problem, but even going through just the Google courses will put people in the top 15% in most cities.


I took a machine learning graduate-level course from Andrew Ng himself, and I don't recall learning about Jaccard indexes or topic drift. Maybe your sense of what counts as "very, very, very basic" is skewed toward your own experience. There's a phenomenon known to psychologists where people tend to think that the stuff that they know is very easy and basic, so they conclude that anybody who doesn't know what they know must be uneducated. But then it turns out that the person you think is uneducated knows about a bunch of surprising stuff that you don't. I can't remember the term for this phenomenon, but I often remember it whenever I find myself beginning to judge another person's expertise. This phenomenon is also super relevant to the failings of most technical interviews, in my opinion.


There's a bit of snobbiness in different areas of tech, although there also are in different areas of academia and research. At the end of the day, the most successful people are the ones who wouldn't dismiss a DS who didn't know "Jaccard index" or "the Halting Problem".


Are you referring to the Curse of Knowledge?

https://en.wikipedia.org/wiki/Curse_of_knowledge

What you are describing also sounds a little like the Dunning-Kruger effect:

https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect


Oh, wish I knew the name of that phenomenon as well.


You probably can't communicate effectively. If you are describing "Type two error", of course you will get eyes glossing over. A huge problem with research fields is their terse, banal labels. Confusion matrix, anyone?


Or you can just say "false negative", and every CS major will understand you.

I find people in Math and CS have often very different names for the same type of concepts and they could easy understand each other if they stuck to the more common terms.

In this case, saying: TYPE 2 ERROR, makes you look like you are trying too hard.


It's also extremely confusing because very few people remember type 1 vs. type 2, but false positive/negative has an intuitive meaning.


type ii error is statistics, not mathematics. there is no equivalent concept in CS because type ii error relates specifically to statistical inference and hypothesis testing.

that said, if you are just pointing to a box in a confusion matrix and saying "TYPE II ERROR," you are probably trying too hard.


Eh, but if you've taken a machine learning course, you should have seen the notion of false positive/false negative when you cover any kind of classification technique.


but they're not actually equivalent, in spite of tables like this [0]. type ii error is a false negative result in the context of a test, where you have to understand which hypothesis is which and exactly what you are accepting or rejecting (hypotheses are not always as simple as hotdog/not-hotdog); if your listener doesn't know what statistical tests mean or wasn't following the setup, they have to stop you and ask.

[0] https://en.wikipedia.org/wiki/Type_I_and_type_II_errors#Tabl...


Granted, Type II error and confusion matrices are covered in more basic statistical classes, and are indeed important for hypothesis testing.


I think the point the parent might have been making is that many people (or maybe just me) know "type II error" by the far more self-explanatory name of "false negative".


What's a "type two" error?

I had to google it. It's a false negative.

A "Type 1" error, is a false positive.

Is this like how people overuse the term "orthogonal"?


"Type I" and "Type II" errors are some of the stupidest and most obfuscatory academic terminology ever invented, and (as an academic) I absolutely refuse to make the effort to learn which way round they go. Just call the bloody things what they are: false positives and false negatives. (Getting seriously OT now, but Kahneman does something annoyingly similar with his talk of "System 1" and "System 2" in Thinking Fast and Slow).


To a computer scientist/programmer, there are three numbers: 0, 1, and infinity. If you're going to index your errors by the natural numbers, and you've got Type 1 Errors and Type 2 Errors, my next question is what a Type 3 Error is, and you know what my next question after that is.

Otherwise, please take this wisdom from programmers, who deal with this sort of thing all the time, and use an enumeration, in this case, {False Positive, False Negative} will do just fine.
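In code, the suggestion amounts to something like this (a toy sketch, names mine):

    from enum import Enum

    class ClassificationError(Enum):
        FALSE_POSITIVE = "predicted positive, truth was negative"   # a.k.a. Type I
        FALSE_NEGATIVE = "predicted negative, truth was positive"   # a.k.a. Type II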


When you are designing a hypothesis test, the terms positive and negative are not so clear. For example, you can test whether the mean weight of bags is greater than 5.0 kg or smaller than 5.0 kg; both tests are different, and sometimes you can accept both greater and smaller than 5.0 kg. The philosophy of hypothesis testing is not as clear as a standard pregnancy test. In other terms, in some cases the H0 hypothesis is symmetric (>= versus <=) and it is not clear what a positive result should be; you have to state clearly what the H0 hypothesis is. In a pregnancy test, everyone agrees that H0 is that you are not pregnant. That is, in my humble opinion, the semantic difference between a Type 1 error and a false negative.


What are some synonyms for Kahneman's System 1 and System 2, then? Because Type 1 and 2 errors seem to be completely equivalent to false positives and negatives. I think Kahneman motivates his decision to introduce the terms System 1 and 2 quite well in his book, and I don't know of any direct counterparts.


Jonathan Haidt proposed a similar system in his book "The Happiness Hypothesis". He called it the automatic and controlled sides. The automatic side/system 1 is also what's being described in the book "The Inner Game of Tennis". I would summarize the two sides as the reflexive and the deliberate sides.


If the title of his book is justified, maybe the fast and slow systems?


Ahaha, I agree so much with the Kahneman jibe. How could someone so smart pick names so f*cking dumb?!


So, to put this in human terms.

A false positive or false negative, can be like a pregnancy test.

A false positive, can be where the pregnancy test shows your wife is pregnant, but she is not. And the baby never arrives. Phew, dodged a bullet!

A false negative, can be where the pregnancy test shows your wife is not pregnant, but she really is. And 9 months later, a baby accidentally pops out. Oh crap!


Of course false-positives can also be bad. To use your example "you then spend $10,000 prepping for the baby, but it never comes, what a waste!"


> Jaccard indexes

Funny you mention Jaccard; I was looking up if IoU (Intersection over Union) has any other name known to ML people when I was preparing my self-driving car presentation (IoU is used in semantic segmentation), and found out it is called Jaccard index as well. To my surprise, all ML experts I know knew about IoU but nobody about Jaccard. I guess it might depend on which university you attended?


i think the term is more prominent on the NLP side, via information retrieval (IR) and clustering. i first saw it in IR, and you'd see it in stanford's CS 124 or CS 224N, for example. if the parent is talking about people who are working on a system that has text-understanding component, i can understand their surprise.


For anyone else who wondered what the Jaccard index is, it's also referred to as Intersection over Union.

...and if you haven't come across that either, see https://en.wikipedia.org/wiki/Jaccard_index for details.
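Concretely, it's the size of the intersection divided by the size of the union; a quick illustration:

    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b)   # intersection over union

    jaccard({"rock", "pop", "rap"}, {"rap", "jazz"})   # -> 0.25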


Was the project the person was working on having any success? If so, you might be unfairly ignoring their positive contributions and focusing only on this one negative sign that you observed.


My ML model is 99.7% sure you're a gatekeeper. Might be a type 1 error, though.


What's topic drift?


It seems to be a specialized term referring to the change in focus of blogs [0] and online communities [1] over time. This strikes me as a very specialized concept, rather than a generally-important term in machine learning as a whole.

Edit: To add my perspective, with years of industry experience and graduate-level machine learning coursework, I have never before encountered this term.

[0]: https://link.springer.com/chapter/10.1007/978-3-319-16354-3_...

[1]: http://catb.org/jargon/html/T/topic-drift.html


Also seems closely related to the concept of stationarity in time series analysis [https://en.wikipedia.org/wiki/Stationary_process]


maybe he meant concept drift? https://en.wikipedia.org/wiki/Concept_drift


I did, yes.


We have to separate AI researcher and implementation engineer. These types of crash courses help get you to the point where you can reasonably work under PhD level people and write code to test, scale, and deploy their ideas.

For many current applications of ML this is acceptable because you're just stealing an idea from a paper or stealing ImageNet to recognize your problem. For anything else you really need to pay up and fight with Google for a real expert.


Exactly. Somewhat akin to graphics programming. There are much smaller groups that work on actually building 3D graphics engines, however, many developers take those engines and use them to build successful applications and games.


So we need to wait for Unity of ML? With Asset Store selling models and datasets.


Will we reach a state where ML is as accessible for implementers as SQL databases? I still remember the time when databases were only for experts.


Yes hopefully. Take a look at BayesDB (and the underlying crosscat algorithm) and probabilistic programming.


>you can reasonably work under PhD level people and write code to test, scale, and deploy their ideas.

Which PhD, though? All PhDs are not equal (see politics vs. computer vision). Also, PhDs are hardly the holy grail of demonstrating capability, accuracy or intellect; especially given the reproducibility crisis, PhDs as a measure of any of those things should be used carefully.


They are talking about PhDs in Machine Learning of course


They really don't. There is a link to the Wikipedia page for matrix multiplication. If these are the people you want to hire, you might as well start outsourcing or generating random numbers


Gatekeeping is only obsolete when it ceases to have an impact. The reality right now is that ML is extremely hard to enter even for a very knowledgeable and deeply experienced but non-credentialed (by degree) person.

It will be interesting to see how the situation evolves, but my own observations are that people trying to enter the space might be better off getting a quickie master's, if they can afford the time or cost, than trying to bootstrap it.


Even getting a quickie master's is hit or miss in my experience. At the end of the day, successful machine learning engineers require a whole suite of different skills: technical, communicative, and even life skills that don't really exist for software devs. Not all of those can be taught in 3 months, 2 years or even 6 years.


> technical, communicative, and even life skills that don't really exist for software devs

Not a fan of this "data scientist is a unicorn" style of thinking. The best people in any profession (especially software engineering) also use these skills in their day-to-day work.


Data science isn't yet as stratified as software engineering, so there's less room for those without those "unicorn" skills. 10 years ago, there was no room at all. 10 years from now, there will probably be plenty of undergrads hired as junior data scientists.


Life skills? Communicative skills? What?


IMO, ML experts essentially don't work in a bubble and may interface with potentially anyone at a company: C-level, engineering, product, marketing, ops, etc. What other tech employee needs that flexibility? So I grouped communication / life skills into being able to understand, read, interpret and ultimately provide value to potentially any team. Just having the technical skills will only get you so far.


Isn't this part of what most software engineering degrees teach though? Particularly surrounding project planning and requirements gathering?


I might be an outlier, but I interface with all of those on a daily basis in my role as a software engineer


IMO software engineering experts (leaders) need to do the same thing.


I agree with this comment. My experience has been that people don't really look at your resume unless it shows machine learning experience or one of the stats-type majors.


> "you can't use AI/ML unless you have a PhD/5 years research experience"

This hasn't been true for a few years now. But the fact that you can use it doesn't mean you understand what is happening and why it works in development but not in production. Everybody can copy a Jupyter notebook and train a TensorFlow model on ImageNet. Now go to a new domain with very little data, like 3D models, and create a new network to be trained on that dataset. How many people who can train ImageNet can do the latter? Even within deep learning, experts in image classification fail in reinforcement learning domains and need a couple of years to become completely productive.


I fully agree with you that after a MOOC you've barely scratched the surface, and that until you're implementing these things yourself you're not going to jump into an ML job.

However, personally I view the rest the opposite way round. Getting through a course on deep learning takes months [0]. Then reading through Keras code, once you understand the appropriate NNs, is easy.

For example, it takes a while of working through neural networks to understand ResNets. But if you understand ResNets, then looking through Keras code that creates a ResNet [1] is easy.

If I want to build an NN of any sort in Keras I can just Google for it. However, there's no simple Googling you can do to teach yourself NNs in an easy-to-follow, structured way.

[0]: https://www.deeplearning.ai/

[1]: https://github.com/Hyperparticle/one-pixel-attack-keras/blob...
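To illustrate the point about reading Keras code once the concept clicks: a minimal residual block looks roughly like this (my own toy sketch, not the code at [1]). The whole trick is the skip connection adding the block's input back to its output.

    from tensorflow import keras

    def residual_block(x, filters):
        shortcut = x
        y = keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        y = keras.layers.Conv2D(filters, 3, padding="same")(y)
        y = keras.layers.Add()([y, shortcut])          # the skip connection
        return keras.layers.Activation("relu")(y)

    inputs = keras.Input(shape=(32, 32, 16))
    outputs = residual_block(inputs, 16)               # channel counts must match
    model = keras.Model(inputs, outputs)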


Understanding NNs is easy. Understanding, collecting, and cleaning up data is the hard part.

Also, DL != ML.

Paraphrasing "The Tao of Network Protocols": If all you see is DL, you see nothing.


There are a tremendous number of people outside of programming who spend much or all of their work time collecting, cleaning up, and understanding data. Think teachers, accountants, traders - essentially everyone who spends a lot of time in spreadsheets.


The parent was referring to Keras, which is an NN API, hence why I responded talking about NNs.


Isn't this meant to be an introduction? I'm not sure who comes out of a crash course assuming they're an expert.


> taking and passing a crash course doesn't mean you'll be an expert in the field (and this applies for most MOOCs, honestly)

You're stating the painfully obvious here. I doubt anyone reading HN is under the impression that they'll be an expert after a single online course.

This is just a marketing stunt by Google to ensure their tooling is the de facto standard for AI/ML, so that Google can dominate the AI/ML market the way they dominated Internet Search.


> taking and passing a crash course doesn't mean you'll be an expert in the field (and this applies for most MOOCs, honestly)

Any field that you can become an expert in with a 6-week course or less is not a field that should be paying even high 5-figure salaries. Or, conversely, any field which pays 6-figure salaries is either not accessible via an MOOC, or is massively overinflated and probably in a bubble.


The real barrier to entry to ML is statistics. Most computer science degrees require an intro to statistics class, but if you really want to understand ML and where it should and can be applied appropriately you need a much deeper understanding.

IMO, it's much easier to pick up the programming required for ML than the statistics. This was reflected in the classes I took as a double statistics/computer science major. Most of the people in my CS department's machine learning course were statistics students looking to go into data science, not computer programmers looking to get in on the ML trend.


> Yes, AI/ML MOOCs teach the corresponding tools well, and the creation of new tools like Keras make the field much more accessible. The obsolete gatekeeping by the AI/ML elites who say "you can't use AI/ML unless you have a PhD/5 years research experience" is one of the things I really hate about the industry.

The problem is that having a hammer makes one see everything as a nail. Sure, given a suitably clean set of images, anyone who's done a couple of tutorials will be able to apply a pre-trained neural net to them and get something.

The hard part is getting an understanding of what tweaks to use when, and when to give up on a method. Otherwise, it is very easy to get carried away and waste time/resources.

For that, one needs to develop a good understanding of the landscape of ML algorithms, why each of them works and how they could break. That typically takes (intensive) experience or an understanding of the theory. Otherwise you'll be doing a brute-force search through a list of possible algorithms. As they say, "a few days in the lab might save a few hours in the library..."

Yes, things can get painful during hiring because the process is broken as it is, with additional complications due to not knowing how to vet for quality in a nascent field. But the "ML elite" are not morons and they don't mean to be obnoxious gatekeepers.


> The obsolete gatekeeping by the AI/ML elites who say "you can't use AI/ML unless you have a PhD/5 years research experience" is one of the things I really hate about the industry.

So what are they hoping to achieve with this course? I'm genuinely asking, because part of me wants to take the course, but another part of me feels like: what's the point if, even after many additional courses to build up a skill set, Google wouldn't hire you as an ML engineer unless you basically restart your career as a junior engineer, but in machine learning, at another company?


I have a feeling that they're trying to get people familiar with TensorFlow and thus very compatible with their cloud computing services.


I dunno... for this level of ML, scikit-learn/numpy is way more accessible than TensorFlow.


Look at it from an interview perspective. If I ask "are you interested in exploring ML", and you're enthusiastic, my next questions are: What have you done? Have you taken any courses? GitHub? Blog posts?

If the answer is that you're waiting for a special sign that it's worth doing before making an effort, then that really tells me that your enthusiasm for doing ML is not reality-based. Doing the ML thing is a pretty different mindset from other software jobs.


No but you'll ready yourself for the (geometric) programming of the next twenty years or so.


The other day I met with someone who was visiting my city to attend a big ML conference. In the course of our discussion, it transpired this person did not know the Halting Problem. He'd "heard of" Turing machines, but nothing more than "hearing" of them.

Gatekeepers shouldn't keep gates just for gatekeeping sake. But if so-called ML experts don't even know undergraduate computer science, that should really give you pause before you open up your wallet for them.


I could have the same reverse worldview:

"I attended a big software dev conference. Someone I met did not know about data bias. They heard of gradient boosting but nothing more than hearing them. If so-called dev experts don't even know undergraduate statistics, that should really give you pause before you open up your wallet for them."


Maybe I'm an iconoclast, but I'd respect that person more for not trying to bullshit his way out of it.


That's a great point. It shines a positive light on the gentleman I spoke with, and a negative light on the industry as a whole (if bs is so rampant that merely admitting not knowing something makes someone shine)


Why does an ML-expert need to know the halting problem?

Considering that ML is really a CS-oriented form of statistics, why would you expect a statistician to know CS theory?


Thinking more, it's the misleading names ("machine learning", "AI") that rustle my jimmies so much.

Sure, you don't need to know the halting problem to approximately solve MNIST by fitting a million-parameter curve to a dataset.

But you're misleading people if you're claiming to have any kind of insight into how computers can be made intelligent, or how computers can "learn", when you don't even know the halting problem.


I disagree. Frankly, for a lot of people and a lot of contexts, I don't think the halting problem is particularly important. You're using understanding of it as a shibboleth for exposure to common curricula about theoretical computation. But you can even know a lot about practical computation and not know anything about the halting problem. Curious: has your knowledge of the halting problem ever actually saved you time or effort in your work? If so, how?

Turing's work on the limitations of his machine is interesting, and I'm sure people with a deep understanding of it can advance the study of computation.

I think you're just being dismissive of skillsets which aren't your own. I think you're just bothered by the fact that AI and ML are being advanced more by people with more knowledge of linear algebra and statistics than computer science. And realize that it's the arrogant among them that will dismiss you as "just a technician."

Anyone who is looking down on either "scientists" or "technicians" should get over themselves.


> Curious: has your knowledge of the halting problem ever actually saved you time or effort in your work? If so, how?

Not OP, but I'm working a lot with ontologies. Some ontology representations are undecidable, while other languages are not very expressive but can be manipulated in polynomial time. Had I not known that, I would still be like "crap, why does it take so long? I must have a bug somewhere, maybe I should switch to C".

> AI and ML are being advanced more by people with more knowledge of linear algebra and statistics than computer science.

Just answered OP about that, but actually, symbolic AI is pure computer science. It does not get as much publicity as ML currently, but believe me, it's everywhere: at the core of almost all package managers, like debian's apt-get or maven, at the core of most advanced static code analyzers, etc.


This is a recent shift led by the ML trend. Traditionally (like 5 years ago), ML and AI were two different things, AI being the term for symbol manipulation: expert systems, inference engines, constraint programming, SAT solving, for instance. These domains are typical CS stuff: inference, complexity classes, low-level representation of data, etc. You don't need that much knowledge of math/statistics to be proficient in those fields, but you'd better know what the halting problem is.

I'm working in the symbolic AI field, and sometimes use ML techniques. They are complementary. To me, ML is about induction, AI is about deduction. They don't solve the same kinds of problems and they tend to work pretty well together.


I guess "dynamic programming" must really bother you. That field was named completely arbitrarily, to secure funding.

The more you look around, the more you find science concepts are named for marketing purposes.

Heck, "data scientist" is a bit of nonsense.


Does this just come down to a semantic idea that if something isn't in pursuit of AGI, it's not really AI? That feels unfair to most of these researchers, who absolutely disagree with that.

And to consider these algorithms to not "learn" is similarly unfair. They do. They learn to solve specific problems (at least right now), but they do learn.


would you not expect your hypothesized (theoretical) ML expert to understand boosting, which is generally explained in terms of PAC learning, which draws on computational complexity?

that said, i'd also expect a phd in statistics to be able to figure out boosting without taking an undergrad course that worked up from automata. so the halting problem test, while it does capture something, may not be quite right.


Why? Most of that cruft is abstracted away, computation only gets cheaper over time (a world-class AI rig costs ~$30k, a decent one $2k), and most applications of ML run on commodity hardware.


For one thing, it suggests that they are actually technicians, not the scientists they're selling themselves as.

That's fine if you want a technician (and if they're charging technician's rates).


I think when it comes to ML the CS experts with limited statistics knowledge are the technicians and the statistics experts with limited CS knowledge are the scientists, not the other way around.


And then that technician rate is X times what a technician rate would be for pure software dev.. what is your point?


> But if so-called ML experts don't even know undergraduate computer science

To be fair, machine learning seems more closely related to applied mathematics (statistics/optimization) than to computer science.


How does knowing that it's impossible to predict whether an infinite loop exists in a piece of code yield an actionable piece of wisdom that this ML expert should have?

I'd suppose that most developers, formal education or not, would have encountered an infinite loop at some point in their initial work with iteration or recursion.

How does knowing that Turing proved you can't predict this bug in a piece of code change anything?

I might genuinely be missing something important here - not trying to be snarky in my questioning.

It seems like obviously infinite loops are a disastrous bug for critical code - but what does knowing the formal name of the problem and background of its discovery give you?

I could understand if you were arguing in favor of test code or static analysis.


"turing machines" are cs 101?


Amended that to "undergraduate computer science"


> The obsolete gatekeeping by the AI/ML elites who say "you can't use AI/ML unless you have a PhD/5 years research experience" is one of the things I really hate about the industry.

It's the main reason why I decided to present a talk at the next PyCon Italy, as a very junior data scientist, to inspire other Python developers to learn some practical machine learning. If I could do it (and already use it for a work project), many other people can too (and no, I don't even have a degree in CS, just years of work experience).


>> The obsolete gatekeeping by the AI/ML elites who say "you can't use AI/ML unless you have a PhD/5 years research experience" is one of the things I really hate about the industry.

I'm going to have to ask who exactly are those AI/ML elites who say "you can't use AI/ML unless you have a PhD/5 years research experience".


TLDR: Taking a class is fine, but nothing beats real-world practice.

Wonder where I've heard this one before. :)


Great to see they have a nice introductory section on feature engineering! Feature engineering is often the most impactful thing you can do to improve the quality of models, and a place where I often see beginners (and experts, for that matter) get stuck. Google walks through how to work with JSON files and categorical variables: https://developers.google.com/machine-learning/crash-course/....

If anyone is looking to go more in depth, I work on an open source Python library for automated feature engineering called Featuretools: https://github.com/featuretools/featuretools/. It can help when your data is more complex, such as when it comprises multiple tables.

We have several demos you can run yourself to apply it to real datasets here: https://www.featuretools.com/demos.


Your comment got me interested in this course. However, all I could find about feature engineering there is what you linked to, directly.

Given that entire scientific careers, books, and conferences are built around the topic of feature engineering, and at least IMO good ML tools live or die with good feature engineering (in its broadest sense, for you deep learning fanatics :-)) that doesn't seem like more than the bare minimum I'd expect from any ML "crash-course" that is to be taken serious (and I wouldn't expect an ounce less from Google... :-)).

Am I missing something, maybe?

In any case, nice work of your own, and thanks for sharing it!


Ten seconds into the video on feature engineering, they say that feature engineering takes up about 75% of the time: https://developers.google.com/machine-learning/crash-course/...

They understand the value, but if you keep watching, they don't seem to go beyond the basics.


Although I'm normally skeptical of AI/ML courses, that section on feature engineering do's and don'ts is new and surprisingly under-discussed. It's very useful even outside of AI/ML.


I agree.

I expect that as companies increase their focus on finding practical applications of ML / AI, the topic will start to get more attention in these tutorials, as well as from researchers. Right now, too many people assume you already have a feature matrix, which is rarely the case when working on real world problems.


OTOH, automating feature engineering is a thing. There are papers on using unsupervised methods to do it.

The first-place solution in Kaggle's Porto Seguro competition trained an autoencoder on raw data to extract features.
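Roughly, that approach means training an autoencoder to reconstruct the raw features and then using the bottleneck layer's activations as the new feature matrix. A Keras sketch with stand-in data (not the actual winning solution):

    import numpy as np
    from tensorflow import keras

    X = np.random.rand(1000, 57).astype("float32")     # stand-in for the raw features

    inputs = keras.Input(shape=(57,))
    encoded = keras.layers.Dense(16, activation="relu")(inputs)   # bottleneck
    decoded = keras.layers.Dense(57)(encoded)

    autoencoder = keras.Model(inputs, decoded)
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X, X, epochs=10, verbose=0)         # learn to reconstruct the input

    encoder = keras.Model(inputs, encoded)
    features = encoder.predict(X)                       # extracted features for a downstream model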


How do you select the features created with Featuretools? The problem with automated feature engineering is that you end up with too many irrelevant features, and I haven't found a good guide on feature selection.


For those interested in a deeper dive into just deep learning, "Tensorflow and deep learning - without a PhD" is really good, and covers a lot of material in a single 2hr talk.

https://www.youtube.com/watch?v=vq2nnJ4g6N0


+1 for this. Well worth the 2 hours.


the deep nets and conv nets stuff was excellent. i wish the explanation of rnns was a little better.


This looks like a well put-together course, and a good way to learn TensorFlow. Keras and TensorFlow are top of my list of technologies to explore in the very near future.

Is anyone here doing Andrew Ng's Machine Learning course [1]? I'm about half-way through and really enjoying it. I'm particularly appreciating that the programming exercises are done in MatLab/Octave, so I feel that I'm really understanding the fundamentals without an API getting in the way, and developing some good intuition. Obviously frameworks are the way to go for production ML work, but I wonder whether ML people here think this bottom-up approach is advisable or could it be misleading when I move on to Keras/TensorFlow/whatever?

[1] https://www.coursera.org/learn/machine-learning

Edit: brevity


I teach ML and am currently writing my 2nd book on it.

I always advocate learning the fundamentals. Machine learning is math, and neural networks in particular rely on linear algebra and vector calculus. (You can build an NN without using linear algebra directly, but it'll likely be slower, and besides, the concept still relies on linear algebra.)

Frameworks abstract away a lot of the mathiness, which is a net good for society (ie, exposing lots of developers to neural networks), but I consider that a net-negative for the individual developer.

When working on anything but trivial toy problems, you should make sure you understand your problem domain and implementation thoroughly. Is the activation function you've chosen ideal for your problem domain? If not, choose a better one. If no better one exists, you can invent it; but you'll also need to know how to design the backpropagation algorithm for that new activation function (which requires some vector calculus).
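To make that concrete, here's a toy sketch (mine, with a made-up activation I'll call "swishish") of what designing backpropagation for a new activation amounts to: you need its derivative so the chain rule can pass gradients through it.

    import numpy as np

    def swishish(x):                       # hypothetical custom activation: x * sigmoid(x)
        return x / (1.0 + np.exp(-x))

    def swishish_grad(x):                  # its derivative, required for backprop
        s = 1.0 / (1.0 + np.exp(-x))
        return s + x * s * (1.0 - s)

    # One gradient step through a single unit y = swishish(w * x), squared-error loss.
    x, w, target, lr = 1.5, 0.3, 1.0, 0.1
    z = w * x
    y = swishish(z)
    dloss_dy = 2.0 * (y - target)
    w -= lr * dloss_dy * swishish_grad(z) * x   # chain rule through the activation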

Learning the math, as you have, helps you tune your algorithm based on actual knowledge rather than guesswork. I don't think it will be misleading when you move on to a framework. The frameworks are built on the same math.

That said -- if all you're looking to do is play around, then you don't need the math as much.


Thanks for taking the time to write such a comprehensive reply - much appreciated. "ML is maths" is something that I'm getting used to now. I do have some real uses in mind for what I'm learning, both in my job and in some side projects, particularly image feature recognition, and I'm looking forward to seeing how it all works out. Thanks again!


Image feature recognition is not quite solved but I feel it's very close. It's easier, obviously, if the problem domain is very specific.

In the past, like when I started on ML, the best tip was to make sure to do some edge detection with a few convolutions before feeding an image to a neural network. Now, we have convolutional neural networks that kinda do that for you automatically.

Sometime in between those two dates, someone figured out how to get the convolutions trained via backpropagation -- and they did that by deriving the gradient of an arbitrary convolution (or more likely, looking it up). And that let us put convolutions right in the neural net and have the convolutions automatically train themselves along with the rest of the network. And we observe that the convolutions do things that we would do, like remove unnecessary detail and highlight edges or exaggerate colors.
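For a sense of what those convolutions end up doing, here's a hand-rolled edge-detecting convolution in plain numpy (a toy sketch; real CNN layers compute the batched, multi-channel version of this and learn the kernel values themselves via the gradient mentioned above):

    import numpy as np

    sobel_x = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)      # classic vertical-edge kernel

    def convolve2d(img, kernel):
        kh, kw = kernel.shape
        out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
        return out

    img = np.zeros((8, 8))
    img[:, 4:] = 1.0                                    # an image with one vertical edge
    edges = convolve2d(img, sobel_x)                    # responds strongly along the edge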

Anyways; I believe the current state-of-the-art for generic image feature recognition is an ensemble of convolutional neural networks. I believe Google leads the pack on the commercial side so maybe look into how they do it.


not quite solved is the right way to put it.

If you look at the capsule network papers, you will realize that convnets are not very good at recognizing transformations (e.g. 3D rotations) of the same object. That's probably why so many training examples are required to make them work well.

Also, if you look at errors made by state-of-the-art models, some of them are obvious (to a human) objects classified as something entirely different and unrelated. Which leads me to believe that object recognition is not completely solved until a model has some kind of common sense, either built in or acquired during training.


Andrew Ng's older machine learning MOOC was excellent. I took it once, and took it again a few years later. In the last 8 months I also took his new deep learning set of courses. All really good stuff! (I have been working in the field since the 1980s, but constantly refreshing helps me. And Andrew's lectures are great fun; he is a teaching artist.)


I used to download all ML videos for offline ad hoc watching. But these crash course videos cannot be downloaded using 'youtube-dl'. Any recommendations?


Where are the videos anyway? I don't see any play button or any console to play the videos.


Use the time you spend trying to find how to download the videos actually watching them, doing the readings and practising the exercises.


i completed the andrew ng course recently, and felt that the difficulty dropped a lot in the second half of the course (for example, he stops giving homeworks). im hoping for more in his new DL courses


I've taken the CNN and RNN (parts 4 and 5) classes of his new DL specialization, and they're both about as rigorous as you'd want. I do have to give a warning, though, that the last class starts to show some confusing mistakes in the HW. For example, the expected output given is from an outdated HW version.


sigh. well i hope the support is good.


The choice of TensorFlow is a bit disappointing for a beginner-focused course which looks really solid otherwise. Business seems to have gotten priority over pedagogy in that case.

I see TensorFlow as the Angular of machine learning: first on the market, powerful but unwieldy. Like Angular, it will ultimately get superseded by tools with a nicer API (scikit-learn, Keras) or more versatility (PyTorch). Like Angular, it's probably not the best choice for a beginner to invest time into.


Add to that that TensorFlow was practically a latecomer, not the first to market.


I love the prework section: https://developers.google.com/machine-learning/crash-course/... It's a very good mix of topics and skills that I think everyone should learn, even if not directly planning to do ML or DL. If y'all are looking for a compact (and inexpensive) textbook on linear algebra that comes with all the prerequisites, you can check out https://gum.co/noBSLA (disclaimer: I wrote it).



As someone who just did this internally:

Do it. It's worth your time. Very well paced exercises, and it walks you through the flow quite nicely.


I went through the first couple of topics. It seemed very disjointed: different people presenting, different exercises. Was it like this internally? Or is this heavily "annotated"?


Unless you have already invested a lot of time into learning (and building on top of) TF, I would advise to pick up PyTorch. It’s much easier to learn and use (imperative!), and has higher performance on common workloads.


Except there aren't many good resources to learn it and the documentation isn't very good. Hopefully this will improve soon.


On the positive side with PyTorch you don’t need nearly as much documentation as you would with TF. TF in general feels like it’s fighting you every step of the way. There’s a lot of cognitive overhead. Not so with PyTorch. Everything is straightforward, and can be run/examined in ipython.


I like this move from Google. Sure, it is targeted at getting you to use TensorFlow, but more courseware and MOOCs help everyone. I love doing self-study, and TensorFlow's tutorials are top notch. Since I can also use TensorFlow on my own hardware and anywhere else, I really love better docs and MOOCs in general. What I really want to do is understand enough TensorFlow to reproduce other people's experiments from their papers on GitHub, and I think this would be one of the best ways to do that. Of course, this may eat into a bunch of companies that have paid programs for ML, but it's Google's prerogative to make ML cheaper and easier to deploy and learn, so I am all for that.


> We recommend that students meet the following prerequisites: Mastery of intro-level algebra. You should be comfortable with variables and coefficients, linear equations, graphs of functions, and histograms.

Any book suggestions to getting up to speed in this area?


If you're looking for books, have a look at Schaum's Outline of Precalculus [0]. Khan Academy [1] is also good and there's this MOOC on coursera called Data Science Math Skills [2].

[0] https://www.amazon.com/Schaums-Outline-Precalculus-3rd-Probl...

[1] https://www.khanacademy.org/math

[2] https://www.coursera.org/learn/datasciencemathskills


Khan Academy probably has this covered.


A shame Google doesn’t just link to the Khan Academy course.


Not true. They do have Khan Academy links where applicable, for example for the algebra ones. Check the link below: https://developers.google.com/machine-learning/crash-course/...


I always liked the Saxon books[1], since they involved so much spaced repetition if you did the problem sets that it beat the symbolic manipulation into your long-term memory.

[1] http://amzn.to/2FH3bXL


Shameless plug: Lambda School (YC S17) is also putting on a free Machine Learning crash course (we call it a mini bootcamp), followed by an optional 6-12 month course that you pay for once you get a job in data science (it’s free until then, and always free if you don’t get a job in ML).

https://lambdaschool.com/machine-learning-bootcamp/


is there a more fleshed out outline for what will be covered here? sounds interesting


As someone who is trying to learn ML, all the courses available are hugely helpful. One thing I wish I had easy access to is the process that someone goes through while trying to build a model on a real dataset.

Specifically following questions are the ones I struggle with:

1. How did you figure out what features would be useful?

2. How did you figure out what algorithm(s) are appropriate?

3. how and why did you massage the data in a specific way?


> How did you figure out what features would be useful?

There are various feature engineering and feature extraction techniques. Filter methods, wrapper methods, and embedded methods. Principal component analysis, autoencoding, variance analysis, linear discriminant analysis, Gini index, genetic algorithms, etc. -- the feature selection process will depend on the dataset, the problem domain, the analysis algorithm you ultimately use, etc.
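As one concrete instance of the above (my own example, not a prescription): a filter-method selection step followed by principal component analysis, in scikit-learn:

    from sklearn.datasets import load_wine
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    X, y = load_wine(return_X_y=True)
    X_filtered = SelectKBest(mutual_info_classif, k=8).fit_transform(X, y)   # filter method
    X_reduced = PCA(n_components=3).fit_transform(X_filtered)                # principal components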

> How did you figure out what algorithm(s) are appropriate?

Also depends on the problem domain. Discrete or continuous data? Categorical features, numeric features, features as bitmasks. Do you need a probabilistic outcome? Etc.

Generally you start with the easiest algorithms in your toolbox to see how viable they are. For a classification task I'll almost always start with a naive Bayes classifier (if the data allows) and/or a random forest and see how they perform. If the problem domain is highly non-linear you might start with a support vector or kernel method. Neural network is a last resort for me, as I find most classification problems can be solved to a high accuracy much more simply.
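A sketch of that "start simple" step, assuming a stand-in scikit-learn dataset:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    X, y = load_breast_cancer(return_X_y=True)

    # Try the cheap baselines first; only reach for heavier models if these fall short.
    for model in (GaussianNB(), RandomForestClassifier(n_estimators=100)):
        scores = cross_val_score(model, X, y, cv=5)
        print(type(model).__name__, round(scores.mean(), 3))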

> how and why did you massage the data in a specific way?

This relates back to #1 -- you should only massage data based on what your feature engineering tells you to do. Sometimes you might want to remove outliers or clean up the training data, but only if the outliers really should be removed from consideration entirely.


Thanks for the response!

> There are various feature engineering and feature extraction techniques. Filter methods, wrapper methods, and embedded methods. Principle component analysis, autoencoding, variance analysis, linear discriminant analysis, Gini index, genetic algorithms, etc -- the feature selection process will depend on the dataset, the problem domain, the analysis algorithm you ultimately use, etc.

Obviously that's a big toolbox, and I'm sure it takes time to develop an intuitive understanding of all these techniques. What I hope for is some sort of guidebook on what to look for when I stumble across problems. So let's say you try out an algorithm and your accuracy (or whatever evaluation criterion you might have) is low. How do you figure out whether that's due to the algorithm, or due to (or due to the lack of) feature selection?

An analogy that might be useful is, when I see my database queries are slow, I can use EXPLAIN to guide what knobs to tune. Obviously it requires understanding what indexes are, what a full table scan is etc. etc. but the EXPLAIN plan provides a guidebook of sorts.


Every problem is different, so the only advice I can give is: research research research! Do the hard work up-front; figure out how to describe your problem in a mathematical sense, and identify the right tools to use for the shape of your input, output and problem dimensions. What's the distribution of each dimension. Are the relationships linear, nonlinear, clustered, dispersed, logarithmic, etc. Once you know those things, you're able to narrow in on the right tools and analyses to use.


If you are willing to do the work, Frank Harrell's Regression Modeling Strategies is a pretty good introduction to a lot of this.

It's written for a very different set of problems than typical ML, but it has lots of really good advice for practical problems in data analysis and prediction (which is another term for ML).

Mostly people learn this stuff by experience. Find a dataset, choose a predictor, filter, clean and massage your data till you get better metrics/understanding (preferably both). Rinse, repeat on many different datasets and problems, and you'll know how to do this.


Georgia Tech has a graduate course on Machine Learning, CS-7641. There are four major projects in that course where the students must analyze (and re-analyze) a chosen dataset. Here is an example of the code one student used: https://github.com/JonathanTay/CS-7641-assignment-1 Unfortunately, all the plotting code was intentionally removed. Sometimes the project reports make it online (http://www.dudonwai.com/docs/gt-omscs-cs7641-a3.pdf?pdf=gt-o...). Having spent several months of my life on the assignments, I'd say that the only way to learn it is to try a whole bunch of different things and try to figure out why some work and why some don't. Sometimes you learn from the failures, sometimes from the unexpected successes.


Take a look at some of the highly rated kernels on Kaggle - they’re often well annotated with the types of things you’re looking for, including actual experimentation to test ideas.

Edit: fix autocorrect


I've compiled this into a Todoist template you can import - it's got links to each module + times.

To preview: http://todotemplates.com/posts/HRtYanEq8zMgRL5fz/google-ml-c...

To import directly: https://todoist.com/importFromTemplate?t_url=https%3A%2F%2Fd...


Cool tutorial, but I'm not entirely sure what makes this ML -- aside from neural nets, this is more or less the material you'd encounter in a basic applied statistics or regression analysis course, minus material on estimating uncertainty, modeling survival or time-series data, and causal inference. I suspect you'd benefit more from a 50 minute tutorial on those than neural nets.


I want to ask people who know ML well if the hype is warranted?

Billions of courses, web sites, job applications and HN posts. The subject seem to have taken off massively in the last two years. I mean image and speech recognition is pretty cool (when it works!), but hardly that earth shattering, is it?


Deep nets are deservedly big because they've managed to improve upon most of the decades-old state-of-the-art methods in the world of signal processing (DSP): voice, image, video, game play, and a significant amount of natural language. No other single computational/algorithmic method has achieved so much in so many domains, ever. That's revolutionary.

The rate of advance using deep nets in signal processing will likely slow down now, but they aren't going away, not in the foreseeable future.

The hype around DNNs arose when we took our unbridled enthusiasm for what they've achieved in DSP and extended it to other domains where the data is less 'dense' and thus isn't as amenable to fast de/convolution in N-D space or time.

Will DNNs revolutionize or introduce all the techniques needed to achieve AGI/Strong AI? I very much doubt it. As yet, there's little sign that DNNs can perform relational operations on interdependent symbols, like the transforms available via type theory, Bayesian nets, or predicate logic.

The multitude of disparate facts and semantics in a rich knowledgebase can't be organized into dense matrices the way that continuous signals can, so the SIMD operations that are so effective in DSP won't implement the rich transformations needed in a relational fact-based knowledge space equally as well, if at all. Thus DNNs almost surely aren't going to take us to the heights of logical or compositional thinking that human level intelligence requires.

But how far up relational mountain will DNNs take us? I suspect that won't be known for a decade or longer. But even if we don't reach the summit, we'll be considerably closer than we were before.


This is a really fantastic and interesting look at ML for someone who's just taking their first steps. Any recommendations on where else I can read about DNNs and associated techniques (GANs, RL, etc.) in terms of what they'll likely not be capable of in the near to mid term?


just commenting to thank you for a very accessible assessment of a simple question "is the hype warranted". please do more of these on HN (or elsewhere!)


The hype is and isn't warranted.

ML is a much broader field than just neural networks. The hype for ML in general, I think, is warranted. We hit an inflection point when AWS launched and scalable processing power became cheap. It became cheap to process tons of data and generate insights. I don't have hard numbers on this, but probably 90-95% of machine learning used in practice is NOT neural networks, and it has accuracies in the 90%+ arena. So ML in general -- sure, hype warranted.

Neural networks are the new hot topic, and the hype isn't fully warranted yet. TensorFlow made them very popular in the developer community; this is a good thing because it's spurring more investment and research in ANNs. But for any given problem, odds are that a neural network is not the best (ie, most accurate or cheapest) way to solve it. Neural networks do have specific problem domains where they are the state of the art, but for most other problem domains there exists a better solution. So I'd say that neural networks are a little over-hyped right now, but with a new generation of developers learning about and experimenting with ANNs, that will change in a few years. I think we're about to see an explosion of ANN usefulness over the next few years.

TLDR: ML is very useful but is more than neural networks; neural networks need a little more progress to catch up to the TensorFlow hype.
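
To make the "a neural net often isn't the best first tool" point above a bit more concrete, here is a minimal, hedged scikit-learn sketch comparing a plain logistic-regression baseline against a small neural network on a toy dataset (the numbers prove nothing by themselves; the point is to always run the cheap baseline first):

  from sklearn.datasets import load_digits
  from sklearn.model_selection import cross_val_score
  from sklearn.pipeline import make_pipeline
  from sklearn.preprocessing import StandardScaler
  from sklearn.linear_model import LogisticRegression
  from sklearn.neural_network import MLPClassifier

  X, y = load_digits(return_X_y=True)

  baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
  small_net = make_pipeline(StandardScaler(),
                            MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                          random_state=0))

  # Compare the cheap baseline against the neural net before reaching for anything deeper.
  print("logistic regression:", cross_val_score(baseline, X, y, cv=5).mean())
  print("small neural net:  ", cross_val_score(small_net, X, y, cv=5).mean())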


Computers are automation tools that increase human efficiency by doing the grunt work for you - but they are limited to automating the tasks that can be captured as a set of rules in code. When we figure a new way to model more complex tasks in code, a whole new set of things can be automated.

Here's a concrete example: Before spreadsheets existed, there used to be legions of accountants who created complex ledgers on paper and added up all numbers to track how a business was doing. You'd literally mail off your sales numbers to an accounting team somewhere and wait three days to get the latest report generated and sent back. Sure they had calculators to add numbers, but the computers of the day didn't understand how those numbers related to each other. The human still had to do most of the work to create the reports.

The big idea of spreadsheets was to make the computer manage the more complex task of knowing how different numbers in a report related to each other. It made most ledger tasks totally automatic once the initial report was defined. Now a single accountant could do the work of the entire accounting team - and more accurately and in less time! There were stories of the first spreadsheet testers having to delay mailing back their financial reports by a few days because their clients would be suspicious if they mailed them back too fast.

Nearly overnight, accounting got a lot more efficient and companies made more money. "What if" modeling that used to be too slow and cost-prohibitive to do became quick and easy. Companies could plan more intelligently. The spreadsheet was a true game changer.

This same pattern happens every time the bar is raised on the complexity of what can be automated, and Machine Learning raises the bar one giant notch. Previously we were limited to automating tasks that a smart coder could describe as discrete steps in code. But with ML, the computer can figure out its own rules just by looking at data. That means in many cases you can solve very hard problems just by collecting a lot of data. Lots and lots of things that used to be done by large groups of people will now be able to be done with a single computer.

In that sense, ML is a total game changer. Don't focus on the specific applications thus far. Focus on the idea that all kinds of tasks that used to require humans can now be automated with a little bit of applied ML. The opportunities are literally everywhere.

In a few years, ML won't be some esoteric technique used by a few people. It will be a core skill that everyone uses or touches in some way. It's going to creep into everything everywhere because it's just so darn useful.


I've been reading about it "getting popular during the last two years" for at least 6 years.


I've been itching to learn a bit about the industry and to be able to create & train ML models myself; I'm glad Google decided to put out a course where I wouldn't have to worry about the quality of instruction.


On a side note, can someone talk about what kind of tools are being used to integrate the subtitles and scrolling behavior over a YouTube video in this course? Is there an open-source implementation?


Thanks Google! Now I know that I am an ML guy, as an economist and econometrician. Yes, we apply this to all kinds of stuff, though with clear business acumen or economic-policy thinking.


I have a new project at work: I need to take in a free form text of recipe ingredients (e.g. "1/2 cup diced onions", "two potatoes, cut into 1-inch cubes", etc.) and build a program that identifies the ingredient (e.g. onion, potato), as well as the quantity (e.g. 0.5 cup, 2.0 units). Would machine learning be an applicable approach to solving this? Right now I'm just planning on using an NLP library to parse out the various parts of the ingredient text.


I did the same a while back, and I suggest using an NLP library to extract parts of speech and parse trees, then building a quick-and-dirty solution. In my case the strong solution (a week+ of work) wasn't much better than the hacky manual one based on specific keywords like "teaspoon" plus parts of speech/parse trees (a few hours).


It's not very sexy, but I think you might find it easier and more robust just to use an NLP library.

I built something similar (albeit for a relatively limited database of recipes) for a hackathon a couple of weeks back. I didn't even use a proper NLP library, just some simple hand-rolled pattern-matching, and got pretty good results.

Good luck!


I think you're right. Did you happen to open-source your code from the hackathon? I'd love to take a look at your approach if you don't mind.


Sorry, I normally would but one of the other team members is considering taking the hack forward and wanted to keep it closed for now. (It's hard to see how much competitive advantage he'd have from 48 hours of very-hacked-together code, but so few hackathon projects get taken forward that I didn't want to discourage him!)

The approach was to tokenize the input and then do basic pattern-matching on it, with separate dictionaries of quantity units (e.g. cup, oz, pound), ingredients, processing words (e.g. "chopped") and throw-away words (e.g. "of"). In fact, possibly the most complicated part was parsing "2.5", "2 and a half" and "2½" all to the same thing.
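
In that spirit, here is a minimal, hedged Python sketch of the tokenize-plus-dictionaries approach (the word lists are tiny placeholders, and real recipe text needs far more cases -- e.g. number words like "two", mixed numbers like "1 1/2" and "2 and a half" aren't handled here):

  import re
  from fractions import Fraction

  UNITS = {"cup", "cups", "oz", "pound", "pounds", "teaspoon", "teaspoons", "tablespoon"}
  PROCESSING = {"diced", "chopped", "minced", "cubed", "sliced"}
  THROWAWAY = {"of", "and", "a", "into"}
  UNICODE_FRACTIONS = {"½": 0.5, "¼": 0.25, "¾": 0.75}

  def parse_quantity(token):
      # Map '2', '2.5', '1/2', or '2½' to a float; return None for non-quantities.
      if token in UNICODE_FRACTIONS:
          return UNICODE_FRACTIONS[token]
      m = re.fullmatch(r"(\d+)([½¼¾])", token)
      if m:
          return int(m.group(1)) + UNICODE_FRACTIONS[m.group(2)]
      try:
          return float(Fraction(token))
      except (ValueError, ZeroDivisionError):
          return None

  def parse_ingredient(text):
      qty, unit, words = None, None, []
      for token in re.findall(r"[\d½¼¾/.]+|[a-zA-Z]+", text.lower()):
          q = parse_quantity(token)
          if q is not None and qty is None:
              qty = q
          elif token in UNITS:
              unit = token
          elif token not in PROCESSING and token not in THROWAWAY:
              words.append(token)
      return {"quantity": qty, "unit": unit, "ingredient": " ".join(words)}

  print(parse_ingredient("1/2 cup diced onions"))
  # -> {'quantity': 0.5, 'unit': 'cup', 'ingredient': 'onions'}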


Whether you end up using a machine learning approach or hand-crafting the solution, I recommend you work in an ML-like manner, dividing up the data you have into test and training sets and using cross-validation to evaluate your work.

For your actual question: yes, as others have said it might be just an NLP/regexp problem. Otherwise, you could look at ingredient identification as a classification problem. I recommend checking out FastText and NLTK, and familiarizing yourself with the word dictionaries and pre-trained vectors that are available; these tools might help generalize your work beyond the data you have at hand.

(E.g. if it works well on your data using pre-trained word vectors from Wikipedia, chances are it might work on examples you don't even have.)
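
A minimal sketch of that classification framing plus the cross-validation advice above, using scikit-learn in place of FastText and a tiny made-up training set (real use would need many labelled lines and probably the pre-trained vectors mentioned above):

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import cross_val_score
  from sklearn.pipeline import make_pipeline

  # Tiny, made-up labelled examples; a real project would label far more lines.
  texts = [
      "1/2 cup diced onions", "one large onion, sliced", "2 red onions, quartered",
      "two potatoes, cut into 1-inch cubes", "3 russet potatoes, peeled", "1 lb baby potatoes",
      "1 cup shredded carrots", "two carrots, julienned", "a handful of baby carrots",
  ]
  labels = ["onion", "onion", "onion",
            "potato", "potato", "potato",
            "carrot", "carrot", "carrot"]

  model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))

  # Cross-validation keeps you honest about generalizing beyond the exact training lines.
  print("cv accuracy:", cross_val_score(model, texts, labels, cv=3).mean())

  model.fit(texts, labels)
  print(model.predict(["half a cup of chopped onion"]))  # predicted ingredient label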



This is an NLP problem if all you're trying to do is extract nouns.


The ulterior motive behind this is to increase usage of Google Cloud (according to this answer on Quora: https://www.quora.com/Why-did-Google-release-their-machine-l...)


Cool! Does anybody also know a good blockchain crash course of similar kind so one could grok all the major buzzwords of today?


I loved this one a lot: https://anders.com/blockchain/


This is what got me a proper intuition for why a blockchain can be useful.


They do have prerequisite training material with reference links: https://developers.google.com/machine-learning/crash-course/...


While I like the idea, in principle, that you don't need a CS education to use AI/ML, I doubt it. Here's a problem that cropped up today: our instance ran out of hard drive space on a training set of ~400,000 images. The individual images were only 375 GB in total, but took up 1.5 TB when converted to NumPy matrices. Why? The arrays were converted to standard int arrays (32-bit x 3 channels) when they could've fit into unsigned 8-bit integers (8-bit x 3 channels). Each image was 4x as large as it needed to be.
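
For anyone following along, a minimal NumPy sketch of the dtype issue described above (synthetic image size, not the actual dataset):

  import numpy as np

  # One hypothetical 1000x1000 RGB image.
  img_uint8 = np.zeros((1000, 1000, 3), dtype=np.uint8)  # 8 bits per channel
  img_int32 = img_uint8.astype(np.int32)                 # what an accidental upcast costs

  print(img_uint8.nbytes / 1e6, "MB")  # 3.0 MB
  print(img_int32.nbytes / 1e6, "MB")  # 12.0 MB -- 4x larger, identical pixel values

  # Across ~400,000 images, that 4x factor is the difference between
  # a few hundred GB and well over a terabyte.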

You can certainly use high-level ML tools (like Keras), but it takes a great deal of work to wrangle your data into a usable format, and even more knowledge to debug an ineffective network.


I wonder if they are doing this to compete with course.fast.ai


IMO teaching people ML is good in general for Google: 1) it spreads the use of TensorFlow; 2) it increases not only TensorFlow's but also Google's mindshare; 3) it trains people who may become future Google employees, and/or serves as a useful resource for existing employees.


Also 4) it will increase the usage of TPUs on Google Cloud Platform and, subsequently, the revenue of their cloud offerings.


This has been an internal course at Google for many years.


I think that helps to prove the point.


The Google AI courses are also good for old seasoned ML practitioners who want to learn more about more recent deep learning techniques


Having done this course 6 months ago, I can say it's a fantastic introduction to the major concerns of practical machine learning.


Can't view. Requires a Google account.


Have you tried to create one?


How does this compare to the CS229 lectures by Andrew Ng? (the recorded lectures, not the MOOC)


CS229 @ Stanford is very math/proof intensive. This might be somewhat similar to CS221 or the Coursera course.


Serving videos from YouTube without alternatives doesn't make it accessible. Proxying for identified videos, especially the videos from this crash course, would be useful.


>Serving videos from YouTube without alternatives doesn't make it accessible.

Why is that?


This site is so broken on Firefox mobile that they should be embarrassed.


In the course, the lecture "Reducing Loss: Gradient Descent" says:

"Convex problems have only one minimum; that is, only one place where the slope is exactly 0. That minimum is where the loss function converges."

The first sentence is flatly wrong. E.g., for a positive integer n and the set of real numbers R, take the function f: R^n --> R where f(x) = 0 for all x in R^n. Then f is convex, concave, and linear, and every x in R^n is both a minimum and a maximum of f.

Can there be uncountably infinitely many alternative minima for the Google ML problems? Yes, e.g., just enter one of the independent variables twice.

The second sentence is nonsense.

Grotesque, outrageous incompetence!!!!

It has long been known that minimizing a convex function, even a differentiable convex function, with just gradient descent can be just horribly inefficient. A LOT is known about how to do much better than just gradient descent. E.g., there is Newton iteration (right, that Newton, hundreds of years ago) and quasi-Newton. And there's more.

Why so inefficient? Well, draw a picture like Google did except use just two independent variables instead of just the one in the Google picture. Then see that the resulting, convex "bowl" can be like a long, narrow boat with a very gentle slope in one direction and a very steep slope in an orthogonal direction. Yes the cross section of the bowl can be, first cut, an ellipse with one short axis and one long one. Sure, the axes are eigenvectors, etc. and the ellipse is part of a local quadratic approximation. Well, gradient descent keeps going back and forth nearly parallel to the short axis of the ellipse and making nearly no progress on the long axis. People have known this and known good things to do about it for, uh, at least half a century.

For the Google ML problems, one might (1) tweak Newton iteration to improve the rate of convergence to a minimum, or (2) at each iteration, instead of taking a gradient step, get a supporting hyperplane of the epigraph of the convex function; as the iterations proceed, accumulate these hyperplanes, notice that they lead to an approximation of the full epigraph, and use linear programming (or some tweak of it) to minimize the hyperplane approximation of the convex function -- there is much more that can be said here. By the way, the convex function for that ML problem is quite special, e.g., quadratic.
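
Here is a minimal NumPy sketch of that ill-conditioning point on a toy quadratic, one gentle and one steep direction (only an illustration, not anything from the Google course):

  import numpy as np

  # f(x) = 0.5 * x^T A x, with condition number 1000 (gentle axis 1, steep axis 1000).
  A = np.diag([1.0, 1000.0])
  grad = lambda x: A @ x

  x = np.array([1.0, 1.0])
  step = 1.0 / 1000.0  # a fixed step sized for the steep direction
  for _ in range(1000):
      x = x - step * grad(x)
  print("gradient descent, 1000 steps:", x)  # roughly [0.37, 0.0]: still far along the gentle axis

  # Newton's method uses the curvature (here, A itself) and hits the minimum in one step,
  # since the objective is exactly quadratic.
  x = np.array([1.0, 1.0])
  x = x - np.linalg.solve(A, grad(x))
  print("Newton, 1 step:", x)  # exactly [0.0, 0.0]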

A lot has long been known and well polished in regression and classification, e.g., with TeX markup:

N.\ R.\ Draper and H.\ Smith, {\it Applied Regression Analysis,\/} John Wiley and Sons, New York, 1968.\ \

Leo Breiman, Jerome H.\ Friedman, Richard A.\ Olshen, Charles J.\ Stone, {\it Classification and Regression Trees,\/} ISBN 0-534-98054-6, Wadsworth \& Brooks/Cole, Pacific Grove, California, 1984.\ \

C.\ Radhakrishna Rao, {\it Linear Statistical Inference and Its Applications:\ \ Second Edition,\/} ISBN 0-471-70823-2, John Wiley and Sons, New York, 1967.\ \

A good start on convexity is

Wendell H.\ Fleming, {\it Functions of Several Variables,\/} Addison-Wesley, Reading, Massachusetts, 1965.\ \

Total, fun, dessert ice cream on convexity is Jensen's inequality; right away can use it to prove a lot of classic inequalities.
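
For reference, the finite form of that inequality, in the same TeX spirit as the citations above (a standard statement quoted from memory, not from any one of those texts):

  % For convex f: R^n --> R, points x_1, ..., x_k, and weights a_i >= 0 with sum_i a_i = 1:
  f\Bigl( \sum_{i=1}^{k} a_i x_i \Bigr) \le \sum_{i=1}^{k} a_i f(x_i)
  % The two-point case k = 2 is exactly the usual defining inequality of convexity;
  % taking f = -\log with equal weights gives the AM-GM inequality as one of those classics.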

Gee, look on the upside!!!! From this sample, the claims of machine learning (ML) revolutionizing the economy are nonsense!!!! And for startups, don't much have to worry about serious competition from Google!!!!


i would request that you stop critiquing "machine learning" based on the presentation in introductory online materials like this and the ng coursera course. you provide a lot of signal in general but i think these critiques do decrease your SNR.

i am certain that you are familiar with the "usual" statistics sequence. (for others: there are lower-division courses that use calculus in a few places but otherwise avoid it, focusing instead on memorizing procedures. there is the upper-division probability/math stat sequence that uses calculus heavily but avoids analysis. and there is an intro phd sequence that finally gets into measure theory.) if you look at a coursera course that gives a high-level overview of practical statistics and simplifies its presentation to be accessible to people who never took calculus, you can criticize the very idea of a course that does not explain the measure-theoretic issues, but it makes no sense to use it to criticize the field of statistics, or to criticize the competence of others in the institution that produced it.

here, google is producing introductory training materials for developers. many developers have never taken calculus, let alone optimization, statistics, or analysis, and when i took MLCC internally, you were supposed to go through this whole thing (lectures and coding) in two days. it's supposed to give you enough understanding of the concepts to understand the API and apply it.


If you find something wrong mathematically or otherwise with something I write, then by all means let me know. So far you have found nothing. Details:

The Google statement I quoted was flatly wrong. It is really important for students to be told that.

I gave some references to more in statistics.

> the measure-theoretic issues

I didn't mention measure theory, and the statistics references I gave don't mention measure theory either. I referenced Fleming only as background on convexity, and there is no measure theory in that part of Fleming.

You mentioned the role of calculus for the math of regression: That use of calculus, for deriving the normal equations, is neither necessary nor, really, sufficient. There is another derivation, nicer, fully detailed mathematically, with no calculus at all. The core of the idea is that the minimization of the squared error has to be an orthogonal projection and, then, presto, bingo, get the normal equations. And there are more advantages to that derivation. To keep my post simple, I omitted that derivation, but a good treatment of regression without calculus could use it.
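
A sketch of that calculus-free route in TeX form (standard least-squares notation, not quoted from any of the references):

  % Least squares: choose b to minimize || y - X b ||^2.
  % The set { X b : b in R^p } is the column space of X, and the closest point of a
  % subspace to y is the orthogonal projection of y onto it; so at the minimizer the
  % residual y - X b must be perpendicular to every column of X:
  X^T ( y - X b ) = 0
  % which rearranges directly to the normal equations:
  X^T X \, b = X^T y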

I omitted the standard definition of convexity, but apparently in this discussion we need that. By omitting the definition, the Google material was not so good.

Definition (convex function): For the set of real numbers R and a positive integer n, a function f: R^n --> R is convex provided for any u, v in R^n and any a in [0,1] we have

f(au + (1-a)v) <= af(u) + (1-a)f(v)

So, for a picture, on the graph of f(x), we have two points (maybe not distinct) (u, f(u)) and (v, f(v)). Then we draw the line between these two points. The number a determines where we are on that line. With a = 0, we are at point (v, f(v)). With a = 1, we are at point (u, f(u)). Then as we move a from 0 to 1, we move along that line. We also have on the graph the point, say, P

(au + (1 - a)v, f( au + (1-a)v ) )

And on the line we drew, we have point, say, Q

(au + (1 - a)v, af(u) + (1-a)f(v))

Well, we are asking that point Q be the same as point P or directly above point P. That is, the line we drew is on or above the graph of (x, f(x)). The line is sometimes called a secant line and is said to overestimate the function.

Definition (concave): The function -f is concave if and only if the function f is convex.

As in Fleming, a convex function is continuous. Intuitively, the proof is based on two cones, and as we approach a point we get herded between the two cones. The cones are from the convexity assumption. Draw a picture.

IIRC, there is a result in Rockafellar that a convex function is differentiable almost everywhere with respect to Lebesgue measure, but this is the only connection I would make between convexity and measure theory.

For convex f, the set of all (x, y) where y >= f(x) is the epigraph of f, that is, the region on or above the graph of f.

Definition (convex set): A subset C of R^n is convex provided for any u, v in C and any a in [0,1] the point

au + (1-a)v

is also in set C.

Well, as in Fleming, the epigraph is convex. It is also closed in the usual topology of R^n.

Definition (closed set): A subset C of R^n is closed (in the usual topology of R^n) provided for any sequence x_n, n = 1, 2, ... in C that converges to y in R^n, y is also in C.

In particular, if we define the boundary of C, set C contains its boundary. So, the interval [0,1] is closed and the interval (0,1) is not closed.

Well, for any closed convex set C and point x on its boundary, there exists a hyperplane that passes through point x such that set C is a subset of the closed half space on one side of the hyperplane.

Such a hyperplane is said to be supporting for set C at point x.

Intuitively, in R^3, push convex set C to be in contact with a wall. Suppose point x on the boundary of set C is in contact with the wall. Then the wall is a supporting plane for set C at x, and set C is a subset of the room side of the wall.

Or think of a big, solid, irregular piece of cheese and John Belushi as his Samurai Tailor swinging his sword: John keeps swinging his sword in arcs that are in flat planes and cuts down the irregular cheese to a convex hunk. So, the convex C has been determined (formed) from the supporting hyperplanes from Belushi's sword.

For another way to make a convex set, take a piece of wood and press it against a belt sander until the boundary consists of only flat sides.

Let a solid rock roll around in a stream of water for a few thousand years, and you may end up with a smooth, shiny, convex rock.

Faster: a chicken egg is convex.

In general, a closed convex set is the intersection of its supporting hyperplanes.

Then, we can approximate a convex set with some of its supporting hyperplanes. In particular, we can approximate a convex function with some supporting hyperplanes of its epigraph. At times, this can be useful -- it's the main idea behind Lagrangian relaxation in constrained optimization (I used that once).

In particular, the epigraph of a convex function is the intersection of the supporting hyperplanes. In that case, a supporting hyperplane is called a subgradient. The function is differentiable at the point of contact if and only if the subgradient is unique. If the subgradient is unique, then it is just the tangent hyperplane from the gradient of the function.
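
Stated in the usual vector form (a standard restatement of the supporting-hyperplane picture, not a quote from Fleming or Rockafellar): g is a subgradient of the convex function f at x provided

  f(y) \ge f(x) + g^T (y - x) \quad \text{for all } y \in R^n
  % The corresponding supporting hyperplane of the epigraph at (x, f(x)) is the graph of
  % the affine function y |--> f(x) + g^T (y - x); it is unique exactly when f is
  % differentiable at x, and then g is the gradient.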

In R^3, a cube is a convex set. Each of its sides is part of a supporting hyperplane. For a point on the boundary of the cube that is not on an edge, the supporting hyperplane at that point is unique. The corners and edges of the cube also have supporting hyperplanes, but they are not unique.

In R^3 a sphere is convex. Then it is also the intersection of its supporting hyperplanes. At each point on the boundary of the sphere, the supporting hyperplane is unique.

Similarly for epigraphs.

So, the function f: R*n --> R where for each x f(x) = 0 is convex, concave, and linear, and each x is both a maximum and a minimum of f. So, the minimum of a convex function need not be unique. In this case, the epigraph is just a closed half space.

"Look, Ma! No calculus!" And no measure theory.

Exercise: Derive the regression normal equations via perpendicular projections and without calculus.

Exercise: Argue the role of perpendicular projections in the minimization in regression.


my only goal in mentioning the statistics sequence at all was to give a familiar example where the standard sequences vary in depth depending on audience. a trivial point, yes, but i wanted to be concrete because it's the internet. apparently that was a terrible choice, as it was far too close to the topic at hand; my apologies for making you search so hard for a connection.

i made my request because it's jarring for me as a reader when you punctuate your (often delightful) expository writing with conclusions about entire fields and large organizations that seem (on the face of it) to be justified by old and/or very limited data.

but that's a selfish request, and you are of course free to tell me to get lost and post whatever you want (and i'll still read it); i'm certainly not going to pursue this further, aside from the apology and clarifying comment above.


You do understand that for all the talk and new terminology and claims of "learning" in the "machine learning" (ML) in the Google OP, what is in the OP is a poor introduction to some highly polished material in "regression analysis" in 50 year old books. So, the ML stuff is adulterated old wine in new bottles with new labels. That is essentially intellectual theft and corruption, and without references essentially academic plagiarism. You should be offended.

If they are going to plagiarize, even just teach, regression analysis, then at least they shouldn't make a mess out of it, and a mess is what they made. Google should "get that MESS OFF the Internet".

Students should be told the truth: Regression is powerful stuff. Sometimes the results can be valuable. The Google OP is an introduction to regression and does have some value. But the Google material is a MESS, and students should be informed that they are getting really low quality material and should see some references to some beautifully polished material.

So, I helped any students who would be the target audience for the Google OP.

You should know this; I believe you do.

I'm offended by the mess and passing that out to students trying to learn. You should also be offended.


Google dumbs everything down because they think everyone is dumb. I have learned to avoid their documentation and attempts to teach the populace.


Correction of a typo:

> So, the function f: R*n --> R where for each x f(x)

should read

So, the function f: R^n --> R where for each x f(x)

Excuse: Just now I'm using a keyboard on a laptop, and I'm not used to the keyboard yet.


http://p.migdal.pl/2017/04/30/teaching-deep-learning.html -> "What mathematicians think I do"

(Full disclaimer - I did theoretical physics, so I understand both sides. :))


You should recheck your definitions on convexity.

>function f: R^n --> R where for all x in R^n f(x) = 0

This hyperplane is not convex. A convex curve by definition can not be equal to its tangent at any point.

Edit: I should specify, I mean a convex curve cannot be completely equal to any of its tangents, obviously it will equal each tangent at a single point.


You don't want to consider just "tangents" and, instead, consider what I defined as supporting hyperplanes of the epigraph and subgradients of the function. If the gradient exists, that is, if the function is differentiable, then the subgradient really is a tangent. Otherwise can have many different subgradients supporting at one point on the curve and its epigraph.

It's simple: A cube has supporting planes at each point that is an edge or corner, but those points do not have tangents.


It sounds like you are describing curves that are strictly convex. Curves that are convex, but not strictly convex, can intersect their tangents at more than one point, or even at every point.

I'm going by the definition of convex function given in Rudin's "Principles of Mathematical Analysis", Apostol's "Calculus", Wikipedia, and MathWorld.


Fair enough. I suppose pointing out that the authors merely omitted "strictly" wouldn't have served GP's point as well.


Strictly convex need have no role in this Google ML material. Just convex is enough.


> This hyperplane is not convex. A convex curve by definition can not be equal to its tangent at any point.

No, my math is fully correct, and your claim is wrong.

For a lecture Convexity 101, see my

https://news.ycombinator.com/item?id=16498564


Is there something like this for Java programmers?


ML concepts are independent of programming language. I suggest not letting language preferences stand in the way of listening to the lectures and understanding the subject.

For implementing exercises using Java, you have a bunch of good options:

1) The most direct equivalent to Pandas+TensorFlow I can think of is DL4J. They have a good, comprehensive set of concept and implementation tutorials [1].

2) The TF APIs have a Java port and can be used from Java desktop and console applications [2]. So a second but slightly more difficult option is using the TF Java port + Spark APIs.

[1]: https://deeplearning4j.org/documentation

[2]: https://www.tensorflow.org/install/install_java


I think Weka gets used in the java world. https://www.cs.waikato.ac.nz/ml/weka/


Which languages and libraries does it use? Is it based on Google libs, like TensorFlow?


Clicking through to the homepage, the full title is: "Machine Learning Crash Course with TensorFlow APIs".



