Why is machine learning ‘hard’? (stanford.edu)
327 points by Dawny33 on Nov 12, 2016 | 88 comments



I think the problem is that we don't really understand ML properly yet.

Like picking hyperparameters - time and time again I've asked experts/trainers/colleagues: "How do I know what type of model to use? How many layers? How many nodes per layer? Dropout or not?" etc., and the answer is always along the lines of "just try a load of stuff and pick the one that works best".

To me, that feels weird and worrying. It's as if we don't yet understand ML well enough to definitively say, for a given data set, what sort of model we'll need.

This can lead us down the debugging black hole TFA talks about, since we appear to have zero clue about why we chose something, so debugging might ultimately just be "oops - we chose 3 layers of 10, 15, and 11 nodes, instead of 3 layers of 10, 15 and 12 nodes! D'oh! Let's start training again!"

It really grates on me to think about this, considering how much maths and how many proofs and algorithms get thrown at you when being taught ML, only to be told, when it comes to actually doing something, that it's all down to "intuition" (guessing).

And yeah as others have said - data :-)


Good rule of thumb: always add dropout. It's very hard to lower performance that way.

Yes, we have to try a load of stuff, especially in deep learning, where the old feature engineering seems to be replaced with architecture engineering [1].

Yet, the stuff to try is usually well known [2]. You should not be trying stuff completely at random (unless you do a random search for hyperparameters [3]).
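
For illustration, a minimal random-search sketch with scikit-learn; the estimator and parameter ranges here are arbitrary placeholders, not recommendations:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    search = RandomizedSearchCV(
        RandomForestClassifier(),
        param_distributions={
            "n_estimators": list(range(50, 500, 50)),
            "max_depth": list(range(2, 20)),
        },
        n_iter=20,  # random configurations to try
        cv=5,       # 5-fold cross-validation per configuration
    )
    # search.fit(X_train, y_train); print(search.best_params_)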

> we chose 3 layers of 10, 15, and 11 nodes, instead of 3 layers of 10, 15 and 12 nodes!

Hinton advises starting with a wide net, ensuring that the net is able to learn the function, then adding dropout, smaller layers, and/or regularization to reduce overfitting. This should avoid scenarios like the above.
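
A rough sketch of that recipe in Keras; the layer sizes, dropout rate, and input/output dimensions are placeholder assumptions:

    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    model = Sequential([
        Dense(512, activation='relu', input_dim=100),  # start wide
        Dropout(0.5),      # added once the net is shown to fit the data
        Dense(256, activation='relu'),                 # then try smaller layers
        Dropout(0.5),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')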

Wanting the perfect model for a specific problem to be mathematically calculated may be an unrealistic demand. If we view machine learning as compression (using the shortest program possible to describe a function/distribution), then per Kolmogorov complexity we can't compute this shortest program, and if we happen to find it, we can't know that we did. The maths is for theory; intuition is for practical application. See intuition as guided guessing rather than random guessing: with experience you should know the flaws and benefits of the different models and architectures for particular problems, and you can limit your search to sane, narrow ranges.

[1] https://smerity.com/articles/2016/architectures_are_the_new_...

[2] Finally, suppose you want to train an LDNN. Rumor has it that it’s very difficult to do so, that it is “black magic” that requires years of experience. And while it is true that experience helps quite a bit, the amount of “trickery” is surprisingly limited ---- one needs be on the lookout for only a small number well-known pitfalls. - http://yyue.blogspot.com.br/2015/01/a-brief-overview-of-deep...

[3] http://www.jmlr.org/papers/v13/bergstra12a.html https://people.eecs.berkeley.edu/~kjamieson/hyperband.html


When I was taking the MIT analog circuits class, they were showing the differential equations for RLC circuits, and the teacher explained that these were figured out via "guesswork". Apparently these equations are not solvable using simple step-by-step transformations. If you plug them into Mathematica, it won't know how to solve them. Some really smart mathematician figured them out intuitively.

This is what machine learning feels like to me. We just hook this stuff together, and some really smart guy with a fractional Ramanujan worth of natural intuition says it will work, and it does, especially for the really complex image recognition models. Have you guys looked at Google's "Inception" architecture? It's a huge Rube Goldberg machine of many, many layers, and it does work, but there's not a lot of reasoning about why it was designed the way it was.


Well, generally, we know why certain models work on certain types of data.

But picking a proper model and hyperparameters requires some insight/knowledge about the properties of the data itself, which is the whole purpose of doing ML experiments in the first place (to gain knowledge/insight about the data)!

However, I do agree that we don't fully understand the area yet, or the reach of its applications.


> just try a load of stuff and pick the one that works best

Well, yeah, that's easier to say than "read this entire book to understand how machine learning models work". The truth is that different models work with different success for different kinds of data.

How many predictors do I have? How many observations vs how many predictors? How much data overall? Do I care about interpretability? Am I doing classification or regression? If doing regression, what is my tolerance for error? How much time am I willing to spend training and predicting? If doing classification, are my classes balanced or imbalanced? How are the classes spread? Do I care more about overall errors or false positives / false negatives (think of a cancer screening test - you might rather have false positives than miss something with false negatives).

All of these factors, and more, come into play when selecting a model. There are other bits too.

Selecting variables: including unneeded or highly correlated variables can actually worsen the result.

Feature engineering: it turns out you might be able to process your variables in such a way that the algorithm picks up on the details you need much more easily.

Cross validation: you don't want to "overfit" your training data, i.e. get a model so specific to your training data that when it encounters actual data points it has worse accuracy, because it's not general enough.

Hyperparameter tuning: a lot of these models have tuning parameters whose values are hard to know from the get-go; you have to try a bunch and look at the response curve of how the accuracy changes.
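
To make the cross-validation point concrete, a minimal scikit-learn sketch on synthetic data (the SVC choice is arbitrary):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    scores = cross_val_score(SVC(C=1.0), X, y, cv=5)  # 5 held-out estimates
    print(scores.mean(), scores.std())  # generalization estimate, not training fit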

So yeah, machine learning is not magic. Turns out there are different tools for different problems. We do have some models like Random Forests and SVMs that work fairly well out of the box on a wide variety of problems, and some kinds of neural networks also do well but often need more data and processing time to get decent results. It's all a tradeoff :)


To be clear, your complaints only describe a subset of ML - neural network approaches in particular.


> the answer is always along the lines of "just try a load of stuff and pick the one that works best"

Forgive the naive question, but why couldn't ML figure out its own best "stuff"?


Because then you've got to figure out how to tell the machine what "best" means, and that's what you were setting out to do in the first place.


This is pretty much what boosting is. Although we can ask the same question of the boosting algorithm, I guess.
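
In scikit-learn terms, a toy sketch (synthetic data; note that the boosting algorithm's own knobs were still our guess):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    clf = AdaBoostClassifier(n_estimators=100).fit(X, y)  # boosting picks and weights the weak learners
    print(clf.score(X, y))  # ...but n_estimators was still our choice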


Another complication, especially for newcomers, is the boundary at which to use certain algorithms. Questions like "Can/should I just use an SVM instead of deep learning?" are also subject to 'guessing'. Mind you, it's this explorative process that makes ML so intriguing too.


mattlondon> "And the answer is always along the lines of "just try a load of stuff and pick the one that works best"."

Sounds to me like an opportunity to apply automation - Monte Carlo is your friend and machine cycles are cheap!
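
Something like this bare-bones Monte Carlo loop; train_and_score is a dummy stand-in for a real train/validate routine, and the ranges are illustrative guesses:

    import random

    def train_and_score(cfg):
        """Placeholder: train a model with cfg, return validation accuracy."""
        return random.random()  # dummy so the sketch runs end to end

    def sample_config():
        return {
            "layers": random.randint(1, 4),
            "units": random.choice([32, 64, 128, 256]),
            "dropout": random.uniform(0.0, 0.5),
            "lr": 10 ** random.uniform(-4, -1),  # log-uniform learning rate
        }

    best = max((sample_config() for _ in range(50)), key=train_and_score)
    print(best)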

Most software development and ML development are diametrically opposed. That's why we have statisticians.

Most software developers like the control that programming gives them - that's why they are programmers. They prefer developing systems that can be programmed (trained) quickly, consistently and with rapid feedback using computer languages.

ML in contrast, is sort of like the claw crane arcade game at Walmart, you remember, the one where you insert 50 cents and then get to "control" the movement of a scoop for 10 seconds and then it drops onto a pile of possible prizes and, when you raise the crane, it drops the prize back onto the pile.

https://en.wikipedia.org/wiki/Claw_crane

If you're very lucky, you get a stuffed bunny or duck for your kid. Usually your kid goes away disappointed. There's something there, but it's just beyond your ability to find it in 5 minutes. That's what ML looks and feels like, but 50X worse. The ML training software has 50-100 knobs (you get to pick!) and the object of your desire lives in a multi-dimensional hyperspace. Find out how to use the knobs to reach your boss's goal. This is IMO not a task for a software developer. And you don't even get a fuzzy bunny for your efforts.

Yes, the claw cranes are rigged:

http://kotaku.com/5929888/why-yes-those-claw-machines-are-ri...

And if you think that's bad, consider that much of AI-related ML is like trying to succeed with a claw crane designed by God (or maybe the Devil). It pays to choose your challenges carefully.


I think to some extent, this kind of difficulty occurs with any numerical programming. With typical software engineering the bugs are usually in logic. All but the hardest problems can be diagnosed with a visual debugger. With numerical code, though, you usually can't look at the state of a few discrete-valued variables and a stack trace to figure out what went wrong. "Why is this matrix singular?" can take days to answer. You spend a lot of time staring at the code, comparing it to the math on paper, trying to visualize high-dimensional intermediate data, etc. Continuous math can be a lot harder to reason about than discrete.
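
One diagnostic that sometimes shortens those days: check the condition number and rank before blaming the solver. A numpy sketch with a deliberately injected dependent column:

    import numpy as np

    A = np.random.rand(100, 100)
    A[:, 1] = A[:, 0]                # inject a linearly dependent column
    print(np.linalg.cond(A))         # enormous value flags near-singularity
    print(np.linalg.matrix_rank(A))  # rank 99 < 100 pinpoints the dependence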


It's even worse than that.

Talking from personal experience in the private sector:

You go through the whole process and present your results to a customer. They make the required changes, see a few percent improvement in the bottom line, and are happy. A year later you pick up the same code for another project and discover a small error in the data collection script that completely invalidates everything.

Garbage-in, garbage-out errors can have an internal consistency that survives cross-validation and produces similar results with many different algorithms and models. Random changes in the real world can produce actual gains.

Standard machine learning datasets rarely suffer from this problem because the results and the semantics of the problem are known beforehand. If you have an original black-box problem, it's possible to do random search and improve by accident.


tl;dr: Agreed, but the problems and models ML folks play with are more complex than most physics-like systems.

Having spent a lot of time doing Monte Carlo rendering, I can say that "why is this pixel messed up" can be a painful affair (and that domain is at least visual!). Bad input geometry, NaNs or Infs because you forgot to handle some 1/r^2 term, etc. occur in all numerical computing.
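
A cheap guard rail for that class of bug, sketched in numpy: assert finiteness at stage boundaries so the blow-up fails loudly near its source:

    import numpy as np

    def checked(x, name):
        # fail near the source instead of ten stages downstream
        if not np.all(np.isfinite(x)):
            raise ValueError("non-finite values in %s" % name)
        return x

    r2 = np.array([1.0, 0.0, 4.0])          # a zero radius sneaks in
    falloff = checked(1.0 / r2, "falloff")  # raises: the division produced inf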

Some of this is actually obviated in TensorFlow the same way it is in much of numerical computing: frameworks that have been battle hardened through years of bug fixes. This lets you at least focus on "Did I describe my model correctly?" rather than "Did I implement gradient descent correctly too?".

Again, rendering (and all physical simulation) has the benefit of producing a picture and the inputs are effectively "physical". The challenge for machine learning is that you're trying to optimize for figuring out the underlying model rather than relying on the laws of physics for your problem. While this is just as true in any numerical modeling problem ("Should this be a 4th degree polynomial or just cubic?"), the models that ML folks put together are effectively quite complex and the input datasets are both very large and usually unstructured. The visualization of the network being trained in the TF Playground (http://playground.tensorflow.org) helps with the model but not the original space.


This was it for me. I don't have a CS background, but with a PhD in electronics/semiconductors I can usually brute-force my way through things and get a good grasp.

I got halfway through Andrew Ng's ML course and felt I fully understood the concepts, but I spent all of my time battling with matrices and the foundational mathematics rather than building on my ML understanding. If I had a better grasp of the maths, I think I'd get on better with it.


So there is a greedy strategy for approaching problems with ML:

1. Start with a VANILLA model, a proven one, to establish a baseline you can fall back on. For example, in deep learning, start with fully-connected nets, then a vanilla CNN, adding BN and ReLUs, then residual connections, etc. (see the sketch after this list).

2. Do not spend too much time tuning hyperparameters, especially in deep learning: once you change your algorithm, a.k.a. the network structure, everything changes.

3. Add complexity as you go. Once you have established a solid baseline, you can start adding fancier ideas to your stack, and you will find that they are improvements over already-working ideas and not that hard to add.

4. One important reminder: as you change your algorithm over time, those changes might not play well with each other. So reduction is also very important: rethink your approach from time to time and take away stuff that doesn't fit anymore.

5. Look at your data. Garbage in, garbage out: it cannot be more true. Really, look at your data, or maybe a sample of it, and see whether you, as the most intelligent being yet, can make sense of it or not. If you cannot, then you probably need to improve its quality.
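
A sketch of step 1 in Keras, to make "vanilla baseline" concrete; the shapes and sizes are placeholder assumptions:

    from keras.models import Sequential
    from keras.layers import Dense, Flatten

    baseline = Sequential([
        Flatten(input_shape=(28, 28)),  # e.g. MNIST-sized inputs
        Dense(128, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    baseline.compile(optimizer='sgd', loss='categorical_crossentropy',
                     metrics=['accuracy'])
    # Record this model's score first; every fancier idea must beat it.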

Anyway, ML is a very complex field and developing like crazy, but I don't feel the methodology for tackling it is any different from that for any other complex problem. It is an iterative process, building from simple, proven solutions to something greater, piece by piece. Watch and think, then improve.


There are also lots of architectural problems with machine learning components to consider, beautifully summarized in "Machine Learning: The High Interest Credit Card of Technical Debt" by Sculley et al. http://research.google.com/pubs/pub43146.html


I actually think Machine Learning is relatively easy. There are a lot of resources, the community is very open, state-of-the-art tools are available, and all it needs to get incrementally better is trying out more stuff on different data sets.

I worked in SEO before, which had far more elements of "black magic". Perhaps SEO helps with the transition to ML, because you are basically reverse engineering a model (Google's search engine) / crafting input to get a higher ranked output. It's feature engineering, experimentation, and debugging all-in-one.

And front-end development of the old days... debugging old javascript or IE6 render bugs makes ML debugging pale in comparison. You had to make a broken model work, without being able to repair it.

As for the long debugging cycles in ML: John Langford coined "sub-linear debugging" - output enough intermediate information to quickly know whether you introduced a major bug or hit upon a significant improvement [1]. Machine learning competitions are won not so much by skill as by teams iterating faster and more efficiently: those who try more (failed) experiments hit upon more successful experiments. No neural net researcher should let all nets finish training before drawing conclusions/estimates about the learning process.
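
A toy version of that idea, with train_step as a dummy standing in for a real minibatch update:

    import random

    def train_step():
        """Dummy stand-in for one real minibatch update; returns its loss."""
        return random.uniform(0.5, 2.0)

    smoothed = None
    for step in range(1, 2001):
        loss = train_step()
        smoothed = loss if smoothed is None else 0.99 * smoothed + 0.01 * loss
        if step % 200 == 0:
            print("step %d  smoothed loss %.3f" % (step, smoothed))
            # if this number isn't falling by now, abort the run and debug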

Sure, the ML field is relatively new, and computer programming has a longer history of proper debugging and testing. It is difficult to do monitoring on feedback-looped models running in production, yet no more difficult than control theory ;). And proper practices are being developed as we speak [2]. The author will probably write a randomization script to avoid malordered samples automatically in the future.
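
That randomization script can be as small as this numpy snippet (toy arrays for illustration):

    import numpy as np

    X = np.arange(12).reshape(6, 2)   # toy features
    y = np.array([0, 0, 0, 1, 1, 1])  # toy labels, sorted by class (the bug)
    perm = np.random.permutation(len(X))
    X, y = X[perm], y[perm]           # minibatches now see mixed classes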

[1] http://www.machinedlearnings.com/2013/06/productivity-is-abo...

[2] http://research.google.com/pubs/pub43146.html


The long and convoluted debugging cycle for machine learning really hurts my faith in ML models. This issue - with some practical advice - was the center of this interview (disclaimer: author). https://shapescience.xyz/blog/interview-data-science-methodo...

I'm convinced we lack decent tools for ML debugging: what could they be?


I think once you need to go deep enough into a topic, they all get hard.

Debugging and testing are also hard in all things that are somehow related to realtime or concurrency, e.g. OS development, embedded firmware, network stacks, etc. For these things you often also need to know about maths, physics, statistics, electronics, and hardware and software architecture.

Game engine development is also hard, because you need to know about most of this stuff to really find the most efficient solutions.


That's right: machine learning requires knowledge of so many fields that if any problem occurs, the developer has to do many checks to find the problem and optimize it.


Exactly why I despise "ML Crash course" and "Learn ML in x hours" kind of courses.


It gets worse. There's a YouTube channel with uploads like:

Build a Neural Net in 4 Minutes

Build an Antivirus in 5 Min

Build a Self Driving Car in 5 Min


Have you watched any of them? I just watched "Build a Self Driving Car in 5 Min" and the content was good. My only complaint is that his demo used end-to-end learning, which isn't how most self-driving cars actually work (but he acknowledges this).



Cue a link to the essay, "Teach Yourself Programming In Ten Years".


One big dimension here: the "implementation error" can easily be debugged. Gradients can be checked numerically. The model can be checked by looking at the optimality conditions (not just whether the loss function goes down). This shouldn't be an issue for anyone from a traditional coding background.
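
A minimal numpy sketch of such a numerical gradient check, on a toy loss:

    import numpy as np

    def loss(w):
        return np.sum(w ** 4)  # toy loss with a known analytic gradient

    def grad(w):
        return 4 * w ** 3      # the gradient we want to verify

    w = np.random.randn(5)
    eps = 1e-5
    numeric = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                        for e in np.eye(len(w))])
    assert np.allclose(numeric, grad(w), rtol=1e-4), "gradient bug"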


Why would you expect it to be easy?


I've met some people who believe that the more modern tools (such as TensorFlow) magically require little human input and make it so that you do not need to know/understand the mathematics and statistics. Not sure where they get this idea.

Everyone wants to do machine learning, but nobody seems to want to learn statistics.


I am this person.

I think it's because programming did not require me to learn maths (OK, I did learn how to multiply matrices in high school, but I never used it in my life). So my expectation is the same with ML.

Those platforms are abstractions, so I do not really care how they are implemented, the same way I have no idea how JS really implements objects or sorting. I did one of those crash courses on a certain platform, and while there was some stats, I totally did without it. I could probably build a classifier that, instead of classifying images of dresses, would classify pillows, curtains or cars. But I did not feel like I learned anything.


+1 to this.

I don't think you need to know (or at least should not need to know) much stats at all to use pre-built libraries like TensorFlow.

It feels to me that a lot of the ML courses around concentrate almost entirely on the stats & maths side of ML, though. This strikes me as a bit of mental masturbation.

To teach people how to program from zero knowledge, we don't first teach them how modern compilers or the JVM work and how they do their complex optimisations and JIT etc. Why are we teaching people coming to ML with zero knowledge the absolute raw nuts and bolts of the maths involved (complete with all of the mathematical proofs that something works)?

Sure, eventually it would be useful to know what is going on with the maths, just as with programming it can eventually be useful to know what the compiler/JVM is really doing, but a LOT of productive stuff can be done while blissfully ignorant of what TensorFlow/the JVM is doing.

ML is easy, but the courses are often too aloof and strike me as academically focused on the maths purely for the sake of the maths itself, rather than on what ML can do. ML is not hard - any programmer can understand it, but the maths is off-putting to programmers who are not mathematicians (the majority, I'd say).


> Everyone wants to do machine learning, but nobody seems to want to learn statistics.

i think there are a fair number of people who want to learn the stats .. even some who want to learn the analysis.

it appears that some things that can be phrased in terms of iterative numerical computation can be difficult because there are probably some properties of the limiting behavior of those computations that can't be learned because they've yet to be discovered.

nonetheless, i (maybe?) get what the parent post is generally saying -- as someone who knows nothing about tensorflow, i wonder if tensorflow users are generally interested in flatness, etc., which (exact sequences of tensor products) is the only guess that i've made about what a portmanteau of "tensor" and "flow" uses as a conceptual model.

i wonder if the difficulty of 'machine learning' is that people tend to approach it as its own thing with its own special, entirely separate bag of tricks. certainly there will be some tricks unique to these iterative statistical techniques.

----

however, i don't think the original article gives enough due deference to the actual workaday difficulties, challenges, and [non-monetary] rewards of software development in industry: if ML, ANNs, etc. are, as some say, essentially "computer psychology," then being productive with a team of developers to ship a business product is pedal-to-the-metal human psychology.


How would you recommend learning ML? I've been interested but don't know where to start.


I put together a list of resources I found useful:

https://news.ycombinator.com/item?id=12900448


Not who you asked, but:

Statistics in Plain English.

Regression Analysis by Gilman

Elements of Statistical Learning.

You need to throw some matrix algebra and calc 1 & 2 somewhere in between, certainly before ESL. You can't simply read the books and go through the examples: you will be stuck on a concept on many occasions, and you will battle it out until, after much googling and reading of additional papers, you finally get it.

After those 3 books, you've got the basics.


Fuller title on "Regression Analysis by Gilman"? I cannot find such a title author combination online.

Perhaps?: "Data Analysis Using Regression and Multilevel/Hierarchical Models" by Andrew Gelman & Jennifer Hill


Huh, I typo-ed his name and fudged the name in my head. Sorry about that.

That's the book.


Depending on your situation this advice may be useless, but I'd say that for me personally, taking an intro to machine learning class at my university was a wonderful and easy way to learn the basics. So I'd recommend taking a university class, in person or online.


This is the best answer. In the end nothing is better than a degree in the field.


Even if you develop an "intuition" for known tasks (like classification), there are so many problems that are not tackled yet and for which no one has any "intuition". Common sense very often doesn't work there (in high-dimensional spaces ;)).

For example, I only recently stumbled upon the "Explaining and harnessing adversarial examples" article - and it completely changed my perception of my current work in computer vision.


I think it is hard because of https://en.wikipedia.org/wiki/No_free_lunch_theorem

It follows that there is no single "good" algorithm, and you need to have and exploit domain knowledge in order to succeed.


> Machine learning often boils down to the art of developing an intuition for where something went wrong (or could work better) when there are many dimensions of things that could go wrong (or work better).

I'm not a practitioner, but I always thought this was the main challenge. Uses of ML are rarely "right" or "wrong" per se, but they rely on intuition to get a model that "works" in a practical sense.

There is no royal way to machine learning: you can't decide you are going to make an algorithm that detects bad comments (as determined by human consensus) and then just go make an implementation that you can reason out to be correct, the way you could prove a graph algorithm correct. Trial-and-error and hard-to-transcribe intuition are baked into the process.

(I'd love to get some insider insight on this comment!)


I will be a little sarcastic: given that we use sub-optimal/locally-optimal algorithms everywhere in ML due to time complexity, why would you expect them to bring nice/predictable results? It's more like a miracle if you find something that works; otherwise you will be hitting the usual hard problems from optimization and end up in a catch-as-catch-can situation where even Monte Carlo randomness is a good guess. And way too many people assume ML is just applied statistics, always keeping their minds in this frame and missing out on the large-data ML capabilities where statistics is irrelevant and you can directly ask and find answers to many fundamental questions in your dataset.


The diagrams are pretty misleading. One can craft a space with any number of arbitrary dimensions, but that's kind of meaningless until the space is populated with data. Certainly the likelihood of a bug is not uniformly distributed across the space, and certainly the density of bugs within a space varies greatly depending on the problem. I imagine the average kernel developer's 2-D space is both very dense and has greater spread than the 4-D space of many ML engineers.


In software you can trace the program and detect which instruction is not doing what it is supposed to do. In machine learning there is no program to trace; it is not a set of instructions with a purpose, and the whole thing either works or it doesn't. To discover what could be failing, you need deep knowledge of a lot of stuff (maths, statistics, CS) to figure out what is wrong. And sometimes the answer is that the problem doesn't have a solution.


It is very much possible to track your calculations and quite literally debug your models. But like in computer science, it is hard to find the data scientist who can actually do that and not just copy-paste tutorial code from some blog post and then wonder what isn't working...


People don't use it simply because it's not what they signed up for. It's not obvious that a software engineer will enjoy being a data scientist. I for one think it's tedious, and I don't enjoy spending the time and effort collecting the necessary data to solve my problems the ML way.


Generally speaking, being hot in the media does not help: walking the walk is way harder than talking the talk.


because of preprocessing, and needing to choose the functions that'll do the approximation - the process itself is semi-automatic, not fully automatic. an ANN's inner nodes are specific functions that need parameter tweaking (after choosing the right ones, that is); support vector machines have different kinds for different data, etc.

and those two things are very domain specific so you need to do a lot of homework first, and debugging later.


I've worked in computational linguistics:

1) It's 'hard' because you need a lot of 'training data' in order to train models etc. It's hard to get.

2) 'AI' type interfaces represent a whole new kind of UI challenge. For 'predictive typing' for example, you can optimize an algorithm so that it does better for 90% of the US population, but then it gets 'worse' for the remaining 10%. So it's a paradox. This can have weird effects.

For example, if you have an app in the app-store, you may leave the settings so that it's 'broadly optimal'. You get ok stars.

If you then make it 'better' for those 90%, you might get a little boost in ratings, but you get 1 and 0 star ratings from the 10% for whom it's a sub-par experience. This can destroy your product.

Anyhow - 'there is no right answer' often in AI, and setting expectations can be extremely difficult.

And all of that has nothing even to do with CS.


That reminds me of Stephen Hawking's computer. His old computer gave deterministic dictionary predictions, and after some time he had memorized all the predictions that came up, so he could be much quicker: he knew exactly how much of the word to type before selecting the proper word - a purely mechanical process that didn't require feedback anymore. However, when they gave him a new computer, it had adaptive predictions (I think), and Hawking could not memorize them anymore. Then he always had to type letter by letter and inspect the predictions before he could select one - a huge step down.

Stuff like that is absolutely crucial but is often forgotten by engineers.


Seconded, and yet it's madness to me that engineers don't pick up on this. I can't be the only one who's let someone use my computer for basic internet browsing and then been deeply irritated when their newly visited sites break my intuitive knowledge of how many characters I need to type in the URL bar before hitting enter...


"And all of that has nothing even to do with CS." -> therein lies the kicker! Consumer level (including sass) apps require very real user and business tradeoffs. That tradeoff is rarely discussed, which i honestly find very curious in and of itself.


That is about it. One only needs to add that actual data quality matters, too: I often get a ton of junk to work with, which is partially useless. And the difficulty isn't just "algebraic debugging", but embedding the whole pipeline in a way that won't blow up or grind to a halt when it's used in a production environment, especially during peak loads when "everybody is looking" or when a new semantic event type happens.


I appreciate that you've given a pretty top-line overview, but in the 90/10 example, if that 10% can be characterised and/or clustered, can the algorithm be optimised for both groups involved? I appreciate that that's not always possible and can lead to lots of engineering overhead - curious what your thoughts are, though...


It's a very good point.

Yes, often it is possible to determine where the user belongs in that 90/10 setting, but it can take a lot of time in order to be 'pretty sure'. You need a lot of 'user interaction' in order to make that assessment.

The 90/10 rule can broadly apply to things like culture: certain Latino Americans speak/write very differently. A lot of 'le' and 'la' (gendered) in there as well as a whole different set of proper names and colloquialisms.

But it can take some time to really establish if someone is 'latino' from their writing.

Even harder: some people type more precisely, some people type more loosely. You can actually adjust the probability spectrum of a predictive keyboard to match someone's style. But get this: people's style changes all the time! I noticed that when I'm tired, I type like I'm drunk. Or if I'm busy etc.. So there's even variation in style that makes it difficult.

It's a really hard thing to do.


I should add:

You can hit 'massively decreasing returns to complexity' in these domains.

Meaning that you can do 'pretty good' with some basic algorithms.

For the next 'bump in performance' you need some complex code.

After that - you really start to have 10x larger models, or crazy complex engineering just to move the needle.

It creates a completely different set of 'Product Management' rules. It's kind of fun, unless you're a struggling startup trying to figure this out on the fly :)

Usually, someone comes along with a new approach which changes the game.

As I understand it, 'Neural Networks', i.e. 'Deep Learning'-style AI, has changed everything voice-related quite a lot.

And also - different business approaches can change the game. Google has access to zillions of properly transcribed audio phrases. This is the 'golden asset' that can underpin a really great voice recognition engine. Google voice is even better than the old industry standard - Nuance - in many scenarios, and my hunch is that it's the size of their training data that has given them an edge - at least that.


>You can hit 'massively decreasing returns to complexity' in these domains.

This is a really concise expression of the sentiments you've just laid out, one I've been looking for, so thanks for that!

Really like your insight in Google, think it's spot on.

Re. 'Product Management' rules - would love to know more about this? Do you keep a blog?


Yeah, I was thinking the same thing. If you can tell which group a particular user belongs to then you can train two models and optimize them independently. Then you just select the most appropriate model for the user.


Practically, this almost never works out. The 10% cluster is using a very small dataset and will produce inferior results. If you train a model based on only 10 people, you're prone to overfitting that small sample.


> 1) It's 'hard' because you need a lot of 'training data' in order to train models etc. It's hard to get.

Also, I'd imagine that the data could be bad/incomplete, e.g. data was collected in an inconsistent manner or in the wrong areas, leading to an incorrect solution that fits the data, but doesn't solve the problem.

This is the biggest concern I have in using the data that we've collected to come up with a solution using ML: no one ever intended for the data to be used for the purpose for which I would use it, and it is incomplete or incorrect.

However, I think the chance of good things coming from inadequate data outweighs the cost of not trying to make use of it.


Yup.

You HN lads are smart, you're pretty quick to figure out all the 'next problems' that one would encounter.

Yes - getting the right training data can be surprisingly hard.

Do you know how hard it is to get a 'very official' large set of words for a given language? It's hard!

There is no entity that really decides what a language is - so you have to kind of determine it from what people write. But that takes a lot of writing, and frankly, you're making assumptions all the time there.

France has a body that's 'in charge' of their language so to speak, and most Western nations have entities that are 'roughly' that. Beyond the West, Japan and China ... it's a gong show.

'Filipino' is barely a language - even though many millions of people speak it, it varies in dialect from village to village and they barely resemble each other.

I think that someone will eventually come up with a 'probabilistic' OS because in the real world, nothing is certain ... some things are just more likely than others!


An official set of words is only useful if your NLP task is restricted to items that themselves are restricted in their language use. Twitter and SMS data sets are interesting because they represent something closer to casual speech rather than formal writing.

The French Academy provides an official dictionary and language usage, but speakers hardly restrict themselves to its contents.

Filipino mostly refers to the Manila dialect of Tagalog, whereas Tagalog is a language with many dialects existing in the Philippines. There are lots of languages in the Philippines but as far as I know they aren't referred to as Filipino.

For a lot of NLP problems you will probably have to make your own data set. It can be a lot of work.


Beautifully written and insightful.

>>After much trial and error I eventually learned that this is often the case of a training set that has not been correctly randomized and is a problem when you are using stochastic gradient algorithms that process the data in small batches.

Take this single term from the above sentence: "stochastic gradient algorithms". It represents three key areas: statistics, calculus, and CS.

These three things are highly complex even when studied in isolation. For ML, you must be able to juggle these 3 fireballs effectively. No surprise that it's much, much more difficult than many other software engineering problems.
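
To make that concrete, here is about the smallest example there is, with each ingredient labeled (a toy numpy sketch, not anyone's production code):

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 3)
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.1 * rng.randn(1000)  # statistics: noisy observations

    w = np.zeros(3)
    for _ in range(5000):
        i = rng.randint(len(X))              # stochastic: one random sample
        g = 2 * (X[i] @ w - y[i]) * X[i]     # calculus: d/dw of (x.w - y)^2
        w -= 0.01 * g                        # algorithm: the iterative update
    print(w)                                 # lands near true_w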


No offense intended, but what is your background?

I'm a researcher with a physics/stats PhD, and if a colleague approached me and said "stochastic gradient algorithms" entails three highly complex areas of scientific knowledge, I would have been stunned and assumed an undergrad with an English major had stumbled into our lab.

Just because you find something extremely challenging, doesn't mean it is inherently challenging. Considering what a lot of people in my field is struggling with, your example is absolutely trivial. You might want to adjust your ego downwards a bit.


No offense taken. I feel humbled. Indeed, your argument supports my view.

I have deep and very high regard for the people who are able to apply ML to fields like DNA analysis or NLP which can take the dreaded "Turing test".

I stand nowhere in the ML arena, but I tried once and got a good shock: how hellishly difficult ML can get, and how quickly. I really feel humbled. If anything, I learnt to appreciate the breadth and depth of human brain capabilities. It seems entirely magical to me now how on earth my brain processes/understands things as complex as this very paragraph. Without some exposure to ML, I couldn't have appreciated this.

>>Just because you find something extremely challenging, doesn't mean it is inherently challenging.

Agreed. I never claimed it anyway. But for the kinds of problems to which ML is being applied, the state-of-the-art "analyzable algorithms" (like finding approximate near-optimal solutions for TSP) are far from trivial. In addition, we must realize that an ML solution must "beat" these algorithms hands-down in "non-trivial" cases. All this makes ML extremely difficult.

I agree that for real-world (and not necessarily state-of-the-art) ML applications, you have to handle many more fields in addition to these three. All I am saying is that even these three things, taken together, are very complex to handle.

edit: typo


I know a simple upvote may (and perhaps ought to) suffice instead of writing this, but I feel compelled to comment on how impressed I am that you responded to "You might want to adjust your ego downwards a bit" with such class and humility. Truly refreshing, truly commendable. I'm using it as a learning experience, because my initial reaction on reading that comment was highly negative and I sincerely doubt whether I would have been able to summon the kind of response you did.

On a morning where I woke up feeling anxious and worried after the events of Monday, this was a small but appreciated little reminder that there's still hope.


I deeply respect the humility of your response, even though I personally would have responded with a simple downvote.

I was curious about what you do, so I visited your HN profile to find more details. As a HN user - and a Muslim - I believe that it would be more useful for yourself and others if your profile listed something about you and where to contact you, rather than a wall of text arguing for why Islam is evil.

Anyways, I hope you have a nice weekend :)


Strictly as a thought exercise, consider how often you find yourself saying "No offense intended."

The person you responded to seems like most of "us" - most of us aren't ML experts. He humbly shared his experience dipping his toe in the ML pool and reported back that he (metaphorically speaking) had to chip the ice off the pool before he could even get in.

This was useful information to "us" (non-experts interested in learning about ML). Calling his difficulties "trivial", while true for an expert in the field, came off as condescending. Telling him (him!) to check his ego came off as egotistical.

No offense intended. ;-)


If you hadn't included that last sentence I wouldn't have downvoted your comment. You can't just say "no offense intended" and then insult someone. Anyway, most of the people reading this are probably regular software developers that are interested in ML, not PHD holders in any field. It's obviously not written for academics at the leading edge of the field. And if something takes several years of dedicated study to master, it's challenging.


I didn't find the parent's comment egotistical at all. I believe they were just commenting on the three different fields that combine to form stochastic gradient descent. The fact that it's viewed as fundamental or trivial only supports their point that things in machine learning are hard and may require wide knowledge breadth.


I understand how it would look this way to you, but you have to realise that these areas only seem simple to you because they're exactly what you've spent a large part of your life studying.

It's like you're a senior Navy pilot, and you hear a crop duster pilot saying that the Osprey is difficult to fly because you need helicopter piloting skills, plus multi-engine fixed wing piloting skills, plus experience landing on a carrier. He's not wrong, it just doesn't sound that bad to you because you just happen to specialize in the exact combination of skills required.


He didn't say he found it personally challenging. He said this part of optimization finds itself at the intersection of three deep fields of human knowledge. That is undeniably correct.


Right, but in this particular case the "stochastic" and "algorithm" parts are trivial. For example, you don't really need to draw on CS/algorithm knowledge to implement or understand gradient descent.


That's like saying bubblesort lies at the intersection of combinatorics, topology, and complexity theory. True, but pretentious.


I don't think it was pretentious, and your analogy is quite off. You don't need to understand any of those fields to grok bubblesort (algorithmics is enough). You do need to know some statistics, calculus and algorithmics to understand stochastic gradient descent.


you don't need to understand statistics, calculus is a high school subject and is hardly deep, and what is "algorithmics"? if you mean the study of any and all algorithms, then sure.


3 fields, but each only at an introductory level.


By the time you have a PhD, a lot of problems may seem simple, especially in your field, but you've spent 6 years on study that most people haven't. The truth is that machine learning is the intersection of statistics, data analysis, and software engineering. This makes it hard to learn, and also hard to get a job in, as companies are looking for people who are experts in all 3 fields.


I don't know what the OP means by SGD. If he/she means gradient-based optimization algorithms as a whole, they are definitely a challenging and open area, and a lot of researchers are trying to improve them.


I understand stochastic gradient descent just fine, but you need to reverse your thought here. Just because something is trivial for us doesn't mean it lacks inherent difficulty. After all, if it was inherently trivial, pretty much everyone would be able to actually derive and implement the algorithm, which the vast majority of the human race can't.


Congratulations - you know something about your field. No need to be a dick about it.

I am sure there are fields where you don't know stuff, so please grow up.


> Take this single term from the above sentence: "stochastic gradient algorithms", they represent three key areas: statistics, calculus and CS.

Well most practical problems require knowledge of different disciplines, so this is nothing special.

For instance, building a fluid dynamics solver requires knowledge of: physics, numerical mathematics/differential equations, computer science, computational geometry, and probably a few more.


I'm getting a 404.


Me too, the whole blog 404s

Here's the cached version:

https://webcache.googleusercontent.com/search?q=cache:https:...


try the HTTP (non-SSL) version. I had to disable HTTPS Everywhere.


Because most of the models are flawed or wrong.


mostly because the implementation is really tough - mostly lots of matrices and calculus. i recommend using sklearn.



