Machine Learning (xkcd.com)
546 points by tchalla on May 17, 2017 | 128 comments



Reminds me of this horrifying stack exchange post: https://stats.stackexchange.com/questions/185507/what-happen...


One of my colleagues (an engineer) suggested something like this when I worked at a factory. My jaw dropped and I just stared at him. I had nothing to say. I guess he's a manager now.


I can't believe that's real!


wow, that is actually horrifying.


ELI5?


The manager of the person who asked the question thinks that if you take data in the form of pairs (X, Y), split the pairs up, sort the Xs and the Ys independently, and combine them again, you'll get better results. In fact, such an operation obviously destroys the relationship X had to Y, so the result is meaningless.
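
To make that concrete, here's a tiny numpy sketch (synthetic data, nothing to do with the original post) showing that sorting X and Y independently can manufacture a strong positive trend out of a strongly negative one:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = -2 * x + rng.normal(scale=0.1, size=1000)     # Y really *decreases* as X increases

    print(np.corrcoef(x, y)[0, 1])                    # ~ -1.0: the true relationship
    print(np.corrcoef(np.sort(x), np.sort(y))[0, 1])  # ~ +1.0: pure artifact of the sorting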


Ha, gotcha! Maybe if you do it enough times...in the cloud?


No no, results stored on The Blockchain.


A chatbot can read them back to us.


Only if it's implemented in Rust. Any other language wouldn't be safe.


But Rust is not Web Scale™. Go.js is Web Scale™.


You'd be better off just going with whatever "result" you want regardless of the data, and save on the AWS costs.


Doesn't it also assume you'll have access to the correct answer in production as well? Is X the observation and Y the correct answer, as often indicated by that notation?


Suppose your dataset consists of the following points.

(1,5)

(2,3)

(3,10)

(4,4)

Clearly there's really not a linear relationship between the first coordinate and the second coordinate. In other words, if you plot these four points and try to approximate them with a single line in the plane, you just can't do a very good job.

The person's manager suggested sorting the second coordinates in this list while keeping the first coordinates fixed, resulting in the following:

(1,3)

(2,4)

(3,5)

(4,10)

These points are still not collinear, but they certainly can be better approximated by a line than before. The problem is that this is simply a completely different set of points, so a linear approximation here implies nothing about the original dataset.
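
Here's the same comparison in numpy, for anyone who wants to check the numbers (np.polyfit with degree 1 is just an ordinary least-squares line):

    import numpy as np

    x = np.array([1, 2, 3, 4])
    y_original = np.array([5, 3, 10, 4])   # the real pairs
    y_sorted = np.sort(y_original)         # the manager's version: [3, 4, 5, 10]

    for label, y in [("original", y_original), ("sorted", y_sorted)]:
        slope, intercept = np.polyfit(x, y, 1)
        sse = np.sum((y - (slope * x + intercept)) ** 2)
        print(label, round(slope, 2), round(intercept, 2), round(sse, 2))
    # original: slope 0.4, intercept 4.5, squared error ~28.2 (poor fit)
    # sorted:   slope 2.2, intercept 0.0, squared error ~4.8  (nice fit, to the wrong data)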


Imagine you were trying to measure whether or not increasing X proportionally increases Y. Now imagine you first sort your data points, Xi and Yi, independently -- it will now misleadingly appear like Y increases when X increases since the variables are both sorted. You've scrambled your data (Xi, Yi) to be (Xk, Yj). So you're essentially working with a totally different dataset now.


It's actually quite a simple question, just asked using a lot of overly complex language which probably confused the questioner and his boss.

It's no different from having a list of key/value pairs, sorting the keys and the values independently, and hoping to get a meaningful result, which is absurd.
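
Or, in code (made-up prices):

    prices = {"apple": 3, "banana": 1, "cherry": 5}
    scrambled = dict(zip(sorted(prices), sorted(prices.values())))
    print(scrambled)  # {'apple': 1, 'banana': 3, 'cherry': 5} -- apple no longer costs 3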


That is an excellent analogy.


OMG. This is a data scientist trap. They can't help themselves! It's actually a great question to test fundamental understanding and ability to explain.


Meanwhile I don't know how to get enough votes on my new stack exchange account to enable commenting.


Wtf wow


Is this a joke?


Tom Scott yesterday made a video for laypeople on the topic of 'black box' machine learning and how it can be difficult to get it to behave as you want, too.[0]

It's an interesting watch - I'd recommend it if you're interested in learning about it.

(Heck, I'd recommend the channel. Tom does some great videos on a number of different topics.)

[0] https://www.youtube.com/watch?v=BSpAWkQLlgM


His second channel is very nice as well. More casual and conversation-based, and a bit less interesting topic-wise, but nice nonetheless.


Isn't he the guy from the hello internet podcast?


No, HI is Brady Haran and CGP Grey, though I suspect Brady may have collaborated with Tom.


"Just stir the pile until [the answers] start looking right" is actually a pretty decent description of gradient descent.


It's clearly simulated annealing. You stir less as you get tired. :)


Simulated annealing doesn't use gradients, but it does exploit the structure of the objective landscape (worse moves are accepted with a probability that shrinks as the temperature drops), so it is not equivalent to random search, although it is a stochastic algorithm.
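
For the curious, here's a toy sketch of the acceptance rule that gives simulated annealing that structure; the objective function, proposal width and cooling schedule below are arbitrary choices for illustration:

    import math, random

    def f(x):
        return x ** 4 - 3 * x ** 2 + x   # some bumpy 1-D objective

    x, temp = 5.0, 10.0
    for step in range(10_000):
        candidate = x + random.gauss(0, 0.5)           # a random local "stir"
        delta = f(candidate) - f(x)
        # always accept improvements; accept worse moves with probability exp(-delta/temp)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate
        temp *= 0.999                                  # cool down: stir less as time goes on
    print(x, f(x))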


With gradient descent you are walking in the direction of the negative gradient. Stirring the pile implies that you are walking randomly from point to point.
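
For contrast, plain gradient descent on a toy 1-D objective (made-up function and step size), where every step follows the negative gradient:

    def f(x):
        return (x - 3) ** 2

    def grad_f(x):
        return 2 * (x - 3)

    x, lr = 10.0, 0.1
    for _ in range(100):
        x -= lr * grad_f(x)   # always step against the gradient, never randomly
    print(x)                  # converges to 3.0, the minimum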


No, by stirring you're just making the pile move, and things descend down the gradient of gravity. :)


The little things more so. Brazil nut effect!


When you stir, you're stirring in a pattern, probably a circular one. Plus you're probably not paying too much attention to the shape of the pile, so it's gradient-free. It's pattern search. Except it's only approximately a circular pattern. There are random variations. Stochastic pattern search.


So more like a genetic algorithm then?


GAs also exploit structure in the search space.

To see this most clearly, read up on estimation of distribution algorithms (EDAs), which generalise the ways in which GAs work.


It is more like tuning the hyper-parameters, which is guided by anything but logic.


It actually seems almost like the definition of RANSAC.

https://en.wikipedia.org/wiki/Random_sample_consensus


Gradient descent is not random search, so this is an inaccurate description. It works by exploiting the gradient in a structured landscape of solutions.


I'm not so sure. Gradient descent refers to reaching the best solution available. Xkcd seems to be referring to something more like the Bonferroni principle, where you look for patterns without a hypothesis and justify them post facto.


I think the point is about how data manipulation is about 80% of machine learning work. If an algorithm is giving you crap results for some data set, once you've fiddled with its hyperparameters through CV and so on, there's not much you can do besides data manipulations like PCA, ICA and the like to try and get a better result. Most algorithms work pretty badly with raw data.
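
For concreteness, that fiddling loop typically looks something like this sklearn sketch (synthetic data, illustrative parameter grid):

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=50, n_informative=10, random_state=0)

    # transform the data (PCA), then tune hyperparameters by cross-validation
    pipe = Pipeline([("pca", PCA()), ("clf", SVC())])
    grid = GridSearchCV(pipe, {"pca__n_components": [5, 10, 20], "clf__C": [0.1, 1, 10]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)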

I'm guessing that, in the sciences, that is a big no-no. Imagine if doctors, seeing that a new drug being trialled is failing to cure a disease, simply started chucking out the sick subjects until all the ones that were left were healthy, declaring the sick ones to be "noise" and the trial a success. Somehow, I don't think that would fly...


This is confusing "cherry picking" (selecting some data and discarding other data) with "data transformation". Transformations are applied for perfectly good reasons, such as dimensionality reduction, or to enable the subsequent application of certain statistical tests that rely on assumptions not valid for the data in its original form.


> I think the point is about how data manipulation is about 80% of machine learning work. If an algorithm is giving you crap results for some data set, once you've fiddled with its hyperparameters through cv and so on there's not much you can do besides data manipulations like PCA, ICA and the like, to try and get a better result.

Data manipulation is different from data transformation. Manipulation changes the nature of data, transformation does not change the nature of data. PCA is data transformation, not data manipulation.


Tell me what you mean by "the nature of data" and I'll tell you if I agree with your definition. I don't see why the dimensionality of a dataset is not its "nature" for example.


AKA half the work done in Machine Learning nowadays.


To be precise: the best solution reachable from a given starting point, which may be a local minimum rather than a global minimum.


(iterative) stochastic gradient descent to be specific. Data is stirred for each pass to prevent cycles.
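
A minimal sketch of that per-pass stirring, on a toy linear-regression problem (synthetic data, untuned learning rate):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.01, size=200)

    w, lr = np.zeros(3), 0.05
    for epoch in range(50):
        order = rng.permutation(len(X))          # "stir" the data before each pass
        for i in order:
            grad = (X[i] @ w - y[i]) * X[i]      # gradient of the squared error for one sample
            w -= lr * grad
    print(w)                                     # close to [1.0, -2.0, 0.5]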


This is _so_ on point considering a lot of my customer interactions. Until I go in and establish a process framework that goes through the models in use and sets criteria for their evaluation (even something as simple as ROC), a lot of ML work in companies is mostly tinkering with things (sometimes with wrong theoretical underpinnings) until it performs adequately.

The hype is only real if you systematically work it into a measurable process, not virtuoso jamming.
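
For reference, the "even something as simple as ROC" criterion really is a one-liner once you have held-out labels and scores; the values below are toy numbers:

    from sklearn.metrics import roc_auc_score

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5]   # model's predicted probabilities
    print(roc_auc_score(y_true, y_score))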


Stone-age tools, as Chomsky calls them. The problem with simple-to-use tools is that the simplicity masks two key facts from the user: whether aptitude for the tool exists, and the amount of effort and time needed to reach mastery.


That's some nice snarkiness about how modern machine learning works. But let's not forget that this apparently "dumb" approach has beaten out much more intelligent seeming systems on many tasks. To me, this means we shouldn't be so confident that we know what an intelligent system looks like. Maybe effectiveness in AI doesn't have much to do with human interpretability, and even less to do with whether humans find the approach intellectually satisfying.


Is it survivorship bias? ML attempts that fail are cancelled and never heard from again, those that succeed are publicized.

The same approach works with pig entrails: a bunch of people make predictions, the ones that fail go away, the ones that happen to succeed a few times "must work".

Or in stock market terms, "past performance is no guarantee of future results."


In a way you're exactly correct and that's exactly why machine learning works. The models specify a huge range of possible input-output mappings and then do a dumb search to find the ones that work; the mappings that don't succeed are discarded. It turns out, surprisingly, this is the best approach we have come up with.

> Or in stock market terms, "past performance is no guarantee of future results."

Past performance isn't a guarantee, but under mild conditions it is very strong evidence that there will be future results.


And let's not forget that one lucky hit with pig entrails is enough to build a career that lasts a couple of decades.

I'm trying to remember where I read about this but, allegedly, there used to be a gentleman in New Orleans (if memory serves) who went around handing, at random, sealed envelopes with "boy" or "girl" written on a piece of paper inside, to pregnant women.

The idea was that he could expect to hit the right sex of the unborn child a few times and that those lucky hits would make people think he had some sort of gift for seeing the future. As to the misses, the women would be too preoccupied with having just given birth to raise a stink. Note the envelopes were handed out for free. It was his advertisement, see.


>Is it survivorship bias? ML attempts that fail are cancelled and never heard from again

Which is exactly how we got from single-cell organisms to human-level intelligence. Of course, it took 4 billion-ish years for that to happen. Life, and hence intelligence, is survivorship; that's the selection mechanism.


Like other tech fads, there is a reasonable basis of useful stuff underneath a much larger lather of big, frothy bubbles. Almost every idiot is running around talking about using "machine learning" for their new system without understanding what this means, just as they're doing with "containerization", "orchestration", and so on.

The rule of thumb is that if Google or Facebook releases some toolkit for something, the next 3-5 years are going to be a hellscape of idiots clogging up the channels for the relatively small number of people who may have an actual, legitimate use for these tools. But since the idiots are bored at work and can tell their bosses, "Well, Google does it this way, so it must be the best! We're important just like Google, right?", we have to languish through their tiresome drivel, and watch as they drag their companies through a quagmire, only to propose the next fad as the savior a couple of years later.


Human intelligence is rapidly becoming obsolete.


True that. But you could replace "Machine Learning System" with "code base" and "linear algebra" with "C++" and it would be an accurate description of most corporate software development I've seen.


Eh, we (collectively) have a pretty good grip on how code-bases, linear algebra and C++ work, but we still have no idea how minds and intelligence work. And so the best we can do with ML right now is stumble blindly in the dark, poking and prodding things until it 'seems' to give the right answers, some of the time.


Imho, it's more typical of machine learning. Sure, both can be very similar and are often signs of a lack of deeper understanding.

I'm no ML expert by any means, but I've seen several bachelor/master theses and even ML competitions where ensembles performed best. Sure, this isn't necessarily aimless stirring and could combine models that really capture different aspects of the data. But often enough it's just several algorithms that do the same general thing, combined to achieve a slightly higher score.

Imho this is most relevant when competitions provide data that is not readable by humans (e.g. simplified: "classify these documents where all words are given as word IDs and never as actual strings").

To me this has a touch of pouring in data, stirring (build many classifiers and plug them together in an ensemble), and getting answers on the right side.

Optimizing hyperparameters goes in a similar direction, imho. I can really see an analogy to stirring.
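
For what it's worth, the "several algorithms doing the same general thing, plugged together" kind of ensemble described above takes only a few lines in sklearn; a sketch with synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, random_state=0)
    ensemble = VotingClassifier(
        [("lr", LogisticRegression(max_iter=1000)),
         ("rf", RandomForestClassifier()),
         ("svm", SVC(probability=True))],
        voting="soft",   # average the predicted probabilities of the three models
    )
    print(cross_val_score(ensemble, X, y, cv=5).mean())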


You don't throw random C++ code at the compiler to see if you get the correct result.

At least, most of the time you don't.


I think this is one of Munroe's general digs at corporate culture, and not specifically aimed at machine learning, that's just the current buzzword.


It feels like a dig at how people do machine learning these days though. My impression is that people just take TensorFlow and a random neural net, and keep throwing gigabytes of data at it until they feel the results look like they should...


Random Neural Nets should totally be a real thing.

I mean more than they already are.



At University, we did neural nets via genetic algorithms. Does that count?


Bingo. :)


> Random Neural Nets should totally be a real thing.

Reservoir computing. Some are critical of this method.


There's a long history of that: http://catb.org/jargon/html/koans.html#id3141241


Courtesy of "Machine Learning in $(some single digit number) Minutes".


I like a different analogy. In old times, the oracles and shamans danced in masks, wielding amulets of power and magical weapons. Modern-day shamans dance with datasets and clusters, but the main principle is the same. If my performance produced the desired effect (correlation does not imply causation), that is, obviously, due to my undoubted magic powers. If it fails, which is usually the case, that is because the data was not big enough and there was not enough money given to perform a big-enough sacrifice to please the gods.

We actually might learn a lot from Tibetan tantric practitioners. It seems that the Wall Street guys and economists did.


My advice : if you see a pile of sticks growing in the car park, make a swift exit.


As a famous Russian-Ukrainian saying goes, "is it a defeat or a victory?"


I don't know what the Russian equivalent of this saying is, but the Ukrainian version that has become very popular in recent years has an interesting cultural twist. The word that is used as an antonym of "victory" does not translate to English as "defeat"; instead it literally means "betrayal". This is because in Ukrainian history we almost never lost by being defeated, and almost every time our defeats were the result of betrayal. And the same thing applies to the ongoing war: we don't really fear a military defeat as much as a political betrayal (either internal or external).


> in Ukrainian history we almost never lost by being defeated

Nice to see Ukrainians reinventing the German stab-in-the-back mythology from the interwar period.


As in all things, the Greeks invented this first. Remember Ephialtes, the guy who betrayed the Spartans at Thermopylae?

(In modern times too: all my friends who like football, their teams never lose a match. It's always the referee who is on the side of the other team.)


It's a very powerful hammer in our psychological toolkit, and it's been used countless times throughout history.

No one wants to blame the Everyman and his masculine valor for failing the country. So the parts of the national leadership who started the war need to deflect blame, and they do it by attaching themselves to martial myths and posing as defenders of the Everyman. Not like those other dastardly effete leaders who oppose war, who are in a rhetorically weaker position because they're correctly acknowledging that individual valor doesn't play a major role in the outcome of the war. They're easily portrayed as trivializing valor and not glamorizing it sufficiently.

If you talk to old white American soldiers, you can hear the same thing about Vietnam (damn liberals!) and more recently Iraq (damn liberals!). I'm sure you could hear similar things from old Brits nostalgic for Empire.


It's much more complex than that.


After making such a statement it would be proper to elaborate on what you meant there. Don't get me wrong, I am not yearning for a debate on Ukrainian historiography here, I just believe that borderline chauvinistic statements like that should be checked.


I get your point. It would be chauvinistic if I said that we are so strong that we can't be taken by force (which is impossible), while actually I'm just saying that we are too corrupt (so that every time a pivotal point comes, someone gets bribed and betrays). And also, I'm speaking of the last 300 years, and not really the _entire_ history.


"Is it a treason or a victory?" would be a more precise translation. And yes, it's Ukrainian only. It's quite surprising to see this funny national meme pops up on HN.


"Yes"


that would actually be a very Chinese answer


or a very boolean algebra answer


I'm waiting for the follow-on: I stirred it up, I got good results, and now I can't seem to reproduce them again :-(


There is this famous paper "I Just Ran Two Million Regressions". Any half decent statistician will tell you what's wrong with that type of approach.
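
A sketch of the basic problem: regress pure noise against enough random predictors and roughly 5% of them will come out "significant" anyway (the counts here are illustrative, nowhere near two million):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    y = rng.normal(size=100)                 # a pure-noise "outcome"
    false_positives = 0
    for _ in range(2000):
        x = rng.normal(size=100)             # a random, unrelated "predictor"
        if stats.linregress(x, y).pvalue < 0.05:
            false_positives += 1
    print(false_positives)                   # roughly 100, i.e. ~5% spuriously "significant"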


Could one describe machine learning as "combine enough bad statistical tests that the cries of 'it seems to work' from managers drown out all the 'it's wrong' cries from the statisticians"?


In machine learning circles, this is called "overfitting".
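
A minimal numerical sketch of it (toy data, numpy): a high-degree polynomial nails the training points and does much worse on fresh data from the same source:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample(n):
        x = rng.uniform(-1, 1, n)
        return x, x ** 2 + rng.normal(scale=0.1, size=n)   # the true relationship is quadratic

    x_train, y_train = sample(10)
    x_test, y_test = sample(100)

    for degree in (2, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(degree, train_err, test_err)   # degree 9: near-zero train error, far larger test error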


So I am curious: for a software engineer who has never learnt much stats, how difficult has it been to use these things in the field to solve problems, if they're doing so at all? Do they run into trouble because they don't have a grounding in stats, or do these things pretty much "work", assuming the models are scoped to fit the problems?


Not hot dog.


Seems to be related to this recent one:

https://xkcd.com/1831/

Something tells me Randall Munroe had a run-in with some over-eager data scientists recently.


So far Deep Learning has made this Xkcd obsolete: https://xkcd.com/1425/

Check whether a photo is of a bird.


Can a computer algorithm reach a human-level of skill in identifying birds in a photo in all situations? That sounds like a very hard problem indeed.

For instance could an algorithm conclusively identify the birds in all of these pictures without having too many false positives?

https://ak3.picdn.net/shutterstock/videos/6087611/thumb/1.jp...

http://www.hippoquotes.com/img/impact-of-nature-quotes-in-fr...

http://www.birdsasart.com/baacom/wp-content/gallery/cache/17...

https://vztravels.files.wordpress.com/2014/05/img_1390.jpg

http://www.zastavki.com/pictures/1920x1200/2011/Animals_Bird...

This is not a rhetorical question by the way, I genuinely don't know the state of the art in this field. If it's indeed possible to do that today I'll be extremely impressed.


Detection accuracy is fine. We are actually at a point where NNs can make photos :)

Convert string:"this small bird has a pink breast and crown, and black primaries and secondaries." into a photo.

https://www.youtube.com/watch?v=rAbhypxs1qQ


That's impressive, but I'll point out that the bird photos in this video are all clean, well focused close ups, that's probably easier to process than random pictures.

If you wanted a general algorithm working on non-curated data (like tagging facebook photos for instance) I'm sure it would be significantly harder.


Check out the (deliberately blurry) examples in https://arxiv.org/pdf/1703.05393.pdf where it can distinguish between blurred, low resolution pictures of different types of crows.

It's only ~50% accuracy, but the photos are terrible. Much worse than Facebook pics.

OTOH, this is classification into hundreds of classes, not millions as in the case of FB face recognition. (Although of course FB can use the connectivity graph as a filter on that, too.)


This is all doable today. As an example, check out some bird photos[1] from the Visual Genome[2] project that are similar to your examples. I selected the photos and hosted them on Imgur in hopes we don't kill Visual Genome with traffic ;) The systems to do this today are not highly efficient or without flaws but it can certainly be done.

The research group I am part of, Salesforce Research (formerly MetaMind), have a model that does this "accidentally" - and there's even an example image of a bird[3]! The model is only meant to provide a caption for an image, not to segment the image into the various objects, but learns to "focus" on the bird as part of describing the image. For those particularly interested, check out the paper "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"[5].

Systems made specifically to segment an image into objects would obviously do far better. For an example of that, check out "CRF as RNN - Semantic Image Segmentation Live Demo"[4]. There are many more systems of this style floating about.

[1]: http://imgur.com/a/1UPnn

[2]: http://visualgenome.org/

[3]: http://imgur.com/vbX8NNZ

[4]: http://www.robots.ox.ac.uk/~szheng/crfasrnndemo

[5]: https://arxiv.org/abs/1612.01887


I think you underestimate the problem, which is not to get an output that says "Bird", but one that says "Specific breed of bird."

Human experts can get enough clues from the bird shape and the context to do that in the sample photos. I doubt your captioning system can.

This is a good example of a standard problem in ML - underestimating the complexity of the problem domain.

You could argue that your system only needs to do the simpler task to be useful, and that's likely true. But if the goal is to approach human expert levels of classification, it needs to improve by at least a few levels.

I suspect getting it there would run into some interesting performance constraints, and possibly some theoretical issues too.


No, ML is very, very good at doing breeds. See, for example, https://arxiv.org/pdf/1603.06765.pdf which gets 88.9% accuracy on the Stanford Dogs dataset, and 84.3% on the Caltech Birds dataset.

These are way better than anything a non-expert human can do. For example, it can distinguish between the Rhinoceros Auklet and the Parakeet Auklet.

I'm not sure what expert performance is, but around 94% is where humans top out on most tasks.

Also, the parent poster knows what they are talking about: https://www.semanticscholar.org/author/Stephen-Merity/337544...


How far away is ML from identifying everything in a picture?

For example if we could take a photo of Noah's Ark loading up every animal?

Do you just loop through each NN you have on each species?


A single NN can predict more than one class of object. The full ImageNet dataset has over 20,000 classes (the ILSVRC competition uses a 1,000-class subset).

There's also image segmentation as another poster has pointed to.

In the case of FB face tagging, they'd have to learn an embedding space for faces, and when a new image comes in they'd place it in the embedding space along with all the person's connections and find the nearest neighbors.

See https://arxiv.org/abs/1503.03832 or the implementation https://cmusatyalab.github.io/openface/
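
The lookup step of that scheme is simple once the embeddings exist. In the sketch below the 128-dimensional vectors are random stand-ins for what a trained network like FaceNet/OpenFace would produce, and the distance threshold is made up:

    import numpy as np

    rng = np.random.default_rng(0)
    friend_embeddings = {name: rng.normal(size=128) for name in ["alice", "bob", "carol"]}

    def identify(new_face, threshold=15.0):
        # nearest neighbour in embedding space; below the threshold we trust the match
        name, dist = min(
            ((n, np.linalg.norm(new_face - e)) for n, e in friend_embeddings.items()),
            key=lambda pair: pair[1],
        )
        return name if dist < threshold else "unknown"

    noisy_bob = friend_embeddings["bob"] + rng.normal(scale=0.1, size=128)
    print(identify(noisy_bob))   # -> "bob"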


This seems to be what you are looking for: https://code.facebook.com/posts/561187904071636/segmenting-a...


The problem posed in the xkcd is "check if the photo is of a bird", not to identify the bird in question. As for identifying the bird species, that would probably be harder, because I'm guessing there are very few human experts who could reliably do that across a wide spectrum of species, and without knowing the context of the photo.


ResNet in 2016 achieved about 97% (top-5) accuracy on the ImageNet challenge, which has 1,000 classes. I think that's near human accuracy.


It DID take a research team and ~ 5 years to get there? :)


Others have pointed out that this problem has not been solved in the general case.

More importantly, the progress that has been made in recent years actually builds very heavily on work done since the early 1990s, so not only is it not complete, but what has been achieved took a great deal longer than 5 years.


Well, because we had to invent the massively parallel GPU in between those times. Essentially work that would take an entire supercomputer cluster in the 90s can be done on my desktop with 4 high end GPUs stuck in it.

Now that we are in the range of having the correct hardware, the whole "it's taking decades" issue will go away.


GPGPU was definitely part of the success of recent years, but there was also a lot of experimentation and hard work carried out e.g. on CNN designs. Lots of trial and error. That took a lot of time. There have been fundamental changes in the structure and training of NNs that have helped bring the step-change in success.


There was a previous discussion about this nearly 2 years ago on HN:

https://news.ycombinator.com/item?id=10239401


Interesting, it shows how things can move quicker than expected, even when judged by enlightened people.

Is there a way to know the date comic 1425 was released?


There's an army of Machine Learning researchers out there. Arguably a single team would have taken much longer, even on a narrow task.


24th September 2014


Programmers often have Scotty syndrome and pad their time estimates to look like geniuses when they solve ahead of schedule.


> to look like geniuses when they solve ahead of schedule

No, it is because estimating software tasks is difficult, the penalty for underestimating is that people think you are dishonest/flakey, and there isn't anywhere to get an education in how to do it well. The default advice given to junior engineers is therefore: "take your intuition and triple it." I hate that this is the state of the industry. My interactions around estimation over the past 5 years since uni have literally made me feel nauseated and near fainting on multiple occasions. I would love for Joel or Klamezius or Uncle Bob or someone else to fix it and produce a good course on how to create estimates.


There isn't a way. It's an issue that blights everyone in the industry.

Probably the best you're going to get is the book "Software Estimation: Demystifying the Black Art".

Even applying those techniques you get it wrong.

Most experienced software companies have adopted agile, and accept reductions in scope to meet deadlines as something that happens.


Agreed, agile seems the only way, but it does indeed require experienced managers. A lecturer once pointed out that business/normal people always expect some kind of point estimate; they are never satisfied with a distribution or an interval. Personally, I would say the situation is even sadder than that: the point estimates are always taken at the extreme values, whichever suits the person wanting the estimate more, never the average value.

Of course, all this leads to bad blood between techies and business side: how long will it take? -> probably about 3 weeks, but this requires using a library we haven't used before, so in the worst case even 2 months -> what? so long? get it done in 4 days, this is required the next week -> no, that's not really possible -> make it happen -> it happens and it either sucks when it's delivered at all, so the deadline gets extended anyway to iron out all the bugs or it causes lots of problems in the future.


For very long projects, I have seen much delay because of feature creep.

"OK you have implemented it as requested, but finally the customer does not like it, it needs to be slightly different. Can you do it quickly?"

Sometimes it is easy to adapt, sometimes next to impossible.


Or when you allow for hilarious false positives/negatives. Sometimes birds are birds, sometimes they are cats and cats are birds, sometimes they are dogs. Everything is possible with the right training set and machine learning.



By the looks of http://explainxkcd.com/1425 it was September 2014.


First copy in the Wayback Machine is Oct 2, 2014. https://web.archive.org/web/20141101000000*/https://xkcd.com...


24 Sep 2014 (hover over the title in https://xkcd.com/archive/). So, less than 5 years ago!


It's funny that this also seems to work for humans.

When the outcome is bad/not what humanity wants, we get/give a negative response and hope the outcome next time will be better.


The second part quite accurately describes most of the generalization techniques. Especially when it comes to deep learning.


For webcomics and ML, see this footnote: http://p.migdal.pl/2017/04/30/teaching-deep-learning.html#fn...

"It made a few episodes of webcomics obsolete: xkcd: Tasks (totally, by Park or Bird?), xkcd: Game AI) (partially, by AlphaGo), PHD Comics: If TV Science was more like REAL Science (not exactly, but still it’s cool, by LapSRN)"


This thread is hilarious.

ML is powering most or even all self-driving-car efforts underway; it powers online translation services, numerous vision projects and speech recognition, besides winning competitions by meeting or exceeding human performance on the same data.

I'm just as allergic to those that hype some technology as I am to those that will snarkily discard something with arguments that have already been laid to rest, in some cases multiple years ago.

Xkcd is fun, but it isn't necessarily prescient, nor does it have to be accurate; when this cartoon was published the writing was already on the wall, and that's 3 years ago. It's fine to be skeptical about new technology, but before you start criticizing it, make sure that you have at least a rough idea of where things stand, lest you end up looking foolish.

Sure, ML is abused and if we're not careful we will see another AI winter because of silly hype and ascribing near magical properties to ML. But at the same time snark, condescension and a-priori dismissal of what is most likely the biggest landslide in computing since the smartphone is - especially on a site that deals with both hacking and novelty - something that I would not expect.

Compared to the HN love for the next JS framework or language fad this attitude is surprising to say the least.


Do you think it's connected in any way to https://www.wired.com/2013/02/big-data-means-big-errors-peop... ???

If not, what is your interpretation of the XKCD cartoon?


Couldn't a DNN be trained to inspect other DNNs and generate human-readable explanations of how they work? In the fewest words, with maximum poetry.


Brilliant idea! Why not try it out yourself and see how it goes?


don't know anything about DNNs!


Can we load all XKCDs into a neural net (LSTM, or something), and train it to "dream" new ones?

Like the neural nets which generated nonsense but realistic-looking C++ code.
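
It's very doable as a character-level language model. A rough PyTorch sketch below; the transcript file name and all hyperparameters are made up, and without a lot more data and training the "dreamed" comics will be mostly gibberish:

    import torch
    import torch.nn as nn

    corpus = open("xkcd_transcripts.txt").read()            # hypothetical dump of all transcripts
    chars = sorted(set(corpus))
    stoi = {c: i for i, c in enumerate(chars)}
    data = torch.tensor([stoi[c] for c in corpus])

    class CharLSTM(nn.Module):
        def __init__(self, vocab, hidden=256):
            super().__init__()
            self.embed = nn.Embedding(vocab, hidden)
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, vocab)

        def forward(self, x, state=None):
            h, state = self.lstm(self.embed(x), state)
            return self.head(h), state

    model = CharLSTM(len(chars))
    opt = torch.optim.Adam(model.parameters(), lr=3e-3)
    seq_len = 128

    for step in range(1000):                                # train to predict the next character
        i = torch.randint(0, len(data) - seq_len - 1, (1,)).item()
        x = data[i:i + seq_len].unsqueeze(0)                # a random slice of the corpus
        y = data[i + 1:i + seq_len + 1].unsqueeze(0)        # the same slice shifted by one
        logits, _ = model(x)
        loss = nn.functional.cross_entropy(logits.view(-1, len(chars)), y.view(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    # sampling ("dreaming") loop omitted: feed a seed character, sample from the softmax, repeat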



