I found some slides [1] explaining how this works.
A less poetic example of privileged information: if you're training on time-series information, you can include events from the future in the training examples, even though they won't be available while making predictions in production.
Apparently this helps the machine learning algorithm find the outlying data points when the data isn't linearly separable.
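To make the time-series case concrete, here is a minimal sketch (the window sizes and data are invented for illustration) of building training rows where a statistic computed from the future rides along as a privileged column:

```python
# Sketch: each training row carries ordinary features (the past window)
# plus a privileged feature (a summary of the future window) that will
# not exist at prediction time. All names and windows here are invented.

def make_training_rows(series, past=3, future=2):
    """Return (ordinary_features, privileged_feature, target) triples."""
    rows = []
    for t in range(past, len(series) - future):
        ordinary = series[t - past:t]                            # available in production
        privileged = sum(series[t + 1:t + 1 + future]) / future  # training time only
        target = series[t]
        rows.append((ordinary, privileged, target))
    return rows

rows = make_training_rows([1, 2, 3, 4, 5, 6, 7, 8])
# rows[0] == ([1, 2, 3], 5.5, 4): the learner may consult the privileged 5.5
# while training, but only [1, 2, 3] is available in production.
```

Whatever learning scheme consumes these rows has to be arranged so the privileged column influences training but is dropped from the deployed predictor.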
In particular, I was unsure after reading the original article whether the additional information (for example, the poetry) was available to the learner on test inputs. The above slides explicitly state that it is not.
Unfortunately, I don't find that reference very helpful - it's just pages of annotated equations.
What's an example of pseudocode that would actually implement this? Surely you don't load a natural language module in order to parse the pathologist's notes (in the example given in the reference about biopsies)?
(I should also note that the original article is devoid of any technical examples, making it completely opaque to me what it actually entails.)
It's worth pointing out that Vladimir Vapnik is the inventor of Support Vector Machines. The short version of what he's done here is he's come up with a way of formulating them that allows him to make use of extra information at training time (that is not available at test time).
That is an important point: the extra info is only available at learning time (otherwise you'd need a physician sitting next to the "cancer-scanning" computer, slowing everything down by doing the analysis themselves). This seems obvious once you say it, but it had not occurred to me before. Thanks!
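Vapnik's actual SVM+ machinery is a modified quadratic program (the privileged features parametrize the slack variables), which is hard to show briefly. A related, much simpler scheme that captures the same train-time-only constraint is distillation: train a "teacher" that sees the privileged features, then train the deployed "student" on the ordinary features against the teacher's soft outputs. Everything below (feature values, learning rate, step count) is invented for illustration:

```python
import math

def fit_logistic(xs, ys, steps=2000, lr=0.5):
    """Tiny logistic regression by gradient descent; ys may be soft labels in [0, 1]."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(steps):
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            g = p - y                                    # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(model, x):
    w, b = model
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

# Toy data: (ordinary feature, privileged feature, label).
train = [([0.1], [0.9], 1), ([0.2], [0.8], 1), ([0.8], [0.1], 0), ([0.9], [0.2], 0)]
X_ord  = [x for x, _, _ in train]
X_both = [x + xp for x, xp, _ in train]
Y      = [y for _, _, y in train]

teacher = fit_logistic(X_both, Y)               # sees the privileged info
soft = [predict(teacher, xb) for xb in X_both]  # teacher's soft labels
student = fit_logistic(X_ord, soft)             # ordinary features only
# Deployment calls predict(student, [x]) - no privileged column, no physician needed.
```

To be clear, this is the scheme known in the literature as generalized distillation, not Vapnik's SVM+ itself, but the shape is the same: the privileged channel is consulted only while fitting, and the deployed predictor never asks for it.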
Yes, I mostly don't understand the math either. But apparently with the poetry, they converted it into a vector somehow based on the appearance of keywords. Perhaps someone will find a friendlier example.
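For what that conversion might look like: a bag-of-keywords count is the simplest version. The vocabulary below is invented; a real system would pick terms from the domain (or, presumably, from the poems):

```python
import re

# Invented vocabulary for illustration - in the biopsy example these would
# be terms harvested from the pathologist's working vocabulary.
KEYWORDS = ["irregular", "dense", "calcified", "benign"]

def keyword_vector(text, keywords=KEYWORDS):
    """Reduce free text to a fixed-length vector of keyword counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return [words.count(k) for k in keywords]

keyword_vector("Dense, irregular mass; margins irregular.")  # -> [2, 1, 0, 0]
```

No natural-language module required: the notes only need to be hit-tested against a fixed vocabulary, and the resulting vector becomes the privileged input.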
These different distributions are difficult to distinguish with statistics. However, if you can see the shape and therefore know the "rule" or the structure of the distribution, it's easy to design/train a network to recognise them.
(note that the content of that PDF is not about this problem per se)
I imagine a keyword decomposition would work very well with a pathologist's report too - essentially using the biological feature names as "tags" on the image would eventually allow a program to correlate an image of that feature with a name much faster than a general "good vs. bad" fitness function.
If this is how it works, it's a shame this isn't made clearer in the texts. You could get someone to tag images whimsically but consistently, rather than go to the whole trouble of writing poetry.
[1] http://web.mit.edu/zoya/www/SVM+.pdf