Hacker News | npp's comments

In the same vein -- John Nash once said that Minsky was the smartest person he ever met.

...He also said that he thought McCarthy was an idiot. :)


This article echoes a lot of what Dalton Caldwell said in his Startup School talk a few years back about why not to start a music startup.

This business seems to have been run, and to have played out, horribly, for exactly the reasons that Caldwell and everyone else with experience in that area is painfully familiar with.

Is there anything legitimately interesting to the "Apple" and "Steve Jobs" parts of this story other than the usual clickbait?


I found it interesting to see how - if the story is true - your enemies may not be who they appear to be. He thought the labels hated his business whereas they may have actually been quite partial to it in slightly different circumstances. That's useful to know for when your circumstances change.


Your blog post makes an important mistake: it recommends explicitly inverting the matrix X^T X, which one never does in this case. In general, A^{-1} b is a sort of mathematical "slang" for "a solution to the linear system Ax = b", not "compute A^{-1} and multiply b by it".

This is discussed in many places, but here are a few examples:

http://www.johndcook.com/blog/2010/01/19/dont-invert-that-ma...

http://en.wikipedia.org/wiki/Linear_least_squares_(mathemati...
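To make the point concrete, here is a minimal NumPy sketch (my own made-up data, not from the post): all three approaches give the same answer here, but the explicit inverse is the slowest and least numerically stable, and a proper least-squares solver never forms X^T X at all.

```python
import numpy as np

# Synthetic least-squares problem (made-up data, just to illustrate).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.standard_normal(100)

# What the post suggests: explicitly invert X^T X (slower, less stable).
beta_inv = np.linalg.inv(X.T @ X) @ (X.T @ y)

# Better: solve the normal equations as a linear system.
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Best in practice: a least-squares solver (QR/SVD under the hood),
# which never forms X^T X at all.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

On a well-conditioned problem like this all three agree; on an ill-conditioned one the explicit inverse is the first to fall apart.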


One way would be to take a course on a topic that uses linear algebra heavily, rather than taking a generic linear algebra course. Since all the ideas from linear algebra are then being used to actually do something, you'll get at least one example for why you would use whatever it is.

I would suggest trying these videos: http://www.stanford.edu/~boyd/ee263/videos.html. The prerequisites are very low and a main focus is on interpreting the abstract concepts in applications.


It's a good followup. Those models are widely used in computer vision, natural language processing, and other AI and applied ML areas, so those classes all go together.


In an earlier comment, you said that much of the time was spent in constructing the features (e.g. you had to implement CSS). Did you mean implementation time, or training/classification time? This latest comment makes it sound like most of the time is in downloading the page, while the feature extraction is relatively fast.

In any case, if the feature extraction is taking too much time, what is sometimes done is to dynamically select which features to extract for a test example, based both on the expected predictive value (e.g. via mutual information or some other feature selection method) and on the time it takes to actually compute the feature (measured by, say, average computation time per feature on the training set). This can speed things up a fair bit, since you only compute the features you really need, and are biased towards the ones that are quick to compute. This may not translate to your particular application, though; if I remember correctly, I've seen it used a while back for image spam classification.
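A toy sketch of that idea (feature names, values, and costs are all made up; in practice `value` would come from something like mutual information on the training set, and `cost` from measured per-feature extraction time):

```python
def select_features(value, cost, budget):
    """Greedily pick features by value-per-second until the time budget runs out."""
    ranked = sorted(value, key=lambda f: value[f] / cost[f], reverse=True)
    chosen, spent = [], 0.0
    for f in ranked:
        if spent + cost[f] <= budget:
            chosen.append(f)
            spent += cost[f]
    return chosen

# Hypothetical features: predictive value (arbitrary units) and cost (seconds).
value = {"render_layout": 0.9, "num_links": 0.3, "css_depth": 0.5, "text_len": 0.2}
cost  = {"render_layout": 2.0, "num_links": 0.01, "css_depth": 0.5, "text_len": 0.005}

print(select_features(value, cost, budget=1.0))
# The expensive render-based feature gets skipped; the cheap ones get computed.
```

The greedy value-per-cost ranking is just the simplest heuristic; fancier versions treat it as a knapsack-style problem.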


Feature selection is an option, but not if all features require a certain preprocessing step.

My guess is that they need to render the page so they can determine the visual layout. So regardless of which visual features they use, the rendering step cannot necessarily be avoided.


Some higher-level math will be important; other parts will not be. The parts that will be more useful are on the analysis side (real analysis, complex analysis, functional analysis, convex analysis, Fourier analysis, probability theory). These are higher math and are very applicable, or are prerequisites to understanding the applied stuff (convex optimization, dynamical systems, control, ...).

It helps to know what a topology is, but not much more, and you would learn enough "on the way" in learning analysis properly. It helps to know what groups are, because they do show up in practical things, but you don't really need to know full-up "group theory". (They show up because they capture the idea of symmetries, and it is useful in certain practical situations to talk about something being symmetric with respect to various transformations, e.g. under permutations or rotations or whatever. But in this case you don't tend to do much analysis actually using group theory beyond this.) A whole course on abstract algebra is not necessary unless you're interested. It may help in some indirect way of "helping you think better", it may not.
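A toy illustration (my own, not from any course) of what "symmetric with respect to permutations" means in code: a function of several inputs is permutation-invariant if reordering the inputs never changes the output.

```python
import itertools

def is_permutation_invariant(f, xs):
    """Check whether f gives the same output for every ordering of xs."""
    base = f(list(xs))
    return all(f(list(p)) == base for p in itertools.permutations(xs))

print(is_permutation_invariant(sum, [1, 2, 3]))                            # True
print(is_permutation_invariant(lambda v: v[0] - v[1] + v[2], [1, 2, 3]))   # False
```

The set of permutations forms a group, which is exactly the structure that makes "invariant under all of them" a clean thing to state.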

See, say, http://junction.stanford.edu/~lall/engr207c/ as an example of an EE course that does a fair amount of math.

(Also, above, I don't mean 'applicable' in the very indirect sense of "helping you think better" -- I mean people use it to do real stuff. Whether you want to do that stuff is another story -- there are certainly good things in EE/CS that don't require this kind of math.)


People have already explained that the field now goes under a number of different names, and that it is very active, so I won't rehash this. Here are a few pointers, different from what others have suggested so far, for keeping track of the field:

1. Foundations and Trends in Machine Learning -- this is a journal aimed at publishing a very small number of well-written survey papers on various trends in ML. This is easier to follow than an entire conference (much lower traffic, higher signal/noise), and should be readable for a wider audience (assuming they are math-inclined).

2. Conferences like Algorithms for Modern Massive Datasets are practically-oriented, well attended by a lot of industry, and involve a lot of AI: http://www.stanford.edu/group/mmds/. Look through the speakers and topics. This is one example, there are others.

3. A lot of important tech companies have teams that do AI and AI-type things, at least using the modern definition of AI (Google, Facebook, Twitter, LinkedIn, Netflix, Amazon, Microsoft, eBay, even Apple with its Siri acquisition; there are others). This is not to mention people using this stuff in other areas, like finance and bioinformatics. These groups sometimes talk about what they're working on, so you can check this out.


Advanced Linear Algebra, Roman; Linear Algebra, Hoffman & Kunze; Matrix Analysis, Horn & Johnson; Principles of Mathematical Analysis, Rudin; Real Analysis, Royden.

Miscellaneous comments:

- Reading pure abstract algebra (e.g. Dummit & Foote) isn't a good use of time if you intend to go into statistics, since it only shows up in a few very special subareas. If you decide to go into one of these areas, you can learn this later.

- More advanced books on linear algebra usually emphasize the abstract study of vector spaces and linear transformations. This is fine, but you also need to learn about matrix algebra (some of which is in that Horn & Johnson book) and basic matrix calculus, since in statistics, you'll frequently be manipulating matrix equations. The vector space stuff generally does not help with this, and this material isn't in standard linear algebra books. (Similarly, you should learn the basics of numerical linear algebra and optimization -- convex optimization in particular shows up a lot in statistics.)

- People have different opinions on books like Rudin, but you need to learn to read material like this if you're going into an area like probability. It's also more or less a de facto standard, so it is worth reading partly for that reason as well. So read Rudin/Royden (or equivalent, there are a small handful of others), but supplement them with other books if you need (e.g. 'The Way of Analysis' is the complete opposite of Rudin in writing style). It helps to read a few different books on the same topic simultaneously, anyway.

- Two books on measure-theoretic probability theory that are more readable than many of the usual suspects are "Probability with Martingales" by Williams and "A User's Guide to Measure-Theoretic Probability" by Pollard. There is also a nice book called "Probability through Problems" that develops the theory through a series of exercises.
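As a small worked example of the matrix calculus mentioned above (standard material, not tied to any of the listed books): deriving the least-squares normal equations takes exactly the kind of matrix manipulation that comes up constantly in statistics.

```latex
f(\beta) = \|y - X\beta\|_2^2
         = y^\top y - 2\,\beta^\top X^\top y + \beta^\top X^\top X\,\beta

\nabla_\beta f(\beta) = -2\,X^\top y + 2\,X^\top X\,\beta

\nabla_\beta f(\hat\beta) = 0
  \;\Longrightarrow\; X^\top X\,\hat\beta = X^\top y
```

Identities like \nabla_\beta (\beta^\top A \beta) = (A + A^\top)\beta are what I mean by "basic matrix calculus": a couple of lines here, but not something you pick up from an abstract vector-space treatment.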


One could say a number of things, but I'll just say this: consider working for a bit in between undergrad and grad school. If you have a strong chance of getting in directly from undergrad, your chances will remain strong after a year and a half (when you'd need to apply), especially if you make a point, starting now, of keeping in touch with your advisors / rec letter writers. Two years of work is reasonable; one is usually too little to get into the swing of things, and at three it starts to get a bit long and harder to get admitted.

This has a number of benefits: worthwhile non-academic experience, better sense of whether you really want to do a PhD or whatever else, usually more focus when you do go back because you have had time to reflect on what exactly you want to do and get out of it, some general maturity that comes from working rather than just being in school, less pressure in making a big decision right now, and so on. Since you aren't hell-bent on becoming a professor, it is good to see both some academia (your undergrad) and industry before jumping into a long-term thing like a PhD. It's also more comfortable applying to grad school from a job you already have rather than as an undergrad, since if you don't get in anywhere you like, you can simply stay at your job and even try again the following year. (This also all applies if you decide you just want to do an MS.)

Basically, you have to make your own decision about this, and this is a fairly simple (and productive) way to make the decision easier.

