Hacker News
Neural Networks for Machine Learning (coursera.org)
121 points by ameasure on Oct 3, 2012 | 56 comments



I'm in the middle of the machine learning coursera course, and registered for this one as well due to interest in the material.

My one complaint is that the programming assignments weren't interesting at all. The results were interesting, but the setups were mostly given to us, and we just had to code an algorithm that was in our notes. For someone who understands the basics of linear algebra and programming, it was just a syntax challenge, and that got irritating after a bit so I stopped doing them.

I won't get the certificate for completing the course, but I have a few extra hours of free time each week to add this second course, so I'm happy. I doubt that the actual homework that Stanford students taking this course get is so easy and repetitive, though, and I'm positive they wouldn't complain about not getting to retake quizzes after getting poor grades.

Not to knock the course. I've learned a lot and the professor (Andrew Ng) does a good job.


I've taken both, and the code is in fact not that much simpler than it was in the original class. There are, however, two huge differences: the algorithm is spoon-fed to you, and there is no math.

Firstly, think about how much more difficult the assignments would be if, for example, the steps weren't broken out and we didn't get any advice on how to vectorize. Of course, it would still be short work for anyone who (a) knows Matlab/Octave and/or (b) understands the material well, but it would also be an order of magnitude harder.

Secondly - and this is by far the larger point - the original CS 229 was really about math; the programming assignments were more of an afterthought. The lectures and homework mainly focused on the theoretical derivations and corollaries of the math that led to the algorithms. Once you'd done your bit on the math and cried to your classmates and the TA about it, you could go and implement the beautiful and extremely succinct result in Matlab.

As for my perspective on the difference, I believe it is a deliberate choice made with full knowledge of the difficulty drop. For starters, there are (with regards to homework help) no TAs in this course, so the absolute difficulty would have to decline to create an equivalent experience. More significantly, the enrollment has increased by a factor of about 700. If Stanford students had trouble with the original, you can bet that the median student in the course doesn't find it as easy as either of us does. If the goal is to generate the greatest benefit for the most people, and delivering the algorithms with a good intuition on their proper use will do so, then this course has succeeded marvelously. Of course, the smartest and most dedicated students will want more, which remains available through textbooks as well as the original course handouts (http://cs229.stanford.edu/materials.html). However, I would argue that the goal of most MOOCs (massive open online courses) should be to kindle interest and foster basic understanding, both of which the Coursera version achieves.


(slightly old) lecture videos for CS 229: http://www.youtube.com/course?list=ECA89DCFA6ADACE599


Hi,

I am also taking the course by Andrew Ng and understand your complaint that the programming assignments aren't as interesting (from your perspective). Being quite comfortable with linear algebra, I was able to complete the assignments easily.

But when I go through the course forums, I find that for many people taking the course, the intuition behind the use of linear algebra in ML doesn't come as easy as it does for us. I think when Andrew Ng designed this online course, he must have had those people in mind also. I think he mentions it at the start of the course that it's more about understanding the concepts and the implementation details should come later. The programming exercises are designed keeping that in mind, I think.

I tried to make the programming exercises interesting for myself, by first thoroughly understanding the code that they had provided and tweaking it here and there. Once you have done that, you could apply what you've learnt on real world datasets from sources like Kaggle and see how you fare :)


> My one complaint is that the programming assignments weren't interesting at all. The results were interesting, but the setups were mostly given to us, and we just had to code an algorithm that was in our notes. For someone who understands the basics of linear algebra and programming, it was just a syntax challenge, and that got irritating after a bit so I stopped doing them.

I agree with this. The programming assignments I've done so far in the Machine Learning class are usually 5-7 Matlab functions, many of which are about 2 lines of code (the longer ones might be ~10 lines). If you've ever done Matlab/Octave programming, the assignments will take about 20-30 minutes and be completely unenlightening, as you're literally just translating mathematical notation into Matlab (which is, by design, already a lot like mathematical notation anyway). They provide entirely too much skeleton code to learn anything from unless you're actively trying to learn. If I weren't already mostly familiar with most of the material presented in the class, I imagine I would never retain knowledge of how the machine learning "pipeline" works or have any high-level understanding of the algorithms, because the assignments just require you to implement the mathematical pieces of each step, without ever asking you to, for example, actually call any optimization routines or put the pipeline together.
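To make the "translating notation" point concrete, here's a rough sketch in Python/NumPy (the class itself uses Octave, so this is my own transliteration, not course code) of the kind of one-liner the assignments ask for: the vectorized logistic regression cost, J = -(1/m) Σ [y log h + (1-y) log(1-h)] with h = sigmoid(Xθ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Vectorized logistic regression cost:
    J = -(1/m) * sum(y*log(h) + (1-y)*log(1-h)), with h = sigmoid(X @ theta)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# Tiny example: two samples, two features (first column is the intercept).
X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])
theta = np.zeros(2)
print(logistic_cost(theta, X, y))  # log(2) ~ 0.6931 at theta = 0
```

The whole assignment is often just the body of `logistic_cost`: a line-for-line transcription of the formula from the notes.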

The problem, I think, is that it would just be too difficult to do automatic grading in a way that is reasonably possible to pass if they don't turn most of the work into skeleton code. Since the automatic grading needs nearly exactly matching results, one minor implementation difference in a perfectly good implementation of the algorithm itself (e.g., picking a single parameter incorrectly, picking the optimization termination conditions incorrectly, choosing a different train/dev split, etc.) would make the entire solution completely wrong.


I'm doing the Computational Finance class via Coursera at the moment, and I've done a number of other courses previously.

I agree the programming assignments in the Finance class tend to be too simple. Most of the code is literally handed to you, you just have to understand it well enough to change it. I also understand that even that can be a major challenge if you don't have the background for it.

But I'm choosing to see the class itself as a starting point. It's a framework for my own explorations into the topics. I can do the minimum and get the minimum out of it. Or I can use what's provided as a base and go further.

Take the Coursera Algorithms class, for example: writing code that got the answer was relatively easy, so once that step was done, it became about optimizing the code for my own learning benefit.

It's like any educational process: you get out what you put in.


Right, you can get more out of the assignments if you try, but to me the purpose of assignments (versus passive learning - lectures, reading, etc.) is to force your brain to synthesize rather than just comprehend. The ideal assignment, then, is one that forces you to synthesize as many of the concepts it intends to teach as possible.

Just like you could go back and implement for yourself the skeleton code they handed you, you could also go out and implement everything in the lectures without any assignments at all. It's just that, like you said, the assignments provide a useful starting point. And I'm only saying they could be even more useful by requiring you to implement more of the complete pipeline.

The fact that an incredibly self-motivated person could learn everything there is to know about machine learning with the course as a starting point doesn't mean that it's bad to make the course more useful for a somewhat lazier or less interested person.


I've noticed this to be the case too with other courses. So for this one, I've decided to implement everything in Scala (I'm currently taking the functional programming course as well). This will work well since this machine learning course requires no code submission, just questions about the results.


I thought about doing it in Scala too, but I thought there might be issues with grading. Do you know if there's an auto-grader for this course?


Try the Learning From Data course: http://work.caltech.edu/telecourse.html A fall run has just started (on the 2nd of October).

It's the same version as the course given at Caltech and is more in-depth than Andrew Ng's. There is no skeleton code for the programming assignments; answers are submitted through quizzes. I took the summer session and learned a lot from it.


Great. Do you get a completion certificate at the end of the course?


I took the course in the spring and found it interesting, and the programming assignments fairly easy. This summer I took the ML course that Caltech offered, which was significantly more challenging (the homework assignments were multiple choice, but they often required writing substantial code, without any starter code). The Caltech course is now available on iTunes U...


I took CS229 here at Stanford and I was also one of the TAs for the online version last year (I was one of 2.5 people involved with making the programming assignments).

First, the Stanford CS229 version is definitely much more difficult than what you guys had online. The focus in the actual class was on the math, derivations and proofs. The homeworks sometimes got quite tricky and took a group of us PhD students usually about 2 days to complete. There was some programming in the class but it was not auto-graded so usually we produced plots, printed them out, attached the code and had it all graded by TAs for correctness. The code we wrote was largely written without starter code and I do believe you learn more this way.

An online version of the class comes with several challenges. First, you have to largely resort to quizzes to test students (instead of marking proofs, derivations, math). There is also no trivial way to autograde resulting plots, so everything has to be more controlled, standardized and therefore include more skeleton code. But even having said all that, Andrew was tightly involved with the entire course design and he had a specific level of difficulty in mind. He wanted us to babysit the students a little and he explicitly approved every assignment before we pushed it out. In short, the intent was to reach as many people as possible (after all, scaling up education is the goal here) while giving a good flavor of applied Machine Learning.

I guess what I mean is that you have more experience than the target audience that the class was intended for and I hope they can put up more advanced classes once some basics are covered (Daphne Koller's PGM class is a step in this direction). But there are still challenges with the online classes model. Do you have ideas on how one can go beyond quizzes, or how one can scale down on the skeleton code while retaining (or indeed, increasing) the scale at which the course is taught?


I think peer-graded assignments might do the job. I am taking the Gamification course on Coursera right now, and I liked the peer-graded assignments a lot.

If there were peer-graded assignments in the machine learning course, I would definitely have tried them out.


> The results were interesting, but the setups were mostly given to us, and we just had to code an algorithm that was in our notes.

Right; I agree. I'm not sure how they would go about making it more challenging, though. They can't expect us to go out and collect data ourselves, after all. I suppose they could give us the data, then expect us to code the setup and algorithms ourselves, but that, too, would become repetitive after a few assignments.

> Not to knock the course. I've learned a lot and the professor (Andrew Ng) does a good job.

Agreed once again. I knew nothing about machine learning before starting; now I know about neural networks, SVMs, and PCM. It's really cool how much I've learned already, for free, too!

I've also signed up for this course, but the quizzes really aren't up to par. As an example: the first quiz question was about training a neural network with too much data, and about whether or not said network would be able to generalize to new test cases. Overfitting neural networks wasn't even mentioned in the lectures; I had to rely on material from Andrew's class to answer the question correctly. This chasm between the lectures and the quizzes is likely because Geoffrey is the one creating the video lectures, but he's not the one creating the quiz questions; he is having TAs do it [1].

Nevertheless, it looks like they're responding to feedback, so hopefully it'll get better with time.

1. https://class.coursera.org/neuralnets-2012-001/wiki/view?pag...


PCM? Do you mean PCA (Principal Component Analysis)?


> I'm positive they wouldn't complain about not getting to retake quizzes after getting poor grades.

My experience is that students everywhere complain about grading. I've never been to Stanford, but I've attended and worked at several other top tier universities.


Hinton is a huge figure in the neural network literature and an important researcher in deep learning. After going through the first week of lectures, I can say he's also an excellent teacher.

The syllabus, draft though it is, indicates the second half of the class will focus on deep learning, a field of machine learning that has demonstrated huge potential.


Just browsing through the Coursera computer science listings, it looks like they are rapidly approaching the point where you could put together a CS curriculum superior to what you could get at any single school. The people they have teaching a lot of these topics are some of the best in the world in their fields. The Michael Collins NLP course looks really thorough and up to date, for example; I took a similar course a few years ago, and I remember reading papers written by him.

As has been said by many already, of course, the remaining nuts to crack are high quality interaction with other students, professors, and TAs; and accreditation.

But the dis-intermediation of large universities may be nearer than we think.


An attempt to design a reasonable computer science curriculum using just Coursera courses, where “reasonable” is a curriculum that roughly mirrors the coursework required for a four-year university computer science degree: http://www.thesimplelogic.com/2012/09/24/you-say-you-want-an...


The only real problem with Coursera is that everyone is posting their solutions to GitHub, so it's going to be impossible for them to prevent cheating. I agree with you, though, that the flexibility it offers is amazing.


There is an interesting practical question here. Why cheat?

If you are taking a class voluntarily over the Internet, what benefit would be gained by cheating? I presume that a large fraction of people who are doing volunteer coursework are doing it to learn, not to keep a GPA up for some other reason (sports eligibility, scholarship requirements, parental expectations, etc.), so looking at other solutions on GitHub might actually enhance the experience for you. If you find a way to do it better than the other solutions, that could be a goal in itself.

This is one of those things I find most intriguing about 'free' classes on the Internet, the value equation is shifted around.


It depends on the purpose of your education. In an ideal world, it would be just to learn, but I think employers at some level look at grades/school as a qualification process.


I see where you are coming from, but were I interviewing you, I would never even think to wonder about a self-reported grade in a volunteer class. If the topic was important to the position, I'd ask you to talk about it and tell me what you learned. I would hope I could spot you trying to feed me a line.

At the end of the day, as an employer, I am looking for 'learners' not 'cheaters.' If it turns out that an employee's personality/choices lean toward the cheating side I try to manage them out of the organization as smoothly as I can.


If it were a traditional class at a traditional school you'd just assume they actually learned it?


Hmm, that is a fair question. I think I would give more weight to a class if they took it when they didn't have to, rather than having taken it as part of a requirement for a degree.


If you are never going to tell future employers, etc., that you took the courses, there is no reason to cheat. If you plan on adding these things to your resume, with numbers associated with them, there is definitely an incentive to cheat. Most people on the site right now are there just to learn, but Coursera is hoping the latter will eventually happen. It is a catch-22 ... sort of.


People are already complaining that you can only take the quizzes once ... he had to send out an email today to everyone saying:

"Many of you are unhappy with only being allowed to attempt a quiz once. Starting in week two, we have therefore decided to make up twice as many questions and to allow you to do each quiz twice if you want to. The second time you try it the questions will all be different. Your score will be the maximum of your two scores. For week one, the quizzes will remain as they are now.

Many of you would like the names of the videos to be more informative. We will change the names to indicate the content and the duration.

Some of you thought that some of the quiz questions were too vague. We will try to make future questions less vague.

Some of you are unhappy that we do not have the resources to support Python for the programming assignments. We sympathize with you and would do it if we could. You are still welcome to use Python (or any other language) if you can port the octave starter code to your preferred language. We have no objection to people sharing the ported versions of the starter code (but only the starter code!). However, if you get starter code in another language from someone else, you are responsible for making sure it does not contain bugs."

I thought that was pretty funny!


Yeap we got spoiled with earlier classes: Algorithms by Tim Roughgarden, Machine Learning by Andrew Ng, and many more. We probably need to follow a class on gratitude.

Oh well, to be fair I would donate quite a lot for each course that I enjoyed.


Actually all the entitled bitching and moaning on the ML class forum was by far the biggest turnoff of the whole experience for me. I was much happier after ignoring it and my "classmates" entirely.


The only course that is not significantly diluted is Koller's PGM. All the others have been dumbed down to a degree where they provide no challenge to the course-taker at all.


It is not such a huge problem when you take several courses at once. Sadly, they run them only twice a year; each time I try to follow as many as possible. I cannot follow PGM because it requires too much of my time; I'd have to abandon 2 or 3 other courses. YMMV.


I'm looking at the same problem at the moment. PGM sounds really interesting, but I think that the time investment just isn't going to be workable for me unless I drop several of my other classes. My current plan is to watch the PGM videos and try to keep up with the programming assignments as long as I can, but if it comes down to a choice of one or the other, PGM will be the one to go.

As far as "dumbing down", I've found that the Coursera classes that I've taken (Compilers, Automata Theory, Algorithms 1, SaaS and Machine Learning) have varied in difficulty quite widely. Compilers and Automata were both challenging and enjoyable, Algorithms 1 was about what I'd expect from a freshman/sophomore algorithms class and SaaS and Machine Learning were easy enough that they should be approachable to anyone with basic programming experience.

I don't feel that the difficulty in the classes that I've taken had any particular correlation with teaching effectiveness. I found Andrew Ng's ML class to be simple, but still interesting and informative - you come out of it with enough of a basic understanding to implement simple ML techniques as well as a place to start if you wish to learn more. I think that while a theory-centric class would be a nice thing to have, he's done an amazing job of making a class that can appeal to a wide range of potential students and introduce them to a field that's usually very difficult to approach.


> Neural Networks are gradually taking over from simpler Machine Learning methods

And haven't SVMs and such gradually taken over from Neural Networks?


And RandomForests taken over from SVMs ;)

In seriousness when you look around at what's happening both in practice and in academia I would say RandomForests/SVM/Neural Networks all stand pretty equally and have different strengths. If you've just got rows and rows of data with numeric, categorical and missing values it's hard to beat the speed and quality of shoving it in a RandomForest. However to my knowledge SVMs are still better at solving NLP categorization tasks and handling sparse, high dimensional data. And Neural Networks always seem to be popping up solving very weird and/or hard problems.


Well, not quite. SVMs gained a lot of popularity for having nice properties, e.g.:

1) a convex problem, which means a unique solution, and a lot of already existing technology can be used

2) the "kernel trick", which enables us to learn in complicated spaces without computing the transformations

3) they can be trained online, which makes them great for huge datasets (here point 2) might not apply - but there exist ways - if someone's interested I can point out some papers)

There is an ongoing craze about deep belief networks, developed by Hinton (who is teaching this course), who came up with an algorithm that can train them (there exist local optima and such, so it's far from ideal). Some of the reasons they're popular:

1) They seem to be the winning algorithm for many competitions/datasets, ranging from classification in computer vision to speech recognition and, if I'm not mistaken, even parsing. They are, for example, used in the newer Androids.

2) They can be used in an unsupervised mode to _automatically_ learn different representations (features) of the data, which can then be used in subsequent stages of the classification pipeline. This makes them very interesting because while labelled data might be hard to come by, we have a lot of unlabelled datasets thanks to the Internet. For what they can do, see the work by Andrew Ng's group, which automatically learned a cat detector.

3) They're "similar" to biological neural networks, so one might think they have the necessary richness for many interesting AI applications.
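Point 2), the kernel trick, is easy to see on data that is not linearly separable. A sketch with scikit-learn's SVC (my choice of library, purely illustrative): two concentric rings, where a linear kernel fails but an RBF kernel separates them without us ever computing the higher-dimensional feature map:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Two concentric rings (radii ~1 and ~3): no straight line separates them.
n = 200
radius = np.r_[np.full(n, 1.0), np.full(n, 3.0)] + 0.1 * rng.randn(2 * n)
angle = rng.uniform(0, 2 * np.pi, 2 * n)
X = np.c_[radius * np.cos(angle), radius * np.sin(angle)]
y = np.r_[np.zeros(n), np.ones(n)]

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)  # implicit feature map via the kernel

print(linear.score(X, y))  # well below perfect: a line can't split the rings
print(rbf.score(X, y))     # near-perfect separation
```

The RBF kernel implicitly maps the points into a space where the rings become separable; we only ever evaluate kernel values, never the mapped coordinates.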


Enlightening response! Could you please post links to papers that explain online training of SVMs?

Also, I found this paper [1] on unsupervised feature detection; if you have some additional material, I'd really appreciate it if you could post it!

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.44....


The way it's worded is not 100% clear. Hinton, who is an excellent lecturer and explainer, is talking about neural nets trained with "deep learning" techniques (not vanilla single-hidden-layer nets), which have had striking success at hard vision problems that have been difficult to solve top-to-bottom with SVMs (e.g., you could get good performance from an SVM, but you'd have to go on a hunt for good low-level features first).

That said, there is a rather unhelpful herd mentality in the field, with people moving from one Next Big Thing to another, disparaging the previous Big Thing along the way.


Here's the problem: There is no silver bullet in Machine Learning and many of these approaches (SVMs, Neural Nets, Random Forests, PGMs, etc.) have their pros and cons that depend on many variables, for example:

- How much data do you have wrt dimensionality?

- How "easy" do you suspect your problem to be? Is it likely linearly separable? Equivalently, how good are your features?

- Do you have mixed data types? Missing data? Categorical/binary data mixed in? (Better use a forest, perhaps!)

- Do you need training to be very fast?

- Do you need testing to be very fast on new out of sample data?

- Do you need a space-efficient implementation?

- Would you prefer a fixed-size (parametric) model?

- Do you want to train the algorithm online as the data "streams" in?

- Do you want confidences or probabilities about your final predictions?

- How interpretable do you want your final model to be?

etc. etc. etc. Therefore, it doesn't make any sense to talk about one method being better than another.

One thing I will say is that, as far as I am aware, Neural Nets have a fair amount of success in academia (which should be taken with a grain of salt!), but I haven't seen them win too many Kaggle competitions, or other similar real-world problems. SVMs or Random Forests have largely become the weapon of choice here.

Neural Nets do happen to be very good when you have a LOT of data in relatively low-dimensional spaces. Many tasks, such as word recognition in audio or aspects of vision fall into this category and Google/Microsoft and others have incorporated them into their pipelines (which is much more revealing than a few papers showing higher bars for Neural Networks). In these scenarios, Neural nets will parametrically "memorize" the right answers for all inputs, so you don't have to keep the original data around, only the weighted connections.

Anyway, I wrote a smaller (and related) rant on this topic on G+: https://plus.google.com/100209651993563042175/posts/4FtyNBN5...


Well not quite. While SVMs gained a lot of popularity for having nice properties e.g.

1) a convex problem which means a unique solution and a lot of already existing technology can be used

2) the "kernel trick" which enables us to learn in complicated spaces without computing the transformations

3) can be trained online, which makes them great for huge datasets (here the point 2) might not apply - but there exist ways - if someone's interested I can point out some papers)

There is an ongoing craze about deep belief networks, developed by Hinton et al. (Hinton is teaching this course), who came up with an algorithm that can train them reasonably well (there exist local optima and such, so it's far from ideal). Some of the reasons they're popular:

1) They seem to be the winning algorithm for many competitions/datasets, ranging from classification in computer vision to speech recognition and, if I'm not mistaken, even parsing. They are, for example, used in the newer Androids.

2) DBNs can be used in an unsupervised mode to _automatically_ learn different representations (features) of the data, which can then be used in subsequent stages of the classification pipeline. This makes them very interesting because while labelled data might be hard to come by, we have a lot of unlabelled datasets thanks to the Internet. For what they can do, see the work by Andrew Ng's group, which automatically learned a cat detector.

3) DBNs are "similar" to biological neural networks, so one might think they have the necessary richness for many interesting AI applications.


> SVMs ... 3) can be trained online, which makes them great for huge datasets (here the point 2) might not apply - but there exist ways - if someone's interested I can point out some papers)

Please do. I want to read up on SVMs since I haven't heard that much about them.


I am not an expert in SVMs, but I consider myself fairly experienced in machine learning. In my professional experience the answer to your question is 'not quite'. SVMs have solved some problems very well, but I've had issues with them:

1. They are only for classification, not every problem is classification. The other big category is regression, for example predicting the sale price of a home rather than predicting a binary "will it sell"

2. They don't have a natural probabilistic interpretation for classification. Neural networks for classification (with a logistic activation function) are trained to predict a probability, not make a simple binary decision. In practice this probability is usually very useful, although I believe SVMs have been modified to give some kind of probability.

3. I have had a tough time getting them to run quickly. Linear kernel SVMs are fast, but aren't powerful. More complex kernels are more powerful but can be very slow on moderately large datasets.


SVMs are very much used for regression as well:

http://scikit-learn.org/stable/modules/svm.html#regression

Note: the scikit-learn implementation of SVMs is based on libsvm:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/
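A minimal SVR sketch with scikit-learn, to show the regression usage (parameter values are illustrative, not tuned): the `epsilon` parameter defines the tube within which residuals are simply ignored.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)

# 1-D regression target: a noisy sine wave on [0, 5].
X = np.sort(rng.uniform(0, 5, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

# epsilon is the half-width of the insensitive "tube": residuals smaller
# than epsilon contribute nothing to the loss.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(model.predict([[np.pi / 2]]))  # close to sin(pi/2) = 1
```

Same kernel machinery as classification, just with the epsilon-insensitive loss in place of the hinge loss.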


Interesting; a quick glance at a paper on SVRs indicates they kind of work in the opposite manner of an SVM: in an SVM you try to maximize the number of points far away from the separator (taking class into account), whereas in regression you are trying to minimize this.

Do you have much background using them? I'm curious how they perform on real-world tasks.


Yeah, there's the SVR "pipe" concept, where you attempt to fit the margin s.t. points are close to it. It's a great alternate use of SVM's obj. function optimization.

I haven't really used SVRs aside from some exploratory work, so I can't speak too much about them. But I know they exist!


For 1., you can definitely modify an SVM to be used for regression; as far as I know, most standard SVM libraries have support for regression, and I have personally used them very successfully for this task. [0]

2. There are actually ways you can modify the output of an SVM to give a probabilistic interpretation [1]. But I'll agree with it not having a 'natural' probabilistic interpretation.

3. is definitely correct, but I'm not sure NNs are that much better.

[0] http://www.svms.org/regression/

[1] http://www.cs.colorado.edu/~mozer/Teaching/syllabi/6622/pape...
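On point 2, a sketch of what such a modified output looks like in practice, using scikit-learn's SVC (my choice of library): `probability=True` fits a sigmoid to the SVM's decision values via internal cross-validation (Platt-style scaling), bolting a probabilistic interpretation onto an otherwise non-probabilistic model.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Two well-separated Gaussian blobs centered at (-2, -2) and (+2, +2).
X = np.r_[rng.randn(100, 2) - 2, rng.randn(100, 2) + 2]
y = np.r_[np.zeros(100), np.ones(100)]

# probability=True fits a sigmoid (Platt scaling) to the decision values
# via internal cross-validation; it's post-hoc calibration, not a native
# probabilistic model.
clf = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)
print(clf.predict_proba([[3.0, 3.0]])[0])  # heavily weighted toward class 1
```

A point deep inside the class-1 blob gets a high class-1 probability, but note the calibration is only as good as the held-out decision values it was fit on.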


I took Professor Hinton's course on Neural Networks as an undergrad. This man is the most intelligent person I have ever met. He is one of the giants.


Did it already start? Is it too late to start?

Also, I took an NN class in college, so do you think I would get much more out of this?


It just started on Monday; there's plenty of time to join in.

There have been some huge developments in neural networks in the last few years, particularly with respect to deep learning. If you missed out on that you might want to try this class. Hinton has been involved in many of these advances.

The second half of the course appears to focus on deep learning topics so you might want to start there if you already know the basics.


You can't start mid-way through ... right?


You'll have to wait until those lectures are made available, but you don't have to complete the previous work to see the lectures.


I tried to do a couple of Coursera courses and found the video lectures highly inefficient: needlessly time-consuming, even when watching them sped up. All I really want is a glorified textbook with quiz grading and a final.


It sounds terribly privileged to say so, but I'm afraid I have to agree. Also, quite often the quizzes are directly based on the videos ("What did line A represent in ~ graph?"), while I find I learn better through reading.


It depends on how much you value your time. Lectures usually are shorter than 2 hours per week. (There are some courses with longer videos, but I think more than 2 hours is suboptimal.) I know that before these courses I was wasting this time on Hacker News or Reddit, so I didn't value my time that much. On the other hand, I do now, and that's because I need to watch the lectures and do the homework. And really, these lectures perform the same role in the learning process as real lectures. You could graduate from university with only textbooks, but you might not get some of the insight that lecturers have.

My 0.02 chf.


this. No offense to these professors, but what are they presenting in their video lectures that I can't garner from their writing?


Humanity. Which, believe it or not, makes a huge difference in learning subjects.



