I worked with this a bit at UMass; it's not bad at all. Also from UMass, be sure to check out Factorie, the probabilistic factor graph framework in Scala.
I am currently using this toolkit and I must say that I really like it.
The main advantages of Mallet over Weka (the main Java toolkit used in academic machine learning) for Natural Language Processing are:
- No need to map words and features to positions in a feature vector yourself.
- Instance preprocessing can be defined in pipes that are saved along with the models, so there is no need to remember the pre-processing steps for each experiment (see the sketch after this list).
- Contains algorithms for structured learning (CRFs, HMMs, and general graphical models).
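To make the pipes point concrete, here is a rough sketch of what importing labeled text looks like with Mallet's pipe API (written from memory, so treat the class names, constructors, file name, and regex as approximate placeholders rather than gospel):

    import cc.mallet.pipe.*;
    import cc.mallet.pipe.iterator.CsvIterator;
    import cc.mallet.types.InstanceList;
    import java.io.*;
    import java.util.ArrayList;
    import java.util.regex.Pattern;

    public class ImportDemo {
        public static void main(String[] args) throws Exception {
            // Each pipe is one preprocessing step; the word->index alphabet
            // is built automatically as instances flow through the chain.
            ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
            pipeList.add(new Target2Label());                  // class string -> label index
            pipeList.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}+")));
            pipeList.add(new TokenSequenceLowercase());
            pipeList.add(new TokenSequence2FeatureSequence());
            pipeList.add(new FeatureSequence2FeatureVector());
            Pipe pipe = new SerialPipes(pipeList);

            // Lines of the form "name label text...", one instance per line.
            InstanceList instances = new InstanceList(pipe);
            instances.addThruPipe(new CsvIterator(
                    new FileReader("train.txt"),
                    Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
                    3, 2, 1));                                 // data, target, name groups

            // The pipe (and its alphabets) is serialized together with the
            // instances, so the exact same preprocessing can be reused later.
            instances.save(new File("train.mallet"));
        }
    }

A model trained on those instances can then be applied to new text through the same serialized pipe, which is what the "no need to remember the pre-processing" point is about.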
On the other hand, Mallet implements fewer algorithms (e.g. no Support Vector Machines, to my knowledge).
In short, it is a nice toolkit to be aware of if you are planning to do Natural Language Processing.
Definitely depends on your math background (knowledge of analysis and linear algebra seems to be particularly helpful).
Witten's Data Mining is a very good beginner's book (has very little math, but lots of good explanations and discussion of real life issues).
Bishop's book is excellent, but it's easy to get lost if you don't have the mathematical background.
Duda, Hart & Stork's Pattern Recognition book is also very well organized and has one of the best first chapters in any machine learning book. But it too requires mathematical background to be fully appreciated.
Hastie & Tibshirani's book is written by people from a statistical background, and is very very mathematical. I haven't progressed beyond Chapter 2, and I'm working on improving my math skills before I get back into it.
--
For NLP, a very good intro is the NLTK book.
Jurafsky & Martin's book covers more NLP topics, but Manning and Schutze cover the statistical portions in more depth. I think you should just read both :D.
Simple things, like actually spelling out the steps of a derivation instead of "here's equation a, from which we trivially derive equation b", where it might take you a while to figure out the steps yourself if you haven't done a lot of calculus problems recently.
And Mitchell just seems really good at explaining things (having heard him speak in person a few times).
If you later want to drill down more into reinforcement learning specifically, check out "Reinforcement Learning: An Introduction" by Sutton & Barto, full version available online in HTML form:
I'm currently reading 'Machine Learning: An Algorithmic Perspective' by Marsland. He uses Python and assumes no higher math knowledge. I would recommend it for getting a basic, intuitive understanding. He explains things in English, then presents the math notation, then explains the math notation in English again. Then he provides Python code, and explains the Python code in English. Radical, I know.
The primary difference is that web frameworks are much better documented, with real-world examples, because they are written by people in industry trying to make their real jobs easier, not by university grad programs. The examples users write rarely make it onto the search engines, if they are released at all (see below).
Most ML frameworks out there are stuck in academic-land and assume their users are experts, when the exact opposite is usually true; they seem to use the most opaque language possible when describing usage.
ML is still a consulting gold mine because it's so difficult to wade past the jargon and bullshit to actually do something useful/profitable with these frameworks.
http://alias-i.com/lingpipe/index.html
http://gate.ac.uk/
http://rapid-i.com/content/view/181/190/
http://elefant.developer.nicta.com.au/
(tanagra, weka, orange, depending on what you're looking for)