I got into machine learning through an article off HN stating that random forests would get you 80% of the way (I think they were right!). For my purposes, rotation forest increased my accuracy considerably.
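For anyone following along, here is roughly what that 80%-of-the-way baseline looks like. This is a minimal sketch using scikit-learn with synthetic data; the dataset and parameters are illustrative, not from my actual project:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An off-the-shelf random forest: little tuning needed for a decent baseline.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # held-out accuracy
```

Rotation forest isn't in scikit-learn, but the workflow is the same: fit, score on held-out data, compare.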
I have a few questions:
1. I have found that data manipulation and feature creation from a SQL database are harder than actually using an algorithm, and knowing how to extract and aggregate data felt more like "throw something at the wall and see what sticks." Do you have any suggestions or information on how to extract the best data?
2. After getting a random forest going, I had a hard time figuring out which algorithm to try next, or how to tell what would work best for my dataset. Any suggestions on how to take the next step?
1. Use what correlates best with the outcomes. Look into feature selection and principal component analysis for this. Smaller feature vectors mean less noise and more digestible outcomes. I would also highly recommend visualization. Weka is great if you want plug and play; otherwise there's the more traditional R/MATLAB. It really depends on what you're comfortable with.
2. Depends what kind of learning you're doing. For supervised classification with more than one class, I would look into multinomial logistic regression for most applications. Then there's also k-means if you're looking to understand trends in your data. Keep in mind this is my off-the-shelf/simple recommendation.
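To make point 1 concrete, here's a minimal sketch of both techniques in scikit-learn, on synthetic data (the dataset sizes are arbitrary, just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

# 30 raw features, only 5 of which actually carry signal.
X, y = make_classification(n_samples=300, n_features=30,
                           n_informative=5, random_state=0)

# Feature selection: keep the 5 features that correlate best
# with the outcome (univariate F-test).
selected = SelectKBest(f_classif, k=5).fit_transform(X, y)

# PCA: project onto the top 5 principal components instead.
reduced = PCA(n_components=5).fit_transform(X)

print(selected.shape, reduced.shape)  # both shrink 30 features down to 5
```

Either way you end up with a smaller, less noisy feature vector to feed the classifier.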
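And for point 2, the two off-the-shelf recommendations side by side, again a hedged sketch on synthetic blobs rather than a real dataset:

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Three well-separated clusters standing in for three classes.
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Supervised: logistic regression over 3 classes
# (recent scikit-learn handles the multinomial case by default).
logreg = LogisticRegression().fit(X, y)

# Unsupervised: k-means to surface trends/groupings without labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(logreg.score(X, y), len(set(km.labels_)))
```

Same data, two questions: "which class is this?" versus "what structure is in here at all?"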
I would love input on a plug-and-play machine learning CLI. I planned on building my current project into a full-blown command line app. Since it can handle most features, including automatic visualization/debugging via matplotlib, I figure that with some documentation it might be a neat tool for people who don't want to deal with feature selection but still want things simple. It's definitely a problem that there's no clear way to build simple models. Domain knowledge is also an expensive problem.
That aside, you really have to take it in bits. Ignoring the math or fundamentals behind it is by far the worst mistake you can make.
Once you get decent at understanding it, the points I emphasized (feature vector building) become much less of a problem with deep learning (http://deeplearning.net/).
Auto-learned feature vectors are going to be among the best ways to do things in the coming years. More than happy to answer questions.