There are so many of these that you'll have to give more of an explanation of what makes this different, other than "it's for hackers, not scientists" (whatever that means).
Also it seems to use the standard method of stacking layers, rather than allowing you to describe an arbitrary computational graph, which seems (to me) to be the far superior method (see CNTK's Network Description Language).
I think CNTK and Tensorflow and Theano, too, have a declarative approach, representing the computation via a computational graph, which in my opinion is beneficial for research. But for a hacker or a software developer who wants to build an application, this creates an unnecessarily steep learning curve and feels unintuitive. (I have the feeling that this is an important reason why Keras, Lasagne and co. exist.)
Leaf takes an imperative approach and explores an easier API (only Layers (Functions)[1] and Solvers (Optimizer Algorithms)), reusability through modularity, and abstractions that keep the implementation and concepts to a minimum, or rather abstractions that feel as familiar to a hacker as possible.
In future versions we want to explore, for example, what is practically possible with auto-differentiation via dual numbers and differentiable programming.
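To make the dual-number idea concrete, here is a minimal sketch of forward-mode auto-differentiation in Rust. It is purely illustrative and not part of Leaf's API; the `Dual` type and its methods are invented for the example.

```rust
// Forward-mode auto-differentiation via dual numbers: each value carries its
// derivative with respect to one chosen input, propagated by the usual rules.
#[derive(Clone, Copy, Debug)]
struct Dual {
    val: f64, // value of the expression
    der: f64, // derivative with respect to the chosen input
}

impl Dual {
    fn constant(x: f64) -> Dual { Dual { val: x, der: 0.0 } }
    fn variable(x: f64) -> Dual { Dual { val: x, der: 1.0 } }
    fn add(self, rhs: Dual) -> Dual {
        Dual { val: self.val + rhs.val, der: self.der + rhs.der }
    }
    fn mul(self, rhs: Dual) -> Dual {
        // product rule: (uv)' = u'v + uv'
        Dual { val: self.val * rhs.val, der: self.der * rhs.val + self.val * rhs.der }
    }
    fn sigmoid(self) -> Dual {
        // chain rule: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
        let s = 1.0 / (1.0 + (-self.val).exp());
        Dual { val: s, der: self.der * s * (1.0 - s) }
    }
}

fn main() {
    // Differentiate sigmoid(w * x + b) with respect to w, at w = 0.5, x = 2.0, b = 1.0.
    let w = Dual::variable(0.5);
    let x = Dual::constant(2.0);
    let b = Dual::constant(1.0);
    let y = w.mul(x).add(b).sigmoid();
    println!("value = {:.4}, d(value)/dw = {:.4}", y.val, y.der);
}
```

The appeal for an imperative API is that the derivative falls out of ordinary function calls, with no separate graph-construction step.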
My understanding is that Google developed Tensorflow to provide a single pipeline between the data science model and a production system. The idea is to avoid a translation step between modeling the real world and implementing it in production.
Of course, most people don't have the resources of Google with a layer of data scientists and another layer of software engineers [and maybe a layer of data engineers in the mix too]. So the idea of a tool tailored to a small team's needs rather than those of Google seems like an interesting niche.
A bit off-topic: does anyone know a good resource about when to use neural networks, when genetic algorithms, when Bayesian networks, ...? I know the basics of some of these algorithms and I could implement them (with some googling), but I wouldn't know which one to choose for a real-world application. Is there some kind of overview of the strengths and weaknesses of different AI approaches?
However, if you want to really understand how things fit together, you're probably best off reading one of the standard intro textbooks: Murphy's Machine Learning, Bishop's Pattern Recognition and Machine Learning, Hastie et al.'s The Elements of Statistical Learning, or Wasserman's All of Statistics.
Straight off the bat you can split machine learning algorithms by whether or not you need to be able to see how a decision has been made. For example, neural networks are probably really good at diagnosing patients; however, because they are black boxes, they require a great deal of trust (in real life), whereas decision trees can show the path taken to reach a diagnosis.
That particular advantage of decision trees is lost when they're part of an ensemble classifier, which is unfortunate, since their performance is more reliable in those setups.
That issue only matters for legal reasons, and even then it's entirely speculation what an actual court would decide.
If you actually cared about your patients, then you would use whatever method has the highest accuracy. False predictions mean injury or death. Using a suboptimal method means people die.
The best of both worlds is to use whatever model gets the best predictions, then train another, understandable model on the output of the first one. I.e., generate random data and see what predictions the good model makes. Then the understandable model has infinite data to train with and doesn't need to worry about overfitting (see the sketch below).
But still, the utility of being able to understand the model is limited. It's just a big set of parameters, without any reasoning or explanation of why the parameters are what they are.
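A minimal sketch of that distillation idea (the "black box" and the threshold rule below are stand-ins invented for illustration): label as much generated data as you like with the opaque model, then fit a simple, interpretable rule to its answers.

```rust
// Hypothetical sketch of distilling a black-box model into an interpretable one.
fn main() {
    // Stand-in for the accurate but opaque model; in practice this would be the
    // high-accuracy model whose predictions we want to mimic.
    let black_box = |x: f64| -> bool { 0.8 * x + 0.3 > 1.0 };

    // Generate as much synthetic data as we like and label it with the black box.
    let inputs: Vec<f64> = (0..1000).map(|i| i as f64 / 1000.0 * 4.0).collect();
    let labels: Vec<bool> = inputs.iter().map(|&x| black_box(x)).collect();

    // Fit the simplest interpretable model: a single threshold (a one-node
    // decision tree). Pick the threshold that agrees with the black box most often.
    let mut best = (f64::NAN, 0usize);
    for &candidate in &inputs {
        let agreement = inputs
            .iter()
            .zip(labels.iter())
            .filter(|&(&x, &label)| (x > candidate) == label)
            .count();
        if agreement > best.1 {
            best = (candidate, agreement);
        }
    }
    println!(
        "interpretable rule: predict positive when x > {:.3} (agrees on {}/{} samples)",
        best.0, best.1, inputs.len()
    );
}
```

The human-readable rule describes what the black box does, though (as the parent notes) not why its parameters are what they are.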
Your account seems to assume that people will faithfully adhere to the suggestions of any model. In reality, not only have statisticians had a hard time accepting more prediction-centered approaches [1, see comments at end], but these approaches may need to win over practitioners and lay-people in the field in which they are being applied (e.g. how much do doctors value prediction over interpretable parameters).
I like machine learning and prediction-centered approaches, but there are many factors (such as adherence, both by doctors and their patients) that are important here. In a sense, the model needs to take "model type" into account in its predictions, which could lead to a model that predicts disease treatments well but believes it should not be used!
On the (admittedly sciency) machinelearning subreddit I read a while ago that no one is using genetic algorithms anymore. So for a game it would probably be OK, but not for anything else. Just use neural networks.
Machine Learning for _Rust_ Hackers. I had to read one page in to check, but it looks like it's not a cross-language framework - you need to be fluent in Rust to use this.
Cool work! Kudos. Will try this.
FYI - I have been doing a similar project for evolutionary algorithms using Erlang OTP/Elixir. I plug in `Collenchyma` (part of the Autumn architecture) for number-crunching and computationally intensive tasks. I'm curious about how
>distributed optimization of networks
is done. :) (y)
This is not the simplified abstraction I'm looking for. Give me an ML library that exposes its API using metaphors I can understand and relate to, and I'm all over it. Leaf is still using terms and concepts I don't understand.
We think it is more beneficial for collaboration if we stick to the common naming of layers, functions, and concepts rather than using metaphors. We provided two links that should help you get started with those.
But with Leaf it becomes very easy to create modules (Rust crates) that expose layers/networks/concepts, and those can have metaphorical names.
A bit off-topic, but has there been any research done on learn-time topology optimisations? E.g. start with a 4-4-4-4 network, remove some nodes/connections that contribute little during training, and end up with a 4-3-2-4 network with similar accuracy to the 4-4-4-4 one but higher performance?
Intuitively, I think there should be connections activated for less than x% of inputs that could be removed entirely; in some cases this would mean removing a whole node. It would be interesting to read something about such an approach.
That's a basic part of regularization! Weights are "penalized" for being too big and encouraged to be as small as possible. With L1 regularization in particular, this results in many of the weights being exactly zero, which is the same as them not existing at all.
There is some research on pruning networks after training, removing small weights and nodes that don't contribute much. This results in much smaller neural networks, which run better on smartphones or embedded devices.
This isn't done often, though, because on GPUs it doesn't result in higher performance. Because of the way SIMD-style dense kernels work, there's no easy way to take advantage of unstructured sparsity: a synapse of weight 0 still requires a multiplication by zero and an addition of that zero to the sum.
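For what it's worth, the post-training pruning described above is easy to sketch. This is a toy illustration, not tied to any particular library: zero out weights whose magnitude falls below a threshold and count how much of the layer disappears.

```rust
// Toy sketch of magnitude-based pruning after training: weights below a
// threshold are set to zero, i.e. the connection is removed. With a sparse
// representation this shrinks the model on CPUs and embedded devices; dense
// GPU kernels still multiply the zeros, which is the limitation noted above.
fn prune(weights: &mut [f32], threshold: f32) -> usize {
    let mut removed = 0;
    for w in weights.iter_mut() {
        if w.abs() < threshold {
            *w = 0.0;
            removed += 1;
        }
    }
    removed
}

fn main() {
    // Pretend these are the trained weights of one layer.
    let mut weights = vec![0.42, -0.003, 0.0107, -0.91, 0.0004, 0.27, -0.02, 0.65];
    let removed = prune(&mut weights, 0.05);
    println!("pruned {} of {} weights: {:?}", removed, weights.len(), weights);
}
```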
TWEANNs - Topology and Weight Evolving Neural Networks.
Gene Sher's book 'The Handbook of Neuroevolution Through Erlang' presents them, and codes them in Erlang. People have coded them in Elixir, and I am currently trying in LFE (Lisp Flavored Erlang). They can add neurons and connections or take them away based upon a GA approach (Genetic Algorithms). A big book, but that is because it covers the material thoroughly, and with lots of explanations. I have been reading NN books since the early 90s, and this one would have been a great format back then.
I don't think this has been tried. What would be the heuristic for removing connections/nodes? It does seem to me, though, that the heuristic should be "learn from the data using backprop" and that you're essentially calling for L1 regularization (which encourages sparsity), or L2 regularization (which encourages small weights), which are already pretty standard in machine learning.
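As a quick, generic illustration of the difference (not Leaf-specific; the function names are made up): an L1 penalty pulls a weight toward exactly zero by a fixed amount each step, while an L2 penalty only shrinks it proportionally.

```rust
// Sketch of one training step for a single weight with learning rate `lr`
// and penalty strength `lambda`. The data-loss gradient is passed in separately.

// L2 ("weight decay"): the penalty gradient 2*lambda*w shrinks the weight
// proportionally, so it becomes small but rarely exactly zero.
fn l2_step(w: f64, data_grad: f64, lr: f64, lambda: f64) -> f64 {
    w - lr * (data_grad + 2.0 * lambda * w)
}

// L1 via soft-thresholding (the proximal update): take the plain gradient step,
// then pull the weight toward zero by lr*lambda, snapping it to exactly zero
// once it is that close. This is what produces sparse weights.
fn l1_step(w: f64, data_grad: f64, lr: f64, lambda: f64) -> f64 {
    let w = w - lr * data_grad;
    let shrink = lr * lambda;
    if w.abs() <= shrink { 0.0 } else { w - shrink * w.signum() }
}

fn main() {
    // Data gradient set to zero to isolate the effect of the penalties.
    let (lr, lambda) = (0.1, 0.5);
    let (mut w1, mut w2) = (0.3_f64, 0.3_f64);
    for _ in 0..10 {
        w1 = l1_step(w1, 0.0, lr, lambda);
        w2 = l2_step(w2, 0.0, lr, lambda);
    }
    // The L1 weight lands on exactly 0.0; the L2 weight is small but nonzero.
    println!("after 10 steps: L1 weight = {:.4}, L2 weight = {:.4}", w1, w2);
}
```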
Well, it has a bias towards NNs and deep learning for now, as this was our initial focus for the proof of concept, but the architecture of Layers and Solvers should allow it to express any machine learning concept/algorithm. We are actually working on (verifying) that with James from rusty-machine[1][2].
The thing is that since "hacker" is a cool word, people tend to try to look cool by bending reality. It's the same as renaming ECMAScript to JavaScript because Java is cool. I do believe that having the curiosity and passion to modify the inner workings of things could be called hacking, but the banalization of the term is sad, as in "a project for hackers". Sorry, but you don't need to be a hacker to use it. If this project were necessary to accomplish something at your work, you'd learn the minimum needed, which is hardly a "hacker" way of doing it. Of course, it's somewhat hard to expect anyone to agree with me, as I'm on "Hacker News" right now, hehe.
There are so many ways to describe "what I do". Hacker could be one; Software Engineer is my preferred one at present (as it implies a level of pragmatism I often see missing in the resume-driven-development crowd). Programmer, Systems Engineer, database guy, Data Architect.
For most people nowadays the word hacker means a guy who pirates stuff and steals your Facebook account, but the original meaning is just a clever programmer.
He isn't talking about HN readers but about Joe Public. A few days ago our head of creative strategies passed my machine while I was doing a load of work in a terminal. He asked what I was doing and whether I was a hacker. (I told him it depended on what you define as a hacker.)
Let's ignore that you don't actually know what the guy was saying and are instead projecting: how is an NN library with fairly rough documentation, hosted on GitHub, relevant to Joe Public?