There are so many of these that you'll have to give more of an explanation of what makes this different, other than "it's for hackers, not scientists" (whatever that means).
Also it seems to use the standard method of stacking layers, rather than allowing you to describe an arbitrary computational graph, which seems (to me) to be the far superior method (see CNTK's Network Description Language).
I think CNTK and Tensorflow and Theano, too, have a declarative approach, representing the computation via a computational graph, which in my opinion is beneficial for research. But for a hacker or a software developer who wants to build an application, this creates an unnecessarily steep learning curve and feels unintuitive. (I have the feeling that this is an important reason why Keras, Lasagne and co. exist.)
Leaf takes an imperative approach and explores an easier API (only Layers (Functions)[1] and Solvers (Optimizer Algorithms)), reusability through modularity, and abstractions that keep the implementation and concepts to a minimum, or rather abstractions that feel as familiar to a hacker as possible.
In future versions we want to explore, for example, what is practically possible with auto-differentiation via dual numbers and differentiable programming.
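To make the dual-number idea concrete, here is a minimal sketch of forward-mode auto-differentiation in Rust. It is purely illustrative and not part of Leaf's API; the `Dual` type and its methods are invented for the example.

```rust
// Forward-mode auto-differentiation via dual numbers: each value carries its
// derivative with respect to one chosen input, propagated by the usual rules.
#[derive(Clone, Copy, Debug)]
struct Dual {
    val: f64, // value of the expression
    der: f64, // derivative with respect to the chosen input
}

impl Dual {
    fn constant(x: f64) -> Dual { Dual { val: x, der: 0.0 } }
    fn variable(x: f64) -> Dual { Dual { val: x, der: 1.0 } }
    fn add(self, rhs: Dual) -> Dual {
        Dual { val: self.val + rhs.val, der: self.der + rhs.der }
    }
    fn mul(self, rhs: Dual) -> Dual {
        // product rule: (uv)' = u'v + uv'
        Dual { val: self.val * rhs.val, der: self.der * rhs.val + self.val * rhs.der }
    }
    fn sigmoid(self) -> Dual {
        // chain rule: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
        let s = 1.0 / (1.0 + (-self.val).exp());
        Dual { val: s, der: self.der * s * (1.0 - s) }
    }
}

fn main() {
    // Differentiate sigmoid(w * x + b) with respect to w, at w = 0.5, x = 2.0, b = 1.0.
    let w = Dual::variable(0.5);
    let x = Dual::constant(2.0);
    let b = Dual::constant(1.0);
    let y = w.mul(x).add(b).sigmoid();
    println!("value = {:.4}, d(value)/dw = {:.4}", y.val, y.der);
}
```

The appeal for an imperative API is that the derivative falls out of ordinary function calls, with no separate graph-construction step.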
My understanding is that Google developed Tensorflow to provide a single pipeline between the data science model and a production system. The idea is to avoid a translation step between modeling the real world and implementing it in production.
Of course, most people don't have the resources of Google with a layer of data scientists and another layer of software engineers [and maybe a layer of data engineers in the mix too]. So the idea of a tool tailored to a small team's needs rather than those of Google seems like an interesting niche.
A bit off-topic: does anyone know a good resource about when to use neural networks, when genetic algorithms, when Bayesian networks, ...? I know the basics of some of these algorithms and I could implement them (with some googling), but I wouldn't know which one to choose for a real-world application. Is there some kind of overview of the strengths and weaknesses of different AI approaches?
However, if you want to really understand how things fit together, you're probably best off reading one of the standard intro textbooks: Murphy's Machine Learning, Bishop's Pattern Recognition and Machine Learning, Hastie et al.'s The Elements of Statistical Learning, or Wasserman's All of Statistics.
Straight off the bat you can split machine learning algorithms by whether or not you need to be able to see how a decision has been made. For example, neural networks are probably really good at diagnosing patients; however, because they are black boxes, they require a great deal of trust (in real life), whereas decision trees can show the path taken to reach a diagnosis.
That particular advantage of decision trees is lost when they're part of an ensemble classifier, which is unfortunate, since their performance is more reliable in those setups.
That issue only matters for legal reasons, and even then it's entirely speculation what an actual court would decide.
If you actually cared about your patients, then you would use whatever method has the highest accuracy. False predictions mean injury or death. Using a suboptimal method means people die.
The best of both worlds is to use whatever model gets the best predictions, then train another, understandable model on the output of the first one. I.e., generate random data and see what predictions the good model makes. Then the understandable model has infinite data to train with and doesn't need to worry about overfitting (see the sketch below).
But still, the utility of being able to understand the model is limited. It's just a big set of parameters, without any reasoning or explanation of why the parameters are what they are.
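A minimal sketch of that distillation idea (the "black box" and the threshold rule below are stand-ins invented for illustration): label as much generated data as you like with the opaque model, then fit a simple, interpretable rule to its answers.

```rust
// Hypothetical sketch of distilling a black-box model into an interpretable one.
fn main() {
    // Stand-in for the accurate but opaque model; in practice this would be the
    // high-accuracy model whose predictions we want to mimic.
    let black_box = |x: f64| -> bool { 0.8 * x + 0.3 > 1.0 };

    // Generate as much synthetic data as we like and label it with the black box.
    let inputs: Vec<f64> = (0..1000).map(|i| i as f64 / 1000.0 * 4.0).collect();
    let labels: Vec<bool> = inputs.iter().map(|&x| black_box(x)).collect();

    // Fit the simplest interpretable model: a single threshold (a one-node
    // decision tree). Pick the threshold that agrees with the black box most often.
    let mut best = (f64::NAN, 0usize);
    for &candidate in &inputs {
        let agreement = inputs
            .iter()
            .zip(labels.iter())
            .filter(|&(&x, &label)| (x > candidate) == label)
            .count();
        if agreement > best.1 {
            best = (candidate, agreement);
        }
    }
    println!(
        "interpretable rule: predict positive when x > {:.3} (agrees on {}/{} samples)",
        best.0, best.1, inputs.len()
    );
}
```

The human-readable rule describes what the black box does, though (as the parent notes) not why its parameters are what they are.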
Your account seems to assume that people will faithfully adhere to the suggestions of any model. In reality, not only have statisticians had a hard time accepting more prediction-centered approaches [1, see comments at end], but these approaches may need to win over practitioners and lay-people in the field in which they are being applied (e.g. how much do doctors value prediction over interpretable parameters).
I like machine learning and prediction-centered approaches, but there are many factors (such as adherence, both by doctors and their patients) that are important here. In a sense, the model needs to take "model type" into account in its predictions, which could lead to a model that predicts disease treatments well but believes it should not be used!
On the (admittedly sciency) machinelearning subreddit I read a while ago that no one is using genetic algorithms anymore. So for a game it would probably be OK, but not for anything else. Just use neural networks.
Machine Learning for _Rust_ Hackers. I had to read one page in to check, but it looks like it's not a cross-language framework - you need to be fluent in Rust to use this.
Cool work! Kudos. Will try this.
FYI - I have been doing a similar project for evolutionary algorithms using Erlang OTP/Elixir. I plug in `Collenchyma` (part of the Autumn architecture) for number-crunching and computationally intensive tasks. I'm curious about how
>distributed optimization of networks
is done. :) (y)
This is not the simplified abstraction I'm looking for. Give me an ML library that exposes its API using metaphors I can understand and relate to, and I'm all over it. Leaf is still using terms and concepts I don't understand.
We think it is more beneficial for collaboration if we stick to the common naming of layers, functions, and concepts rather than using metaphors. We provided two links that should help you get started with those.
But with Leaf it becomes very easy to create modules (Rust crates) that expose layers/networks/concepts, and those can have metaphorical names.
A bit off-topic, but has there been any research done on learn-time topology optimisations? E.g. start with a 4-4-4-4 network, remove some nodes/connections that contribute little during training, and end up with a 4-3-2-4 network with similar accuracy to the 4-4-4-4 one but higher performance?
Intuitively, I think there should be connections activated for less than x% of inputs that could be removed entirely; in some cases this would mean removing a whole node. It would be interesting to read something about such an approach.
That's a basic part of regularization! Weights are "penalized" for being too big and encouraged to be as small as possible. With L1 regularization in particular, this results in many of the weights being exactly zero, which is the same as them not existing at all.
There is some research on pruning networks after training, removing small weights and nodes that don't contribute much. This results in much smaller neural networks, which run better on smartphones or embedded devices.
This isn't done often, though, because on GPUs it doesn't result in higher performance. Because of the way SIMD-style dense kernels work, there's no easy way to take advantage of unstructured sparsity: a synapse of weight 0 still requires a multiplication by zero and an addition of that zero to the sum.
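For what it's worth, the post-training pruning described above is easy to sketch. This is a toy illustration, not tied to any particular library: zero out weights whose magnitude falls below a threshold and count how much of the layer disappears.

```rust
// Toy sketch of magnitude-based pruning after training: weights below a
// threshold are set to zero, i.e. the connection is removed. With a sparse
// representation this shrinks the model on CPUs and embedded devices; dense
// GPU kernels still multiply the zeros, which is the limitation noted above.
fn prune(weights: &mut [f32], threshold: f32) -> usize {
    let mut removed = 0;
    for w in weights.iter_mut() {
        if w.abs() < threshold {
            *w = 0.0;
            removed += 1;
        }
    }
    removed
}

fn main() {
    // Pretend these are the trained weights of one layer.
    let mut weights = vec![0.42, -0.003, 0.0107, -0.91, 0.0004, 0.27, -0.02, 0.65];
    let removed = prune(&mut weights, 0.05);
    println!("pruned {} of {} weights: {:?}", removed, weights.len(), weights);
}
```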
TWEANNs - Topology and Weight Evolving Neural Networks.
Gene Sher's book 'The Handbook of Neuroevolution Through Erlang' presents them, and codes them in Erlang. People have coded them in Elixir, and I am currently trying in LFE (Lisp Flavored Erlang). They can add neurons and connections or take them away based upon a GA approach (Genetic Algorithms). A big book, but that is because it covers the material thoroughly, and with lots of explanations. I have been reading NN books since the early 90s, and this one would have been a great format back then.
I don't think this has been tried. What would be the heuristic for removing connections/nodes? It does seem to me, though, that the heuristic should be "learn from the data using backprop" and that you're essentially calling for L1 regularization (which encourages sparsity), or L2 regularization (which encourages small weights), which are already pretty standard in machine learning.
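As a quick, generic illustration of the difference (not Leaf-specific; the function names are made up): an L1 penalty pulls a weight toward exactly zero by a fixed amount each step, while an L2 penalty only shrinks it proportionally.

```rust
// Sketch of one training step for a single weight with learning rate `lr`
// and penalty strength `lambda`. The data-loss gradient is passed in separately.

// L2 ("weight decay"): the penalty gradient 2*lambda*w shrinks the weight
// proportionally, so it becomes small but rarely exactly zero.
fn l2_step(w: f64, data_grad: f64, lr: f64, lambda: f64) -> f64 {
    w - lr * (data_grad + 2.0 * lambda * w)
}

// L1 via soft-thresholding (the proximal update): take the plain gradient step,
// then pull the weight toward zero by lr*lambda, snapping it to exactly zero
// once it is that close. This is what produces sparse weights.
fn l1_step(w: f64, data_grad: f64, lr: f64, lambda: f64) -> f64 {
    let w = w - lr * data_grad;
    let shrink = lr * lambda;
    if w.abs() <= shrink { 0.0 } else { w - shrink * w.signum() }
}

fn main() {
    // Data gradient set to zero to isolate the effect of the penalties.
    let (lr, lambda) = (0.1, 0.5);
    let (mut w1, mut w2) = (0.3_f64, 0.3_f64);
    for _ in 0..10 {
        w1 = l1_step(w1, 0.0, lr, lambda);
        w2 = l2_step(w2, 0.0, lr, lambda);
    }
    // The L1 weight lands on exactly 0.0; the L2 weight is small but nonzero.
    println!("after 10 steps: L1 weight = {:.4}, L2 weight = {:.4}", w1, w2);
}
```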
Well, it has a bias towards NNs and deep learning for now, as this was our initial focus for the proof of concept, but the architecture of Layers and Solvers should allow it to express any machine learning concept/algorithm. We are actually working on (verifying) that with James from rusty-machine[1][2].
The thing is that since "hacker" is a cool word, people tend to try to look cool by bending reality. It's the same as renaming ECMAScript to JavaScript because Java is cool. I do believe that having the curiosity and passion to modify the inner workings of things could be called hacking, but the banalization of the term is sad, as in "a project for hackers". Sorry, but you don't need to be a hacker to use it. If this project were necessary to accomplish something at your work, you'd learn the minimum needed, which is hardly a "hacker" way of doing it. Of course, it's somewhat hard to expect anyone to agree with me, as I'm on "Hacker News" right now, hehe.
There are so many ways to describe "what I do". Hacker could be one; Software Engineer is my preferred one at present (as it implies a level of pragmatism I often see missing in the resume-driven-development crowd). Programmer, Systems Engineer, database guy, Data Architect.
For most people nowadays the word hacker means a guy who pirates stuff and steals your Facebook account, but the original meaning is just a clever programmer.
He isn't talking about HN readers but about Joe Public. A few days ago our head of creative strategies passed my machine while I was doing a load of work in a terminal. He asked what I was doing and whether I was a hacker. (I told him it depended on what you define as a hacker.)
Let's ignore that you don't actually know what the guy was saying and are instead projecting: how is an NN library with fairly rough documentation, hosted on GitHub, relevant to Joe Public?