They simulate neurogenesis, I guess, but they do not incorporate the most interesting part of that neurogenesis: the new neurons are born into the dentate gyrus, a region thought to have a particular capacity to orthogonalize feature representations that are similar (i.e. to pattern-separate), allowing distinct memories to be formed for similar events. The dentate gyrus outputs to a region called Cornu Ammonis 3 (CA3), which is heavily recurrent and thought to be able to pattern-complete a full representation from partial inputs. That is, CA3 can encode and retrieve the relations between two or more features or objects.
For a mathematical model and review one might read:
Rolls (2013)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3812781/
but many others exist. I'd write more, but typing on my phone is driving me to distraction.
This is really interesting; where/how did you learn this?
I'd like to learn more about these things - brain regions, connections, functions - and what they might imply about the kinds of computations that are going on, but my background is mainly on the AI/math side of things.
I'd like to add that our knowledge of the details of hippocampal neuroanatomy is probably the most advanced of any brain region, which allows the somewhat informed construction of computational models. I wish I had more time to learn modeling methods; I have specific developmental hypotheses I'd like to test in such a model. In the end, I'd probably need to find a knowledgeable collaborator, though.
My dissertation research was on the development of the subfields of the hippocampus in childhood, so these papers from the rat literature were relevant and often inspiring.
There is also this one, https://arxiv.org/abs/1506.02515, which takes pruning a step further to reduce the sparsity. And this one, https://arxiv.org/abs/1608.04493, which makes sure not to kill any neurons that prove to be useful at a later stage in the pruning process.
Modern applications of small networks regularly derive them from larger state-of-the-art networks using distillation. Distillation compacts neural networks while affecting accuracy only minimally.
Instead of pruning directly from the large network, you just learn how it generalizes. That takes fewer nodes and fewer overall operations (multiplications/additions).
Certain companies use these methods to make state-of-the-art neural nets work on your phone :)
Also "combine" might not be the right word, since it's really transfer learning. "Distill" is really a descriptive verb.
Maybe my original wording was confusing; I shouldn't have said "distillation compacts" -- distillation is a process by which you can create a more compact version of a complex neural net.
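For anyone wanting to see what that looks like concretely, here's a minimal sketch of the usual soft-target distillation loss in PyTorch; the temperature, mixing weight, and model names are my own illustrative choices, not details from any particular system:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
        """Blend the usual cross-entropy on hard labels with a KL term that
        pushes the student's softened outputs toward the teacher's."""
        # Softened distributions; T > 1 spreads probability mass over classes.
        soft_teacher = F.softmax(teacher_logits / T, dim=1)
        log_soft_student = F.log_softmax(student_logits / T, dim=1)
        # KL divergence between softened student and teacher, scaled by T^2
        # (as is conventional) so gradients keep a comparable magnitude.
        kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
        # Standard cross-entropy on the true labels.
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1 - alpha) * ce

    # Usage inside a training loop (teacher frozen, student being trained):
    # with torch.no_grad():
    #     teacher_logits = teacher(x)
    # loss = distillation_loss(student(x), teacher_logits, y)
    # loss.backward()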
This idea is at least partially in use with regularisation and dropout. The difference, at least with dropout, is that the "killed" neurons are then massaged back into the network in order to become useful again.
Agreed that this is another way of framing the problem of regularizing a network. Rather than starting with a big network and penalizing complexity, they are starting with a simple network and adding complexity. To that end, I'd've liked to see a comparison to dropout or L1/L2 regularization.
Biological neurons themselves are stochastic, so they have an internal "dropout" that doesn't seem to hurt; on the contrary, these perturbations and imperfect communication increase learning ability.
Looks like these researchers are trying to make a network more adaptive. I think that deleting nodes would only make it worse at the current task it's being trained on, as well as worse on the tasks it's being adapted to.
You could train a model using neurogenesis to increase its accuracy, and then use distillation to train a smaller network to comparable accuracy.
These are two very different, but complementary, problems.
I'm not assuming that, I'm giving the model more options and letting it decide what is functionally important/non-spurious. It might take it longer, but I don't assume that.
More parameters also means that the likelihood of overfitting (the training set) increases. Currently (and rather unintuitively, considering that ML is an applied optimization field, and optimization is usually concerned with underfitting), the bane of ML is overfitting. It's easy to supply a model with high representational capacity, but then it's very hard to learn anything interesting in a reasonable amount of time. You'll learn how to fit your training set perfectly, because your model has enough degrees of freedom to fit a million points arbitrarily well, but that doesn't mean the resulting fit describes the data in a meaningful way. This is why a core tenet of ML is to prune parameters whenever possible. Neurogenesis increases representational capacity whenever it detects that your underlying model does not have sufficient capacity to fit the data; from this perspective, you start small (under capacity) and then gradually increase capacity until you hit the optimal model. In other words, neurogenesis is also a way for you to minimize the number of options.
On the other hand, giving the model more options than it strictly needs and letting it decide what is important will usually backfire. Rather than learning a few meaningful/functional features, it can just go ahead and completely fit the training data from the very beginning. It will therefore decide that everything is important, because all those extraneous parameters let it squeeze that last 0.5% out of your training set.
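As a toy illustration of that start-small-and-grow idea (polynomial degree standing in for network capacity; nothing here is from the paper), validation error tells you when added capacity stops paying for itself:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 60)
    y = np.sin(3 * x) + rng.normal(0, 0.1, x.size)           # noisy target
    x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]  # train/validation split

    best_deg, best_err = None, np.inf
    for degree in range(1, 15):                   # "grow capacity" one step at a time
        coeffs = np.polyfit(x_tr, y_tr, degree)   # fit the training data
        val_err = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
        if val_err < best_err - 1e-4:             # keep growing only while validation improves
            best_deg, best_err = degree, val_err
        else:
            break                                 # extra parameters now just fit noise
    print(best_deg, best_err)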
As others have mentioned, there are approaches like regularisation and dropout which try to do similar things. What I find interesting is the fact there are two reasons to do this: to generalise/avoid-overfitting and to reduce resource usage.
It seems like almost all effort is spent on the former, since everyone's aiming for higher accuracy numbers. Are there any widely-used methods to tackle the latter?
For example, I'm imagining a system which is either given measurements of its resource usage (time, memory, etc.) or uses some simple predictive model (e.g. time ~ number of layers * some constant), and works within some resource bound (a rough sketch follows below):
- If we're below the bound, expand the model (add neurons, etc.) to allow accuracy increases (note "allow": it's ok to ignore/regularise-to-zero the extra parameters to avoid overfitting)
- If we're above the bound, prune the model (in a way which tries to preserve accuracy)
- Allocate resources to optimise some objective, e.g. reduce variance by pruning the parameters of the best-performing class/predictor/etc. and using those resources to expand the worst performer.
The closest thing I know of are artificial economies, but they seem to be more like a selection mechanism (akin to genetic programming) than a direct optimisation procedure (like gradient descent on an ANN).
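To make the controller idea I described above a bit more concrete, here's a toy sketch in Python; the budget, growth step, and the `val_gain` estimate of per-layer benefit are all placeholders I'm inventing for illustration, not an existing method:

    def param_count(widths, n_in=784, n_out=10):
        """Rough parameter count of a fully-connected net with the given hidden widths."""
        sizes = [n_in] + list(widths) + [n_out]
        return sum((a + 1) * b for a, b in zip(sizes, sizes[1:]))

    def adapt(widths, budget, val_gain):
        """One control step over a hypothetical architecture. `val_gain(widths)` is
        assumed to estimate how much accuracy extra width buys each layer
        (a stand-in for real measurements on held-out data)."""
        usage = param_count(widths)
        if usage > budget:
            # Over budget: shrink the layer whose extra width is estimated to help least.
            worst = min(range(len(widths)), key=lambda i: val_gain(widths)[i])
            widths[worst] = max(1, widths[worst] - 16)
        elif usage < 0.9 * budget:
            # Headroom left: grow the layer expected to benefit most; regularisation
            # can still push the new parameters toward zero if they turn out unneeded.
            best = max(range(len(widths)), key=lambda i: val_gain(widths)[i])
            widths[best] += 16
        return widths

    # Toy run: pretend layer 0 benefits most from extra width.
    widths = [64, 64]
    for _ in range(10):
        widths = adapt(widths, budget=200_000, val_gain=lambda w: [1.0, 0.3])
    print(widths, param_count(widths))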
There are many ways to compress networks - by pruning neurons, by enforcing sparsity, by representing activations and gradients with one bit (or a few bits), and by transfer learning, where a large net is distilled into a smaller one.
Yes, my question was more about meta-level algorithms for balancing size against performance. Especially adaptive methods such that we're not just growing up to a limit and stopping, but selectively allocating resources to those parts which need them. Adapting over time would be nice too: "thinking harder" when there are idle resources, but shrinking the results back down under load.
This paper http://dl.acm.org/citation.cfm?id=2830854 kind of has a solution for being more efficient. It has two networks and uses the smaller (more efficient) one for inference first. If the result is accurate with high probability (the probability of one class is much larger than the probability of any other class), then there is no need to run the big (expensive) network.
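Roughly, that cascade looks like this; a minimal sketch with stand-in models and a hand-picked confidence margin, not anything taken from the paper:

    import numpy as np

    def cascade_predict(x, small_model, big_model, margin=0.3):
        """Run the cheap model first; only fall back to the expensive one when
        the cheap model's top two class probabilities are too close to call."""
        probs = small_model(x)                     # assumed to return class probabilities
        top_two = np.sort(probs)[-2:]
        if top_two[1] - top_two[0] >= margin:      # confident: accept the cheap answer
            return int(np.argmax(probs))
        return int(np.argmax(big_model(x)))        # uncertain: pay for the big network

    # Toy usage with stand-in "models":
    small = lambda x: np.array([0.05, 0.90, 0.05])   # confident cheap model
    big = lambda x: np.array([0.40, 0.50, 0.10])
    print(cascade_predict(None, small, big))         # -> 1, and the big model never runs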
For that, check out our OpenReview ICLR submission, "Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World", by Sahil Garg, Irina Rish, Guillermo Cecchi, and Aurelie Lozano:
https://openreview.net/revisions?id=HyecJGP5ge
To add on to this: they "specifically consider the case of adding new nodes to pre-train a stacked deep autoencoder", by basically keeping track of when certain layers cannot reproduce their input and then adding more nodes and retraining with both new (not reproduced) and old data. It is quite intuitive, basically the most naive and obvious first attempt at the problem (not meant in a condescending way; I just want to point out that it's not that generalizable and is pretty ad hoc).
Sorry if I'm being snobbish, but I do wonder why this paper is only being submitted to IJCNN, a 2nd tier machine learning conference. I know students who publish undergrad research at workshops with lower acceptance rates than IJCNN. I can't think of any important machine learning papers published in IJCNN in the recent past.
It depends on what conclusions you're trying to draw from that information. What conference a paper was accepted to is a second-order signal of its noteworthiness. It's probably easier for someone versed in the field to just read the paper to determine if it's interesting. If you're using the conference as a quick pass/fail as you skim through the abstracts of hundreds of papers, ok, but you probably wouldn't make time to comment on HN about it in that case.
This paper looks like it builds on pretty well-known techniques like stacked autoencoders, so let's see what first-order noteworthiness data we can gather from a quick skim of the paper. If I had to guess why it wasn't accepted into a better conference:
- It uses stacked autoencoders, which are pretty out of fashion
- It bothers to report results on MNIST
- (more subjectively) It pulls the unfortunately common move of saying "here's something the brain does" and then hand-waving that this is a deep reason why the technique they've come up with is useful, when in fact the relationship is just "inspired by the general idea of", not "performs the same function as", the biological mechanism. In this case, I think the connection of their technique to research on neurogenesis is pretty flimsy. Clearly neurogenesis is not how an adult human brain forms new memories or gains proficiency in new skills (which they acknowledge in the conclusion).
> It's probably easier for someone versed in the field to just read the paper to determine if it's interesting.
> If you're using the conference as a quick pass/fail as you skim through the abstracts of hundreds of papers, ok
You answered your own statement, I think. Most researchers will skip a paper in a second tier conference. In fact, most I know won't read an entire paper - they'll only read some of it and skip stuff.
You're correct that I am not an active researcher (otherwise I would not have time to be commenting). I merely did some research back in college. But honestly that little experience gives me a huge leg up on most HN commenters in understanding research. It's unfortunate that the only reason this paper is #1 on HN is because it has a cool title.
That being said, MNIST is not really a disqualifier. (Unfortunately) MNIST is the most popular dataset referenced in NIPS 2016 papers (https://twitter.com/benhamner/status/805864969065689088). The handwaving is also forgivable; many NIPS papers handwave a lot too.
See my sibling comment. It matters because it's a very strong (to academic/industry researchers) sign of quality and whether the paper is worth reading. If you wanted to just put something out there you could just put on arxiv. The authors are academics (?) so they clearly want to publish in the best possible venue.
A good point was made that a model of neurogenesis must also incorporate neuronal death besides neuronal birth (since the hippocampus, and the brain as a whole, have physical constraints, you can't keep growing your network infinitely :). That's why any model of neurogenesis must incorporate the interplay between the birth and death of new (and old) neurons; that was the main idea of the paper I mentioned in an earlier post (this year's ICLR submission https://openreview.net/forum?id=HyecJGP5ge).
Note that just adding nodes to networks has been proposed before, e.g. the classical work on cascade correlation.
I think what we might see is a kind of autonomous corporation that is nominally under the control of shareholders, a CEO or a board, but which makes decisions without very much or any human input, and which gains some amount of legal rights through corporate personhood.
It won't be a 'general AI', though. More like a set of loosely connected systems that operate 'in the best interests of the shareholders', however that's defined.
It's pretty much the end state of the trend of pushing decision making to algorithms to remove moral and legal culpability from individuals.
Well, an interesting property of the brain is that any I/O relation happens within X milliseconds, which puts a limit on the depth of the network (if the speed of a neuron is limited). It would be nice to have some hard numbers on this.
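For a rough back-of-envelope (my own numbers, not a measurement from this thread): core visual object recognition is usually put at around 100-150 ms, and each synaptic stage (integration plus conduction) costs on the order of 10 ms, so 150 ms / ~10 ms per stage gives only about 10-15 serial processing stages for the feedforward part. That's a remarkably shallow "network" by deep-learning standards, though each stage is massively parallel.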
I'll say... 5-7 years. This is all based on pure speculation, from someone who is little more than an ML hobbyist. One thing is for sure, though: when we do reach that point, everything changes forever.
There's no reason to think it does, and plenty of reason to think it shouldn't. We have a fairly good idea of why humans have empathy; it has to do with our evolution as a social species.
I completely blame my own community, rather than you, for writing this, but as an AI researcher, your comment is terribly painful to read. We have little to no idea how actual neurons (let alone entire brains) really work. The things that are often called "(artificial) neural networks" really shouldn't be called that. I strongly prefer terms like "computational networks" or (where applicable) "recurrent/convolutional networks".
Actually we really know a lot about how neurons work. We've got the biophysical properties down, and we understand neurotransmission at the cellular/molecular level for a lot of different types of neurons. We understand signal processing where we transduce sound, smell, sight, touch, taste into neurochemical signals. We even know a decent amount about the early phases of the processing of these "raw data" signals into higher levels of abstraction (e.g. edge detection for vision). What we don't understand is the later phases of processing (advanced layers of abstraction) all the way up to conscious sensation.
> We have little to no idea how actual neurons (let alone entire brains) really work.
I think that slights neuroscience, which has devoted the past 60 years to answering this question, to a fair degree. But I agree that the biomimetic motivations offered up for various flavors of neural net feel pretty bogus. It seems to me like, among the major old-school researchers in the field, only Geoff Hinton still does this.
Fair. I was definitely unnecessarily harsh on neuroscience; my quibble is only with my own community's claims that what we're doing is anything like how the brain works. Thanks to you and sxg for correcting the record.
In a very hand-wavy sense, yes. The same can be said of paths to food by ant colonies. The way that ANNs have been drawn as circles with arrows between them looks like a cartoon version of neurons and synapses, which is the origin of the "neural network" part. The timing of data from hidden node to hidden node, the activation functions, and the hidden node outputs have very little to do with biological neurons. ANNs have more in common with a CPU than a brain.
Wasn't the attempt to model a collection of neurons, their synapses, and the way that some connections are reinforced the genesis of artificial neural networks?
That's how the first person brought the concept to life, no? He didn't even have a theoretical explanation of how or why it would work, right?
> ANNs have more in common with a CPU than a brain
Sure, they still run on CPUs, but ANNs are modeled to do what NNs do, at least at the levels where experiments have shown that it works.
Sure, some of the properties of NNs do not carry over well to ANNs, as someone pointed out in a comment here with an article showing that applying the same kind of signal doesn't work.
But the fact remains: we are making more progress on AI by trying to emulate parts of our brain than we did with other techniques.
We didn't know that this would happen when it all started, but it did.
No one could look at a model of a yet-to-be-implemented ANN and say, out of the blue, whether it would work and why. It has all been experimentation, taking the brain as a raw blueprint.
And although many other phenomena from the brain didn't work well with ANNs, neurogenesis apparently did.
It's impressive, IMO, and quite humbling that we are getting so many achievements out of mimicking nature, while not being 100% sure why it worked in the first place.
I just don't want you to get the wrong impression. This is a single paper about a technique for adding neurons to ANNs over time, and it is only one of many over the last few decades. The paper does not have the evidence to indicate that this is a major breakthrough. The industry as a whole generally does not add neurons to an existing model when updating it. The vast majority of applications also use backpropagation for training, which is not what our brains use. So even if we ignore the implementation on CPUs, ANNs are still far from behaving similarly to brains, even in a conceptual way. I must disagree that "we are getting so many achievements out of mimicking nature".
It adds an additional parameter that influences RNN behavior over time, so I could see it possibly being useful. I would speculate that this could have value for providing slowly-updating subsystem information to real-time control systems.
Oh yes. If we want to put a bigger one on the list, then there's the whole matter of the vast quantities of circumstantial data being left out by ANNs (sight, sound, past memories, emotions, arousal, touch sensations, etc.). But lists like this can go on for a very long time.
Spiking Neural Networks [0] attempt to be more accurate representations of human neurons, but haven't really caught on because they aren't really much better than our perceptron model of neurons, at least for the things we are trying to do with them.
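For context, the canonical spiking unit people usually start from is the leaky integrate-and-fire neuron; here's a tiny sketch of it (the constants are arbitrary illustrative values, not parameters from any particular SNN):

    import numpy as np

    def lif_neuron(input_current, dt=1.0, tau=20.0, v_rest=0.0, v_reset=0.0, v_thresh=1.0):
        """Leaky integrate-and-fire: the membrane potential decays toward rest,
        integrates input current, and emits a spike (then resets) at threshold.
        Contrast with a perceptron unit, which just outputs f(w.x + b) once."""
        v = v_rest
        spikes = []
        for i_t in input_current:
            v += dt * (-(v - v_rest) + i_t) / tau   # Euler step of the membrane equation
            if v >= v_thresh:
                spikes.append(1)
                v = v_reset                          # fire and reset
            else:
                spikes.append(0)
        return np.array(spikes)

    # A constant drive above threshold produces a regular spike train.
    print(lif_neuron(np.full(100, 1.5)).sum())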
Well, neurons have many properties... their information processing capabilities are one aspect, but they also deal with the physical level of communication and staying healthy.
Neurons are also a family of cells and are very diverse in shape and function. We tend to oversimplify our representation of neurons: there are simple neurons, and then there are neurons like the Purkinje cell, which are massive.
Neurons also rely on their counterparts, the glial cells, that are much less often mentioned.
I think because of this, it will be a while until we fully understand the role of each one of them.
If I'm not mistaken (I took an introductory class on neural networks quite a few years ago), this all started as an attempt to simulate neurons, in a manner of speaking.
So much so that one of the pioneers of this work was discredited by other scientists who, for some reason, simply could not accept that these networks would work. That same pioneer saw his funding dry up, and he came to believe his opposition so thoroughly that he sailed away to his death, some argue intentionally (as his life's work had, even in his own eyes, come to be seen as useless).
Now, I'm having a hard time remembering the names of the people in that story; if someone knows who I'm talking about, please remind me.
- real neurons are stochastic and communicate through spikes; artificial neurons can communicate real values efficiently
- real neurons are more like automata: they have dynamics in time, and learning happens as a continuous interaction with only their neighbors; artificial neurons are "static" (they use discrete time), are implemented by a forward and backward pass, and can also use nonlocal information
- real neurons can't backpropagate, because backprop requires transmitting gradients back along the same connections, but in reverse, and brain connections don't support that kind of bidirectional data flow; artificial neurons work best with backprop
- real neurons can't implement convolutions (it would require a neuron to slide over a field); real neurons also can't implement RNNs as they are, and don't use backpropagation through time (BPTT)
So artificial neurons are much less hampered and can do many things that real neurons can't do, or can only do by some less efficient method. That means brain neurons still have some tricks up their sleeve. Artificial neurons are quite different from brain neurons, and rightly so, because they can be more efficient that way.
Why attribute the idea of introducing new nodes to a graph to biological concepts? It seems like a simple step in exploration, similar to how one might think to vary the weights of the nodes randomly over some range. (Unless there is some technique biology uses to pre-configure the nodes upon their introduction to the network; that might be rather interesting.)
Because they tried to model neurogenesis, the same way that artificial neural networks were invented while trying to mimic some parts of how neurons work, I guess.
Slightly off topic, but I hate how publications are written. It seems like authors purposely use big words and sentences that often run 5-6 lines long in order to make the work seem more clever.
I find myself often having to reread a sentence in order to understand it.
These algorithms are often very simple and can be easily explained. Don't overcomplicate them.
A lot of the time the verbosity isn't so much to sound more clever as it is to be very specific and explicit about what the author is trying to convey. There's a lot of changing assumed knowledge and jargon in various fields and our use of language changes over time. The publication writing style is an attempt to factor that out.
That's what an appendix is for. Letters to Nature are 1500 words. You'd be surprised how effective that forcing function is. Not only do you preserve meaning, but you can convey it better, because it respects the reader's cache-size limitations.
Here's my attempt; not a huge number of changes because it was not too bad to begin with, but with slightly less self-indulgent language, and a lot of the jargon has to stay (partly because I don't know the field):
Neural machine learning methods, such as deep neural networks (DNN), have achieved remarkable success in a number of complex data processing tasks. These methods have arguably had their strongest impact on tasks such as image and audio processing - areas where humans have always performed better than conventional algorithms. In contrast to biological neural systems, which are capable of learning continuously, deep artificial networks have a limited ability for incorporating new information after a network has been trained. As a result, continuous learning methods could be very helpful in allowing deep networks to handle data sets which change over time. Here, inspired by the process of adult neurogenesis in the hippocampus, we investigate how adding new neurons to artificial neural networks can allow them to acquire new information, while preserving what they have already learned. Our results on the MNIST handwritten digit dataset and the NIST SD 19 dataset, which includes lower and upper case letters and digits, show that neurogenesis looks like a good approach for tackling the "stability-plasticity dilemma" that has been a problem for adaptive machine learning algorithms for some time.
As an academic, I tend to agree that we frequently feel compelled to apply more verbosity than is strictly required in order to communicate the intended semantic constructs.
Problem: current methods for training neural networks cannot learn continuously. Once they have been trained, they cannot easily learn new information. We propose a method to fix this problem. Our algorithm adds new neurons to the neural network which allows it to learn new information, while maintaining everything it already "knows".
I showed this to my father, a surgeon, and he said he understood it. But not the original abstract.
Although I just glanced at a few parts of this, I did not find it to be poorly written. Can you give an example where you thought it was too verbose or unnecessarily complex?
For example in the abstract: "adding new neurons to deep layers of artificial neural networks in order to facilitate their acquisition of novel information while preserving previously trained data representations"
Nobody talks like this. In my head I read this sentence and have to translate it to "we add extra neurons to existing networks so they can learn new information while remembering everything they already know".
But you have to distinguish active and passive vocabulary. Words you commonly use and words you understand when others use them. And you also have to distinguish between written and spoken language. E.g. nobody would actually say "exempli gratia" but it's commonly used in writing.
English is not my mother tongue and for me it mostly falls into passive vocabulary, but it is perfectly understandable. I don't have to mentally translate the whole sentence into simpler words before being able to grasp its meaning.
And it's not like those words are obscure; they just rank second or third in frequency of use among their respective sets of synonyms.
However, your statement is actually vague and ambiguous.
"Extra neurons"? Input layer? Output layer? Just before a final, fully-connected layer? Somewhere in between?
"Everything it already knows"? What does it know? Character probabilities, like a charnn? Image categories like in a CNN? Input distributions, like a GAN?
From the abstract, I can immediately tell that this paper is about modifying deep auto-encoders in the hidden layers. From that, I can immediately understand that the paper is not about adapting to new input formats or output formats, but instead about inputs from a new distribution but in the same format.
The author's intended audience, researchers and academics, do talk like this. They do so because it is quickly understandable and actually information dense, as indicated in my above paragraph.
Anytime someone uses the phrase "in order to facilitate" they are being more verbose than is necessary in order to signify their greater erudition. There is no meaningful semantic or technical distinction between "in order to facilitate" and "to help."
- "We specifically consider the case of...a stacked deep autoencoder (AE), which is a type of neural network designed to encode a set of data samples such that they can be decoded to produce data sample reconstructions with minimal error
- "The first step of the NDL algorithm occurs when a set of new data points fail to be appropriately reconstructed by the trained network...When a data sample’s RE is too high, the assumption is that the AE level under examination does not contain a rich enough set of features to accurately reconstruct the sample.
- "The second step of the NDL algorithm is adding and training a new node, which occurs when a critical number of input data samples (outliers) fail to achieve adequate representation at some level of the network.
- "The final step of the NDL algorithm is intended to stabilize the network’s previous representations in the presence of newly added nodes. It involves training all the nodes in a level with both new data and replayed samples from previously seen classes on which the network has been trained.