Hacker News

I'm curious about the method chosen to give short-term memory to the gradient. The most common approach I've seen, when people have a time sequence of values X[i] and want a short-term-memory version Y[i], is something of this form:

  Y[i+1] = B * Y[i] + (1-B) * X[i+1]
where 0 <= B <= 1.

Note that if the sequence X becomes a constant after some point, the sequence Y will converge to that constant (as long as B != 1).

For giving the gradient short term memory, the article's approach is of the form:

  Y[i+1] = B * Y[i] + X[i+1]
Note that if X becomes constant, Y converges to X/(1-B), as long as B is in [0,1).

"Short-term memory" doesn't really seem to describe what this is doing. There is a memory effect in there, but there is also a multiplier effect in regions where the input is not changing. So I'm curious how much of the improvement comes from the memory effect, and how much from the multiplier effect? Does the more usual approach (the B and 1-B weighting, as opposed to the B and 1 weighting) also help with gradient descent?
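A quick numerical check of the two recurrences above (the variable names, B = 0.9, and the constant input X = 1.0 are my own arbitrary choices for illustration):

```python
B = 0.9
X = 1.0  # input held constant after some point

y_ema = 0.0   # Y[i+1] = B*Y[i] + (1-B)*X[i+1]  (the usual weighting)
y_sum = 0.0   # Y[i+1] = B*Y[i] + X[i+1]        (the article's form)
for _ in range(500):
    y_ema = B * y_ema + (1 - B) * X
    y_sum = B * y_sum + X

print(y_ema)  # converges to X = 1.0
print(y_sum)  # converges to X / (1 - B) = 10.0
```

This makes the multiplier effect concrete: on a constant input, the second form settles at 1/(1-B) times the first.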




I assume that multiplying by a constant factor shouldn't matter, since the update is still scaled by the learning rate (which itself multiplies the gradient). This might just mean that the learning rate should be lower or higher with this method.
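A minimal sketch of this point (variable names and the gradient sequence are mine): with zero initialization, the two accumulators satisfy Y_sum[i] = Y_ema[i] / (1 - B) at every step, so rescaling the learning rate by 1/(1-B) makes the two update rules trace essentially identical trajectories:

```python
B, lr = 0.9, 0.01
xs = [0.5, -1.2, 3.0, 0.7, 0.7, 0.7]  # an arbitrary gradient sequence

y_ema = y_sum = 0.0
w_ema = w_sum = 1.0  # a scalar parameter updated by each scheme
for x in xs:
    y_ema = B * y_ema + (1 - B) * x   # usual B / (1-B) weighting
    y_sum = B * y_sum + x             # B / 1 weighting
    w_ema -= (lr / (1 - B)) * y_ema   # learning rate rescaled by 1/(1-B)
    w_sum -= lr * y_sum

print(abs(w_ema - w_sum))  # agrees up to floating-point rounding
```

So the choice between the two weightings is a reparameterization of the learning rate, not a different algorithm, when B is fixed.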


The question is then really about which method makes it easier to tune parameters or which helps intuition the most.


This is a good way to think about it.


Very good question! I have considered this issue too. This form of weighting is the kind used in Adam, and is qualitatively different from the updates described here. The tools of analysis in this article can be used to understand that iteration too (this amounts to a different R matrix), and I would be curious to see whether it also allows for a quadratic speedup.

[EDIT] As per halfling's comment, this is just a rescaling of the learning rate by (1-beta).



