
Better than a forgetting factor, add a Kalman filter (http://en.wikipedia.org/wiki/Kalman_filter). This way you can trust your "new" data more than really "old" data, etc. The beauty of it is that it only adds three attributes to each data sample.



Could you expound on this a bit? What attributes would you have to add? How would you calculate scores?


You would add a variance (P), an estimate of the value, and the timestamp of the last measurement. Using the last timestamp you can calculate Q; generally, the older the last measurement, the higher Q.

The calculation is straightforward once you let the state-transition and observation models be the identity:

  P  = P0 + Q
  K  = P / (P + R)
  x1 = x0 + K * (z - x0)
  P1 = (1 - K) * P
Now you have the new score for your data (x1) and a new variance to store (P1). Other values are:

x0, P0 - previous score, previous covariance
Q - roughly related to the age of the last measurement; goes up with age
R - measurement error; set it close to 0 if you are sure your measurements are always error-free
z - the most recent measured value

Let's say you measure the number of clicks per 1000 impressions. Now you can estimate the expected value (x1) for the next 1000. After the second 1000, re-estimate again.
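
A minimal sketch in Python of the scalar version above; the linear-in-age Q, the function name, and all numeric defaults are just assumptions for illustration:

  import time

  def kalman_update(x0, P0, z, last_ts, R=25.0, q_per_second=0.001, now=None):
      """Scalar Kalman update with identity dynamics, as in the formulas above.

      x0, P0       - previous score and its variance
      z            - the newly measured value (e.g. clicks per 1000 impressions)
      last_ts      - timestamp of the previous measurement (Unix seconds)
      R            - measurement noise; close to 0 means measurements are trusted fully
      q_per_second - how fast uncertainty grows with age (made-up default, tune to taste)
      """
      now = time.time() if now is None else now
      Q = q_per_second * (now - last_ts)  # older estimate -> larger Q

      P = P0 + Q                # predict: inflate the variance by the process noise
      K = P / (P + R)           # Kalman gain: how much to trust the new measurement
      x1 = x0 + K * (z - x0)    # update the score toward the measurement
      P1 = (1 - K) * P          # shrink the variance after incorporating z
      return x1, P1, now        # the three attributes to store per data sample

  # e.g. previous score of 12 clicks/1000 with variance 4, new measurement of 20,
  # last updated an hour ago:
  x1, P1, ts = kalman_update(12.0, 4.0, 20.0, last_ts=time.time() - 3600)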


Thanks for explaining that!


How does a decay factor not "trust your 'new' data more than really 'old' data"?


The Kalman filter is much more sophisticated. A typical re-estimation with a decay factor is:

  x1 = x0 + alpha * (z - x0)
where alpha is static. The Kalman filter will make it dynamic, taking into account how you obtained the measurements, how old the last re-estimation was, how noisy the process is, etc. Want to do multi-variate analysis? Make alpha a matrix transform.
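
For contrast, a sketch of the static-alpha (decay factor) version; the alpha value here is arbitrary:

  def decay_update(x0, z, alpha=0.1):
      # static smoothing factor: ignores how old or how noisy the data is
      return x0 + alpha * (z - x0)

In the Kalman update above, the effective alpha is K = P / (P + R), which automatically grows when the estimate is stale (large Q) and shrinks when the measurement is noisy (large R).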



