Hacker News new | past | comments | ask | show | jobs | submit login
IIR filters can be evaluated in parallel (raphlinus.github.io)
79 points by zdw on Feb 16, 2019 | hide | past | favorite | 21 comments



The transposed direct form II allows a biquad to be calculated with two scalar by vector multiplications and some shuffling, which should be faster than the proposed matrix solution I believe.


I'd be happy to see benchmarks of that. The problem is that the "shuffling" creates serial data dependencies, while the matrix form doesn't. Sure, the number of multiplications is smaller for direct forms, but that's not what has the most effect on performance.


  y[i] = x[i] * c + (x[i - 1] * c + y[i - 2] * (1 - c)) * (1 - c)
So you can compute y[i]=f(y[i-n],x) for any n, but that doesn't give you intermediary results : all y[i-j] for j between n and 0...

I feel like trying to compute (y[i-j] for j between n and 0) on the GPU (n+1 things to compute in parallel here) would not be efficient because the more you increase n the more y[i] becomes really longer to compute (at least linear with n) and you'll need to wait for the calculus of y[i] to finish before having the result of the result of y[i-n]. What am I not seeing here ?


You can treat it as a multiplexing operation. The first y[i] is computed normally. y[i+1], y[i+2] etc. are computed with a parallel form, up to as many cores as is optimal. Normally cores will wait for data, finish it very quickly, and then sit idle since they're waiting on memory, but this allows each processing core to return more results from data at a given time i without a serial readback(which introduces memory bandwidth pressure). The optimal throughput strategy is to push the parallelization upwards until the latency of doing this heavy computation outstrips the memory latency savings.


Interesting thanks for the answer. From what you said it feels like it could be good for AVX-like CPU accelerated instructions with all those latencies it would be an optimization like loop unrolling ; but for GPU, really ?


Sorry for being "that" person, but this article uses terminology I'm so unfamiliar with that I don't know what it's about, or what it's trying to explain. Would someone mind enlightening me? The Wikipedia page on IIR (which I assume means infinite impulse response) isn't very helpful: https://en.wikipedia.org/wiki/Infinite_impulse_response


If x is an array of input samples and y is an array of output samples, a simple FIR filter might look like:

  y[t] = a*x[t] + b*x[t-1]
Whereas a simple IIR filter might look like:

  y[t] = a*x[t] + b*y[t-1]
In the first equation, only two input samples (x[t] and x[t-1]) are used to compute each output sample. A sharp "impulse" at x[0] will only affect the values of y[0] and y[1]. That's what makes it an FIR filter—the number of output samples affected by an impulse is finite.

On the other hand, the second equation has a y[t-1] term on the right hand side. Each output sample feeds back into the next one. To see what this looks like purely in terms of x samples, you can substitute using the equation for y[t], then multiply:

  y[t] = a*x[t] + b*y[t-1]
  y[t] = a*x[t] + b*(a*x[t-1] + b*y[t-2])
  y[t] = a*x[t] + b*a*x[t-1] + b^2*y[t-2]
  y[t] = a*x[t] + b*a*x[t-1] + b^2*(a*x[t-2] + b*y[t-3])
  y[t] = a*x[t] + b*a*x[t-1] + b^2*a*x[t-2] + b^3*y[t-3]
  ...
Eventually you'll get the sum (assuming the samples start at x[0]):

  y[t] = sum from n=0 to t of b^n*a*x[t-n]
The sum ranges over the entire array of samples, all the way back to the beginning. Each input sample from x influences every output sample afterward. The output for an impulse—a single positive x[0] sample followed by x[1] = x[2] = ... = 0—is

  y[t] = b^t*a*x[0]
Since these samples are non-zero for all t (though decaying exponentially according to the value of b), the second equation describes a filter with infinite impulse response.


By the way, in signal processing that’s an example of an IIR filter, but that particular filter is also referred to as an exponential moving average (https://en.wikipedia.org/wiki/Moving_average) and is a really key tool in statistical process control and other kinds of “industrial” mathematics (https://en.wikipedia.org/wiki/EWMA_chart). It has effectively the same purpose — remove fine detail (“high-frequency noise”) from a time series.


Apart from time-series data, I've seen EWMA used by Elasticsearch to do load balancing (?) and also outright by actual load balancers as well to help with latency sensitive flows.

https://twitter.github.io/finagle/guide/Clients.html#power-o...

https://blog.cloudflare.com/i-wanna-go-fast-load-balancing-d...

https://www.nginx.com/blog/choosing-nginx-plus-load-balancin...


This is an excellent description. Thank you. I’ve been a consumer of these in an audio capacity (CSound digital audio synthesis[0]), but haven’t seen as simple and lucid a description as this.

[0] http://www.csounds.com/manual/html/SigmodStandard.html


Thanks for the explanation; this clears up the mathematics behind the operation. Follow up question, though: why would you want to do this? Is this some sort of digital signal processing procedure?


Yeah; say for example you're trying to recognize a swipe with a certain velocity on a touchscreen:

    def touchDown():
        velocity = 0
        lastTime = now
    def touchMoved(from, to):
        velocity = (to - from) / (now - lastTime)
        lastTime = now
    def touchFinished():
        if velocity.x > threshold:
            performGesture()
You test it out, and discover that the gesture is kind of flaky. Sometimes it works properly, but just as often it fails when it feels like it should have worked. It turns out that maintaining a constant velocity is hard—as your finger moves across the screen, the velocity can fluctuate around between samples. The test in touchFinished() looks at only the last sample taken, which is often the worst, as your finger has just lifted off from the screen.

So instead of just looking at the last velocity, you keep track of a "smoothedVelocity" which averages your velocity samples together:

    def touchDown():
        smoothedVelocity = 0
        lastTime = now
    def touchMoved(from, to):
        velocity = (to - from) / (now - lastTime)
        smoothedVelocity = 0.25 * velocity + 0.75 * smoothedVelocity
        lastTime = now
    def touchFinished():
        if smoothedVelocity.x > threshold:
            performGesture()
With this change, your smoothedVelocity moves a bit toward the measured velocity with each sample, but won't jump directly there. This feels a lot more predictable. Stopping your finger momentarily as it lifts off the screen won't stop the gesture from being performed, for example.

The smoothedVelocity code implements a low-pass IIR filter (it filters out the high-frequency fluctuations while letting the lower-frequency motion pass through). You could also imagine just keeping the last 2 velocities and averaging them together at the end: that would be an FIR filter.


Good example. I would add that you don't need to understand DSP and IIR to understand that particular example (IIR with only one memory case), it's only a 1D barycenter between previous version and last weighted value.


In DSP when you implement a filter you typically have conflicting requirements. If you have a particular transfer function you wish to approximate, FIR is one choice. It is relatively easy to synthesize, and has linear phase response, and no ringing (Finite Impulse Response is the name, after all). However, it may take many more “taps” (delay stages) to implement as compared to an IIR. Sometimes, you care about the extra resources required (making it fit into your FPGA) but more often you can not tolerate the latency of all of those delay stages.

With IIR, you typically have much lower latency because it requires many fewer stages to get the same-ish transfer function. Tradeoffs: ringing, perhaps problematic phase response, more complicated synthesis problem.


It sounds like IIR is about reusing previous calculations? But you don't actually want every previous sample to affect the result; it's more a side-effect of how the calculation is done?


This is off the top of my head.

When you want to filter some data you often start by defining the response you're after, for example for a sub-woofer output you might want a low-pass filter that cuts off frequencies higher than 250 Hz.

Then you use some math to figure out the filter coefficients that will give you that response. For many (all?) filters you can get away with far fewer coefficients, often orders of magnitude like 4 vs 1024, if you go for an IIR filter rather than a FIR filter. This of course makes it far cheaper computationally to implement.

IIR filters have some downsides though compared to FIR, for one an impulse "never goes away" (at least in theory) since there's always some of it begin fed back into the next step. It's also difficult to make them have linear phase response, which can be important in certain settings.


Think of a filter as an electronic circuit that has a voltage input and a voltage output. In control theory a filter is described by how it reacts to an "impulse", which is essentially a brief peak followed by zero voltage.

Finite Impulse Response means that the output of the filter will eventually go to zero - after a finite duration. Infinite Impulse Response means that the output of the filter may never go to zero.

It can be proved that FIR and IIR filters have different properties; the article mentions that IIR filters are necessarily stateful, for instance.


Another aspect from a pure CS perspective: FIR filters are pure functions; IIR filters are not because they are stateful. FIR filters have lots of nice properties but [at least] one big downside: They tend to require more computation to accomplish the same result. So sometimes an IIR filter is a better choice when you don't have much processing power (e.g. in a small embedded single-core processor).


One of my favorite technical books teaches Digital Signal Processing (and on the way, a lot of engineering pearls of wisdom), and it's available for free as a PDF from the author's website:

http://www.dspguide.com/pdfbook.htm


Since an infinite impulse response is a specific type of impulse response its article might be a better introduction to the topic.

https://en.wikipedia.org/wiki/Impulse_response


Click through to the page on digital filters for a better entry point. It'll still be pretty tough to understand though unless you get hands on and start playing with inputs and outputs and filters in something like Matlab or Numpy.




Consider applying for YC's first-ever Fall batch! Applications are open till Aug 27.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: