The AP is essentially a silicon implementation of regular expressions, augmented with a relatively small number of Boolean gates and counters. It has many simple matching elements that can be connected together to create automata networks that advance in parallel. The output of a single automaton is a Boolean decision, indicating whether the automaton is in an accepting state or not. It is natively a MISD architecture, a rather dusty corner of Flynn's Taxonomy, because all active states receive the input symbol at the same time.
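To make the broadcast model concrete, here's a toy sketch in Python (made-up names, not Micron's actual tooling): every currently active element sees the same input symbol in the same cycle and independently decides whether to fire its downstream edges.

    # Toy sketch of the broadcast model (made-up names, not Micron's API).
    # One input symbol per cycle goes to every active element at once;
    # each element independently decides whether to fire its outgoing edges.

    from dataclasses import dataclass, field

    @dataclass
    class Element:
        symbols: set                                  # symbol class this element matches
        targets: list = field(default_factory=list)  # downstream element indices
        reporting: bool = False                       # "accepting": raises a report when it fires

    def step(elements, active, symbol):
        fired = [i for i in active if symbol in elements[i].symbols]
        reports = [i for i in fired if elements[i].reporting]
        next_active = {t for i in fired for t in elements[i].targets}
        return next_active, reports

    # Tiny automaton for /ab/: element 0 matches 'a', element 1 matches 'b' and reports.
    elements = [Element({'a'}, targets=[1]), Element({'b'}, reporting=True)]
    active = set()
    for ch in "xab":
        active, reports = step(elements, active | {0}, ch)  # start element stays armed
        if reports:
            print("report on", repr(ch))                     # fires on 'b'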
In contrast, FPGAs are much more general purpose and as a result have fewer individual processing elements. Indeed, an FPGA can be made to emulate the AP for small designs, but the AP is able to accommodate much larger (or more instances of) automata on one chip, resulting in much greater throughput or bandwidth, depending on the configuration.
It is fair to compare the AP to FPGAs in the context of problems that can be reduced to regex (augmented with digital logic and counters), but not in a general-purpose sense. Just because a problem can be reduced to regex and run on the AP doesn't mean it can be run efficiently. But there are a host of problem domains in pattern matching that do map efficiently to the AP.
I don't think this is true. All non-deterministic finite automata (NDFA) can be converted to deterministic finite automata (DFA), and regular expressions are equivalent in power to DFAs.
Edit: Actually, I just read the paper that someone linked to. You're right that their grammar is larger than that of DFAs and regular expressions, but it appears that's because they extended it rather than because they're using nondeterminism.
The problem with DFAs is that they can blow up exponentially in the number of alternatives containing ".*". You also lose locality of reference, etc. DFAs are also very sequential.
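For a concrete illustration (my own toy code, not from the article): take the language "the k-th symbol from the end is 'a'". The NFA needs about k+1 states, but the subset construction below shows the DFA needs 2^k, because it has to remember the last k symbols.

    # Illustrative only: the language "the k-th symbol from the end is 'a'"
    # over {a, b}.  The NFA has k+1 states; the subset construction below
    # shows the reachable DFA has 2**k states.

    def nfa_transitions(k):
        delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}}        # state 0 loops and guesses on 'a'
        for i in range(1, k):
            delta[(i, 'a')] = delta[(i, 'b')] = {i + 1}  # then count k-1 more symbols
        return delta

    def dfa_state_count(k):
        delta = nfa_transitions(k)
        start = frozenset({0})
        seen, stack = {start}, [start]
        while stack:                                      # subset construction
            subset = stack.pop()
            for s in ('a', 'b'):
                nxt = frozenset(q for st in subset for q in delta.get((st, s), ()))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return len(seen)

    for k in range(1, 8):
        print(k, dfa_state_count(k))                      # 2, 4, 8, ... = 2**k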
This is why NFAs are better - you can run several NFAs in parallel, each with its own state and input. It basically becomes a vectorized problem.
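Concretely, this is the bit-parallel ("shift-and") trick in software: keep all NFA states in one machine word and advance every active state per input symbol with a shift and an AND. Toy sketch, names made up:

    # Rough sketch of the bit-parallel ("shift-and") idea: all NFA states live
    # in one machine word, so every active state advances per input symbol with
    # one shift and one AND.

    def shift_and(pattern, text):
        masks = {}
        for i, c in enumerate(pattern):
            masks[c] = masks.get(c, 0) | (1 << i)         # bit i set where pattern[i] == c
        state, accept = 0, 1 << (len(pattern) - 1)
        for c in text:
            state = ((state << 1) | 1) & masks.get(c, 0)  # advance all states at once
            if state & accept:
                return True
        return False

    print(shift_and("abca", "xxabcaxx"))   # True
    print(shift_and("abca", "abcb"))       # False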
Yeah, so it becomes a space/time tradeoff. I think which is better depends on the problem you're solving. Many DFAs don't blow up exponentially in space, so you'd rather have the DFA in that case. You're right, though, that for DFAs with an exponential increase in space requirements you are better off just running the NFA in parallel.
Micron announced they were working on this some time ago. Have they released something new? Or are we just talking about the same unfinished R&D project that we were back in the spring?
It sounds a lot like the Parallella boards, in that it's a 2-dimensional grid network and it sounds like they broadcast to the edge of the grid. My question is: how do you scale up bandwidth if it's broadcast and not multicast?
AFAIK GPUs are floating-point centric, meaning you'd have to reimplement some of this functionality on top of FP. Plus, it seems that each 'cell' can be its own state machine; I also believe that GPUs are less fine-grained than that (you allocate clusters of units to compute a programmable shader).
http://www.micronautomata.com/documentation/download_white_p...
"An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing
Paul Dlugosch, Dave Brown, Paul Glendenning, Michael Leventhal, Harold Noyes Member, IEEE"