I felt the article undersold an important technical detail about this work. Many randomized algorithms have a "with high probability" guarantee, which for linear systems might look like:
for any ε > 0 and 0 < δ < 1, using poly(dimension, #nonzeros, condition number, 1/ε, 1/δ) floating-point operations, our algorithm finds a solution with error less than ε, with probability at least 1 - δ.
However, that is not the kind of guarantee stated in Theorem 1.1 in the paper. Even though the algorithm uses randomness, the analysis yields a classic big-O complexity guarantee that does not involve probabilities.
This is a big difference, and it makes the result stronger.
Not really. This algorithm is only fast with high probability, and getting a linear equation solver to be always-correct is just a matter of checking the output at the end and retrying if the error is too large (a check-and-retry sketch is below). So always-correct and fast-with-high-probability is a feature of any randomized linear equation solver.
It's still amazing stuff; I'm just pointing out that this particular aspect of the behaviour is not surprising.
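For concreteness, here is a minimal sketch of that check-and-retry idea in Python. The randomized_solve argument is a hypothetical stand-in for any fast-with-high-probability solver, and the tolerance and retry cap are illustrative, not taken from the paper:

    import numpy as np

    def solve_always_correct(A, b, randomized_solve, tol=1e-8, max_tries=100):
        # Wrap a fast-with-high-probability solver so the result is always correct:
        # verify the residual (one cheap matrix-vector product) and retry on failure.
        for _ in range(max_tries):
            x = randomized_solve(A, b)  # may occasionally return a bad answer
            if np.linalg.norm(A @ x - b) <= tol * np.linalg.norm(b):
                return x
        raise RuntimeError("all retries failed; with independent runs this is astronomically unlikely")

The point is that the verification step is much cheaper than solving, so the expected extra cost of an occasional retry is negligible.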
This is a pretty common omission to make, even though it's, strictly speaking, wrong. But suppose for a moment that the algorithm were always fast. In that case it would not need randomness anymore, as you could just pick a fixed string as the "random" outcome and be guaranteed success.
So, if an algorithm uses randomness, it will either fail with some probability or be slow with some probability. If neither is true, that algorithm doesn't need the randomness.
Randomized quicksort could pick a bad pivot at every step (or even at every other step, which still gives quadratic complexity), so it's only O(n log n) with high probability.
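A minimal randomized quicksort sketch in Python, to make that concrete: the pivot is drawn uniformly at random, so there is no fixed bad input, but an unlucky streak of pivot draws can still make the total work quadratic.

    import random

    def quicksort(a):
        # Random pivot: expected O(n log n) on every input, but a long run of
        # unlucky pivot choices can still make the total work quadratic.
        if len(a) <= 1:
            return a
        pivot = random.choice(a)
        less = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        greater = [x for x in a if x > pivot]
        return quicksort(less) + equal + quicksort(greater)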
Quicksort is popular, and its bad inputs can arise in highly plausible cases (e.g. an almost-sorted input combined with taking the first element as the pivot), and they can easily be engineered whenever an attacker controls a large input to be sorted and can guess which sorting algorithm is being used. The probability of bad performance is only low on implausibly random inputs.
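To illustrate how plausible that bad case is, here is a rough Python sketch (purely illustrative) that takes the first element as the pivot and counts comparisons on an already-sorted input:

    def quicksort_first_pivot(a, count):
        # Deterministic first-element pivot: on sorted input every partition is
        # maximally unbalanced, so the comparison count grows quadratically.
        if len(a) <= 1:
            return a
        pivot, rest = a[0], a[1:]
        count[0] += len(rest)  # comparisons against the pivot
        less = [x for x in rest if x < pivot]
        greater = [x for x in rest if x >= pivot]
        return quicksort_first_pivot(less, count) + [pivot] + quicksort_first_pivot(greater, count)

    count = [0]
    quicksort_first_pivot(list(range(500)), count)  # already sorted: the plausible bad case
    print(count[0])  # about 500*499/2 = 124750 comparisons, i.e. quadratic

With a random pivot the same sorted input takes on the order of n log n comparisons in expectation.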
Is this a theoretical difference, or a difference for an actual implementation? I have no knowledge of linear-system algorithms, but as far as I'm aware, many w.h.p. algorithms achieve correctness with probability 1 - (1/n)^c (with c effectively being a tunable parameter), which would seem like quite a strong result in itself.
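For context on where the 1 - (1/n)^c form usually comes from (this is the standard amplification argument, not something specific to this paper): if a single run fails with probability at most 1/2 and you can verify the output, then k = c * log2(n) independent runs fail altogether only when every single run fails, which happens with probability at most

    (1/2)^k = (1/2)^(c * log2 n) = 1/n^c.

So pushing c up just costs more repetitions, which is why it behaves like a tunable knob rather than a fundamental limit.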