Galois field instructions on 2021 CPUs (corsix.org)
163 points by 082349872349872 on Nov 6, 2022 | 41 comments



I was hoping this would be about larger fields. For zero-knowledge proofs we need to handle 255-bit fields, and operations are quite slow because of that (https://o1-labs.github.io/proof-systems/specs/pasta.html#pal...)


Could someone with more knowledge than myself help explain what the practical applications of this are?


You can think of a Galois field as a specially chosen permutation of a set of numbers such as 0-255 and a redefinition or remapping of the arithmetic operators + - * / such that each one is reversible: if a op b = c, then given c and b you can find a by applying the inverse of op. (In normal arithmetic of course, multiplication and division aren't reversible because of zero.) The actual operators aren't ADD SUB MUL DIV, but they are like them in the sense that they are easy to implement in a hardware ALU as functions over bit patterns.

This unlocks all kinds of clever techniques. For example, it lets you efficiently compute a "rolling" hash of every n-byte substring of a document, by simply sliding an n-byte window across the document one byte at a time, multiplying the previous hash by the incoming byte and dividing by the outgoing byte. This has lots of applications in cryptography, compression, searching, and so on.
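
To make that concrete, here's a minimal Python sketch of the rolling hash idea, using the prime field GF(257) for simplicity (each byte b is mapped to the nonzero element b+1, so every window byte has a multiplicative inverse; the helper names are made up):

  P = 257                                   # prime, so GF(257) is a field

  def window_hash(data, n):                 # product of the first n mapped bytes
      h = 1
      for b in data[:n]:
          h = h * (b + 1) % P
      return h

  def roll(h, incoming, outgoing):          # multiply in the new byte, divide out the old
      return h * (incoming + 1) * pow(outgoing + 1, -1, P) % P

  doc, n = b"the quick brown fox", 4
  h = window_hash(doc, n)
  for i in range(n, len(doc)):
      h = roll(h, doc[i], doc[i - n])
      assert h == window_hash(doc[i - n + 1:], n)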


Thanks for the explanation. Are addition/multiplication still linked the same way as for normal integers? Like 2*a == a + a? Edit: I do get that ‘2’ in GF might not be the same as integer 2.


Pretty much. Fields are generally considered to be the structure that links the common notions of addition and multiplication. In particular, you need 3 things to qualify as a field:

1) Multiplication behaves as expected (without reference to addition)

2) Addition behaves as expected (without reference to multiplication)

3) a(b+c) = ab + ac

The third requirement is the only part of fields that links multiplication to addition. In particular, if we assume that 2=1+1, then we have: 2a = (1+1)a = a(1+1) = a+a
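
A concrete (if degenerate) illustration of the edit's point, sketched in Python: in a binary field like GF(2^8), addition is XOR, so "2" = 1 + 1 = 0, and the identity 2a == a + a still holds — it's just that "2" isn't the integer 2.

  # addition in GF(2^n) is XOR, so 1 + 1 == 0, and 2a = a + a = a ^ a = 0
  assert 1 ^ 1 == 0
  assert all((a ^ a) == 0 for a in range(256))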


What is meant by “without reference to”?

How about: addition is a move and multiplication is also a move. a + b: move from 0 by a steps, then by b steps. a * b: move by a steps, b times.

Does that count?


> Addition is a move and multiplication is also a move. a + b: move from 0 by a steps, then by b steps. a * b: move by a steps, b times.

By "move" do you mean adding 1? If so, then no. Consider, for instance, the complex numbers: multiplication by i isn't any sort of iterated addition.


I tried to write it without going into too much jargon or machinery, but to be more formal, here is what I would say:

1) The elements (excluding 0) form an abelian group under multiplication, with the identity element 1.

2) The elements form an abelian group under addition, with the identity element 0.

4) 1 != 0

The addition of 4 is really just a technicality to exclude a single structure (the trivial ring where 0 = 1) from qualifying as a field.

If we look at 1 in more detail, we can see that it literally makes no reference to addition. If you have a structure with proposed multiplication and addition functions, there is literally no function you could pick for addition that would cause requirement 1 to fail. Similarly, there is no function you could pick for multiplication that would cause requirement 2 to fail.

To see this more explicitly, we can expand out the definition of abelian group to see what is required by 1 and 2:

1.1) a * b = b * a

1.2) (a * b) * c = a * ( b * c)

1.3) a * 1 = 1 * a = a (a != 0)

1.4) For all a != 0, there exists an a^-1 such that a * a^-1 = 1

Technically, given my original definition, 1.1 and 1.2 should also exclude 0, but given the overall field structure, they turn out to be true even if you allow 0.

Note that nothing in the expansion of (1) invokes any notion of addition.

Similarly, for addition, we would have:

2.1) a + b = b + a

2.2) (a + b) + c = a + ( b + c)

2.3) a + 0 = 0 + a = a

2.4) for all a, there exists a -a such that a + (-a) = 0

In this case, we have imposed a bunch of restrictions on addition without invoking any notion of multiplication.

> Addition is a move and multiplication is also a move. a + b: move from 0 by a steps, then by b steps. a * b: move by a steps, b times.

I'm not sure how to interpret this. In the case of GF(p) for prime p, I can make sense of that statement by thinking of GF(p) as a circle with p tick marks, i.e. as modular arithmetic.

In the more general case of GF(p^n), I tend to think of the field as an n-dimensional vector space over GF(p). As such, I'm not clear on what you mean by "move b steps". For instance, consider GF(5^7) (using a standard polynomial construction).

We want to compute: (3x^6 + 2x^3 + x + 4) + (x^2 + x + 1). With your proposal, I would need to move "x^2 + x + 1" steps, and I have no idea how to do that.

In fact, it gets even worse when you consider multiplication. Consider x^6 * x^1. This results in x^7, which should reduce to a polynomial of smaller degree. For instance, 2 possible constructions of GF(5^7) are:

   GF(5^7) = A = Z5[X]/<x^7 + x^6 + 1>

   GF(5^7) = B = Z5[X]/<x^7 + 2*x^6 + 1>
Which can be read as:

  Z5[X] are the polynomials with coefficients in the integers mod 5.

  Z5[X]/<f(x)> is Z5[X] mod f(x). That is to say: generalize the idea of modular arithmetic to polynomials. For instance, x^7 + 3x^6 = x^3 + 3x^2 mod (x^4 - 1). In the case of Galois fields, we add the additional requirement that f(x) have no non-trivial factors. This is analogous to requiring that the modulus be prime when working with integers.
In the case of A, you have x^6 * x^1 = x^7 = 4x^6 + 4

In the case of B, you have x^6 * x^1 = x^7 = 3x^6 + 4

As you can see, the specific value you arrive at depends on how you choose to construct GF(5^7). As it turns out, you can prove that any possible construction ends up being equivalent in a meaningful technical sense (called isomorphism).
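
A small Python sketch (made-up helper, representing polynomials as coefficient lists with index i holding the coefficient of x^i) that carries out this arithmetic and reproduces both answers:

  P = 5

  def polymulmod(a, b, modulus):            # modulus = low coeffs of a monic degree-7 poly
      prod = [0] * (len(a) + len(b) - 1)
      for i, ai in enumerate(a):
          for j, bj in enumerate(b):
              prod[i + j] = (prod[i + j] + ai * bj) % P
      while len(prod) > 7:                  # replace x^(7+s) by x^s * -(low terms)
          top = prod.pop()
          s = len(prod) - 7
          for k in range(7):
              prod[k + s] = (prod[k + s] - top * modulus[k]) % P
      return prod + [0] * (7 - len(prod))

  A = [1, 0, 0, 0, 0, 0, 1]                 # from x^7 + x^6 + 1
  B = [1, 0, 0, 0, 0, 0, 2]                 # from x^7 + 2x^6 + 1
  x6, x1 = [0, 0, 0, 0, 0, 0, 1], [0, 1]
  print(polymulmod(x6, x1, A))              # [4, 0, 0, 0, 0, 0, 4] = 4x^6 + 4
  print(polymulmod(x6, x1, B))              # [4, 0, 0, 0, 0, 0, 3] = 3x^6 + 4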


> In normal arithmetic of course, multiplication and division aren't reversible because of zero.

If a and b are in the range 0-255, am I right in thinking that (((a + 1) * (b + 1)) % 257) - 1 is also in the range 0-255? Does that also work as a group operation? Does division done this way have the properties you'd need?
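
Yes on all counts, which is easy to check by brute force (a throwaway sketch; note the identity element of the shifted operation is 0, since (a+1)*1 = a+1):

  # a (*) b = ((a+1)*(b+1)) % 257 - 1; 257 is prime, so the residues
  # 1..256 form a group under multiplication, and +1/-1 relabels them 0..255
  op = lambda a, b: (a + 1) * (b + 1) % 257 - 1

  assert all(0 <= op(a, b) <= 255 for a in range(256) for b in range(256))
  assert all(op(a, 0) == a for a in range(256))      # 0 is the identity
  assert all(any(op(a, b) == 0 for b in range(256))  # every a has an inverse
             for a in range(256))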


>(In normal arithmetic of course, multiplication and division aren't reversible because of zero.)

"In the beginning, zero was created. This has made a lot of people very angry and been widely regarded as a bad move..."


The next article in blog order is one application: https://www.corsix.org/content/reed-solomon-for-software-rai...

Another application is crypto: the SubBytes step of AES maps very neatly onto gf2p8affineinvqb, so algorithms that are similar to AES but not exactly AES could make use of gf2p8affineinvqb.
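
A hedged pure-Python sketch of what that instruction computes per byte — inversion in GF(2^8) followed by an affine map over GF(2); with AES's matrix and constant 0x63 it reproduces the S-box:

  def gf_mul(a, b):                 # GF(2^8) multiply, AES modulus 0x11B
      r = 0
      while b:
          if b & 1:
              r ^= a
          a <<= 1
          if a & 0x100:
              a ^= 0x11B
          b >>= 1
      return r

  def gf_inv(x):                    # x^254 = 1/x for nonzero x; inv(0) := 0
      r = 1
      for _ in range(254):
          r = gf_mul(r, x)
      return r

  def aes_affine(x):                # output bit i = XOR of 5 input bits, plus 0x63
      y = 0
      for i in range(8):
          bit = 0
          for j in (0, 4, 5, 6, 7):
              bit ^= (x >> ((i + j) % 8)) & 1
          y |= bit << i
      return y ^ 0x63

  sbox = [aes_affine(gf_inv(x)) for x in range(256)]
  assert sbox[0x00] == 0x63 and sbox[0x53] == 0xED    # matches FIPS-197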


Expanding on this, a very nice property of Galois/Counter Mode (GCM) for AES is that encrypting one block does not require the previous block to have been encrypted first, unlike AES-CBC.

This means that AES-GCM can take advantage of data parallelism and there are big speedups in threaded and pipelined CPUs.

In short, you can get big latency and throughput gains by using AES-GCM over AES-CBC.


> algorithms that are similar to AES but not exactly AES

The SM4 cipher, for example: https://en.wikipedia.org/wiki/SM4_%28cipher%29


Error correcting codes, specifically https://en.wikipedia.org/wiki/BCH_code


You can do discrete Fourier transforms in Galois fields. That used to be more important when floating point was less ubiquitous. And if you’re designing custom hardware to do FFTs you can save a lot of transistors if you’re willing to discretize.

A fantastic reference is [1]

[1] Fast Algorithms for Digital Signal Processing. https://www.google.com/books/edition/Fast_Algorithms_for_Dig...
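
For instance, here's a toy number-theoretic transform — a DFT over GF(257) — as a sketch. 3 is a primitive root mod 257, so w below is a principal 8th root of unity, and the usual DFT/inverse-DFT formulas work in exact integer arithmetic, no floating point:

  P, N = 257, 8
  w = pow(3, (P - 1) // N, P)      # 8th root of unity in GF(257)

  def dft(a, root):                # naive O(N^2) transform, all mod P
      return [sum(a[j] * pow(root, i * j, P) for j in range(N)) % P
              for i in range(N)]

  a = [1, 2, 3, 4, 0, 0, 0, 0]
  A = dft(a, w)
  inv_N = pow(N, -1, P)
  back = [x * inv_N % P for x in dft(A, pow(w, -1, P))]
  assert back == a                 # exact round trip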


I guess cryptography, but they're also used in GPS and CDMA.


I think these instructions can drastically speed up the GCM part of AES-GCM, which is used by a lot of https websites (not sure if it's the most popular TLS cipher, but it's almost definitely top 3). Part of why Salsa/ChaCha became popular on phones and embedded devices is that for a while only x86 had specialized instructions for GCM.


Why is 256 == 0 computer friendly and 251 == 0 reduction math friendly? Other than modulus bias, that initial assertion isn't clear to me. I mean, you can say "well, 251 isn't 2^N" which is abundantly obvious, but why does that matter? Is it because you can't create a bit polynomial for a non-power-of-two reduction?


`251 == 0` is maths friendly because 251 is a prime number. As the article outlines, this means that for any given number x in the field, there exists some other y where `x * y == 1`. This isn't necessarily the case for non-primes.

When that breaks down, some numbers simply have no inverse, i.e. `1/x` doesn't exist and `1/(1/x) == x` no longer holds. This means you can no longer easily manipulate values or equations using the common set of properties established for most mathematics, hence "not maths friendly".
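
Easy to see with Python's modular inverse (pow with exponent -1, available since 3.8) — a quick sketch:

  # mod the prime 251, every nonzero x is invertible:
  assert all(x * pow(x, -1, 251) % 251 == 1 for x in range(1, 251))

  # mod 256 it breaks down for any even x, e.g.:
  try:
      pow(16, -1, 256)
  except ValueError:
      print("16 has no inverse mod 256")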


So P=256 cannot define a field because it isn't prime?


There is a field with 256 elements, because 256 is a power of a prime (2^8). But that field is not the integers mod 256: it has different rules for addition and multiplication.


Ah, thanks! Fields are a lot less intimidating than I thought they would be! Well, I mean: the basic idea (after reading these replies + wikipedia).


There is a finite field (or Galois field) GF(p) of size p for any prime p. This can be exhibited by the integers mod p.

There are also finite (Galois) fields GF(p^n) of size p^n (positive integer powers of p).

These can be exhibited by polynomials with coefficients in GF(p), of degree less than n (i.e. up to n terms).

E.g. for p = 2 and n = 3:

  0
  1
  x
  1 + x
  1 + x + x^2
  1 + x^2
  x + x^2
  x^2
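
Those eight polynomials are exactly the bit patterns 0..7, with bit i holding the coefficient of x^i. Addition is XOR; multiplication can be done modulo an irreducible polynomial — a hedged sketch, assuming the common choice x^3 + x + 1 (0b1011):

  def gf8_mul(a, b):
      r = 0
      while b:
          if b & 1:
              r ^= a
          a <<= 1
          if a & 0b1000:           # degree reached 3: subtract the modulus
              a ^= 0b1011
          b >>= 1
      return r

  # x * x^2 = x^3 = x + 1, since x^3 + x + 1 = 0 and -1 = 1 in GF(2)
  assert gf8_mul(0b010, 0b100) == 0b011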


The article is trying to get to the concept of a group inverse without using too much jargon. If you don't mind jargon, as usual you'll want to go to Wikipedia to learn more.

https://en.wikipedia.org/wiki/Multiplicative_group_of_intege...


I'd suggest https://en.wikipedia.org/wiki/Finite_field - they'd probably get to the page you linked in trying to understand finite fields, but finite fields (AKA Galois fields) are exactly what the post is about.


It's mentioned later, but the "math friendly" versions satisfy the axioms of a group and a field, which only works if the modulus is a prime.

In particular, all non-zero values have a unique inverse:

> As 251 is prime, this reduction rule is math-friendly. By that, I mean that for any x other than zero, there is some y such that x * y == 1. For example, taking x of 16, we have 16 * 204 == 1. This property is not true for the 256 == 0 reduction rule; with that rule, there is no y such that 16 * y == 1. Where it exists, this y can be called inv(x) or 1/x.


> It's mentioned later, but the "math friendly" versions satisfy the axioms of a group and a field, which only works if the modulus is a prime.

The integers mod n >= 1 are always a group (under addition); only the field structure requires prime n.


It doesn't have to be a prime; it can also be a prime power.


All modern computers are implemented with binary digital logic. If you build the hardware the obvious way, computing mod 256 is just masking off the lower 8 bits, or doing nothing at all if it's an 8-bit register. Computing mod an arbitrary N is equivalent to doing integer division, which is an inherently complex operation that's typically anywhere from 10-100x slower. It also scales much worse for very large operands.
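
In code form (trivial, but it's the whole point):

  x = 123456789
  assert x % 256 == x & 0xFF     # mod 2^8: just keep the low 8 bits
  # x % 251 has no such shortcut: in hardware it's a full division
  # (or a multiply-by-reciprocal approximation)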


There were computers with ternary digital logic. Can we make computers that store 251 values per digit (using variable voltage levels, for example, or something like that)?


There's no inherent reason you couldn't, but it'd almost certainly be faster, cheaper, and vastly more reliable to do it with binary logic.

In actual practice what we typically do is select primes that are "near" a power of two: Mersenne primes (which have the form 2^N - 1) and pseudo-Mersenne primes (2^N - a, for small a). These have special properties that let us work with them more efficiently. Modern algorithms for these kinds of special cases are branchless, divisionless, and constant time, even if they're not quite as fast as operations in the binary fields.
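
For example, a sketch of reduction modulo the Mersenne prime 2^61 - 1, which needs no division at all (since 2^61 == 1 mod p, the high bits just fold into the low bits):

  P = (1 << 61) - 1

  def reduce61(x):                 # valid for x < 2^122, e.g. a 61x61-bit product
      x = (x & P) + (x >> 61)      # first fold: x < 2^62
      x = (x & P) + (x >> 61)      # second fold: x <= 2^61
      return x if x < P else x - P

  import random
  for _ in range(1000):
      x = random.randrange(P * P)
      assert reduce61(x) == x % P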


The multiply-by-a-constant trick using smaller lookup tables at the end can be done with pretty much arbitrary table sizes -- it's particularly helpful to fine-tune the table size if you're building the table on the fly (e.g. because you perform many multiplications, each with a dynamic value).
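
A sketch of what that looks like in GF(2^8), using the AES modulus 0x11B as an example: since multiplication distributes over XOR, c*x splits across the two nibbles of x, so two 16-entry tables replace one 256-entry table.

  def gf_mul(a, b):                # reference GF(2^8) multiply, modulus 0x11B
      r = 0
      while b:
          if b & 1:
              r ^= a
          a <<= 1
          if a & 0x100:
              a ^= 0x11B
          b >>= 1
      return r

  c = 0x1D                         # the dynamic constant; build tables on the fly
  lo = [gf_mul(c, i) for i in range(16)]
  hi = [gf_mul(c, i << 4) for i in range(16)]
  mul_by_c = lambda x: lo[x & 0xF] ^ hi[x >> 4]

  assert all(mul_by_c(x) == gf_mul(c, x) for x in range(256))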


My intro take: for base 4 you have the set of numbers {1 2 3 4}. Imagine you start with 2. Can you use multiplication to get to 1? No: 2*2 = 4 (which plays the role of 0), and from there you're stuck.

However, if you use base 5, which is prime, then with {1 2 3 4 5} you can start with any of 1, 2, 3, 4 and get back to 1 by repeated multiplication.

(If you use 0…3 and 0…4 instead, you have to say "except 0".)

We use 2^N as the base in computers, which is non-prime, though. The article explains how to use polynomials and certain Intel/ARM instructions to handle this…
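
A throwaway sketch of that base-4 vs base-5 behaviour (using 0 rather than 4/5 for the top element):

  def orbit(g, m):                 # repeatedly multiply g by itself mod m
      x, seen = g, []
      while x not in seen:
          seen.append(x)
          x = x * g % m
      return seen

  print(orbit(2, 4))               # [2, 0] -- stuck at 0, never reaches 1
  print(orbit(2, 5))               # [2, 4, 3, 1] -- cycles back through 1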


A long time ago in my university days we implemented blowfish encryption on an FPGA using Galois fields.


Converting arithmetic to things based on polynomials: when I was reading about the Atari 2600's TIA, I recall some things about how polynomial counters were implemented internally to avoid having to deal with decimal->binary conversion in the chip (it's an 8-bit ASIC clocked at NTSC rate that generates a TV signal and must be manipulated in near real time by the CPU to generate graphics). Wondering how this could be related/applicable.


I don't see how decimal-to-binary conversion would have anything to do with the TIA chip. I suspect that it used a polynomial counter instead of a binary integer counter because it used fewer gates and had less delay.

By the way, the CPU, like many others back then, could do addition and subtraction with binary-coded decimals.


To position objects horizontally, ideally you would have a register where you load a value between 0 and 160; instead you have to strobe a register when the CRT beam is near the horizontal position you want, which will be off by some pixels because the CPU runs at 1/3 the TIA clock and 6502 instructions take 2 to 8 cycles. So then you have to "fine tune" using an additional register that shifts the object left/right by -7 to 8 pixels.


Do such hardware enhancements help with ECC performance, or does ECC operate on other fields?


Do you mean Error Correcting Codes or Elliptic Curve Cryptography?

In the former, yes; in the latter, mostly* no, as characteristic-2 elliptic curve crypto isn't common these days due to a weaker and more complicated security story.

*Mostly but not entirely, e.g. https://eprint.iacr.org/2022/1325


I meant the latter, thanks.


Elliptic curve cryptography is usually done with prime fields, since there have been too many breaks against systems using binary fields (the ones that these CPU instructions operate on). Binary field arithmetic, as people have mentioned, is used widely for error correcting codes, and to a lesser extent in cryptography for modes like AES-GCM and inside of AES itself.



