That's freaking awesome. For those unaware, the RP2040 microcontroller has sever...

TillE · on Feb 16, 2022

I love the PIOs, it's such a great idea. I've only used one to emulate a 4021 shift register being clocked at 100 kHz, which is light work compared to what they're capable of, but still something an Arduino struggles to keep up with.

nomel · on Feb 16, 2022

I used them (PRU in BeagleBone) to implement e-fuse burning with JTAG (the chip stops working if you take a little too long) and MDIO (the interface for controlling a network PHY).

My interesting battle story around this is that I first implemented the MDIO as bit banging in the kernel. This used quite a bit of CPU, which I wanted to use for other things. I switched it over to the PRU and CPU usage dropped to 1%. Great! But the data rate was much slower, with huge latency spikes. It turned out that the CPU usage was so low that the CPU governor thought the system was idle, so was throttling the CPU to just a few hundred MHz. I had to change the governor to keep the CPU clock to 100%, then everything was about 10x faster.
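For anyone hitting the same throttling, here is a minimal sketch of inspecting and changing the governor through the standard Linux cpufreq sysfs nodes. The helper names `get_governors` and `set_governor` are made up for illustration, and the choice of the `performance` governor is an assumption about what "keep the CPU clock at 100%" meant above.

```python
from pathlib import Path

# Standard Linux cpufreq sysfs root. It is a parameter so the sketch can be
# pointed at a scratch directory instead of the live system.
CPU_ROOT = Path("/sys/devices/system/cpu")
GOVERNOR_GLOB = "cpu[0-9]*/cpufreq/scaling_governor"


def get_governors(cpu_root=CPU_ROOT):
    """Return a dict like {"cpu0": "ondemand"} for every CPU with cpufreq."""
    nodes = sorted(cpu_root.glob(GOVERNOR_GLOB))
    return {node.parent.parent.name: node.read_text().strip() for node in nodes}


def set_governor(governor, cpu_root=CPU_ROOT):
    """Write `governor` to every CPU's scaling_governor node.

    Returns the list of CPU names that were updated. On a real system this
    needs root, and the governor must be one the kernel offers.
    """
    updated = []
    for node in sorted(cpu_root.glob(GOVERNOR_GLOB)):
        node.write_text(governor + "\n")
        updated.append(node.parent.parent.name)
    return updated
```

On a real board, `set_governor("performance")` run as root would stop the kernel from clocking down while the offloaded work leaves the CPU looking idle.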

fps-hero · on Feb 16, 2022

It is very humbling to know I’m not the only person to have had to implement bit banged MDIO recently.

If your PHY’s data sheet says clause 45 register access only and you don’t have GPIO access as the bus master, run away.

Unfortunately I was tasked with emulating a clause 45 PHY on a clause 22 bus. Without creative use of the microcontroller peripherals, it is next to impossible to implement a 2.5 MHz slave.
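To make the clause 22 / clause 45 mismatch concrete, here is a rough sketch of the two frame layouts as bit lists (MSB first, as on the wire). The function names are invented for illustration. Only write frames are built, since a read's turnaround and data bits are driven by the PHY rather than the master.

```python
PREAMBLE = [1] * 32          # 32 ones before every frame
TA_WRITE = [1, 0]            # turnaround bits driven by the master on writes

# Clause 45 opcodes
C45_ADDRESS, C45_WRITE, C45_READ_INC, C45_READ = 0b00, 0b01, 0b10, 0b11


def to_bits(value, width):
    """MSB-first list of `width` bits."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def clause22_write(phy, reg, data):
    """One frame: ST=01, OP=01 (write), 5-bit PHY, 5-bit register, 16-bit data."""
    return (PREAMBLE + [0, 1] + [0, 1] + to_bits(phy, 5) + to_bits(reg, 5)
            + TA_WRITE + to_bits(data, 16))


def clause45_frame(op, port, dev, payload):
    """One frame: ST=00, 2-bit opcode, 5-bit port, 5-bit device, 16-bit payload."""
    return (PREAMBLE + [0, 0] + to_bits(op, 2) + to_bits(port, 5)
            + to_bits(dev, 5) + TA_WRITE + to_bits(payload, 16))


def clause45_write(port, dev, reg_addr, data):
    """Two frames: an address frame carrying the 16-bit register address,
    then a write frame carrying the data."""
    return [clause45_frame(C45_ADDRESS, port, dev, reg_addr),
            clause45_frame(C45_WRITE, port, dev, data)]
```

The frames are the same length, but the start bits differ (01 versus 00) and clause 45 spends a whole extra frame on the register address, so a clause 22-only master cannot address those registers directly.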

I had to use an edge interrupt to jump into a while-loop state machine, whose state was determined by a timer counter driven by the MDIO bus, with all GPIO accessed via hard-coded bit-banded GPIO.
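The "state comes from a counter of bus clocks" idea can be modelled in a few lines. This is only a host-side sketch, not the firmware described above: the class name is invented, it decodes clause 22 writes only, and it ignores the genuinely hard part, which is driving MDIO back within one MDC period on reads.

```python
def to_bits(value, width):
    """MSB-first list of bits, the order MDIO uses on the wire."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


class Clause22WriteDecoder:
    """Decode clause 22 write frames, one sampled bit per rising MDC edge."""

    def __init__(self):
        self.ones = 0      # consecutive 1s seen while idle (preamble detect)
        self.count = 0     # bits clocked in since the preamble ended
        self.shift = 0     # the 32 frame bits, MSB first
        self.writes = []   # decoded (phy, reg, data) tuples

    def on_rising_edge(self, mdio):
        """Stand-in for the edge interrupt: call with the sampled MDIO level."""
        if self.count == 0:                  # idle or still in the preamble
            if mdio:
                self.ones += 1
                return
            if self.ones < 32:               # a 0 before a full preamble
                self.ones = 0
                return
        # From here the field (ST, OP, PHYAD, REGAD, TA, DATA) is implied
        # purely by self.count, like the timer counter in the description.
        self.shift = (self.shift << 1) | (mdio & 1)
        self.count += 1
        if self.count == 32:
            self._finish()

    def _finish(self):
        st = self.shift >> 30
        op = (self.shift >> 28) & 0b11
        if st == 0b01 and op == 0b01:        # clause 22 start, write opcode
            self.writes.append(((self.shift >> 23) & 0x1F,   # PHY address
                                (self.shift >> 18) & 0x1F,   # register
                                self.shift & 0xFFFF))        # data
        self.ones = self.count = self.shift = 0


decoder = Clause22WriteDecoder()
frame = ([1] * 32 + [0, 1, 0, 1] + to_bits(3, 5) + to_bits(5, 5)
         + [1, 0] + to_bits(0x1234, 16))
for bit in frame:
    decoder.on_rising_edge(bit)
print(decoder.writes)
```

At 2.5 MHz each call to `on_rising_edge` has a 400 ns budget, which is why interrupt latency and slow GPIO access dominate on a real microcontroller.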

Fun times, but this is definitely an FPGA job.

Teknoman117 · on Feb 16, 2022

The PRUs in the beaglebone are way more flexible to be honest. Most of my effort with the Pico PIOs is squeezing everything to fit inside the 32 instruction limit.

synack · on Feb 16, 2022

If you're short on PIO memory, the OUT EXEC instruction will execute an instruction directly from the FIFO. Feed the FIFO with a DMA channel and it'll keep up with the system clock. From the RP2040 datasheet section 3.4.5.2:

> OUT EXEC allows instructions to be included inline in the FIFO datastream. The OUT itself executes on one cycle, and the instruction from the OSR is executed on the next cycle. There are no restrictions on the types of instructions which can be executed by this mechanism. Delay cycles on the initial OUT are ignored, but the executee may insert delay cycles as normal.
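A rough host-side sketch of preparing such a stream: encode a few `SET` instructions as 16-bit PIO words and estimate the cost using the two-cycles-per-inline-instruction rule from the quote. The FIFO layout here (one instruction per 32-bit word, in the low 16 bits) is an assumption that depends on the shift direction and bit count you configure, and the cycle count assumes autopull, a free wrap, and a FIFO that never runs dry.

```python
# SET destination codes from the PIO instruction encoding
SET_PINS, SET_X, SET_Y, SET_PINDIRS = 0b000, 0b001, 0b010, 0b100


def encode_set(dest, value, delay=0):
    """Encode `set <dest>, <value> [delay]` as a 16-bit PIO instruction.

    Assumes no side-set bits are configured, so all five delay/side-set
    bits are available as delay.
    """
    if dest not in (SET_PINS, SET_X, SET_Y, SET_PINDIRS):
        raise ValueError("unknown SET destination")
    if not 0 <= value <= 31 or not 0 <= delay <= 31:
        raise ValueError("value and delay must fit in 5 bits")
    return (0b111 << 13) | (delay << 8) | (dest << 5) | value


def fifo_words(instructions):
    """ASSUMED layout: one instruction per 32-bit FIFO word, low 16 bits."""
    return [instr & 0xFFFF for instr in instructions]


def inline_cycles(instructions):
    """One cycle for each OUT EXEC, one for the executee, plus its delay."""
    return sum(2 + ((instr >> 8) & 0x1F) for instr in instructions)


stream = [
    encode_set(SET_PINDIRS, 1),        # make the pin an output
    encode_set(SET_PINS, 1, delay=3),  # drive high, hold 3 extra cycles
    encode_set(SET_PINS, 0),           # drive low
]
print([hex(w) for w in fifo_words(stream)], inline_cycles(stream))
```

The resident program then shrinks to a single wrapped `out exec` instruction, and the list from `fifo_words` is what a DMA channel would feed into the TX FIFO.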

dexterhaslem · on Feb 16, 2022

Is this open source? I'm using an AM335x and would be interested in looking if so.

nomel · on Feb 16, 2022

No, sorry. I also don't have documentation that I can give. When I left the company, the Beaglebone community was pretty well into a C compiler, which is probably complete by now. There was also some coprocessor library being developed, to provide a nice interface to the host. It was all asm and kernel modules back then. I'm sure things are much easier now.

akavel · on Feb 16, 2022

Noob question: can it be said that the microcontroller contains a very simple FPGA? Or are there some crucial differences between the two technologies?

a2800276 · on Feb 16, 2022

The PIO subsystem is like a special-purpose coprocessor for IO. You write small assembly programs defining the behavior.

You could compare this to a GPU (a special-purpose coprocessor for vector calculation) or to a very complicated IO peripheral that you are configuring.

An FPGA allows you to build up actual hardware gates and wire them together. For PIO, you configure existing hardware.

There are some similarities in the domains where you'd use them, though. FPGAs are often used for timing-critical IO as well.

bayindirh · on Feb 16, 2022

Oh, like BeagleBone Black's realtime co-processors. Neat.

fps-hero · on Feb 16, 2022

Much more limited capability-wise, but much tighter guarantees about latency. More like an 80 MHz 8-bit AVR than an ARM coprocessor.

The real shame of modern microcontrollers is the decoupling of peripherals and GPIO from the main processor. It prevents these sorts of hacks, as all access has to go through the memory bus, effectively capping bitbanging at sub-10 MHz speeds.

guenthert · on Feb 16, 2022

> but much tighter guarantees about latency

Could you clarify that? It was my understanding that the PRUs of the TI AM335x Sitara specify execution time true to the cycle.

fps-hero · on Feb 18, 2022

By that I mean how IO is coupled to the processor. In AVR8 (and other older processors I presume) GPIO access was via a CPU register, thus a single instruction (one clock RISC) can directly modify the GPIO state.

Every GPIO implementation I’ve seen on modern processors accesses it via a memory mapped peripheral. The difference being accessing a memory bus is not a single cycle operation. You have to wait for the bus to be free, then wait to fetch or write the data.

The most extreme analogous example is that modern CPUs are effectively infinitely fast but bounded by cache misses that necessitate memory access.

This is fundamentally why every "toggle a GPIO pin" benchmark is flawed. What is really being measured is memory bus latency.

This misunderstanding is why people have trouble reconciling why a multi-GHz processor cannot also bitbang GPIO at GHz speeds, although if such a processor existed it would be amazing.
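A back-of-the-envelope version of that argument: one full output period needs two GPIO writes, so the best-case toggle rate is the core clock divided by twice the cycles each write costs. The function name and both sets of numbers below are illustrative assumptions, not measurements.

```python
def max_toggle_hz(cpu_hz, cycles_per_gpio_write):
    """Upper bound on square-wave frequency from back-to-back GPIO writes.

    A full period is one write high plus one write low, and loop overhead
    is ignored, so real code will be slower than this.
    """
    return cpu_hz / (2 * cycles_per_gpio_write)


# Hypothetical 16 MHz core whose GPIO write is a single-cycle instruction
small_core = max_toggle_hz(16_000_000, 1)

# Hypothetical 1 GHz core where each GPIO write costs 100 cycles on the bus
big_core = max_toggle_hz(1_000_000_000, 100)

print(small_core, big_core)
```

With these made-up figures the 16 MHz part tops out at 8 MHz and the 1 GHz part at 5 MHz, which is the sense in which the benchmark measures the bus rather than the core.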

rasz · on Feb 19, 2022

Throughput isn't even the main problem; it's the latency that kills you in modern GPIOs.

rusk · on Feb 16, 2022

Would I be right in saying this is something similar to BPF?

trasz · on Feb 16, 2022

More like Amiga’s Copper.

msla · on Feb 16, 2022

How does this compare with, say, channel I/O from mainframe designs?

https://en.wikipedia.org/wiki/Channel_I/O

pxx · on Feb 16, 2022

I think we've just reinvented the concept with a harder to search name. "Programmable I/O (PIO)" is sure similar to "programmed I/O (PIO)"