Porting the Slint UI Toolkit to a Microcontroller with 264K RAM (slint-ui.com)
91 points by zdw on April 8, 2023 | 28 comments



Forget running "a simple UI toolkit". I have the ENTIRE PalmOS 5.2 running on the Pico with that screen. At 47 FPS!

Feel free to hit me up for pointers.

Video proof: https://photos.app.goo.gl/zZiLnDNLQs2omBXz5


Do you know what kind of ui drawing architecture PalmOS uses? Seems like it's probably more aligned with that type of device.


Software blitter. Rectangle based. To a frame buffer. It knows NOTHING about rp2040. It just writes pixels to a memory area. The rest I do with PIO and DMA. I am publishing an article in a few days on how to drive this display properly with RP2040 here: http://dmitry.gr/?r=06.%20Thoughts&proj=09.ComplexPioMachine...


If you get to the inlined video and think it looks a little sluggish, keep reading - they later implemented DMA to speed it up. Here's a link to a video showing it: https://youtube.com/watch?v=dkBwNocItGs


It's slow even with DMA. The original iPhone was snappier than this.


The original iPhone was far more powerful than this hardware.


Typically these SPI screens only work up to 50 MHz (which is already way out of spec; the spec is often 15 MHz), and there are about 5M bits in 640x480x16, so you won't get more than ~10 fps no matter what you do on the rendering side.

Which is why they often also offer an 8-bit parallel interface. I guess one could drive it with the PIO on the RP2040.

Edit: whoops, it's only 320x240. Not sure how fast they are running it.


The original Mac was also snappier.


The original Mac was B&W and didn't do the same level of animation.

Of course a snappy interface is always more important than visual flair, but I imagine this is more a tech demo than anything else.


The original Mac's graphics code was mostly assembly.


For an example of a faster demo, see my Slint backend for the Teensy MicroMod:

https://twitter.com/charlesstrahan/status/163002622435647488...

Now, admittedly, this is a 600 MHz beast (the NXP i.MX RT1062). But I'd like to think it's still impressive for the hardware involved.

As mentioned in the Twitter thread, to eke out the performance I had to implement an 8-bit parallel bus using FlexIO along with DMA (all written in Rust) — but with that implemented, the display is pretty zippy.


The Pico runs at 133 MHz. I don't know how Slint works under the hood, but the demo shown (with the DMA speedup) could be much snappier imho. For that, the code needs to be aligned with what the display's SPI protocol offers instead of treating it as a general-purpose frame buffer - for example, while scrolling, sending only the part that becomes newly visible rather than the whole area.
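
Just as a rough sketch of that idea, assuming an ILI9341/ST7789-class controller and a couple of hypothetical SPI helpers (lcd_cmd/lcd_data are placeholders, not a real driver API):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical helpers: toggle the D/C line and clock bytes out over SPI. */
    extern void lcd_cmd(uint8_t cmd);
    extern void lcd_data(const uint8_t *data, size_t len);

    /* Restrict the controller's write window to one rectangle... */
    static void lcd_set_window(uint16_t x0, uint16_t y0, uint16_t x1, uint16_t y1) {
        uint8_t col[4] = { x0 >> 8, x0 & 0xff, x1 >> 8, x1 & 0xff };
        uint8_t row[4] = { y0 >> 8, y0 & 0xff, y1 >> 8, y1 & 0xff };
        lcd_cmd(0x2A); lcd_data(col, 4);   /* CASET: column address range */
        lcd_cmd(0x2B); lcd_data(row, 4);   /* RASET: row address range */
        lcd_cmd(0x2C);                     /* RAMWR: following bytes land in that window */
    }

    /* ...then push only the strip of pixels that actually changed. */
    void lcd_blit_dirty(const uint16_t *pixels, uint16_t x, uint16_t y,
                        uint16_t w, uint16_t h) {
        lcd_set_window(x, y, x + w - 1, y + h - 1);
        lcd_data((const uint8_t *)pixels, (size_t)w * h * 2);  /* RGB565 */
    }

Sending only a 20-pixel-tall strip that scrolled into view is roughly 320 x 20 x 2 = 12.8 kB instead of 153.6 kB for the full 320x240 frame, an order of magnitude less SPI traffic.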


The Microsoft Band managed a fancier UI with only 96 MHz of CPU. It did have an FPU though, which speeds things up quite a bit.

Getting good throughput requires DMA, and Band also paid the price and dedicated tons of local SRAM for the frame buffer. A local framebuffer that can be read from is needed for antialiased fonts and alpha blending, not to mention things like screen fades!

The Band also had real TrueType fonts, and ran at 30 FPS [1] with vsync!

I did a write-up of how at https://meanderingthoughts.hashnode.dev/cooperative-multitas...

[1] The 30 fps cap was due to bandwidth to the display controller; uncapped, the UI could run internally at around 100 fps if it wasn't doing loads of text rendering.
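
As a purely illustrative sketch of that kind of cooperative, time-budgeted module loop (the names, timer, and overrun handling here are hypothetical, not the Band's actual code):

    #include <stdint.h>

    /* Each UI/system module exposes a short, non-blocking tick function. */
    typedef void (*module_tick_fn)(void);

    extern uint32_t micros(void);                        /* placeholder monotonic timer */
    extern void report_overrun(int module, uint32_t us); /* placeholder logging hook    */

    #define BUDGET_US 2000u                              /* ~2 ms budget per module     */

    void run_ui_loop(module_tick_fn *modules, int count) {
        for (;;) {
            for (int i = 0; i < count; i++) {
                uint32_t start = micros();
                modules[i]();                            /* must return quickly          */
                uint32_t elapsed = micros() - start;
                if (elapsed > BUDGET_US)
                    report_overrun(i, elapsed);          /* flag it so it gets optimized */
            }
        }
    }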


Little side note: Props for the regard for UI quality.

> At launch, the Microsoft Band dropped fewer frames than an Apple Watch

> if any module took more than 2ms before it returned, a crash dump was created and the code was investigated and optimized.

If only Microsoft cared even half as much about the UI in its other products. Visual Studio still hangs up the main thread for many seconds.

(side side note: touch input/output latency is a field Microsoft is actually remarkably good at; I'm guessing because they did some research on how much it matters decades ago. If only they had done some research about how glaring UX defects impair the overall usage of a product...)


I’m excited about Slint. It’s great to have a solid Qt alternative for embedded UIs. I doubt I will need to use it on a MCU rather than an embedded Linux processor, but even so, it’s also nice to know that it’s lightweight.


the "if we can port it to the constrained pico we can make it run on any mcu" bit made me wat. the pico is a pretty large mcu really.

i get that displays are not the realm of single-digit-cent 4-bitters or anything. but i can think of other, more constrained mcus with probably still enough flash and a fast enough SPI bus and dma that would make a much better showing.

i guess i feel the pico is overkill, but at any rate this is mostly unfair because it's not the toolkit's target market in the first place


It's weird they don't mention RAM and code sizes. If they require just two lines (240 x 2 bytes x 2), that's less than 1K for the pixels, but the code size can't be that small if they compile in fonts etc.

I feel people who don't usually work with MCUs think an RP2040 or ESP32 is a crazy constrained environment, when those are really rather luxurious. I'm not sure how much it matters - maybe it's like complaining about Electron while Slack (deservedly) succeeds.


Does this offer me much over LVGL?


A typed and tooled (lsp, live-preview, etc) DSL for the UI and Rust/C++20 APIs, compared to C APIs.


Looks interesting, I like the online demos. I would look at using it, but it doesn't look like there is a port for the ESP32. The lack of explicit instructions/info about getting this running with ESP-IDF puts a bit of a damper on it. Rust is nice, but I'm not sure how to get that working with an existing C/FreeRTOS project. So while the technical parts of this seem really nice, the developer experience for someone outside of Slint seems a little hard.


Great to see the band Slint https://en.m.wikipedia.org/wiki/Slint branching out into micro-UI toolkits!


(2022)


It's funny to compare the "modern" with the microcomputers of old. The original Amiga 500 ran with 512 kB RAM, with the OS requiring only 256 kB (as in "you could run something other than just the OS on that").

Meanwhile we have this laggy mess...


The Amiga had a dedicated video chip (and it output analog RGB video signals, which are fairly cheap to generate).

This is a slow SPI bus with the CPU needing to push W x H x BPP bits of pixel data, and with a 320x240 16bpp display that comes out to 9 million bytes/sec for 60 fps or 4.5 million for 30 fps. The Cortex-M0 I believe has 4 cycles for a load and store, so even if you had a perfect parallel 16-bit bus where you could do 1 load + 1 store to send a pixel, that comes out to a best case of ~80 fps @ 100 MHz with 100% CPU utilization (i.e. you could do nothing else on that CPU, not even serve interrupts). Another core won't help much because it shares the memory bus, and fill rate is the bottleneck here.

There's a good reason why we had dedicated chips for pushing framebuffer -> physical LCD/CRT pixels even back in the 80s.


Yep. The really neat part of their port is their weird DMA/scan program architecture. It works for them, and is a decent way to work around what (I would assume) is a toolkit that needs a frame buffer instead of computing the line on the fly from a draw list.


> with the cpu needing to push W x H x BPP pixels

Only if it needs to update the entire screen for every frame, which it probably doesn't.

> Cortex M0 I believe has 4 cycles for load and store, so even if you had a perfect parallel 16 bit bus where you could do 1 load + 1 store to send a pixel

DMA can shovel data from RAM straight into the SPI peripheral with no CPU involvement beyond the initial setup (which is simple).
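
On the RP2040 that looks roughly like this with the Pico C SDK (assuming spi0 is already initialised and the pixel data is in RAM; this is just the general pattern, not necessarily how the Slint port does it):

    #include <stdint.h>
    #include <stddef.h>
    #include "hardware/dma.h"
    #include "hardware/spi.h"

    /* Fire off a DMA transfer of a pixel buffer into the SPI TX FIFO,
       paced by the SPI peripheral's TX data request. */
    void send_frame_dma(const uint8_t *buf, size_t len) {
        int chan = dma_claim_unused_channel(true);
        dma_channel_config c = dma_channel_get_default_config(chan);
        channel_config_set_transfer_data_size(&c, DMA_SIZE_8);
        channel_config_set_dreq(&c, spi_get_dreq(spi0, true));  /* pace by SPI0 TX   */
        channel_config_set_read_increment(&c, true);            /* walk through RAM  */
        channel_config_set_write_increment(&c, false);          /* fixed destination */
        dma_channel_configure(chan, &c,
                              &spi_get_hw(spi0)->dr,  /* write: SPI data register */
                              buf,                    /* read: pixel data in RAM  */
                              len,                    /* one byte per transfer    */
                              true);                  /* start immediately        */
        /* The CPU is free here; block (or take an IRQ) only when the result is needed. */
        dma_channel_wait_for_finish_blocking(chan);
        dma_channel_unclaim(chan);
    }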


But it doesn't need to push the whole frame every time; the whole screen doesn't always change.

> Another core wont help much because it shares the memory bus, and fill rate is the bottleneck here.

Depends on the CPU; some (the RP2040, for example) have segmented memory, which means you can have one core working on the graphics and dedicate a segment to DMA.
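
For example (a sketch using the Pico SDK's scratch-bank attribute; the buffer names and layout are illustrative, not from the article):

    #include <stdint.h>
    #include "pico/platform.h"

    /* Illustrative only: park double-buffered line buffers in the RP2040's
       4 KB SCRATCH_Y SRAM bank, so DMA reads from them don't contend with
       the cores' traffic to the main striped SRAM banks. */
    static uint16_t __scratch_y("linebuf") line_buf[2][320];  /* RGB565, one scanline each */

    /* Core 1 renders into line_buf[n ^ 1] while DMA streams line_buf[n]
       out to the display, alternating every scanline. */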


Slint already supports rendering just the dirty regions of the screen. SPI is a major performance handicap though — they could use PIO on the RP2040 to implement an 8 or 16 bit parallel bus (and then they’d need a display with support for parallel IO), and that would help a ton.

Here’s Slint running on my own backend for the Teensy MicroMod (has an NXP i.MX RT1062 processor) using DMA and an 8-bit parallel bus using FlexIO:

https://twitter.com/charlesstrahan/status/163002622435647488...

I’d say that’s fairly smooth.



