I understand the logic behind the Raspberry Pi Foundation’s decision to continue releasing a single Raspbian version in 32-bit mode for compatibility reasons, but... on the other hand, it does severely handicap their newer hardware.
One of the first comments there also says about the RPi 3:
> The GPU is a 32 bit processor. I haven't checked, but I'm expecting that there's a heck of a lot more work to do to get Khronos or other multimedia extension stuff up and running against a 64bit kernel than just getting userland to build.
Don’t know if that’s the case for the RPi 4 as well.
And another comment too:
> I'm sure there are certain applications where having a 64-bit kernel (let alone userland) may be beneficial, but I suspect the hoped-for performance improvements didn't materialise, otherwise people would be waving benchmark results at us demanding an RPi-supported aarch64 kernel.
I’m missing this too, and am planning on eventually doing some benchmarking of my own to see if there is any advantage of running the aarch64 kernel for my applications.
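For anyone wanting to try the same comparison, here is a minimal timing sketch. The `workload` function is a hypothetical stand-in, not a real application; the idea would be to run the same script once under the 32-bit kernel and once under the aarch64 kernel and compare the numbers.

```python
import timeit


def workload() -> int:
    # Hypothetical stand-in: replace with something representative
    # of the application you actually care about.
    return sum(i * i for i in range(100_000))


def best_time(func, repeats: int = 5, number: int = 10) -> float:
    """Return the best observed time per call, in seconds."""
    # timeit.repeat returns one total time per repeat; the minimum is
    # the run least disturbed by other activity on the system.
    totals = timeit.repeat(func, repeat=repeats, number=number)
    return min(totals) / number


if __name__ == "__main__":
    print(f"best time per call: {best_time(workload):.6f} s")
```

A pure-Python loop like this mostly measures the interpreter, so treat it only as a template for plugging in a real workload.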
The GPU has nothing to do with what the OS on the CPU is doing - it's an entirely separate architecture anyway (VideoCore 6 on RPi4).
The reason their /opt/vc stuff doesn't support 64bit is because they are proprietary binaries directly from Broadcom. Also, they use interfaces provided only by the downstream RPi kernel.
There is a blog post from the RPi foundation about support for VC6 in Mesa.
The Raspberry Pi 4 apparently isn't supported. I'm not exactly clear on why, how this links to the above issue, or whether the problems listed in their bug tracker are also unsolved in Fedora or are simply things the Raspberry Pi people are unwilling to fix in their setup.
The fedora/etc problem has to do with the lack of uboot/edk2 and upstream kernel support. Once support lands in the appropriate repos you can expect fedora will enable it.
There's a full 64-bit Gentoo image available. It seems to work fine for all my use-cases. I'm not sure which parts of your comments really apply at this point in time.
Hardware accelerated OpenGL. As far as I have been able to tell, that will still be missing if you run an aarch64 kernel.
And I think using the GPU to get hardware accelerated video decoding and encoding will not be available either.
Edit: But if I understand https://github.com/popcornmix/omxplayer/issues/714 correctly then you could do hardware accelerated decoding of HEVC on the CPU. I don’t know how the performance of that compares to the kind of video decoding that the GPU can do. That’s one of the things I’d like to see someone benchmark, or benchmark myself.
That was the case, but the VC4 instruction set is open/documented and there is a Mesa driver being worked on to REPLACE the closed binary driver, which is only available for 32-bit.
When you use the Mesa VC4 driver you can also compile the whole user space for 64-bit, including OpenGL ES support in an even newer version than the closed source one. (Supporting user kernels etc.)
Some side notes: Mesa is quite hungry in performance and memory when compiling shaders (to VC4 instructions), way more than the closed driver requires. That's why older versions of the Pi with less powerful ARM cores couldn't really use this approach.
Source: I used to write custom user shaders in VC4 assembly to run on all Pis because the closed binary didn't offer OpenCL.
You now have a 64-bit kernel with 32-bit userland.
It is unclear to me what kind of code gcc now generates with -march=native. If somebody could clear that up, it would be very much appreciated (i.e., does it use 31 GPRs?).
If it's a 32bit userland, then you don't have access to the other GPRs. The GPR field in AArch32 instructions is a max of 4 bits, so you can only encode r0-r15.
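To see which case applies on a given system, one rough check is to compare what the kernel reports with the pointer width of a running userland process. This is only a sketch: `describe_bitness` is a hypothetical helper, the list of 64-bit machine strings is an assumption, and a process running under a 32-bit personality may see a different machine string that this does not handle.

```python
import platform
import struct


def describe_bitness(machine: str, pointer_bits: int) -> str:
    """Classify a (kernel machine string, userland pointer width) pair."""
    # Assumption: these strings identify a 64-bit kernel.
    kernel_is_64 = machine in ("aarch64", "arm64", "x86_64")
    if kernel_is_64 and pointer_bits == 32:
        return "64-bit kernel, 32-bit userland"
    if kernel_is_64:
        return "64-bit kernel, 64-bit userland"
    return "32-bit kernel, 32-bit userland"


if __name__ == "__main__":
    # struct.calcsize("P") is the size in bytes of a C pointer for
    # this Python interpreter, i.e. for this userland process.
    bits = struct.calcsize("P") * 8
    print(describe_bitness(platform.machine(), bits))
```

This tells you about the Python interpreter you ran it with, so it only reflects the userland that interpreter was built for.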
Does it? What does the 64-bit mode provide that the 32-bit one doesn't? On x86 things were different because the 64-bit ABI also provided access to more registers and other benefits, but as far as I know, the 32-bit mode has the same features as 64-bit on ARM.
This reminds me of when the UltraSPARC CPUs came out. I was working at Sun at the time, and I remember asking why Solaris 2.6 was released without 64-bit support. The reason was the same: there was really no benefit to it (except for support for more than 4 GB of RAM in a single process, which wasn't really needed back then).
Even as later versions of Solaris came out with native 64-bit support, the entire userspace was still 32-bit because it worked on both architectures, and the binaries were both smaller and also faster.
> On x86 things were different because the 64-bit ABI also provided access to more registers and other benefits, but as far as I know, the 32-bit mode has the same features as 64-bit on ARM.
It doesn't. 64-bit ARM has 31 GPRs. It also has a much cleaner decode, and some low-end CPUs that can run both 64-bit and 32-bit code execute 64-bit code faster than they do 32-bit code.
Btw, for those who are using an RPi3 and want to try an aarch64 kernel: Arch Linux ARM has one, and I've used it in my experiments to get a smoother desktop experience on the Raspberry Pi[1].
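The register counts follow from the width of the register field in the instruction encoding. A small arithmetic sketch (the constant names are just for illustration): a 4-bit field in AArch32 gives r0-r15, while AArch64 uses a 5-bit field in which encoding 31 is reserved for the zero register or the stack pointer, depending on the instruction.

```python
AARCH32_FIELD_BITS = 4  # register field width in AArch32 (A32) instructions
AARCH64_FIELD_BITS = 5  # register field width in AArch64 (A64) instructions

# 4 bits -> 16 encodings: r0-r15 (r13, r14, r15 double as SP, LR, PC).
A32_GPR_ENCODINGS = 2 ** AARCH32_FIELD_BITS

# 5 bits -> 32 encodings, but encoding 31 means XZR or SP rather than
# a general-purpose register, leaving x0-x30.
A64_GPRS = 2 ** AARCH64_FIELD_BITS - 1

print(A32_GPR_ENCODINGS)  # 16
print(A64_GPRS)           # 31
```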
Slightly off-topic: is there anything that is as or more powerful than RPi4 but has more/expandable memory (say 16G or more) and still works with mainline kernel/distributions? In other words, something that I could set up as a build machine for CI?
If you're doing it just for the novelty, or really want to do it on site natively at a reasonable cost, check out the Jetson TX2 dev kit. The TX2 has been mainlined for a while now, and this kit offers 8 GB RAM with a fast CPU and the ability to connect fast SATA/NVMe storage. While most distros would run on it, you would probably need to make your own install package (or just run container images for builds if you don't NEED to have an exact kernel).
They're using 5.3.0 which isn't supported by the RPi Foundation yet. The 4.19 line had an issue with RAM too but it's been fixed now so if you build from git it's fine.