FYI: if you're on Ubuntu 24.04, it's easy to build llama.cpp with AMD ROCm GPU acceleration. Debian (whose packaging Ubuntu inherits) enabled support for a wider variety of hardware than the official AMD packages cover, so this should work for nearly all discrete AMD GPUs from Vega onward. The exception is MI300, because Ubuntu 24.04 shipped with ROCm 5:
sudo apt -y install git wget hipcc libhipblas-dev librocblas-dev cmake build-essential
# add yourself to the video and render groups
sudo usermod -aG video,render $USER
# reboot to apply the group changes
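If you want to confirm the group change took effect before going further, a quick check like this should do it (assuming a stock Ubuntu device layout, where /dev/kfd and /dev/dri/renderD* are the nodes ROCm needs access to):

```shell
# Show current group memberships; "video" and "render" should appear
# after you log back in or reboot.
groups

# The ROCm runtime talks to the GPU through these device nodes; on a
# stock Ubuntu install they are group-owned by render/video. This just
# prints their permissions (and is harmless on machines without a GPU).
ls -l /dev/kfd /dev/dri/renderD* 2>/dev/null || true
```

If "render" and "video" don't show up even after a reboot, re-run the usermod line and make sure $USER expanded to your actual username.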
# download a model
wget --continue -O dolphin-2.2.1-mistral-7b.Q5_K_M.gguf \
https://huggingface.co/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/resolve/main/dolphin-2.2.1-mistral-7b.Q5_K_M.gguf?download=true
# build llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
git checkout b3267
HIPCXX=clang++-17 cmake -S. -Bbuild \
-DGGML_HIPBLAS=ON \
-DCMAKE_HIP_ARCHITECTURES="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102" \
-DCMAKE_BUILD_TYPE=Release
make -j8 -C build
# run llama.cpp
build/bin/llama-cli -ngl 32 --color -c 2048 \
--temp 0.7 --repeat_penalty 1.1 -n -1 \
-m ../dolphin-2.2.1-mistral-7b.Q5_K_M.gguf \
--prompt "Once upon a time"
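A note on -ngl: it sets how many transformer layers get offloaded to the GPU, and Mistral 7B has 32 layers, so -ngl 32 offloads the whole model. If you hit out-of-VRAM errors, lower it. A rough back-of-the-envelope, assuming the ~4.8 GiB Q5_K_M file divides evenly across layers:

```shell
# Approximate per-layer VRAM cost for the weights alone:
# ~4800 MiB file / 32 layers, plus extra on top for the KV cache and
# compute buffers, which grow with -c (context size).
echo "$((4800 / 32)) MiB per layer (rough)"
```

So dropping from -ngl 32 to, say, -ngl 24 frees on the order of a gigabyte of VRAM at the cost of running eight layers on the CPU.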
I think this will also work on Rembrandt, Renoir, and Cezanne integrated GPUs with Linux 6.10 or newer, so you might be able to install the HWE kernel to get it working on that hardware.
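To check whether your running kernel is already new enough, something like this should work (linux-generic-hwe-24.04 is the standard HWE metapackage name for this release):

```shell
# Compare the running kernel against 6.10, the version the iGPU
# support mentioned above is said to need.
kver=$(uname -r | cut -d. -f1-2)
major=${kver%%.*}
minor=${kver#*.}
if [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 10 ]; }; then
    echo "kernel $kver is new enough"
else
    echo "kernel $kver is older than 6.10; consider the HWE kernel:"
    echo "  sudo apt -y install linux-generic-hwe-24.04"
fi
```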
With that said, users with CDNA 2 or RDNA 3 GPUs should probably use the official AMD ROCm packages instead of the built-in Ubuntu packages, as there are performance improvements for those architectures in newer versions of rocBLAS.