Hacker News

FYI, if you're on Ubuntu 24.04, it's easy to build llama.cpp with AMD ROCm GPU acceleration. Debian's ROCm packages (which Ubuntu inherits) enable support for a wider variety of hardware than the official AMD packages do, so this should work for nearly all discrete AMD GPUs from Vega onward (with the exception of MI300, because Ubuntu 24.04 shipped with ROCm 5):

    sudo apt -y install git wget hipcc libhipblas-dev librocblas-dev cmake build-essential
    # add yourself to the video and render groups
    sudo usermod -aG video,render $USER
    # reboot to apply the group changes

    # download a model
    wget --continue -O dolphin-2.2.1-mistral-7b.Q5_K_M.gguf \
        "https://huggingface.co/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/resolve/main/dolphin-2.2.1-mistral-7b.Q5_K_M.gguf?download=true"

    # build llama.cpp
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    git checkout b3267
    HIPCXX=clang++-17 cmake -S. -Bbuild \
        -DGGML_HIPBLAS=ON \
        -DCMAKE_HIP_ARCHITECTURES="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102" \
        -DCMAKE_BUILD_TYPE=Release
    make -j8 -C build

    # run llama.cpp
    build/bin/llama-cli -ngl 32 --color -c 2048 \
        --temp 0.7 --repeat_penalty 1.1 -n -1 \
        -m ../dolphin-2.2.1-mistral-7b.Q5_K_M.gguf \
        --prompt "Once upon a time"
I think this will also work on Rembrandt, Renoir, and Cezanne integrated GPUs with Linux 6.10 or newer, so you might be able to install the HWE kernel to get it working on that hardware.
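If you're not sure whether your running kernel already meets that 6.10 requirement, a quick check like this works (the `linux-generic-hwe-24.04` package name follows Ubuntu's usual HWE naming convention; double-check it before installing):

```shell
# version_ge A B: succeeds if version A >= version B (sort -V compares versions)
version_ge() { [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]; }

kernel=$(uname -r | cut -d- -f1)
if version_ge "$kernel" 6.10; then
    echo "kernel $kernel should be new enough for Rembrandt/Renoir/Cezanne iGPUs"
else
    # HWE meta-package name assumed from Ubuntu's usual pattern; verify first
    echo "consider: sudo apt install linux-generic-hwe-24.04"
fi
```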

With that said, users with CDNA 2 or RDNA 3 GPUs should probably use the official AMD ROCm packages instead of the built-in Ubuntu packages, as there are performance improvements for those architectures in newer versions of rocBLAS.
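To tell which camp your card falls into, you can check the gfx target it reports and compare it against the architecture list in the cmake invocation above; a sketch assuming the `rocminfo` tool (packaged as `rocminfo` on Ubuntu) is installed:

```shell
# List the gfx targets of any detected AMD GPUs (prints nothing without one)
rocminfo 2>/dev/null | grep -o 'gfx[0-9a-f]*' | sort -u
```

For reference, gfx1100/gfx1101/gfx1102 are RDNA 3 and gfx90a is CDNA 2, i.e. the architectures where the newer official rocBLAS pays off.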
