oscar: Switch llama-cpp out for Vulkan extensions

This results in a substantial speedup. Before:

    [ Prompt: 2.9 t/s | Generation: 2.5 t/s ]

After (I haven't figured out what the story is with variable speeds,
these are four successive messages of increasing length in the same
conversation):

    [ Prompt: 95.7 t/s | Generation: 11.7 t/s ]
    [ Prompt: 2866.0 t/s | Generation: 13.4 t/s ]
    [ Prompt: 133.1 t/s | Generation: 14.0 t/s ]
    [ Prompt: 188.3 t/s | Generation: 13.6 t/s ]

(benchmarks on Framework 13 AMD 7640U)
Chandler Swift 2025-12-25 18:16:31 -06:00
parent 36df179501
commit 0ae0946f7a
Signed by: chandlerswift
GPG key ID: A851D929D52FB93F

    @@ -161,7 +161,7 @@
         wl-clipboard
         # ✨ AI ✨
    -    llama-cpp
    +    llama-cpp-vulkan
         # compilers/language utils
         cargo
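
For context, the changed line sits inside a Nix package list. A minimal sketch of the resulting configuration, assuming the list lives in `environment.systemPackages` of a NixOS module (the enclosing attribute isn't shown in the diff hunk):

    { pkgs, ... }:
    {
      environment.systemPackages = with pkgs; [
        wl-clipboard
        # ✨ AI ✨
        # was llama-cpp; llama-cpp-vulkan builds llama.cpp with the
        # Vulkan backend, offloading inference to the GPU
        llama-cpp-vulkan
        # compilers/language utils
        cargo
      ];
    }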