This results in a substantial speedup: generation is roughly 5x faster
and prompt processing at least 30x faster. Before:
[ Prompt: 2.9 t/s | Generation: 2.5 t/s ]
After (I haven't figured out why the speeds vary so much; these are
successive messages of increasing length in the same conversation):
[ Prompt: 95.7 t/s | Generation: 11.7 t/s ]
[ Prompt: 2866.0 t/s | Generation: 13.4 t/s ]
[ Prompt: 133.1 t/s | Generation: 14.0 t/s ]
[ Prompt: 188.3 t/s | Generation: 13.6 t/s ]
(benchmarks on Framework 13 AMD 7640U)
| Name |
|---|
| configuration.nix |
| hardware-configuration.nix |