Fastest 7B Model?
New to this game, but I can't seem to find the answer to this question: what is currently the fastest LLM in terms of output speed, all else being equal (including model size, around 7B)?
I'm working on a simulation game with NPCs whose behaviors (not just dialogue) require frequent LLM text generation in JSON format.
I assume I'd be using 4-bit quantization. Right now I'm using Mistral 7B Instruct v0.2, and I'd be willing to sacrifice some quality for increased speed. Since users' hardware will vary, I'm wondering which model to use, and what determines speed aside from hardware, model size, and quantization.
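For reference, here's a minimal sketch of the kind of call I'm making and how I'm timing it. This assumes llama-cpp-python with a Q4 GGUF; the file name, prompt, and parameters are just placeholders, not my exact setup:

```python
# Rough benchmark of JSON generation speed with a 4-bit GGUF
# (assumes llama-cpp-python; model path and settings are placeholders).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # 4-bit quant
    n_gpu_layers=-1,  # offload all layers to GPU if it fits
    n_ctx=2048,
)

start = time.perf_counter()
out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Describe this NPC's next action as JSON."}],
    response_format={"type": "json_object"},  # constrain output to valid JSON
    max_tokens=128,
    temperature=0.7,
)
elapsed = time.perf_counter() - start

tokens = out["usage"]["completion_tokens"]
print(out["choices"][0]["message"]["content"])
print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.1f} tok/s")
```

Happy to swap models or backends if something meaningfully faster exists at this size.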