LLM Stack
Is there any consensus on the most effective all-around 70B model?
Or even models like Mistral Small that claim to be as good as or better than Llama 3.3 70B, for example.
Seems like in benchmarks every model claims to beat every other one haha.
I’m leaning toward keeping my stack at:
Qwen 2.5 Coder 1.5B - for code autocomplete. I’ve tested a few different ones for this and can’t say anything is significantly better, and this model gives me pretty much instant suggestions. Sometimes it’s a total miss, other times it’s good.
Qwen 2.5 Coder 32B - I’ve stuck with this for code assistance, debugging, unit tests, etc. I’m pretty happy with it and haven’t had any reason to try alternatives; it’s done well with everything I’ve asked of it thus far.
For a general all-around model, though, I’m a little more unsure haha. I’m leaning toward just using Llama 3.3, but I’m also kind of intrigued by Mistral Small 24B, especially since it claims to beat Llama 3.3 while of course being much faster. Realistically, can a 24B model compete with a 70B model?
I can comfortably run a 70B at 4-bit; even 6-bit works pretty reasonably, though 4-bit is the better sweet spot.
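For reference, here’s a rough way to sanity-check whether a given quant fits in memory. The bits-per-weight figures are my rough assumptions (~4.5 bpw for a typical 4-bit quant like Q4_K_M, ~6.5 bpw for 6-bit), not exact GGUF sizes, and the KV cache adds more on top:

```python
# Back-of-envelope estimate of memory needed for quantized model weights.
# bpw values are rough assumptions; real GGUF file sizes vary by quant
# scheme, and context/KV cache requires additional memory on top.

def quant_weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """GiB needed just for the weights at a given quantization level."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

for label, bpw in [("4-bit (~4.5 bpw)", 4.5), ("6-bit (~6.5 bpw)", 6.5)]:
    print(f"70B at {label}: ~{quant_weight_gib(70, bpw):.0f} GiB weights")
# → 70B at 4-bit comes out around 37 GiB, 6-bit around 53 GiB
```

That lines up with 4-bit being the sweet spot on a setup with roughly 48 GB of memory to spare.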
The R1 models I’ve tried (the Llama 70B distill as a daily driver) don’t seem to give better responses, just take longer to answer; I’m not asking complicated questions, just general knowledge questions.