I have it running on my Mac using ollama now; does it say anywhere what quantization scheme is being used? ollama seems a bit opaque here.
When it downloaded the model it only pulled about 4GB, which for a 7.3B-parameter model implies 4-bit quantization. But I don't see that listed anywhere (or an option to use, say, Q8 instead).
If that's the case, I'm pretty impressed after a quick tinker; it feels pretty coherent for a 7B @ Q4.
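For anyone else checking: recent ollama versions can report this, and the model library usually publishes multiple quants under different tags. A rough sketch (the "mistral" name and the exact tag here are just examples; check the model's tags page for what's actually available):

  # print model metadata, including the quantization, on recent ollama builds
  ollama show mistral

  # pull an explicit quant instead of the default
  # (example tag; actual tags vary per model)
  ollama pull mistral:7b-instruct-q8_0

The ~4GB default download is consistent with a 4-bit quant: 7.3B params at ~0.5 bytes each is roughly 3.7GB before overhead.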