What software are you using for inference? I hate plugging my own app[1] here, but I know many people on my app's Discord who are running 4-bit OmniQuant-quantized Mixtral 8x7B on it, on Macs with >= 32GB of RAM (M1, M2, and M3). I run it all the time on my 64GB M2 Mac Studio, and it takes up just under 24GB of RAM.
It also runs Yi-34B-Chat, which takes up ~18.15GB of RAM.