
You can run 4-bit quantized Mixtral 8x7B with unquantized MoE gates, if you've got at least 32GB of RAM. The model itself takes up about 24GB of RAM.
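For scale, Mixtral 8x7B has roughly 46.7B total parameters, so at 4 bits per weight the weights alone come to about 23-24GB, which lines up with the figure above. If you want a non-GUI way to try it, here's a rough sketch using llama-cpp-python with a community Q4 GGUF; this is just one common option (not necessarily what anyone in this thread is running), and the file name and quant are placeholders rather than the exact mixed-precision setup (unquantized MoE gates) described above:

    # Rough sketch: 4-bit Mixtral 8x7B via llama-cpp-python.
    # The GGUF path is hypothetical -- point it at whatever Q4 file you have.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # placeholder path
        n_ctx=4096,       # bigger context = more RAM spent on the KV cache
        n_gpu_layers=-1,  # offload everything to Metal on Apple Silicon
    )

    out = llm("[INST] Summarize how MoE routing works. [/INST]", max_tokens=128)
    print(out["choices"][0]["text"])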

I have a 32GB M3 and... Mixtral 8x7B likes to completely crash the machine haha

What software are you using for inference? I hate plugging my own app[1] here, but I know many people on my app's Discord who are running 4-bit OmniQuant quantized Mixtral 8x7B on it on >= 32GB M1, M2, and M3 Macs. I run it all the time on a 64GB M2 Mac Studio and it takes up just under 24GB of RAM.

Also runs Yi-34B-Chat, which takes up ~18.15GB of RAM.
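Those figures are roughly what a naive 4-bit estimate predicts; quantization scales, unquantized tensors (e.g. the MoE gates mentioned above), and the KV cache push real usage somewhat higher. A quick back-of-envelope check (parameter counts are approximate):

    # 4 bits = 0.5 bytes per weight, so a naive weight-only estimate is:
    def approx_4bit_gb(params_billion: float) -> float:
        return params_billion * 0.5  # GB, ignoring scales/KV cache/overhead

    print(approx_4bit_gb(46.7))  # Mixtral 8x7B: ~23.4 GB vs "just under 24GB" above
    print(approx_4bit_gb(34.4))  # Yi-34B:       ~17.2 GB vs the ~18.15GB above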

[1]: https://privatellm.app

