
You can run 4-bit quantized Mixtral 8x7B with unquantized MoE gates, if you've got at least 32GB of RAM. The model itself takes up about 24GB of RAM.
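For scale, Mixtral 8x7B has roughly 46.7B total parameters, so at 4 bits per weight the weights alone come to about 23-24GB, which lines up with the figure above. If you want a non-GUI way to try it, here's a rough sketch using llama-cpp-python with a community Q4 GGUF; this is just one common option (not necessarily what anyone in this thread is running), and the file name and quant are placeholders rather than the exact mixed-precision setup (unquantized MoE gates) described above:

    # Rough sketch: 4-bit Mixtral 8x7B via llama-cpp-python.
    # The GGUF path is hypothetical -- point it at whatever Q4 file you have.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # placeholder path
        n_ctx=4096,       # bigger context = more RAM spent on the KV cache
        n_gpu_layers=-1,  # offload everything to Metal on Apple Silicon
    )

    out = llm("[INST] Summarize how MoE routing works. [/INST]", max_tokens=128)
    print(out["choices"][0]["text"])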

I have a 32GB M3 and... Mixtral 8x7B likes to completely crash the machine haha

What software are you using for inference? I hate plugging my own app[1] here, but I know many people on my app's Discord who are running 4-bit OmniQuant quantized Mixtral 8x7B on it on >= 32GB M1, M2, and M3 Macs. I run it all the time on a 64GB M2 Mac Studio and it takes up just under 24GB of RAM.

Also runs Yi-34B-Chat, which takes up ~18.15GB of RAM.
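Those figures are roughly what a naive 4-bit estimate predicts; quantization scales, unquantized tensors (e.g. the MoE gates mentioned above), and the KV cache push real usage somewhat higher. A quick back-of-envelope check (parameter counts are approximate):

    # 4 bits = 0.5 bytes per weight, so a naive weight-only estimate is:
    def approx_4bit_gb(params_billion: float) -> float:
        return params_billion * 0.5  # GB, ignoring scales/KV cache/overhead

    print(approx_4bit_gb(46.7))  # Mixtral 8x7B: ~23.4 GB vs "just under 24GB" above
    print(approx_4bit_gb(34.4))  # Yi-34B:       ~17.2 GB vs the ~18.15GB above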

[1]: https://privatellm.app

