
Yeah, it can split weights. Whatever fraction of the weights doesn't fit into VRAM stays in system RAM and is computed on the CPU (at reasonable speed).

Additionally, prompt processing will work with large models even on low-VRAM GPUs.
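
To make the split concrete, here's a rough sketch of the arithmetic, not how any particular runtime actually does it. It assumes the model is offloaded in whole layers of equal size; `split_layers`, its parameters, and the numbers in the example are all made up for illustration.

```python
def split_layers(n_layers: int, bytes_per_layer: int,
                 vram_bytes: int, reserve_bytes: int = 0) -> tuple[int, int]:
    """Return (gpu_layers, cpu_layers) for a hypothetical whole-layer split.

    Assumption: every layer is the same size, and reserve_bytes of VRAM is
    held back for things other than weights (activations, caches, etc.).
    """
    if n_layers < 0 or bytes_per_layer <= 0:
        raise ValueError("n_layers must be >= 0 and bytes_per_layer > 0")
    # VRAM left for weights after the reserve; never negative.
    usable = max(vram_bytes - reserve_bytes, 0)
    # As many whole layers as fit go to the GPU; the rest run on the CPU.
    gpu_layers = min(n_layers, usable // bytes_per_layer)
    return gpu_layers, n_layers - gpu_layers


GIB = 1024 ** 3

# Illustrative numbers only: 80 layers of 0.5 GiB each, an 8 GiB card,
# 1 GiB held back for non-weight data.
gpu, cpu = split_layers(n_layers=80, bytes_per_layer=GIB // 2,
                        vram_bytes=8 * GIB, reserve_bytes=1 * GIB)
print(f"{gpu} layers on GPU, {cpu} layers on CPU")
```

Real layers aren't all the same size and real runtimes need VRAM for more than weights, so treat this as a back-of-the-envelope estimate only.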


