If you have a 6GB GPU, it can hold all the weights. My lowly laptop 2060 can spit out a 200-token response with full context almost immediately.
If you don't have a dGPU, short prompts are OK with fast RAM, but long prompts will be slow.