
Yes, a GGUF-converted version works fine with llama.cpp for me



It works with the current GGUF model format for llama.cpp - you can find converted models on Hugging Face, or you can convert them yourself.
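If you go the manual route, here's a rough sketch of the usual convert-then-quantize flow, driving llama.cpp's bundled tools from Python. The script and binary names (convert_hf_to_gguf.py, llama-quantize) are from recent llama.cpp checkouts and have changed between versions, so treat the exact names and paths as assumptions:

    # Sketch: convert a Hugging Face checkpoint to GGUF, then quantize.
    # Tool names vary across llama.cpp versions - adjust as needed.
    import subprocess

    # 1. Convert the HF model directory to an fp16 GGUF file.
    subprocess.run(
        ["python", "convert_hf_to_gguf.py", "path/to/hf-model",
         "--outtype", "f16", "--outfile", "model-f16.gguf"],
        check=True,
    )

    # 2. Quantize to 4-bit (Q4_K_M) to shrink the memory footprint.
    subprocess.run(
        ["./llama-quantize", "model-f16.gguf", "model-q4_k_m.gguf", "Q4_K_M"],
        check=True,
    )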

Only a few download links are baked in at the moment, but whatever *.gguf file you put in your Downloads folder should appear in the dropdown.


Does anyone know if this works with llama.cpp?

llama.cpp has preliminary support already. https://github.com/ggerganov/llama.cpp/issues/1063#issuecomm...

Also, doesn't llama.cpp use ggml?

Or at least has parts of that project copy-pasted into its tree?



GGUF is just a file format. The ability to offload some layers to the CPU is not specific to it, nor to llama.cpp in general - indeed, it was available before llama.cpp was even a thing.
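For what it's worth, here's a minimal sketch of how layer splitting looks with the llama-cpp-python bindings; the model path is a placeholder, and n_gpu_layers just controls how many layers leave the CPU:

    # Minimal sketch using llama-cpp-python: split layers between CPU
    # and GPU via n_gpu_layers. "model.gguf" is a placeholder path.
    from llama_cpp import Llama

    llm = Llama(
        model_path="model.gguf",
        n_gpu_layers=20,  # offload 20 layers to the GPU, keep the rest on CPU
    )
    out = llm("Q: What is GGUF? A:", max_tokens=64)
    print(out["choices"][0]["text"])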

Can this be run with llama.cpp?

I don't know the answer to your question, but did you know you can download the standalone llamafile-server executable and use it with any gguf model?
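If you try that route, here's a hedged sketch of driving it from Python - it assumes the executable is in the current directory and that it exposes llama.cpp's /completion endpoint on port 8080, as the early releases did:

    # Sketch: launch llamafile-server with a GGUF model and query the
    # llama.cpp-style /completion endpoint. Paths, port, and the
    # startup delay are assumptions.
    import json, subprocess, time, urllib.request

    server = subprocess.Popen(["./llamafile-server", "-m", "model.gguf"])
    time.sleep(15)  # crude: give the server time to load the model

    req = urllib.request.Request(
        "http://127.0.0.1:8080/completion",
        data=json.dumps({"prompt": "Hello", "n_predict": 32}).encode(),
        headers={"Content-Type": "application/json"},
    )
    print(json.load(urllib.request.urlopen(req))["content"])
    server.terminate()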

Anyone wanna convert this to GGML so we can run it with llama.cpp?

Thanks, maybe it's as easy as downloading the GGML file and running it with llama.cpp. I'll try that!

Same question. So far I've found this thread, where people are working on it: https://github.com/ggerganov/llama.cpp/issues/1602

It just uses llama.cpp as a "backend", so if I see this correctly, this should work anywhere llama.cpp works.

So... this won't work with llama.cpp, right? I need a different runtime for it? I'm new to this LLM stuff.

I believe ggml is the basis of llama.cpp (the OP says it's "used by llama.cpp")? I don't know much about either, but when I read the llama.cpp code to see how it was created so quickly, I got the sense that the original project was ggml, given the amount of pasted code I saw. It seemed like quite an impressive library.

No, llama.cpp only works with LLaMA-based models, like base LLaMA, Alpaca, Vicuna, ...

It is based on Mistral, which llama.cpp supports, so I assume it does run (you might need to convert it to GGUF format and quantize it, as sketched upthread).

Absolutely, I'll add llama.cpp support soon.

I assume this doesn’t yet run on llama.cpp?

It's possible to run llama.cpp on Windows; e.g., see this tutorial:

https://www.youtube.com/watch?v=coIj2CU5LMU

Would this version (ggerganov) work with one of those methods?

