GGUF is just a file format. The ability to offload some layers to the CPU is not specific to it, nor to llama.cpp in general; indeed, that capability existed before llama.cpp was even a thing.
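For context, in llama.cpp the CPU/GPU split is controlled per layer at load time. A minimal sketch of the usual invocation (the model path and prompt here are placeholders; `-ngl` / `--n-gpu-layers` is llama.cpp's flag for how many layers go to the GPU, with the remainder staying on the CPU):

```shell
# Load a GGUF model, putting 20 layers on the GPU and the rest on the CPU.
# -ngl 0 would run everything on CPU; a large value offloads all layers.
./llama-cli -m ./model.gguf -ngl 20 -p "Hello"
```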
I believe ggml is the basis of llama.cpp (the OP says it's "used by llama.cpp")? I don't know much about either, but when I read the llama.cpp code to see how it had come together so quickly, I got the sense that the original project was ggml, given the amount of code pasted from it. It seemed like quite an impressive library.