Do you have any documentation, or even random notes, on how you set that up?
I have an Optimus laptop and couldn't make the proprietary drivers work (tried both rpmfusion and negativo17). I'm happily using nouveau for now, but at some point I'll need to use CUDA again.
Yes, you need to install CUDA and MSVC for GPU support. But here's some good news! We just rolled our own GEMM functions so llamafile doesn't have to depend on cuBLAS anymore. That means llamafile 0.4 (which I'm shipping today) will have GPU on Windows that works out of the box, since not depending on cuBLAS anymore means I'm able to compile a distributable DLL that only depends on KERNEL32.DLL. Oh, it'll also have Mixtral support :) https://github.com/Mozilla-Ocho/llamafile/pull/82
I wish they would spend five minutes documenting how to use the GPU on Ubuntu. My 1080ti is just sitting idle while my CPU is busy folding. Any instructions I came across said something like “make sure you have the libraries” but then failed to describe even at a high level how to locate and install those libraries. Last time I installed any CUDA libraries it involved adding an Nvidia repo or something.
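For what it's worth, the "adding an Nvidia repo" route usually looks something like the sketch below on Ubuntu. This follows Nvidia's published apt-repo instructions; the `ubuntu2204` path segment and keyring version are examples for 22.04 and may differ for your release, so treat it as a starting point rather than gospel:

```shell
# Add Nvidia's CUDA apt repository via their keyring package
# (example shown for Ubuntu 22.04 x86_64; adjust "ubuntu2204" for your release)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

# Install the CUDA toolkit (nvcc and the libraries); the driver is packaged separately
sudo apt-get install cuda-toolkit

# Sanity checks: toolkit version and whether the driver sees the card
nvcc --version
nvidia-smi
```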
Edit: I’d be glad to be proven wrong with a link to an FAQ or some part of the docs.
I have an Nvidia card and I never managed to get any AI stuff working on my base system. The only thing that's worked out of the box is Docker images with CUDA support.
Hey, I know the feeling, I felt bad when I had my GPU just sitting there and it's just a little Vast server lol. If you want to use your hardware to run this software, I'd be more than happy to help get it set up!
An important prerequisite was left out: Meshroom requires an Nvidia card for CUDA. Sadly, I swapped out my Nvidia card for an AMD one so I could switch to Sway/Wayland. I've contemplated putting both cards in, AMD for the desktop and the Nvidia for CUDA, but I haven't got around to it. I need my desktop intact for WFH duties.
I got it working after a few hours. You need to install the drivers/CUDA yourself, but it's all very straightforward. Unfortunately, with only 20GB of VRAM, I'm limited to mixtral:8x7b-instruct-v0.1-q2_K, but it runs fine, generating at about 40 tokens/s (65 tok/s for eval). As per official specs, it's running maxed out at 70W (being an SFF card).
(I've now tried running the Q4 mixtral which is 26GB. 18GB is on GPU, 8GB through CPU. Gets about 11 tok/s.)
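In case anyone wants to try the same setup: the model tag above is Ollama's naming, so assuming you're using the Ollama CLI, a minimal sketch looks like this (the `--verbose` flag prints the per-token timing stats being quoted):

```shell
# Pull and run the 2-bit Mixtral quant that fits in ~20GB of VRAM;
# --verbose prints eval/generation rates (tok/s) after each response
ollama run --verbose mixtral:8x7b-instruct-v0.1-q2_K

# In another terminal, check VRAM usage to see how much of the model
# landed on the GPU vs. spilled to CPU RAM
nvidia-smi
```

For larger quants (like the ~26GB Q4 above), Ollama offloads as many layers as fit in VRAM and runs the rest on the CPU automatically, which is why throughput drops rather than the model failing to load.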