
You can do your own fine tuning on existing models

> How do I integrate changes to this model that others have made

Typically with a LoRA




Is there a big difference in the result from fine tuning a model compared to using a LoRA? I thought the idea with LoRAs was that updating all the model weights is unnecessary?

At the very beginning of my journey I did some fine tuning with LoRA on a (I believe) Falcon model, but I haven't looked at it since. My impression was that injecting knowledge via fine tuning doesn't work, but tweaking behavior does. So your answer makes much sense to me. Thanks for bringing that up! I will definitely try that out.

Look at QLoRA. With QLoRA the adapters can be attached to all layers, allowing you to alter behavior with much less data than the original LoRA setup. It seems to "stick" better.
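For anyone curious what that looks like in code, here is a minimal sketch using Hugging Face transformers, bitsandbytes, and peft. The model name, rank, and module list are illustrative assumptions (module names vary by architecture), not something taken from this thread:

    # Hypothetical sketch: QLoRA-style setup with a 4-bit quantized base model
    # and LoRA adapters attached to every linear projection, not just attention.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    base = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",     # placeholder base model
        quantization_config=bnb_config,
        device_map="auto",
    )

    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        # QLoRA-style: adapters on all linear layers (these names are Llama-style).
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, lora_config)
    model.print_trainable_parameters()  # only the low-rank adapters are trainable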

I just fine tuned a ~30b parameter model on my 2x 3090s to check it out. It worked fantastically. I should be able to fine tune up to 65b parameter models locally, but I wanted to get my dataset right on a smaller model before trying.


What about fine-tuning with LoRA? That would introduce new adapter layers and re-arrange the data for additional uses.

Great, we can get authoritative answers. (I'm trying to understand the ML space and have mostly done readings, not an expert.)

I am assuming you can have n LoRA fine-tunings, say each specializing in one aspect of a coherent task, running in parallel with their outputs summed in, and then combine them at the end? Or more generally, does LoRA enable a sort of modularization around a core (un-merged) model?

And curious if you ever tried merging 2 or more fine-tunings and then testing the resulting single model (merge all) against the original tests to check retention?
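peft does let you load several adapters onto one frozen base and combine them into a single merged adapter; whether the merge retains each specialty is exactly what you'd have to test. A hedged sketch (model name, adapter paths, names, and weights are all made up):

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    # Load two independently trained LoRA adapters onto the same frozen base.
    model = PeftModel.from_pretrained(base, "path/to/adapter-task-a", adapter_name="task_a")
    model.load_adapter("path/to/adapter-task-b", adapter_name="task_b")

    # Combine them into one adapter, then re-run the original evals to check retention.
    model.add_weighted_adapter(
        adapters=["task_a", "task_b"],
        weights=[0.5, 0.5],
        adapter_name="merged",
        combination_type="linear",
    )
    model.set_adapter("merged")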


Just look at the instruct fine tuning that is being done to completion models to turn them into assistant models. A few thousand examples are enough to alter the model's behavior significantly and thoroughly, both what it outputs and how.

Mechanisms like LoRA (a very efficient fine-tuning mechanism that comes with an accuracy penalty) change only a few layers at the top, yet alter the model considerably.
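For intuition, here is a toy from-scratch version of the LoRA mechanism: the pretrained weight W stays frozen and the learned update is the low-rank product B·A, scaled by alpha/r. This is a sketch of the idea, not any particular library's implementation:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trained."""
        def __init__(self, in_features, out_features, r=8, alpha=16):
            super().__init__()
            self.base = nn.Linear(in_features, out_features, bias=False)
            self.base.weight.requires_grad_(False)           # frozen pretrained weight
            self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # starts as a no-op
            self.scaling = alpha / r

        def forward(self, x):
            return self.base(x) + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T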


"Fine-tuning even Lora on the open source models is nearly always better than these other approaches"

Can you expand on that? I've not seen evidence of that myself yet, but maybe I haven't looked in the right places.


Fine-tuning with LoRA is pretty cheap though.

I prefer the not-from-scratch, configuration-driven approach of Axolotl. Axolotl supports fine-tuning Mistral and Llama-2 with lots of the latest techniques: sample packing, flash attention, xformers.

I concentrate on collecting and curating the fine-tuning data and do "data-centric" fine-tuning, rather than implementing LoRA from scratch.


Correct me if I am wrong: to use a LoRA fine-tuned model at inference you would still need the original model plus the trained adapter weights, right?
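For illustration, a hedged sketch with peft showing both options: keep the base plus the separate adapter, or merge them once into a single checkpoint (model name and adapter path are placeholders):

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # The base model weights are still required...
    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    # ...plus the (much smaller) LoRA adapter loaded on top of them.
    model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

    # Alternatively, fold the adapter into the base weights once and
    # serve a single ordinary checkpoint with no runtime LoRA logic.
    merged = model.merge_and_unload()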

If we can perfect methods to fine-tune large models for specific tasks while reducing the overall model size, then they can fit into more consumer-grade hardware for inference and can be broadly used. The objective is to prune unnecessary trivia and memorization artifacts from the model and leverage LLMs purely for interpreting natural language inputs.


Aren't a lot of base models fine-tuned with (Q)LoRA on instruct-based datasets with good results? I thought this was a very common practice?

LoRA is an alternative to traditional fine tuning (which is usually done on specific layers as you mentioned).

To quote the LoRA paper[1]:

> We hypothesize that the change in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed Low-Rank Adaptation (LoRA) approach. LoRA allows us to train some dense layers in a neural network indirectly by optimizing rank decomposition matrices of the dense layers’ change during adaptation instead, while keeping the pre-trained weights frozen

It's truly revolutionary: it basically lets you create a very small "diff" which you apply to an existing model, and it is suddenly fine tuned. These diff files are very small (5M, for example).

[1] https://arxiv.org/abs/2106.09685
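The small "diff" falls straight out of the rank decomposition: for a d_out x d_in weight matrix, the adapter only stores B (d_out x r) and A (r x d_in). A quick back-of-the-envelope with illustrative sizes:

    d_out, d_in, r = 4096, 4096, 8

    full_update = d_out * d_in        # ~16.8M parameters to change one weight matrix
    lora_update = r * (d_out + d_in)  # ~65.5K adapter parameters for the same matrix

    print(full_update, lora_update, full_update / lora_update)  # ~256x smaller per layer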


To me, the crazy thing about LoRA is that it works perfectly well adapting model checkpoints that were themselves derived from the base model on which the LoRA was trained. So you can take the LCM LoRA for SD1.5 and it works perfectly well on, say, RealisticVision 5.1, a fine-tuned derivative of SD1.5.

You’d think that the fine tuning would make the LCM LoRA not work, but it does. Apparently the changes in weights introduced through even pretty heavy fine tuning do not wreck the transformations the LoRA needs to make in order for LCM or other LoRA adaptations to work.

To me this is alchemy.
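For reference, this is roughly how that looks with diffusers: the LoRA weights just get loaded on top of whichever checkpoint you picked, base or derivative. The RealisticVision repo id below is an assumption; I believe the LCM LoRA is published as latent-consistency/lcm-lora-sdv1-5:

    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    # Any SD1.5-derived checkpoint can sit here; RealisticVision is one example.
    pipe = DiffusionPipeline.from_pretrained(
        "SG161222/Realistic_Vision_V5.1_noVAE",   # assumed repo id for the derivative
        torch_dtype=torch.float16,
    ).to("cuda")

    # The LCM LoRA was trained against vanilla SD1.5 but applies cleanly anyway.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

    image = pipe("a photo of a lighthouse at dusk",
                 num_inference_steps=4, guidance_scale=1.0).images[0]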


> more than a single model and a lot of finetunes/high rank LoRAs

I can imagine a way might be found to host a base model and a bunch of LoRAs whilst using barely more RAM than the base model alone.

The fine-tuning could perhaps be done in such a way that only perhaps 0.1% of the weights are changed, and for every computation the difference is computed not over the weights but over the output activations.
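That is roughly what multi-LoRA serving schemes do: keep one copy of the frozen base weights and apply each request's tiny adapter to the activations, never materializing a per-tenant copy of the weight matrix. A toy sketch of the idea, not any particular serving framework:

    import torch

    d, r = 4096, 8
    W = torch.randn(d, d)                       # frozen base weight, shared by everyone

    adapters = {                                # per-tenant low-rank pairs, a few MB each
        "tenant_a": (torch.randn(r, d) * 0.01, torch.zeros(d, r)),
        "tenant_b": (torch.randn(r, d) * 0.01, torch.zeros(d, r)),
    }

    def forward(x, tenant):
        A, B = adapters[tenant]
        # Base output plus the adapter's contribution, computed on activations;
        # W itself is never copied or modified per tenant.
        return x @ W.T + (x @ A.T) @ B.T

    y = forward(torch.randn(1, d), "tenant_a")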


Can we supply our own fine-tuned models?

Edit. I'm sure it's answered on your site but sometimes it's better to include it right here! :)


Yup. I guess LoRA counts as fine tuning. Except I've never seen inference engines that actually let you take the base model and the LoRA parameters as separate inputs (maybe it exists and I just haven't seen it). Instead, they bake the LoRA part into the bigger tensors as the final step of the fine tune. That makes sense in terms of making inference faster, but it prevents the scenario where a host can just run the base model with any fine-tune you like, maybe switching them mid-conversation.

Instead, if you want to host a fine-tuned model, you take the tensor blob and run a separate instance of the inference program on it. Incidentally, this is the one place where OpenAI and Azure pricing differs: OpenAI just charges you a big per-token premium for fine-tuned 3.5, and Azure charges you for the server to host the custom model. Likewise, the hosts for the open-weights models will charge you more to run your fine-tuned model than a standard model, even though it's almost the same amount of GPU cycles, just because it needs to run on a separate server that won't be shared by multiple customers; that wouldn't be necessary if the overlays were kept separate.

I wouldn't be surprised if GPT-4's rumored mixture of many models does something like this overlay management internally.
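For what it's worth, peft can keep the adapter separate and swap it at request time instead of baking it in (and I believe some inference servers have since added multi-LoRA serving along these lines); whether a given hosted API exposes that is another matter. A hedged sketch with placeholder model and adapter paths:

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    # Keep the base model resident and treat adapters as hot-swappable overlays.
    model = PeftModel.from_pretrained(base, "path/to/finetune-alice", adapter_name="alice")
    model.load_adapter("path/to/finetune-bob", adapter_name="bob")

    model.set_adapter("alice")   # serve one customer's fine-tune...
    # ... generate ...
    model.set_adapter("bob")     # ...then switch without reloading the base weights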


You can take a general model and fine tune it for a specific task. There are various tutorials out there for creating fine-tuned models.

I think part of the benefits of LoRA is that you can load the base model once, and then just swap out the vastly smaller LoRA fine-tune to fit the specific task it is working on.

You fine-tune an existing pretrained model on your proprietary dataset.
