It's weird that more than a day after the weights dropped, there still isn't a proper announcement from Mistral with a model card. Nor is it available on Mistral's own platform.
Announcing two new non-open-source models, and they won't even release the previous Mistral Medium? I did not expect... well, I did expect this, but I did not think they would pivot so soon.
As if to mark the change, their website appears to have changed too. The site's title was "Mistral AI | Open-Weight models" just a few days ago[0].
It is now "Mistral AI | Frontier AI in your hands." [1]
This. Mistral AI also started as an underdog and released Mistral 7B and Mixtral 8x7B, but as soon as they got traction, they closed their models (e.g., Mistral Medium).
I had wrongly assumed that Mistral was built "on top of" Llama. Then again, I keep running into sentences like "Mistral's models are based off on Meta's Llama".
I’ve been studying and tinkering with open-weight LLMs since the original LLaMA weights leaked. I’ve very recently become convinced that the data and compute actually required to fine-tune and produce an “unsafe” model are orders of magnitude less than what’s typically used today. We are no more than a year away from anyone with a 4090 being able to fine-tune their own Mistral. The cat is out of the bag on this one.
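The "orders of magnitude" claim is easy to sanity-check with back-of-the-envelope arithmetic. With a parameter-efficient method like LoRA, you train only small low-rank adapters rather than all ~7B weights. The sketch below uses illustrative numbers (hidden size 4096, 32 layers, rank 16, adapting only the query/value projections, all common but assumed choices), not the exact setup anyone has used:

```python
# Rough arithmetic behind the "orders of magnitude less" claim.
# LoRA trains two small matrices A (d_in x r) and B (r x d_out) per adapted
# weight, instead of the full d_in x d_out matrix.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA adapter pair."""
    return d_in * rank + rank * d_out

hidden = 4096   # hidden size of a Mistral-7B-scale transformer (assumed)
layers = 32     # number of transformer layers (assumed)
rank = 16       # LoRA rank, a common choice

# Adapt the query and value projections in every layer, a typical LoRA target
# (treating both as square hidden x hidden matrices for simplicity).
per_layer = 2 * lora_params(hidden, hidden, rank)
trainable = layers * per_layer

full = 7_000_000_000  # full model, ~7B parameters
print(f"LoRA trainable params: {trainable:,}")            # 8,388,608
print(f"Fraction of full model: {trainable / full:.5f}")  # 0.00120
```

So on these assumptions you optimize roughly 8.4M parameters, about 0.1% of the model, which is why a single consumer GPU (especially with quantized base weights, as in QLoRA) starts to look plausible for this kind of fine-tuning.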
It depends on what's being evaluated, but from what I've read, Mistral is also fairly competitive at a much smaller size.
One of the biggest problems right now is that there isn't really a great way to evaluate the performance of models, which (among other issues) results in every major foundation model release claiming to be competitive with the SOTA.
Huh! Never mind then! I take it back. It would be interesting to see what kind of tuning they did and to pit the model head-to-head against LLaMA-2-7B-chat. It seems like they did just instruction tuning but not RLHF? So I assume Mistral won't refuse to answer, etc., and probably doesn't have many safety guardrails (I guess that's desirable for some!)
Right, but people can just use Llama/Mistral for free instead of their inferior models, which I'm sure took quite a bit of resources to train in the first place.