
75B tokens is not really enough data to train an intelligent model. LLaMA was trained on over 1 trillion tokens.
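As a rough back-of-the-envelope comparison, taking the two figures above at face value (75B tokens versus ~1T for LLaMA):

    # Rough scale comparison of the token counts mentioned above.
    dataset_tokens = 75e9    # 75 billion tokens
    llama_tokens = 1.0e12    # LLaMA: over 1 trillion tokens
    print(f"LLaMA saw roughly {llama_tokens / dataset_tokens:.1f}x more data")
    # -> LLaMA saw roughly 13.3x more data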

And yes, training on top of LLaMA could introduce a lot of unexpected behavior, but that's just where the state of the art is today.


