All Pythia models were trained on 300B tokens; the LLaMA models were trained on 1T/1.4T tokens.
