
West Coast US. You would have been placed in our queuing system; with all the attention we're getting, we are very busy right now!



Thanks! I did notice the queue count showing up occasionally, but not every time. Maybe someone with queue-free access could repeat the test so we can get a sense of the potential latency once the service is scaled and geo-distributed. What I'm really trying to understand is whether the time to first token is actually faster than GPT-3.5 via API, or just the rate of token output once it begins.
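For anyone wanting to repeat the test themselves, the two numbers are easy to separate with a streaming API: time-to-first-token is the delay before the first chunk arrives, and throughput is tokens per second measured from the first chunk to the last. Here's a minimal sketch; `measure_stream` and `fake_stream` are hypothetical helpers, and the fake stream just stands in for a real streaming chat-completion iterator:

```python
import time
from typing import Iterable, Tuple

def measure_stream(tokens: Iterable[str]) -> Tuple[float, float]:
    """Return (time_to_first_token_s, tokens_per_sec) for a token stream.

    `tokens` is any iterable that yields tokens as they arrive, e.g. the
    chunks of a streaming API response.
    """
    start = time.perf_counter()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = time.perf_counter()
        if first is None:
            first = now - start  # latency before the first token
        last = now
        count += 1
    if first is None:
        raise ValueError("stream yielded no tokens")
    gen_time = (last - start) - first  # time from first token to last
    rate = (count - 1) / gen_time if gen_time > 0 else float("inf")
    return first, rate

def fake_stream(n=20, ttft=0.05, per_token=0.01):
    """Simulated stream: 50 ms before the first token, ~10 ms per token after."""
    time.sleep(ttft)
    for i in range(n):
        if i:
            time.sleep(per_token)
        yield f"tok{i}"

ttft, tps = measure_stream(fake_stream())
print(f"time to first token: {ttft*1000:.0f} ms, throughput: {tps:.0f} tok/s")
```

With a real provider you'd pass the streaming response iterator in place of `fake_stream()`; a fast per-token rate with a slow first chunk would show exactly the distinction being asked about.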

I don't know about GPT 3.5 specifically, but on this independent benchmark (LLMPerf) Groq's time to first token is also lowest:

https://github.com/ray-project/llmperf-leaderboard?tab=readm...

