Hacker Read

tome | karma 8885 | avg karma 1.75 · 2024-02-19 19:08:11

West Coast US. You would have been placed in our queuing system because, with all the attention we are getting, we are very busy right now!

totalhack | karma 118 | avg karma 1.15 · 2024-02-19 19:29:36

Thanks! I did notice the queue count showing up occasionally, but not every time. Maybe someone who has access without the queue could repeat the test, so we can get an understanding of the potential latency once scaled and geo-distributed. What I'm really trying to understand is whether time to first token is actually faster than GPT 3.5 via API, or whether it is just the rate of token output once it begins.
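
For anyone repeating the test, here is a rough sketch of how the two numbers could be separated: time to first token (which would include any queue wait) versus the token rate once output begins. It is only an illustration. `measure_stream` and `fake_stream` are hypothetical names, and the simulated stream stands in for whatever streaming client you actually use; nothing here calls a real provider API.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (time_to_first_token_s, tokens_per_s_after_first, token_count).

    `tokens` is any iterable that yields tokens as they arrive. The clock
    starts when this function is called, so queueing and startup delay
    count toward time to first token.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival time of the first token
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    generation_time = last - first
    # Rate is measured only over the tokens that follow the first one.
    rate = (count - 1) / generation_time if generation_time > 0 else float("inf")
    return ttft, rate, count


def fake_stream(n: int, startup_delay: float, per_token_delay: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming response (assumption, not a real API)."""
    time.sleep(startup_delay)  # simulates queue wait plus prompt processing
    for i in range(n):
        if i > 0:
            time.sleep(per_token_delay)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, rate, count = measure_stream(fake_stream(20, 0.2, 0.01))
    print(f"TTFT: {ttft:.3f}s, rate after first token: {rate:.1f} tok/s, tokens: {count}")
```

Reporting the two figures separately would show whether a run that spent time in the queue was slow to start but still fast once generation began.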

tome | karma 8885 | avg karma 1.75 · 2024-02-19 19:39:55

I don't know about GPT 3.5 specifically, but on this independent benchmark (LLMPerf) Groq's time to first token is also lowest:

https://github.com/ray-project/llmperf-leaderboard?tab=readm...
