Thanks! I did notice the queue count showing up occasionally but not every time. Maybe someone could repeat the test who has access without the queue so we can get an understanding of the potential latency once scaled and geo-distributed. What I'm really trying to understand is time to first token output actually faster than GPT 3.5 via API or just the rate of token output once it begins.
reply