... because a typical Google service will have to call many other services to process a request (keyword: fan-out). The effective latency is badly impacted by the _worst_ latencies of these services.
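A quick sketch of the arithmetic (hypothetical numbers, not Google's) shows how fan-out amplifies the tail:

```python
# If each backend responds "fast" 99% of the time, a request that fans out
# to N backends and waits for all of them sees a slow backend far more often.
def p_any_slow(fanout: int, p_fast: float = 0.99) -> float:
    """Probability that at least one of `fanout` independent backends is slow."""
    return 1.0 - p_fast ** fanout

for n in (1, 10, 100):
    print(f"fan-out {n:>3}: {p_any_slow(n):.0%} of requests wait on a slow backend")
```

With a fan-out of 100, roughly 63% of requests end up waiting on at least one backend that is in its slowest 1%, so the overall latency is governed by the tails.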
Yes, it's pretty much that simple. Google has terabits of dedicated bandwidth capacity and datacenters located all over the world. Fewer hops and a shorter physical distance to the server = lower latency.
This is kinda crazy considering the tremendous effort Google has gone to over the decades to shave milliseconds off their response time. They invented a whole TCP replacement to reduce page latency, and now this?
Single-stream throughput is inversely proportional to latency, and jitter and packet loss drag it down further.
It is nearly impossible to push 1Gb/sec across the public internet because of the latency, jitter, and packet loss that get introduced via physical distance and multiple hops across multiple networks.
Outside of hitting a server within your metro area, and likely on Google's backbone itself, it would be nearly impossible to hit a gigabit (roughly 110-120 MB/sec of real throughput after protocol overhead) with a single stream of data.
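To put rough numbers on that, here's a back-of-the-envelope sketch. The MSS, RTT, loss rate, and window size are my own assumed values, and the loss-limited figure uses the well-known Mathis et al. approximation, so treat it as an illustration rather than a measurement:

```python
from math import sqrt

# Two back-of-the-envelope limits on a single TCP stream (assumed values):
MSS = 1460          # bytes per segment, typical for Ethernet-sized packets
RTT = 0.080         # seconds: a modest long-haul round trip
LOSS = 1e-4         # 0.01% packet loss, optimistic for the public internet
RWND = 4 * 2**20    # 4 MiB receive window

# Window-limited: at most one full window can be in flight per round trip.
window_limit = RWND / RTT                # bytes/sec

# Loss-limited (Mathis et al. approximation): MSS / (RTT * sqrt(loss)).
loss_limit = MSS / (RTT * sqrt(LOSS))    # bytes/sec

for name, rate in (("window-limited", window_limit), ("loss-limited", loss_limit)):
    print(f"{name:>14}: {rate * 8 / 1e6:,.0f} Mbit/s")
```

With those assumptions the window-limited bound comes out around 420 Mbit/s and the loss-limited bound only about 15 Mbit/s, which is why a single stream over a long, even slightly lossy path falls so far short of a gigabit.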
Yes, users care about latency, but less efficiency doesn't necessarily lead to higher latency. (I have no idea what "worse servers" means.)
As long as Google can afford to provide decent answers with acceptable latency, it will have users.
I think that efficiency is like programming effort, algorithms, and programming languages in that users don't care. They only care about results and the costs that affect them.
> For real world applications, this is absolutely crucial, users want latency numbers on the order of milliseconds, not seconds.
You should follow Google's approach - give fast live results that don't depend on data from the future, but also go back and correct old words when you do have that data. It's kind of how humans work really.
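As a toy illustration of that pattern (my own sketch, not Google's actual pipeline): emit a cheap provisional result immediately, then rewrite the words you already showed once later context arrives.

```python
# Toy sketch of "fast provisional results, corrected later".
# The Hypothesis type and the two callbacks are hypothetical, for illustration.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    start: int         # transcript position this hypothesis rewrites from
    words: list[str]

transcript: list[str] = []

def on_partial(hyp: Hypothesis) -> None:
    """Show a low-latency guess right away, using only past data."""
    transcript[hyp.start:] = hyp.words
    print("live :", " ".join(transcript))

def on_final(hyp: Hypothesis) -> None:
    """Once later context arrives, go back and correct the old words."""
    transcript[hyp.start:] = hyp.words
    print("fixed:", " ".join(transcript))

on_partial(Hypothesis(0, ["wreck", "a", "nice", "beach"]))
on_final(Hypothesis(0, ["recognize", "speech"]))
```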
I might not be as precise as I'd like, but I took a class where my professor mentioned that Google runs its own physical infrastructure, which is how it achieves such low latencies for requests in general.
I'm aware that this info is not as precise as one would want, so I'd love to read comments on this!
This is exactly backwards. My network latency to North America is >200ms (RTT). Three round-trip times is about 750ms. You can do 75 disk accesses and three billion mathematical calculations in that time.
If your database and computations are requiring multiple seconds on a normal web page, you have serious user experience problems. When you're under 140ms, it feels like the response is happening at the same time as the request (Dabrowski and Munson weren't able to reproduce the old 50- or 100-millisecond rule of thumb in what sounds to me like a poorly-controlled experiment; http://books.google.com/books?id=aU0MR-MA-BMC&pg=PA292&#...). Increasing Google search page render time from 400ms to 900ms dropped traffic by 20%, according to Marissa Mayer (http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20....). Traditional OLTP systems tried to keep response times under one second; beyond a second, people start to get frustrated and wonder if something is broken.
So, for a normal application, the milliseconds you might save by optimizing your database and computations are peanuts in comparison to the second or more that TCP-level optimizations could save you.
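For concreteness, the arithmetic above looks roughly like this; the per-operation costs are textbook rules of thumb, not measurements:

```python
# What fits inside three intercontinental round trips at ~250 ms each
# (rules of thumb, not measurements).
RTT = 0.250                  # seconds, in line with the >200 ms figure above
budget = 3 * RTT             # ~0.75 s spent purely waiting on the network

DISK_SEEK = 0.010            # ~10 ms per random access on a spinning disk
CPU_OPS_PER_SEC = 4e9        # a few billion simple operations per second per core

print(f"network budget       : {budget * 1000:.0f} ms")
print(f"disk seeks in budget : {budget / DISK_SEEK:.0f}")
print(f"CPU ops in budget    : {budget * CPU_OPS_PER_SEC:.1e}")
```

Shaving even one of those round trips off connection setup buys more than most database tuning ever will.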
Really? I could have sworn that Google was all about local content distribution, to the point of creating new fundamental internet infrastructure to reduce latency.