I call BS on the benchmarks AND the theoretical analysis. Every time I read those HTTP/X benchmarks, people don't mention TCP's congestion control and just ignore it in their analysis. Well, you can't, at least not if you claim "realistic conditions". Congestion control introduces additional round trips, depending on your link parameters (bandwidth, latency, OS configuration like initcwnd, etc.), and limits your bandwidth at the transport layer. And depending on those link parameters, 6 parallel TCP connections might achieve higher bandwidth on a cold start, because the aggregate window growth during TCP slow start is superior to that of the single TCP connection used by HTTP/2.
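To put rough numbers on that (my own back-of-envelope figures, not the author's): assuming classic slow start that doubles the cwnd every RTT from an initial window of 10 MSS, and ignoring loss and receive windows, six cold connections start with six times the aggregate window of a single cold connection:

    # Rough slow-start sketch: aggregate congestion window per RTT round,
    # assuming the cwnd doubles every RTT from an initial window of 10 MSS
    # (ignores packet loss, receive windows, and real pacing).
    MSS = 1460          # bytes; a typical Ethernet MSS
    INITCWND = 10       # segments; the common modern default

    def aggregate_cwnd_bytes(connections, rtt_round):
        """Aggregate congestion window (bytes) across connections after rtt_round RTTs."""
        return connections * INITCWND * MSS * 2 ** rtt_round

    for rtt in range(4):
        one = aggregate_cwnd_bytes(1, rtt)
        six = aggregate_cwnd_bytes(6, rtt)
        print(f"after {rtt} RTTs: 1 conn ~{one // 1024} KiB in flight, 6 conns ~{six // 1024} KiB")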
Additionally, the most common error people make while benchmarking (and I assume the author did too) is to ignore the congestion control's caching of the cwnd in your OS's TCP stack. That is, once the cwnd is raised from the usual 10 MSS (~14.6 KB for most OSes and setups), your OS will cache the value and reuse the larger cwnd as the initcwnd when you open a new TCP socket to the same host. So if you do 100 benchmark runs, you will have one "real" run, and 99 will reuse the cwnd cache and produce unrealistic results. Given that the author didn't mention TCP congestion control at all and didn't mention any tweaks to the TCP metrics cache (you can disable it via /proc/sys/net/ipv4/tcp_no_metrics_save on Linux, or flush it with `ip tcp_metrics flush`), I assume all the measured numbers are BS.
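For what it's worth, here's a minimal sketch of what I mean by controlling the metrics cache between runs on Linux (needs root; assumes iproute2 is installed and the sysctl lives at the usual path):

    # Sketch: disable cwnd/metrics caching and flush any cached entries
    # before each benchmark run (Linux, run as root).
    import subprocess

    def reset_tcp_metrics_cache():
        # 1 = don't save per-destination metrics (cwnd, ssthresh, rtt) on close
        with open("/proc/sys/net/ipv4/tcp_no_metrics_save", "w") as f:
            f.write("1\n")
        # Flush anything already cached from earlier connections
        subprocess.run(["ip", "tcp_metrics", "flush"], check=True)

    reset_tcp_metrics_cache()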
It’s also a specious argument anyway. The six-connection limit isn’t purely artificial: opening and tracking TCP connection state is expensive, and it happens entirely in the kernel. There’s a very real cap on how many TCP connections a machine can serve before the kernel starts barfing, and that cap is substantially lower than the number of multiplexed streams you can push over a single TCP connection.
You’re also completely ignoring the TCP slow start process, which you can bet your bottom dollar will prevent six TCP streams from beating six multiplexed streams over a single TCP connection when measuring latency from the first connection.
There are a number of issues at play here: RTT (round-trip time, i.e. ping/latency), window sizes, packet loss, and initcwnd (TCP's initial window).
Initial window size: not relevant AFAICS, since I'm not talking about connection startup behavior.
RTT, window size: if the bandwidth-delay product is large, you obviously need a large window (>>65 KB). Thankfully, recent TCP stacks support TCP window scaling.
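As a worked example (my numbers, nothing from the article): a 100 Mbit/s path with an 80 ms RTT already needs a window far beyond the unscaled 64 KB limit:

    # Bandwidth-delay product: the window you need to keep the pipe full.
    def bdp_bytes(bandwidth_bits_per_s, rtt_s):
        return bandwidth_bits_per_s * rtt_s / 8

    # Example: 100 Mbit/s link, 80 ms RTT
    bdp = bdp_bytes(100e6, 0.080)
    print(f"BDP ~ {bdp / 1024:.0f} KiB")   # ~977 KiB, far beyond 64 KiB without window scaling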
Packet loss: you need relatively large buffers (by the standards of traditional TCP) and a sane scheme for recovering from packet loss (e.g., SACK), but I don't see why this is a show stopper on modern TCP stacks.
I'm not super familiar with the SPDY work, but from what I recall, it primarily addresses connection startup behavior, rather than steady-state behavior.
Those incremental gains don't seem much better than what the Linux TCP improvements deliver each year, especially if you turn on state-of-the-art congestion control / bufferbloat-aware algorithms.
Also, TCP Fast Open is ridiculously old at this point, and I can't see why mainstream equipment still wouldn't support it on average.
Based on our large scale experiments, we are pursuing efforts in the IETF to standardize TCP's initial congestion window to at least ten segments. Preliminary experiments with even higher initial windows show indications of benefiting latency further while keeping any costs to a modest level. Future work should focus on eliminating the initial congestion window as a manifest constant to scale to even larger network speeds and Web page sizes.
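If you want to play with this yourself on Linux, the initial window can be raised per route with iproute2. A sketch, where the gateway and interface are placeholders you'd take from `ip route show`:

    # Sketch: raise the initial congestion window on the default route (Linux, root).
    # The gateway and device below are placeholders; read them from `ip route show` first.
    import subprocess

    subprocess.run(
        ["ip", "route", "change", "default",
         "via", "192.0.2.1", "dev", "eth0",   # placeholder gateway/interface
         "initcwnd", "10"],
        check=True,
    )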
Because running multiple TCP connections in parallel plays havoc with TCP congestion control, and it also plays poorly with the TCP slow-start logic. Every TCP connection begins its congestion window from scratch, so it starts small, and fetching many moderately sized or large resources (think images) will cost you many round trips you didn't need to spend.
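To illustrate the round-trip cost (my own numbers, assuming the cwnd doubles every RTT from 10 MSS and ignoring loss): even a single moderately sized image takes several RTTs on a cold connection, and every extra parallel connection pays that ramp-up again.

    # How many RTTs does slow start need to deliver `size` bytes on a cold connection?
    # Assumes the cwnd doubles each RTT from an initial window of 10 MSS, no loss.
    MSS = 1460
    INITCWND = 10

    def rtts_to_deliver(size_bytes):
        cwnd, sent, rtts = INITCWND * MSS, 0, 0
        while sent < size_bytes:
            sent += cwnd
            cwnd *= 2
            rtts += 1
        return rtts

    for size in (14_600, 100_000, 1_000_000):
        print(f"{size:>9} bytes: {rtts_to_deliver(size)} RTT(s) of data transfer")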
Ah yes. I'd say that's more a problem with TCP in general than in using multiple connections. The assumption that "1 TCP connection == 1 share of bandwidth" is at best a useful first approximation. I don't know that N TCP streams eating N times their "fair share" is really any different a problem than some-important-interactive-application being given equal bandwidth with some-irrelevant-background-download. (Though it might be a worse problem.)
I'd love to live to see the day that something actually better than TCP (that addresses these and other issues) dethrones it, but given how long IPv6 took to gain traction, I wouldn't be surprised if it didn't happen in my lifetime.
With window scaling, TCP can use a window of up to 1 GB. That's enough to sustain roughly 8 Gbit/s even at a 1 s RTT, and both of those figures are ridiculous for a single long-haul flow. In practice you probably almost always just parallelize into multiple flows anyway.
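Spelling out the arithmetic behind that (my numbers): with the maximum window scale factor of 14 the window tops out at 65,535 * 2^14 bytes, which at a 1 s RTT works out to roughly 8.6 Gbit/s for a single flow:

    # Maximum throughput of a single TCP flow limited only by the window.
    max_window = 65535 * 2**14          # ~1 GiB with the maximum window scale factor (14)
    rtt = 1.0                           # seconds; deliberately extreme
    throughput_gbps = max_window * 8 / rtt / 1e9
    print(f"max window ~{max_window / 2**30:.2f} GiB -> ~{throughput_gbps:.1f} Gbit/s at {rtt}s RTT")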
Only for apps that don't try to utilize maximum throughput. Skype, YouTube, most web browsing, most mail use.
But those that do - like large file transfers over ftp/sftp or a very large email, for example - will cause the meltdown described in this article.
There are some TCP stacks that use RTT rather than packet loss as their congestion signal; those fare well under a TCP-over-TCP regime (but have other problems).
True. Minimal TCP with a 1 MSS window may be easy, but proper congestion control with fast recovery, F-RTO, tail loss probe, SACK, etc. is much harder. Miss one of these aspects and you get a TCP that takes minutes to recover from a lost packet in some obscure case. It took years to debug the Linux TCP stack. Even the BSD stack is still way behind.
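If you're curious what your own Linux box has enabled, most of these mechanisms are exposed as sysctls. A small sketch that just reads whichever of them exist (paths and meanings vary a bit across kernel versions):

    # Sketch: report a few Linux loss-recovery sysctls, skipping ones this kernel lacks.
    import os

    SYSCTLS = {
        "SACK": "/proc/sys/net/ipv4/tcp_sack",
        "F-RTO": "/proc/sys/net/ipv4/tcp_frto",
        "RACK recovery": "/proc/sys/net/ipv4/tcp_recovery",
    }

    for name, path in SYSCTLS.items():
        if os.path.exists(path):
            with open(path) as f:
                print(f"{name}: {f.read().strip()}")
        else:
            print(f"{name}: sysctl not present on this kernel")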
I don't disagree, but it makes it an apples-and-oranges comparison. It introduces the variables of how Linux vs. Windows deals with TCP (notwithstanding that Linux 2.x vs. 3.x might have some internal IPv4 changes, but your recommendation is to upgrade anyway, so that's fine), but also changes in the web server. It seems like the changes are hard-coded into the compiled kernel, so there's no way to simply change configuration flags?
That said, thanks for the post, and I'll definitely be tcpdumping in the upcoming week and reading some more about slow start!
Maybe testing with net.ipv4.tcp_slow_start_after_idle set to 0 vs 1 would make a cleaner comparison?
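Something like this before each run would at least make the idle-restart behaviour explicit instead of whatever the distro default happens to be (Linux, needs root; a sketch, not from the article):

    # Sketch: pin net.ipv4.tcp_slow_start_after_idle to a known value before a benchmark run.
    def set_slow_start_after_idle(enabled: bool):
        with open("/proc/sys/net/ipv4/tcp_slow_start_after_idle", "w") as f:
            f.write("1\n" if enabled else "0\n")

    set_slow_start_after_idle(False)   # run A: keep the cwnd across idle periods
    # ... run benchmark ...
    set_slow_start_after_idle(True)    # run B: kernel default, collapse the cwnd after idle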
> It also seems like the protocol intentionally runs slower than possible so as not to create buffer pressure on the receiving side, if I'm understanding this quick description properly: "then cruising at the estimated bandwidth to utilize the pipe without creating excess queue".
> Then this line just scares me: "Occasionally, on an as-needed basis, it sends significantly slower to probe for RTT (PROBE_RTT mode)."
I haven't read the proposal, but I think the reason for this is that they're comparing RTT during load with idle RTT to determine packet queuing, but the idle RTT may change over time.
Depending on how accurate you want to be, it could be as simple as this: after some amount of time, or some count of full data packets sent from the socket buffer in response to ACKs moving the window, leave a small gap before the next packet and then resume sending. If that packet is ACKed faster than the rest, the idle RTT is shorter than the under-load RTT, which means you should slow down in general (to optimize latency). If the RTT is the same for the after-gap packet, then the loaded RTT is close to idle and you can keep going at the current rate.
(I probably wouldn't implement it like I described it. With TCP timestamps we have pretty continuous RTT measurements, so some sort of min/max/average/stddev over the last N packets to drive the congestion window from all measurements, plus a mechanism to add a small gap for the PROBE_RTT, would make more sense: any low-RTT response should inform the system, not just one that comes in response to a probe.)
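Here's a toy sketch of the heuristic I'm describing, just to make it concrete. This is not BBR and not kernel code; all the names and thresholds are made up for illustration:

    # Toy sketch of the probing idea above: track recent RTT samples, occasionally
    # leave a short sending gap, and compare the post-gap RTT with the loaded RTT.
    # Purely illustrative; real stacks (and BBR itself) do far more than this.
    from collections import deque

    class RttProbe:
        def __init__(self, window=64, probe_every=1000):
            self.samples = deque(maxlen=window)   # recent RTTs under load (seconds)
            self.packets_since_probe = 0
            self.probe_every = probe_every

        def on_ack(self, rtt):
            self.samples.append(rtt)
            self.packets_since_probe += 1

        def should_probe(self):
            # Time to leave a small gap and send one packet into an empty(ish) queue
            return self.packets_since_probe >= self.probe_every

        def on_probe_ack(self, probe_rtt, slack=1.10):
            self.packets_since_probe = 0
            loaded_rtt = min(self.samples) if self.samples else probe_rtt
            # If the post-gap packet came back clearly faster, we were queueing:
            # back off a little; otherwise keep sending at the current rate.
            return "slow down" if probe_rtt * slack < loaded_rtt else "keep rate"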
No, TCP performance goes up for the connection in question. Latencies are lower because the packets arrive earlier. Latency (for all protocols) goes up on the whole, though, due to the backlog.
Balancing these requirements against each other is a really hard problem. TCP slow start is (well, was; cf. this article) an early attempt at an auto-tuning solution. But it isn't the only part of the problem, nor is it an optimal (or even "good") solution to its part of the problem. Its defaults are very badly tuned for modern networks (though they'd be a lot better if everyone were using jumbograms...).
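The jumbogram aside is easy to quantify (my numbers): with the same 10-segment initial window, the first flight carries roughly six times as much data with a 9000-byte MTU as with a standard 1500-byte one:

    # First-flight size under a 10-segment initial window, standard vs jumbo frames.
    INITCWND = 10
    for label, mss in (("1500-byte MTU", 1460), ("9000-byte jumbo MTU", 8960)):
        print(f"{label}: initial flight ~{INITCWND * mss / 1024:.1f} KiB")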