Virtual Machines for Multi-Tenancy in Varnish (medium.com)
63 points by ingve | 2020-08-13 17:38:27+00:00 | 12 comments




Does anyone else experience Varnish using a surprising amount of CPU?

I tried it after seeing this benchmark [0] and thinking "that must be wrong". I got the same results.

Benchmarking on a 1 GB, 1 vCPU cloud server (to get an idea of what it could handle), with minimal VCL and a single 15 kB file in Varnish's in-memory cache, Varnish used over half the CPU (with hitch, unsurprisingly, using the other half for SSL termination).

[0] https://twitter.com/BlackIkeEagle/status/1274977171400847361


I'm just doing regular benchmarking here: https://1drv.ms/u/s!Aqr3qxYHlh9rgfVUKdE7Y9oYo5pOVQ?e=2GWdJU

But as you can see, it's not much CPU at 450'000 reqs/sec, and the response time is just fine, at 1ms average. The benchmark at first hits one of my emulated VMs, but the response then gets cached, which is a fast path.

I think benchmarking anything other than a cache hit is not doing a cache justice. Generally, the cache hit ratio is very high when properly configured. I did not look into what the Litespeed benchmark was doing though, so I can't comment on that.

What exactly is your VCL here?


This is my VCL: https://gist.github.com/pbowyer/8822ab1d61fc209056f0f6bc6943...

And this is how Varnish is initialised:

ExecStart=/usr/sbin/varnishd -a :80 -a 127.0.0.1:6081,PROXY -f /etc/varnish/default.vcl -s malloc,64m -p feature=+http2
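For reference, a flag-by-flag breakdown of that invocation (the annotations are mine, not from the thread):

```shell
# Annotated copy of the ExecStart command above:
#   -a :80                       plain-HTTP listener on port 80
#   -a 127.0.0.1:6081,PROXY      local listener speaking the PROXY protocol
#                                (where a TLS terminator such as hitch forwards traffic)
#   -f /etc/varnish/default.vcl  the VCL configuration to load
#   -s malloc,64m                64 MB of in-memory cache storage
#   -p feature=+http2            enable HTTP/2 support
/usr/sbin/varnishd -a :80 -a 127.0.0.1:6081,PROXY \
    -f /etc/varnish/default.vcl -s malloc,64m -p feature=+http2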

I've verified everything is a cache hit, and the ratio is 1.000 by the end of testing.
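For anyone wanting to reproduce that check, here's a sketch of computing the ratio from varnishstat's MAIN.cache_hit and MAIN.cache_miss counters. The counter values below are hard-coded purely for illustration:

```shell
# Counter values as would be reported by:
#   varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss
# (hard-coded here for illustration)
hits=45000
misses=12

# hit ratio = hits / (hits + misses), to three decimal places
awk -v h="$hits" -v m="$misses" 'BEGIN { printf "%.3f\n", h / (h + m) }'
# prints 1.000
```

With virtually every request served from cache, the ratio rounds to 1.000.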

Screenshots and more in this thread: https://twitter.com/peterbowyer/status/1292765478100795393


Hey, just woke up. I don't see anything wrong here, so I think I will second the other commenter. I wonder if the VMs are over-subscribed? Is there any way to tell or would that be a business secret?

I also think we should consider how CPU usage is calculated. I don't know how it's done inside VMs, but I can easily imagine several cases where it would include time where the CPU is just waiting, for things like access to memory, or even waiting on I/O if that happens through a shared memory ring buffer, via spinning or something to that effect. So, if that is how it works, then over-subscription of VMs could cause them to show higher CPU usage. But I don't know, and I hope someone more familiar with the subject will join the discussion.
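As a toy illustration of the spinning idea (not Varnish-specific): a busy-wait loop accrues user CPU time even though it accomplishes nothing, which is how a guest stuck polling a shared resource could look "busy" to the accounting:

```shell
# Busy-wait: the shell spins through arithmetic instead of sleeping.
# From the outside this shows up as CPU "usage" even though no useful
# work is being done, the same way a VM spinning while waiting on a
# shared memory ring buffer would appear busy.
i=0
while [ "$i" -lt 100000 ]; do
    i=$((i+1))
done
echo "spun $i iterations without doing any work"
```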


There's no way to tell if it's over-subscribed, AFAIK. The VM is with UpCloud as I had credits there.

I hadn't used strace before, but I ran it along with perf top and have put images at https://twitter.com/peterbowyer/status/1294181276061241344.

The __memset_avx2_erms hotspot is present even with 5 concurrent requests (though at 11% overhead then), and the core dump still happens.

Very odd.


If I'd like to profile VCL subroutines, is valgrind the obvious tool to reach for?

There is something else wrong with your server. Very wrong. You need to do some process tracing and find out what syscalls are so expensive that you're seeing this kind of CPU usage.

Thanks - will strace attached to the master Varnish process do the job?

I did my best and I've put images & info @ https://twitter.com/peterbowyer/status/1294181276061241344.

Looks similar to the Bytecode Alliance wasmtime thing: https://hacks.mozilla.org/2019/11/announcing-the-bytecode-al...

Anyone using open-source Varnish with some sort of shared storage across multiple instances? What is your setup?
