
> Consider a blog which gets posted to once a week. Do you really need dynamic processing for every page render 24/7/365? Well, that's the way it works.

Isn't this the exact problem that a reverse proxy cache solves?




> Doing this is a horrible waste of resources.

It's not, really: we don't have a proper caching infrastructure anymore that could conserve both CPU cycles and bandwidth by letting the whole page be cached and only minor fragments be loaded dynamically. Back in the day, when most individuals/offices/ISPs had some sort of cascaded Squid setup, it was more helpful.

As for caching in the client itself: possible and useful (provided the user loads the page many times per day), but it doesn't help with the "re-rendering for each user" part as far as bandwidth is concerned.

The rendering effort itself is largely a non-issue. If your web framework is inefficient and you run out of juice on the server side, you can simply cache the static parts of your page with memcached or even use some setup with ESI on a load balancer.
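
For what it's worth, here's a minimal sketch of that kind of fragment caching, assuming memcached on localhost and the pymemcache client; the render_* helpers are hypothetical stand-ins for your template code:

```python
# Sketch: cache the static fragments of a page in memcached and render only the
# truly dynamic parts per request. Assumes `pip install pymemcache` and a local
# memcached instance; the render_* functions are trivial placeholders.
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def cached_fragment(key, render_fn, ttl=3600):
    """Return a cached HTML fragment, rendering and storing it on a miss."""
    html = cache.get(key)
    if html is None:
        html = render_fn().encode("utf-8")
        cache.set(key, html, expire=ttl)
    return html.decode("utf-8")

def render_header():
    return "<header>My Blog</header>"

def render_sidebar():
    return "<aside>Archive, tags, blogroll...</aside>"

def render_page(username):
    header = cached_fragment("frag:header", render_header)
    sidebar = cached_fragment("frag:sidebar", render_sidebar)
    user_bar = f"<div>Logged in as {username}</div>"  # per-user, not cached
    return header + user_bar + sidebar
```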


> The biggest thing we do to help ourselves when we're under attack is making sure that the pages being ddosed (homepage, etc) is being cached by them.

What about pages which can't be cached? For example, an updated comment feed? How would you deal with dynamic data?


> Take Wordpress as an example. There really should not be a need to cache a Wordpress site. But some of the most popular plugins are those that cache.

Caching gives sites about two orders of magnitude speed uplift in my experience. Sometimes more. Especially if you are building the pages live and compressing them.

Brotli -11 is really, really expensive, and egress data transfer is one of the more expensive commodities to purchase from cloud vendors.

Caching also reduces latency significantly - which is a much better user experience.
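
As a rough sketch of why that matters, assuming the Python brotli bindings: compress the rendered page once at quality 11 and cache the compressed bytes, rather than paying that cost on every request.

```python
# Sketch: pay the brotli -11 cost once per cached page instead of once per request.
# Assumes `pip install brotli`; the cache here is just an in-process dict.
import brotli

_compressed = {}  # page key -> precompressed bytes

def get_compressed_page(key, render_fn):
    """Return brotli-compressed HTML, compressing at quality 11 only on a miss."""
    blob = _compressed.get(key)
    if blob is None:
        html = render_fn().encode("utf-8")
        blob = brotli.compress(html, quality=11)  # expensive, but done once
        _compressed[key] = blob
    return blob  # serve with "Content-Encoding: br"
```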


> The majority of the payload will be made up of dynamically generated (and non-ideal for caching) HTML

For a blog or CMS this is probably not the case. The majority of the page is going to be the same for all users. Maybe you have something at the top of the page showing the user is logged in, and admin links to edit it, but that's probably about as dynamic as you will get.

> Rendering apps on the client-side allows fetching partials and content data piecemeal, which is ideal for caching

Yes... but why? 99% of websites aren't going to have enough traffic for this to cause load issues on the server. And on the client, the browser probably does a better job of caching pages than you could.

For something more complicated than a blog or CMS you need to think about security. It's not a trivial task to cache something and securely serve it only to people who have an active session and are authorised to view it.
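
One hedged sketch of that split (Flask here, purely illustrative): the article itself is identical for everyone and safe for a shared cache, while the tiny per-user bit is fetched separately and marked private.

```python
# Sketch (Flask): the page body is the same for every visitor and publicly
# cacheable; only a small per-user endpoint is marked private/no-store.
from flask import Flask, jsonify, make_response, session

app = Flask(__name__)
app.secret_key = "change-me"  # placeholder

@app.route("/post/<slug>")
def post(slug):
    resp = make_response(f"<html><body><h1>{slug}</h1><p>Article body...</p></body></html>")
    resp.headers["Cache-Control"] = "public, max-age=300"  # fine for a shared cache
    return resp

@app.route("/session-state")
def session_state():
    resp = make_response(jsonify(logged_in="user" in session))
    resp.headers["Cache-Control"] = "private, no-store"  # never shared-cache this
    return resp
```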


> But when it’s on every page, from a web performance perspective, it equates to a lot of data

It's not cached?


> But when it’s on every page, from a web performance perspective, it equates to a lot of data.

How does browser caching come into play here? Doesn't it make a difference?


> there are ways to make caching work. But it needs to be done very carefully in a large system or you risk long-term damage.

There are ways of using caching that are relatively simple to reason about that ought to be evangelized a bit.

For example, many caching proxies expose quite a bit more functionality than just caching, such as replacing placeholders in server-side rendered pages with transclusions (by the proxy) from another URL (which can be part of the same web app). This lets you compose server responses from fragments that have different caching rules, neatly side-stepping many of the issues mentioned in the OP as well as keeping global state disentangled.

For example, you can cache most of a rendered web-page with a generous TTL so that the cache only queries the web app behind it once an hour or even less often, and have the proxy insert the portion of the page that must be current, which is also cached, but with a TTL of <1 second and relying on if-modified-since to speed up the usual case where nothing has changed.

The resulting setup is easy to reason about, doesn't have any particular gotchas (unless you are doing a lot of A/B testing), and will absolutely solve the problem of your site slowing to a crawl because 50k impatient users are obsessively hitting Ctrl-R over and over on your homepage at the same time every Monday morning to check whether that one thing has changed yet. Notably, this type of setup works without having to add more server instances to handle the load, without accepting a slightly stale homepage being served for several minutes, and without even accepting a longer response time for the very first user to hit the server after the content does change and the cache is invalidated and repopulated. All you've done with the cache is eliminate wasted cycles, at the cost of adding less complexity to the code than you'd need for feature-switching.

This approach of fast composition (transclusion, or even simpler concatenation) of separately cached values avoids the problem of global state entanglement, because the final transclusion or concatenation is simple and fast enough that you don't (or hardly) need to bother caching the final result at all.
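
A toy sketch of that composition idea in Python (standing in for what ESI or a transcluding proxy would do; the renderers and TTLs are just illustrative):

```python
# Sketch: compose a page from separately cached fragments with different TTLs,
# in the spirit of proxy-side transclusion (ESI-style). Renderers are placeholders.
import time

_cache = {}  # key -> (expires_at, value)

def cached(key, ttl, render_fn):
    """Return a cached value, re-rendering it only when its TTL has expired."""
    now = time.monotonic()
    entry = _cache.get(key)
    if entry is None or entry[0] < now:
        _cache[key] = (now + ttl, render_fn())
    return _cache[key][1]

def render_shell():
    return "<html><body><h1>Home</h1><!--TICKER--></body></html>"

def render_ticker():
    return f"<p>Updated at {time.strftime('%H:%M:%S')}</p>"

def homepage():
    shell = cached("shell", ttl=3600, render_fn=render_shell)    # most of the page: 1 hour
    ticker = cached("ticker", ttl=0.5, render_fn=render_ticker)  # "must be current": <1 second
    # Final composition is a cheap string substitution, so it isn't cached at all.
    return shell.replace("<!--TICKER-->", ticker)
```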


> With regard to caching, I think I would opt for a solution at a different layer. For example, render the page and just cache the output in memcached, Redis or similar. Or just in memory maybe. As always, it depends.

I totally get that, it is dramatically simpler and it probably does well enough for most things.

My theory was that you can effectively cache parts of a page, and then build pages out of the cache. Like if my page relies on a user ID and some kind of "is logged in" boolean, the "is logged in" parts have probably already been rendered before, so we only have to dynamically render the parts that rely on this user ID (assuming we haven't already seen this user ID and cached those parts).

Caching the whole page means rendering and storing a copy that bakes in both the user ID and the "is logged in" part, because the cache key covers the totality of the parameters.

Now that I'm writing this out, it does seem a touch excessive lol. It's kind of like server side React, but I'd probably rather push that processing onto the client device because I don't pay for their CPU cycles.
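
A rough sketch of what I mean, with made-up fragment names (the point is just that each cache key covers only the parameters that fragment depends on):

```python
# Sketch: key each cached fragment only on the parameters it actually depends on.
# The "is logged in" fragment is shared by all users; user-specific fragments get
# one cache entry per user ID seen. Caching whole pages would instead need one
# entry per (user_id, is_logged_in) combination.
_fragments = {}

def fragment(key, render_fn):
    if key not in _fragments:
        _fragments[key] = render_fn()
    return _fragments[key]

def build_page(user_id, is_logged_in):
    nav = fragment(("nav", is_logged_in),
                   lambda: f"<nav>{'Logout' if is_logged_in else 'Login'}</nav>")
    greeting = fragment(("greeting", user_id),
                        lambda: f"<p>Hello, user {user_id}</p>")
    return nav + greeting
```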


> heck, they even specifically instruct browsers and proxies to cache a lot of their content.

Solution: implement the scraper as a proxy that forwards the HTML, but also saves the user-authored content.


> Have you considered requesting a 24 hour private cacheable resource and counting the requests on the server? Or is the browser cache too unreliable?

Nice idea.
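
If it helps, a hedged sketch of how that might look (Flask, illustrative names): a tiny beacon served with a 24-hour private cache lifetime, so each browser should only re-request it about once a day, and every request that does arrive gets counted.

```python
# Sketch (Flask): a small resource cached privately for 24 hours; each browser
# should only re-request it roughly once a day (modulo cache eviction), so the
# server-side counter approximates daily visits.
from flask import Flask, Response

app = Flask(__name__)
daily_hits = 0  # in a real setup this would live in a database or metrics store

@app.route("/visit-beacon")
def visit_beacon():
    global daily_hits
    daily_hits += 1
    resp = Response("ok", mimetype="text/plain")
    resp.headers["Cache-Control"] = "private, max-age=86400"
    return resp
```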


> What if the cache can parse cookies and vary the cached response by the value of a specific cookie, not the entire header?

> This is caching that Varnish and other “normal” HTTP caches (including CloudFlare) could not have done.

This is false. I'm not even all that familiar with Varnish, but I know this is easily possible and has been used for many, many years.

E.g. for Drupal + Varnish (i.e. keeping only Drupal's session cookie), I found these examples in less than a minute of googling:

- https://www.varnish-cache.org/trac/wiki/VarnishAndDrupal

- https://www.lullabot.com/blog/article/configuring-varnish-hi... (grep for "inclusion")

Everything in this article has been well-known for at least half a decade, yet it is being presented as a major technical breakthrough. Too much marketing, IMO.
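
Marketing aside, the underlying trick is simple enough to sketch in a few lines (Python here for illustration, not actual VCL, and the cookie name is just an example):

```python
# Sketch: vary a cached response on one specific cookie instead of the whole
# Cookie header, so analytics cookies etc. don't fragment the cache.
from http.cookies import SimpleCookie

def cache_key(path, cookie_header, session_cookie="SESSION"):
    """Build a cache key from the path plus only the session cookie's value."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(session_cookie)
    return (path, morsel.value if morsel else None)

# Both requests map to the same cache entry, because only SESSION matters:
assert cache_key("/", "SESSION=abc; _ga=123") == cache_key("/", "_ga=999; SESSION=abc")
```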


But what does it do that makes it apparently uncacheable? Your pages change when you write a new blog entry, or when someone posts a comment - and even then there's no hard constraint that requires their comment to appear immediately - serve a stale-by-60-seconds page and I doubt most people would notice, never mind care. If your blog engine can't serve one page every sixty seconds to the nginx/varnish/other reverse caching proxy in front of it, just what is it doing?

(This is partly a genuine question and partly a rant caused by seeing my VPS go into a swap death spiral when the googlebot came around, just because WP apparently couldn't serve more than about ten simultaneous requests. Intellectual curiosity says I'd like to know, but life really is too short to go back to having to deal with it for real.)
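
For what it's worth, the app side of that can be as small as one header, assuming a reverse proxy in front that honours standard cache directives (a minimal Flask sketch; stale-while-revalidate support varies by proxy/CDN):

```python
# Sketch (Flask): let the reverse proxy (nginx/varnish/CDN) serve each rendered
# page for 60 seconds, and hand out a stale copy while it revalidates, so the
# blog engine only sees roughly one request per minute per page.
from flask import Flask, make_response

app = Flask(__name__)

def render_home():
    return "<html><body>Expensively rendered blog homepage...</body></html>"

@app.route("/")
def home():
    resp = make_response(render_home())
    resp.headers["Cache-Control"] = "public, s-maxage=60, stale-while-revalidate=300"
    return resp
```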


> Personally I prefer to have explicit control over the caching mechanism rather than leaving it to network elements or browser caching.

I'm not talking about browser caching, I'm talking about the reverse proxy that fronts your ("backend") service to the internet. High-traffic global/unauthenticated reads, especially those that never change, should get cached by that reverse proxy (the frontend of the "backend", not the SPA) and not tie up app servers. (In our case, the app servers were extremely fat, slow, and ridiculously slow to scale up.)


> Was HTTP client-side caching even a thing in the early 90s? Did disks have the room to hold client-side caches?

At the time, my friends considered me mad for upping my cache to a whopping 2MB, but Netscape's cache was highly configurable.

Things like: cache pages but not images; always cache bookmarked pages; cache iframe pages (which were often navigation); etc. Netscape 4 added CSS to that mix.


> And, if your web app serves literally 0 static content?

Even real-time dashboards have some cacheable content. I would still want to use something like Varnish/NGINX in between.


> This is brilliant, not only for privacy but for speed.

But these resources are probably already cached by the browser anyway (using the appropriate http headers). So how can this solution add any improvements to that, once the resources have been loaded for the first time?


> I don't need to implement any caching because I already have sufficient resources.

Define "sufficient resources". It might be enough for your normal traffic, but without caching those blogs will crash from the /. effect (unexpected traffic spike).


> Could you give me a little insight into your use case?

I'm not the OP, but a use case I might find useful: you have a blog, want to comment on a news article, and want to keep a cached copy of that article. Since web pages often go away after some time, making sure the news article stays up might be useful. This is particularly true if the entity that wrote the article decides it is embarrassing and wants to flush it down a memory hole.


> unless they're talking about proxy caching

I'm fairly certain that's what he's talking about. A) He's Yves Lafon, and B) when you're talking web architecture, you care very much about what intermediaries can do.
