
> Consider a blog which gets posted to once a week. Do you really need dynamic processing for every page render 24/7/365? Well, that's the way it works.

Isn't this the exact problem that a reverse proxy cache solves?




> Doing this is a horrible waste of resources.

It's not, really: we don't have a proper caching infrastructure anymore that could conserve both CPU cycles and bandwidth by letting the whole page be cached and only minor fragments be loaded dynamically. Back in the day, when most individuals/offices/ISPs had some sort of cascaded Squid setup, it was more helpful.

As for caching in the client itself: possible and useful (provided the user loads the page many times per day), but it doesn't help with the "re-rendering for each user" part as far as bandwidth is concerned.

The rendering effort itself is largely a non-issue. If your web framework is inefficient and you run out of juice on the server side, you can simply cache the static parts of your page with memcached or even use some setup with ESI on a load balancer.
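
For what it's worth, here's a minimal sketch of that kind of fragment caching, assuming memcached on localhost and the pymemcache client; the render_* helpers are hypothetical stand-ins for your template code:

```python
# Sketch: cache the static fragments of a page in memcached and render only the
# truly dynamic parts per request. Assumes `pip install pymemcache` and a local
# memcached instance; the render_* functions are trivial placeholders.
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def cached_fragment(key, render_fn, ttl=3600):
    """Return a cached HTML fragment, rendering and storing it on a miss."""
    html = cache.get(key)
    if html is None:
        html = render_fn().encode("utf-8")
        cache.set(key, html, expire=ttl)
    return html.decode("utf-8")

def render_header():
    return "<header>My Blog</header>"

def render_sidebar():
    return "<aside>Archive, tags, blogroll...</aside>"

def render_page(username):
    header = cached_fragment("frag:header", render_header)
    sidebar = cached_fragment("frag:sidebar", render_sidebar)
    user_bar = f"<div>Logged in as {username}</div>"  # per-user, not cached
    return header + user_bar + sidebar
```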


> The biggest thing we do to help ourselves when we're under attack is making sure that the pages being ddosed (homepage, etc) is being cached by them.

What about pages which can't be cached? For example, an updated comment feed? How would you deal with dynamic data?


> Take Wordpress as an example. There really should not be a need to cache a Wordpress site. But some of the most popular plugins are those that cache.

Caching gives sites about two orders of magnitude speed uplift in my experience. Sometimes more. Especially if you are building the pages live and compressing them.

Brotli -11 is really, really expensive, and egress data transfer is one of the more expensive commodities to purchase from cloud vendors.

Caching also reduces latency significantly - which is a much better user experience.
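
As a rough sketch of why that matters, assuming the Python brotli bindings: compress the rendered page once at quality 11 and cache the compressed bytes, rather than paying that cost on every request.

```python
# Sketch: pay the brotli -11 cost once per cached page instead of once per request.
# Assumes `pip install brotli`; the cache here is just an in-process dict.
import brotli

_compressed = {}  # page key -> precompressed bytes

def get_compressed_page(key, render_fn):
    """Return brotli-compressed HTML, compressing at quality 11 only on a miss."""
    blob = _compressed.get(key)
    if blob is None:
        html = render_fn().encode("utf-8")
        blob = brotli.compress(html, quality=11)  # expensive, but done once
        _compressed[key] = blob
    return blob  # serve with "Content-Encoding: br"
```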


> The majority of the payload will be made up of dynamically generated (and non-ideal for caching) HTML

For a blog or CMS this is probably not the case. The majority of the page is going to be the same for all users. Maybe you have something at the top of the page showing the user is logged in, and admin links to edit it, but that's probably about as dynamic as you will get.

> Rendering apps on the client-side allows fetching partials and content data piecemeal, which is ideal for caching

Yes... but why? 99% of websites aren't going to have enough traffic for this to cause load issues on the server. And on the client, the browser probably does a better job of caching pages than you could.

For something more complicated than a blog or CMS you need to think about security. It's not a trivial task to cache something and securely serve it only to people who have an active session and are authorised to view it.
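
One hedged sketch of that split (Flask here, purely illustrative): the article itself is identical for everyone and safe for a shared cache, while the tiny per-user bit is fetched separately and marked private.

```python
# Sketch (Flask): the page body is the same for every visitor and publicly
# cacheable; only a small per-user endpoint is marked private/no-store.
from flask import Flask, jsonify, make_response, session

app = Flask(__name__)
app.secret_key = "change-me"  # placeholder

@app.route("/post/<slug>")
def post(slug):
    resp = make_response(f"<html><body><h1>{slug}</h1><p>Article body...</p></body></html>")
    resp.headers["Cache-Control"] = "public, max-age=300"  # fine for a shared cache
    return resp

@app.route("/session-state")
def session_state():
    resp = make_response(jsonify(logged_in="user" in session))
    resp.headers["Cache-Control"] = "private, no-store"  # never shared-cache this
    return resp
```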


> But when it’s on every page, from a web performance perspective, it equates to a lot of data

It's not cached?


> But when it’s on every page, from a web performance perspective, it equates to a lot of data.

How does browser caching come into play here? Doesn't it make a difference?


> there are ways to make caching work. But it needs to be done very carefully in a large system or you risk long-term damage.

There are ways of using caching that are relatively simple to reason about that ought to be evangelized a bit.

For example, many caching proxies expose quite a bit more functionality than just caching, such as replacing placeholders in server-side rendered pages with transclusions (by the proxy) from another URL (which can be part of the same web app). This lets you compose server responses from fragments that have different caching rules, neatly side-stepping many of the issues mentioned in the OP as well as keeping global state disentangled.

For example, you can cache most of a rendered web-page with a generous TTL so that the cache only queries the web app behind it once an hour or even less often, and have the proxy insert the portion of the page that must be current, which is also cached, but with a TTL of <1 second and relying on if-modified-since to speed up the usual case where nothing has changed.

The resulting setup is easy to reason about, doesn't have any particular gotchas (unless you are doing a lot of A/B testing), and will absolutely solve the problem of your site slowing to a crawl because 50k impatient users are obsessively hitting Ctrl-R over and over on your homepage at the same time every Monday morning to check whether that one thing has changed yet. Notably, this type of setup works without having to add more server instances to handle the load, without accepting a slightly stale homepage being served for several minutes, and without even accepting a longer response time for the very first user to hit the server after the content does change and the cache is invalidated and repopulated. All you've done with the cache is eliminate wasted cycles, at the cost of adding less complexity to the code than you'd need for feature-switching.

This approach of fast composition (transclusion, or even simpler concatenation) of separately cached values avoids the problem of global state entanglement, because the final transclusion or concatenation is simple and fast enough that you don't (or hardly) need to bother caching the final result at all.
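
A toy sketch of that composition idea in Python (standing in for what ESI or a transcluding proxy would do; the renderers and TTLs are just illustrative):

```python
# Sketch: compose a page from separately cached fragments with different TTLs,
# in the spirit of proxy-side transclusion (ESI-style). Renderers are placeholders.
import time

_cache = {}  # key -> (expires_at, value)

def cached(key, ttl, render_fn):
    """Return a cached value, re-rendering it only when its TTL has expired."""
    now = time.monotonic()
    entry = _cache.get(key)
    if entry is None or entry[0] < now:
        _cache[key] = (now + ttl, render_fn())
    return _cache[key][1]

def render_shell():
    return "<html><body><h1>Home</h1><!--TICKER--></body></html>"

def render_ticker():
    return f"<p>Updated at {time.strftime('%H:%M:%S')}</p>"

def homepage():
    shell = cached("shell", ttl=3600, render_fn=render_shell)    # most of the page: 1 hour
    ticker = cached("ticker", ttl=0.5, render_fn=render_ticker)  # "must be current": <1 second
    # Final composition is a cheap string substitution, so it isn't cached at all.
    return shell.replace("<!--TICKER-->", ticker)
```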


> With regard to caching, I think I would opt for a solution at a different layer. For example, render the page and just cache the output in memcached, Redis or similar. Or just in memory maybe. As always, it depends.

I totally get that, it is dramatically simpler and it probably does well enough for most things.

My theory was that you can effectively cache parts of a page, and then build pages out of the cache. Like if my page relies on a user ID and some kind of "is logged in" boolean, the "is logged in" parts have probably already been rendered before, so we only have to dynamically render the parts that rely on this user ID (assuming we haven't already seen this user ID and cached those parts).

Caching the whole page means rendering and storing a copy that bakes in both the user ID and the "is logged in" part, because the cache key covers the totality of the parameters.

Now that I'm writing this out, it does seem a touch excessive lol. It's kind of like server side React, but I'd probably rather push that processing onto the client device because I don't pay for their CPU cycles.
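
A rough sketch of what I mean, with made-up fragment names (the point is just that each cache key covers only the parameters that fragment depends on):

```python
# Sketch: key each cached fragment only on the parameters it actually depends on.
# The "is logged in" fragment is shared by all users; user-specific fragments get
# one cache entry per user ID seen. Caching whole pages would instead need one
# entry per (user_id, is_logged_in) combination.
_fragments = {}

def fragment(key, render_fn):
    if key not in _fragments:
        _fragments[key] = render_fn()
    return _fragments[key]

def build_page(user_id, is_logged_in):
    nav = fragment(("nav", is_logged_in),
                   lambda: f"<nav>{'Logout' if is_logged_in else 'Login'}</nav>")
    greeting = fragment(("greeting", user_id),
                        lambda: f"<p>Hello, user {user_id}</p>")
    return nav + greeting
```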


> heck, they even specifically instruct browsers and proxies to cache a lot of their content.

Solution: implement the scraper as a proxy that forwards the HTML, but also saves the user-authored content.


> Have you considered requesting a 24 hour private cacheable resource and counting the requests on the server? Or is the browser cache too unreliable?

Nice idea.
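
If it helps, a hedged sketch of how that might look (Flask, illustrative names): a tiny beacon served with a 24-hour private cache lifetime, so each browser should only re-request it about once a day, and every request that does arrive gets counted.

```python
# Sketch (Flask): a small resource cached privately for 24 hours; each browser
# should only re-request it roughly once a day (modulo cache eviction), so the
# server-side counter approximates daily visits.
from flask import Flask, Response

app = Flask(__name__)
daily_hits = 0  # in a real setup this would live in a database or metrics store

@app.route("/visit-beacon")
def visit_beacon():
    global daily_hits
    daily_hits += 1
    resp = Response("ok", mimetype="text/plain")
    resp.headers["Cache-Control"] = "private, max-age=86400"
    return resp
```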


> What if the cache can parse cookies and vary the cached response by the value of a specific cookie, not the entire header?

> This is caching that Varnish and other “normal” HTTP caches (including CloudFlare) could not have done.

This is false. I'm not even all that familiar with Varnish, but I know this is easily possible and has been used for many, many years.

E.g. for Drupal + Varnish (i.e. keeping only Drupal's session cookie), I found these examples in less than a minute of googling:

- https://www.varnish-cache.org/trac/wiki/VarnishAndDrupal

- https://www.lullabot.com/blog/article/configuring-varnish-hi... (grep for "inclusion")

Everything in this article has been well-known for at least half a decade, yet it is being presented as a major technical breakthrough. Too much marketing, IMO.
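
Marketing aside, the underlying trick is simple enough to sketch in a few lines (Python here for illustration, not actual VCL, and the cookie name is just an example):

```python
# Sketch: vary a cached response on one specific cookie instead of the whole
# Cookie header, so analytics cookies etc. don't fragment the cache.
from http.cookies import SimpleCookie

def cache_key(path, cookie_header, session_cookie="SESSION"):
    """Build a cache key from the path plus only the session cookie's value."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(session_cookie)
    return (path, morsel.value if morsel else None)

# Both requests map to the same cache entry, because only SESSION matters:
assert cache_key("/", "SESSION=abc; _ga=123") == cache_key("/", "_ga=999; SESSION=abc")
```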


But what does it do that makes it apparently uncacheable? Your pages change when you write a new blog entry, or when someone posts a comment - and even then there's no hard constraint that requires their comment to appear immediately - serve a stale-by-60-seconds page and I doubt most people would notice, never mind care. If your blog engine can't serve one page every sixty seconds to the nginx/varnish/other reverse caching proxy in front of it, just what is it doing?

(This is partly a genuine question and partly a rant caused by seeing my VPS go into a swap death spiral when the googlebot came around, just because WP apparently couldn't serve more than about ten simultaneous requests. Intellectual curiosity says I'd like to know, but life really is too short to go back to having to deal with it for real.)
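
For what it's worth, the app side of that can be as small as one header, assuming a reverse proxy in front that honours standard cache directives (a minimal Flask sketch; stale-while-revalidate support varies by proxy/CDN):

```python
# Sketch (Flask): let the reverse proxy (nginx/varnish/CDN) serve each rendered
# page for 60 seconds, and hand out a stale copy while it revalidates, so the
# blog engine only sees roughly one request per minute per page.
from flask import Flask, make_response

app = Flask(__name__)

def render_home():
    return "<html><body>Expensively rendered blog homepage...</body></html>"

@app.route("/")
def home():
    resp = make_response(render_home())
    resp.headers["Cache-Control"] = "public, s-maxage=60, stale-while-revalidate=300"
    return resp
```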


> Personally I prefer to have explicit control over the caching mechanism rather than leaving it to network elements or browser caching.

I'm not talking about browser caching, I'm talking about the reverse proxy that fronts your ("backend") service to the internet. High-traffic global/unauthenticated reads, especially those that never change, should get cached by that reverse proxy (the frontend of the "backend", not the SPA) and not tie up app servers. (In our case, the app servers were extremely fat, slow, and ridiculously slow to scale up.)


> Was HTTP client-side caching even a thing in the early 90s? Did disks have the room to hold client-side caches?

At the time, my friends considered me mad for upping my cache to a whopping 2MB, but Netscape's cache was highly configurable.

Things like: cache pages but not images; always cache bookmarked pages; cache iframe pages (which were often navigation); etc. Netscape 4 added CSS to that mix.


> And, if your web app serves literally 0 static content?

Even real-time dashboards have some cacheable content. I would still want to use something like Varnish/NGINX in between.


> This is brilliant, not only for privacy but for speed.

But these resources are probably already cached by the browser anyway (using the appropriate http headers). So how can this solution add any improvements to that, once the resources have been loaded for the first time?


> I don't need to implement any caching because I already have sufficient resources.

Define "sufficient resources". It might be enough for your normal traffic, but without caching those blogs will crash from the /. effect (unexpected traffic spike).


> Could you give me a little insight into your use case?

I'm not the OP, but a use case I might find useful: you have a blog, want to comment on a news article, and want to keep a cached copy of that article. Since web pages often go away after some time, making sure the news article stays up might be useful. This is particularly true if the entity that wrote the article decides it is embarrassing and wants to flush it down a memory hole.


> unless they're talking about proxy caching

I'm fairly certain that's what he's talking about. A) He's Yves Lafon, and B) when you're talking web architecture, you care very much about what intermediaries can do.
