I really like CDNs because of the ability to drop in a file and know it will be cached correctly. (Also, there is a high probability that your user already has a cached version of the file.) But I never thought about CDNs being able to track you.
Isn't there an alternative? A more transparent way to provide users with source files and still keep the 'cached items' aspect.
I guess browsers could provide a separate cache for CDN domains that's _really_ long-lived (there's a certain guarantee that files won't change there) and also always send requests there with no referer.
The script wouldn't have to be from a CDN to track people using the browser cache. I could infer whether you've visited a site that doesn't use CDNs or trackers by asking you to load something from that site and judging from how long it took whether you already had that resource cached.
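A minimal sketch of that kind of cache-timing probe (the asset URL and the 50 ms threshold are made-up values, just to illustrate the idea):

```typescript
// Illustrative cache-timing probe. The URL and the 50 ms threshold are
// invented for this example, not taken from the thread.
async function probablyCached(url: string, thresholdMs = 50): Promise<boolean> {
  const start = performance.now();
  try {
    // 'no-cors' lets us request a cross-origin asset; we can't read the
    // body, but the request still completes and can be timed.
    await fetch(url, { mode: "no-cors" });
  } catch {
    return false; // network error: no signal either way
  }
  // A response that arrives almost instantly was very likely served from
  // the local browser cache rather than fetched over the network.
  return performance.now() - start < thresholdMs;
}

// Hypothetical usage: guess whether the user has visited example-forum.com
// before by probing one of its static assets.
probablyCached("https://example-forum.com/static/logo.png")
  .then((visited) => console.log("visited before?", visited));
```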
In most cases, the client does not even request bytes from the CDN, which is then not able to track the client.
But then again, CDNs could implement tracking based on this lack of requests (which is kind of ironic, and should become infeasible the more clients use this technique, I think).
Actually the other issues are solved by the "DHT" part of this idea:
no centralized party can track which assets are already in your history.
The only tracking I can think of is by your nearest neighbours' browsers.
If such a neighbour N empties your cache (DNS attack?), it will trigger a full fetch from N.
Then N can attempt to fingerprint that asset query against what other pages list.
But then the whole point of this is to cache assets that are used on most pages!
I love this idea. Let's make the Web decentralized again!
(I couldn't resist)
I think that depends on the user. Some use CDNs to improve performance via already-cached files, others to offload traffic and get the actual CDN performance benefits from geo-balancing.
Personally I don't think a user will have anything cached other than jQuery from Google, which in this case I removed to follow the rule of eating my own dogfood.
Even if the CDN can't (for whatever reason), one could easily include a tracking pixel on every page that is marked as `Cache-Control: no-cache`, or insert a few lines of JS to do the same.
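For illustration, the "few lines of JS" could look roughly like this; the tracker host and `/pixel` endpoint are invented:

```typescript
// Hypothetical page-view beacon. Because we bypass the cache entirely
// (the same effect a `Cache-Control: no-cache` pixel would have), every
// page view reaches the tracking server and can be logged.
function trackPageView(): void {
  const url =
    `https://tracker.example/pixel?page=${encodeURIComponent(location.pathname)}`;
  // cache: "no-store" forces the request onto the network even if a copy
  // of the response happens to be cached locally.
  void fetch(url, { cache: "no-store", mode: "no-cors", keepalive: true });
}

trackPageView();
```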
Dude, a good CDN will set cache headers that make browsers store the items forever, without even re-validating the content (something like the headers sketched below).
That's definitely not very usable for tracking purposes; they will only know about the first visit of a user. Even more so: if sites A and B use the same JS library and version, they will only know about the first one the user visits.
Anyway, it's a bad technical decision not to use a different domain, which ensures clients don't need to send extra cookies in the request headers.
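For concreteness, "store forever without revalidating" usually means headers along these lines; the tiny Node origin below is just a hypothetical stand-in for whatever sits behind the CDN:

```typescript
import { createServer } from "node:http";

createServer((_req, res) => {
  // One year plus `immutable`: the browser keeps the file and never asks
  // again. This is safe when filenames are versioned (e.g. app.3f9c2.js),
  // so the content at a given URL genuinely never changes.
  res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
  res.setHeader("Content-Type", "application/javascript");
  res.end("/* versioned, never-changing asset body */");
}).listen(8080);
```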
I also use an add-on called Decentraleyes. It caches various common scripts from popular CDNs within the add-on itself, so your device doesn't need to make any network requests for them. It was originally meant as a privacy tool, but the caching seems to be at least as valuable.
CDNs aren’t intended as a “canonical store”; content can be invalidated from a CDN’s caches at any time, for any reason (e.g. because the CDN replaced one of their disk nodes), and the CDN expects to be able to re-fetch it from the origin. You need to maintain the canonical store yourself — usually in the form of an object store. (Also, because CDNs try to be nearly-stateless, they don’t tend to be built with an architecture capable of fetching one “primary” copy of your canonical-store data and then mirroring it from there; but rather they usually have each CDN node fetch its own copy directly from your origin. That can be expensive for you, if this data is being computed each time it’s fetched!)
Your own HTTP reverse-proxy caching scheme, meanwhile, can be made durable, such that the cache is guaranteed to only re-fetch at explicit controlled intervals. In that sense, it can be the “canonical store”, replacing an object store — at least for the type of data that “expires.”
This provides a very nice pipeline: you can write “reporting” code in your backend, exposed on a regular HTTP route, that does some very expensive computations and then just streams them out as an HTTP response; and then you can put your HTTP reverse-proxy cache in front of that route. As long as the cache is durable, and the caching headers are set correctly, you’ll only actually have the reporting endpoint on the backend re-requested when the previous report expires; so you’ll never do a “redundant” re-computation. And yet you don’t need to write a single scrap of rate-limiting code in the backend itself to protect that endpoint from being used to DDoS your system. It’s inherently protected by the caching.
You get essentially the same semantics as if the backend itself was a worker running a scheduler that triggered the expensive computation and then pushed the result into an object store, which is then fronted by a CDN; but your backend doesn’t need to know anything about scheduling, or object stores, or any of that. It can be completely stateless, just doing some database/upstream-API queries in response to an HTTP request, building a response, and streaming it. It can be a Lambda, or a single non-framework PHP file, or whatever-you-like.
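A rough sketch of the backend half of that pipeline, with invented route names and cache lifetimes: a stateless handler that does the expensive work on demand and sets the headers a durable reverse-proxy cache in front of it would honor.

```typescript
import { createServer } from "node:http";

// Stand-in for the heavy database / upstream-API aggregation work.
async function buildExpensiveReport(): Promise<string> {
  return JSON.stringify({ generatedAt: new Date().toISOString(), rows: [] });
}

createServer(async (req, res) => {
  if (req.url !== "/reports/daily-summary") {
    res.statusCode = 404;
    res.end();
    return;
  }
  res.writeHead(200, {
    "Content-Type": "application/json",
    // The reverse-proxy cache keys off s-maxage; browsers get a short
    // max-age. The backend itself stays completely stateless.
    "Cache-Control": "public, max-age=60, s-maxage=3600",
  });
  res.end(await buildExpensiveReport());
}).listen(3000);
```

With a durable cache in front, the proxy serves its stored copy for the full hour, so the expensive computation runs at most once per expiry interval regardless of request volume, which is the inherent DDoS protection described above.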
Doesn't that kill the possibility that a visitor will already have common content cached when visiting your site for the first time? I know that's not the only reason to use a CDN, but it's a pretty big one.