My fight against CDN libraries (peppercarrot.com)
298 points by agateau | 2016-11-30 | 149 comments




The only issue with going against the grain here is if you're not putting your site itself behind a CDN: it'll vary in download rates across the global. That was the intended use case for CDNs, but analytics were added so the CDNs themselves could improve.

You're correct that they are tracking us, but there's a trade-off that comes with this which holds tremendous value. If that speed advantage isn't a factor, or is low on your list of priorities, then by all means sever everything.


Not download rates, but request times are important for a website. Of course, if your uplink is exceptionally bad, the former matters somewhat, too.

Usually, one would amalgamate the resources to avoid additional requests to any server at all (CDN or not, all requests have unnecessary overhead). Many Web CDNs include frameworks that combine CSS, JS etc. resources to speed up page loading. Add to that SVG inlining and image optimization and you're good speedwise.

What you're still missing is the geo-targeting of an Anycast network like CF et al. This will slow down the initial resource request again.

The question is: If you knew that you could live without the aforementioned pros of a CDN, why use it in the first place...


> Many Web CDNs include frameworks that combine CSS, JS etc. resources to speed up page loading. Add to that SVG inlining and image optimization and you're good speedwise.

I thought that's what HTTP/2 was for? I'd rather solve this at the connection level than have some third party amalgamate my content and thus silently break it in maybe 5% of cases.


    > It'll vary in download rates across the global [sic].
Sure, but we're talking on the order of ~KBs, so many HTML pages are going to be bigger than font and CSS files, and seeing as the author is providing a webcomic, all those images certainly will be.

Nobody's CDN-ing CSS for latency when the rest of their assets are served from a single location. As the author says, it's just "laziness" (or 'ease of development').


I think this is a false choice. If CDNs were simply caching the content in various geographic regions to keep the data closer to the users, and thus loading quickly, then that'd be fine. But in my observation, that's not what they've become.

It seems that increasingly more often, I have to enable javascript from numerous third party sites just to get some or all of the supposed first party content of the page to even render at all. And then all this extra JS ends up slowing my browser down, IMO, cancelling any improvement in load time from having a geographically close cache. Then there's other annoyances, like the alignment of the text, pictures, and other content suddenly shifting about, because the font has changed, or the JS from one CDN finally finished processing and decided that, no, actually, pictures should go over there.

My needs and wants from a page are rather simple. Render your content, then kindly get out of my way and let me take in the message you are trying to communicate to me. Attempting to import distracting fanciness from CDNs is more likely to cause me to skip your site than an extra 100ms of load time because I'm in Germany and you're in Canada.

On an unrelated note, it looks like I might be taken by another archive binge here soon.


Sadly, a lot of people don't use the `async` attribute on script tags. We can still load fonts over CDNs, even using CloudFront or Cloudflare for caching.

> Then there's other annoyances, like the alignment of the text, pictures, and other content suddenly shifting about, because the font has changed, or the JS from one CDN finally finished processing and decided that, no, actually, pictures should go over there.

This is because the scripts and CSS are so large that it takes time for the browser to apply all the rules set by that site. This can easily be optimized with the right tools, by throwing out dead code, or even by fetching only the code specific to that page. Most developers also just drop `script` tags into their head, and this _kills_ page loading more than anything (unless your CSS file is thousands of lines).
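
For instance, a minimal sketch of the non-blocking approach (file names here are made up for illustration):

    <!-- hypothetical example: scripts declared so they don't block the initial render -->
    <head>
      <link rel="stylesheet" href="/css/site.css">
      <!-- "defer" downloads in parallel and runs after parsing, preserving order -->
      <script src="/js/app.js" defer></script>
      <!-- "async" runs as soon as it arrives; only safe for scripts with no dependencies -->
      <script src="/js/stats.js" async></script>
    </head>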

CDNs are meant for distributing content across the globe. If you're hosting your own assets without a CDN in the middle and the site is still slow, you're optimizing in the wrong place.


There are only particular circumstances in which you can actually use `async`, though. Load order is usually important.

You don't need to inject third-party JavaScript files into your website to geo-distribute a few font files. You can still put them on something like S3 and let the host figure out the shortest path.

That would have to be S3 with CloudFront, which is the only way to easily get custom-domain SSL, since S3 on its own is not CDN'ed or cached.

The latency issue is only present once, the initial page load. After that the resources are cached. Second to that, if you're following best practices for page speed, the user will not notice at all, because a snippet of CSS that provides the initial layout and styles will be sent with the HTML body. That's among dozens of other things you can do to make this a non-issue.
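
As a rough sketch of that practice (the selectors and paths are illustrative, not from the article):

    <!-- a small block of "critical" CSS is inlined so the first paint needs no extra request -->
    <head>
      <style>
        body { margin: 0; font-family: sans-serif; }
        .masthead { height: 60px; }
      </style>
      <!-- the full stylesheet is then loaded without blocking rendering -->
      <link rel="preload" href="/css/full.css" as="style" onload="this.rel='stylesheet'">
      <noscript><link rel="stylesheet" href="/css/full.css"></noscript>
    </head>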

>>The latency issue is only present once, the initial page load.

The initial page load is also one of the most important things to optimize for things like, you know, conversion of visitors to paying customers. I've given up on subscribing to new products and services simply because their pages weren't performing well, and I'm sure many others here have done the same.


I've given up subscribing because pages don't work (as in don't render anything at all) with JS disabled.

i've given up subscribing because pages look like shit in telnet:80

:D


Cached per browser, though, which is significantly different than cached per request.

Even if you're caching/serving static content efficiently it still adds load to a server.


Also avoid Facebook's React, Google's Angular & Twitter's Bootstrap :)

So your main argument is privacy, not letting Google collect users' data, but then consider that most of your users are probably using Chrome, everything they type in the URL box is sent to Google (for autocompletion) anyway.

Is looking at some comics website even a privacy problem? Let's say google finds out your user X looks at your website. What possible damage can they do? Sell it to the advertisers so they can target X with some comics ad? If you ran a medical site, I would get it.

Then you have to give up other cool things like Google Analytics.

P.S.

Some beautiful artwork on your site.


It would depend on the site whether most users use Chrome. Also, you have to ask what kind of damage could be done in the future if the data is collected.

That's what I'm asking. What kind of damage are you talking about exactly? I understand privacy concerns for medical, porn, political, gender rights websites. But a comics one?

Categorizing people based on what they read comes to mind.

Ok, so you categorized the people by whether they read this comic or not. What's the damage?

You've been categorized based on what you read.

The damage is that it contributes to normalizing this behavior in general.

I don't know if some present or future government could be offended by people reading this comic but it shouldn't be a question we even need to ask. If you want to live in a world where surveillance is not the default then the very least you can do is remove the tracking from your own websites.

Thus I applaud the author of this website. And I'm really bored of the dullards who chip away at my freedom saying "but it's only a tiny transgression, what does it matter?"


> if some present or future government could be offended by people reading this comic

That makes zero sense. Sorry, not buying that argument.

I'm talking about this particular comic, not some political manifesto.


You haven't addressed the actual point. Which was that we shouldn't need to think about whether any particular comic is or is not of interest to people carrying out surveillance.

This is a comic from a French author. Remember, the authors of a satirical comic in France were killed for their drawings only a few years back.

Charlie Hebdo: http://m.bbc.com/news/world-europe-30710883


> but then consider that most of your users are probably using Chrome, everything they type in the URL box is sent to Google (for autocompletion) anyway.

That's no reason to hand Google the remainder of your users on a silver platter. (Especially not when you're not even getting paid for it.)

> Is looking at some comics website even a privacy problem?

Why is privacy something that needs to be justified? Why is exhibitionism supposed to be the default?

> Then you have to give up other cool things like Google Analytics.

"Being shiny" is not a justification for mass surveillance.


Autocompletion is one of the first things I disable in Chrome. I actually go into Advanced Settings and turn off everything I can.

What have you done to fix this issue? How many people have you told not to use Chrome?

Single sites are not the issue. The reach is. Alright, what's the matter if somebody knows that I visit a single comics site, right? Well, yeah, but if they also know that I visit that other comics site, and that site too, and hmm, yes, that one too, then somebody starts getting a picture of me (disclaimer: this is not a description of me).


I don't think it's an issue. People who are paranoid about their privacy can use FF with Ghostery.

Chrome is now the most popular browser, and for a reason. It's clear that most people value better browser experience over privacy fears.

Plus, Google says they don't sell users' data; they only use it to target their own ads:

https://privacy.google.com/how-ads-work.html


Chrome has gained hold because too many ignorant developers have recommended it in their circles instead of FF.

Google has used its services to aggressively push people towards Chrome with banners, degraded service etc.

While the privacy issues have been actively downplayed by either ignorant bystanders or by astroturfing, I do not even think that is the biggest issue here.

It is the monopoly on every level, that is my biggest concern.


> So your main argument is privacy, not letting Google collect users' data, but then consider that most of your users are probably using Chrome, everything they type in the URL box is sent to Google (for autocompletion) anyway.

I can't control information they willingly give away, but I don't have to give them additional data on the people who chose not to send everything to google.

> Is looking at some comics website even a privacy problem? Let's say google finds out your user X looks at your website. What possible damage can they do? Sell it to the advertisers so they can target X with some comics ad?

For me it's because it ruins the search results. Just because I looked at a comic doesn't mean I want them ranked higher in search results. I've found the less google knows about me the better the search works.

> Then you have to give up other cool things like Google Analytics.

Personally, I'm about to give up on it anyway. Maybe if you're a big site it's useful. I just want a list of page hits and the referrer URL if possible. Analytics completely fails at this.


> For me it's because it ruins the search results. Just because I looked at a comic doesn't mean I want them ranked higher in search results. I've found the less google knows about me the better the search works.

Isn't that what clicking the Globe icon next to the Cog icon does?


> but then consider that most of your users are probably using Chrome, everything they type in the URL box is sent to Google (for autocompletion) anyway.

This is a conscious decision that the user has made to use Chrome and send their URLs to Google. When you visit a website you don't really get much of a say in this - you're at the mercy of the website.

To be clear, I use Chrome (and Gmail and Youtube and Google Search and everything else) and I'm not overly concerned by these concerns, but I can appreciate when webmasters are being responsible with the services that they subject their visitors to.


> This is a conscious decision that the user has made to use Chrome and send their URLs to Google

Not to pick on this one point, but I want to add that I doubt many users are really conscious of it. People on HN are in a bubble; go to the local mall and ask people what they know. Ask them what a web browser is.

https://www.youtube.com/watch?v=o4MwTvtyrUQ

That's why people in the IT field have a responsibility to provide safe products to the public. It's like someone on Wall Street saying that a typical person is making a 'conscious decision' to accept all the complex risks of a sophisticated financial instrument, or that I'm making a 'conscious decision' to accept all the risks of an airplane when I get on it - I have little idea how the plane works; it's up to the manufacturer, airline and government to ensure it's safe.


> Is looking at some comics website even a privacy problem? Let's say google finds out your user X looks at your website. What possible damage can they do?

What if they look at something politically unpopular, something that favors an unpopular group or idea? Those can be in comics. How about socially embarrassing things? They can also determine you were in a certain place at a certain time. It is standard practice for governments to use those things to persecute people; there is no reason to think that will suddenly cease. I'm talking about Western governments too - the U.S. government tried to embarrass and blackmail Martin Luther King, and arguably it interfered in the recent election. Is there a reason governments would have changed?


I'm talking specifically about this particular comic website.

Yes, I fully understand the need for privacy when it comes to political and other social issues.


> Is looking at some comics website even a privacy problem?

I don't think it matters what the website is. Privacy should be the default. The idea that popular CDNs could passively gather a list of websites I visit is disturbing to me, even if all the sites on that list happen to be mundane. That information could still be used to build an advertising or personality profile on me. It's even more disturbing that leaking user information is commonplace on the web and most web devs don't give user privacy a second thought, though it's nice to see this one does.


> Then you have to give up other cool things like Google Analytics.

Honest question: What is the benefit of Google Analytics? Can someone share a story where they got actionable insight out of it?

(Context: Being very concerned with privacy, I would never for the life of me install any analytics on my sites. In fact, I have even disabled the nginx access.log.)


Knowing who's sharing your content, your real-time visitors, and how long they stay on your website is incredibly actionable stuff. It tells you whether you're creating stuff that has a long-term draw, it tells you about the nature of who links your pages, and it tells you whether one layout or piece of content works better than another. (And you know when a term trends on Google.) You can change your tactics based on your traffic.

When your goal is to attract an audience, that's just standard boilerplate stuff as far as actionable insight goes.

Your privacy concerns are legitimate, but it's hard to consider any of these issues from a black box. You need data—and Google Analytics is the most common way to get that data.


> Then you have to give up other cool things like Google Analytics.

Alternatives do exist. If you want advanced tools similar to GA, a local Piwik[0] install will do the job. For much more basic options, there are lots of log-processing programs available; I recently tried and liked GoAccess[1].

[0] https://piwik.org/

[1] https://www.goaccess.io/
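
For what it's worth, a typical GoAccess run over a standard combined-format access log looks something like this (the paths are just an example, and exact flags can vary by version):

    # parse an nginx/apache "combined" log and write a static HTML report
    goaccess -f /var/log/nginx/access.log --log-format=COMBINED -o /var/www/report.html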


Great to see someone paying attention to the problem of loading third-party <script>s, and talking about the work required to avoid them.

Before I knew it was a comic site I was amazed they took the time to copy all of the icons they wanted as svg. Even knowing the author is an illustrator it is still admirable and impressive.

That part didn't seem strictly required to address the third-party content problem. They could have used the font icons, and just copied all the necessary bits to their server.

Also, for anyone with a similar problem, consider backing https://www.kickstarter.com/projects/232193852/font-awesome-... . They're 15 hours from completion, and $38k away from a stretch goal to release SVG icon support in the Open Source version.


> the work required to avoid them

And that's the rub: it was a _lot_ of work. It's nice to see it can be done, but few sites will have the time or inclination.

Good. Another reason not to use these CDNs is they're additional risk and introduce the potential for downtime and breakage. It's an additional point of failure that just doesn't come with many benefits.

I'll happily use these services for quick POCs and throwaway demos, but once anything starts to become semi-permanent I'll make sure I control my uptime and host these assets myself.


I've started to leverage them with fallback, but I guess I'll see how that plays out. (For fonts - I don't use anything else from a CDN, aside from front caching with cloudflare)

From the post:

"Well a big one: Privacy of the readers of Pepper&Carrot."

Before even thinking about tossing things like Google Fonts or AddThis or whatever, the very first thing you need to do is turn on HTTPS. If you're concerned about privacy, or content injection, or MITM attacks, or name-your-poison-here, you must immediately only serve up pages via HTTPS with strong encryption.


Those are to a large extent different problems. For one, you are eliminating requests to outside hosts from your own website, and thus avoiding having those outside hosts track your users. For the other, adding encryption, you're preventing the carrier at either end, or in between, from tracking which pages on the site are visited - though not so effectively whether the site was visited at all. Without the libraries being loaded, Google and other CDN library providers have no way of knowing whether I have visited that site, unless they are also providing the underlying network connection that I am using.

These seem completely independent to me.

- HTTPS is for attacks.

- What the article describes is run-of-the-mill tracking by Google etc.

If I am not being attacked, the CDN resources will still allow Google to track me. If I am being attacked the CDN resources will still allow Google to track me.

If I don't have these Google resources (let's just use Google resources for now), I don't think that Google will MITM me.


Google might not MITM you, but a shitty wifi router, your ISP, a hotspot, a hacked device on your network, the government, and others can and will MITM you.

Time and time again you see stories of people having tracking, ads, and malware injected into their browsing from free wifi, most ISPs, cell providers, hacked wifi routers, or even antivirus software.

Enabling HTTPS is THE baseline, there's no excuse not to have it.


> HTTPS is for attacks.

You are always under attack on the internet.

This isn't really hyperbole. While I'm sure it's possible to find the occasional exception, you really need to assume all internet traffic could be hostile.

- Verizon vandalizes most of the plaintext HTTP by adding their X-UIDH[1] tracking-id header.

- It's common to see JavaScript appended to HTML files when they are sent over HTTP on a cellular network (it replaces image URLs with very highly compressed versions).

- If the HTTP socket crosses the Great Firewall, more injected JavaScript might conscript your browser into the Great Cannon[2]. (Also: the "QUANTUM" suite of tools that uses packet races for similar purposes.)

- One of the goals is the privacy of the readers. Google isn't the only attacker, and MitM is only one type of attack. If you aren't encrypting, your requests are being analyzed - probably several times - with DPI[3]. If you aren't encrypting, you are enabling passive surveillance.

That's just some of the obvious stuff.

[1] https://www.verizonwireless.com/support/unique-identifier-he...

[2] https://citizenlab.org/2015/04/chinas-great-cannon/

[3] https://en.wikipedia.org/wiki/Deep_packet_inspection


Without HTTPS, MANY places will silently MITM your connection. I remember reading that a major site, I think Netflix, was having a low but consistent level of page-load failures and couldn't figure it out.

When they went to HTTPS all the problems went away; apparently the culprit was code and HTML injected into their pages by:

1) wifi hotspots - this is REALLY common, how do you think they redirect you to their login page?

2) content filters like SonicWall - I used to work in an IT consulting shop; these are EVERYWHERE. And they don't just filter, they record every page of every site you visit. Expect any non-residential wifi to be reading all of your non-encrypted traffic.

3) crappy ISPs trying to serve ads

4) bad actors on public wifi - this is common at the airport, meeting halls, and conventions. Even encrypted wifi is vulnerable as long as the attacker is on the same network (if they have wireless isolation off, which most places do for printers).


The worst part is that HTTPS is there and works, all they need to do is add the HSTS header and they would instantly improve the security of every single visitor for free.

I generally hate when people point out things like "if you really cared about your users like you said you do, you'd implement [unrelated thing]", but in this case it's an extremely small change that would improve the privacy for every single one of their visitors.
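
For reference, the header in question is a single line in the HTTPS response; the max-age value below is just a common choice, not a requirement:

    Strict-Transport-Security: max-age=31536000; includeSubDomains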


After working at an encrypted/private email service, this is my cup of tea. However, I'd like to go off-topic and point out that the comic looks fantastically well drawn: http://peppercarrot.com/en/article383/episode-19-pollution

Made with Krita!

I’ve just recently been deciding on an app to use for drawing with my Surface Book for illustrating all kinds of things, and I’ve settled with Krita in the last week. It’s best-of-breed, and free to boot.

I use Decentraleyes to help with the CDN issue. It's not much but every little bit helps I think.

https://addons.mozilla.org/firefox/addon/decentraleyes


This post and half the comments are killing me on conflating "third party javascript" with "CDN".

Yes. While I completely agree with the author and their quest to eliminate third party scripts from their site, the problem isn't with CDNs. The problem is with third party scripts, most of which aren't coming from a typical CDN (cdnjs, for example).

It is entirely valid, and common, to front your own application code behind a CDN.

Love the sentiment, just wish the terminology was more accurate.


The code injection problem can often (but not always) be solved via Subresource Integrity: https://developer.mozilla.org/en-US/docs/Web/Security/Subres...
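
A sketch of what that looks like in practice (the URL is a placeholder, and the integrity value must be the real base64-encoded digest of the file you expect):

    <script src="https://cdn.example.com/lib/library.min.js"
            integrity="sha384-BASE64_DIGEST_OF_THE_FILE"
            crossorigin="anonymous"></script>

The browser refuses to execute the file if the downloaded bytes don't match the hash, which stops a tampering CDN from injecting code - though it does nothing about the tracking issue, since the request to the CDN is still made.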

I only represent about 0.00000013% of all Chinese Internet users, but let me chime in: EVERY website that uses Google CDNs for js or fonts just doesn't work here. It just keeps loading and loading, and loading forever. In most cases it's jQuery, and in most cases it's in the <head> so the page just never shows. Cloudflare (cdnjs), Amazon CDNs, Akamai CDNs also occasionally get blocked and take entire Internet segments with them.

If you use 3rd-party CDNs, please consider implementing a client-side failover strategy so you don't leave out 50% of the Internet "population".
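
One common failover pattern, sketched here with jQuery and placeholder paths, is to test for the library's global object right after the CDN tag and fall back to a self-hosted copy:

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
    <script>
      // if the CDN copy never executed (blocked, failed, timed out), load the local copy instead
      window.jQuery || document.write('<script src="/js/jquery.min.js"><\/script>');
    </script>

The caveat, as others note below, is that this only helps if the blocked request actually fails; if it just hangs, the page still stalls.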


For my new site, what I'm doing is a full fallback: local font, then the CDN font, then a copy served from my site, then the regular font-family fallback.

Not sure if that works properly in China, if it just spins. It might never 'fail' and fall back. I'd need to test that.

Like so:

    src: local('Slabo 13px'), local('Slabo13px-Regular'),
         url(https://fonts.gstatic.com/s/slabo13px/v3/B9U01_cNwYDvIHK04hX...) format('woff2'),
         url(https://fonts.gstatic.com/s/slabo13px/v3/fScGOqovO8xyProgHUR...) format('woff2'),
         url("/fonts/Slabo13px-Regular.ttf"); }


Thanks for sharing this. Is there a list of big domains, such as Google's, which don't work in China? Does adding social logins like FB and G+ also make login pages break?

Yep social logins do break or make pages very very slow to load. (Expat in Shanghai)

Funny thing: most, if not all, 'client-side failover' strategies you might find through Google or the like won't work either.

This is because the loaded resource will 'fail' anywhere between x seconds up to minutes, or never! In the meantime the user just sees a blank page, or best case, some 80-90% page that keeps trying to load something...

I've experienced this myself a couple times. Most probably my ISP messed up some stuff taking down whole chunks of 'internet' :)


The fact that browsers don't have any built-in mechanism to load-balance resources is disappointing, but not surprising. Think about who the main drivers of the HTML specs are. There is probably a way to implement such balancing manually, though, either by loading parallel documents and exploiting caching, or by using setTimeout and loading a different set of resources.

I don't like the censorship policies of the Chinese government. I'm not going to go out of my way to make sure my site is compatible with their censorship, use a VPN.

Enjoy losing tons of potential users over your stubbornness for something that you could easily fix.

Cost of having principles? Well not that bad then.

I'm sure Google, Uber, and countless other tech companies thought the same before they pulled out of China's market. Turns out China's government policies can be so hostile to outside tech companies that even the Mighty Google has decided it's not worth it. If I did make my site work great it could break again tomorrow or get banned for no apparent reason with no recourse.

I didn't get it - what does putting your jQuery on another CDN provider have to do with censorship?

Why would I spend time and resources making sure my site works, and keeps working, in a marketplace I have no interest in because of the government policies there? The reason I don't care has everything to do with censorship, and censorship, ironically, is the only reason the site wouldn't work right in the first place.

You care about Chinese government’s censorship, but you don’t care about privacy from multinational corporations and governments?

False equivalence - the one has nothing to do with the other.

They do, if you take a moralistic attitude, as op has. It’s hypocrisy. If you just don’t care about Chinese market then say so—don’t pretend you do so out of a moral obligation.

You're making assumptions about the beliefs of OP that aren't in evidence. A "moralistic attitude" (as you put it) can give greater weight to concerns about government censorship than to concerns about corporate respect for privacy rights. The two are quite different, even to an avowed anti-statist such as myself.

That’s why I stated it as a question.

I still think it’s hypocrisy to care about one and not the other. Those corporations are actually actively working to erode Internet freedoms, which affects everyone, not just a single country, and one that is not even democratic in the first place.

To get on your high horse over censorship in Asia while at the same time merrily including spying in your own code, as a simple convenience no less, is very much indefensible.


Who says he/she doesn't care about privacy from multinational corporations? I care about both.

Well, he seems to be defending the practice of using CDN libraries, and the article talks about how this practice, which he says he won't change, invades user privacy.

This is a very strange comment. First, there is no indication that GP doesn't care about the other things you mention. But more importantly: why don't you care about the Chinese government's censorship? Are you saying that the censorship is OK because multinational corporations are doing something too? How is it even related?

It may not be your intention, but you appear to be defending the practice of censorship and demanding others to share your disinterest in the matter. That won't work, not because google and fb tracking everything on the web is good, but because censorship is bad.

EDIT: I'd like to know what made my downvoters do their thing?


Hosting your assets locally is not “supporting censorship.”

Meta question:

Why does HN allow throw away accounts? It seems to go against the idea of "internet karma".


Because sometimes people feel free to talk if they are anonymous.

By calling the account "throwaway*", it tells people that this person is a regular who is deliberately using a pseudonym.


yeah I always make it obvious, no point in trying to pretend you're some guru.

Barring deliberately putting "throwaway" in the username, how would you propose HN's code determine between a throwaway and a normal brand new account?

Some subreddits have rules based on account age.

I'd like to add, that I don't think I judged it in my wording. I've made throwaway accounts, but never on sites where discussion between users is the focus. I just don't really understand.


Whenever I'm going to say something bad about anything more important than toilet paper I create a throwaway.

Even if the posts give you karma (which my throwaways are pretty universally positive karma) it isn't worth the downvote brigades every time the haters see you and occasional death threats.

Throwaway accounts very much tie in with the concept of free speech and the open internet. If I don't want my personal safety tied to a few paragraphs of ranting, there shouldn't be an issue with that. I should be able to speak my mind without fear of reprisal.

Nobody is more interested in "internet karma" than the Chinese govt at the moment. Haven't you heard of the "online credit score" they're implementing? Everything you say and do online is tied to your real identity and has wide-ranging effects on your well-being. Needless to say, there are suggestions that this may affect free speech.


I'm not against throw away accounts, I adhere to the idea of free speech anonymity provides. But if HN was a video game with a karma mechanism, I'd make it so users can't get around that central mechanism...

I'm not sure that I would consider providing a fallback local copy of a library as going out of my way.

Ultimately it comes down to who you want in your audience. If you don't care about reaching a Chinese audience then that's a perfectly reasonable stance.

The Berkman Center estimates that just 2-3% of users in censored markets use circumvention technologies. So you should be prepared to have your content pirated or your service cloned if it is of interest to a Chinese audience.

That's just the reality of the situation.

* http://cyber.harvard.edu/sites/cyber.harvard.edu/files/2010_...


It will happen whether you build your website to be accessible there or not. They already cloned all the major sites used in the US and as soon as the Chinese version was off the ground the government blocked out all the foreign competitors.

That's very ignorant, and plain sad! So basically you are saying that because of decisions made by the leaders of a certain country, you would like to punish everyone else? That's almost shameful.

I feel your pain. When building my sites, if I use a 3rd-party JS/CSS CDN, I use the only one I have found that has an ICP and a damn good list of Western nodes: jsdelivr.com

Funny, Privacy Badger blocks enough of www.jsdelivr.com that the site content fails to load.

Maybe not the best solution, but for Google's JS CDN I'm using a self-hosted mirror based on this list: https://github.com/euh/googleapis-libraries-list

Unfortunately, over HTTPS I need to bypass HSTS protection, and in Firefox that's annoying (I don't know about Chrome).

For the fonts I tried to create a self-hosted mirror too, but Google does not offer for download exactly the fonts they host.


How is that possible? You just download them and save them on your server... Unless you are worried about breaking some TOS?

Local fallbacks also make it easier to do web dev when e.g. your wifi drops out and you can't see where it landed or you're on a boat or whatever.

I do things like http://stackoverflow.com/questions/7383163/how-to-fallback-t...



Is this a DNS block or an IP block? For example: if you serve assets from your domain but CloudFlare sits in front, do the assets still load?

I assume that Chinese censorship is not escaped just by using https, thus without DPI I assume it has to be IP related.

A combination of DNS pollution, HTTP Reset attack and IP ban.

Cloudflare works terribly in China! Actually, it doesn't work at all unless you buy their enterprise plan (and have an ICP).


Do you know of a way for a non-Chinese user to test a website from your end? Is there a VPN or browser-testing site that we can use?



I'm curious - do you know of any chrome plugins / client-side tools to help solve this problem? Seems like it could be very infuriating for the semi-competent Chinese internet user. Wouldn't be too bad to write a plugin that finds dependencies from "allowed"/"works-in-China" CDNs.

uMatrix

Really glad about your comment; I honestly haven't thought about this a single time yet. I guess I am missing out on a lot of traffic.

The cats are pretty nice.

I use uMatrix and do not load external web fonts. I am stripping out CDN reliance in our stack at work as well. This practice of supporting secure protocols but still trading ease-of-development for end-user privacy & security must stop.

It's awesome that nearly 10 years after I came up with MonsterID, it's still going strong. I love those cats.

I really like CDNs because of the ability to drop in a file and know it will be cached correctly. (Also, there is a high probability that your user already has a cached version of the file.) But I never thought about CDNs being able to track you.

Isn't there an alternative? A more transparent way to provide users with source files and still keep the 'cached items' aspect?


Firefox on Linux.

I use uBlock Origin, Ghostery and Disconnect, and Flash Control. peppercarrot.com is all zeroes for all three blockers, meaning nothing is blocked because there's nothing noticed that needs to be blocked. There are no Flash Control icons, meaning no video or audio noticed and blocked. Thanks for caring. :)

On the front page of theguardian.com, logged in as me, there's a V icon at the top, meaning that Flash Control has blocked video, probably for some gratuitous menu feature. I have zero trouble using and reading the site.

When I first opened theguardian a few minutes ago, uBlock was blocking 13 requests. It's steadily climbed in those minutes to 32 blocked requests. Ghostery is noticing/blocking 0 trackers. Disconnect is blocking two: nielsen and comscore. Disconnect is also blocking 1 from Facebook and 3 from Google. All three tools may be seeing and blocking some of the same things.

Without these four tools, except for low/no-commercial technical sites and public service sites like wikipedia my web is all but unusable. With them my web is fine.

I very rarely have any problems using any site. I had to enable my bank in uBlock to use their popup bill pay feature. I think I had trouble viewing a cartoon at The New Yorker; I forget what I did to view it. Youtube and Flash Control seem to be in a perpetual arms race, as was the case with Flashblock. Youtube is my main motivation for using Flash Control, to prevent automatic video playing.

And yep, I get that sites pay the bills with ads. I $ubscribe to three news sites, and I also get that that doesn't pay the whole bill. The web is either going to have to block me for using a blocker (I've been seeing that very rarely recently, or at least "Unblock us please") or figure out a less dangerous, intrusive and loadsome way to serve ads. (And yep, I just made up the word "loadsome." I can do anything!)

EDIT: I whitelist duckduckgo.com in uBlock.

https://duck.co/help/company/advertising-and-affiliates

https://duckduckgo.com/privacy


Some of the trackers load more trackers - taking theguardian.com as an example again, with Ghostery on it blocks only 6 items. But whitelist the site and after it lets those 6 load, now it finds 18 trackers.

Not sure if this is a feature of youtube or chrome, but when opening a video in a new tab, it does not play until I have that tab in focus.

I think if you're logged in to youtube/google you can set preferences. I'm never logged in because tracking.

> I use uBlock Origin, Ghostery and Disconnect, and Flash Control.

I just have to say, thank goodness for Moore's Law. Without it, we would never have so many wasted cycles![0]

[0] Not saying you're wasting, but the fact that we have to jump through sooo many hoops to stop all this crap is just disgusting.


I see a parallel to our immune system. One of the most complex pieces of machinery in our bodies has the sole function to keep us from getting overrun and taken over by "hackers". When I look at nature as an example I see a way to more complexity in the things we create - because they are not actually completely "designed", instead we let laws of nature govern how they develop, so I think it's not too far-fetched to look at existing nature-designed systems for guidance on predictions about the future of man-made systems.

How we operate, a good example using a very simple product: https://medium.com/@kevin_ashton/what-coke-contains-221d4499...


IIRC, Ghostery was scorned in the not so distant past for whitelisting sites for money. Is that still the case? Because ever since then I have used Privacy Badger (https://www.eff.org/privacybadger) which is made by the EFF which I know is financed mainly through donations and in recent years more and more through the Humble Bundle (https://www.humblebundle.com/) so not financed by companies that would want their trackers etc. in sites regardless of user choice or as opt-out.

And YouTube curiously will trigger hundreds of blocked requests in uBlock Origin over just a few minutes when watching a video. I can’t imagine why it needs so many continuous “questionable” accesses.

AddThis makes money by selling 3rd-party audience segments to advertisers like me. I assume they get this data by tracking which users view which pages through their sharing buttons. Example segments I can buy to advertise to: http://i.imgur.com/JF6ZZPC.jpg

The author doesn't even mention the big players: every FB share or like button, on all that nasty porn you watch (even in incognito mode), straight to FB. They recently changed their policies and signaled that they are going to start using this data for ad targeting, probably in a push to expand FAN and be more competitive with Google.

Something as simple as a share button that some blogger copy and pasted into their blog turned into an ad tech/data company!

I personally love that story and think that's cool and innovative thinking from AddThis.

But I also think more data = better ads, at the expense of privacy (probably not a popular opinion around here).


CDNs are a common enough technique that this should be standardized in browsers. HTML should include a link to the resource hosted by the site along with its checksum. The browser could then easily use a cached copy of the resource from any other site with the same checksum, or just download it from the site.

There are two reasons to use a CDN. The first is caching (different sites using the same resource from the same CDN will download it only once); the second is speed (some browsers restrict the connection count to a single domain, so hosting resources on different domains might improve download time). Caching is better solved by using the checksum as the key instead of the URL. Speed is not an issue with HTTP/2, because there's only one TCP connection. The only remaining advantage of a CDN might be geographically distributed servers, so a user in China would download the resource from a Chinese server instead of a US one. I don't see an easy and elegant way to solve that, but I'm not sure it should be solved at all; HTTP/2 server push should be enough.


> CDNs are a common enough technique that this should be standardized in browsers. HTML should include a link to the resource hosted by the site along with its checksum. The browser could then easily use a cached copy of the resource from any other site with the same checksum, or just download it from the site.

I really like this idea! Store your heavy assets in a public DHT with each browser storing a part. Then fetch said assets by content-hash if not already in cache. Maybe disable serving for mobiles. The W3C needs to get on this!


The W3C has a thing called subresource integrity, which is basically what vbezhenar described:

https://www.w3.org/TR/SRI/

However, there are reasons why e.g. hash-addressed JavaScript is not used as a shared cache:

https://hillbrad.github.io/sri-addressable-caching/sri-addre...


WRT the "timing attack":

In most cases, the client does not even request bytes from the CDN, which is then unable to track the client. But then again, CDNs could implement tracking based on this lack of requests (which is kind of ironic, and should become infeasible as more clients use this technique, I think).

Actually the other issues are solved by the "DHT" part of this idea: no centralized party can track which assets are already in your history.

The only tracking I can think of is by your nearest neighbours' browsers. If such a neighbour N empties your cache (DNS attack?), that triggers a full fetch from N. Then N can attempt to fingerprint this asset query against what other pages list. But then, the whole point of this is to cache assets that are used on most pages!

I love this idea. Let's make the Web decentralized again! (I couldn't resist)


Maybe I'm missing something crucial, but why not just host the content on your own server? I.e., just download that Google font, jquery.js or FontAwesome and serve it directly instead of using an external CDN.

The post seems to say "I don't like where some content is coming from, so I re-created said content by myself".


As a first thought, there may be licenses in place preventing you from self-hosting the content.

At least in the case of Google Web Fonts and FontAwesome, I am almost positive there is no issue with hosting locally.

That leaves AddThis and Gravatar

Can't you just proxy gravatar?

Then you pay the bandwidth bill.

I know I would rather save a few bucks than make a site work for China. Many sites don't need to work in China.


Why use alternatives?

You can download the Google Web Fonts and serve them from your host.

You can also download Font Awesome and serve it locally.

And there doesn't seem to be a reason why you can't do it with gravatar either.

I don't get this post honestly. It seems to be about replacing stuff with other stuff instead of replacing CDN with locally served content.
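
A minimal sketch of self-hosting a downloaded font, assuming you've converted it to woff2 and put the files under /fonts/ (the family name and paths are placeholders):

    @font-face {
      font-family: 'Example Serif';
      src: local('Example Serif'),
           url('/fonts/ExampleSerif-Regular.woff2') format('woff2'),
           url('/fonts/ExampleSerif-Regular.ttf') format('truetype');
      font-display: swap; /* show fallback text while the font file loads */
    }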


In the case of Google fonts, is it legally possible to download the font and serve it from one's own server? The FAQ has a relevant section, but does not answer this question: https://developers.google.com/fonts/faq

As far as I'm aware, the fonts on Google are just fonts, not owned by Google.

Example: https://www.fontsquirrel.com/fonts/playfair-display Playfair Display - "Copyright (c) 2010-2012 by Claus Eggers Sørensen (es@forthehearts.net), with Reserved Font Name 'Playfair'" in the SIL licence right next to the font files.

Therefore yes, you should be able to download them and use them, according to the original licence. (Which, by the way, usually requires the font creator to be credited - something Google only does when you select the font, but not in the served CSS, which I believe is not fair.)


IANAL but they would not appear to be able to construct a case against you for using the fonts on your own server, since at no point is it stated that such a practice would be in violation of the terms of use.

As you observe, they do not explicitly answer the question, but their reticence should be taken as an implicit green light, encased in a warning about loading times.

Most Google fonts are merely served from their hardware, and not created by them, so the license selected by the font's creator applies. Think of Google Fonts as an aggregator of free-to-use fonts.

There is also a list of fonts and their licenses available from Google Fonts here: https://fonts.google.com/attribution

If you're really concerned, check who created the font and see if they make the font available under a permissive license on their own website. Lato, for instance, is available from its creator's website and is published under the Open Font License.


I wonder if there will come a time when these CDNs pay you for the visitor data you 'share/leak' with them via the linked resources (to convince you to keep using them).

Off topic, but the root site of this blog post is pretty awesome - "Pepper & Carrot: A free, libre and open-source webcomic supported directly by its patrons to change the comic book industry!"

So, here's where I mark myself as a dinosaur: why are you trying to set a specific font for a web page? Clients select fonts for a reason.

But that's not a web page, it's a web app! I want full control of it!

... I don't even know myself if I'm being sarcastic or not ...


Because presentation matters a lot in a lot of cases.

Your question is answered by another question. Why does more than one font exist?

Because readers have different needs? Seems pretty obvious to me.

Historically "fonts" have been set by the author, not the reader. While I appreciate that this is no longer necessary, it seems reasonable that authors still want to choose it for reasons of presentation. Of course it's easy enough to write a use stylesheet, so readers that need a different view can get one.

https://github.com/justjavac/ReplaceGoogleCDN - I would like to recommend this plugin (for Chinese users) and reach out to others to help contribute to it.
