Cloudflare and the Wayback Machine, joining forces for a more reliable Web (blog.archive.org)
711 points by jgrahamc | 2020-09-17 | 163 comments




Normally I'd be upset about Cloudflare getting involved in anything good and pure like archive.org, but this relationship, which just suggests new URLs to archive, seems harmless enough.

> seems harmless enough.

The number of things that started out with those words and then went on to become huge problems is non-trivial, I am sure.

But I want to believe...


> I'd be upset about Cloudflare getting involved in anything good and pure

Why?

At least by FAANG standards, Cloudflare stands out to me as one of the good guys.


You fool, you have opened a can of worms; no good will come of the resulting flame war.

https://en.wikipedia.org/wiki/Cloudflare#Controversies


Wow, zero coverage about their attempts to centralize key parts of the Internet under their control and how many people got outraged about that, but plenty on the trivial outrage of CF supplying services to customers they found offensive. I'm kind of tempted to update this section.

Why establish a false dichotomy? One can be just as mad about them abusing their position to attack free speech in a censorious way, as one is about their other efforts to create a monopoly over a large swathe of internet traffic.

I get it, nobody wants to defend badthink, but it's a hell of a lot easier philosophically to defend all political speech regardless of content than it is to try to pick and choose and make some moral case.


Set aside the free speech stuff which involves groups inciting violence - while we can agree that it's bad, deciding what to do will inevitably lead to wildly divergent ideas.

Cloudflare says that it's "free speech" for their customers to claim to be Adobe offering a "Flash Updater", or for their customers to claim to be Bank of America.

That is how evil Cloudflare is - they allow clear and unambiguous crime for pennies.


They do? I'm pretty sure they've always banned phishing pages. Do you have a source on that?

Any sufficiently large and centralized infrastructure (like Cloudflare is becoming) has an incentive to engage in user-hostile behaviors.

So far, I personally think most people would say they've been mostly benevolent.

But they are a public company, and a new CEO can have different priorities.

For example, (fairly or not) the public perception of Google has slowly been diminishing over time.


>So far, I personally think most people would say they've been mostly benevolent.

They're denying access to no-JS users to many sites.


The owner of the site is denying access, Cloudflare just provides a service. The owner can turn the feature off.

No, I can't even get through to the site itself past Cloudflare's gate...

The gate is configured by the website owner. Not to defend the frustrating experience - but I personally prefer my website to not be attacked by bots.

I'm just a happy Cloudflare customer.


Cloudflare being the enabler of a user-hostile technology doesn’t make them benevolent, no matter how happy the customers.

We can sit here and argue all day around this and we still won't reach an agreement.

On one hand, as a website operator, I want to prevent DDoS, spam, etc. for my website. I can implement these solutions myself and do a bad job, or I can use Cloudflare, which solves most of them. It's probably going to rule out some users such as yourself - which is a shame. But until there's a better way to know that a visitor from the internet is not trying to attack the website, I'd have to use something like Cloudflare.

On the other hand, it's not like it's that hard to leave Cloudflare for me - so if there's a better alternative without causing legitimate users pain, I'd be happy to jump on board.


I don't think anyone is arguing that preventing DDoS attacks is undesirable.

It's about cutting off access to a small segment of users just because it is easier that way.

I think that, similar to wheelchair access, we will continue to push for access to all devices and users as much as possible.

This attitude of "it's just 1%" or "it's just 0.1%" will become just as unacceptable as saying "well, there are only 3 people who need access ramps out of 30, so they need to suck it up and deal with it."

It's up to you which side you want to be on.


Disabling JS is a conscious decision. Using a wheelchair is not — you cannot just flip a switch and walk once your spine is fucked.

> Any sufficiently large and centralized infrastructure

archive.org is exactly that. With rather poor performance no less.


Agreed. Just a matter of time, personnel, and offering price

Any reality that becomes more adjacent in possibility-space today is better conceived as the seed of a future you might find yourself living in. Most any protective institution is just a set of measures working against this process -- moving things away from us in possibility-space.

So worrying about this stuff is fair game. The only way I'd not worry, is if there was a complementary protective measure that protected us from the future we don't want. But that wasn't part of the announcement, so we're probably all a little bit less secure in getting the world we want. (EDIT: Though maybe funding archive.org would count as that...?)


> (EDIT: Though maybe funding archive.org would count as that...?)

That reminds me a bit of how Google "funds" Mozilla. Not the same, but when dealing with the devil...


I've always thought that people think about this backwards. Every dollar the devil gives you, is one dollar less the devil has to spend on devilry. Especially when that dollar is just charity, and the devil isn't getting anything out of it. But it also applies when the devil is getting something out of it that satisfies their preferences, as long as that satisfaction isn't displacing a need that frees up any of their other dollars to now be dedicated toward devilry.

Or, to put that another way: if you can charge [infamous politician] a million dollars for a fancy-but-useless painting, you absolutely should do so. Now you have a million dollars; they're out a million dollars; and all they have is a painting!


The issue comes when the politician comes back to you and says “Hey, I gave you a great deal on that painting, can you do a favor for me?”

Extracting money from bad sources is good as long as you absolutely positively don’t extract anything else. That’s hard to do in any circumstance. However, in this situation I think it’s fine and worth it.


Exactly right. And I agree pretty strongly. Unironically, this is why I think accepting money from the Saudi Arabian Sovereign Wealth Fund can be a great force for good. Adam Neumann may (again unironically) be a remarkable hero, accidentally.

This only works if you accept money from every devil that passes you by. If the majority of your funding comes from one devil, it doesn't matter how perfectly normal the underlying business transaction is - the moment you get in the way of their devilry, you're out a job.

Mozilla is a good case study in this: they are financially dependent on Google money to continue browser development. Google hasn't actually intervened in their affairs a whole lot. However, they could, which is why they're going through all sorts of self-inflicted harm trying to get away from their business of selling a browser default to a search engine.

Public companies are even worse, because what they are looking for isn't money, it's more money, or "growth". This is why a lot of American media companies suddenly got really quiet about certain kinds of atrocities committed by certain governments. If you call the devil out on concentration camps full of Uighurs, then maybe he doesn't buy your paintings anymore, and then you're out of the painting biz.


You’re talking about being employed by a devil, or maybe receiving continuing patronage from a devil. I’m talking more about having the one-off opportunity to drain a devil’s coffers (whether or not you get the resulting money), without having the ability to turn that into an ongoing relationship.

Basically, this is the other side of the coin to the idea that iterating the Prisoner’s Dilemma gets you the potential for tit-for-tat, and thereby cooperation under expectation of tit-for-tat. In this case, “defecting” against a devil is good — but, just like in the traditional Prisoner’s Dilemma, it’s only practical to defect if the scenario is one-shot.


> But they are a public company, and a new CEO can have different priorities.

And CF have access to decrypted SSL for many sites. Stuff like passwords, personal details, keys and tokens.


Just a reminder: Cloudflare with its standard settings is breaking the internet for second- and third-world countries with its captchas on websites. This is discrimination in my opinion. As long as you are only in a first-world country you will never notice.

Minor correction: As long as you are only in a first world country [and you don't use a VPN service or Tor] you will never notice.

I point this out because I think it's an anti-privacy "feature" and I wish CF would stop.


They could just block these IPs outright. A massive percentage of attack traffic runs over these networks. Before, site owners would often have to block Tor completely after dealing with enough spam/attacks from exit nodes, and now they can allow actual humans without any sort of complex setup.

I think it's a net win for everyone involved, personally.


Isn't this because of carrier-grade NAT, where one bad actor (either maliciously or via running something like Hola VPN) can kill a few hundred users' IP reputation?

I do almost all of my internet access from a 2nd/3rd world country and hadn’t noticed?

Depends on the country/ISP probably. From the Philippines, Firefox with uBlockOrigin and PrivacyBadger hit captcha walls all the time; all that stopped as soon as I moved to Singapore.

What else do you expect them to do? Also allow all the bots that use IPs from 2nd and 3rd world countries?

'First world country' means 'aligned with Western democracies in the Cold War'; 'second world country' means 'aligned with the USSR in the Cold War'; 'third world country' means 'unaligned in the Cold War'.

Are you talking about countries with under-developed internet infrastructure? That includes swaths of the US....


No, GP means exactly the old classic definition of "second/third world" countries.

While Cloudflare actions have been mostly positive, the sheer size of their network is a concern for decentralization.

We can’t trust that they’ll always make good decisions.

I’m glad that the Internet Archive is independent and hope it always remains that way.


They allow and support known spammers and scammers. They make reporting of spammers and scammers arduous. They play dumb when asked about their barriers and repeat inane responses rather than answering questions.

In other words, they have a history of clearly, unambiguously showing themselves to be unapologetic assholes.


Wow, this is awesome to see. I hope this doesn't put a lot of load on the IA, though...

I would assume that when a site goes offline Cloudflare fetches a snapshot from IA only once and then serves this copy to all further visitors, unless I'm missing something?

Here's a more detailed description of the service from Cloudflare support pages: https://support.cloudflare.com/hc/en-us/articles/200168436


Is this basically Archive.org becoming a customer of Cloudflare CDN to reduce load off their servers?

It's Archive.org being provided URL telemetry by a Cloudflare product, for archiving public sites they have not yet found through traditional means (crawling, or users submitting requests through the Wayback Save page).

The next step would be for Cloudflare to point to Archive.org Wayback links when an origin isn't available (similar to browser extensions that point to Archive.org when sites 404 or are down, but in Cloudflare's core).

Cool stuff. Thanks Cloudflare folks.


I really doubt their customers would want that. Usually when a page is 404 it's because the company in question wants to forget about it :)

You would return the archived page for a 5xx error, not a 4xx error.
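A minimal sketch of that 5xx-vs-4xx distinction (note: `archiveFallbackUrl` is a hypothetical helper for illustration, not a real Cloudflare API; the Wayback `/web/` URL pattern, which redirects to the latest snapshot, is assumed):

```javascript
// Given an origin response status and the requested URL, return a
// Wayback Machine URL to fall back to for server errors, or null.
function archiveFallbackUrl(status, url) {
  // 5xx means the origin itself is failing, so an archived copy helps;
  // 4xx (e.g. 404) usually reflects the site owner's intent, so pass it through.
  if (status >= 500 && status <= 599) {
    return "https://web.archive.org/web/" + url;
  }
  return null;
}

console.log(archiveFallbackUrl(503, "https://example.com/page")); // a web.archive.org URL
console.log(archiveFallbackUrl(404, "https://example.com/page")); // null
```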

Ah I see. But this is precisely a use case for Cloudflare's own caching service.

It wouldn't be fair to use archive.org's community-sponsored resources for propping up businesses which are too cheap to pay for proper IT :)


While it's not explicitly mentioned, I think Cloudflare is providing financial support to the Internet Archive.

One would hope so. Considering the timing relative to the IA's potentially very expensive legal battle, I fully expect this to be the case. Still, considering CF's anti-privacy/anti-Tor stance this is a deal with the devil. Guess I should give money directly to the IA. Considering how much value they provide, I'll do this immediately after updating this post.

In what ways is CF anti-privacy?

This is actually a really good symbiotic relationship that should foster the archival of a ton more content. Hoping to see this toggle enabled by default at some point.

I don't like the idea of "we're tacking this onto an existing service lots of people have enabled". CF bit me recently by suddenly taking away proxied dns wildcards from free zones, as it's now a premium feature (breaking the security promise in the process by changing the wild card entry to non-proxied). I don't like surprises and opt-out changes in critical infrastructure.

It's one thing to use CF's Always-On service - you're a customer, you know you can remove your data from it. It's another to get the Internet Archive involved, who may or may not remove your data, and may or may not honor robots.txt.


Sending the details seems to be tied to clicking the 'Update' button in the Cloudflare UI, which documents that by clicking it you agree. So they might not be sending your PII to a 3rd party until they get your permission. Hopefully any automated updates are not violating customers' wishes. Yes, it is annoying that the features have been tied together for people who choose to have as little interaction with the IA as possible.

Great, until your admin panel is archived...

Cloudflare just gives more discovery, it doesn't give IA access to anything that was previously more secure...

"COVID tests: great, until you find a positive result"


> As new URLs are added to sites that use that service they are submitted for archiving to the Wayback Machine.

Yes, this would prevent most order-confirmation pages or otherwise private must-be-logged-in pages from being archived, but it will expose presumed-private URLs that are thought to be unique (tracking numbers, files uploaded with unique names, unique/private image urls that are otherwise publicly accessible)

If you've made efforts in your systems to prevent enumeration attacks, this could partly bypass them.


What I'm hearing is "don't rely on security by obscurity", which I wholeheartedly agree with.

Was worried about CF getting their claws dug into archive.org, but on reading, this is a decidedly non-evil deal, actually it sounds wonderful. Still, I worry if there might be some unseen long term interest in the archive.

Never forget Dejanews


Keep in mind how Cloudflare makes most of their money: They sell a web proxy service with security and performance features including a CDN. Cloudflare's interests are furthered by improving that service in ways that help its customers. Keeping the Web Archive healthily stocked with content is aligned with their long term revenue growth.

T+10 years, I very much expect Cloudflare's core business to have expanded significantly. I remember that time my Googler friend told me they were about to release that one thing they'd absolutely never do; Chrome came out a few weeks later, and now look at Firefox.

You need to pay attention to the silent positioning of these companies to even guess at where they might go, so deals with things like archive.org may have some unseen substance to them that might only become obvious much later


As a business they absolutely are not going to stay in the CDN lane as a primary.

Akamai has $3b in sales and an $18b market cap.

Cloudflare has $348m in sales and a $10.8b market cap.

Akamai is their maximum ceiling if they focus primarily on the CDN segment. Cloudflare is rapidly approaching their valuation ceiling if they stick to CDN as their core (and they'd have to start killing Akamai just to get there; the CDN business is increasingly a slower growth segment in the larger cloud industry).

Companies all around them in the cloud are growing faster, yet few are more important than Cloudflare. Zero question Cloudflare will continue to aggressively branch out, leveraging their critical positioning. In the not-so-distant future CDN will not be the center of their business. CDN is and will remain a springboard for them, a gateway drug, milk at the back of the grocery store.


Great comment. Cloudflare is not a CDN. They are an edge computing platform that happens to offer CDN services. Could Akamai grow into that market faster than Cloudflare can consume it? TBD.

Edge computing is super interesting, and today's CDN providers should be able to provide it given their current infrastructure deployment. It could really bring in the next era of computing and technology once certain networks/providers reach critical mass to provide edge services within 5-10ms to customers.

If jgrahamc is reading this, I'd really like to know if Cloudflare wants to work with telcos.

Imagine a small server in every cell tower, with locally-cached maps/Wikipedia/latest movies.

Some communication couldn't be cached (e.g. real-time video calls), but a lot of broadcast media could be. Of course there are copyright implications, and it might require partnering with Netflix or others.

The quick load times would be great for users, and the reduced load on the backbone would be good for the telecom companies.

If you'd like me to chat to some friends in telcos in New Zealand about this, drop me an email. It's not my job now (I'm in IoT) but I know who to talk to if you'd like to get this kind of thing moving.


AFAIK Netflix (and YouTube and others) already do this edge caching. They partner with telcos, as you said.


Kiwix does this for Wikipedia, other Wikimedia projects and other free content projects, through "Kiwix hotspot" (based on kiwix-serve). https://www.kiwix.org/en/downloads/kiwix-hotspot/

Akamai is not their ceiling because Akamai doesn't serve all segments of the market.

I'm fairly critical of Cloudflare for a lot of reasons, but one thing I think they did right was focus on the SMB market with plans that were actually affordable to the average business. They targeted customers that companies like Akamai pretended didn't exist. Even now they have the cheapest plan available, and once they consolidate the market even further they can start raising those prices.


Akamai is their ceiling in CDN because they have a much higher value segment of the business, representing a drastically larger share of all dollars in the CDN space. Their business is nine times the size of Cloudflare's because their customers are far more lucrative.

If Cloudflare holds onto all of their already considerable number of customers, and then kills Akamai and somehow takes all of Akamai's business, the combination will be a mere 10% larger than Akamai already is now. There is your general indie ceiling in action, with all segments combined (and Cloudflare isn't going to monopolize the entire CDN business besides).

All you need to know to spot the independent CDN ceiling is that Cloudflare + Fastly + Akamai = $3.6 billion in sales (with the understanding that it's a slowly increasing ceiling, as the CDN market is still growing). The ceiling in that space for Cloudflare just can't realistically be much larger than that combined group and that's not much larger than where Akamai is already at. The only way this isn't the case, is if you project Cloudflare knocks off most competitors and takes the market (they can't, Amazon, Microsoft, Google among other giants, are standing in the way of that outcome).

It'll take Cloudflare a small lifetime to get to $3 billion in sales in the CDN space at the rate they're growing (they're adding ~$8m-$10m per quarter in growth (all of which obviously isn't CDN), so maybe it'll only take a few decades with some compounding). It took Akamai 22 years to get there with very high value customers and a pretty nice open field for many of those years.

Akamai in absolute dollar terms is growing faster than Cloudflare + Fastly combined. The CDN ceiling is actually running away from Cloudflare at present. That shouldn't be happening.

Cloudflare knows full well CDN isn't their brightest business future. It's why so much of their expansion effort is going into everything else. Given the way they price-structured their CDN from day one, Cloudflare has always known CDN was a lure and the upside was in sprawling outward from it. Come for the CDN, stay for the workers or whatever preferably higher margin thing we can sell you on. It's also why they're not interested in / worried about trying to make money on domain registrations, as with SSL before that. They'll happily murder the margins in foreign services all day long (areas where they don't compete, but there is margin to wipe out cost effectively, and with customers to lure in), so long as they can occasionally launch a new service where they have a distinct advantage and can convert their base to use it and increase total revenue per customer in the process.

What would be a better path: if Cloudflare could own a big part of Akamai's CDN business by trying to aggressively climb up the ladder from an unassailable price-value position Akamai doesn't want to go down to, like an ARM eating an Intel from the feet upward; or just leave the snoring giant alone to keep snoozing in his enterprise tower while Cloudflare busies itself sprawling out in many directions, leveraging the volume of customers that Akamai doesn't want to (and/or can't) go after because they're not viewed as lucrative enough? I think what Cloudflare can find outside of the CDN business is likely to be more valuable than what's inside the CDN business, very long-term speaking.

And if you're Akamai and you let Cloudflare get far enough along with that sprawling (likely already too late), how about if they drop your CDN legs out from under you. Cloudflare builds out many other legs to stand on, so they flip the switch on the margin and kill the CDN market for the independents, as they were willing to do with domains and SSL. Free CDN, all tiers, all features. They can't do that today, they might be able to do it tomorrow. The CDN market becomes the SSL market, and as a totally free lure it accelerates a rush into Cloudflare's other more exclusive services (including for larger, lucrative enterprise customers). Surely this switch has been pondered inside of Cloudflare, road-mapped as a potential.


> As a business they absolutely are not going to stay in the CDN lane as a primary.

Yeah, and all five big cloud vendors (AWS, Azure, GCP, IBM, Oracle) have their own CDN solutions bundled. Hard to make a case to purchase separate CDN solutions.


I'm not sure about all the providers but Amazon's CloudFront CDN product has additional costs, so it's "bundled" but not in the sense that it's free, only that it's integrated.

And one of Cloudflare's selling points imo is the multi-cloud customers. Use AWS all the way but Cloudflare as your CDN, and you could switch to GCP seamlessly. Or route traffic based on pricing, etc. I think you're right that they will/have absolutely branch out from CDN, but I think their CDN product is actually compelling, especially to bigger companies that are more afraid of Amazon than they are of Cloudflare.

(Other interesting point - it's worth noting that IBM's CDN is essentially white labeled Cloudflare).


They control SSL decryption for a massive number of websites. Governments will gladly fund Cloudflare for eternity.

Akamai as well then?

Much of the US Government already uses it, so yes.

If your root CA is subject to the laws of a government that can seize the root certificates and MITM the connection, that's not much better. Cloudflare just makes it easier.

Certificate Transparency makes this significantly harder to do stealthily. I’m not convinced that Cloudflare is a deep state operation either, but Cloudflare's ability to secretly MITM is a position afforded to a select few, and certainly not every CA.

It's much easier (and virtually undetectable) to MITM when you are also the reverse proxy though.

More like a web blocker "service". It is profoundly unhelpful to me that a proxy service cares if I have Javascript disabled in my browser.

If a site requires JavaScript, that's because the website has manually enabled a feature. Cloudflare does not require JavaScript out of the box.

Please clarify. I thought all those captcha puzzles were coming from Cloudflare. Are you saying they are only enabled if the destination page has JS?

I believe GP is referring to a setting that a cloudflare user has to flip for requiring visitors to enable JavaScript

When users are used to this (getting redirected to an archived copy when the site is down/not available) and when this trial balloon has proven to work, Cloudflare will replace archive.org with their own infrastructure. This is the common game plan.

Doesn't CF already have an "Always Online" feature using their own infrastructure? So this seems like the opposite happening.


Uh, no. We're literally doing the opposite. We used to have our own caching infrastructure for "Always Online" and we're getting rid of it and using archive.org instead.

How do you handle robots.txt? The previous incarnation of Always Online didn't care about robots.txt, while archive.org does.

https://blog.cloudflare.com/cloudflares-always-online-and-th...

We tell archive.org about the URI, they crawl it. They handle robots.txt.


archive.org doesn't handle robots.txt in any meaningful way (see my comment above at https://news.ycombinator.com/item?id=24516875 ). If that's changed recently, I'd like to know more.

Note that archive.org stopped respecting robots.txt in 2017. [1]

In my experience, the site owner must email archive.org support to be excluded from its crawler and archiving.

[1]: https://boingboing.net/2017/04/22/internet-archive-to-ignore...


"We're literally doing the opposite."

How does what you do now contradict what you will do in the future? What legal assurances are there that you won't do that down the road? (See the Facebook/Oculus "no Facebook account" promise)


Wait... so you think Cloudflare's master plan is to roll this new thing out to get people to accept it as normal, and then suddenly make a big shift to.... what they currently have?

Why don't they skip this step and just keep what they have now, then? No one seems to be up in arms that they currently provide their customers offline caching...


And thank god for it. Trying to explain to end users why their site was not, in fact, always online, on account of the creaking behemoth plodding along in IAD that barely managed to successfully cache and serve anything, was never any fun.

The original Always Online infra was long unloved and probably kept on life support far too long for lack of want to deprecate an early feature.


Thanks, so maybe this page is outdated where it mentions your own crawler with user-agent? Or does the Internet Archive use it for these crawls? https://www.cloudflare.com/always-online/

Not long ago, CF was blocking access from Tor. And they sometimes block access from my web crawler. I don't like CF, as they act as a police or gatekeeper to the origin website, deciding who to penalize and who not to, while pretending to be speeding up websites and protecting from 'threats'.

> while pretending to be speeding up websites and protecting from 'threats'.

They do though. That's why people pay them lots of money to do those two things. Not sure what part you think is "pretending"?


One of the first 100 people to use Cloudflare when it launched.

Paying them today to speed up a couple of websites while protecting them.

They rock at making big things possible for very small companies.


Hey, me too! Do you have the first-users t-shirt?

They’re acting more as a security guard. Which is to say that they’re intentionally employed by the owner of the property you’re trying to enter. Often specifically to “bounce” users like you, malicious or not. Believe it or not there are legitimate reasons for wanting only real human users on your website!

What worries me is that Cloudflare is deanonymizing a huge load of Tor users, and the issue that comes with it is that a huge portion of Tor users actually need access to the web archive due to country-wide DNS censorship (European countries included).

As Cloudflare is deanonymizing Tor users on pretty much every website that's hosted on it, I fear they are abusing that power once again to deanonymize users of the web archive.

Cloudflare always claims it's not their issue and that it's a webmaster setting with the shitty captchas and Google's infamous Prism-sponsored PREFS cookie - but to be honest they should just not have implemented it in the first place if privacy was a core value of their company.

The "DDoS" protection basically fingerprints a machine and user inside an encrypted HTTPS connection; which makes the encryption tunnel itself obsolete.


I don't know really. Cloudflare is notoriously in conflict with different archive sites and now this announcement makes that sound not too credible.

I think we will see selective removal of certain content.


> Was worried about CF getting their claws dug into archive.org

SAME. From the title, I assumed the Wayback Machine would be using Cloudflare. Nice prank, boys.


Good News.

I also recommend using the Internet Archive add-on in the browser. Clicking it archives the current page. That way, you can archive pages you visit.


Or use this bookmarklet:

  javascript:window.location="https://web.archive.org/save/"+location.href
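That bookmarklet just builds a Wayback Machine save URL. A hedged sketch of the two common URL patterns (assuming the usual behavior: `/save/` snapshots a page, while the `/web/` form redirects to the latest existing snapshot):

```javascript
// Build Wayback Machine URLs for a given page.
function waybackSaveUrl(pageUrl) {
  // Requests a fresh snapshot of the page.
  return "https://web.archive.org/save/" + pageUrl;
}
function waybackLatestUrl(pageUrl) {
  // Redirects to the most recent archived copy, if one exists.
  return "https://web.archive.org/web/" + pageUrl;
}

console.log(waybackSaveUrl("https://example.com"));
// → https://web.archive.org/save/https://example.com
console.log(waybackLatestUrl("https://example.com"));
// → https://web.archive.org/web/https://example.com
```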

I use https://web.archive.org/web/submit?url=%s and set the keyword to "rez". That way I can type "rez example.com" and it will send me to the archived version.

Why "rez"?

Short for "resurrect" maybe?

Yup!

Another nice feature: when you hit a page that is a 404, it will automatically try to load it from the Wayback Machine if available.

This should be made very clear to Cloudflare users, ideally a warning next to the Always Online checkbox.

"Always Online" now can mean "Archive Forever" - even when a site is pre-launch.


From the blog post, an image of the checkbox: https://lh6.googleusercontent.com/J42AtNZv8xNcyQPPefVywiAGEh...

It always has. If your site is publicly available and you don't disallow bots through robots.txt, they can crawl it at any time. Even if the site is "pre-launch", because that doesn't mean anything on its own.

And of course, remember that robots.txt is only a signal to benevolent bots which respect it. If you have secrets to keep, don't put them online in the first place.
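For reference, the blanket opt-out signal being discussed is a minimal robots.txt like this (purely advisory, as noted above; only well-behaved crawlers honor it):

```text
# Asks all crawlers to stay away from the entire site.
User-agent: *
Disallow: /
```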

Or properly authenticate (and audit) access.

Looking at my Splunk logs and then asking a lot of questions, I have learned that there are a LOT of not so benevolent bots that must be tolerated anyway.

Benevolence is a continuum.


In fact robots.txt is a list of things a nefarious crawler will absolutely want to examine - no need to know those paths when they're all laid out for you!

I'd just add that, while major players like the Internet Archive do respect robots.txt, it's essentially just a flag that depends on people voluntarily respecting it. If a site is publicly available but you don't want people to find it, you're just depending on security through obscurity.

The Internet Archive stopped respecting robots.txt in 2017. See https://boingboing.net/2017/04/22/internet-archive-to-ignore...

The Wayback Machine has completely ignored robots.txt for a few years now. They did it on purpose.

I have serious issues with this and the fact that site owners have to email a human support team in archive.org to be excluded.

Yeah, I definitely expect this to bite some people, if I'm understanding correctly. A plausible scenario (among many) would be: soft launch a site, show it to some early stakeholders, have Wayback archive everything via Always Online, fix embarrassing screwups or oversharing in soft-launched version, publicize site more broadly, everyone in the world can rewind to version zero, regrets. I don't think the existing warnings really make clear that a soft launch is now a forever launch.

The solution to this is... robots.txt. Otherwise your site might turn up in Google etc. Since it's archive.org doing the crawling, and they respect robots.txt, it won't get archived.

Archive.org does not respect robots.txt IIRC. I’ve run into this problem before with them. Ironically, I ended up blocking Internet Archive’s ASN using Cloudflare.

EDIT: Internet Archive started ignoring robots.txt in 2017: https://www.digitaltrends.com/computing/internet-archive-rob...


They only started ignoring robots.txt on US government websites (as that article also says)

That is not what the article says.

It says Internet Archive had already started ignoring robots.txt on US government websites.

Now (since 2017) they ignore it on all websites.


I think that's fine. The reason we fix screwups is so the next people who arrive don't see them. We don't fix screwups to hide that sometimes we fuck up. If someone goes out of their way to find old screwups, then so be it. As long as not the majority of people see it, we're mostly fine.

Most people password-protect this; it's very common. If you contract a webdev for something, they will recommend it 100% of the time. Not necessarily the basic-auth thing, just a shared secret. Something trivial.
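Basic auth is one concrete way to do this (a simpler shared secret works too). A minimal nginx sketch, with hypothetical hostname and file paths:

```nginx
# Gate a pre-launch site behind HTTP basic auth so crawlers and
# archivers never see the content. staging.example.com and the
# htpasswd path are placeholders.
server {
    listen 80;
    server_name staging.example.com;

    auth_basic "Pre-launch";
    auth_basic_user_file /etc/nginx/.htpasswd;  # create with `htpasswd -c`

    root /var/www/staging;
}
```

Unlike robots.txt, this actually withholds the content instead of asking bots nicely to look away.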

Side rant: Sure would be nice if the Wayback Machine showed actual snapshots of web pages, instead of "hybrid" snapshots where they combine old with new (maybe it's a setting?). I recently horked a website, and thought to check the Wayback Machine. Curiously, an edit I made that day was showing on snapshots dating back several years. Until I discovered how the WBM worked, I was pulling my hair out.

This is probably due to XHRs. The IA loads all the JS, so if a website hard-codes URLs or does other complicated XHR work, the archive might not be equipped to save those responses, if it saves them at all.

If only we could get the NSA to publish their archive of public data.

There is hope. With the help of then-Senator Al Gore, the CIA made available to researchers photographs it had taken of the polar regions while searching for Soviet nuclear installations. They later became valuable for climate research.

Has Cloudflare clarified their stance on which content they allow and disallow? They’ve wavered in the past and given how their service is basically critical infrastructure for the internet, I really want them to commit to free speech, avoid deplatforming, and avoid exceeding legal minimums.

Next step, have CloudFlare start mirroring IA on their own servers so we have some redundancy in case IA ever goes bankrupt.

Ideally it would be a non-profit that does it, but as a last resort CF is one of the few companies I'd trust to do it right and do it transparently.


Unlike the Wayback Machine, Archive.is, for example, does not censor and remove certain pages over "hate speech" and other more or less political motives.

A shame, really, but I guess compromises are necessary to stay in "business", even though an internet archive should be exactly that: updated but never removed.


Just how big are Internet Archive’s servers? I can’t fathom how they’re able to store so much of the web in so many versions.

The Wayback Machine uses 9.6 petabytes. Total storage is 50 petabytes.

https://archive.org/web/petabox.php


Cloudflare is really neat unless you find yourself mysteriously blacklisted by them as a user.

Then suddenly the web is a much smaller place.


You can use archive.org to bypass the Cloudflare blocklist, especially considering the save page feature.

Perhaps they can start archiving YouTube.

Whatever fills web.archive.org is good!

Though it would be nice if someone invented technology that could prune archived 404 pages and redirects once the original page goes offline. Maybe a job for AI?


No.

Keep historical revisionism out of archive.org.


Strange that the blog doesn't have https redirection.

Encouraging to see this kind of partnership in this day and age. May it never forget why it started and only improve on it.

Interesting to find that, when I checked to see if I was using the feature, I had already agreed to the supplemental terms saying my information will be shared with IA.

(For others who need to opt out, https://support.cloudflare.com/hc/en-us/articles/200168436-U... describes how to disable "Always Online". There doesn't seem to be a way to turn off just the information sharing.)


Who owns Cloudflare? And what are they valued at?

I mean, these deals make them look cool and altruistic, but what happens when BigCompany offers them enough money to sell?


Perhaps we are getting a sugared pill? Perhaps CF are genuinely being useful here, but in order to gain trust to act nefariously in future?

I don't feel comfortable with their ability to switch off parts of the internet, nor in this case, that they have their hands near what is preserved for posterity.

As they say: "Cloudflare has become core infrastructure for the Web, and we are glad we can be helpful in making a more reliable web for everyone." They are indeed powerful.

I'm concerned that they are becoming gatekeepers to information, under the guise of providing a better internet service. They are able to operate at a level deeper than the odious restrictions youtube, facebook et al enforce on free speech.


I'm being downvoted - but we have seen major 'book burnings' on youtube, etc where billions of comments and videos have been purged. These are private platforms and can do what they like, so in a way that's acceptable as it is within their terms of service.

CF is a level deeper than that. This is a company that can effectively shut down the internet for companies and individuals. And now they are involved with archive.org? Should we be concerned about online historical revisionism as that relationship matures?

I feel uncomfortable that CF seems to be positioning itself as a guardian to all information - not at an application level, but at an architectural level.

Cloudflare is shaping up to be a key tool that an authoritarian government requires. And I'm concerned about it!


Could someone please automatically activate this for all content linked from HN? It happens all the time that many of the front-page links are down due to traffic spikes.

From what I can tell, all links submitted are automatically archived.

Cloudflare is not vpn friendly.

I'm a privacy-conscious VPN user, and in my daily browsing I have to deal with Cloudflare captchas dozens of times a day, or in some cases with Cloudflare blocking me entirely.


Is using this Chrome/Firefox addon an option for your use case?

https://support.cloudflare.com/hc/en-us/articles/11500199265...


My heart shuddered when I read the headline. I can’t be alone in the fear.

This clearly isn't to create some utopic 'more reliable Web'. In fact, Cloudflare severely undermines that, by pushing their centralised view of what the internet should be.

I was hopeful, but after reading this:

> “The Internet Archive’s Wayback Machine has an impressive infrastructure that can archive the web at scale,” said Matthew Prince, co-founder and CEO of Cloudflare. “By working together, we can take another step toward making the Internet more resilient by stopping server issues for our customers and in turn from interrupting businesses and users online.”

It's plain to see that this is a money-making venture for Cloudflare. While I do like the added functionality, I personally can't see how this 'improves' the Wayback Machine. It's just going to place more load on it.


Legal | privacy