Normally I'd be upset about Cloudflare getting involved in anything good and pure like archive.org but this relationship, just suggesting new URLs to archive, seems harmless enough.
Wow, zero coverage of their attempts to centralize key parts of the Internet under their control and how many people got outraged about that, but plenty on the trivial outrage of CF supplying services to customers they found offensive. I'm kind of tempted to update this section.
Why establish a false dichotomy? One can be just as mad about them abusing their position to attack free speech in a censorious way, as one is about their other efforts to create a monopoly over a large swathe of internet traffic.
I get it, nobody wants to defend badthink, but it's a hell of a lot easier philosophically to defend all political speech regardless of content than it is to try to pick and choose and make some moral case.
Set aside the free speech stuff which involves groups inciting violence - while we can agree that it's bad, deciding what to do will inevitably lead to wildly divergent ideas.
Cloudflare says that it's "free speech" for their customers to claim to be Adobe offering a "Flash Updater", or for their customers to claim to be Bank of America.
That is how evil Cloudflare is - they allow clear and unambiguous crime for pennies.
We can sit here and argue about this all day and we still won't reach an agreement.
On one hand, as a website operator, I want to prevent DDoS, spam, etc. on my website. I can implement these solutions myself and do a bad job, or I can use Cloudflare, which solves most of them. It's probably going to rule out some users like yourself, which is a shame. But until there's a better way to know that a visitor from the internet is not trying to attack the website, I'd have to use something like Cloudflare.
On the other hand, it's not like it's that hard to leave Cloudflare for me - so if there's a better alternative without causing legitimate users pain, I'd be happy to jump on board.
I don't think anyone is arguing that preventing DDoS attacks is undesirable.
It's about cutting off access to a small segment of users just because it is easier that way.
I think that, similar to wheelchair access, we will continue to push for access to all devices and users as much as possible.
This attitude of "it's just 1%" or "it's just 0.1%" will become just as unacceptable as saying "well, there are only 3 people who need access ramps out of 30, so they need to suck it up and deal with it."
Agreed. Just a matter of time, personnel, and offering price
Any reality that becomes more adjacent in possibility-space today is better conceived as the seed of a future you might find yourself living in. Most any protective institution is just a set of measures working against this process -- moving things away from us in possibility-space.
So worrying about this stuff is fair game. The only way I'd not worry, is if there was a complementary protective measure that protected us from the future we don't want. But that wasn't part of the announcement, so we're probably all a little bit less secure in getting the world we want. (EDIT: Though maybe funding archive.org would count as that...?)
I've always thought that people think about this backwards. Every dollar the devil gives you, is one dollar less the devil has to spend on devilry. Especially when that dollar is just charity, and the devil isn't getting anything out of it. But it also applies when the devil is getting something out of it that satisfies their preferences, as long as that satisfaction isn't displacing a need that frees up any of their other dollars to now be dedicated toward devilry.
Or, to put that another way: if you can charge [infamous politician] a million dollars for a fancy-but-useless painting, you absolutely should do so. Now you have a million dollars; they're out a million dollars; and all they have is a painting!
The issue comes when the politician comes back to you and says “Hey, I gave you a great deal on that painting, can you do a favor for me?”
Extracting money from bad sources is good as long as you absolutely positively don’t extract anything else. That’s hard to do in any circumstance. However, in this situation I think it’s fine and worth it.
Exactly right. And I agree pretty strongly. Unironically, this is why I think accepting money from the Saudi Arabian Sovereign Wealth Fund can be a great force for good. Adam Neumann may (again unironically) be a remarkable hero, accidentally.
This only works if you accept money from every devil that passes you by. If the majority of your funding comes from one devil, it doesn't matter how perfectly normal the underlying business transaction is - the moment you get in the way of their devilry, you're out a job.
Mozilla is a good case study in this: they are financially dependent on Google money to continue browser development. Google hasn't actually intervened in their affairs a whole lot. However, they could, which is why they're going through all sorts of self-inflicted harm trying to get away from their business of selling a browser default to a search engine.
Public companies are even worse, because what they are looking for isn't money, it's more money, or "growth". This is why a lot of American media companies suddenly got really quiet about certain kinds of atrocities committed by certain governments. If you call the devil out on concentration camps full of Uighurs, then maybe he doesn't buy your paintings anymore, and then you're out of the painting biz.
You’re talking about being employed by a devil, or maybe receiving continuing patronage from a devil. I’m talking more about having the one-off opportunity to drain a devil’s coffers (whether or not you get the resulting money), without having the ability to turn that into an ongoing relationship.
Basically, this is the other side of the coin to the idea that iterating the Prisoner’s Dilemma gets you the potential for tit-for-tat, and thereby cooperation under expectation of tit-for-tat. In this case, “defecting” against a devil is good — but, just like in the traditional Prisoner’s Dilemma, it’s only practical to defect if the scenario is one-shot.
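The one-shot vs. iterated distinction can be made concrete with a small simulation. This is a minimal sketch, not anything from the thread: it assumes the textbook payoff values (temptation 5, reward 3, punishment 1, sucker 0) and two simple strategies, and shows that defecting beats cooperating in a single round but stops paying once the other side can retaliate with tit-for-tat.

```python
from typing import Callable, List, Tuple

# Assumed textbook payoffs. Key: (my_move, their_move) -> my payoff.
# "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3,  # reward for mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # punishment for mutual defection
}

# A strategy sees the opponent's past moves and returns its next move.
Strategy = Callable[[List[str]], str]


def always_defect(opponent_history: List[str]) -> str:
    return "D"


def tit_for_tat(opponent_history: List[str]) -> str:
    # Cooperate on the first round, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else "C"


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play `rounds` rounds and return the total scores of (a, b)."""
    moves_a: List[str] = []
    moves_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(moves_b)
        move_b = b(moves_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        moves_a.append(move_a)
        moves_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    # One-shot: defecting against a cooperator pays more than cooperating.
    print(play(always_defect, tit_for_tat, 1))   # (5, 0)
    print(play(tit_for_tat, tit_for_tat, 1))     # (3, 3)
    # Iterated: tit-for-tat retaliates, so steady defection loses out.
    print(play(always_defect, tit_for_tat, 10))  # (14, 9)
    print(play(tit_for_tat, tit_for_tat, 10))    # (30, 30)
```

With these assumed payoffs, a single defection earns 5 versus 3 for cooperating, but over 10 rounds against tit-for-tat the defector collects 14 while two tit-for-tat players collect 30 each.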
Just a reminder: Cloudflare with its standard settings is breaking the internet for second- and third-world countries with captchas on websites. This is discrimination, in my opinion. As long as you are only in a first-world country, you will never notice.
They could just block these IPs outright. A massive percentage of attack traffic runs over these networks. Before, site owners would often have to block Tor completely after dealing with enough spam/attacks from exit nodes, and now they can allow actual humans without any sort of complex setup.
I think it's a net win for everyone involved, personally.
Isn't this because of ISP-grade NAT where one bad actor (either maliciously or via running something like hola vpn) can kill a few hundred users' IP reputation?
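To illustrate the mechanism being asked about: if abuse scoring is keyed on the public IP address, everyone behind one carrier-grade NAT address shares a single reputation. The sketch below is hypothetical and is not how Cloudflare actually scores traffic; the threshold, user names and addresses (taken from the reserved documentation ranges) are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical policy: after this many abuse reports, an IP gets a captcha.
CHALLENGE_THRESHOLD = 3


def challenged_users(users_to_ip: Dict[str, str], abuse_reports: Iterable[str]) -> List[str]:
    """Return the users who get challenged when reputation is tracked per public IP.

    users_to_ip maps each user to the public IP they appear from.
    abuse_reports lists the public IP attached to each abuse report.
    """
    reports_per_ip = Counter(abuse_reports)  # keyed by IP, not by user
    flagged = {ip for ip, count in reports_per_ip.items() if count >= CHALLENGE_THRESHOLD}
    # Every user behind a flagged IP is challenged, whoever caused the reports.
    return sorted(user for user, ip in users_to_ip.items() if ip in flagged)


if __name__ == "__main__":
    cgnat_ip = "203.0.113.7"  # one address shared by 300 subscribers
    users = {f"subscriber{i}": cgnat_ip for i in range(300)}
    users["solo_user"] = "198.51.100.20"  # a user with their own address
    reports = [cgnat_ip] * 5  # all caused by a single bad actor
    hit = challenged_users(users, reports)
    print(len(hit), "solo_user" in hit)  # 300 False
```

One noisy client behind the shared address is enough to push every subscriber on it over the (assumed) threshold, while a user on their own address is unaffected.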
Depends on the country/ISP probably. From the Philippines, Firefox with uBlockOrigin and PrivacyBadger hit captcha walls all the time; all that stopped as soon as I moved to Singapore.
'First world country' means 'aligned with Western democracies in the Cold War'
'Second world country' means 'aligned with the USSR in the Cold War'
'Third world country' means 'unaligned in the Cold War'
Are you talking about countries with under-developed internet infrastructure? That includes swaths of the US....
They allow and support known spammers and scammers. They make reporting of spammers and scammers arduous. They play dumb when asked about their barriers and repeat inane responses rather than answering questions.
In other words, they have a history of clearly, unambiguously showing themselves to be unapologetic assholes.
I would assume that when a site goes offline Cloudflare fetches a snapshot from IA only once and then serves this copy to all further visitors, unless I'm missing something?
It's a Cloudflare product providing Archive.org with URL telemetry, so it can archive public sites it has not yet found through traditional means (crawling, or users submitting requests through the Wayback Save page).
The next step would be for Cloudflare to point to Archive.org Wayback links when an origin isn't available (similar to browser extensions that point to Archive.org when sites 404 or are down, but in Cloudflare's core).
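A rough sketch of that fallback idea, done client-side rather than inside Cloudflare's edge: try the origin, and if it fails, ask the Wayback Machine's public Availability API (`https://archive.org/wayback/available?url=...`) for the closest snapshot. `fetch_with_fallback` and `closest_snapshot` are hypothetical helpers for illustration, not part of any Cloudflare or Internet Archive SDK, and error handling is deliberately minimal.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

AVAILABILITY_API = "https://archive.org/wayback/available?url="


def closest_snapshot(availability_json: str) -> Optional[str]:
    """Pull the closest snapshot URL out of a Wayback Availability API response."""
    data = json.loads(availability_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None  # nothing archived for this URL


def fetch_with_fallback(url: str, timeout: float = 5.0) -> str:
    """Return `url` if the origin responds successfully, else the closest archived copy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return url  # origin answered with a non-error status
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts are all OSError subclasses.
        pass
    query = AVAILABILITY_API + urllib.parse.quote(url, safe="")
    with urllib.request.urlopen(query, timeout=timeout) as resp:
        snapshot = closest_snapshot(resp.read().decode("utf-8"))
    if snapshot is None:
        raise LookupError(f"origin unavailable and no snapshot archived for {url}")
    return snapshot
```

Calling `fetch_with_fallback("https://example.com/")` returns the origin URL when the site is up and a `web.archive.org/web/<timestamp>/...` link when it is not, or raises `LookupError` if nothing was ever archived.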
One would hope so. Considering the timing relative to the IA's potentially very expensive legal battle, I fully expect this to be the case. Still, considering CF's anti-privacy/anti-TOR stance, this is a deal with the devil. Guess I should give money directly to the IA. Considering how much value they provide, I'll do this immediately after updating this post.
This is actually a really good symbiotic relationship that should foster the archival of a ton more content. Hoping to see this toggle enabled by default at some point.
I don't like the idea of "we're tacking this onto an existing service lots of people have enabled". CF bit me recently by suddenly taking away proxied DNS wildcards from free zones, as it's now a premium feature (breaking the security promise in the process by changing the wildcard entry to non-proxied). I don't like surprises and opt-out changes in critical infrastructure.
It's one thing to use CF's Always-On service - you're a customer, you know you can remove your data from it. It's another to get the Internet Archive involved, who may or may not remove your data, and may or may not honor robots.txt.
Sending the details seems to be tied to clicking the 'Update' button in the Cloudflare UI, which states that by clicking it you agree. So they might not be sending your PII to a third party until they get your permission. Hopefully any automated updates are not violating customers' wishes. Yes, it is annoying that the features have been tied together for people who choose to have as little interaction with IA as possible.
> As new URLs are added to sites that use that service they are submitted for archiving to the Wayback Machine.
Yes, this would prevent most order-confirmation pages or otherwise private must-be-logged-in pages from being archived, but it will expose presumed-private URLs that are thought to be unique (tracking numbers, files uploaded with unique names, unique/private image urls that are otherwise publicly accessible)
If you've made efforts to your systems to prevent enumeration-attacks, this could partly bypass them.
Was worried about CF getting their claws dug into archive.org, but on reading, this is a decidedly non-evil deal, actually it sounds wonderful. Still, I worry if there might be some unseen long term interest in the archive.
Keep in mind how Cloudflare makes most of their money: They sell a web proxy service with security and performance features including a CDN. Cloudflare's interests are furthered by improving that service in ways that help its customers. Keeping the Web Archive healthily stocked with content is aligned with their long term revenue growth.
T+10 years, I very much expect Cloudflare's core business to have expanded significantly. I remember the time my Googler friend told me they were about to release the one thing they'd absolutely never do; Chrome came out a few weeks later. Now look at Firefox.
You need to pay attention to the silent positioning of these companies to even guess at where they might go, so deals with things like archive.org may have some unseen substance to them that might only become obvious much later
As a business they absolutely are not going to stay in the CDN lane as a primary.
Akamai has $3b in sales and an $18b market cap.
Cloudflare has $348m in sales and a $10.8b market cap.
Akamai is their maximum ceiling if they focus primarily on the CDN segment. Cloudflare is rapidly approaching their valuation ceiling if they stick to CDN as their core (and they'd have to start killing Akamai just to get there; the CDN business is increasingly a slower growth segment in the larger cloud industry).
Companies all around them in the cloud are growing faster, yet few are more important than Cloudflare. Zero question Cloudflare will continue to aggressively branch out, leveraging their critical positioning. In the not-so-distant future CDN will not be the center of their business. CDN is and will remain a springboard for them, a gateway drug, milk at the back of the grocery store.
Great comment. Cloudflare is not a CDN. They are an edge computing platform that happens to offer CDN services. Could Akamai grow into that market faster than Cloudflare can consume it? TBD.
Edge computing is super interesting, and today's CDN providers should be able to provide it given their current infrastructure deployment. It could really bring in the next era of computing and technology once certain networks/providers reach critical mass to provide edge services within 5-10ms to customers.
If jgrahamc is reading this, I'd really like to know if Cloudflare wants to work with telcos.
Imagine a small server in every cell tower, with locally-cached maps/Wikipedia/latest movies.
Some communication couldn't be cached (e.g. real-time video calls), but a lot of broadcast media could be. Of course there are copyright implications, and it might require partnering with Netflix or others.
The quick load times would be great for users, and the reduced load on the backbone would be good for the telecom companies.
If you'd like me to chat to some friends in telcos in New Zealand about this, drop me an email. It's not my job now (I'm in IoT) but I know who to talk to if you'd like to get this kind of thing moving.
Akamai is not their ceiling because Akamai doesn't serve all segments of the market.
I'm fairly critical of Cloudflare for a lot of reasons, but one thing I think they did right was focus on the SMB market with plans that were actually affordable to the average business. They targeted customers that companies like Akamai pretended didn't exist. Even now they have the cheapest plan available, and once they consolidate the market even further they can start raising those prices.
Akamai is their ceiling in CDN because they have a much higher-value segment of the business, representing a drastically larger share of all dollars in the CDN space. Their business is nearly nine times the size of Cloudflare's because their customers are far more lucrative.
If Cloudflare holds onto all of their already considerable number of customers, and then kills Akamai and somehow takes all of Akamai's business, the combination will be only about 12% larger than Akamai already is now. There is your general indie ceiling in action, with all segments combined (and Cloudflare isn't going to monopolize the entire CDN business besides).
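A quick check of the ratios in this argument, using only the figures quoted in the thread (Akamai: $3b sales, $18b market cap; Cloudflare: $348m sales, $10.8b market cap). These are the commenters' numbers, not independently verified; the script just does the division.

```python
# Figures as quoted in this thread, in USD billions (not independently verified).
AKAMAI_SALES, AKAMAI_CAP = 3.0, 18.0
CLOUDFLARE_SALES, CLOUDFLARE_CAP = 0.348, 10.8


def ceiling_math() -> dict:
    return {
        # How many times larger Akamai's sales are than Cloudflare's.
        "akamai_vs_cloudflare": AKAMAI_SALES / CLOUDFLARE_SALES,
        # How much bigger Akamai + Cloudflare would be than Akamai alone.
        "combined_growth": CLOUDFLARE_SALES / AKAMAI_SALES,
        # Market cap divided by sales (price-to-sales multiple).
        "akamai_p_s": AKAMAI_CAP / AKAMAI_SALES,
        "cloudflare_p_s": CLOUDFLARE_CAP / CLOUDFLARE_SALES,
    }


if __name__ == "__main__":
    m = ceiling_math()
    print(f"Akamai is {m['akamai_vs_cloudflare']:.1f}x Cloudflare")  # 8.6x
    print(f"Combined is {m['combined_growth']:.1%} larger than Akamai")  # 11.6%
    print(f"P/S: Akamai {m['akamai_p_s']:.1f}, Cloudflare {m['cloudflare_p_s']:.1f}")  # 6.0, 31.0
```

On these numbers, Akamai's sales are roughly 8.6 times Cloudflare's, Akamai plus Cloudflare would be about 11.6% bigger than Akamai alone, and Cloudflare's price-to-sales multiple (about 31) is roughly five times Akamai's (6), which is one way to read the "valuation ceiling" point made earlier.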
All you need to know to spot the independent CDN ceiling is that Cloudflare + Fastly + Akamai = $3.6 billion in sales (with the understanding that it's a slowly increasing ceiling, as the CDN market is still growing). The ceiling in that space for Cloudflare just can't realistically be much larger than that combined group and that's not much larger than where Akamai is already at. The only way this isn't the case, is if you project Cloudflare knocks off most competitors and takes the market (they can't, Amazon, Microsoft, Google among other giants, are standing in the way of that outcome).
It'll take Cloudflare a small lifetime to get to $3 billion in sales in the CDN space at the rate they're growing (they're adding ~$8m-$10m per quarter in growth, not all of which is CDN, so maybe it'll only take a few decades with some compounding). It took Akamai 22 years to get there with very high-value customers and a pretty nice open field for many of those years.
Akamai in absolute dollar terms is growing faster than Cloudflare + Fastly combined. The CDN ceiling is actually running away from Cloudflare at present. That shouldn't be happening.
Cloudflare knows full well CDN isn't their brightest business future. It's why so much of their expansion effort is going into everything else. Given the way they price-structured their CDN from day one, Cloudflare has always known CDN was a lure and the upside was in sprawling outward from it. Come for the CDN, stay for the workers or whatever preferably higher margin thing we can sell you on. It's also why they're not interested in / worried about trying to make money on domain registrations, as with SSL before that. They'll happily murder the margins in foreign services all day long (areas where they don't compete, but there is margin to wipe out cost effectively, and with customers to lure in), so long as they can occasionally launch a new service where they have a distinct advantage and can convert their base to use it and increase total revenue per customer in the process.
What would be a better path: for Cloudflare to try to take a big part of Akamai's CDN business by aggressively climbing the ladder from an unassailable price-value position Akamai doesn't want to go down to, like ARM eating Intel from the feet upward; or to leave the snoring giant alone to keep snoozing in his enterprise tower while Cloudflare busies itself sprawling out in many directions, leveraging the volume of customers that Akamai doesn't want to (and/or can't) go after because they're not viewed as lucrative enough? I think what Cloudflare can find outside of the CDN business is likely to be more valuable than what's inside it, very long-term speaking.
And if you're Akamai and you let Cloudflare get far enough along with that sprawling (likely already too late), what if they knock your CDN legs out from under you? Cloudflare builds out many other legs to stand on, then flips the switch on the margin and kills the CDN market for the independents, as they were willing to do with domains and SSL. Free CDN, all tiers, all features. They can't do that today; they might be able to do it tomorrow. The CDN market becomes the SSL market, and as a totally free lure it accelerates a rush into Cloudflare's other, more exclusive services (including for larger, lucrative enterprise customers). Surely this switch has been pondered inside Cloudflare and road-mapped as a possibility.
> As a business they absolutely are not going to stay in the CDN lane as a primary.
Yeah, and all the big five Cloud vendors: AWS, Azure, GCP, IBM, Oracle all have their own CDN solutions bundled. Hard to make a case to purchase separate CDN solutions.
I'm not sure about all the providers but Amazon's CloudFront CDN product has additional costs, so it's "bundled" but not in the sense that it's free, only that it's integrated.
And one of Cloudflare's selling points, imo, is multi-cloud customers. Use AWS all the way but Cloudflare as your CDN, and you could switch to GCP seamlessly. Or route traffic based on pricing, etc. I think you're right that they will absolutely branch out from CDN (or already have), but I think their CDN product is actually compelling, especially to bigger companies that are more afraid of Amazon than they are of Cloudflare.
(Other interesting point - it's worth noting that IBM's CDN is essentially white labeled Cloudflare).
If your root CA is subject to the laws of a government that can take the root certificates and use them to MITM the connection, that's not much better. Cloudflare just makes it easier.
Certificate Transparency makes this significantly harder to do stealthily. I’m not convinced that Cloudflare is a deep state operation either, but Cloudflare's ability to secretly MITM is a position afforded to a select few, and certainly not every CA.
When users are used to this (getting redirected to an archived copy when the site is down or unavailable) and when this trial balloon has proved to work, Cloudflare will replace archive.org with their own infrastructure. This is the common game plan.
Uh, no. We're literally doing the opposite. We used to have our own caching infrastructure for "Always Online" and we're getting rid of it and using archive.org instead.
archive.org doesn't handle robots.txt in any meaningful way (see my comment above at https://news.ycombinator.com/item?id=24516875 ). If that's changed recently, I'd like to know more.
How does what you do now contradict what you will do in the future? What legal assurances are there that you won't do that when you leave? (See the Facebook/Oculus "no Facebook account" promise.)
Wait... so you think Cloudflare's master plan is to roll this new thing out to get people to accept it as normal, and then suddenly make a big shift to.... what they currently have?
Why don't they skip this step and just keep what they have now, then? No one seems to be up in arms that they currently provide their customers offline caching...
And thank god for it. Trying to explain to end users why their site was not, in fact, always online on account of the creaking behemoth that plodded along in IAD barely managing to successfully cache and serve anything ever was never any fun.
The original Always Online infra was long unloved and probably kept on life support far too long for lack of will to deprecate an early feature.
Thanks, so maybe this page is outdated where it mentions your own crawler with user-agent? Or does the Internet Archive use it for these crawls?
https://www.cloudflare.com/always-online/
Not long ago, CF was blocking access from Tor. And they sometimes block access from my web crawler. I don't like CF, as they act as police or a gatekeeper to the origin website, deciding whom to penalize and whom not to, while pretending to be speeding up websites and protecting from 'threats'.
They’re acting more as a security guard. Which is to say that they’re intentionally employed by the owner of the property you’re trying to enter. Often specifically to “bounce” users like you, malicious or not. Believe it or not there are legitimate reasons for wanting only real human users on your website!
What worries me is that Cloudflare is deanonymizing a huge load of TOR users, and the issue that comes with it is that a huge part of TOR users actually needs access to the web archive due to country-wide DNS censorship (European countries included).
As Cloudflare is deanonymizing TOR users pretty much with every website that's hosted on it, I fear they are abusing that power once again to deanonymize users of the web archive.
Cloudflare always claims it's not their issue and that it's a webmaster setting with the shitty captchas and Google's infamous Prism-sponsored PREFS cookie - but to be honest they should just not have implemented it in the first place if privacy was a core value of their company.
The "DDoS" protection basically fingerprints a machine and user inside an encrypted HTTPS connection; which makes the encryption tunnel itself obsolete.
It always has. If your site is publicly available and you don't disallow bots through robots.txt, they can crawl it at any time. Even if the site is "pre-launch", because that doesn't mean anything on its own.
And of course, remember that robots.txt is only a signal to benevolent bots which respect it. If you have secrets to keep, don't put them online in the first place.
Looking at my Splunk logs and then asking a lot of questions, I have learned that there are a LOT of not so benevolent bots that must be tolerated anyway.
In fact robots.txt is a list of things a nefarious crawler will absolutely want to examine - no need to know those paths when they're all laid out for you!
I'd just add that, while major players like the Internet Archive do respect robots.txt, it's essentially just a flag that depends on people voluntarily respecting it. If a site is publicly available but you don't want people to find it, you're just depending on security through obscurity.
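To make the "voluntary flag" point concrete: a polite crawler has to choose to parse robots.txt and check it before each request; nothing enforces it. The sketch below uses Python's standard `urllib.robotparser` on a made-up robots.txt (the paths and the `ExampleBot` user agent are hypothetical). It also shows why the Disallow lines double as a map for anyone less polite.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a soft-launched site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
Disallow: /admin/
"""


def polite_can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """What a well-behaved crawler checks before fetching; compliance is voluntary."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def disallowed_paths(robots_txt: str) -> List[str]:
    """The same file, read as a list of paths someone didn't want crawled."""
    return [
        line.split(":", 1)[1].strip()
        for line in robots_txt.splitlines()
        if line.lower().startswith("disallow:")
    ]


if __name__ == "__main__":
    print(polite_can_fetch(ROBOTS_TXT, "ExampleBot", "https://example.com/staging/index.html"))  # False
    print(polite_can_fetch(ROBOTS_TXT, "ExampleBot", "https://example.com/about"))  # True
    print(disallowed_paths(ROBOTS_TXT))  # ['/staging/', '/admin/']
```

A crawler that never calls `can_fetch` simply requests `/staging/` anyway, and `disallowed_paths` shows exactly where to look.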
Yeah, I definitely expect this to bite some people, if I'm understanding correctly. A plausible scenario (among many) would be: soft launch a site, show it to some early stakeholders, have Wayback archive everything via Always Online, fix embarrassing screwups or oversharing in soft-launched version, publicize site more broadly, everyone in the world can rewind to version zero, regrets. I don't think the existing warnings really make clear that a soft launch is now a forever launch.
The solution to this is... robots.txt. Otherwise your site might turn up in Google etc. Since it's archive.org that's doing the crawling and they respect robots.txt it won't get archived.
Archive.org does not respect robots.txt IIRC. I’ve run into this problem before with them. Ironically, I ended up blocking Internet Archive’s ASN using Cloudflare.
I think that's fine. The reason we fix screwups is so the next people who arrive don't see them. We don't fix screwups to hide that sometimes we fuck up. If someone goes out of their way to find old screwups, then so be it. As long as most people don't see them, we're mostly fine.
Most people password-protect this. It's very common. If you contract a webdev for something, he will recommend it for you 100%. Not the basic auth thing, just a shared secret. Something trivial.
Side rant: Sure would be nice if the Wayback Machine showed actual snapshots of web pages, instead of "hybrid" snapshots where they combine old with new (maybe it's a setting?). I recently horked a website, and thought to check the Wayback Machine. Curiously, an edit I made that day was showing on snapshots dating back several years. Until I discovered how the WBM worked, I was pulling my hair out.
This is probably due to XHRs. The IA loads all JS, so if a website hard-codes the URL or does other complicated XHR stuff, the IA might not be equipped to save the responses for those, if at all.
There is hope. With the help of then-senator Al Gore, the CIA made available to researchers photographs it had taken of the polar regions to search for Soviet nuclear installations. They later became valuable for climate research.
Has Cloudflare clarified their stance on which content they allow and disallow? They’ve wavered in the past and given how their service is basically critical infrastructure for the internet, I really want them to commit to free speech, avoid deplatforming, and avoid exceeding legal minimums.
Unlike the Wayback Machine, Archive.is for example does not censor & remove certain pages because of "hate-speech" and other motives that are more or less political.
Shame, really, but I guess compromises are necessary to stay in "business", even though an internet archive should be exactly that: updated but never removed.
Though it would be nice if someone invented a technology that could also erase all the 404 pages and redirects that get archived as soon as a page goes offline. Maybe a job for AI?
Interesting to find that, when I checked to see if I was using the feature, I had already agreed to the supplemental terms saying my information will be shared with IA.
Perhaps we are getting a sugared pill? Perhaps CF are genuinely being useful here, but in order to gain trust to act nefariously in future?
I don't feel comfortable with their ability to switch off parts of the internet, nor in this case, that they have their hands near what is preserved for posterity.
As they say: "Cloudflare has become core infrastructure for the Web, and we are glad we can be helpful in making a more reliable web for everyone." They are indeed powerful.
I'm concerned that they are becoming gatekeepers to information, under the guise of providing a better internet service. They are able to operate at a level deeper than the odious restrictions youtube, facebook et al enforce on free speech.
I'm being downvoted - but we have seen major 'book burnings' on youtube, etc where billions of comments and videos have been purged. These are private platforms and can do what they like, so in a way that's acceptable as it is within their terms of service.
CF is a level deeper than that. This is a company that can effectively shut down the internet for companies and individuals. And now they are involved with archive.org? Should we be concerned about online historical revisionism as that relationship matures?
I feel uncomfortable that CF seems to be positioning itself as a guardian to all information - not at an application level, but at an architectural level.
Cloudflare is shaping up to be a key tool that an authoritarian government requires. And I'm concerned about it!
Could someone please automatically activate this for all content linked from HN? It happens all the time that many of the first page links are down due to traffic spike.
I'm a privacy-concerned VPN user, and in my daily browsing I have to deal with Cloudflare captchas dozens of times a day, or in some cases with Cloudflare blocking me entirely.
This clearly isn't to create some utopic 'more reliable Web'. In fact, Cloudflare severely undermines that, by pushing their centralised view of what the internet should be.
I was hopeful, but after reading this:
> “The Internet Archive’s Wayback Machine has an impressive infrastructure that can archive the web at scale,” said Matthew Prince, co-founder and CEO of Cloudflare. “By working together, we can take another step toward making the Internet more resilient by stopping server issues for our customers and in turn from interrupting businesses and users online.”
It's plain to see that this is a money-making venture for Cloudflare.
While I do like the added functionality, I personally can't see how this 'improves' the Wayback Machine. It's just going to place more load on it.