We tried to use Cloudflare when they teamed with Dreamhost a few months ago. We had more downtime than uptime....
This is super-relevant, though, because during our struggle with Cloudflare we released an article about LOIC and how easy it is to reveal the locations and identities of individuals involved in a DDoS attack using it:
http://www.thepowerbase.com/2012/03/low-orbit-ion-cannon-exp...
Sorry for the trouble you had. Did you submit a support ticket to CloudFlare? Sounds like something was blocking requests from our network.
LOIC and a number of the more public DDoS tools make the attackers' identities relatively easy to track. The big attack we saw last Saturday is much more difficult to trace, both because it originates with a UDP request (the headers of which can be forged) and because it is reflected off open resolvers (essentially laundering the identity of the attack's source).
Honestly, we were confused by the whole partnership. We kept submitting tickets to Dreamhost, which they did a poor job of handling. We liked the idea of Cloudflare and we were mostly willing to stick it out. Not only that, but we had several great ideas about how we might leverage your offering for other things we wanted to accomplish.
The real reason we stopped using Cloudflare is that whenever you were unable to serve the cached page, it threw up a Cloudflare-branded error page. We thought those sorts of error pages would diminish our image as an open source publication, because they seem to suggest that we can't rely on our own tools and abilities.
If we were able to display a "Powerbase" branded 404 type page, we would have been more satisfied.
If you're a paying customer, you can fully customize the error pages. If you want to do something like that, it's probably easiest to sign up directly through us. We can get you set up.
I also had problems when I tried CloudFlare. Different problems, but it cost me a few days contacting users and apologising before I U-turned and switched the nameservers back.
I unfortunately use PayPal and their Instant Payment Notification (IPN) system: basically a callback to a web page with a POST about the transaction that just happened. Upon receiving the POST I can then do things like notify the customer, dispatch goods, award virtual goods, etc.
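In rough outline, the listener looks something like this minimal sketch (Flask; the route, the fulfil_order helper, and the exact verification endpoint are assumptions for illustration rather than my actual code):

```python
# Minimal sketch of a PayPal IPN listener (Flask). The route and fulfil_order()
# are invented for illustration; check PayPal's docs for the exact verification
# endpoint and the requirement to echo the parameters back in their original order.
from flask import Flask, request
import requests

app = Flask(__name__)
PAYPAL_VERIFY_URL = "https://ipnpb.paypal.com/cgi-bin/webscr"  # assumed endpoint

def fulfil_order(form):
    # Notify the customer, dispatch goods, award virtual goods, etc.
    print("verified payment:", form.get("txn_id"))

@app.route("/paypal/ipn", methods=["POST"])
def ipn():
    # PayPal POSTs the transaction details here; we echo them back with
    # cmd=_notify-validate so PayPal can confirm the message is genuine.
    params = dict(request.form)
    params["cmd"] = "_notify-validate"
    verdict = requests.post(PAYPAL_VERIFY_URL, data=params, timeout=10)
    if verdict.text == "VERIFIED":
        fulfil_order(request.form)
    # Respond 200 quickly either way, or PayPal retries and eventually gives up.
    return "", 200

if __name__ == "__main__":
    app.run(port=8080)
```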
The problem I had was that after putting my website (in the London Linode datacenter) behind CloudFlare, PayPal started randomly failing to reach the callback page.
PayPal, being PayPal, failed silently for a few days before finally sending an email to say that they couldn't notify me of transactions. I figured it out just before that though, because users were getting the CloudFlare "site offline" message.
The pages on my site are 100% dynamic, so nothing was cached by the feature that keeps a site online.
My biggest problem with CloudFlare was visibility for debugging: I had no visibility.
If it wasn't for my users letting me know and PayPal emailing me confirming what I thought... I wouldn't have known. Even then it took too long to find out, over a week from when it started to fail silently.
According to Linode there was no downtime in that period, and according to my server logs load was never above average and there was no reason the site should have been unreachable.
Did I submit a ticket? I submitted some questions beforehand and got back answers that were very friendly but not technically detailed. That's also how I found the interface: I couldn't debug using CloudFlare; there was no way to answer the questions "What is happening? Why is it happening?" So no, when I figured it out I wasn't going to stay with CloudFlare, since even if this issue were resolved I would still lack visibility for future problem-solving.
In the end it was costing me goodwill with my users.
I wanted things you didn't provide:
How often did CloudFlare fail to contact my network?
Can I see a chart of such failures over time?
What were the failure error codes and times so that I can cross-reference them to my logs?
Basically: I wanted transparency so I could have confidence in the service, and detail so I could debug failures when they occurred.
I was going to email this, as it's really a "just to let you know". But you have no email in your HN profile; looking through the support emails I had, I see tenderapp.com and can't guess what your email address might be; I pinged you on Twitter but got no response, and it's very late here... so I'm posting it here so you can see it.
If you add ways for developers to debug issues when using CloudFlare, then I may well be tempted back in the future. The fundamental premise is a good one and I really wanted it to work (I paid for the Enterprise level and had every intention of using it). But when silent failures cost real money and customer goodwill, I didn't feel I had a choice but to U-turn very rapidly.
As soon as I was off CloudFlare, PayPal Instant Payment Notification worked again and there hasn't been a single failure since.
I had similar experiences with CloudFlare. I wanted to use them, but they could give me absolutely zero visibility into why they thought my site was down. I could load it in another tab using the raw address while CloudFlare kept serving the "couldn't contact" page.
Support was fairly unhelpful, basically just saying "sometimes this happens, and it usually clears up".
For this to work you can use the direct.[domain].[tld] record (or any record you create with the grey-cloud indicator), which does not get served through CloudFlare. When you use CloudFlare with the orange-cloud indicator, your traffic is proxied through their network; this can mean that when calls look similar, they will be cached. You either have to tweak the settings or, as mentioned, bypass CloudFlare when dealing with callbacks.
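If you're unsure which of your hostnames are actually proxied, one quick check is to compare what they resolve to against CloudFlare's published address ranges. A rough sketch (the hostnames are placeholders; the ips-v4 list is CloudFlare's published plain-text list of CIDRs):

```python
# Rough sketch: is a hostname orange-clouded (proxied) or grey-clouded (direct)?
# Compares the resolved address against CloudFlare's published IPv4 ranges.
import ipaddress
import socket
import urllib.request

def cloudflare_ranges():
    # CloudFlare publishes its ranges as a plain-text list of CIDRs.
    with urllib.request.urlopen("https://www.cloudflare.com/ips-v4") as resp:
        return [ipaddress.ip_network(line.strip())
                for line in resp.read().decode().splitlines() if line.strip()]

def is_proxied(hostname, ranges):
    addr = ipaddress.ip_address(socket.gethostbyname(hostname))
    return any(addr in net for net in ranges)

ranges = cloudflare_ranges()
for host in ("example.com", "direct.example.com"):   # hypothetical hostnames
    print(host, "proxied" if is_proxied(host, ranges) else "direct")
```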
Ah, but then PayPal IPN callbacks would get through, but the users would still get broken pages.
Or are you suggesting bypassing CloudFlare for all dynamic content? In which case, why use them?
I actually did use them just as a CDN for a few more days. I use a second domain (sslcache.se) to proxy inline images in user-generated content that don't originate from an SSL site when the user is viewing my site over SSL. Similar to this https://github.com/atmos/camo , except mine isn't written in node.js.
Even then, users complained of broken images when they were logged in (on SSL).
The very basic thing: CloudFlare fails silently, sometimes. Well, that still happened with very basic content. From CloudFlare's perspective the sslcache.se service was a site of static files, as I prime the content (download it in the background) when a user submits it. By the time a request for an item reaches CloudFlare, the image is already being served from a local file system.
CDN only, and it still errored often enough that end users noticed. And still without any information for me to resolve it.
I do use CloudFlare for a large number of websites (dynamic content) and have not had a similar experience.
If a CDN is your only purpose, it's mostly useful for static files like JS, CSS and HTML. It's not clear from your answer what you're trying to accomplish; there's too little information to go by. If you experience issues with their service, keep pressing them to get it resolved. Any fix you get improves the service for others.
In the worst case you can always look at other services. I am not sure but I believe Amazon CloudFront provides a similar solution.
I wholeheartedly agree. The lack of transparency is CloudFlare's single largest problem from my perspective. I really wish the analytics package offered more than just a few pretty aggregate charts. Insight into failed requests to my backend servers would be a major help in troubleshooting, and with integration into their API it could serve as a great early warning for when things are going wrong at the CDN level.
I'd also love to see the option of periodic uploads of raw traffic logs to an S3 bucket or similar. Something akin to how AWS CloudFront handles raw logs. I believe raw logs are currently available on their enterprise packages, but this seems like basic functionality (that would provide much of the needed transparency) and should be included in all of the paid plans.
The amplification comes from asking the misconfigured resolver about a DNSSEC-signed zone.
Basically, DNSSEC just means you do not need to search for a large zone to request. Given that large zones are not exactly in short supply, and that searching for them is (in the age of IPv4) rather easy, I wonder whether DNSSEC actually has any effect on the issue whatsoever.
DNSSEC-signed responses can be very large. Here's an example of turning a 31 byte request into a 3974 byte response: http://dnscurve.org/amplification.html That's ~128x amplification -- with a 100Mbps connection, roughly 12.8Gbps of responses would be sent to the forged IP source.
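As a quick back-of-the-envelope check of those figures (a tiny sketch, just the arithmetic from the numbers above):

```python
# Back-of-the-envelope amplification math from the figures above.
request_bytes = 31
response_bytes = 3974
amplification = response_bytes / request_bytes          # ~128x
attacker_bandwidth_mbps = 100
reflected_gbps = attacker_bandwidth_mbps * amplification / 1000
print(f"amplification: {amplification:.0f}x")           # ~128x
print(f"reflected traffic: {reflected_gbps:.1f} Gbps")  # ~12.8 Gbps
```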
I feel bad that I'm not currently paying for Cloudflare; I use it on a few sites but they don't have any traffic worth adding the extra fee for. However it's an excellent service and something I recommend often; hopefully I'll have something to make better use of it in the future :)
Who are these irresponsible network operators that allow spoofed source addresses out of their network? The only way to make a reflection attack like this work is to make the responses go back to the victim. For that to happen, it has to look like they generated the request.
Remember smurf? Spoof-ping a broadcast address for a multiplication effect. It's from 1997 or so. 15 years later and we're still living with that kind of problem.
Re-read the blog post. He's speculating the attack used DNS. (Though he has no proof.) In that case, with UDP, spoofed headers are allowed out. Connectionless.
Cloudflare uses anycast DNS: machines in geographically dispersed data centers all sharing the same IP.
If you want to try to make your site DOS-proof (and potentially faster), one way is to move the site to the network edge. Move the data closer to the user. Put a copy on a machine in the data center nearest the user. Do this in data centers around the world. ("CDN") Give all the machines the same IP. ("Anycast") Your users will be accessing a mirrored copy of your site at some regional data center, instead of actually sending requests that go out to the internet. Does Google do this? Akamai? Netflix?
Next time you access a popular website ask yourself "Am I actually accessing the internet? Or am I just downloading a copy of something from a local data center?"
A lot of these services are just marketing. In theory they sound great, but things may be different in practice. And that's why we frequently see comments that things did not work as expected.
I did some CDN experiments downloading pages using Akamai where I accessed content on the "true IP address" (the master copy so to speak) versus the regional IP address they provide through stupid DNS tricks. Guess which one was faster?
It all depends on caching: what is in the cache and what isn't. The same applies to DNS. A DNS caching server (resolver) is only faster than a non-caching DNS server (authoritative) if it's primed with the records you're after. If they are not in the cache, it will not be faster. In fact, it will be slower, because there are more steps to the process.
These strategies are often based on 80/20, power law thinking. If you are not in the 20 percent of content being accessed 80 percent of the time, then you do not see the benefits. If no one in your region has requested a given page, and you're the first, it will be slower to wait for it to be cached at your regional data center than if you just grabbed it from the internet.
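If you want to see the cache effect for yourself, here's a quick sketch (dnspython; the resolver address and hostname are just examples):

```python
# Sketch: time a cold lookup vs. an immediately repeated (cached) lookup.
# Requires dnspython (pip install dnspython). Point it at your local cache
# or any resolver you want to compare.
import time
import dns.resolver

resolver = dns.resolver.Resolver()
resolver.nameservers = ["127.0.0.1"]   # assumption: a caching resolver on localhost

def timed_lookup(name):
    start = time.perf_counter()
    resolver.resolve(name, "A")        # dnspython >= 2.0; older versions use .query()
    return (time.perf_counter() - start) * 1000

name = "www.example.org"               # a name unlikely to be primed already
print(f"cold: {timed_lookup(name):.1f} ms")
print(f"warm: {timed_lookup(name):.1f} ms")
```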
> In that case, with UDP, spoofed headers are allowed. Connectionless.
rachelbythebay is asking which ISPs allow spoofed UDP packets.
The way this attack works is you send a query to an open resolver, using the target's IP address as the "source" IP address in the UDP header instead of your own.
However, ISPs can (and should) block UDP packets where the source IP address is outside the IP-blocks they own. Why don't ISPs do this?
I'm not really sure what the rest of your post has to do with any of this.
So you are saying it is common practice to block outgoing UDP packets based on source IP? I did not know this. Does your ISP do that? Everyone is expected to block ingress with spoofed IPs. But I can't find a BCP for blocking UDP egress based on source IP. Does it exist?
As for the rest the comment, this appears to be an "informational advertising" style marketing piece for Cloudflare so I think it's relevant.
Right. As the top post also points out. There's no way to distinguish incoming UDP traffic as "spoofed".
My question is does anyone filter UDP egress based on source IP? Is there guidance somewhere that tells admins to do this?
Let me put it another way: if it were a workable solution to get admins to do this, to filter outgoing UDP based on source IP, then why are people trying to get network admins to change their DNS server settings as a way to reduce the possibility of DNS-based DDoS? That seems like a far more difficult task, given that there are hundreds of thousands of open resolvers and most admins understand working with firewall rulesets better than DNS configuration.
Sure you can at the carrier level, it's called unicast reverse path forwarding. Any incoming packets from a network with a source IP not being advertised via BGP by that network would be spoofed.
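The check itself amounts to source-address validation: is the packet's source IP inside a prefix that network actually announces? A minimal sketch of that logic, with made-up prefixes:

```python
# Sketch of BCP 38-style source address validation: drop a packet whose source IP
# is not within the prefixes this network actually owns/announces.
# The prefixes and sample addresses are made up for illustration.
import ipaddress

OWNED_PREFIXES = [ipaddress.ip_network(p) for p in ("203.0.113.0/24", "198.51.100.0/24")]

def should_forward(source_ip):
    addr = ipaddress.ip_address(source_ip)
    return any(addr in prefix for prefix in OWNED_PREFIXES)

print(should_forward("203.0.113.7"))   # True: legitimate source
print(should_forward("192.0.2.55"))    # False: spoofed, drop it
```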
One person accessing data from the "origin" is faster than getting it from the "CDN", but the caching lets you leverage bandwidth. Instead of building 100 Gbps of capacity, I can build an origin server with 5 Gbps of capacity and let the CDN do the heavy lifting.
If every user were to go around the CDN, the site would break.
What's good for the server isn't always good for the client.
If you're running a site, CDNs sound great. If you're an internet user, CDNs might not always be so welcome.
But if CDNs didn't work pretty well from the client's perspective, if they didn't generally speed things up for most users, sites would not use them. So they do work pretty well, in general. But there are exceptions.
At a certain point, when you are getting enough traffic to "break" the site, then it makes sense to move the data closer to the edge. A data center or even at the ISP. And that's exactly what we see big sites doing. I'm not so sure every site using a CDN is in that category though. Some might just be trying to save a few bucks on bandwidth. Others might lack the know-how to run a high traffic site. Some sites I see on Akamai just use them for their www subdomain that gets used in links from articles in other high volume sites. Meanwhile the same content is on the host (sans www).
If we move all data from all sites to the edge of the network, then sure, things will be faster, in general.
I'm OK with that. Then maybe the internet gets more use as a communication channel than as a distribution channel for commercial content: a global TV network.
Most large service providers disallow ICMP (which prevents these types of ping attacks); unfortunately, that also messes with path MTU discovery. One potential way of dealing with these botnets forging source IPs is to perform an analysis on the IP TTLs: if a single source is sending you a large amount of traffic, I wouldn't expect to see widely varying IP TTLs, as the majority of packets should take the same number of hops to reach you. If a botnet army, distributed around the globe, is spoofing the source IP, I'd expect to see a wide variance in the IP TTLs and thus know that something was up (and thus blackhole traffic claiming to be from that source IP).
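A sketch of that heuristic (the TTL samples are invented; a real implementation would take them from packet captures at the edge):

```python
# Sketch of the TTL-variance heuristic: packets genuinely from one source should
# arrive with roughly the same IP TTL, so a wide spread suggests spoofing.
# The observed TTL samples below are made up for illustration.
from statistics import pstdev

def looks_spoofed(ttls, max_stdev=3.0):
    return pstdev(ttls) > max_stdev

single_host = [52, 52, 51, 52, 52, 51]        # consistent hop count: plausible
spoofed_swarm = [37, 59, 44, 61, 48, 55, 40]  # wildly varying: many real origins
print(looks_spoofed(single_host))    # False
print(looks_spoofed(spoofed_swarm))  # True
```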
Large service providers do not disallow ICMP at the network level. TCP/IP relies on ICMP for a number of purposes other than PING (ICMP ECHO), and a network would be broken without it.
"We know, for example, that we haven't sent any DNS inquiries out from our network. We can therefore safely filter the responses from DNS resolvers. We can therefore drop the response packets at our routers or, in some cases, even upstream at one of our bandwidth providers. The result is that these types of attacks are relatively easily mitigated."
This seems to be a bit more specific and does indeed give insight on how to mitigate this type of attack.
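In effect the filter reduces to something like this sketch (the packet representation here is invented; in reality this is a router ACL or an upstream provider filter, not application code):

```python
# Sketch of the mitigation described above: if our network never sends DNS
# queries out to the internet, anything arriving from UDP source port 53 can
# only be reflected attack traffic, so drop it at the edge.
def should_drop(packet):
    return packet["proto"] == "udp" and packet["src_port"] == 53

packets = [
    {"proto": "udp", "src_port": 53, "src": "198.51.100.9"},   # DNS "response": drop
    {"proto": "tcp", "src_port": 443, "src": "203.0.113.20"},  # normal web traffic: keep
]
for p in packets:
    print(p["src"], "drop" if should_drop(p) else "keep")
```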
I am surprised that the article did not mention egress filtering alongside closing open resolvers. If more edge routers did proper egress filtering, these attacks would be harder to pull off.
From my short experience using CloudFlare, I believe they do. If a compromised computer tries to visit a site whose DNS routes through CloudFlare, they serve up a page saying the computer might be compromised, and the visitor has to enter a captcha to verify it's human traffic before entering the site.
With a Layer 4 attack, we don't always know for sure which site is the target. Traffic is being directed at an IP address, and many customers may share that IP. We have techniques to scatter customers across IPs and watch the attack follow, but we generally only do that if the attack is causing a problem. In this case, we were able to mitigate it at our edge and upstream to the point that we didn't need to do any shuffling, so from this alone we probably wouldn't have known exactly who the target was. From what happened next, when the attacker shifted to a Layer 7 attack, we were able to determine the target and did reach out to the customer to let them know what was going on and what we were going to do to protect them.
(i love these posts; i'm old + jaded and have no specialist knowledge of networks and protocols, but they're like the spaghetti westerns of the internet age :o)
Another question is how many repeated attacks does it take for an ISP to drop a customer over this. Surely, DDoS doesn't come cheap on the receiving end, even if it can be stopped in a timely manner.
Akamai recently deflected an attack on the scale of 1 TBit/s and is present in pretty much every DC (~1000 POPs). CloudFlare has 23 POPs and brags about handling 65 GBit/s...
There are some public studies on DDoS size over the years. IIRC Arbor Networks published some numbers earlier this year. IME tens of Gb/s is a mid-size attack these days; large is a hundred-plus Gb/s.
I can't find any public reports topping ~100 Gbps. For example, here's a recent article that interviewed an Arbor employee and mentions a ~100 Gbps plateau:
Hrm, maybe I'm misremembering. I found Akamai claiming 125 Gb/s in '09, and some more Arbor quotes of ~110 Gb/s. I'll see if I still have the report I'm thinking of tomorrow.
Akamai got into that many POPs by "giving away free bandwidth-saving appliances" to the ISPs rather than paying to host their commercial edge CDN in their DCs. Full marks to whoever came up with that spin! (This was circa 1999/2000.)
Part of me has to wonder how wise it is to attack somebody such as Cloudflare. I know they are a juicy target, but part of their job is to learn and defend against downtime. If their ops are worth their salt (and it appears they are), they've been logging every bit of information they can about these attacks. Logging allows them to do two things:
1) Learn how to mitigate the attack in the future
2) Catalog data on botnets
Cataloging data on these botnets is one sure way to get them shut down.
Billy Bob's Bike Repair, because I know I can bring their site down. I also know that I won't be able to bring down CF or sites running behind their infrastructure so no g33k cred to be had, etc.
Yup. We've got tons of data on the machines launching these attacks. We're working on a plan to begin publishing it so even sites not on CloudFlare can better protect themselves. Stay tuned.
Because they're doing it intentionally and knowingly and have worked to mitigate the risks (as detailed on their site - https://developers.google.com/speed/public-dns/docs/security) unlike the vast majority of the open resolvers which have done so unintentionally or without understanding.
The real solution is not to use open resolvers and open caches, full stop. Run your own cache on localhost. djb has always advised against third party DNS, but people don't listen. I've even caught the author of the DNS/BIND book admitting it's a smart idea. Moreover you'll be immune from DNS poisoning.
Only if it's not already in your cache or in /etc/hosts
You can put the hosts file on a RAM disk.
With some servers it's also possible to save and reload caches.
Assuming you're not doing hundreds of thousands of new lookups (sites you've never visited before) every day, it's very easy to configure a system for yourself that is faster than any open resolver.
There is one trade-off: if sites switch IPs without telling their users (preferring instead to wait for TTLs in open caches to expire), then for those sites that like to hop from one IP to another unexpectedly you need to monitor for the change. This is rare, though, and you can try to safeguard against it by "pinging" less-often-visited sites periodically, but it does happen occasionally.
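A sketch of what that periodic check might look like (dnspython; the pinned entries and the reference resolver are made up for illustration):

```python
# Sketch: periodically re-check pinned hostnames against a live resolver and
# warn if the current answer has drifted from what we have locally.
# The pinned entries and reference resolver below are invented.
import dns.resolver

PINNED = {"example.net": "192.0.2.10", "example.org": "198.51.100.4"}

def check_drift(reference_resolver="8.8.8.8"):
    resolver = dns.resolver.Resolver()
    resolver.nameservers = [reference_resolver]   # any resolver you trust as a reference
    for name, pinned in PINNED.items():
        live = {rdata.address for rdata in resolver.resolve(name, "A")}
        if pinned not in live:
            print(f"{name}: pinned {pinned} but live answers are {sorted(live)}")

check_drift()
```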
DNS was supposed to replace hand-maintained hosts files. And I don't think running my own DNS is as reliable as using a professionally-configured endpoint like Google or OpenDNS. Plus it's inconvenient. Plus I don't see how it prevents DNS poisoning?
Well, I could prove it to you. Side by side speed test.
As for reliability, if you lose access to your "professionally-configured endpoint", you're SOL: you can't do lookups (assuming you don't know how to do them by hand). Meanwhile, I'm unaffected.
I would not have done this for myself and told you about it if it weren't faster. I'm not gathering info on users or selling anything. I'm not telling you what to do while proclaiming I'm an "expert". I'm just an end user, like you.
When you use an open resolver, you are sharing a cache with everyone else who uses it. Some might do nefarious things to the cache.
When you use a resolver listening on 127.0.0.1 you are sharing it with whoever can access localhost on that port. i.e., no one (hopefully)
"professionally-configured" C'mon. You sound like a marketer's dream. Be a hobbyist. Be a hacker. Experiment. Thinks for yourself. Or don't.
I will grant that a hosts file or a local DNS server is faster than a 3rd-party DNS, assuming I have the site in my cache already. But if I visit a new site, I'm going to have to look it up somewhere, and the DNS server that my local DNS server contacts could have the exact same cache poisoning problem.
Nope. (Granted, this is possible if the domain name registrant designates an open resolver as authoritative, but almost no one does that. The reasons should be obvious.)
How much time have you devoted to learning how DNS works?
Not much. My main question is: how is the DNS server that my DNS server asks for an address, different from the DNS server that my DNS client asks for an address?
The one your client queries can both answer requests and forward them to other servers.[1] The one your caching server queries can only answer queries. That's an oversimplification, because people use some ridiculously convoluted DNS configurations and these features can be mixed and matched in any server (e.g. BIND), but hopefully it answers your question, if I understand it correctly.
1. But you could just as easily only query servers of the second type (ones that only answer queries, often called "authoritative" servers). I have written programs to do this as it is very fast and IME more reliable than using resolvers.
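For illustration, a rough sketch of that kind of iterative lookup (dnspython; not the program mentioned above, just an outline of the idea):

```python
# Rough sketch of iterative resolution with dnspython: start at a root server and
# follow referrals down to the zone's authoritative servers, never touching a
# shared resolver. CNAMEs and IPv6 are ignored; this is only an outline.
import dns.message
import dns.query
import dns.rdatatype

ROOT = "198.41.0.4"  # a.root-servers.net

def resolve_iteratively(name, server=ROOT, depth=0):
    if depth > 15:                      # guard against referral loops
        raise RuntimeError("too many referrals")
    response = dns.query.udp(dns.message.make_query(name, dns.rdatatype.A),
                             server, timeout=3)
    # Done if the answer section contains A records.
    for rrset in response.answer:
        if rrset.rdtype == dns.rdatatype.A:
            return [rdata.address for rdata in rrset]
    # Follow the referral using a glue A record if one was supplied...
    for rrset in response.additional:
        if rrset.rdtype == dns.rdatatype.A:
            return resolve_iteratively(name, rrset[0].address, depth + 1)
    # ...otherwise resolve one of the referral NS names from the root first.
    for rrset in response.authority:
        if rrset.rdtype == dns.rdatatype.NS:
            ns_addr = resolve_iteratively(str(rrset[0].target), depth=depth + 1)[0]
            return resolve_iteratively(name, ns_addr, depth + 1)
    raise RuntimeError("no answer and nothing to follow")

print(resolve_iteratively("www.example.com"))
```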
I tried to send a UDP packet with a fake source IP (no evil, I am not an attacker ;)), but it failed. It seems that the datacenter's router inspects the packets and drops them. Can anyone teach me how to do it?
I may definitely be missing something here, but I find it difficult to believe not a single packet from that attack made it to their network or affected their operations. I understand how the amplifications were mitigated, but how do you distinguish between legitimate and illegitimate traffic and then block just the illegitimate?
I ran an MMO a while ago, and we would have a few hundred login packets spammed every minute. When we were DDoS'd, I responded by moving my server to a larger line (1 Gbps), since the DDoS itself wasn't nearly as massive. Yet we had no way of figuring out (at a base level) which packets were legitimate.
As mentioned in the article, they dropped all packets that looked like responses from DNS resolvers. Client applications hosted behind CloudFlare shouldn't normally be receiving responses from DNS resolvers.
"Yesterday I posted a post mortem on an outage we had Saturday. The outage was caused when we applied an overly aggressive rate limit to traffic on our network while battling a determined DDoS attacker."
Kudos for documenting what you did and what worked.
I'm curious about the observed PPS rate. 65 Gb/s is annoyingly large, but network interfaces generally hit pps limits first.
The bandwidth graphs in this post and the post-mortem entry are quite interesting. A lot of incoming bytes from customer origins. I'd guess the system cache hit ratio is only 60-70% at peak, dropping to maybe 20-30% during the trough. From that I would assume the cache width is quite small, maybe 8-12 hours LRU?
I could be misreading that if the average object size is closer to 5 KB than 50 KB, or if a large number of customers are using it in a proxy-only fashion.
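For what it's worth, the rough arithmetic behind that kind of estimate (the byte rates here are invented, just to show the calculation):

```python
# Rough sketch of the hit-ratio estimate: compare bytes pulled from customer
# origins with bytes served to visitors. The numbers are invented for illustration.
origin_bytes_per_s = 12e9 / 8      # ~12 Gb/s fetched from origins
served_bytes_per_s = 40e9 / 8      # ~40 Gb/s delivered to visitors
hit_ratio = 1 - origin_bytes_per_s / served_bytes_per_s
print(f"estimated cache hit ratio: {hit_ratio:.0%}")   # ~70%
```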