Why we've doubled down on AWS and the cloud (blog.codeship.io)
53 points by lfittl | 2013-09-24 16:09:38+00:00 | 41 comments




Great post! Just discovered another nice piece about CI on their blog > http://blog.codeship.io/2013/09/12/the-codeship-workflow-par...

Seems damn legit!


Thanks! Going immutable with all of our infrastructure made our system far more stable and our team more productive.

Hosting your own servers doesn't require foregoing virtualization and containers. All the tools listed in the article that aid deployment can be used on your own hardware, and should be! Also, if you suddenly need 10 or 100 new machines, EC2 is still there - you've not sworn it off forever.

Using the cloud, your own servers, or both should be a deliberate decision based on cost and your real business needs. Not thinking about this seriously is doing yourself a disservice.


This indeed. A smart company will do a cost-benefit analysis, not blindly go to the cloud or 'double down'. There are great benefits to doing things (maybe even your entire environment) on 'The Cloud', but sometimes physical hardware (rented or purchased) will be a better investment in the long run. Also, hybrid cloud is big at the moment (physical servers for day-to-day traffic, with the cloud used for dev tasks or for expansion/failover).

>A smart company will do a cost-benefit analysis, not blindly go to the cloud

But oftentimes those cost-benefit analyses don't take into account how quickly you can improve and iterate on your infrastructure. The performance or cost improvements need to be substantial to justify slowing your team down even a little.

This can hit startups especially hard.


Not sure that this would really change the analysis that much. If you were buying physical servers and installing them yourself then sure, this could very well (perhaps even will) cause a slowdown. But almost everyone who brings up using physical hardware for startups on here discusses leasing managed dedicated servers instead. You can just as easily spin instances up and down on those as needed, and many of the solution providers out there will even have OpenStack/VMware/etc. pre-installed, so you don't have to do any of that legwork.

I think engineering for 'The Cloud' can be just as time-consuming, if not more so, for startups. On Amazon, at the very least, you have to engineer for failure. With a dedicated server configuration you still need to engineer for that, but you aren't worrying about whether the provider's block storage is going to take parts of your architecture offline when it goes down (yet again), or whether the portal will let you bring instances up or down during a very critical period of time, etc.


It's a risk shift.

Generally speaking, most sysadmins at small places do a horrible job engineering for failure. Or, worse, they don't have professional SAs, just devs who know enough to be dangerous.

So while you don't run into a US-East EBS failure every now and again that affects millions of servers, you are subject to random unforeseen failures because somebody didn't do X correctly. (Where X could include: disk configuration, HA configuration, DR plan, backup/restore plan, OS management, firewall management, etc.)

There's no one right answer here. I've rolled specific hardware solutions managed by a single team for certain applications, and everything was fine. I've been stuck working within the bounds of a managed services provider, and things worked too. Cloud is just another model.

EC2 lets you roll a globally distributed solution with good tooling for low cost.


>EC2 lets you roll a globally distributed solution with good tooling for low cost

This is the crux. There are other options available, but going cloud makes it very easy to roll out something big without needing experts in all the lower-level parts of your infrastructure. It lets you focus, and the price for that is absolutely reasonable.


I fully agree on the SA stuff. Sadly, more often than not it's some dev who knows enough to be really dangerous. I make a good amount of money coming in and mopping up after those kinds of scenarios (sadly... I'd rather make less money up front by helping in the planning phase).

I will say, however, that while yes, it's a risk shift, plenty of the risks you shift to the cloud will still go unaccounted for. For instance, backups. Just because your servers are now on 'the cloud' does not mean they're immune from losing all of their data if the instance dies, especially on EC2. On EC2 it's generally just a 'whoops, bye!' scenario. At least with Rackspace Cloud it might be a 'whoops, restore from a VM image snapshot from last week'.

I think we've reached a point where most small companies can and should blindly use the cloud for everything that can be done remotely. There are only two situations where the cloud shouldn't be used:

1. If you're using tons of servers, you might benefit from hosting your own hardware on a large scale. But even then you can probably negotiate with the cloud service for a better price (like Netflix did). If you're a big customer and you think you should move off the cloud, the provider can beat your expected costs and still make a profit. Your team doesn't have thousands of years of combined hardware experience, and you're not buying hardware at a better price than Amazon got.

2. You don't want the NSA snooping on your company's data. This is a moot point because they will obtain the data anyway, with a gag order, if they really need to.


"If they really need to" and by default is a pretty big gap imo.

They don't access your servers "by default" on the cloud, because they have to go through Amazon, still spend effort, and it probably isn't as legal as obtaining things through a court order. But I think the chance is high enough that it's still worth considering hosting your servers locally.

Why should a small company use the cloud for everything instead of a dedicated server or VPS?

Most small companies aren't going to benefit from the elasticity the cloud provides. The administrative overhead will be similar either way, except that with the cloud you also need to worry about data persistence, whether via EBS, replication, or accepting some level of loss.


I would think remote dedicated servers and VPS services fall under the definition of "cloud"?

The question is whether or not you're paying up-front for hardware, and managing it locally somewhere. If you lease a $10/month VPS, you're using cloud services.


I absolutely agree, but oftentimes teams make small cost optimisations without thinking about how they impact the speed with which they can build their business.

The cloud is not the definitive answer for every team, but giving up all the power cloud computing provides in exchange for slightly cheaper hosting at the beginning of a project seems wrong to me.


>Hosting your own servers doesn't require foregoing virtualization and containers.

This is a critical point. In every recent hardware project where we've deployed big, beefy servers to colocation, the first step was deciding on the virtualization platform and the automation tools to manage it, since flexibility, scaling, and reliability are the principal concerns now. I have to hope that no one thinks in terms of "the physical box that is the database server" any more.

It's also worth noting that on a couple of medium-range Dell servers you can literally spin up and down hundreds of virtual machines, obviously depending on size and scale.

Even on a recent low end OVH server I requisitioned, the first thing I did was configure libvirt. The base image becomes almost irrelevant.


But even in that case there is a single point of failure in the host. That's all manageable, but it's effort that could otherwise be put into developing the application.

I would never advocate that someone build a production stack with a single physical server -- I've generally used three servers in concert in such situations.

But then you're probably getting close in price to running the stack on AWS, since you don't need to keep as much reserve infrastructure when you can easily replace a server whenever there's a problem with it.

If you need more than a few VMs on AWS, then physical pricing is going to be far cheaper, even with a three-server stack. Take a basic PowerEdge R620 as an example. For about $10k/server (and I'm pricing purchased hardware as a kind of worst-case scenario), you can get 384GB of memory per server; times three, that's over 1TB of memory in your cluster.

Given the EC2 Small instance type at 1.7GB/VM, that's over 600 VMs you could run on that kind of stack. Pricing for a half cabinet with power and bandwidth (you can find places with unmetered gigabit connectivity) will run you no more than about $2500/month (much less if you are good at negotiating). So you have an outlay of around $40k for total hardware costs and a recurring monthly of $2500. Amortizing the hardware over an expected lifespan of three years (worst-case scenario), you can estimate per-month costs somewhere in the range of $3600/month.

With EC2, reaching parity with those 600 'Small' instances would mean an initial deposit of $57,600 (for a three-year term) and a recurring monthly of $11,826, for a total 'recurring' of $13,426/month once the deposit is amortized.
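For reference, here's that amortization as a quick back-of-the-envelope sketch, using the figures above (they're rough 2013-era numbers from this comment, not authoritative pricing):

  # Back-of-the-envelope check of the amortization above, using the
  # figures quoted in this comment (2013-era numbers, not real pricing).
  MONTHS = 36                    # three-year lifespan / reservation term

  # Colo: ~$40k total hardware outlay, $2,500/month for the half cabinet
  HARDWARE_OUTLAY = 40_000
  COLO_MONTHLY = 2_500
  colo_per_month = HARDWARE_OUTLAY / MONTHS + COLO_MONTHLY
  print(f"Colo, amortized: ~${colo_per_month:,.0f}/month")   # ~$3,611/month

  # EC2: 600 reserved Small instances, $57,600 up front + $11,826/month
  EC2_UPFRONT = 57_600
  EC2_MONTHLY = 11_826
  ec2_per_month = EC2_UPFRONT / MONTHS + EC2_MONTHLY
  print(f"EC2, amortized:  ~${ec2_per_month:,.0f}/month")    # ~$13,426/month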

Now, I should point out that these are obviously not apples-to-apples if you have huge CPU requirements on those 600 instances, or need lots of disk. Your up-front hardware costs increase (perhaps substantially) once those come into play. I'm mostly just trying to compare a bare instance scenario. As per one of my comments elsewhere in the thread, it really all comes down to workloads and doing a real business analysis.


Probably not - you can stick quite a few virtual servers on a $6,000 physical server and not have to worry about usage costs or noisy neighbors.

But that's already an investment that would get you quite a lot of resources on AWS for a while.

That doesn't quite work for us. All our front-end, service, and infrastructure nodes are virtual, but the databases are all dedicated.

This is because no virtualization tech likes handling large nodes (FCAL, 48 cores, 256GB RAM). We lost 20% throughput by sticking our SQL Server primary and replicas on ESX versus physical when running our trials. That's a big hit when you run 150 million queries a day.


HN user dergachev mentioned this in yesterday's discussion about moving away from AWS ( https://news.ycombinator.com/item?id=6431126 )

  > OVH actually supports running the Proxmox* virtualization distro on their servers
As noted in the response there, limiting this to one physical server does introduce a single point of failure.

* http://www.proxmox.com/


Indeed. You should never, ever just spin up a single physical box, put virtualization on it and go to work. Doing this is not only not engineering for failure, but is setting the company up for failure.

The ending of the article kind of left me scratching my head:

> AWS is expensive when you compare only the raw server costs and do not use reserved instances. But that premium lets you focus on building your product far more than going with a self managed machine. Especially for early stage startups or projects this makes all the difference in the world.

Well, which one is it? From the beginning, the article talks about tearing down VMs on a daily basis. This is kind of a can't-have-your-cake-and-eat-it scenario. If you want reserved instances you, well, have to reserve them. Once you've done that (unless I've missed something when looking into reserved instances on EC2 before), you can't really tear down and re-create an instance just like that without paying the up-front cost for the instance again.

That really takes away a lot of the power, even though you're saving money in the long run. At that point, why not just go with a managed dedicated server provider? You get a whole lot of hardware these days for not a whole lot of money (people have talked endlessly about the different solution providers out there, so I won't repeat any).


Reserved instances are not bound to any specific VM. When you start a virtual machine and there are reserved instance slots free, it will charge you the reserved instance rate. If you provision more machines than you have reservations, it will charge the standard amount.

Well, what do you mean by "it will charge the reserved instance amount"? Do you mean just the hourly reserved cost, or the up-front payment that varies by instance period and instance type? If the former, then I stand corrected!

It will only charge the hourly reserved costs.

From: http://aws.amazon.com/ec2/reserved-instances/

Easy to Use: Reserved Instances are easy to use and require no change to how you use EC2. When computing your bill, our system will automatically apply Reserved Instance rates first to minimize your costs. An instance hour will only be charged at the On-Demand rate when your total quantity of instances running that hour exceeds the number of applicable Reserved Instances you own.
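A minimal sketch of that rate application, going only by the documentation quoted above (the hourly rates below are made-up placeholders, not real AWS pricing):

  # Sketch of how reserved-instance rates apply to a fleet, per the quote
  # above. The hourly rates are placeholders, not real AWS pricing.
  RESERVED_RATE = 0.023    # $/hour, hypothetical reserved rate
  ON_DEMAND_RATE = 0.060   # $/hour, hypothetical on-demand rate

  def hourly_charge(running: int, reserved_slots: int) -> float:
      # Reserved rates are applied first; anything beyond the reservation
      # is billed at the on-demand rate for that hour.
      covered = min(running, reserved_slots)
      return covered * RESERVED_RATE + (running - covered) * ON_DEMAND_RATE

  # Instances can be torn down and recreated freely: as long as no more
  # than `reserved_slots` are running in a given hour, only the reserved
  # hourly rate applies, with no new up-front payment.
  print(hourly_charge(running=8, reserved_slots=10))    # fully covered
  print(hourly_charge(running=12, reserved_slots=10))   # 2 billed on-demand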


Then I stand corrected!

"AWS is expensive when you compare only the raw server costs and do not use reserved instances. But that premium lets you focus on building your product far more than going with a self managed machine."

Sometimes it's faster and more cost-effective to use AWS. Other times -- such as when you spend a lot of time and/or money making up for the performance limits of the platform -- you lose. It all depends on what you need, and there are clear engineering trade-offs at play.

The problem comes when you don't make rational choices. If you're writing blog posts about how you're "all in" on one platform or another, you're either linkbaiting (this seems most likely to me), or you're not being rational about your choices.

Let's be honest: it isn't that hard to manage a rack of servers. Any competent engineer should be able to do the basic sysadmin tasks necessary, and aside from the setup costs, the amortized maintenance costs should be on the order of a few hours a month. Certainly, the wildly overblown claims of entire teams spending months to set up a few colocated servers...well, those teams are either completely incompetent, or there's some exaggeration going on.

Don't be afraid of colocating. Just be rational about what you need.


We don't use AWS/etc; we have 3 colo spaces in various locations around the country. We've never visited two of them; a small fee gets your equipment racked and connected, and remote hands does the rest as needed. The local one we visit maybe 3 times a year, usually to install new equipment.

The overhead of dealing with hardware, especially for applications that don't have massive exponential growth, is really not as big of a deal as it sounds.


Indeed.

We have 6x 42U racks across three datacentres in the UK, packed with network hardware, servers, SAN, and other kit.

Stuff doesn't really go wrong that often. Our ops guys probably go there once a quarter. Most of the time you can manage it remotely. The main failure point appears to be disks, but you can just mail them to the DC and have them stick them in for a small fee.


That Cloud -> Butt Chrome extension is doing a great job

I wanted to throw my experience into the ring because there seems to be such a fear of colocation. We knew nothing about colocation and decided to build some Supermicro servers ourselves and install them, along with a switch, in a colo 4 years ago. I read all the stories ("I had to get up in the middle of the night to drive to the colo, it was the worst move ever"), and they are total bull. Even the biggest noob can set things up so it's totally remote.

Servers have a dedicated IPMI port (remote console over Ethernet) that makes it as if you're sitting at the server. You can even mount a CD/image from your laptop, remotely, so the hardware thinks that CD is in the machine. Hell, I can reinstall the BIOS on the server remotely, the OS, everything. Why on EARTH would you have to drive to your colo? You can get servers with 4 Ethernet ports that you can bond in pairs to different switches. You can have hardware RAID, so losing 2 drives in a server is no big deal, and you can take care of it later. We have drives fail sometimes, but things keep on ticking.

With the costs you save you can have triple redundancy if you like, plus the benefit of consistent latency and better performance, always. We have 250TB of storage with double redundancy and a remote backup. It cost ONCE what we would have to pay for a few months on the cheapest storage service.

We run straight KVM virtualization on our own hosts for flexibility. We run DBs on bare metal. I hear all these stories of people's VMs "crashing" all the time, but I can tell you we have only had 1 instance of a VM, or in this case a host, dying in 4 years. It happened to be one of the video conversion hosts that is pinned 24/7, and it turned out it just hit an unrecoverable memory hardware error. No big deal, there were others.

Flexibility? We can clone and spin up VMs at will. We can live migrate and upgrade hosts. We can automate things with libvirt to our hearts' desire.

Costs? $1500/month for direct Equinix colo (includes power, a full rack, and a gigabit connection from a tier-1 provider). Never had a power issue, never had a network issue. We also use a CDN for the static stuff, and that's extra. We started with 3 servers, we're now at a dozen, and adding a new one does not add a new monthly expense.

You can have an E3-1240 V2 @ 3.40GHz server built for $1500, and that one host can run most of our front-end stack. Sure, we have 6 of those for backend crap and redundancy, but we actually run most of our stack on 1 of them. Mostly we do that for shits and giggles, but also because the interaction between the www, redis, mcd, and zeromq layers is a few ms faster when it doesn't go over the physical network. So if you over-optimize like us, and want 30ms page-generation times, you can nerd out like that.

  s6      total    CPU: 8   MEM: 32080MB
          running  CPU: 16  MEM: 16384MB
  r-fp1   running  CPU: 2   MEM: 1024MB
  r-mcd2  running  CPU: 2   MEM: 1024MB
  r-www2  running  CPU: 2   MEM: 4096MB
  r-www3  running  CPU: 2   MEM: 4096MB
  r-red1  running  CPU: 2   MEM: 2048MB
  r-red2  running  CPU: 2   MEM: 2048MB
  r-zmq   running  CPU: 2   MEM: 1024MB
  r-zmq2  running  CPU: 2   MEM: 1024MB

Front-end proxy, www front ends, redis, zeromq, memcached, etc., excluding the MySQL DB, which is on bare metal. This serves our site, which handles about 200 page views per second on a peak day, and that's at 25% host utilization. Our pages generate (no caching), including redis, zmq, and maybe 25 MySQL calls per page, in about 30ms. You can optimize other things too. Like... you know that the default config on a server will kick the CPU down to 1.6GHz if it's not really loaded, and in our case that would make page-gen times 15ms slower. Hell, we don't have to try to save power, so we can kick that sucker to 3.4GHz all the time and make sure users get the benefit of it. Nice to be in control of the host.
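For what it's worth, pinning the clocks like that usually comes down to the cpufreq governor; a small sketch, assuming a standard Linux sysfs layout and root privileges (paths and driver behaviour vary by distro):

  # Force every CPU onto the "performance" cpufreq governor so the clock
  # isn't scaled down under light load. Assumes the usual Linux sysfs
  # layout and root; intel_pstate/acpi-cpufreq details differ by distro.
  import glob

  for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"):
      with open(path, "w") as f:
          f.write("performance")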

We never needed remote hands or anything like that, but that is available a phone call away. I visit the colo in San Jose once a year, and I schedule it with my motorcycle trip down there. Sometimes I just dust the servers off, pet them a little, and look at the pretty lights.

Of course EC2 has its uses. If your HTML traffic spikes higher than 1 Gbps, it's nice to have the flexibility of a fatter, distributed pipe. If you want to optimize for RTT, it's nice to be able to spin up in a different geographical area.

I think what bugs me the most is that a lot of companies use the argument that if you get a traffic spike, like being slashdotted or hitting the front page of Hacker News, you can easily spin up 100 front ends and handle it. We've been at the top of Hacker News, and the change in traffic was in the noise floor compared to the 200 r/s we normally handle. The point I'm trying to make is that if you engineer your app better, and understand and fix the issues that make your pages slow to generate, you won't need the fancy scale-to-100-front-ends bullshit. (Tip: it's probably your database queries anyway, so optimize those; it's not the print/echo statement that outputs HTML on the front end.) Of course, some apps do require web scale, and then EC2, with all the extra costs and engineering, is a good way to go, but it seems that every Joe Blow and his blog or app thinks they need to spin up 100 front ends.

Sorry for the rant. I actually think that EC2 and the like are the future, and as the tech gets better and prices come down I can see it making sense for more and more people. I just wanted to give a contrast with our current setup.


$1500/month for Equinix + bandwidth? We got quotes from them before, and the rack was low, but I wasn't finding super-cheap bandwidth like that. Did you go with Cogent or similar?

We actually went via Bandcon, which was then bought by Highwinds. Bandwidth is around the going price of $2.50/Mbps, and it seemed to be Level3 at the beginning; now it seems to be more of a mix. (I should specify that we have a Gbps port but only use about 100 Mbps, as it's only the HTML we serve from there.) We use another 2 Gbps of traffic via CDN for all the static/video content, but that is of course a different cost. It's nice, though, when the CDN ingest point is in the same physical DC as we are.

I just looked at what 250TB would cost us on S3: $20k/month, or $240k/year. (I'm not even counting the PUT/GET usage.)

You can build it yourself. For ease of math: 100x 3TB Seagate Constellation drives, 100 x $250 = $25k, and another $5k easily covers a 45-bay JBOD, a RAID card, and a server with SSD ZIL and L2ARC for ZFS, and you're done. So $30k. Get 2 more for redundancy and backup as you see fit.

So over 3 years that's $720k vs, apples to apples, $90k (if you got 3 of those servers), so you save roughly $600k. You can get a decent remote dev for $200k/year for that time.
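A rough sketch of that arithmetic, using the figures quoted in these comments (2013-era numbers, purely illustrative):

  # Back-of-the-envelope storage comparison using the figures quoted above
  # (2013-era numbers, purely illustrative).
  S3_MONTHLY = 20_000        # quoted S3 cost for 250TB, $/month (no PUT/GET)
  YEARS = 3
  s3_total = S3_MONTHLY * 12 * YEARS              # $720,000 over three years

  DRIVES, DRIVE_COST = 100, 250                   # 100x 3TB Constellation
  CHASSIS_ETC = 5_000                             # JBOD, RAID card, server, SSD ZIL/L2ARC
  per_server = DRIVES * DRIVE_COST + CHASSIS_ETC  # ~$30,000
  diy_total = per_server * 3                      # three servers: redundancy + backup

  print(f"S3 over {YEARS} years: ~${s3_total:,}")           # ~$720,000
  print(f"DIY, 3 servers:     ~${diy_total:,}")             # ~$90,000
  print(f"Difference:         ~${s3_total - diy_total:,}")  # ~$630,000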


If you know how to negotiate, even tier-1 transit providers like Level3 can be had at around $500/month committed throughput for a burst-to-gigabit fiber link. $1k for the power and half cab is also within reason.

It won't include redundant power or connectivity, though - but that's not cost-prohibitive - just double the price.

People - this is what Amazon makes their margins on: doing this work for you, so you can click a button in your underwear. If it makes sense for you, it makes sense - but if it doesn't, it doesn't.


Thanks for the info. Do you have a plan for hardware upgrades? I guess hardware from 4 years ago can serve plenty of websites for a long time. But eventually an upgrade will make sense. Will you upgrade entire servers, just a few disks, RAM, etc.?

How hard is KVM virtualization to set up? That also seems like a fairly big task, or at least a specialized one.


When we started we bought these boards http://www.supermicro.com/products/motherboard/QPI/5500/X8DT... and at first we outfitted them with one low-end CPU and a small amount of RAM (6GB), as those were our needs and that's what we could afford at the time. 2 years ago we upgraded those machines with dual 5560 CPUs and 48GB of RAM for not very much money, and in fact they run our production DBs right now. They are still very competitive if you stack them up against current E5 models. We added more servers over the last 2 years, and those have been E3-1240 V2 based, single CPU, 32GB RAM. You can't beat the price/performance there. So in 4 years we still have not obsoleted much beyond some older RAM and the base CPUs.

KVM is really easy to set up. Install the package on your Linux distro, start up virt-manager if you want a GUI, "start" a new machine, and install whatever you want from any CD image you have. Of course, once you wrap your head around it you'll want to do it with CLI tools and automate it yourself, but basic virt-manager might take you a long way. Once you have multiple machines and you want to migrate between hosts, you'll have to set up shared storage. That can be as easy as an NFS share/mount. We started with just 1 SSD for that, but then built a dedicated box with many Intel SSDs on hardware RAID 10. Never had an issue. But shared storage/live migration is not always needed and can add more risk. If you engineer it so all your hosts are independent and you have redundant services for everything, then you don't need to live migrate. If you need to free up a host, just turn it off, since you have redundant services running on other hosts.
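If you do go down the CLI/automation road, the libvirt Python bindings are one way to script it; a minimal sketch, assuming libvirt-python is installed, a local qemu:///system hypervisor, and a VM name borrowed from the listing earlier in the thread purely as an example:

  # Minimal libvirt sketch: list the domains on a KVM host and start one
  # that is defined but shut off. Assumes the libvirt-python bindings and
  # a local qemu:///system connection; "r-www2" is just an example name.
  import libvirt

  conn = libvirt.open("qemu:///system")

  for dom in conn.listAllDomains():
      state = "running" if dom.isActive() else "shut off"
      print(f"{dom.name():10s} {state}")

  dom = conn.lookupByName("r-www2")
  if not dom.isActive():
      dom.create()        # boots the defined-but-stopped domain

  conn.close()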

In fact, on our dev systems we run KVM on our OS X laptops, nested inside a VMware VM. (VMware can nest like this by passing the hardware virtualization flags through to the guest.) So on OS X you run VMware, which runs a Linux VM, and that VM is then used as a KVM host to run other VMs. This way we can run exactly the same image locally as in production.

In fact, if you really want to do some crazy plumbing... the VM host on my laptop has a VPN link to our DC, which puts it on the same internal network as our DC production hardware. I can then live migrate a production VM (like a web front end) onto my laptop while it's fully operational, doing processing for the production environment. On my laptop it will still be receiving and processing live web requests for our website via the VPN, and properly sending data back to the proxy and the user. Not very performant, but the flexible plumbing is nice if you want to test/debug a clone of the exact production system locally.
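For reference, that kind of live migration can be scripted with the same libvirt bindings; a rough sketch, assuming shared storage is reachable from both ends and using placeholder host and VM names:

  # Rough live-migration sketch with libvirt-python: move a running VM to
  # another host reachable over the VPN while it keeps serving traffic.
  # Hostname and domain name are placeholders; assumes shared storage.
  import libvirt

  src = libvirt.open("qemu:///system")
  dst = libvirt.open("qemu+ssh://laptop.vpn.example/system")

  dom = src.lookupByName("r-www2")
  dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)  # live migrate

  src.close()
  dst.close()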

