David Cramer's Blog

Moving Sentry from Heroku to Hardware

Update: Don't decide against Heroku just because you've read my blog. It makes some things (especially prototyping) very easy, and with certain kinds of applications it can work very well.

I've talked a lot about how I run getsentry.com, mostly about my experiences on Heroku and how I switched to leased servers. Many people have consistently suggested that operations work is difficult, and that they shouldn't deal with it themselves. I'm not going to tell you that my roommate, Mike Clarke, one of the few operations people we have at DISQUS, has it easy, but I'd like to give you a little food for thought.

GetSentry started around Christmas of 2011. I had already built and open sourced Sentry at Disqus, and the idea was to take that work and create a Heroku AddOn out of it. The pitch was that I could make a little bit of money on the side simply by hosting Sentry for people. About three months later I had that prototype hosting service running on Heroku, accepting payments both via the AddOn infrastructure and on my own using the amazing Stripe platform.

Let's fast forward to today. I no longer run any servers on Heroku (or any cloud provider, other than S3 for backups); instead, I lease servers. Now, the company I lease from is what most people would call a "budget provider". They're extremely cheap (they don't add extreme margins to the cost of the machines you're leasing), and they do absolutely nothing for you. It's not for the faint of heart. That said, it's also how I can get away with very low costs.

I'm going to tell you the story of how I switched from Heroku to fully configured leased servers in less than a week, in my free time. I'm also going to try to convince you that it's really not that complicated.

The First Server

This part could be more appropriately titled "Learning Chef". I'm fortunate to have some awesome coworkers, and even more fortunate that when I was making this transition I had my roommate on hand to prod with questions. I'm also extremely fortunate that mediums like Google, IRC, and Twitter exist for any other questions I ever have.

The first task in getting my prototype web server online was to get it all configured. I could have taken the old-fashioned approach of creating a few config files locally (in version control, maybe) and then pushing them up to the server, as well as manually installing whatever packages I needed (nginx, memcache, etc.), but with Puppet and Chef becoming all the rage, I figured it was as good a time as ever to dig into one.

I decided to use the hosted Chef service, and after a few bumps with figuring out what all this Ruby stuff was about, I managed to get a basic understanding of roles and cookbooks. After quite a bit of fiddling I had created a cookbook specific to getsentry (which holds things like setting up various paths), and a bunch of generic ones, like apt, nginx, memcached, python, etc.

Creating a Recipe

The meat of this was handled via Chef's awesome roles, and by wiring up a few things in getsentry's 'default' recipe:

include_recipe "python"

directory "/srv/www" do
  owner "root"
  group "root"
  mode "0755"
  action :create
end

directory "/srv/www/getsentry.com" do
  owner "dcramer"
  group "dcramer"
  mode "0755"
  action :create
end

This formed the basis of any server that I would be running, and simply set up a couple of directories. I gave ownership to my own user, as I'm the only one working on the project and didn't need the added complexity of build or system users.

I then moved on to a second recipe, which formed the basis of a web node. This one has a lot more to it, as it needed to configure nginx and memcache at the start:

include_recipe "getsentry"
include_recipe "supervisor"

template "#{node[:nginx][:dir]}/sites-available/getsentry.com" do
  source "nginx/getsentry.erb"
  owner "root"
  group "root"
  mode 0644
  notifies :reload, "service[nginx]"
end

nginx_site "getsentry.com"

supervisor_service "web-1" do
  directory "/srv/www/getsentry.com/current/"
  command "/srv/www/getsentry.com/env/bin/python manage.py run_gunicorn -b 0.0.0.0:9000 -w #{node[:getsentry][:web][:workers]}"
  environment "DJANGO_CONF" => node[:django_conf]
  user "dcramer"
end

supervisor_service "web-2" do
  directory "/srv/www/getsentry.com/current/"
  command "/srv/www/getsentry.com/env/bin/python manage.py run_gunicorn -b 0.0.0.0:9001 -w #{node[:getsentry][:web][:workers]}"
  environment "DJANGO_CONF" => node[:django_conf]
  user "dcramer"
end
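
The worker count and the DJANGO_CONF value referenced above come out of node attributes rather than being hard-coded in the recipe. A minimal sketch of what the cookbook's attributes file might contain (the values here are placeholders, not the real production settings):

# attributes/default.rb (sketch): attributes referenced by the recipe above.
# These values are illustrative guesses, not the actual production settings.
default[:getsentry][:web][:workers] = 4
default[:django_conf] = "prod"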

There is a bit more to it than what I've shown, but all in all it was pretty simple; it just took me a while to understand how Chef functioned. I'm now an engineer who has experience with Chef, even if it's very little, and from my perspective (on the hiring end at Disqus), that's an awesome addition to an engineer's skillset.

Once the web server was online, all I had to do was configure a primary database server. I brought up another node, gave it a new role (db), and didn't even need to create a custom recipe (I simply reused the existing pgbouncer, postgresql, and redis recipes available elsewhere on the internet).
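
For reference, a role in Chef is little more than a named run list (plus optional attributes). Something like the following roles/db.rb is roughly what that db role looks like, though the exact cookbook and recipe names here are assumptions based on the recipes I mentioned:

# roles/db.rb (sketch): the run list is assembled from community cookbooks;
# exact recipe names may differ from the ones actually used.
name "db"
description "Primary database node for getsentry.com"
run_list(
  "recipe[getsentry]",
  "recipe[postgresql::server]",
  "recipe[pgbouncer]",
  "recipe[redis]"
)

Bringing a new node under that role is then just a knife bootstrap away.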

Operational Complexity

I stated in the beginning that I completed this process in less than a week. From Heroku to hardware, it took me about three evenings of toying with Chef (mostly on the more complex components, like iptables and building a deploy script). What I really want to point out is that I have never held an operations position. I've definitely configured servers (a la apt-get install nano) and know my way around, especially with a database, but most of this was fairly new to me.

The continued argument that it's "too difficult" to run your own servers is quite an overstatement, but it's not something you should ignore. There are many things I have to be concerned about, most importantly data loss and the ability to recover in the event of a disaster on my machines. These aren't overly complex challenges to handle, though.

Data redundancy is handled by a simple cron script that does nightly backups to S3. It's literally just a script that calls pg_dump and s3cmd to send the files upstream. Now, that's not enough for any real requirements, so step two is simply setting up replication from your database node to a second server, even if that server is your application server.
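
To give you an idea of how little is involved, here's a sketch of that backup job expressed as a Chef cron resource (the database name, dump path, bucket, and schedule are stand-ins, and a real setup would date-stamp the dump):

# Sketch of the nightly backup: dump the database and ship it to S3.
# Database name, dump path, and bucket are illustrative, not the real ones.
cron "nightly-pg-backup" do
  minute "0"
  hour "3"
  user "postgres"
  command "pg_dump -Fc getsentry > /tmp/getsentry.dump && s3cmd put /tmp/getsentry.dump s3://getsentry-backups/"
end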

Availability is the second big problem, and it's handled the same way you avoid losing your database: have a second server. This again can be a server whose primary task is something other than your application (it can be your database server). It doesn't have to be a permanent home; it only has to survive until the primary server is available again or you're willing and able to invest in more hardware.
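
For the database, that second server can simply be a streaming-replication standby. A rough sketch of the standby side, written as a Chef resource, is below; the primary's address, the replication user, and the data directory are all assumptions, and the primary also needs wal_level, max_wal_senders, and an hba entry configured:

# Sketch only: recovery.conf for a Postgres 9.x streaming-replication standby.
# Host, replication user, and data directory path are assumptions.
file "/var/lib/postgresql/9.1/main/recovery.conf" do
  owner "postgres"
  group "postgres"
  mode "0600"
  content <<-CONF
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 port=5432 user=replicator'
  CONF
end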

Closing Thoughts

I spent an initial three evenings, and about another week's worth since, on server configuration and operations. There were various problems, like Postgres not being tuned well enough (pgtune is amazing, by the way), DNS being slow (fuck it, use IPs), and some more minor things that needed to be addressed throughout that time. All in all, there are basically zero day-to-day operations concerns, and most of the work happens when I need to expand the system (which is rare).

All of it ended up being an extremely valuable learning experience, but using Chef wasn't a necessity. I could have done things the more "amateur" way, but I now have the benefit of being able to bring a server online, run a few commands, and have a machine or even a cluster identical to what's already running.

On the limited hardware I run for getsentry.com, that is, two servers that actually service requests (one database, one app), we've serviced around 25 million requests since August 1st, doing anywhere from 500k to 2 million in a single day. That isn't that much traffic, but what's important is that it services those requests very quickly, using very little of the resources dedicated to it. In the end, this means that Sentry's revenue will grow much more quickly than its monthly bill will.

GetSentry has been profitable since its fourth month, and currently spends only 10% of its monthly revenue on expenses (hardware and other third-party services). That gap gets larger every month, and I've been more than happy to invest some of my time to keep it as large as possible. The irony of it all? I'm selling a service that's entirely open source, yet suggesting that you run your own hardware. For some people, sacrificing cost for convenience is acceptable; for others, it may not be.

Also, this.

Look for a future post with many more details on how I set up Chef (likely incorrectly), with more in-depth code and configuration from the cookbooks.
