The last couple of days were pretty rough for Beanstalk and our customers. We experienced very high load across all of our slices and eventually determined that the performance problem was on the GFS drives that store our application and SVN data. Since we host with Engine Yard, we have nice Nagios warnings about high load, and those started coming in on Tuesday morning. After lots of digging and help from the Engine Yard support team, we were able to narrow down the problems.
Problem: Storage calculation
Each plan level in Beanstalk has a storage limit, which determines the plan you need and whether you have exceeded your storage capacity across your repositories. To keep this information up to date, we need to recalculate storage constantly. On the server, we would run a "du" on every commit to calculate the storage, then report the result back to the application. If the storage was too high, we could inform the customer that it was time to upgrade.
As you can imagine, this creates a huge problem as the application grows. Considering that we have thousands of commits each day, the process would have to calculate storage for literally thousands of files in each repository on every commit. The IO usage was absurd.
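To make the cost concrete, here is a minimal Python sketch of what a du-style calculation has to do. It is an illustration only, not our actual code: the function name `repo_size_bytes` is hypothetical, and it sums apparent file sizes rather than disk blocks as the real "du" does. The point is that every call has to stat every file in the repository, no matter how small the commit was.

```python
import os


def repo_size_bytes(root):
    """Walk a repository directory and sum the size of every file in it.

    Hypothetical helper for illustration. Returns (total_bytes, files_seen)
    so the number of stat calls per run is visible.
    """
    total = 0
    files_seen = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # One stat per file: this is the IO that adds up.
                st = os.lstat(path)
            except OSError:
                # File vanished mid-walk (for example, a commit in progress).
                continue
            total += st.st_size
            files_seen += 1
    return total, files_seen
```

Run once per commit, the work is roughly the number of commits multiplied by the number of files in each repository, which is why it gets worse as both numbers grow.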
Solution: Adjust and compromise
The only solution was to reduce the number of times we calculate storage. Of course, this also means that some people could go over their storage limits and continue to use the system for a while before we notice. For us this was an easy compromise. Choosing a stable system that can grow over instant detection of exceeded storage is a no-brainer.
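One way to reduce how often storage is calculated is to throttle it per repository: measure at most once per interval and reuse the last result in between. The Python sketch below shows that idea under stated assumptions. The class name `ThrottledStorageCheck`, the `measure` callback, and the interval are all hypothetical, and this is not necessarily how our fix is implemented.

```python
import time


class ThrottledStorageCheck:
    """Recalculate a repository's storage at most once per interval.

    Illustrative sketch: `measure` is any callable that takes a repo id and
    returns its size in bytes (for example, a wrapper around "du").
    """

    def __init__(self, measure, min_interval_seconds, clock=time.monotonic):
        self._measure = measure
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._last = {}  # repo_id -> (timestamp, size_bytes)

    def on_commit(self, repo_id):
        """Return the repo size, re-measuring only if the cached value is stale."""
        now = self._clock()
        cached = self._last.get(repo_id)
        if cached is not None and now - cached[0] < self._min_interval:
            # Recent enough: skip the expensive walk and reuse the old number.
            return cached[1]
        size = self._measure(repo_id)
        self._last[repo_id] = (now, size)
        return size
```

The trade-off is the one described above: between measurements the reported size can lag behind reality, so a repository can sit over its limit for up to one interval before it is noticed.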
We've had this fix ready to go since late last week, but it required some additional testing. After the problems of the past two days we decided enough was enough. We all got together today, tested the fix rigorously, and deployed it to our slices.
It's only been running for a short time, but so far load has gone down dramatically. We also increased the RAM on the slices that handle our daemons. We'll be watching this carefully so we can adjust and improve.
We only recently released performance updates, but performance work is an ongoing process. This latest fix is another improvement that will help us grow. In the next several weeks we will be working with Engine Yard to improve the speed of our GFS drives and to do some tuning for Subversion specifically.
We really appreciate your patience while we improve, and we welcome as much feedback as possible. Please don't forget to follow us on Twitter for more frequent updates.