Redis Benchmarking on Amazon EC2, Flexiscale, and Slicehost

Key/value databases have attracted a lot of attention in recent months, largely because of their potential for higher throughput than traditional RDBMSs. Redis is one such database, and its support for data structures has especially captured my attention.

However, I have been concerned that some users have found in-memory databases (such as Redis) perform poorly on cloud hosting providers like Amazon EC2. Contention for memory bandwidth would seem a likely cause.

So I set out to compare cloud hosting providers and find some (reasonably) solid numbers. The full results can be seen in the Google Docs Spreadsheet, but I have extracted the key points below. I am not experienced at performing benchmarking, so any suggestions for improvement are very welcome!

Update – SpeedyRails:

SpeedyRails kindly provided me with an 8-core, 2GB memory VPS (normally costing $128/month). The server provided a very high request-per-second throughput thanks to its eight cores, but fell short on memory. It should be noted that this memory limitation seems common to many VPS providers (Linode and Slicehost to name two).

Benchmarking information:

All benchmarks below were performed using the Redis benchmarking command as follows:

./redis-benchmark -n 10000 -d 200

I ran Ubuntu 8.04 LTS on every server, with the 64-bit version used for all tests except ‘small-remote’, ‘small’, and ‘high-cpu-extra-large-32bit-os’ (which used the 32-bit version).

Raw throughput for one Redis instance

 

This first chart shows that, by and large, the performance of a Redis instance is dependent upon the speed of the core on which it runs. The ‘large’ EC2 server is about twice the speed of the ‘small’ EC2 instance (as Amazon states), and the double/quadruple-extra-large servers are a little faster again.

Based on this theory, one may expect the difference in performance to be greater in some cases, but I suspect that there is some variability depending on the physical server the instance is actually deployed to, so take this as a rough guide only.

It is also interesting to note that the 32-bit high-cpu instance outperformed the 64-bit high-cpu instance in benchmarks using both 2-byte and 200-byte requests, potentially due to the extra resources needed to process and transfer 64-bit values between memory and the CPU. However, running a 32-bit OS on 64-bit hardware showed appalling performance, so don’t try it.

It should also be noted that the ‘small’ and ‘small-remote’ servers are the same server. For ‘small-remote’, however, the benchmarking utility was run on a separate ‘high-cpu-extra-large’ instance. As ‘small’ instances have only one core, running the benchmarking utility locally was a significant drain on system resources, as the differing performance shows.

Both Flexiscale servers performed equally well. Slicehost also performed well for a single-core server (which also ran the benchmarking tool during the test). However, the result is of little significance, as this was testing an in-memory database on a server with 256MB of memory. We essentially discount the Slicehost server later, but I would welcome any benchmarking data for larger slices.

Considering multiple Redis instances

Of course, many of these servers have multiple cores. In these cases, the intention is to run one Redis instance per core and use some form of load balancing to distribute requests. This will inevitably reduce performance to some degree, depending on how you choose to do it (replication, sharding, consistent hashing, vertical partitioning, etc.), but this analysis does not take that into account. I expect the performance impact will vary greatly depending on your application’s architecture.
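To make the "one Redis instance per core" idea concrete, here is a minimal sketch of client-side sharding by hash modulo. It assumes one instance per core listening on consecutive ports starting at the default Redis port. The function names, the port layout and the choice of MD5 are assumptions made for illustration; none of this was part of the benchmarks. Note that this is plain modulo sharding, not consistent hashing, so changing the number of instances would remap most keys.

```python
import hashlib

# Assumption: instances listen on consecutive ports starting at the
# default Redis port, one instance per core.
BASE_PORT = 6379


def shard_for_key(key: str, num_instances: int) -> int:
    """Map a key to an instance index using hash-modulo sharding."""
    if num_instances < 1:
        raise ValueError("need at least one instance")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_instances


def port_for_key(key: str, num_cores: int) -> int:
    """Return the (hypothetical) port of the instance that owns this key."""
    return BASE_PORT + shard_for_key(key, num_cores)
```

The same key always maps to the same instance, so a client only needs the key and the core count to know where to send a request.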

With that said and done, here is the chart of projected requests per second when running one Redis instance on each available core.

There are very few surprises here considering what we have learnt so far. If you have a ‘quadruple-extra-large’ server with 8 cores at about 3.5GHz each, you get massive throughput. The faster and more plentiful your cores, the more you can get done.
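The projection behind this chart is simple multiplication, which can be sketched as below. The function name is hypothetical. The example figures are the two cores and 60k-70k requests per second per core quoted for ‘high-cpu-medium’ in the conclusion of this post. As noted above, this ignores any load-balancing overhead.

```python
def projected_rps(single_instance_rps, cores):
    """Projected requests/second with one Redis instance per core.

    Assumes perfect scaling: no load-balancing or sharding overhead.
    """
    return single_instance_rps * cores


# 'high-cpu-medium': two cores at 60k-70k requests/second each.
low = projected_rps(60_000, 2)   # 120000
high = projected_rps(70_000, 2)  # 140000
```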

Bang for your buck – responses per second

Now here is the actually useful bit! Which server will give the best performance (in terms of requests per second) per dollar spent? To find this, we scale the data we have for each server so that it becomes a fictional server that costs $1/hour.

‘slicehost-256’ is very cheap and performs OK, so clearly does well here (don’t worry, it will fall down in the next test!). Behind this, the EC2 High CPU instances give good value, as do the Flexiscale servers. Any memory heavy server performs poorly here, as you are paying for the memory, not the core speed/quantity.
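Scaling a server to a fictional $1/hour machine amounts to dividing its projected throughput by its hourly price. A minimal sketch follows, with a hypothetical function name. The example uses the $0.17/hour On Demand price for ‘high-cpu-medium’ quoted in the comments below and the lower-bound projection of 120,000 requests per second (two cores at 60k each), so treat the result as illustrative rather than as a figure from the spreadsheet.

```python
def rps_per_dollar_hour(projected_rps, hourly_price_usd):
    """Requests/second delivered by a fictional server costing $1/hour."""
    if hourly_price_usd <= 0:
        raise ValueError("hourly price must be positive")
    return projected_rps / hourly_price_usd


# 'high-cpu-medium': 120,000 projected requests/second at $0.17/hour,
# which works out to roughly 706k requests/second per dollar-hour.
value = rps_per_dollar_hour(120_000, 0.17)
```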

Bang for your buck – database size

Of course, responses per second is only half of it. Redis stores the entire database in memory, which limits how much data our servers can store. First, let’s look at the available memory in each server (irrespective of cost):

The ‘quadruple-extra-large’ server provides a whopping 63GB of memory (we subtract 1GB from each machine as the OS will need some memory), so that comes out on top. The ‘slicehost-256’ comes out as zero here as it is not a realistic choice for a Redis server.

Depending on how you intend to split your dataset across Redis instances, you will want to pay more attention to either ‘Memory available’ or ‘Memory per Redis instance’.
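The two memory figures can be sketched as follows, with hypothetical function names. The 1GB reserved for the OS comes from the text above. Clamping at zero is an assumption that happens to reproduce the zero shown for ‘slicehost-256’, and the even split across one instance per core is the layout assumed throughout this post.

```python
OS_RESERVED_GB = 1.0  # memory set aside for the OS, as described above


def memory_available_gb(total_gb):
    """Memory left for Redis once the OS allowance is subtracted."""
    # Assumption: clamp at zero rather than report a negative figure.
    return max(0.0, total_gb - OS_RESERVED_GB)


def memory_per_instance_gb(total_gb, cores):
    """Memory per Redis instance when running one instance per core."""
    return memory_available_gb(total_gb) / cores


# 'flexiscale-2gb-4core': 2GB total across 4 cores.
flexi_available = memory_available_gb(2.0)           # 1.0
flexi_per_instance = memory_per_instance_gb(2.0, 4)  # 0.25

# 'slicehost-256': 256MB total, so nothing is left after the OS allowance.
slice_available = memory_available_gb(0.25)          # 0.0
```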

We can also consider this in terms of dollars per GB of database storage:

Most of the standard large servers do well here. ‘flexiscale-2gb-4core’ fares badly, but this simply highlights that splitting 2GB across 4 Redis instances has skewed the server towards requests per second rather than storage.
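As a rough sketch of the dollars-per-GB calculation, the price is divided by the memory available for the database. The monthly unit, the 30-day month and the function name are assumptions; the chart itself may use a different time base. The example uses the $0.17/hour ‘high-cpu-medium’ price quoted in the comments below and the 1GB database limit mentioned in the conclusion.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def dollars_per_gb_month(hourly_price_usd, available_gb):
    """Monthly cost per GB of memory available for the database."""
    if available_gb <= 0:
        # A server with no usable memory has no meaningful price per GB.
        return float("inf")
    return hourly_price_usd * HOURS_PER_MONTH / available_gb


# 'high-cpu-medium': $0.17/hour with about 1GB usable for the database,
# which comes to roughly $122 per GB per month.
cost = dollars_per_gb_month(0.17, 1.0)
```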

Conclusion

It is easy to see how Redis can perform poorly on the cloud. Comparisons between a (locally benchmarked) EC2 small instance and a bare metal server just won’t hold up, but who runs a single core 1.1GHz server anyway? There will be a performance penalty for virtualisation, but that isn’t news. As always, the trick is to work out what is best for the situation.

For example, a ‘high-cpu-medium’ EC2 instance will be around $130/month and will serve 60k-70k requests per second for each of its two cores. Yes, your database will be limited to 1GB, but there is a very easy upgrade path. However, if you do need to store many gigabytes of data, then I expect that a dedicated server will be much more cost effective (considering the price of memory).

In terms of ‘bang for your buck’, it is nice to see parity between small and massive servers alike. In both price and performance, one EC2 ‘double-extra-large’ server is equivalent to about 15 small EC2 instances. So the small guys can take comfort that they are not losing out to economies of scale, but the big guys should probably be looking at dedicated bare-metal.

Request for Benchmarks

I would be especially interested in additional benchmarks for bare-metal servers, larger Slicehost slices, and any other IaaS providers not covered here.

5 comments so far

  1. Keenan Brock on

    This is great. What version of redis did you run? (1.01 or head ~1.1)?

    Thanks

    • Adam Charnock on

      Hi Keenan,

      The tests here used Redis 1.0.2. However, I also did the same tests using the version straight from Git and the results were essentially the same.

  2. Zimtstern on

    Interesting facts!

  3. Michael Lenaghan on

    Thanks for the post!

    Amazon offers both On Demand pricing and Reserved pricing. If you sign up for Reserved you pre-pay for either a one year or three year term; that then lowers your hourly pricing. If you were running a server 24×7 on Amazon you’d obviously go the Reserved route.

    For high-cpu-medium as of today the On Demand price for 24×7 is:

    $0.17/h * 24h/d * 30d/m = ~$122/m

    For high-cpu-medium as of today the Reserved price for 24×7 with a one year term is:

    $0.06/h * 24h/d * 30d/m + $455/y / 12m/y = ~$81/m

    Obviously that would have a big impact on your price/performance numbers.

    • Adam Charnock on

      An excellent point! As a rule of thumb, I normally assume reserved instances are about a third cheaper, which would have a significant impact on the cost analysis.

      However, even though I do have a number of instances running 24/7, I prefer to have the flexibility and pay the ‘extra’ money.

      Each to their own really. Spot instances are also another option:
      http://aws.amazon.com/ec2/spot-instances/

