Ceph’s RGW service makes it incredibly easy to build out a fully featured S3 service for customers that is resilient and scalable out of the box. Service providers can integrate RGW directly with their existing authentication and billing systems and deploy a ready-to-go S3 product. At SoftIron we usually deploy RGW on our Storage Router, and can get wire-speed S3 performance out of a single node. For a production S3 service, however, a single node is not enough: we need our systems to be highly available and scalable.
RGW is a stateless RESTful HTTP service, so you can spin it up and down at will on as many nodes as you like, and each instance will independently handle the requests it is served. We can quite easily build a poor man’s load balancer with a simple DNS round-robin setup: multiple IPs are returned in response to each DNS query, and the order of the servers is rotated on each successive query. This is great for lab environments or POCs, but not for the real world. For one thing, if a node dies there is a delay before clients’ DNS caches are updated, so in the meantime they are stuck with stale records that still point at the dead node. That will definitely impact performance, and could also break things at the application layer.
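To make the caching problem concrete, here is a minimal sketch in Python of DNS round robin with a caching client. It is a toy simulation, not a real resolver: the `RoundRobinDNS` and `CachingClient` classes, the 60-second TTL, and the `192.0.2.x` documentation addresses are all assumptions made for illustration.

```python
from collections import deque


class RoundRobinDNS:
    """Toy name server that rotates its address list after every query."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def resolve(self):
        answer = list(self._records)
        self._records.rotate(-1)  # the next query starts with the next address
        return answer

    def remove(self, address):
        """Pull a dead gateway out of the record set."""
        self._records.remove(address)


class CachingClient:
    """Toy client that keeps using the first address it was given until the TTL expires."""

    def __init__(self, dns, ttl):
        self._dns = dns
        self._ttl = ttl
        self._cached = None
        self._expires = 0.0

    def pick(self, now):
        if self._cached is None or now >= self._expires:
            self._cached = self._dns.resolve()[0]
            self._expires = now + self._ttl
        return self._cached


# Hypothetical gateway addresses (RFC 5737 documentation range).
dns = RoundRobinDNS(["192.0.2.11", "192.0.2.12", "192.0.2.13"])

# Successive queries are answered with a rotated list, spreading new clients out.
first_answers = [dns.resolve()[0] for _ in range(4)]

client = CachingClient(dns, ttl=60)
before = client.pick(now=0)    # client resolves and caches one gateway
dns.remove(before)             # that gateway dies and is removed from DNS
during = client.pick(now=30)   # cache still valid: client keeps hitting the dead node
after = client.pick(now=61)    # TTL expired: client finally gets a live gateway
```

Between `now=0` and `now=60` the client has no way of learning that its gateway is gone, which is exactly the window in which requests fail. Lowering the TTL narrows the window but never closes it, and many clients do not honour short TTLs anyway.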
For real-world environments, enterprises will typically solve this problem with either a hardware load balancer or the ubiquitous HAProxy, often with Keepalived to manage a virtual IP. For a production service, though, this often still isn’t enough. We don’t just need to address all of these nodes as a single unit; the system also needs to handle failure seamlessly and proactively route requests to new gateways as we add them. We also need to ensure that the load balancing tier itself isn’t throttling our throughput, and that this continues to be the case as we scale the gateways.
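The two behaviours we are asking for, skipping failed gateways and picking up new ones without a restart, can be sketched in a few lines of Python. This is only an illustration of the idea, not how HAProxy or any hardware load balancer is implemented; the `GatewayPool` class and the `rgw1`..`rgw4` names are hypothetical, and a real system would drive `mark_down` and `mark_up` from periodic health checks.

```python
class GatewayPool:
    """Round-robin over the gateways that passed their most recent health check."""

    def __init__(self, gateways):
        self._gateways = list(gateways)
        self._healthy = set(self._gateways)
        self._next = 0

    def add(self, gateway):
        """Register a new gateway; it becomes eligible for traffic immediately."""
        if gateway not in self._gateways:
            self._gateways.append(gateway)
        self._healthy.add(gateway)

    def mark_down(self, gateway):
        """Record a failed health check."""
        self._healthy.discard(gateway)

    def mark_up(self, gateway):
        """Record a successful health check for a known gateway."""
        if gateway in self._gateways:
            self._healthy.add(gateway)

    def pick(self):
        """Return the next healthy gateway, skipping any that are marked down."""
        for _ in range(len(self._gateways)):
            candidate = self._gateways[self._next % len(self._gateways)]
            self._next += 1
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy gateways available")


pool = GatewayPool(["rgw1", "rgw2", "rgw3"])
pool.mark_down("rgw2")                            # health check on rgw2 fails
picks_after_failure = [pool.pick() for _ in range(4)]

pool.add("rgw4")                                  # scale out with a new gateway
picks_after_scale_out = [pool.pick() for _ in range(3)]
```

Unlike the DNS approach, the decision is made per request on the balancer, so a dead gateway stops receiving traffic as soon as its health check fails. The catch is that the balancer itself is now a single point that all traffic passes through, which is why throughput and availability of that tier become the next problem.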
Vincent Bernat has a great article on his blog that details the basic building blocks of a three-tier load balancing system, which solves this scalability problem with ECMP routing combined with both Layer 4 and Layer 7 load balancing. He has kindly agreed to let us reference it here: 2018 Multi-tier Load Balancer
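As a rough intuition for the ECMP tier, a router with several equal-cost routes typically hashes fields of each flow to choose a next hop, so that all packets of one TCP connection land on the same load balancer. The Python sketch below illustrates that general idea only; it is not taken from the referenced article, real routers use their own hardware hash functions, and the `ecmp_next_hop` function, the SHA-256 hash, and the addresses are assumptions for illustration.

```python
import hashlib
from collections import Counter


def ecmp_next_hop(flow, next_hops):
    """Pick a next hop by hashing the flow's 5-tuple.

    `flow` is (src_ip, src_port, dst_ip, dst_port, protocol). The same flow
    always maps to the same next hop as long as `next_hops` is unchanged.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical Layer 4 load balancers all announcing the same service address.
next_hops = ["lb-a", "lb-b", "lb-c"]

flow = ("198.51.100.7", 40312, "203.0.113.10", 443, "tcp")
first = ecmp_next_hop(flow, next_hops)
second = ecmp_next_hop(flow, next_hops)  # same flow, same next hop

# Many different client flows spread across all of the next hops.
spread = Counter(
    ecmp_next_hop(("198.51.100.7", port, "203.0.113.10", 443, "tcp"), next_hops)
    for port in range(40000, 41000)
)
```

Because the choice depends only on the packet headers, no per-connection state is needed on the router, and capacity grows by adding next hops. Note that with a plain modulo, changing the number of next hops remaps most existing flows, which is one reason a further load balancing tier sits behind the router.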