How does Kubernetes handle load balancing? That's a simple question, but it has a complicated set of answers. After all, load balancing is a complex affair. Although it’s a central part of the functionality that all container orchestrators offer, there are different ways to achieve it—and those different approaches are one factor that helps set the various container orchestrators apart.
This article explains the intricacies of Kubernetes load balancing.
Containers and Pods
Before diving into the details of Kubernetes load balancing, let's start with a quick look at the way that Kubernetes manages containers internally. This is important to understand before you can wrap your head around load balancing.
Kubernetes is designed for scalable management of Docker containers, which it organizes into pods. Each pod is a group of containers (typically interrelated, both functionally and in terms of purpose), along with shared volumes. Each pod has its own IP address, and the containers within it share that address and its port space. Because they share a network namespace, containers in the same pod can reach one another over localhost and use inter-process communication; containers in separate pods cannot, and must instead communicate across the pod network using pod IP addresses.
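As an illustration of that container grouping, here is a minimal pod manifest (all names and image versions are invented for the example). The two containers share the pod's network namespace, so the app container could reach the cache at localhost:6379, and both mount the same shared volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-cache   # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.25
      ports:
        - containerPort: 80
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: cache
      image: redis:7
      ports:
        - containerPort: 6379
      volumeMounts:
        - name: shared-data
          mountPath: /data
  volumes:
    - name: shared-data
      emptyDir: {}        # scratch volume shared by both containers
```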
Pods are not designed to be persistent—Kubernetes creates and destroys pods as needed. Each pod has its own IP address and UID. Even if it is a replica of one that previously existed, or that coexists with it, a new pod is assigned a new IP address and UID.
Since communication with pods is usually handled internally, within Kubernetes, the built-in pod management tools are generally sufficient for keeping track of new, deleted, or replicated instances of a pod. If, however, you have reason to expose a Kubernetes-based application to the underlying network (as is sometimes the case), you (or the exposure method that you use) will need to take into account this lack of IP address persistence from one pod instance to the next.
Most of the Kubernetes infrastructure is directly or indirectly involved in creating, managing, and communicating with pods. The pods themselves are deployed on machines called nodes (which may be physical or virtual), and each node includes utilities for managing and communicating with the pods it contains. Pods may be created or redeployed (destroyed and replaced by a new instance) within a node, or as part of the creation or redeployment of a node.
Controllers and Services
Pods are managed directly by controllers, which handle such tasks as the replication, scaling, use, redeployment, and destruction of pods. Pods are organized into abstract sets called services, which typically represent replicated pods performing the same set of functions. In some ways, a service can be seen as a pod made up of pods.
A service is assigned a relatively persistent IP which is used within Kubernetes. If part of a Kubernetes-based application needs access to the functionality handled by a given service, it can access the service, which will then assign one of the pods to take care of the request. The actual pod used doesn't matter to the program element making the request, and neither does that pod's address. The service in this respect is basically a pool of functionally identical pods, assigning them as needed.
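That pool-of-pods relationship is expressed through a label selector. In this sketch (names and ports are illustrative), any pod carrying the label app=web automatically joins the service's pool and becomes a candidate for handling requests sent to the service's IP:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-frontend      # illustrative name
spec:
  selector:
    app: web              # every pod labeled app=web joins the pool
  ports:
    - port: 80            # port exposed by the service
      targetPort: 8080    # port the pods actually listen on
```

Clients inside the cluster talk only to the service's stable address; which pod behind the selector answers a given request is the service's business, not theirs.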
The Need for Balance
This is the point where the question of load balancing comes in. Any time that you have a pool of functional units which are assigned to perform tasks on demand (whether it is software or heavy equipment), you need to have a way of dispatching them which optimizes availability and avoids undue strain on the system. When it comes to physical servers and other large-scale elements of infrastructure, of course, load balancing is a necessity, and for a variety of reasons (not the least of which is optimizing the use of underlying server hardware), it is also a necessity with Kubernetes.
How does Kubernetes handle load balancing? As it turns out, Kubernetes uses a variety of methods to deal with the task, depending both on the scope of the load-balancing effort and the way that it is configured.
Internal Load Balancing
At the most basic level, Kubernetes incorporates internal load balancing into its service model of organization. Recall that a service acts as a stand-in not just for one pod, but for a group of pods, typically all with the same functionality. Since any pod represented by the service should be equally capable of carrying out a requested task, there must be some way of assigning tasks. At the very least, it should be a method that does not distribute tasks too unevenly.
Kubernetes uses a feature called kube-proxy to handle the virtual IPs for services. Kube-proxy has two modes: userspace and iptables. In userspace mode (the original default mode), Kubernetes allocates tasks to pods within a service by the round-robin method.
Round-Robin and Random
With round-robin allocation, the system maintains a list of destinations (in this case, the virtual IP addresses of the pods represented by the service). When a request comes in, it assigns the request to the next destination on the list, then permutes the list (either by simple rotation, or a more complex method), so the next request goes to the following destination on the list. In iptables mode (the current default), the incoming requests are assigned to pods within a service by random selection.
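The two allocation strategies can be sketched in a few lines of Python. This is not kube-proxy's actual implementation (which operates at the network-proxy and packet-filter level), just a minimal model of the dispatch logic, with invented backend addresses:

```python
import random
from collections import deque

class RoundRobinDispatcher:
    """Userspace-mode style: rotate through the backend list, one per request."""
    def __init__(self, backends):
        self._backends = deque(backends)

    def pick(self):
        backend = self._backends[0]
        self._backends.rotate(-1)  # simple rotation: next request hits the next entry
        return backend

class RandomDispatcher:
    """iptables-mode style: choose a backend uniformly at random."""
    def __init__(self, backends):
        self._backends = list(backends)

    def pick(self):
        return random.choice(self._backends)

pods = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # illustrative pod IPs
rr = RoundRobinDispatcher(pods)
print([rr.pick() for _ in range(4)])  # cycles through the list, then wraps around
```

Neither dispatcher consults the backends about their current load, which is exactly the distinction drawn in the next section.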
Balance or Distribution?
Technically, both of these methods (round-robin and random) qualify as load distribution, rather than load balancing, since they simply assign requests to an available pod without taking into account the actual load currently placed on pods represented by the service. Any balancing beyond these distribution methods needs to be handled by other processes, which, in the case of Kubernetes, means external resources.
External Load Balancing
Kubernetes includes two basic methods of applying external load balancing:
The LoadBalancer service type sets a service to use a load balancer from a cloud service provider. The load balancer itself is provisioned by the cloud provider. LoadBalancer will only work with specified providers (including AWS, Azure, OpenStack, CloudStack, and Google Compute Engine). The external load balancer directs requests to the pods represented by the service. The details of the process depend on the cloud provider; balancing capabilities at the pod level may be limited.
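Requesting a cloud load balancer is a one-line change to the service spec. In this hedged sketch (names and ports are again illustrative), setting type: LoadBalancer asks the configured cloud provider to provision an external balancer and point it at the service's pods:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-frontend      # illustrative name
spec:
  type: LoadBalancer      # provider provisions an external load balancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```

Once the provider finishes provisioning, the external IP or hostname it assigned appears in the service's status, and traffic sent there is forwarded to the pool of pods.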
An Ingress controller is a Kubernetes pod containing a daemon which applies a set of rules (an Ingress resource) to traffic reaching the services in a cluster. Ingress controllers include built-in load-balancing features, and much more complex, system- or vendor-specific load-balancing rules can be included in an Ingress resource. Ingress is much more complex in its capabilities and reach than LoadBalancer, and includes a variety of features found in sophisticated load-balancing systems.
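An Ingress resource is itself just another Kubernetes object. The sketch below (hostname, path, and service name are invented) routes HTTP traffic for one host to a backing service; the Ingress controller running in the cluster is what actually enforces these rules:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress        # illustrative name
spec:
  rules:
    - host: example.com    # route requests for this hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-frontend   # service that receives the traffic
                port:
                  number: 80
```

More elaborate rules (TLS termination, per-path routing to different services, controller-specific annotations) layer onto this same resource, which is where Ingress's extra flexibility comes from.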
There are other methods of handling load balancing in Kubernetes, including third-party and provider-specific solutions. In general, however, external load-balancing solutions are increasingly Ingress-based, which is not surprising, given the much greater flexibility and fine-grained control that Ingress provides.
About the Author
Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the ‘90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues.