In a previous post, we looked at how Docker solves four major problems for software and operations engineers involved in the deployment and runtime management of applications, APIs, and microservices: packaging, distribution, runtime isolation, and installation. When all four of these are combined, they amount to a standardized interface around an application container.
This standardized interface opens the door to a new tier of automation tooling that can generically manage microservice instances across multiple machines. As more of these tools emerge, a few names for this class of system have popped up as well: “container orchestrators”, “datacentre operating systems”, and “schedulers”. Personally, I think “container orchestrator” is the most representative, so I will stick with that name from now on.
Open Source Container Orchestration
The three most popular open-source orchestrators at the moment are:
- Kubernetes (from Google).
- Mesos and Marathon (from Mesosphere).
- Docker Swarm (from Docker).
While containerized deployment and orchestration is likely to be the way most deployments are done in the long term, implementing one of the above solutions on premises or in the cloud is still not for the faint of heart. All three systems are under very active development and provide somewhat disjoint sets of features. This makes it difficult for a newcomer to evaluate and compare them against each other. In this post, we will touch on some of the major requirements that a complete orchestrator system often has to meet and how the above systems address those requirements.
Networking
Networking is probably the most important aspect of container deployment and has seen quite a lot of change since Docker’s first releases. There are several aspects to networking that we are concerned with.
The first concern is: how do the packets from one microservice reach another microservice, potentially on another machine? There are a few options available:
- Overlay network. This includes tools like Flannel and Weave. These systems manage the allocation of IP addresses and the locations of each container in a distributed storage system like etcd, Zookeeper, or Consul. All packets between containers are encapsulated and routed to the correct destination. The two common types of encapsulation are UDP (forwarded through a user space process) and VXLAN (done in the kernel).
- Calico. This project eschews encapsulation, and instead relies on good old routing and IP networking capability within the kernel. At a high level, each host is modeled as a router with each container as a host behind the router. Hosts exchange routing information via BGP. In addition to routing, Calico also has a flexible policy-based security configuration and has recently combined forces with Flannel to produce Canal – a combination of Flannel’s overlay network with Calico’s policy system.
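To make the bookkeeping behind an overlay network more concrete, here is a minimal sketch in Python. It is not how Flannel or Weave are implemented; it only illustrates the idea of leasing a subnet to each host and looking up which host owns a given container IP. The `OverlayDirectory` class, the `10.1.0.0/16` range, and the host names are all assumptions made for illustration, and a plain dictionary stands in for etcd, Zookeeper, or Consul.

```python
import ipaddress


class OverlayDirectory:
    """Toy stand-in for the distributed store an overlay network would use."""

    def __init__(self, cluster_cidr: str, host_prefix: int = 24):
        network = ipaddress.ip_network(cluster_cidr)
        # Generator over the per-host subnets carved out of the cluster range.
        self._free_subnets = network.subnets(new_prefix=host_prefix)
        self._subnet_by_host = {}

    def register_host(self, host: str):
        """Lease the next free subnet to a host (repeated calls return the same lease)."""
        if host not in self._subnet_by_host:
            self._subnet_by_host[host] = next(self._free_subnets)
        return self._subnet_by_host[host]

    def host_for(self, container_ip: str):
        """Return the host whose subnet contains the container IP, or None."""
        address = ipaddress.ip_address(container_ip)
        for host, subnet in self._subnet_by_host.items():
            if address in subnet:
                return host
        return None


directory = OverlayDirectory("10.1.0.0/16")
directory.register_host("node-a")  # leases 10.1.0.0/24
directory.register_host("node-b")  # leases 10.1.1.0/24
print(directory.host_for("10.1.1.7"))  # node-b
```

A real overlay would use this lookup to decide which host an encapsulated packet should be sent to; the encapsulation itself (UDP or VXLAN) is out of scope for this sketch.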
Docker ships with its own VXLAN-based overlay network built in, although it does require setting up your own backend (e.g. etcd). Kubernetes and Marathon don’t ship with anything out of the box, but can be combined with Flannel, Weave or Calico. Unlike Kubernetes, though, Marathon does not prescribe that an overlay network be used – for example, you could just forward container ports directly to the host. However, that means you will be on the hook to ensure that those ports are allocated without conflicting with each other. So, I would not recommend this option, unless you have a specific reason to avoid overlays.
Service discovery and load balancing
Now that packets can reach other containers, how do we know the hostname or IP address of the containers for a particular service? Moreover, how do we balance requests across them and react quickly as containers are brought up and down?
In a microservices architecture, these questions are very important. In complex systems, dozens of microservices with multiple instances each could be running across hundreds of machines. Each service exposes an API on the network.
Kubernetes has a built-in solution for both aspects of this via the Service abstraction. Any pods that are labeled accordingly become part of the Service, and connections to it will be load balanced. The Service itself is discoverable to other containers via DNS. This comes with some caveats, though – the load balancing currently works through iptables and NAT on the host, and does not provide much configurability.
With Docker and Marathon, you will generally need to run a proxy that watches containers start and stop in the cluster and reconfigures itself accordingly. Example integrations include Marathon-lb and Traefik.
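The two halves of this problem, discovery by label and balancing across instances, can be sketched in a few lines of Python. This is only an illustration of the concept, not the mechanism used by Kubernetes Services, Marathon-lb, or Traefik. The `ServiceRegistry` class, the label names, and the addresses are hypothetical.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Instance:
    address: str
    labels: dict = field(default_factory=dict)


class ServiceRegistry:
    """In-memory registry: label-based discovery plus round-robin selection."""

    def __init__(self):
        self._instances = []
        self._counter = itertools.count()

    def add(self, instance: Instance) -> None:
        """Called when a container starts."""
        self._instances.append(instance)

    def remove(self, address: str) -> None:
        """Called when a container stops."""
        self._instances = [i for i in self._instances if i.address != address]

    def endpoints(self, selector: dict) -> list:
        """Addresses of all instances whose labels match every selector entry."""
        return [
            i.address
            for i in self._instances
            if all(i.labels.get(k) == v for k, v in selector.items())
        ]

    def pick(self, selector: dict) -> str:
        """Choose the next matching endpoint in round-robin order."""
        candidates = self.endpoints(selector)
        if not candidates:
            raise LookupError(f"no instances match {selector}")
        return candidates[next(self._counter) % len(candidates)]


registry = ServiceRegistry()
registry.add(Instance("10.1.0.5:8080", {"app": "orders"}))
registry.add(Instance("10.1.1.9:8080", {"app": "orders"}))
registry.add(Instance("10.1.1.4:5432", {"app": "orders-db"}))
print(registry.endpoints({"app": "orders"}))
```

A proxy that reacts to container start and stop events would call `add` and `remove` as the cluster changes, so that `pick` never returns an instance that has gone away.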
Storage
The 12-factor application guidelines suggest writing software, particularly API services, in a stateless fashion. This supports the container orchestration nicely as it gives the orchestrator freedom to place application instances on whichever host it deems most appropriate at any given time. However, this falls apart quickly if the service you’re deploying requires persistent storage (for example, a database). There are two approaches that can be taken by an orchestrator in this case:
- Use network-based storage and remap the virtual disk to whichever host the container is scheduled on. In this case, storage can be managed independently of the host, so a hardware failure does not affect the container’s ability to restart elsewhere. On the other hand, disk I/O has to travel across the network, resulting in higher latency. We’ve also found that remapping network disks in AWS, for example, sometimes fails and can require manual intervention.
- Use the disk storage on specific hosts and ensure that the scheduler always places the corresponding application instance on that host. This approach maximizes disk performance as I/O does not have to go over the network; however, the failure of a single host can result in an outage or data loss.
Kubernetes builds in both approaches. When running in a supported cloud environment (GCE or AWS), it is capable of using the provider’s API to remap virtual volumes to whichever host a container is scheduled on (Volumes). It is also able to allocate chunks of storage on host disks, and ensure that corresponding containers are scheduled appropriately (PersistentVolumes).
Docker does not provide an out-of-the-box solution for either technique; it requires integration with RexRay or manually constraining container instances to specific machines.
This is one of the big open issues for orchestrators, so I’d expect to see significant progress in this space over the next little while.
Scheduling
The scheduler is the part of the orchestration system that determines the machine placement of each container. There are a number of strategies that can be employed for this purpose:
- Round-robin: place on the next host in a sequence;
- Least loaded node first: place on the host that is the least loaded or has the smallest number of containers;
- Resource-based constraint: place on a host that has available capacity. Container metadata would specify resource requirements;
- Host property constraint: place on a host that meets a specific requirement. For example, specific hardware device or OS;
- Storage constraint: place on the host that holds the required storage volume (see section on storage);
- Label-based constraint: place on a host matching a particular set of constraints, usually specified as labels on hosts and matching expressions in the container metadata;
- Manual placement: place on a specific host, identified by its hostname.
As you can probably imagine, these constraints can be combined in various ways to obtain a specific desired effect. When evaluating an orchestration system, it’s prudent to identify which strategies are relevant to your use case. For more information about how each orchestrator places containers, see Kubernetes node selectors, Marathon’s constraints, and Docker Swarm’s strategies and filters.
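As a rough illustration of how such strategies combine, the Python sketch below filters hosts by a resource-based constraint and a label-based constraint, then applies "least loaded node first" to the survivors. It does not reproduce the algorithm of any of the three orchestrators. The `Host`, `ContainerSpec`, and `schedule` names and all of the numbers are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    cpu_free: float
    mem_free_mb: int
    labels: dict = field(default_factory=dict)
    containers: int = 0


@dataclass
class ContainerSpec:
    name: str
    cpu: float
    mem_mb: int
    node_selector: dict = field(default_factory=dict)


def schedule(spec: ContainerSpec, hosts: list):
    """Return the chosen host after reserving its resources, or None if nothing fits."""
    feasible = [
        h for h in hosts
        # Resource-based constraint: the host must have spare capacity.
        if h.cpu_free >= spec.cpu and h.mem_free_mb >= spec.mem_mb
        # Label-based constraint: every selector entry must match a host label.
        and all(h.labels.get(k) == v for k, v in spec.node_selector.items())
    ]
    if not feasible:
        return None
    # Least loaded node first: fewest running containers wins; ties keep list order.
    chosen = min(feasible, key=lambda h: h.containers)
    chosen.cpu_free -= spec.cpu
    chosen.mem_free_mb -= spec.mem_mb
    chosen.containers += 1
    return chosen


hosts = [
    Host("node-a", cpu_free=2.0, mem_free_mb=4096, labels={"disk": "ssd"}, containers=3),
    Host("node-b", cpu_free=4.0, mem_free_mb=8192, labels={"disk": "ssd"}, containers=1),
    Host("node-c", cpu_free=8.0, mem_free_mb=16384, labels={"disk": "hdd"}, containers=0),
]
placed = schedule(ContainerSpec("db", cpu=1.0, mem_mb=2048, node_selector={"disk": "ssd"}), hosts)
print(placed.name)  # node-b
```

Here `node-c` is the least loaded host overall, but it is filtered out by the label constraint before the load comparison happens, which is why the order in which strategies are applied matters.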
Configuration
Most software (especially an API service) tends to be deployed into multiple environments and therefore usually requires some environment-specific configuration. The amount of such configuration really depends on the type and genericity (yes, it’s a word!) of the service being deployed. For example, a web proxy or database server would likely require a large number of configuration options depending on the application that uses them. On the other hand, an end-user facing application is likely to only have a handful of environment-specific flags.
For simple applications, the 12-factor app recommendation of supplying the settings via environment variables is probably sufficient; this is provided by all of the orchestrators.
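For the simple case, reading settings from environment variables might look like the following Python sketch. The variable names (`DATABASE_URL`, `LOG_LEVEL`, `WORKER_COUNT`) and their defaults are hypothetical; the point is only that required settings fail fast at startup and optional ones fall back to defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str = "INFO"
    worker_count: int = 4


def load_settings(environ=None) -> Settings:
    """Build Settings from environment variables (names are illustrative)."""
    env = os.environ if environ is None else environ
    if "DATABASE_URL" not in env:
        # Fail at startup rather than on the first database call.
        raise RuntimeError("DATABASE_URL must be set")
    return Settings(
        database_url=env["DATABASE_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),
        worker_count=int(env.get("WORKER_COUNT", "4")),
    )


settings = load_settings({"DATABASE_URL": "postgres://db.internal/orders", "WORKER_COUNT": "8"})
print(settings.worker_count)  # 8
```

Passing the environment in as a parameter keeps the function easy to test, while the default of `os.environ` is what the application would use when launched by the orchestrator.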
For more complex configuration, some additional engineering effort is required. There are three possibilities:
- If the orchestrator allows, specify configuration as metadata associated with the container. Kubernetes is the only one of the three orchestrators that has this feature (ConfigMap).
- With the application image as a base, use your CI/CD pipeline to build a new environment-specific image with the configuration baked in.
- Have the container’s entrypoint pull in configuration from some centralized storage before starting the application. This could be done by querying a database, Zookeeper, or Consul. A variation on this could be to build configuration management directly into the application.
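The last option, an entrypoint that pulls configuration before starting the application, could be sketched roughly as follows in Python. The `fetch` callable is a placeholder for a real client (a database query, Zookeeper, or Consul), and the `config/<service>/<environment>` key layout, the `SERVICE_NAME` and `DEPLOY_ENV` variables, and the output file name are all assumptions.

```python
import json
import os
import tempfile
from pathlib import Path


def render_config(fetch, service: str, environment: str, target: Path) -> dict:
    """Fetch settings for one service/environment pair and write them to disk."""
    # Hypothetical key layout; a real store would define its own hierarchy.
    settings = fetch(f"config/{service}/{environment}")
    target.write_text(json.dumps(settings, indent=2, sort_keys=True))
    return settings


def entrypoint(argv: list, fetch, target: Path) -> None:
    """Render the configuration, then replace this process with the application."""
    render_config(
        fetch,
        os.environ.get("SERVICE_NAME", "api"),
        os.environ.get("DEPLOY_ENV", "staging"),
        target,
    )
    os.execvp(argv[0], argv)  # does not return on success


# Usage with an in-memory stand-in for the central store.
fake_store = {"config/api/staging": {"cache_ttl_seconds": 30, "feature_x": True}}
config_path = Path(tempfile.mkdtemp()) / "app.json"
rendered = render_config(fake_store.__getitem__, "api", "staging", config_path)
print(config_path.read_text())
```

Using `os.execvp` means the application replaces the entrypoint process rather than running as its child, so signals sent by the orchestrator reach the application directly.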
Secrets
Secrets are a special type of configuration: they should only be visible to the service itself and, perhaps, some authorized parties. This data can include API tokens, passwords, database credentials, etc.
The main consequence of this additional requirement is that the usual configuration dissemination methods are not optimal. Specifically, using environment variables or baking the secrets directly into the image would potentially leak the secret information.
Kubernetes provides a specific solution to this (Secrets). Mesos and Docker Swarm would require some custom integration to achieve this. There are a number of ideas with their tradeoffs in this blog post.
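One common pattern is for the orchestrator to mount each secret as a file inside the container, which the application reads at startup instead of receiving the value through an environment variable or the image. The Python sketch below assumes that pattern. The `/run/secrets` default directory, the `SECRETS_DIR` override, and the `Secret` wrapper are illustrative choices, not something prescribed by any of the three orchestrators.

```python
import os
import tempfile
from pathlib import Path

DEFAULT_SECRETS_DIR = "/run/secrets"  # assumed mount point for this sketch


def read_secret(name: str, secrets_dir=None) -> str:
    """Read one secret from a file, stripping the trailing newline."""
    base = Path(secrets_dir or os.environ.get("SECRETS_DIR", DEFAULT_SECRETS_DIR))
    return (base / name).read_text().strip()


class Secret:
    """Wrapper that keeps the value out of logs and tracebacks."""

    def __init__(self, value: str):
        self._value = value

    def reveal(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "Secret(****)"


# Usage: a temporary directory stands in for the mounted secrets volume.
demo_dir = tempfile.mkdtemp()
(Path(demo_dir) / "db_password").write_text("s3cr3t\n")
password = Secret(read_secret("db_password", demo_dir))
print(password)  # Secret(****)
```

The wrapper does not make the value secure by itself; it only reduces the chance of the secret being printed by accident when objects are logged.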
Conclusion
Orchestrators are the future of deployment systems. Currently they are still the domain of early adopters and those operating at scales large enough that traditional deployment systems no longer work. However, there are several solutions that are quickly gaining ground and are quite usable today if you’re willing to put in some work.
Networking, storage, and configuration are just some of the functional requirements you should be aware of when considering an orchestrator. If you are in the process of selecting one, I hope you have found this post useful. There are plenty more factors to think about including:
- Installation and upgrade process
- Rate of change
… and many more.
Please check out LunchBadger for more information on how we’re using Kubernetes to empower companies to compose, manage, monitor, and monetize their APIs:
- Check out the demo to see how LunchBadger provides a Docker container based runtime for APIs that works natively in your cloud and also harnesses the simplicity of a serverless experience.
- Read about the features modeled after the API lifecycle as a solution from start to end, all on the same Docker container runtime.
- Register and join the FREE private beta and become an early participant to realize the simplicity and speed of having APIs work for you and your business.