For an app to function at its best, every part of the application stack needs to be optimized and modernized. Technologies like containers and container orchestration tools have brought this kind of modernization to the infrastructure layer of the stack. The way code is deployed is changing as applications are built and deployed in a more distributed manner. Microservices architecture has ushered in this type of decentralized approach to software delivery.
However, between the infrastructure and the code sits the layer that needs to function seamlessly: the networking layer. In modern containerized applications, much of the attention goes to the infrastructure and to packaging and deploying code; traditionally, networking has received far less.
This has been changing recently with the advent of the service mesh.
What Is a Service Mesh?
Network communication used to be pretty simple. The network relayed messages from client to server and back. You could easily trace the route a message took, count the few touchpoints across the network, and debug latency issues and errors without much trouble. A single monitoring tool like Nagios was enough.
In a containerized app, each application is made up of loosely coupled microservices. Each microservice is made up of multiple containers, or pods in the case of Kubernetes. Every request now touches multiple services, and to make things worse, each of these services is dynamic: containers are created and destroyed automatically as the system changes and as deployments are made.
Still, communication between these services needs to happen seamlessly, and this is the job of the service mesh.
A service mesh is a communication layer between services that handles east-west traffic between microservices. The service mesh matters in microservices because communication between distributed services is far more complex than within a monolith.
Despite its complexity, microservices architecture brings real advantages: better performance, finer control over each service, a system that adapts more readily to change, and visibility across the network. These advantages make the management overhead of a more complex system worthwhile.
A service mesh makes it easier to manage networking for microservices.
The Role of a Service Mesh
The most basic responsibility of a service mesh is to handle core networking tasks like load balancing and service discovery. Beyond this, a service mesh introduces advanced techniques like circuit breaking and fault injection, which help deliver the network performance a cloud-native application needs.
In a complex microservices system, failures are common, but what matters is the network’s ability to reroute, retry, proactively fail, and report on these failures.
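To make circuit breaking concrete, here is a minimal sketch in Go of the fail-fast behavior a mesh proxy applies on a service's behalf. In a real mesh this logic lives inside the proxy and is configured declaratively rather than written into application code; the Breaker type, its thresholds, and the cooldown here are all illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOpen is returned while the breaker is refusing calls.
var ErrOpen = errors.New("circuit breaker open")

// Breaker is a minimal circuit breaker: after maxFailures consecutive
// failures it "opens" and fails fast until the cooldown has elapsed.
type Breaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn, counting consecutive failures and failing fast while open.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrOpen // fail fast instead of hammering a struggling backend
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures == b.maxFailures {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0 // a success closes the breaker again
	return nil
}

func main() {
	b := NewBreaker(3, 2*time.Second)
	for i := 0; i < 5; i++ {
		err := b.Call(func() error { return errors.New("backend unavailable") })
		fmt.Println(err)
	}
}
```

After the third consecutive failure, the remaining calls return ErrOpen immediately, which is exactly the "proactively fail" behavior described above: the network reports the failure fast rather than letting requests pile up.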
Load balancing in a service mesh
In a cloud-native application, load balancing has to be dynamic, because performance varies across all the moving parts. The load balancer in a service mesh needs to consider the health of individual instances before sending requests to them. By holding back traffic or routing it around unhealthy instances, it helps avoid emergencies and provides more reliable service.
The load balancer may actively poll the service discovery component of the mesh for healthy instances, or it may passively react to failed requests, cutting off traffic to instances based on observed performance.
Apart from this, load balancing in a service mesh uses algorithms to decide how to route traffic across the network. In the past, routing was simple: round robin or random selection. Modern service meshes use load-balancing algorithms that account for latency and variable load on the backend instances.
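The sketch below contrasts the two approaches in Go. The Instance record is hypothetical (standing in for the data a mesh's service discovery keeps current), and the latency-aware picker uses the "power of two choices" strategy, popularized by proxies like Envoy, of sampling two healthy instances and keeping the faster one.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Instance is one backend endpoint as the load balancer sees it.
type Instance struct {
	Addr      string
	Healthy   bool
	LatencyMS float64 // e.g. an exponentially weighted moving average
}

// healthyOnly filters out instances that health checking has flagged.
func healthyOnly(all []Instance) []Instance {
	var out []Instance
	for _, in := range all {
		if in.Healthy {
			out = append(out, in)
		}
	}
	return out
}

// pickRoundRobin rotates through healthy instances, ignoring load.
// (Both pickers assume at least one healthy instance exists.)
func pickRoundRobin(all []Instance, counter *int) Instance {
	healthy := healthyOnly(all)
	in := healthy[*counter%len(healthy)]
	*counter++
	return in
}

// pickLeastLatency samples two healthy instances at random and keeps the
// faster one, so routing adapts to variable backend load.
func pickLeastLatency(all []Instance) Instance {
	healthy := healthyOnly(all)
	a := healthy[rand.Intn(len(healthy))]
	b := healthy[rand.Intn(len(healthy))]
	if a.LatencyMS <= b.LatencyMS {
		return a
	}
	return b
}

func main() {
	backends := []Instance{
		{Addr: "10.0.0.1:8080", Healthy: true, LatencyMS: 12},
		{Addr: "10.0.0.2:8080", Healthy: false, LatencyMS: 5}, // skipped: unhealthy
		{Addr: "10.0.0.3:8080", Healthy: true, LatencyMS: 40},
	}
	counter := 0
	fmt.Println("round robin:", pickRoundRobin(backends, &counter).Addr)
	fmt.Println("least latency:", pickLeastLatency(backends).Addr)
}
```

Note how both pickers consult health first; the latency-aware one simply adds a second signal on top of the same discovery data.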
Service discovery in a service mesh
Service discovery is the process of identifying new instances as they are created and keeping a record of instances that are removed from the network. This record is vital for load balancing to function, as requests are processed only by healthy and available backend instances.
In a dynamic microservices application, service discovery should happen automatically. This is done by having the tool responsible for starting and stopping instances report every such event. In Kubernetes, the ReplicationController is responsible for instance lifecycles.
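As a sketch of the record a service mesh keeps, here is a minimal in-memory registry in Go. The Registry type and its explicit Register/Deregister calls are hypothetical; a production mesh would instead populate this record by watching the orchestrator's API (for example, Kubernetes endpoint events) so that every start and stop is reported automatically.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a minimal service-discovery record: the orchestrator reports
// every instance it starts or stops, and the load balancer reads the list.
type Registry struct {
	mu        sync.RWMutex
	instances map[string]map[string]bool // service name -> set of addresses
}

func NewRegistry() *Registry {
	return &Registry{instances: make(map[string]map[string]bool)}
}

// Register records a new instance, e.g. when the orchestrator starts a pod.
func (r *Registry) Register(service, addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.instances[service] == nil {
		r.instances[service] = make(map[string]bool)
	}
	r.instances[service][addr] = true
}

// Deregister removes an instance, e.g. when a pod is destroyed.
func (r *Registry) Deregister(service, addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.instances[service], addr)
}

// Lookup returns the currently known addresses for a service.
func (r *Registry) Lookup(service string) []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	var addrs []string
	for addr := range r.instances[service] {
		addrs = append(addrs, addr)
	}
	return addrs
}

func main() {
	reg := NewRegistry()
	reg.Register("orders", "10.0.0.1:8080")
	reg.Register("orders", "10.0.0.2:8080")
	reg.Deregister("orders", "10.0.0.1:8080") // pod destroyed
	fmt.Println(reg.Lookup("orders"))         // only live instances remain
}
```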
Sidecar proxy
Traditionally the load balancer sat between client and server, or was embedded in a client-side library. Advanced service meshes instead attach a sidecar proxy to every service instance, effectively giving each client its own local load balancer. This ensures every client gets equal access to load balancing regardless of language or framework, and it avoids the single point of failure that was the biggest drawback of a traditional load balancer.
The sidecar proxy has become the preferred way of implementing a service mesh for a distributed system.
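At its core, a sidecar is just a local proxy that all of an instance's traffic flows through. Below is a minimal HTTP sidecar in Go built on the standard library's reverse proxy; the ports and the upstream address are assumptions for illustration, and a real mesh proxy such as Envoy or Linkerd adds retries, metrics, mutual TLS, and tracing at exactly this interception point.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// The application container listens on localhost:9000 (assumed); the
	// sidecar accepts all inbound traffic on :8080 and forwards it. This
	// interception point is where a mesh proxy layers in its features.
	app, err := url.Parse("http://127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(app)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log.Printf("proxying %s %s", r.Method, r.URL.Path) // per-request visibility
		proxy.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":8080", handler))
}
```

Because the proxy runs next to every instance rather than in one central place, losing a single proxy affects only its own instance, which is precisely how the sidecar pattern eliminates the central load balancer's single point of failure.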
Monitoring the Service Mesh
Visibility is key to successful networking for cloud-native applications, and a service mesh enables monitoring in multiple ways. It provides network performance metrics like latency, bandwidth, and uptime at every level of the stack (hosts, containers, pods, and clusters), and it provides detailed event logging to help with troubleshooting.
Distributed tracing is a key part of this visibility: each request is assigned an ID, and the path it takes through the network is recorded hop by hop. Using this, you can tell which parts of the network, or which instances, are slow or unresponsive and understand what needs fixing.
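Here is a stripped-down sketch of the propagation mechanism in Go, assuming a plain X-Request-Id header (real meshes use richer standards such as B3 or W3C Trace Context). Middleware assigns an ID on entry, reuses one already set by an upstream hop, and passes it along so every hop's logs can be stitched into a single trace.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
)

// headerRequestID is illustrative; real meshes use B3 or W3C traceparent.
const headerRequestID = "X-Request-Id"

// withRequestID assigns an ID to each incoming request (or keeps the one an
// upstream service already set) so the request's path can be reconstructed
// from the logs of every hop it touched.
func withRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get(headerRequestID)
		if id == "" {
			buf := make([]byte, 8)
			rand.Read(buf)
			id = hex.EncodeToString(buf)
		}
		r.Header.Set(headerRequestID, id) // propagate to downstream calls
		log.Printf("request_id=%s %s %s", id, r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

func main() {
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", withRequestID(hello)))
}
```

In a mesh, the sidecar proxy does this propagation for you, which is why every service along a request's path shows up in the trace without application changes.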
With the increased complexity of a microservices application, you can't simply reproduce an error on a single instance. You need powerful monitoring tools to follow the path of requests and identify all the problem areas (there will often be more than one).
Service Mesh Tooling
The two most prominent service mesh tools today are Linkerd and Istio. Linkerd was the first tool to take a service mesh approach to networking and has gained wide adoption in many production workloads. Istio, though released over a year later, adds a further management layer to distributed networking.
Istio sees other service mesh tools as data planes and positions itself as a combination of a data plane and a control plane. Istio uses Envoy, another popular proxy comparable to Linkerd, as its default data plane, though the tools are compatible enough that Istio can use Linkerd as its data plane as well. What Istio brings to the table is advanced policy-based management and an abstraction layer that enables an even more powerful distributed approach to networking.
Taking a different route, Buoyant has recently announced Conduit, which focuses exclusively on Kubernetes. It is a lightweight, simpler alternative to the more feature-heavy Linkerd, aimed at organizations that are all-in on Kubernetes and want a quick way to get started and easy management thereafter.
Security is key to networking, and one tool that is leveraging policy-based security is Project Calico. Rather than relying on a perimeter firewall around the entire application, as was the case with monoliths, Calico helps create micro-firewalls around each service within a microservices application. It then gives you fine-grained management controls to enforce security policies that isolate each service from the others. This way, even if one service is compromised, the others remain intact.
As we move from monoliths to microservices, how we manage the application network will be critical to success. The service mesh is a superior alternative to traditional networking models, and understanding it should be foundational when working with modern cloud-native applications.