Kubernetes – so much hype! Let’s take a look at the origin of the word ‘Kubernetes’. It comes from ‘κυβερνήτης’ (kubernētēs), the Greek word for ‘helmsman’ or ‘sailing master’. In other words, when you sail the rough seas of servers, services, and the wild internet, it is crucial to have an excellent helmsman to cruise safely.
Kubernetes is a layer over standard virtual machines and their operating systems, in much the same way that a hypervisor is a layer over hardware whose end result is a virtual machine. With Kubernetes, and Docker containers as the basic unit, we abstract away virtual machines and servers for the benefit of applications. Applications are what we primarily focus on; in other words, they are at the core of our business – IoT, in our case.
Containers are popular for packaging applications because they are portable across OS distributions, they are lightweight, they provide resource isolation and efficient resource utilization, and they make CI/CD easier. That is exactly why we use them at t-matix as well.
Kubernetes builds on technology proven at Google, going back to when containers were still in their diapers. Google’s internal cluster manager was called Borg (resistance is futile for Trekkies), and when Google created an open-source system based on the lessons learned from Borg, it got a much friendlier name – Kubernetes.
So, what are the benefits of Kubernetes? There is a plethora of information on this, and I will try to keep it as simple as possible.
Kubernetes makes deployment easy, but that is not all it offers:
- Container orchestrator – run containers in a cluster and don’t think about the server; instead, focus on the application.
- Service discovery and load balancing – services are reachable by name through the cluster’s internal DNS, and traffic to a service is load-balanced across its pods, which helps under high load.
- Automated rollouts and rollbacks – you can use canary deployments or similar strategies, and if something goes wrong, simply roll back to a previous version with minimal downtime.
- Self-healing – if a container crashes, Kubernetes automatically restarts it.
- Cloud support – Kubernetes runs on almost any popular cloud provider, such as AWS, GCP, or Azure, all of which offer integrated Kubernetes services.
- Want more? Visit the Kubernetes website: https://kubernetes.io.
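Here is a small illustration of the service discovery point. Each Kubernetes Service gets a DNS name of the form `<service>.<namespace>.svc.<cluster-domain>`, and the cluster domain is `cluster.local` unless the cluster was configured otherwise. The helper below only assembles that name. The service name `timeseries` is a hypothetical example.

```python
def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Return the cluster-internal DNS name of a Kubernetes Service.

    Pods in the same namespace can also use the short name (just `service`);
    pods in other namespaces need at least `service.namespace`.
    """
    if not service or not namespace:
        raise ValueError("service and namespace must be non-empty")
    return f"{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # Hypothetical time series service living in the 'reporting' namespace.
    print(service_fqdn("timeseries", "reporting"))
    # prints: timeseries.reporting.svc.cluster.local
```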
All of that is why we use Kubernetes here at t-matix: it is a proven technology with a wide range of benefits and a healthy community. This was a short, and hopefully concise, introduction to what Kubernetes has to offer.
How do we use it?
Our IoT system was missing one big piece: an efficient reporting subsystem. So we started a completely new project. Everything was developed from scratch: the codebase was separated from the old one, a new architecture was put in place, and fresh ideas and energy pushed the old system forward. The “Reporting” project was designed with microservices in mind, with Kafka as the central bus and storage and Postgres as separate storage.
It was a perfect opportunity to introduce Kubernetes as the platform for the reporting part of the system. Every microservice was packaged into a Docker image and pushed to a private registry. We in the Infrastructure team then created a series of YAML files for deploying the reporting system.
Image 1. A small part of the system running on the Kubernetes cluster
The diagram shows a small part of the system running on the Kubernetes cluster. Yellow circles are microservices running in containers in the cluster. These easily scalable services can crunch enormous amounts of data and store it in a database for analytics. The microservices read data from Kafka, the main bus, and one additional microservice uses logical replication to get data from the primary database.
We used eksctl (https://eksctl.io), a neat little tool written in Go, to create and manage the Kubernetes cluster on AWS. Then we created a separate namespace with a very creative name – ‘reporting’. After that, we deployed service by service until everything was running and crunching data. Scaling was as trivial as changing a single number in a YAML file.
Here is an example of a deployment file for the time series microservice:
The previous YAML contains many different keys, but a keen eye will notice the replicas key. It defines how many pods (the smallest deployable unit in Kubernetes) should run concurrently. We run 2 pods, but we can quickly scale to 5, 10, or 100 if needed: just change the number and apply the change to the Kubernetes cluster.
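The original deployment file is not reproduced here. What follows is only a minimal sketch of what such a Deployment can look like, built in Python and printed as JSON. JSON is valid YAML, so kubectl accepts it just like a YAML file. The image `registry.example.com/timeseries:1.0.0`, the `app` label, and the container port are hypothetical placeholders. `replicas` is the key discussed above.

```python
import json


def deployment(name: str, image: str, replicas: int = 2,
               namespace: str = "reporting") -> dict:
    """Build a minimal apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # How many identical pods Kubernetes keeps running at all times.
            "replicas": replicas,
            # The Deployment manages every pod whose labels match this selector...
            "selector": {"matchLabels": labels},
            "template": {
                # ...so the pod template must carry the same labels.
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image,
                         "ports": [{"containerPort": 8080}]}
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image in a private registry.
    manifest = deployment("timeseries", "registry.example.com/timeseries:1.0.0")
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it creates or updates the Deployment. Changing `replicas` and applying again scales it.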
Here is an example of a command:
It is not necessary to change a YAML file – we can easily run the following to achieve the same effect:
However, it is not recommended to use both methods, because we want a single source of truth: our YAML files should always represent the actual state of the system. Scaling with a command changes the cluster but not the file, so the two drift apart.
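The two approaches correspond to the command lines sketched below. They are built in Python only to keep them inspectable; nothing is executed. `kubectl apply -f` and `kubectl scale --replicas` are standard kubectl commands. The file name, the deployment name `timeseries`, and the `has_drifted` helper are hypothetical. The drift check illustrates why mixing both methods undermines a single source of truth.

```python
import shlex


def apply_cmd(manifest_path: str) -> list[str]:
    """Declarative: make the cluster match whatever the manifest file says."""
    return ["kubectl", "apply", "-f", manifest_path]


def scale_cmd(deployment: str, replicas: int, namespace: str) -> list[str]:
    """Imperative: change the live replica count without touching any file."""
    return ["kubectl", "scale", f"deployment/{deployment}",
            f"--replicas={replicas}", "-n", namespace]


def has_drifted(declared_replicas: int, live_replicas: int) -> bool:
    """True when the cluster no longer matches the YAML kept in the repository."""
    return declared_replicas != live_replicas


if __name__ == "__main__":
    # Hypothetical file and deployment names; nothing is executed here.
    print(shlex.join(apply_cmd("timeseries.yaml")))
    print(shlex.join(scale_cmd("timeseries", 5, "reporting")))
    # The file still declares 2 replicas, the cluster now runs 5:
    print(has_drifted(declared_replicas=2, live_replicas=5))  # prints: True
```

In a workflow where the files are the source of truth, only the first form is used. The second form is handy in an emergency, but the file then has to be updated to match.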
This principle led us to the next level – GitOps. If the state in Git always matches the state of the Kubernetes cluster, Git itself becomes the single source of truth.
But before actually dealing with GitOps, we needed eyes and ears on the cluster. Software is software – it crashes, writes logs, restarts, etc. With Kubernetes, you can quickly fetch a pod’s log and follow it with the command:
Neat, right? But what if the command kubectl get pods -A | wc -l returns more than 100 pods?
You cannot realistically tail logs pod by pod. It simply does not scale.
We needed central logging.
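A quick way to see the scale of the problem is to count pods per namespace. `kubectl get pods -A -o json` returns a `List` object whose `items` each carry `metadata.namespace`. The sketch below parses that JSON. The sample pod names are made up.

```python
import json
from collections import Counter


def pods_per_namespace(kubectl_json: str) -> Counter:
    """Count pods per namespace in the output of `kubectl get pods -A -o json`."""
    pod_list = json.loads(kubectl_json)
    return Counter(item["metadata"]["namespace"] for item in pod_list.get("items", []))


if __name__ == "__main__":
    # A trimmed-down, made-up sample of what kubectl returns.
    sample = json.dumps({
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            {"metadata": {"name": "timeseries-7d4f9-abcde", "namespace": "reporting"}},
            {"metadata": {"name": "timeseries-7d4f9-fghij", "namespace": "reporting"}},
            {"metadata": {"name": "coredns-5d78c-xyz12", "namespace": "kube-system"}},
        ],
    })
    counts = pods_per_namespace(sample)
    print(sum(counts.values()), dict(counts))
    # prints: 3 {'reporting': 2, 'kube-system': 1}
```

Every one of those pods would need its own `kubectl logs -f` session, which is exactly what central logging avoids.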
Central logging
After a short brainstorming session, we decided to use Graylog in combination with Fluent Bit. Deciding where to run Graylog was easy: in the Kubernetes cluster itself. We created a graylog namespace to keep it separate from reporting. We deployed Fluent Bit as a DaemonSet (one pod per node) to forward logs to Graylog. The Graylog and Elasticsearch nodes, with their persistent disks, were deployed as StatefulSets.
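What makes a DaemonSet suitable for log forwarding is that it runs one pod on every node, and each pod can read that node’s log files. The sketch below builds a minimal DaemonSet of that kind. It is not the manifest used here. The `fluent/fluent-bit:1.3` tag is a placeholder. The sketch deliberately leaves out the Fluent Bit configuration itself (inputs, the Graylog output) and any RBAC the forwarder may need.

```python
import json


def log_forwarder_daemonset(image: str, namespace: str = "graylog") -> dict:
    """Build a minimal DaemonSet that runs one log-forwarding pod per node."""
    labels = {"app": "fluent-bit"}
    host_paths = {
        # Per-container log files written on each node.
        "varlog": "/var/log",
        # With the Docker runtime, /var/log/containers holds symlinks into here.
        "dockercontainers": "/var/lib/docker/containers",
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "fluent-bit", "namespace": namespace},
        "spec": {
            # No 'replicas' here: the pod count follows the number of nodes.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "fluent-bit",
                        "image": image,
                        "volumeMounts": [
                            {"name": vol, "mountPath": path, "readOnly": True}
                            for vol, path in host_paths.items()
                        ],
                    }],
                    "volumes": [
                        {"name": vol, "hostPath": {"path": path}}
                        for vol, path in host_paths.items()
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Image tag is a placeholder; pick the Fluent Bit version you actually run.
    print(json.dumps(log_forwarder_daemonset("fluent/fluent-bit:1.3"), indent=2))
```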
That was a lesson for us – every pod (or namespace) should have defined CPU and RAM limits. If you don’t define limits, a single misbehaving pod can consume a node’s resources and bring down the whole cluster. So we added CPU and memory limits, and the cluster was stable again.
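A simple safeguard is to check manifests for missing limits before they are applied. The sketch below is a hypothetical helper, not part of the setup described here. It lists containers that lack a CPU or memory limit under `resources.limits`. The limit values in the example are illustrative only.

```python
def containers_missing_limits(manifest: dict) -> list[str]:
    """List containers in a workload manifest that lack CPU or memory limits.

    Works for Deployments, StatefulSets and DaemonSets (pod spec under
    spec.template.spec) and for bare Pods (pod spec directly under spec).
    """
    spec = manifest.get("spec", {})
    pod_spec = spec.get("template", {}).get("spec", spec)
    problems = []
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                problems.append(f"{container['name']}: no {resource} limit")
    return problems


if __name__ == "__main__":
    # Example values only; real limits depend on the workload.
    limited = {"name": "graylog",
               "resources": {"requests": {"cpu": "500m", "memory": "1Gi"},
                             "limits": {"cpu": "1", "memory": "2Gi"}}}
    unlimited = {"name": "elasticsearch"}
    statefulset = {"kind": "StatefulSet",
                   "spec": {"template": {"spec": {"containers": [limited, unlimited]}}}}
    for problem in containers_missing_limits(statefulset):
        print(problem)
    # prints:
    # elasticsearch: no cpu limit
    # elasticsearch: no memory limit
```

At the namespace level, a LimitRange object can apply default limits to containers that do not set their own.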
But we did more than that – we created a separate node group just for the central logging and monitoring systems. In the deployment YAML, the nodeSelector field defines which node group a pod should run on.
We had already used eksctl to create a node group before; this time, we defined it in the following YAML:
With this YAML, we defined the node group: instance types, name, additional policies, and maximum and minimum size. To create the node group, we executed the following:
The result was two autoscaling groups of servers on AWS – ready to scale on command.
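The original node group YAML is not shown above. The sketch below builds an eksctl `ClusterConfig` (schema `eksctl.io/v1alpha5`) with one extra node group and prints it as JSON. A YAML parser reads JSON as well. The cluster name, region, instance type, sizes, and the `role: monitor-workers` label are hypothetical. The inline comment about listing the default node policies next to extra ones reflects our understanding of eksctl’s behaviour when `attachPolicyARNs` is set. Check it against the eksctl documentation for your version.

```python
import json


def monitoring_nodegroup_config(cluster: str, region: str) -> dict:
    """eksctl ClusterConfig describing one extra node group for logging/monitoring."""
    return {
        "apiVersion": "eksctl.io/v1alpha5",
        "kind": "ClusterConfig",
        "metadata": {"name": cluster, "region": region},
        "nodeGroups": [{
            "name": "monitor-workers",
            "instanceType": "m5.large",
            # Bounds of the underlying EC2 autoscaling group.
            "minSize": 1,
            "maxSize": 3,
            "desiredCapacity": 2,
            # Kubernetes labels put on every node; pods target them via nodeSelector.
            "labels": {"role": "monitor-workers"},
            "iam": {
                # With attachPolicyARNs set, the default node policies are listed
                # explicitly; any additional policy ARNs would be appended here.
                "attachPolicyARNs": [
                    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
                    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
                    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
                ],
            },
        }],
    }


if __name__ == "__main__":
    # Hypothetical cluster name and region.
    print(json.dumps(monitoring_nodegroup_config("reporting-cluster", "eu-central-1"),
                     indent=2))
```

The saved file can then be passed to `eksctl create nodegroup --config-file=<file>`.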
All Graylog pods have their nodeSelector set to “monitor-workers.” This way, we ensure that only the central logging pods and additional monitoring pods, such as Prometheus and the TICK stack, run on that group of servers in the cluster. A separate group of servers (node group) is dedicated to reporting only.
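The scheduling rule behind nodeSelector is simple: a pod may only be placed on a node whose labels contain every key/value pair of the selector. The sketch below models just that rule. The `role` label key and the node names are hypothetical.

```python
def node_matches(node_labels: dict[str, str], node_selector: dict[str, str]) -> bool:
    """nodeSelector semantics: every key/value pair must appear in the node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def eligible_nodes(nodes: dict[str, dict[str, str]],
                   node_selector: dict[str, str]) -> list[str]:
    """Names of the nodes a pod with this nodeSelector could be scheduled on."""
    return sorted(name for name, labels in nodes.items()
                  if node_matches(labels, node_selector))


if __name__ == "__main__":
    # Hypothetical nodes from two node groups.
    nodes = {
        "ip-10-0-1-10": {"role": "monitor-workers"},
        "ip-10-0-2-20": {"role": "reporting-workers"},
    }
    print(eligible_nodes(nodes, {"role": "monitor-workers"}))  # prints: ['ip-10-0-1-10']
    print(eligible_nodes(nodes, {}))  # no selector: every node is eligible
```

nodeSelector only attracts pods to nodes. A pod without a selector may still land on a monitoring node. Keeping a node group exclusive therefore also needs selectors on the other workloads, or taints on those nodes.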
The future – GitOps
I have already mentioned the need for a single source of truth for the Kubernetes cluster and the IoT system deployment. We managed to store our state in YAML files, and we can now easily recreate the whole cluster with all microservices in minutes (15 minutes, to be precise). And we did not recreate just the reporting subsystem. We recreated all services: live processing, workers, web portals, the reporting subsystem, Redis, RabbitMQ, etc.
With all these YAML files, we are ready for Flux (https://fluxcd.io). Flux implements the core of the GitOps principle: when something is merged into the master or production branch, Flux automatically picks it up and applies it to the Kubernetes cluster. A simple git push initiates a deployment. That is also the risky part, which is why access to push or merge into the master or production branch on GitLab should be restricted – very restricted.
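Here is a heavily simplified sketch of the comparison a GitOps tool such as Flux keeps repeating: what Git declares versus what the cluster runs. It is not Flux’s code or API. Objects are reduced to hypothetical `(kind, namespace, name)` keys with small spec dicts. Real tools must also ignore server-populated fields, such as status, when comparing. Whether to delete objects that disappeared from Git is a policy choice, so the sketch makes it opt-in.

```python
Key = tuple[str, str, str]  # (kind, namespace, name) identifies an object


def reconcile_plan(desired: dict[Key, dict], live: dict[Key, dict],
                   prune: bool = False) -> dict[str, list[Key]]:
    """Diff the state declared in Git against the live cluster state."""
    plan: dict[str, list[Key]] = {
        # In Git but not in the cluster: must be created.
        "create": sorted(k for k in desired if k not in live),
        # In both, but the declared spec differs from the live one: must be updated.
        "update": sorted(k for k in desired if k in live and desired[k] != live[k]),
        "delete": [],
    }
    if prune:
        # In the cluster but no longer in Git: deleted only when pruning is enabled.
        plan["delete"] = sorted(k for k in live if k not in desired)
    return plan


if __name__ == "__main__":
    desired = {  # parsed from the YAML files on the production branch
        ("Deployment", "reporting", "timeseries"): {"replicas": 5},
        ("Service", "reporting", "timeseries"): {"port": 80},
    }
    live = {  # what the cluster currently runs
        ("Deployment", "reporting", "timeseries"): {"replicas": 2},
        ("Deployment", "reporting", "legacy-export"): {"replicas": 1},
    }
    print(reconcile_plan(desired, live, prune=True))
```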
Everything is stored in Git except for one crucial part of any system – the secrets: usernames, passwords, certificates, private keys, and everything else that must not be committed to Git.
Or so one would think. There is, however, the option of public/private key encryption.
The main idea is to create a public/private key pair. The public key is available to all developers, sysadmins, DevOps, and infrastructure people. With the public key, we encrypt secrets, and we can then safely commit and push them to GitLab. Only the holder of the private key can decrypt them.
To continue with the idea, we deploy a Docker container that holds the private key. At runtime, it intercepts the deployment YAML, decrypts the secrets, and stores them as Kubernetes Secrets or ConfigMaps, whichever is preferable. That way, we have middleware that lets us fully adopt the GitOps principle and way of working.
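The idea can be illustrated with a deliberately toy example: textbook RSA with the tiny demo primes 61 and 53, encrypting a secret byte by byte. This is NOT secure and is not how SealedSecrets works internally. Production tools rely on properly padded RSA combined with symmetric encryption. The secret name `db-credentials` and the value `s3cr3t` are hypothetical. The last step shows what the in-cluster decryptor would write: a plain Kubernetes Secret, whose `data` values are only base64-encoded, not encrypted.

```python
import base64

# Textbook RSA with tiny, well-known demo primes. INSECURE: illustration only.
P, Q = 61, 53
N = P * Q                          # public modulus (3233)
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (2753); Python 3.8+


def encrypt(plaintext: str) -> list[int]:
    """Anyone holding the public key (N, E) can do this and commit the result."""
    return [pow(byte, E, N) for byte in plaintext.encode("utf-8")]


def decrypt(ciphertext: list[int]) -> str:
    """Only the holder of the private exponent D can undo the encryption."""
    return bytes(pow(c, D, N) for c in ciphertext).decode("utf-8")


def to_k8s_secret(name: str, namespace: str, values: dict[str, str]) -> dict:
    """Plain Secret as the decryptor would store it: base64-encoded, not encrypted."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {k: base64.b64encode(v.encode("utf-8")).decode("ascii")
                 for k, v in values.items()},
    }


if __name__ == "__main__":
    sealed = encrypt("s3cr3t")  # what would be committed to Git in this toy model
    secret = to_k8s_secret("db-credentials", "reporting",
                           {"password": decrypt(sealed)})
    print(secret["data"]["password"])  # prints: czNjcjN0
```

Anyone can decode `czNjcjN0` back to the password. That is why the decrypted Secret has to stay inside the cluster and only the encrypted form belongs in Git.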
Fortunately, there is a ready-made tool for that called SealedSecrets – https://github.com/bitnami-labs/sealed-secrets
Since this is still our future, we do not use GitOps fully yet; we are still experimenting with it. No SealedSecrets – yet. 🙂
As soon as we implement this process, we will share our experience here!
December 18, 2019