In an IoT platform, consisting of a [not so simple] set of services/units that work together, there is a necessity for intermediate storage/queue of records as well as event-based real-time processing of data. That is where Apache Kafka comes into picture, helping us solve all connected issues.
# Introduction to Kafka
Before going into the usage of Kafka here at t-matix, here are some basics about Apache Kafka. Kafka is a distributed streaming platform that keeps a distributed, scalable commit log of records.
Image 1. Screenshot taken from https://www.slideshare.net/KaiWaehner/rethinking-stream-processing-with-apache-kafka-kafka-streams-and-ksql
Messages are organized into topics that are persisted to disk and replicated to other servers for fault-tolerance. Producers write records to topics, while consumers read records from them. There can be multiple consumers subscribed to one topic, and each one of them keeps track where it stopped with consuming the topic.
This is really just some basic information about Kafka. More details on how Kafka works and other important concepts such as topic partitions, offsets, consumer groups etc., can be found in the company’s official documentation.
# Kafka in reporting part of the platform
The reporting part of the platform is responsible for preparing processed data for specific reports for easier and faster access. Having in mind that there are a few report types we support, each having different requirements, we decided to implement a microservice infrastructure. Among other benefits, it helped us to achieve loose coupling, independent deployability and scalability of different services.
However, this brought about another set of issues, the biggest being the question of how to move data between services, which we overcame by deploying Kafka in our processing pipeline. There are, of course, other technologies we could use to solve these problems, but we chose Kafka over them because it is horizontally scalable, fault-tolerant, fast, free and runs in production in thousands of companies. We were not able to find any other service having all of these characteristics.
The picture above shows a simplified view of how microservices collaborate together, and where and how Kafka topics are positioned between them. Apart from using Kafka to store data in a fault-tolerant way (using replication), we also use it to store data in two ways based on the retention of records: permanent and temporary (i.e. as a temporary queue).
## Usage of permanent topics
We are aware that people are skeptical about using Kafka as permanent storage, but this is mostly based on the wrong assumption that Kafka is just a “messaging queue”; and everybody knows you shouldn’t store data in a messaging queue.
Data in Kafka is persisted to disk, checksummed, and replicated for fault tolerance. Accumulating more stored data doesn’t make it slower, and that’s pretty much what you need from permanent storage.
However, storing data in Kafka works really well because it has been designed to store data like this. Data in Kafka is persisted to disk, checksummed, and replicated for fault tolerance. Accumulating more stored data doesn’t make it slower, and that’s pretty much what you need from permanent storage. [More info on this can be found here]. The topics we use as permanent storage, or better to say source-of-truth for our reports, are the entrance to our microservice ecosystem. When we need to create a new instance of a specific report, we read the data from our source-of-truth topic, process it, and write it to the temporary topic for further processing and storing. This enables us to load data from the beginning for every new instance we create, and after that just stream the new incoming data.
The topics we use as permanent storage, or better to say source-of-truth for our reports, are the entrance to our microservice ecosystem. When we need to create a new instance of a specific report, we read the data from our source-of-truth topic, process it, and write it to the temporary topic for further processing and storing. This enables us to load data from the beginning for every new instance we create, and after that just stream the new incoming data.
## Usage of temporary topics
As mentioned before, temporary topics hold data for one report instance and they only serve as a queue for further processing and storing in a database. Because of that, the retention of these topics is set to 3 days, which is long enough to give us time to debug and test data if something goes wrong, while it doesn’t take that much space in our Kafka cluster.
All these topics are created with an identical prefix, which allows us to use one really useful Kafka consumer feature, and that is reading data from multiple topics based on regular expression. With this feature we can use only one consumer to consume records from all temporary topics.
Using one topic for each report instance enables us to decouple the data and that gives us the flexibility we need. For instance, it allows us to delete specific topics when needed, when a report instance has been deleted.
Kafka can help you solve different problems/obstacles in various ways. Do not be afraid to use Kafka as a fundamental part of your infrastructure, either as storage, queue, or streaming engine.
Different parts of our platform use Kafka in different ways, but the result is always the same: it works, really, really well. And for the foreseeable future, it will be an integral part of our platform, hopefully even a bigger part in some later stages.
December 18, 2019
Continuosly building, testing, releasing and monitoring t-matix mobile apps