It was a rainy morning two months ago, and I was the first one in the office. Our scheduler algorithm grabbed my attention before I could even take a sip of coffee! The graphs on the wall-mounted office monitor showed far too much noise. In short, the distribution of devices was unbalanced. Denis, our technical lead, had just come into the office. He was the brain behind our current scheduler, so I hoped that he could help me decide who should be given the task of rewriting the scheduling algorithm. We agreed that Antonija Samaržija was the right person for the job. Even though she had never performed this kind of task before, we placed our confidence in her because of her strong background in math and her demonstrated ability to turn that knowledge into practical algorithms.
It was a rough start! Antonija lacked hands-on experience, so she had to learn the ropes with some guidance from a more experienced colleague. Her mentor, Denis Koronić, offered a helping hand. But it wasn’t an easy task. After a while, the devices were better distributed, but they were being switched between workers too often, which took a heavy toll on our databases; they were getting overloaded. Besides, Antonija had to learn by trial and error. Just when she thought she had found the solution, things would fall apart. Many factors had to be taken into account in the process, and a single deviation could lead to a cascading failure. Overall, it wasn’t easy to move the needle.
However, during our status update meetings, I realized that Denis and Antonija were on the same page and that there was a good chance of improving the way we scheduled jobs at the time. I am glad that Denis was patient with his apprentice because, after a while, she acquired the skills that made her more independent, so she could work better and faster. Finally, Antonija approached me last month because she wanted to test the new scheduler. Previous attempts to create a more stable and effective scheduler had always brought her back to square one, so this time she kept her expectations low.
To perform the test, we deployed the new scheduler in the middle of the night for a short period. The initial measurements were very promising, but we needed a heavier load to verify that it would hold up. We gradually increased the duration of the trials until we were convinced that it was stable enough for the infamous Monday morning. The Monday three weeks ago was our first chance to run that test. At 6:25, we deployed the new scheduler and then waited tensely for the results.
Figure 1. The left side of the image shows the old scheduler. The right side shows the state after deploying the new scheduler.
In a nutshell, the new scheduler introduced the concept of “simple” round-robin scheduling of devices between workers, which schedules only the devices that have work to be done. The new algorithm takes different factors into consideration when deciding whether or not to switch a device to another worker.
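To make the idea more concrete, here is a minimal sketch in Python. It is not our production code: the function name `schedule`, the `slack` parameter, and the single "keep a device where it was unless that worker is full" rule are assumptions made purely for illustration, standing in for the several factors the real algorithm weighs.

```python
from typing import Dict, List, Optional


def schedule(
    devices_with_work: List[str],
    workers: List[str],
    previous: Optional[Dict[str, str]] = None,
    slack: int = 1,
) -> Dict[str, str]:
    """Assign every device that has pending work to a worker.

    Hypothetical rule set (an assumption, not the real scheduler):
    a device stays on its previous worker unless that worker already
    holds more than its fair share plus `slack` devices; everything
    else is handed out round-robin.
    """
    if not workers:
        raise ValueError("at least one worker is required")
    previous = previous or {}
    load = {worker: 0 for worker in workers}
    # Ceiling division: the largest share a worker gets in a perfect split.
    fair_share = -(-len(devices_with_work) // len(workers))
    limit = fair_share + slack

    assignment: Dict[str, str] = {}
    unplaced: List[str] = []

    # Pass 1: avoid switching. Keep a device where it was if there is room.
    for device in devices_with_work:
        worker = previous.get(device)
        if worker in load and load[worker] < limit:
            assignment[device] = worker
            load[worker] += 1
        else:
            unplaced.append(device)

    # Pass 2: plain round-robin over the workers, skipping full ones.
    cursor = 0
    for device in unplaced:
        for _ in range(len(workers)):
            worker = workers[cursor % len(workers)]
            cursor += 1
            if load[worker] < limit:
                assignment[device] = worker
                load[worker] += 1
                break

    return assignment
```

Devices without pending work are simply left out of `devices_with_work`, so they never occupy a slot on a worker. Passing the previous assignment back in on the next run is what keeps devices from bouncing between workers, which in our case was the behavior that overloaded the databases.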
For us, it was a win-win situation. t-matix got a better-optimized distribution of processing work on fewer resources, while Antonija completed a difficult task and built up some self-confidence. As far as I am concerned, being an early riser proved to be the right call.
December 18, 2019