My solution for designing a resilient system in an event-driven pipeline

Mobin Shaterian · Published in DataDrivenInvestor · 3 min read · Sep 12, 2022


I designed an event-driven architecture for our fintech software. It consists of three parts. Part one handles a synchronous process, and parts two and three run asynchronous operations. After responding to the client, part one sends the final result to parts two and three through Kafka (a message broker). In part two, I settle with the customer, so delivering the data from part one to part two is essential for the business. However, the DevOps team couldn't keep Kafka online, and sometimes Kafka was down. I know Kafka itself persists data and is resilient, but while Kafka is down we can still lose this vital data.

event-driven pipeline

As shown in the picture above, if Kafka is down, parts two and three don't receive any data. So I need a resilient system to recover the lost data.

Solutions

I came up with several solutions to this problem, but I don't know exactly which one is best.

Debezium

Debezium is one of the best CDC (change data capture) tools for handling this problem. Debezium reads changes from the database's transaction log, and if Kafka stops or something bad happens to it, Debezium simply pauses and resumes once Kafka is back.

https://debezium.io/

Unfortunately, we did not use this tool in this project because the DevOps team did not agree with this architecture.

Outbox Design Pattern

The outbox design pattern is a straightforward way to solve this problem. Based on this pattern, we insert the data into an outbox table before sending it to Kafka, and when the data arrives in part two, we mark the corresponding row in the outbox table as received.

In practice, after inserting the data into the outbox in part one, we send it to part two via Kafka; part two then calls part one over gRPC so that the received data is marked in part one's outbox.
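A minimal sketch of that flow, assuming a SQLite outbox table in part one and the kafka-python client; the table, topic, and function names are illustrative, and the gRPC plumbing between the parts is omitted:

```python
import json
import sqlite3
import uuid

from kafka import KafkaProducer  # kafka-python client

# Outbox table in part one's database (SQLite here purely for illustration).
db = sqlite3.connect("part_one.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS outbox (
        id         TEXT PRIMARY KEY,
        payload    TEXT NOT NULL,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        received   INTEGER NOT NULL DEFAULT 0   -- set to 1 once part two confirms
    )
""")

producer = KafkaProducer(bootstrap_servers="localhost:9092")

def publish(payload: dict) -> str:
    """Insert into the outbox first, then send to Kafka."""
    event_id = str(uuid.uuid4())
    db.execute("INSERT INTO outbox (id, payload) VALUES (?, ?)",
               (event_id, json.dumps(payload)))
    db.commit()
    # If Kafka is down, this send fails but the outbox row survives,
    # so the event can be re-sent later.
    producer.send("settlement-events",
                  json.dumps({"id": event_id, **payload}).encode())
    producer.flush()
    return event_id

def mark_received(event_id: str) -> None:
    """Called from the gRPC handler when part two confirms receipt."""
    db.execute("UPDATE outbox SET received = 1 WHERE id = ?", (event_id,))
    db.commit()
```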

In another scenario, I can create a separate microservice (recovery) that owns the outbox table. Part one calls it over gRPC before sending data to Kafka, and part two calls it again after receiving the data. The recovery service then checks for records that were sent but never received and resends them.
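A sketch of that periodic check, reusing the outbox table and producer from the previous snippet; the ten-minute grace period is an assumption:

```python
from datetime import datetime, timedelta

def resend_unconfirmed(grace_minutes: int = 10) -> None:
    """Re-publish outbox rows that part two has not confirmed in time."""
    cutoff = (datetime.utcnow() - timedelta(minutes=grace_minutes)
              ).strftime("%Y-%m-%d %H:%M:%S")
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE received = 0 AND created_at < ?",
        (cutoff,),
    ).fetchall()
    for event_id, payload in rows:
        # A row may be delivered twice, so consumers in part two must be idempotent.
        producer.send("settlement-events",
                      json.dumps({"id": event_id, **json.loads(payload)}).encode())
    producer.flush()
```

Because a row can be resent, the consumers in part two need to deduplicate, for example on the event id.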

Inbox and comparison with actual data

In this method, I copy the data sent through Kafka into an inbox table in part two, and every hour I check the difference between the actual output of part one and the data in the inbox. If some data is missing, I can see the discrepancy between the inbox and the output of part one (I keep part one's output data in a separate table). I know this method needs more disk space and IO, but I can be sure all the data reaches part two.
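A sketch of that hourly comparison, assuming part two keeps an inbox table and part one's output table is reachable; both are plain SQLite tables with an id column here, purely for illustration:

```python
import sqlite3

def find_missing_ids() -> set:
    """Return ids present in part one's output table but absent from part two's inbox."""
    part_one = sqlite3.connect("part_one.db")
    part_two = sqlite3.connect("part_two.db")
    sent = {row[0] for row in part_one.execute("SELECT id FROM output")}
    received = {row[0] for row in part_two.execute("SELECT id FROM inbox")}
    return sent - received

# Run this every hour (cron, a scheduler, etc.) and resend whatever turns up missing.
for event_id in find_missing_ids():
    print("missing event:", event_id)
```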

