Kafka | All you need to know

Himanshu Tripathi
Published in GoPenAI
6 min read · Mar 8, 2022


https://unsplash.com/photos/JKUTrJ4vK00

Suppose you work at a company that deals with real-time data. At first, when the data volume is small and needs no further analysis, you are good to go. But as the data grows and you need to store it for later analysis, handling and analyzing it in real time becomes very hard. This is where Kafka comes into the picture.

What are we going to cover in this article?

  • What is a data pipeline?
  • Why do we need a system like Kafka?
  • Kafka installation
  • Python Kafka implementation

So let’s start…

What is Kafka?

https://uploads-ssl.webflow.com/5f3acb2672fdcd05b7611500/5fdb9e7105edc00d5378b856_kafkalogo.jpg

Apache Kafka is a framework implementation of a software bus using stream-processing. It is an open-source software platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.[wiki]

Kafka is a distributed event streaming platform capable of storing and processing trillions of messages per day. Users can access this data in real time to build event-driven applications.

https://kafka.apache.org/24/images/streams-architecture-overview.jpg

A Kafka deployment consists of one or more clusters, each running a set of Kafka servers, called brokers, that store and manage real-time event data. When getting started, users typically open separate producer and consumer consoles from the command line to publish and consume messages, respectively.

Kafka is also known as a pub/sub messaging system. If you have ever worked with Google Cloud Platform, you may already know what a pub/sub messaging system is.

Let’s try to understand what a pub/sub messaging system is.

A messaging system sends messages between processes, applications, and servers using topics.

  • Pub (Publisher) -> it’s simply a system in which…
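
The pub/sub idea can be sketched in a few lines of Python. This is a toy in-memory model and not Kafka's API: the `InMemoryBroker` class, its `subscribe` and `publish` methods, and the topic names are all hypothetical, chosen only for illustration. It shows a publisher sending a message to a topic without knowing who, if anyone, is subscribed.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryBroker:
    """Toy stand-in for a message broker (hypothetical, not Kafka's API)."""

    def __init__(self) -> None:
        # Maps each topic name to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        """Register a callback to receive every message published to `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver `message` to each subscriber of `topic`; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = InMemoryBroker()
orders_seen: List[str] = []
audit_log: List[str] = []

# Two independent subscribers listen on the same topic.
broker.subscribe("orders", orders_seen.append)
broker.subscribe("orders", audit_log.append)

# The publisher only names the topic, never the subscribers.
delivered = broker.publish("orders", "order-1 created")
ignored = broker.publish("payments", "payment-1 received")  # topic has no subscribers
```

Real Kafka differs in important ways: brokers persist messages to disk and consumers pull them, rather than being called back synchronously. The sketch only shows how a topic decouples the publisher from whoever is listening.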

