Is Kafka a database?

Your thoughts?

|

In 2022, the answer is yes. While Kafka was originally architected as an event streaming platform, its ability to durably store an ordered commit log makes it a good fit for certain database-style use cases.

For example, if your solution emphasizes the consumption of historical data (like bank transactions), then Kafka is a good means of storing that information, because events can be quickly consumed in order from a specific point in time.
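To make "consumed in order from a specific point in time" concrete, here is a toy in-memory model of a single partition. It is not a Kafka client: `ToyLog`, `Event`, and the sample transactions are made up for illustration, and the sketch assumes timestamps never go backwards. The lookup rule it uses (earliest offset whose timestamp is at or after the requested time) is the same idea the Java consumer exposes through `offsetsForTimes` followed by `seek`.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    offset: int        # position in the log, assigned on append
    timestamp_ms: int  # event time in milliseconds
    value: str


class ToyLog:
    """Append-only, in-memory stand-in for one Kafka partition (illustrative only)."""

    def __init__(self):
        self._events = []
        self._timestamps = []  # parallel list of timestamps, in append order

    def append(self, timestamp_ms, value):
        # Assumption of this sketch: timestamps are non-decreasing,
        # which keeps the list sorted so binary search is valid.
        if self._timestamps and timestamp_ms < self._timestamps[-1]:
            raise ValueError("timestamps must be non-decreasing in this sketch")
        event = Event(len(self._events), timestamp_ms, value)
        self._events.append(event)
        self._timestamps.append(timestamp_ms)
        return event.offset

    def offset_for_time(self, timestamp_ms):
        """Earliest offset whose timestamp is >= timestamp_ms, or None if there is none."""
        i = bisect_left(self._timestamps, timestamp_ms)
        return i if i < len(self._events) else None

    def read_from(self, offset):
        """Return events in log order, starting at the given offset."""
        return self._events[offset:]


log = ToyLog()
log.append(1_000, "deposit 50")
log.append(2_000, "withdraw 20")
log.append(3_000, "deposit 5")

# Replay everything that happened at or after t = 1500 ms.
start = log.offset_for_time(1_500)
replayed = [e.value for e in log.read_from(start)]
print(replayed)
```

The point of the sketch is that a replay is just "find an offset, then read forward", which is cheap on an ordered log. Anything that needs lookups by key or ad hoc filtering is a different access pattern.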

There are also solutions like ANSI SQL queries and ksqlDB that let you interact with a Kafka cluster like a traditional RDBMS through SQL-like query languages. These solutions don't require a separate database; instead they serve as abstractions over native Kafka functionality.

TLDR: Kafka can be used as a database but is not meant to replace all databases. How you consume from Kafka drives the answer.

|

Kafka is an event streaming platform that can function as a database.

Kafka can be used as a database but is not meant to replace other databases.

When consuming historical events, Kafka can be leveraged as an efficient database. The New York Times relies on Kafka to store and retrieve every article published.

When manipulating and analyzing data streams, Kafka can be relied on as a messaging platform. Using Kafka Connect, data can be easily imported from or exported to other data stores, or offloaded to HDFS etc.

Kafka is a database, but depending on your use case it should not replace other databases.

|

Kafka is not a database but it can be used as a database.

Kafka is technically an event streaming platform. This means its primary function is to publish/consume messages in real time.

Kafka retains messages based on a retention policy. It is possible to set this configuration to "forever", meaning Kafka retains (stores) the messages permanently.
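As a rough sketch of what "forever" looks like in practice: the topic-level settings `retention.ms` and `retention.bytes` both accept `-1`, which means no time limit and no size limit respectively. The helper below only builds the config and the matching `kafka-configs.sh` command line as a string. The topic name `bank-transactions` and the `localhost:9092` bootstrap address are placeholders, and the function names are my own.

```python
def infinite_retention_config():
    """Topic-level settings that turn off time- and size-based deletion."""
    return {
        "retention.ms": "-1",     # -1: no time limit is applied
        "retention.bytes": "-1",  # -1: no size limit (this is also the default)
    }


def kafka_configs_command(topic, config, bootstrap="localhost:9092"):
    """Build (not run) a kafka-configs.sh command that applies the config to a topic."""
    pairs = ",".join(f"{key}={value}" for key, value in config.items())
    return (
        "kafka-configs.sh"
        f" --bootstrap-server {bootstrap}"
        " --entity-type topics"
        f" --entity-name {topic}"
        " --alter"
        f" --add-config {pairs}"
    )


config = infinite_retention_config()
command = kafka_configs_command("bank-transactions", config)  # hypothetical topic
print(command)
```

Note this assumes the topic uses the default `cleanup.policy=delete`. Compacted topics follow different rules, so check the policy before relying on this.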

So while Kafka can "store" data, it wasn't architected as a database (which would emphasize the storage and retrieval of information) but rather as an event streaming platform.

Most companies leverage Kafka to process data in real time but augment this with more robust solutions for storing information long term.

|

No.

|

No. Kafka is an event streaming platform.

|

Yes and no.

Yes because you can configure Kafka to retain messages forever. Some companies have proven the use case for Kafka as a DB.

No because Kafka was built as an event streaming platform. Its design emphasizes the efficiency and scalability of publishing/consuming messages, NOT the storage and retrieval of information.