
Real-Time Data Streaming with Kafka CDC & Debezium

Implementing Change Data Capture for MySQL

February 10, 2026 · Data Engineering

Change Data Capture (CDC) is a design pattern that tracks and streams every change made to a database (inserts, updates, deletes) in real time. It's the backbone of modern event-driven architectures.

What is CDC and Why It Matters

In traditional data architectures, we often relied on batch processing (ETL) or complex polling mechanisms to synchronize data between systems. These methods introduce latency and put unnecessary load on production databases.

Modern Architectures

CDC enables microservices to stay in sync, powers real-time analytics dashboards, and feeds search indexes without manual intervention.

Zero Data Loss

By reading from the database's transaction log (such as MySQL's binary log), CDC ensures every single event is captured, even if the consuming service is temporarily offline.
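
As a quick sanity check, you can inspect the log Debezium tails directly from the MySQL client (this assumes you have an administrative account):

# List the binary log files the server is currently retaining
mysql -u root -p -e "SHOW BINARY LOGS;"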

Debezium & Kafka Connect Integration

Debezium is an open-source distributed platform for change data capture. It sits on top of Kafka Connect, which provides the framework for streaming data between Kafka and other systems.

How it works:

  1. Source DB: MySQL/Postgres writes a change to its transaction log.
  2. Debezium Connector: Monitors the log and formats each change into a standardized Kafka event (a sample event follows this list).
  3. Kafka Connect: Manages the execution, scaling, and fault tolerance of the connector.
  4. Kafka Topics: Changes are stored as streams of events for downstream consumers.
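
To make step 2 concrete, here is a trimmed sketch of a Debezium change event for an update to a customers row. The exact payload depends on your converter and schema settings, and the values here are hypothetical:

# Simplified Debezium event envelope ("op": c = create, u = update, d = delete)
{
  "before": { "id": 1001, "email": "old@example.com" },
  "after": { "id": 1001, "email": "new@example.com" },
  "source": { "connector": "mysql", "db": "inventory", "table": "customers" },
  "op": "u",
  "ts_ms": 1739174400000
}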

End-to-End Setup Guide

Setting up a production-ready CDC pipeline involves three main phases: environment preparation, connector installation, and configuration.

Phase 1: Install Debezium Connectors

Download the appropriate connector for your database. Here we use MySQL as an example.

# 1. Create a plugins directory
sudo mkdir -p /opt/kafka/plugins

# 2. Download Debezium MySQL Connector
wget https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/3.0.0.Final/debezium-connector-mysql-3.0.0.Final-plugin.tar.gz

# 3. Extract to plugins folder
sudo tar -xvzf debezium-connector-mysql-3.0.0.Final-plugin.tar.gz -C /opt/kafka/plugins
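
Before moving on, it's worth confirming the archive unpacked into its own directory of JARs:

# Verify the connector JARs are in place
ls /opt/kafka/plugins/debezium-connector-mysql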

Phase 2: Configure Kafka Connect (Distributed)

Distributed mode is recommended for production because it provides high availability and scalability.

# edit /opt/kafka/config/connect-distributed.properties

bootstrap.servers=192.168.0.111:9092,192.168.0.112:9092,192.168.0.113:9092
group.id=kafka-connect-cluster

# Storage topics for metadata
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status

# Plugin path is critical
plugin.path=/opt/kafka/plugins
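
After editing the properties file, start the worker with the bundled script and confirm the Debezium plugin was picked up (paths assume the /opt/kafka layout used above):

# Start the Connect worker in distributed mode
/opt/kafka/bin/connect-distributed.sh /opt/kafka/config/connect-distributed.properties

# From another shell: list the loaded connector plugins
curl -s http://localhost:8083/connector-plugins | jq '.[].class'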

Phase 3: Deploy the Connector

Submit your configuration to the Kafka Connect REST API using a JSON payload.

# Save this as mysql-connector.json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "192.168.0.200",
    "database.port": "3306",
    "database.user": "debezium_user",
    "database.password": "dbz_pass",
    "database.server.id": "184054",
    "topic.prefix": "prod_db",
    "table.include.list": "inventory.customers,inventory.orders",
    "schema.history.internal.kafka.bootstrap.servers": "192.168.0.111:9092",
    "schema.history.internal.kafka.topic": "schema-changes.inventory"
  }
}

Submit the configuration using curl:

curl -i -X POST -H "Content-Type: application/json" \
  --data @mysql-connector.json \
  http://localhost:8083/connectors
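
A successful request returns HTTP 201 Created; you can confirm the connector is registered with a quick listing:

# The response should include "inventory-connector"
curl -s http://localhost:8083/connectors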

Kafka Connect REST API Reference

Use these endpoints to manage your connectors in production:

Method    Endpoint                      Description
GET       /connectors                   List all connectors
POST      /connectors                   Create a new connector
GET       /connectors/[name]/status     Check connector health
PUT       /connectors/[name]/pause      Pause a connector
DELETE    /connectors/[name]            Delete a connector
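
For example, pausing, resuming, and removing the connector created above (resume is the standard counterpart to the pause endpoint):

# Pause during a maintenance window
curl -s -X PUT http://localhost:8083/connectors/inventory-connector/pause

# Resume afterwards
curl -s -X PUT http://localhost:8083/connectors/inventory-connector/resume

# Remove the connector entirely
curl -s -X DELETE http://localhost:8083/connectors/inventory-connector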

Critical Prerequisites

  • MySQL Binary Log: Must be enabled (`log-bin`) and set to `ROW` format.
  • User Privileges: The DB user requires `SELECT`, `RELOAD`, `SHOW DATABASES`, `REPLICATION SLAVE`, and `REPLICATION CLIENT` permissions (see the commands after this list).
  • Schema History Topic: Ensure this topic is created with sufficient retention and a single partition.
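
A minimal sketch of checking and granting these prerequisites from the shell, reusing the `debezium_user`/`dbz_pass` credentials from the connector config (narrow the `'%'` host scope to match your network policy):

# Confirm the binary log is on and in ROW format
mysql -u root -p -e "SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format');"

# Create the capture user with the required privileges
mysql -u root -p -e "CREATE USER IF NOT EXISTS 'debezium_user'@'%' IDENTIFIED BY 'dbz_pass';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium_user'@'%';
FLUSH PRIVILEGES;"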

Production Considerations & Challenges

Binary Log Management

MySQL binlogs can grow rapidly. Ensure you have a retention policy (`binlog_expire_logs_seconds`) that allows enough time for Debezium to catch up if the connector stops.
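
For example, to keep three days of binlogs (size the window to the longest outage you expect to recover from):

# Retain binlogs for 3 days (259200 seconds); takes effect immediately
mysql -u root -p -e "SET GLOBAL binlog_expire_logs_seconds = 259200;"

# Persist it across restarts in my.cnf under [mysqld]:
#   binlog_expire_logs_seconds = 259200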

Tuning Throughput

Increase `max.batch.size` and `max.queue.size` for high-volume databases to reduce overhead and improve streaming performance.
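
These are connector-level properties, so they belong in the "config" block of mysql-connector.json. The values below are illustrative starting points, not universal defaults; note that `max.queue.size` should be larger than `max.batch.size`:

# Add to the "config" section (illustrative values)
"max.batch.size": "4096",
"max.queue.size": "16384"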

Schema Evolution

Always use a Schema Registry (such as Confluent's or Apicurio). CDC events can break downstream consumers if schema changes aren't handled gracefully.
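
As a sketch, assuming a Confluent Schema Registry reachable at a hypothetical address and the Avro converter JARs installed on your Connect workers, switching the connector to registry-backed Avro looks like this:

# Add to the "config" section (registry URL is hypothetical)
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://192.168.0.120:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://192.168.0.120:8081"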

Pro-Tip: Monitor Connector Health

Always monitor whether the connector is still performing its initial snapshot or has moved on to streaming, and verify that the connector and its tasks report a RUNNING state.

curl -s http://localhost:8083/connectors/inventory-connector/status | jq
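
A healthy connector reports RUNNING for both the connector itself and each task; a small jq filter makes that easy to feed into alerting:

# Print just the connector and task states
curl -s http://localhost:8083/connectors/inventory-connector/status \
  | jq -r '.connector.state, .tasks[].state'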