💥 Change Data Capture and Schema Augmentation: The Devil is in the Details
Many companies struggle to efficiently manage their CDC event tables while implementing the many change requests coming from their data pipelines.

Many third-party software companies (Fivetran, Striim, Datastream, etc.) provide solutions to stream Change Data Capture (CDC) events from databases and allow schema augmentation at the destination (like Redshift, BigQuery, etc.).
But what happens when you have massive tables on the data source and you are deploying DDL changes on the tables you want to stream changes from? Suddenly, you lose the changes, or you lose the events, and you need to reinitialize the entire streaming pipeline...
In many cases, a smooth deployment requires DBA expertise, and even then it is far from simple.
The CHDS Solution to an Obscure, Yet Very Real Problem
CHDS has implemented a solution to address this obscure but very real issue.
Why Managing DDL with CDC is Crucial
- Complex Transaction Management: Your CI/CD pipeline provides assurance for the performance of the actual DDL change, but when CDC is in place underneath, it requires complex SQL transactions and lock management. Without expert implementation, this can lead to disaster.
- Retention and Large Tables: The default retention setup (e.g., 3 days of events) is usually fine for medium-sized databases. For large ones, however, a more careful metadata repository design goes a long way: it makes deployments smoother, shortens the downtime while the CDC event table is locked for the structure change, and avoids losing events in the streaming pipeline.
- Lack of Observability: Many companies do not monitor their CDC tables or how retention and table size cope with the stream of events. It is important to take a closer look and make sure capture velocity still keeps up with the DML activity in your database.
- Operational Risks: When CDC events are not handled well, data files and log files can grow out of control, jeopardizing your storage and log management.
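To make the observability point concrete, here is a minimal sketch of a health check for a single CDC event table. It is only an illustration, not the CHDS implementation: the `CdcTableStats` fields, the `assess` function, and the 80% warning threshold are all hypothetical, and the numbers would have to come from whatever metadata your database and streaming tool actually expose.

```python
from dataclasses import dataclass


@dataclass
class CdcTableStats:
    """Hypothetical snapshot of one CDC event table (all fields are assumptions)."""
    table: str
    captured_per_min: float           # change events produced by DML activity
    consumed_per_min: float           # change events read by the streaming pipeline
    oldest_unconsumed_age_min: float  # age of the oldest event not yet consumed
    retention_min: float              # events older than this are purged


def assess(stats: CdcTableStats, warn_ratio: float = 0.8) -> str:
    """Classify the table's health; warn_ratio is an arbitrary example threshold."""
    # The oldest unconsumed event is already past retention: it may have been purged.
    if stats.oldest_unconsumed_age_min >= stats.retention_min:
        return "events_lost"
    # The consumer is close to the retention window: act before events are purged.
    if stats.oldest_unconsumed_age_min >= warn_ratio * stats.retention_min:
        return "at_risk"
    # The pipeline reads slower than DML writes: the backlog will keep growing.
    if stats.consumed_per_min < stats.captured_per_min:
        return "falling_behind"
    return "ok"


THREE_DAYS_MIN = 3 * 24 * 60  # the 3-day retention example, in minutes

orders = CdcTableStats("orders", 1000.0, 900.0, 10.0, THREE_DAYS_MIN)
print(orders.table, assess(orders))
```

Running a check like this on a schedule for every table in the streaming pipeline gives an early warning well before the retention window is exhausted.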
All these issues are highly technical and highly specialized, the domain of database reliability engineers. CHDS implemented its solution for clients with hundreds of millions of events per day and many DDL changes occurring across the hundreds of tables involved in the streaming pipelines.
It has been a full year since the solution went live, and the client has all but forgotten about it because it runs like a well-oiled machine. CHDS also helped the client automate the rebase of the production databases, with obfuscation, and the CDC setup for the other environments, without breaking a sweat.
Results of the Implementation
- Peace of mind for schema augmentation across all streaming pipelines.
- Peace of mind for CI/CD deployment of DDL, even on large tables: milliseconds for small tables and a few seconds for large tables with many CDC events to retain. This automation gives data engineers more freedom in what they implement.
- More free time for the reliability engineer and database administrator.
- A systematic approach, even for computed columns or other specific use cases.
Do not underestimate how much a poor CDC configuration can disrupt your operational activities.
Do not hesitate to reach out to CHDS if you need a demo, a presentation, or to discuss your specific needs.