3 Comments
Emanuel Oliveira

Are the suggested SaaS tools chosen and ordered just for their CDC capabilities, or overall, e.g. Fivetran as best for ETL (including CDC)?

Estuary

The CDC vendors were compared by use case first and by features second. Perhaps the most important takeaway is that if you choose a technology for a specific use case, and that technology was built for that use case, it's going to work well. But eventually you will end up needing a single tool that supports CDC to share data across different use cases. Once you get there, most of these tools will struggle. The article tries to explain why.

Emanuel Oliveira

Perfect, thanks. Yes, I agree. Most people forget that the single most important thing in analytics is CDC (we need the data), and the more automated the better; let Snowflake or Spark do the transformation. I have used ETL tools like Apache NiFi, which uses a get-transform-put pattern, and its no-code record transformations are addictively easy, but it's a manual pull.

I think the best way to move data between databases would be for the source database itself to include a configuration-driven CDC module. Twenty or more years ago, vendors made it hard to get data out (we needed explicitly coded exports and imports). It's hard for me to see how, in the 21st century, with everyone merging data from multiple legacy databases, this foundational capability still doesn't exist in database vendors' products, or at least not at the desired throughput. The only way out is either manual CDC (or the binlog, if Debezium supports your database) plus an orchestrator, and kaboom, once again simple data transport becomes another project; or a REST API that drips out deltas a few dozen or a few hundred records at a time at most.

Data transport shouldn't need its own system to move data from a source into some other target.
