Welcome to the 2023 Estuary Real-time Data landscape. Want to get started with real-time insights and data products? Here are the tools and how they fit together.
Want to get started in minutes? Try Estuary for end-to-end real-time data operations.
There has been major innovation throughout the entire real-time data landscape over the last few years. Some of the most interesting, mature companies have emerged on the analytics side, but simpler, more powerful pipelines for getting data from sources to destinations are also enabling more companies to work with low-latency data.
The above diagram has four sections, where hybrid denotes an open source product that’s being provided as a managed service:
1. Capture
Extracting data from source systems. In the real-time landscape, most sources are databases (captured through the write-ahead log) and streams, since most SaaS APIs are batch in nature.
Some SaaS APIs do support streaming. For example, Salesforce has a streaming endpoint.
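To make log-based capture concrete, here is a minimal sketch of how a stream of change events might be replayed to keep a copy of a table in sync. The ChangeEvent type, its fields, and the apply_changes function are hypothetical names chosen for illustration. They do not reflect the event format of any particular database or CDC tool.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ChangeEvent:
    """Hypothetical change record, loosely modeled on what a log reader emits."""
    op: str                # "insert", "update", or "delete"
    key: int               # primary key of the affected row
    row: Optional[dict]    # full row after the change; None for deletes


def apply_changes(replica: Dict[int, dict], events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Replay change events in log order so the replica mirrors the source table."""
    for ev in events:
        if ev.op in ("insert", "update"):
            replica[ev.key] = ev.row
        elif ev.op == "delete":
            replica.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op}")
    return replica


# Illustrative log: made-up rows, applied in the order they were written.
log = [
    ChangeEvent("insert", 1, {"name": "Ada", "plan": "free"}),
    ChangeEvent("insert", 2, {"name": "Lin", "plan": "pro"}),
    ChangeEvent("update", 1, {"name": "Ada", "plan": "pro"}),
    ChangeEvent("delete", 2, None),
]
replica = apply_changes({}, log)
print(replica)
```

Because every insert, update, and delete shows up as its own event, the destination can follow the source with low latency instead of waiting for a periodic batch export.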
2. Transport
Moving data from point A to point B. The de facto standard here is Kafka, but there are some emerging options. Almost all of them require engineers, maintenance, and infrastructure.
Streaming transport is complex and doesn’t usually retain historical data. For this reason, most streaming systems can be viewed as a “buffer” of current events. Notable exceptions here are Pulsar, Gazette and Estuary.
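The "buffer" idea can be sketched with a toy in-memory log that keeps only the most recent events. This is an illustration only: the RetentionBuffer class and its methods are hypothetical, and real transports typically bound retention by time or bytes rather than by a simple event count.

```python
from collections import deque
from typing import List, Tuple


class RetentionBuffer:
    """Toy event log that retains only the newest max_events entries."""

    def __init__(self, max_events: int) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._events = deque(maxlen=max_events)
        self._next_offset = 0

    def publish(self, payload: str) -> int:
        """Append an event and return the offset assigned to it."""
        offset = self._next_offset
        self._events.append((offset, payload))
        self._next_offset += 1
        return offset

    def oldest_offset(self) -> int:
        """Offset of the oldest event still retained."""
        return self._events[0][0] if self._events else self._next_offset

    def read_from(self, offset: int) -> List[Tuple[int, str]]:
        """Return retained events at or after the requested offset."""
        return [(o, p) for (o, p) in self._events if o >= offset]


buf = RetentionBuffer(max_events=3)
for i in range(5):
    buf.publish(f"event-{i}")

# A consumer that starts late and asks for everything from offset 0
# only receives what is still inside the retention window.
late_read = buf.read_from(0)
print(buf.oldest_offset(), late_read)
```

A new consumer that needs the full history cannot get it from a buffer like this alone, which is why systems that retain historical data are called out as exceptions.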
3. Operational Transforms
An in-pipeline transformation that you use to massage data before it reaches either your production systems (as a data product) or your analytics environment.
Operational transforms in real-time systems usually come with some gotchas – calculating things like “lifetime customer value” can be very difficult because doing so requires state which grows without bounds in streaming systems. They are extremely important though since they get data into the right “shape” for analytics queries.
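As a rough sketch of why lifetime aggregates are hard, the example below keeps a running total per customer. The LifetimeValue class and the sample purchases are made up for illustration. The point is that the state holds one entry per customer ever seen and can never be expired, unlike a windowed aggregate whose old windows can be dropped.

```python
from collections import defaultdict
from typing import Dict


class LifetimeValue:
    """Running per-customer total; state is kept for every customer indefinitely."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    def on_purchase(self, customer_id: str, amount: float) -> float:
        """Fold one purchase event into the state and return the new total."""
        self.totals[customer_id] += amount
        return self.totals[customer_id]

    def state_size(self) -> int:
        """Number of keys held; grows with every new customer and never shrinks."""
        return len(self.totals)


ltv = LifetimeValue()
purchases = [("c1", 20.0), ("c2", 5.0), ("c1", 30.0), ("c3", 12.5)]
for customer_id, amount in purchases:
    ltv.on_purchase(customer_id, amount)

print(dict(ltv.totals), ltv.state_size())
```

In a real pipeline this state would also need to be durable and recoverable after a restart, which is where much of the operational difficulty comes from.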
4. Analytic Transforms
The real-time equivalent of a data warehouse. These are systems that can be loaded in real time and provide up-to-the-second answers to queries as you ask them.
Note:
The diagram is oversimplified, and many companies straddle two or more areas. For example, we at Estuary do offer Operational Transforms because we believe a pipeline needs to be end-to-end, but our logo is in the area that most people associate us with.
Easily capture data from systems using CDC (change data capture), transport it, transform it in motion, and sync it where you want it, such as analytics or operational systems.
Ably
Transport
Simple transportation layer for events.
Amazon Kinesis
Transport
Amazon’s pub/sub system. Manages events published by one system and subscribed to by another.
Azure Web PubSub
Transport
Microsoft Azure’s Pub/Sub system.
Arcion
Capture
Low latency captures from databases using CDC.
Bytewax
Operational Transforms
Bytewax makes it turnkey to transform streaming data using Python.
ClickHouse
Analytic Transforms
Real-time SQL transforms on ClickHouse by the team that created it.
Confluent
Transport & Operational Transforms
Founded by the original creators of Kafka, with a core business model of managing Kafka.
Datacater
Transport & Operational Transforms
Managed Kafka streams with Python transformations.
Decodable
Capture & Operational Transforms
Capture using managed Debezium and transform using managed Apache Flink.
Deltastream
Analytic & Operational Transforms
Managed service for analytic and operational transforms.
Firebolt
Analytic Transforms
Real-time analytic transforms using an improved, managed version of ClickHouse.
Imply
Analytic Transforms
Real-time analytic transforms using managed Druid by the team that created it.
An end-to-end system that supports capturing data from databases in real time using their write-ahead log, transporting it, transforming it, and materializing it into destination systems.