The Real-time Data Landscape

Welcome to the 2023 Estuary Real-time Data landscape.  Want to get started with real-time insights and data products?  Here are the tools and how they fit together.

Want to get started in minutes? Try Estuary for end-to-end real-time data operations.

 

There has been major innovation across the entire real-time data landscape over the last few years. Some of the most interesting and mature companies have emerged on the analytics side, but simpler, more powerful pipelines that move data from sources to destinations have also appeared, enabling more companies to work with low-latency data.

The diagram above has four sections; “hybrid” denotes an open source product that is also provided as a managed service:

 

1. Capture

 

Extracting data from source systems. In the real-time landscape, most sources are databases (captured through their write-ahead log) and streams, since most SaaS APIs are batch in nature.


Some SaaS APIs do support streaming; for example, Salesforce has a streaming endpoint.
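To make log-based capture more concrete, here is a minimal Python sketch of what a consumer does with changes read from a write-ahead log: it replays ordered insert/update/delete events against a replica keyed by primary key and tracks a checkpoint. The event shape (lsn, op, key, after) and the function name apply_changes are hypothetical simplifications, loosely modeled on what CDC tools emit, not the format of any specific product.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical change event: a simplified stand-in for what log-based CDC
# tools emit after reading a database's write-ahead log.
@dataclass
class ChangeEvent:
    lsn: int                      # log sequence number: position in the WAL
    op: str                       # "insert", "update" or "delete"
    key: Any                      # primary key of the affected row
    after: Optional[dict] = None  # row image after the change (None for deletes)


def apply_changes(replica: dict, events: list, last_lsn: int = 0) -> int:
    """Apply change events in log order to an in-memory replica.

    Events at or below last_lsn were already applied and are skipped, so
    replaying the same log segment twice is harmless. Returns the new
    high-water mark, which a real consumer would checkpoint.
    """
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.lsn <= last_lsn:
            continue  # already applied before a restart
        if ev.op in ("insert", "update"):
            replica[ev.key] = ev.after
        elif ev.op == "delete":
            replica.pop(ev.key, None)
        else:
            raise ValueError(f"unknown op: {ev.op}")
        last_lsn = ev.lsn
    return last_lsn


if __name__ == "__main__":
    log = [
        ChangeEvent(1, "insert", 42, {"id": 42, "plan": "free"}),
        ChangeEvent(2, "update", 42, {"id": 42, "plan": "pro"}),
        ChangeEvent(3, "insert", 7, {"id": 7, "plan": "free"}),
        ChangeEvent(4, "delete", 7),
    ]
    replica = {}
    checkpoint = apply_changes(replica, log)
    print(replica, checkpoint)  # {42: {'id': 42, 'plan': 'pro'}} 4
    # Re-delivering the same events after a restart changes nothing.
    assert apply_changes(replica, log, checkpoint) == 4
```

Because the replica only ever reflects the latest row image per key, the destination converges to the source table's current state while still receiving every change as it happens.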

 

2. Transport

 

Moving data from point A to point B. The de facto standard here is Kafka, but there are some emerging options; almost all of them require engineers, maintenance and infrastructure.

Streaming transport is complex and doesn’t usually retain historical data.  For this reason, most streaming systems can be viewed as a “buffer” of current events.  Notable exceptions here are Pulsar, Gazette and Estuary.
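The difference between a “buffer” and a system that retains history shows up when a consumer arrives late. The toy Python model below (the Topic class and its methods are hypothetical, and this is not how Kafka, Pulsar or Gazette are implemented) gives a topic either a fixed retention window or unlimited retention; a consumer that asks to replay from offset 0 only gets back what is still retained.

```python
from collections import deque
from typing import Optional


class Topic:
    """Toy append-only topic with optional bounded retention.

    retention=None keeps every event (unlimited lookback); an integer keeps
    only the most recent N events, like a buffer of current events.
    """

    def __init__(self, retention: Optional[int] = None):
        self._events = deque(maxlen=retention)  # maxlen=None means unbounded
        self._next_offset = 0                   # offset assigned to the next event

    def append(self, event) -> int:
        offset = self._next_offset
        self._events.append((offset, event))    # oldest entry is dropped when full
        self._next_offset += 1
        return offset

    def read_from(self, offset: int) -> list:
        """Return (offset, event) pairs at or after `offset` that are still retained."""
        return [(o, e) for o, e in self._events if o >= offset]


if __name__ == "__main__":
    buffer, history = Topic(retention=3), Topic()
    for i in range(5):
        buffer.append(f"event-{i}")
        history.append(f"event-{i}")

    # A consumer joining late asks to replay from the beginning.
    print(len(buffer.read_from(0)))   # 3: events 0 and 1 have aged out
    print(len(history.read_from(0)))  # 5: the full history is still available
```

Offsets keep increasing even after old events are dropped, so a late consumer can tell that it missed data rather than silently reading a shorter stream.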

 

3. Operational Transforms

 

Transformations applied inside the pipeline to reshape data before it reaches either your production systems (as a data product) or your analytics environment.
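A stateless operational transform can be as simple as a function applied to each event in flight. This hypothetical Python example flattens a nested order event into a flat row for an analytics destination; the field names (id, customer, address, total) are illustrative assumptions, not any product's schema.

```python
from typing import Iterable, Iterator


def reshape(events: Iterable[dict]) -> Iterator[dict]:
    """Flatten nested order events into analytics-friendly rows, one at a time.

    Each output row depends only on its own input event, so no state is kept
    between events and the transform works on an unbounded stream.
    """
    for ev in events:
        yield {
            "order_id": ev["id"],
            "customer_id": ev["customer"]["id"],
            "country": ev["customer"].get("address", {}).get("country"),
            "total_cents": round(ev["total"] * 100),  # dollars to integer cents
        }


if __name__ == "__main__":
    stream = [{"id": 1, "customer": {"id": "c1", "address": {"country": "US"}}, "total": 19.99}]
    for row in reshape(stream):
        print(row)  # {'order_id': 1, 'customer_id': 'c1', 'country': 'US', 'total_cents': 1999}
```

Because the function is a generator, it processes events as they arrive instead of waiting for a complete batch.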

Operational transforms in real-time systems usually come with some gotchas: calculating something like “lifetime customer value” can be very difficult because it requires state that grows without bound in a streaming system. They are extremely important, though, since they get data into the right “shape” for analytics queries.
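The “lifetime customer value” gotcha comes from state. In this minimal Python sketch, a plain dictionary stands in for a stream processor's keyed state store (an assumption for illustration). Every distinct customer adds an entry that must be kept indefinitely, because another order may arrive at any time, so the state grows with the number of customers rather than with any time window.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def lifetime_value(orders: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Emit an updated lifetime total (in cents) for a customer after each order.

    `state` holds one running total per customer ever seen. Nothing can be
    evicted, because any customer may place another order later.
    """
    state = defaultdict(int)
    for customer_id, amount_cents in orders:
        state[customer_id] += amount_cents
        yield customer_id, state[customer_id]


if __name__ == "__main__":
    orders = [("alice", 1000), ("bob", 500), ("alice", 250)]
    for update in lifetime_value(orders):
        print(update)
    # ('alice', 1000)
    # ('bob', 500)
    # ('alice', 1250)
```

A windowed variant (for example, value over the last 30 days) can evict old entries, which is one common way streaming systems keep state bounded, at the cost of no longer answering the “lifetime” question.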

 

4. Analytic Transforms

 

The real-time equivalent of a data warehouse. These systems can be loaded in real time and provide up-to-the-second answers to queries as you ask them.
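One way to picture “up-to-the-second answers” is a store where every event becomes queryable as soon as it is ingested and the answer is computed when the query is asked. The Python sketch below is a toy under that assumption (RealtimeTable, ingest and sum_last are hypothetical names), not how ClickHouse, Druid, Pinot or the other engines listed here work internally: it keeps events sorted by timestamp and answers “sum over the last N seconds” with a binary search.

```python
import bisect
import time
from typing import List, Optional


class RealtimeTable:
    """Toy analytic store: events are queryable as soon as they are ingested."""

    def __init__(self) -> None:
        self._ts: List[float] = []      # event timestamps, kept sorted
        self._values: List[float] = []  # values aligned with self._ts

    def ingest(self, ts: float, value: float) -> None:
        # Insert in timestamp order so late-arriving events land in the right place.
        i = bisect.bisect_right(self._ts, ts)
        self._ts.insert(i, ts)
        self._values.insert(i, value)

    def sum_last(self, seconds: float, now: Optional[float] = None) -> float:
        """Sum of values with now - seconds < ts <= now, computed at query time."""
        now = time.time() if now is None else now
        lo = bisect.bisect_right(self._ts, now - seconds)
        hi = bisect.bisect_right(self._ts, now)
        return sum(self._values[lo:hi])


if __name__ == "__main__":
    table = RealtimeTable()
    table.ingest(100.0, 5)
    table.ingest(130.0, 7)
    table.ingest(95.0, 1)                  # late event, inserted in order
    print(table.sum_last(30, now=150.0))   # 7: only the event at 130 is in (120, 150]
    table.ingest(149.5, 3)                 # visible to the very next query
    print(table.sum_last(30, now=150.0))   # 10
```

The list inserts make ingestion O(n), which is fine for a toy; production engines rely on far more efficient structures, such as columnar storage and indexing, to keep both loading and querying fast at scale.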

 

Note:

The diagram is oversimplified, and many companies straddle two or more areas. For example, we at Estuary offer Operational Transforms because we believe a pipeline needs to be end-to-end, but our logo sits in the area most people associate us with.

 

Products offered as SaaS Solutions

Company & Product

Solution

Background

Capture, Transport & Operational Transforms 

Easily capture data from systems using CDC (change data capture), transport, transform it in motion and sync it where you want it, such as analytics or operational systems.

Ably

Transport

Simple transportation layer for events.

Amazon Kinesis

Transport

Amazon’s pub/sub (publish/subscribe) system: it manages events produced by one system and consumed by another.

Azure Web PubSub

Transport

Microsoft Azure’s Pub/Sub system.  

Arcion

Capture

Low latency captures from databases using CDC.

Bytewax

Operational Transforms

Bytewax offers a turnkey way to transform streaming data using Python.

ClickHouse

Analytic Transforms

Real-time SQL transforms on ClickHouse by the team that created it.

Confluent

Transport & Operational Transforms

The company founded by Kafka’s original creators, with a core business model of managing Kafka.

DataCater

Transport & Operational Transforms

Managed Kafka streams with Python transformations.

Decodable

Capture & Operational Transforms

Capture using managed Debezium and transform using managed Apache Flink.

DeltaStream

Analytic & Operational Transforms

Managed service for analytic and operational transforms.

Firebolt

Analytic Transforms

Real-time analytic transforms using an improved version of managed ClickHouse.

Imply

Analytic Transforms

Real-time analytic transforms using managed Druid by the team that created it.

Operational Transforms

Managed Spark

Materialize

Analytic Transforms

Real-time analytic transforms using open source SQL built on top of Timely Dataflow.

Memphis.dev

Transport

Simple but powerful transport layer.

Meroxa

Capture & Operational Transforms

Capture and transform real-time data.

Google Cloud Pub/Sub

Transport

Google’s Pub/Sub system.

Google Cloud Dataflow

Operational Transforms

Managed Apache Beam, allowing you to coordinate batch and streaming transforms using your favorite transformation system.

Oracle GoldenGate

Capture

Capture data from Oracle systems using their managed, proprietary product.

Rockset

Analytic Transforms

SQL transformations in real time by the creators of RocksDB.

Transport

Transport data using the Kafka protocol, via a full rewrite of Kafka built for greater efficiency.

SingleStore

Analytic Transforms

SQL transformations in real time.

StarTree

Analytic Transforms

SQL transformations in real time, built on top of managed Apache Pinot.

StreamNative

Transport

Managed Apache Pulsar.

StreamSets

Capture & Operational Transforms

Capture and transform data through a GUI.

Striim

Capture & Operational Transforms

Capture data from databases using managed CDC and transform it in motion.

Upsolver

Operational Transforms

Transform micro-batches using SQL.

Quix

Operational Transforms

Real-time Python transformations.

Analytic Transforms

SQL-based analytic transforms on time series data.

Tinybird

Capture, Operational & Analytic Transforms

Managed ClickHouse for easily creating real-time data APIs and analytics, with some sources available to capture from out of the box.

Open Source Frameworks

Project

Solution

Background

Operational Transforms

A framework that allows for transforming data from both batch and streaming systems.

Analytic Transforms

A real-time analytics engine that quickly indexes streaming data, allowing for efficient, high-scale queries.

Operational Transforms

A stream processing framework that is natively event-based.

Transport

A highly popular streaming system built by LinkedIn.

Analytic Transforms

A real-time analytics engine which offers real-time SQL queries on high-scale streaming data.

Transport

A streaming system that has native cloud storage options.

Operational Transforms

A stream processing framework that is natively batch-based and has been extended to near real-time micro-batches.

Analytic Transforms

A real-time analytics engine which offers real-time SQL queries on high-scale streaming data.

Capture

A framework for capturing data from databases in real time using their write-ahead log.

Capture, Transport & Operational Transforms

An end-to-end system that captures data from databases in real time using their write-ahead log, transports it, transforms it and materializes it into destination systems.

Transport

A streaming system that natively stores data in cloud storage enabling unlimited lookback and direct reads by batch systems.
