Kafka to Parquet in minutes
Apache Kafka is a popular open-source event-streaming platform. It lets enterprises use real-time data as the backbone of their operations, and it serves in that role for many Fortune 500 companies.
Apache Parquet is an open-source, column-oriented data storage format from the Hadoop ecosystem, designed for fast querying on large datasets. It is routinely used to build highly scalable data lakes that remain queryable, and it is comparable to other columnar file formats available in Hadoop.
Estuary helps move data from Kafka to Parquet in minutes with millisecond latency.
Estuary builds free, open-source connectors that extract data from Kafka in real time, letting you offload data to various systems for both analytical and operational purposes. Kafka data lives in a stream and often benefits from being organized into a data lake or loaded into a warehouse for analysis with full history.
Data can then be directed to Parquet using materializations, which are also open-source. Connectors push data as quickly as the destination can handle it. Parquet performs best with files of around 1 GB each, so if you have high data volumes, Flow can keep your data lake up to date in near real time.
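As a rough illustration, a Flow pipeline is declared as a YAML catalog with a capture from Kafka and a materialization that writes Parquet files. The connector images, field names, and values below are illustrative assumptions for the sketch, not Flow's exact configuration schema; consult the connector documentation for the real fields:

```yaml
# Hypothetical catalog sketch -- image names and config fields are illustrative.
captures:
  acmeCo/kafka-events:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-kafka:dev      # assumed image name
        config:
          bootstrap_servers: "kafka-broker:9092"     # assumed field name
    bindings:
      - resource: { stream: "orders" }
        target: acmeCo/orders

materializations:
  acmeCo/orders-to-parquet:
    endpoint:
      connector:
        image: ghcr.io/estuary/materialize-s3-parquet:dev  # assumed image name
        config:
          bucket: "my-data-lake"                     # assumed field name
          uploadInterval: "5m"                       # batch writes toward ~1 GB files
    bindings:
      - resource: { pathPrefix: "orders/" }
        source: acmeCo/orders
```

The capture continuously reads the Kafka topic into a Flow collection, and the materialization accumulates collection documents into Parquet files sized for efficient querying.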