Amazon S3 to Parquet in minutes
Amazon S3, or Simple Storage Service, is an object storage service offered by AWS. S3 is a cost-effective, durable, and elastic resource, making it easy to store data in the cloud.
S3 lets you organize your data into “buckets” and provision access, so you can share data quickly, easily, and safely.
Apache Parquet is an open-source, column-oriented data storage format from the Hadoop ecosystem, designed for fast querying on large datasets. Parquet is routinely used to build highly scalable data lakes that remain efficiently queryable, and it is similar to other columnar file formats available in Hadoop.
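To illustrate the column-oriented idea, here is a minimal stdlib-only sketch (not a real Parquet writer — the records and the `to_columns` helper are purely illustrative; an actual Parquet file also stores a schema, encodings, and per-column statistics):

```python
# Hypothetical row-oriented records, as they might arrive from a source.
rows = [
    {"user": "alice", "clicks": 3},
    {"user": "bob", "clicks": 7},
    {"user": "carol", "clicks": 5},
]

def to_columns(rows):
    """Pivot row-oriented records into per-column arrays (the columnar layout)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# An analytic query like SUM(clicks) now scans one contiguous array
# instead of touching every field of every row.
total_clicks = sum(columns["clicks"])
```

This is why columnar formats shine for analytics: queries that touch a few columns of a wide table read only those columns from disk.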
Estuary helps move data from Amazon S3 to Parquet in minutes, with millisecond latency.
Estuary builds free, open-source connectors to extract data from S3 as soon as it arrives, allowing you to easily create always-up-to-date copies of that data across your systems.
Data can then be directed to Parquet using materializations, which are also open-source. Connectors can push data as quickly as a destination can handle it. Parquet performs best with files of roughly 1 GB each, so even at high data volumes, Flow can keep your data lake up to date in near real time.
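Flow's connectors handle file sizing internally, but the size-based batching idea can be sketched in plain Python (a stdlib-only illustration under stated assumptions: the function names are hypothetical, and JSON lines stand in for actual Parquet output):

```python
import json
import os

TARGET_BYTES = 1 * 1024 ** 3  # Parquet performs best near ~1 GB per file

def _write(batch, out_dir, index):
    """Write one batch of serialized records as a numbered part file."""
    path = os.path.join(out_dir, f"part-{index:05d}.jsonl")
    with open(path, "w") as f:
        f.write("\n".join(batch) + "\n")
    return path

def flush_batches(records, target_bytes, out_dir):
    """Group incoming records into files of roughly target_bytes each.

    A real materialization would use a Parquet writer; JSON lines are
    used here only to keep the sketch dependency-free.
    """
    batch, size, paths = [], 0, []
    for record in records:
        line = json.dumps(record)
        batch.append(line)
        size += len(line) + 1  # +1 for the newline
        if size >= target_bytes:
            paths.append(_write(batch, out_dir, len(paths)))
            batch, size = [], 0
    if batch:  # flush any partial final batch
        paths.append(_write(batch, out_dir, len(paths)))
    return paths
```

In production you would pass `TARGET_BYTES` as the threshold; the small thresholds shown in tests exist only to exercise the batching logic.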