Estuary

MongoDB to Redshift ETL: Full Guide

Need to sync data from MongoDB to Redshift? Look no further — find two step-by-step guides (and the method that's right for you) here!

mongodb to redshift
Share this article

If your product runs on MongoDB and your business relies on analytics, there's a good chance you need to move data into a structured environment like Amazon Redshift. MongoDB excels at handling flexible, high-velocity operational data, while Redshift is built for fast, scalable SQL queries across large volumes of records.

Teams often find themselves needing a reliable pipeline from MongoDB to Redshift to unlock use cases like:

  • Centralized reporting and dashboards
  • Cross-system analytics for product, finance, or operations
  • Feeding data science or machine learning pipelines
  • Powering reverse ETL to activate insights across tools

But syncing MongoDB to Redshift is not a plug-and-play task. These systems have fundamentally different structures. MongoDB stores deeply nested JSON documents with varying schemas. Redshift expects clean, columnar data optimized for analytical performance. Add in real-time requirements, and the gap widens.

In this guide, you’ll learn how to set up a robust MongoDB to Redshift ETL pipeline — first with a manual CSV-based approach, then with a fully managed, real-time solution using Estuary. Whether you need a one-time export or a streaming sync that updates Redshift within seconds, you’ll find a method that fits your stack.

MongoDB to Redshift in Two Ways

  • Method 1: Manually move data from MongoDB to Redshift using mongoexport, S3, and the COPY command
  • Method 2: Use Estuary to move data from MongoDB to Redshift continuously

MongoDB Data to Redshift in Minutes

Method 1: MongoDB to Redshift Using mongoexport and COPY

This section covers the step-by-step process of moving data manually from MongoDB to Redshift.

In order to complete the process, first, you need to generate a .csv file of the MongoDB database. MongoDB provides two different export formats: JSON and CSV. Use the mongoexport command to export the data.

mongoexport is a command line tool used to export data stored in MongoDB database in CSV or JSON format. This tool is a part of MongoDB tool packages. You need to run mongoexport in the command prompt, not in the Mongo shell.

Open the command prompt, connect to the MongoDB server, and follow the below steps. 

Step 1: Use mongoexport to export the collection from MongoDB to a JSON or a CSV file.

plaintext
mongoexport --collection=collection_name --db=db_name --out=outputfile.csv

db_nameName of the database.

collection_nameName of the collection to be exported.

outputfile.json: Name of the .csv file you want to export.

Step 2: Perform a transformation on the .csv file and upload the structured CSV or JSON to an AWS S3 bucket with either of the below methods.

2.1: As MongoDB supports flexible schema, it is difficult to understand a collection and create a compatible table in Amazon Redshift. Therefore, before uploading the file to the S3 bucket, you need to create a table schema. 

2.2: You can also install the AWS CLI to upload files into S3 from your local machine. The AWS CLI makes it easy to upload files into the S3 bucket. If you already have installed the  AWS CLI, use the below command to upload .csv files in the S3 bucket. After moving .csv files into the S3 bucket, you can create a table schema from the command prompt.

plaintext
AWS S3 CP D:\outputfile.csv S3://S3bucket01/outputfile.csv

Step 3: Load data to Amazon Redshift with the COPY command from the S3 bucket to the predefined table.

If you are following Step 2 (2.1), use the following COPY command to move files from the S3 bucket to Redshift.

plaintext
COPY table_name from 's3://S3bucket_name/table_name-csv.tbl' 'aws_iam_role=arn:aws:iam::<aws-account-id>:role/<role-name>' csv;

If you are following Step 2 (2.2), then run the below command to move files from the S3 bucket to Redshift using COPY command. To load files in CSV format, add csv at the end in your COPY command.

plaintext
COPY db_name.table_name FROM ‘S3://S3bucket_name/outputfile.csv’ 'aws_iam_role=arn:aws:iam::<aws-account-id>:role/<role-name>' csv;

IAM_ROLEThis parameter is used to authenticate the AWS account id and role.

After successfully executing the above commands, you can check if all the tables are copied correctly in STL_LOAD_ERRORS. STL_LOAD_ERRORS stores a history of all records and displays the errors that occurred while loading the data from the respective file. For reference, check the list of possible load errors here.

If you don’t get any STL_LOAD_ERRORS then your MongoDB to Redshift ETL is successful. 

Challenges in MongoDB to Redshift ETL

While the above manual process might look trouble-free and cost-effective, it comes with the following challenges:

  • Schema Detection: Redshift stores data in a columnar form, and understanding the schemaless database of MongoDB before storing it in Redshift is a burdensome task. In a collection, different documents can have the same field names with different data types. For instance, a field called mobile_no can be stored as a string or number. Fields like this will have to be converted while creating a table schema in Redshift. 
  • Complex structure of objects and arrays: MongoDB can have multiple nested arrays and objects in a document. To understand this dynamic nested structure is a challenging job before uploading a file in your S3 bucket. 
  • Data Type Incompatibility: Redshift does not support all the data types that are supported by MongoDB. Therefore, this might be a major concern while migrating files using the COPY command. For efficient data storage and query performance in the Redshift data warehouse, it is important to understand the Redshift data types.
  • Real-time Complexities: Data in MongoDB is updated quickly. This can lead to a scenario where you regularly keep on uploading CSV files!

Thankfully, alternatives exist. You can use Estuary to create a real-time ETL pipeline from MongoDB to Redshift. 

Method 2: MongoDB to Redshift ETL using Estuary

For a continuous, automated pipeline from MongoDB to Redshift, Estuary is a reliable option. Estuary uses Change Data Capture (CDC) via MongoDB change streams to capture every insert, update, and delete in real time and materialize it into Redshift.

Before You Connect: Key Prerequisites

A few things to sort out before starting the Estuary setup that are not obvious until you hit a connection error:

1. authSource parameter for full-database users

If your MongoDB user has access to all databases, you must include ?authSource=admin in your connection address. Without it, authentication fails silently.

plaintext
mongodb+srv://username:password@cluster.mongodb.net/?authSource=admin

2. Allowlist Estuary IPs in MongoDB Atlas

If you are using MongoDB Atlas (or any provider that restricts inbound connections by IP), you need to allowlist the Estuary IP addresses before the connector can connect. Find the current IP list in the MongoDB connector docs. In Atlas, go to Network Access > Add IP Address and add each one.

3. Oplog retention for reliable CDC

Estuary uses MongoDB change streams which rely on the oplog. If the connector is paused and the oplog evicts old change events, Estuary will need to redo the backfill. Set a minimum oplog retention period of at least 24 hours. For Atlas, check your oplog window under your cluster's Metrics > Oplog tab.

4. Redshift S3 staging requirement

Estuary's Redshift connector stages data in S3 before loading to Redshift. Before configuring the materialization, you need:

  • An S3 bucket in the same AWS region as your Redshift cluster
  • An IAM user with s3:GetObjects3:PutObjects3:DeleteObject, and s3:ListBucket permissions on that bucket
  • The Access Key ID and Secret Access Key for that IAM user

5. Capture modes

Estuary supports three MongoDB capture modes: Change Stream Incremental (preferred, captures inserts/updates/deletes in real time), Batch Snapshot (full collection re-scan on a schedule), and Batch Incremental (scans only new documents by cursor field). Change streams require a replica set or sharded cluster. Standalone instances, views, and time series collections fall back to batch mode automatically.

Follow these steps to build a real-time data pipeline from MongoDB to Redshift with Estuary.

Step 1: Register for Estuary or log in if you already have an account. Once you log in, click on Captures.

mongodb to redshift - estuary captures tab

Step 2: In the Captures window, click on + New Capture. The Create Capture window will appear.

mongodb to redshift - estuary new capture

Step 3: In this window, type MongoDB in the search connectors. You’ll see the MongoDB tile. Click on Capture.

mongodb to redshift - estuary mongodb connector

Step 4: Now fill in the Capture details, providing a unique name for your capture.

Step 5: In the Endpoint Config, fill in:

  • Address: your MongoDB connection string, including ?authSource=admin if your user has full-database access
  • User: MongoDB username
  • Password: MongoDB password
  • Database: leave empty to discover all databases, or specify a single database

mongodb to redshift - mongodb endpoint config

Step 6: Once you have filled in all the details, click on Next. Estuary will initiate a connection with your MongoDB database and identify data collections.

Step 7: Click Save and Publish.

Step 8: Once the dialog box shows that you’ve published the capture successfully, click Materialize Collections

Step 9: Search and choose Amazon Redshift. Click on Materialize.

mongodb to redshift - redshift connector

Step 10: Fill in the Endpoint Config:

  • Address: your Redshift cluster endpoint and port (e.g., your-cluster.account.us-east-1.redshift.amazonaws.com:5439)
  • User: Redshift database user with CREATE TABLE and INSERT privileges
  • Password: password for that user
  • Database: target database name
  • Schema: target schema (default: public)
  • AWS Access Key ID: from your IAM user
  • AWS Secret Access Key: from your IAM user
  • Bucket: your S3 staging bucket name
  • Region: AWS region matching both bucket and cluster

Step 11: Finally, click on Save and Publish.

For more help on this method, you can visit the following links:

Give it a quick try to replicate your MongoDB data to Redshift in real time, as your first pipeline is free!

Summary

Migrating data from MongoDB to the Redshift warehouse is definitely a great way to streamline data for analytical purposes. If you are used to MongoDB querying and COPY commands, using the manual method is an easy and cost-effective approach. But considering manual efforts, programmatic challenges, and latency issues, the manual method isn’t an ideal approach for all real-time scenarios. However, using the second method, Estuary, you can streamline your ETL without requiring new investments in infrastructure or development.

Start streaming your data for free

Build a Pipeline

About the author

Picture of Jeffrey Richman
Jeffrey RichmanData Engineering & Growth Specialist

Jeffrey is a data engineering professional with over 15 years of experience, helping early-stage data companies scale by combining technical expertise with growth-focused strategies. His writing shares practical insights on data systems and efficient scaling.

Streaming Pipelines.
Simple to Deploy.
Simply Priced.
$0.50/GB of data moved + $.14/connector/hour;
50% less than competing ETL/ELT solutions;
<100ms latency on streaming sinks/sources.