3 Ways to Stream Data from Postgres to Elasticsearch

Jeff Richman

If you have a high volume of relational data and are thinking through how you’ll both store and analyze it, the Postgres to ElasticSearch stack could be a great fit — as long as you keep the two systems in sync.

In this tutorial, we’ll cover:
 

  • Overview of ElasticSearch
  • Overview of Postgres
  • When & Why to use ETL from Postgres to ElasticSearch in real time
  • 3 Methods to stream data from Postgres to ElasticSearch

Introduction to ElasticSearch

ElasticSearch, often called just “Elastic” (and the core of the ELK stack, alongside Logstash and Kibana), is a popular open-source distributed search and analytics engine built on the Apache Lucene library.

You’ll find it particularly useful for applications such as search and discovery, real-time log analysis, and geospatial analysis. 

This is because Elastic:

  • Supports full-text search.
  • Handles both structured and unstructured data.
  • Scales horizontally to support large loads across multiple nodes.

If you’re comparing Elastic against popular SQL databases like MySQL or Postgres, Elastic is a particularly good fit when:

  • You’re building search-based applications (e.g. eCommerce search) or analyzing textual data, since Elastic ranks results by how closely they match a text-based query (see the example query after this list).
  • You’re analyzing or working with high-volume data, such as log data, because Elastic can scale across multiple nodes.
  • You’re building geospatial or time-series applications.
  • You find yourself creating many database indexes to speed up queries and would rather apply fast filters on arbitrary fields instead.
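
To make the first point concrete, here is a minimal sketch of a full-text query against a hypothetical products index with a description field (the index and field names are illustrative and not tied to any example in this article):

  GET /products/_search
  {
    "query": {
      "match": {
        "description": "wireless noise cancelling headphones"
      }
    }
  }

Each matching document comes back with a relevance _score, and results are returned best match first; producing equivalent ranked results from a SQL LIKE query takes considerably more work.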

Introduction to Postgres

You’ve probably heard of PostgreSQL, or Postgres; it is, after all, the fourth most popular relational database.

Postgres is a popular open-source RDBMS that extends the functionality of SQL. Thanks to the rise of the cloud, its open-source nature, and a strong feature set, its popularity has grown steadily over the past decade.

 


You’ll find that Postgres is an extensible enterprise-class database capable of:

  • Storing large data volumes.
  • Handling complex data structures.
  • Working seamlessly with many other applications. 

Replicating Data from Postgres to ElasticSearch in Real Time

Suppose you have relational data in Postgres representing a set of emails, and you want to let someone search those emails as a data source, for, say, remembering their partner’s birthday. You might build a search application to help the user query the data (and quickly, so they don’t miss it!).

By replicating the data over to ElasticSearch, you may be in a better position to build a scalable application for text-based querying. 

You’ll want to consider whether it makes sense to continuously load the data into ElasticSearch using real-time streaming, perform a one-time load, or batch-load the data on some frequency. Streaming the data in real time can drive better outcomes for use cases like recommendation systems or fraud detection.

As with most data pipelines, there are a number of ways and tools to perform ETL from Postgres to ElasticSearch in real time. 

In this guide we’ll cover three popular ways to stream your data.

Method 1: PGSync – Open Source Project for Postgres to Elastic

PGSync is an open-source project for the continuous capture of data from Postgres to ElasticSearch. 

It is maintained by Tolu Aina, and having just finished emailing him, I can say he’s a really nice, helpful guy.

Tolu built PGSync to use the logical decoding feature introduced in Postgres 9.4 to capture a stream of change events from the Postgres database.

If you modify the configuration file in Postgres to enable logical decoding, PGSync can consume the change events from the Postgres write-ahead log. 

After you define a schema file for the resulting document, your captured change events will be transformed by PGSync’s query builder from relational data into the structured document format that ElasticSearch requires.
 

Steps:

Prerequisites: Python 3.7+, Postgres 9.6+, Redis 3.1.0, ElasticSearch 6.3.1+, SQLAlchemy 1.3.4+, and superuser privileges

  1. Open the postgresql.conf configuration file, usually located at /etc/postgresql/[version]/main/ on Debian-based systems (or in your Postgres data directory).
  2. Set wal_level = logical and restart Postgres.
  3. Create a replication slot by running 

    SELECT * FROM pg_create_logical_replication_slot('slot_name', 'plugin');
  4. Install PGSync: pip install pgsync.
  5. Create a schema.json that matches the expected document representation in ElasticSearch (see the sample sketch after these steps).
  6. Run PGSync as a daemon: pgsync --config schema.json -d.
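
For illustration, here is a minimal sketch of what a schema.json might look like, assuming a hypothetical emails_db database with an email table. The database, index, table, and column names are examples only, and the exact schema options vary by PGSync version, so consult the PGSync documentation for your release:

  [
    {
      "database": "emails_db",
      "index": "emails",
      "nodes": {
        "table": "email",
        "columns": ["id", "subject", "body", "sent_at"]
      }
    }
  ]

Each entry maps a Postgres table (optionally with nested child tables) to an ElasticSearch index that PGSync will keep in sync.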

Advantages of using PGSync

  • Streams data in real time.
  • Open-source project, so it’s free to use.
  • Uses logical decoding, so the impact on your production database is minimized.
  • If all prerequisites are met, it’s a good fit for simple, low-latency replication needs.

Disadvantages of using PGSync

  • Does not support in-flight transforms.
  • Unable to support zero-downtime migrations as of today.
  • No formal support; it’s a relatively new open-source project with infrequent commits, mostly from a single contributor.
  • Unclear if it can handle complex enterprise loads.

Method 2: Fully Managed Postgres to Elastic via Estuary Flow

If you don’t want to install a set of libraries and learn a bunch of new tooling, you can explore a fully managed service with a UI for building a data pipeline without coding. 

Estuary Flow is one such free no-code platform for building streaming data pipelines. 

Estuary provides a UI on top of the open-source Gazette streaming framework to replicate change data and history from Postgres to ElasticSearch in milliseconds. The pipelines you create are continuous, fully managed, and stream change data directly from the Postgres write-ahead log.

Steps:

  1. Create a free account in the Estuary Flow web app here.
  2. Set up change data capture from Postgres.
    1. Configure your Postgres instance to meet the capture requirements (see the example configuration after these steps).
    2. Navigate to Captures in the web app and select Postgres.

    3. Add a unique name for your capture.
    4. Fill out the capture details with the server address, database username, and password.

    5. Click Next. Flow will find and list all the tables in your Postgres database. You can choose which tables you’d like to capture.
    6. Click Save & Publish to begin the capture process.
  3. The data from each table is now stored in a Flow Collection. A collection is both your real-time and historical data stored as JSON documents in cloud storage. As a collection, the data can now be materialized in real time, transformed, and joined with other collections.
  4. Materialize Postgres data into Elastic.
    1. Select Materialize Collections from the dialog box of your successful capture.
    2. Select the ElasticSearch connector.

    3. Add a unique name for the materialization.
    4. Input the Elastic cluster endpoint in the format https://CLUSTER_ID.REGION.CLOUD_PLATFORM.DOMAIN:PORT.
    5. Input the username and password.

    6. Scroll to the Collection Selector. The tables ingested from Postgres will each be mapped to a separate index in ElasticSearch. Provide a name for each. 
    7. Click Next.
    8. Click Save & Publish. 
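
A note on configuring Postgres (step 2 above): before Flow can capture change data, the database itself must allow logical replication and provide a role the connector can use. The sketch below is a hedged example of the typical setup; the role and publication names are illustrative, and Estuary’s Postgres connector documentation is the authoritative source for the exact requirements:

  -- Enable logical replication (requires a Postgres restart)
  ALTER SYSTEM SET wal_level = logical;

  -- A dedicated capture role (example name)
  CREATE USER flow_capture WITH PASSWORD 'secret' REPLICATION;
  GRANT SELECT ON ALL TABLES IN SCHEMA public TO flow_capture;

  -- A publication covering the tables you want to capture
  CREATE PUBLICATION flow_publication FOR ALL TABLES;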

All historical data from Postgres is now backfilled into ElasticSearch documents, and any new changes in the source Postgres database will materialize to Elastic in less than 100ms.

You can optionally exert more control over the field mappings to Elastic with field overrides.

Visit our documentation for more details on building pipelines in Estuary Flow.

Advantages of using Estuary Flow for Postgres to ElasticSearch:

  • No-code UI-based setup.
  • Real-time data pipeline with materializations in under 100ms.
  • Fully managed enterprise-grade system supporting flows of 7GB/s+.
  • Ability to replicate Postgres data to destinations beyond Elastic without repeating ingestion.
  • Support for SSH tunneling.
  • Free tier up to 25GB/mo.
  • Able to perform in-flight transformations and joins with other data assets before syncing.

Disadvantages:

  • Pay $0.75 per GB of data transferred after 25GB.

Method 3: Logstash JDBC plugin for Postgres to ElasticSearch

Prerequisites: Java 8+, Logstash, and a JDBC driver

Elastic provides a documented process for using Logstash to sync from a relational database to ElasticSearch. Logstash is an open-source server-side data processing platform. 

Note that this process is tested for MySQL, though it should work for other relational databases like Postgres. Unlike the previous two methods, it does not capture change data from the write-ahead log via logical decoding; instead, Logstash polls the database on a schedule using a last-modified timestamp column.

This means your production database could be heavily taxed by this implementation. You can read more about Postgres CDC types here. 

Steps:

  1. Gather credentials 
    1. Navigate to the Kibana menu and then Management->Integrations->View Deployment Details.
    2. To authenticate you will use the Elastic API key.
  2. Get Logstash and the Postgres JDBC driver.
    1. Install Logstash.
    2. Download and unpack the JDBC driver and take note of the driver’s location.
  3. Add timestamps to Postgres (if you do not already have them).
    1. For each table you plan to replicate to Elastic, you will need a column to reflect the time it was last modified.
  4. Create a Logstash pipeline with the JDBC input plugin.
    1. Create a file called jdbc.conf in <localpath>/logstash-7.12.

      Paste the code below into the file to generate the Logstash pipeline, substituting your driver location, credentials, and timestamp column name.
input {
  jdbc {
    jdbc_driver_library => "<driverpath>/mysql-connector-java-<versionNumber>.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://<MySQL host>:3306/es_db"
    jdbc_user => "<myusername>"
    jdbc_password => "<mypassword>"
    jdbc_paging_enabled => true
    tracking_column => "unix_ts_in_secs"
    use_column_value => true
    tracking_column_type => "numeric"
    schedule => "*/5 * * * * *"
    statement => "SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs FROM es_table WHERE (UNIX_TIMESTAMP(modification_time) > :sql_last_value AND modification_time < NOW()) ORDER BY modification_time ASC"
  }
}
filter {
  mutate {
    copy => { "id" => "[@metadata][_id]" }
    remove_field => ["id", "@version", "unix_ts_in_secs"]
  }
}
output {
  elasticsearch {
    index => "rdbms_idx"
    ilm_enabled => false
    cloud_id => "<DeploymentName>:<ID>"
    cloud_auth => "elastic:<Password>"
    ssl => true
    # api_key => "<myAPIid:myAPIkey>"
  }
}

With the configuration above saved, launch Logstash:
bin/logstash -f jdbc.conf
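
Since this article targets Postgres rather than MySQL, here is a hedged sketch of how the input section might look using the Postgres JDBC driver instead. The driver class (org.postgresql.Driver) and the jdbc:postgresql:// connection string format are standard; the host, database, table, and column names are placeholders carried over from the example above, and EXTRACT(EPOCH FROM ...) stands in for MySQL’s UNIX_TIMESTAMP():

input {
  jdbc {
    jdbc_driver_library => "<driverpath>/postgresql-<versionNumber>.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://<Postgres host>:5432/es_db"
    jdbc_user => "<myusername>"
    jdbc_password => "<mypassword>"
    jdbc_paging_enabled => true
    tracking_column => "unix_ts_in_secs"
    use_column_value => true
    tracking_column_type => "numeric"
    schedule => "*/5 * * * * *"
    # EXTRACT(EPOCH FROM ...) is the Postgres equivalent of MySQL's UNIX_TIMESTAMP()
    statement => "SELECT *, EXTRACT(EPOCH FROM modification_time) AS unix_ts_in_secs FROM es_table WHERE (EXTRACT(EPOCH FROM modification_time) > :sql_last_value AND modification_time < NOW()) ORDER BY modification_time ASC"
  }
}

The filter and output sections stay the same as in the example above. If your tables don’t yet have a last-modified column (step 3), something like ALTER TABLE es_table ADD COLUMN modification_time timestamptz NOT NULL DEFAULT now() creates one, though you’ll still need a trigger or application logic to update it on every change.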

Advantages of using the Logstash JDBC plugin:

  • Open-source.
  • Can transform data in Logstash before loading it into ElasticSearch.

Disadvantages:

  • Will negatively impact your production database, as it uses timestamp-based polling rather than log-based CDC.
  • There are implementation nuances to avoid creating duplicates.
  • Users report that Logstash documentation can be outdated and incomplete at times.

Postgres to Elastic Beyond This Guide…

As with any data pipeline, you can choose to build a custom solution using a streaming framework, such as Kafka and Debezium, or Amazon Kinesis.
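
To give a sense of what that path involves, here is a rough sketch of registering a Debezium Postgres source connector with Kafka Connect. The REST call shape is standard Kafka Connect, but treat the property list as illustrative: exact property names vary across Debezium versions (for example, topic.prefix replaced database.server.name in Debezium 2.x), and you would still need an ElasticSearch sink connector plus the operational work of running Kafka and Connect yourself:

  curl -X POST http://localhost:8083/connectors \
    -H "Content-Type: application/json" \
    -d '{
      "name": "postgres-source",
      "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        "database.hostname": "localhost",
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "<password>",
        "database.dbname": "es_db",
        "topic.prefix": "pgserver",
        "table.include.list": "public.es_table"
      }
    }'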

If, for example, your use case is text search over logs, you can consider a Postgres extension like ZomboDB. However, it may still not offer the power and functionality of a distributed, multi-node system like Elastic.

But note that building real-time replication in custom code requires extensive work. For simple batch use cases or one-offs, the codebase will be more straightforward, but it will still require operational overhead and management.

Create a free Estuary account and finish building your real-time Postgres to ElasticSearch pipeline within a half hour. Questions? Hit us up on Slack!


Further Reading and Discussion:

https://discuss.elastic.co/t/jdbc-to-elasticsearch-duplicate-records/150153

https://discuss.elastic.co/t/logstash-adding-duplicate-rows-for-every-run/44775/7

https://www.elastic.co/guide/en/cloud/current/ec-getting-started-search-use-cases-db-logstash.html#ec-db-logstash-prerequisites

https://news.ycombinator.com/item?id=11123479

https://medium.com/@emreceylan/how-to-sync-postgresql-data-to-elasticsearch-572af15845ad

https://www.reddit.com/r/aws/comments/f00t52/eli5_when_should_one_use_elasticsearch_as_opposed/?sort=top

https://www.elastic.co/guide/en/cloud/current/ec-getting-started-search-use-cases-db-logstash.html