This post was co-written with Estuary CTO Johnny Graettinger.

Estuary is excited to announce our open-source connector repository. These connectors are dual-licensed under Apache 2.0 and MIT and will extend Airbyte’s community connector specification.

We believe that an ecosystem of open-source connectors will be critical to the future of data integration. We’re committed to contributing to this space while offering a unique addition: enriching the existing spec so it can be used in low-latency, real-time pipelines at any scale. You can expect the best quality connectors from Estuary, mostly for high-scale technology systems.

Let’s break all that down.

What the heck is a “connector”?

We’ve witnessed a Cambrian explosion of interesting databases, services, and SaaS for processing, indexing, and querying data — systems that specialize in a wide variety of useful niches, such as large-scale analytics, time-series data, document-oriented databases, graph processing, global-scale OLTP, low-latency caching, publish/subscribe, web-to-backend synchronization, and more.

They all bring a related challenge: how is the user supposed to move their data into and out of these systems? It’s a Tower of Babel: most use bespoke protocols and clients, and have wildly different operational characteristics. Surely the user need not account for each and every system through custom application development?

Enter connectors: a standardized and pluggable component — speaking a common protocol — for interfacing with a system to pull or push data. Connectors encapsulate all of the messy details for working with a given system.
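To make the idea concrete, here is a minimal sketch of what "pluggable" means in practice. It is not taken from any real connector spec: `SourceConnector`, `InMemorySource`, and `run_pipeline` are hypothetical names chosen for illustration only.

```python
from typing import Dict, Iterator, List, Protocol


class SourceConnector(Protocol):
    """Hypothetical common interface; not part of any real specification."""

    def pull(self) -> Iterator[Dict]:
        ...


class InMemorySource:
    """Stand-in for a real system such as a database or a message queue."""

    def __init__(self, rows: List[Dict]) -> None:
        self._rows = rows

    def pull(self) -> Iterator[Dict]:
        # A real connector would hide the system's bespoke client and protocol here.
        yield from self._rows


def run_pipeline(source: SourceConnector) -> List[Dict]:
    # The pipeline depends only on the shared interface, never on the system behind it.
    return list(source.pull())
```

Any other class with a matching `pull` method could be handed to `run_pipeline` without changing the pipeline itself, which is the property that makes connectors interchangeable.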

Why we believe in open-source connectors

Connectors aren’t by any means a new concept. Many people have used this architecture before to create closed-source products or proprietary systems. But we think that the real value of connectors is unlocked by the use of an open-source protocol. This allows anyone to write their own connectors, and allows any system to use them for integrations. With the thousands of data endpoints available, and orders of magnitude more unique use-cases, open-source is the best way to realistically scale and adapt the connector ecosystem.

To illustrate this, let’s look at a common scenario: a vendor that’s built a data pipeline for a specific customer using its own closed-source connectors. It’s up to the vendor to not only create the connectors, but also to maintain them and adapt them for this customer’s corner cases and those of every other customer. Each new connector the vendor creates becomes a huge commitment of engineering resources for its limited staff. 

When the customer wants a new connector made or a specific use-case addressed, they have to go through the vendor — even if it’s a small issue that the customer’s engineers could fix. They’re locked out of their own pipeline, and the vendor is busy. Often, the waiting period drags on, and in the meantime the customer is forced to build a separate, non-connector work-around. This creates labor and introduces complexity to their pipeline.

Contrast this with an open-source ecosystem, like the one we’re part of:

  • Estuary builds and maintains our own connectors, but users can fully adapt them to meet their needs. If an engineer has a corner case that they know how to address, they’re free to do so.
  • If you don’t see the connector you need, there are free resources and guidelines available to help you build your own. 
  • The community has a wide variety of use-cases, so the ecosystem of connectors organically evolves to meet more people’s needs.

This yields two very important though seemingly contradictory results: a more active and cooperative community, and increased autonomy for each user.

At this point, you might be thinking about how connectors can be challenging to build, and wondering if open-sourcing them is realistic. This is a valid concern, and it’s certainly true that some users would rather pay for connectors than spend time building them. But to end the conversation there would be to undersell the abilities of the open-source community. 

Hundreds of popular APIs depend on client libraries to be useful at scale, and the open-source community has proven it can maintain those. Developers understand the strength of open source: it allows us to benefit from each other’s work on an otherwise impossible scale. Contributions vary based on each person’s desires and ability, but everyone gets more out than they put in. As long as the integrations are standardized and follow protocols, there’s really no reason not to go open-source.

Estuary engineers will continue to work alongside the open-source community to create as many connectors as we can. Even if you never build a new connector from scratch, you’ll have the freedom to make relatively quick, easy updates to customize connectors for your use case. 

In addition to our connector efforts, Estuary is actively developing our product, Flow. Flow is a powerful, flexible, and fast service for composing connectors and transformations into complete, continuous data pipelines. You can deploy Flow yourself in-house, or leverage our affordable managed service.

Why we’re embracing an existing spec

A successful open-source connector ecosystem needs a robust specification: one that allows the maximum amount of flexibility and extensibility while still maintaining quality control protocols. 

Instead of reinventing the wheel, we’re building our open-source connectors using an innovative open-source spec that has been gaining traction in recent months: the community spec created by Airbyte. 

This spec brings several major advantages, most of which are possible because it’s built on top of Docker.

  • Connectors can be written in any language and run on any machine. 
  • Each connector tells the machine running it about itself, allowing backward compatibility as the connectors and the spec both evolve.
  • Docker registries allow you to pull connectors (in the form of Docker images) from different sources at will, without the centralized control of a single company. 
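As a rough illustration of how a runtime might consume a connector's output, the sketch below assumes that a connector container writes newline-delimited JSON messages to standard output, each carrying a `type` field such as `SPEC` or `RECORD`, as the Airbyte specification describes. The helper name `group_messages` and the sample lines are hypothetical, and the sample payloads are simplified rather than complete messages.

```python
import json
from typing import Dict, Iterable, List


def group_messages(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group a connector's JSON output lines by their message type."""
    grouped: Dict[str, List[dict]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip any output that is not a JSON message.
        if not isinstance(message, dict) or "type" not in message:
            continue
        grouped.setdefault(message["type"], []).append(message)
    return grouped


# Simplified, made-up sample of what a connector might print.
SAMPLE_OUTPUT = [
    '{"type": "SPEC", "spec": {"connectionSpecification": {"type": "object"}}}',
    '{"type": "RECORD", "record": {"stream": "users", "data": {"id": 1}, "emitted_at": 0}}',
    "plain log line that is not JSON",
]

messages = group_messages(SAMPLE_OUTPUT)
```

Because the runtime only reads text from a container's output, it does not need to know which language the connector was written in, and a `SPEC` message is one way a connector can describe itself to whatever is running it.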

As we progress, we are committed to ensuring that Flow can run Airbyte’s connectors. We’re also committed to making sure the connectors we build work on the Airbyte runtime or any other runtime that supports the spec. 

We have active lines of communication open with Airbyte to make all this happen. Ultimately, our goal is to be part of a huge open-source community built on compatible work happening across companies. Compatibility is key to everyone’s success; creating too many open-source frameworks for the same thing diminishes the potential power of open-source.

How Estuary is advancing OSS connectors

As we build Flow, we’re also building a backward-compatible extension of the community spec Airbyte created. The flexibility and decentralization of that spec — and of Docker — make this possible.

We’ve introduced and implemented proposals that make connectors suitable for low-latency use cases, where connectors update in milliseconds rather than running on a periodic schedule. We’ve also proposed features for parallelism, allowing connectors to tackle high-scale data sources.

Our connector library is just getting started. We currently offer connectors for major endpoints including Snowflake, Kafka, Amazon S3, Amazon Kinesis, and PostgreSQL, and many more are in progress. We are focusing on high-scale technologies rather than SaaS products.

We hope you’ll give our connectors a try, and look forward to future contributions as the repository grows. 

And if you’re in need of a runtime, give Flow a try. Flow provides an easy, containerized way for engineers to build extremely low-latency data pipelines, and its capabilities are rapidly growing. If you’re interested in our managed service offerings, contact us.
