Database Engineering

PostgreSQL Foreign Data Wrappers: Query External Data Sources

Use PostgreSQL FDW to query MySQL, S3, MongoDB, and other databases from within PostgreSQL — setup and performance tips

JusDB Team
February 7, 2023
11 min read
133 views

Running queries across multiple databases used to mean writing ETL pipelines, scheduling cron jobs, and maintaining brittle sync logic that breaks the moment a schema changes. PostgreSQL Foreign Data Wrappers (FDWs) let you skip all of that — querying remote MySQL instances, flat CSV files, S3-hosted Parquet files, and even MongoDB collections as if they were ordinary local tables. The SQL interface stays the same, the data stays where it lives, and your application sees a single unified PostgreSQL connection. This post covers the four most production-relevant wrappers, their setup commands, and the performance realities you need to understand before shipping to production.

TL;DR
  • FDWs let PostgreSQL query remote data sources (other Postgres instances, MySQL, CSV files, S3/Parquet) using standard SQL.
  • postgres_fdw supports WHERE/JOIN/aggregate pushdown — use it for cross-Postgres federation.
  • mysql_fdw bridges PostgreSQL and MySQL with read/write support.
  • file_fdw exposes local CSV files as foreign tables with zero configuration.
  • parquet_s3_fdw queries Parquet files on S3 directly from PostgreSQL.
  • All FDWs carry network overhead — plan for it, and push filters to the remote server whenever possible.
  • FDWs are not a replacement for proper replication; treat them as a federation and migration tool.

What Are Foreign Data Wrappers?

Foreign Data Wrappers are PostgreSQL's implementation of the SQL/MED (Management of External Data) standard, introduced in SQL:2003 and available in PostgreSQL since version 9.1. The architecture is straightforward: a FDW is a shared library that translates PostgreSQL's internal scan API into calls against an external data source. From the query planner's perspective, a foreign table looks like any other relation — it has a schema, statistics, and can participate in joins.

The core objects you will always create are:

  • Extension — loads the FDW shared library.
  • Foreign Server — defines the remote endpoint (host, port, database).
  • User Mapping — maps a local PostgreSQL role to credentials on the remote server.
  • Foreign Table — defines the local schema that maps to the remote table or file.

PostgreSQL ships two FDWs in its core distribution: postgres_fdw (for remote PostgreSQL servers) and file_fdw (for local files). Everything else — MySQL, MongoDB, S3, Redis — is provided by third-party extensions.


postgres_fdw: Federating PostgreSQL Instances

postgres_fdw is the most capable wrapper in the ecosystem. It connects to a remote PostgreSQL server over libpq and, critically, supports predicate pushdown, join pushdown, and aggregate pushdown — meaning the remote server does most of the work before sending rows back.

Installation and Setup

sql
-- Step 1: Load the extension on the local server
CREATE EXTENSION postgres_fdw;

-- Step 2: Define the remote server
CREATE SERVER remote_pg
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'db2.internal', port '5432', dbname 'analytics');

-- Step 3: Map a local role to a remote user
CREATE USER MAPPING FOR reporting_user
  SERVER remote_pg
  OPTIONS (user 'analytics_reader', password 's3cr3t');

-- Step 4a: Import an entire remote schema at once
IMPORT FOREIGN SCHEMA public
  FROM SERVER remote_pg
  INTO local_analytics_schema;

-- Step 4b: Or define a single foreign table manually
CREATE FOREIGN TABLE orders_remote (
  order_id   bigint,
  customer_id bigint,
  total_cents integer,
  created_at  timestamptz
)
  SERVER remote_pg
  OPTIONS (schema_name 'public', table_name 'orders');

IMPORT FOREIGN SCHEMA is the practical choice for any schema with more than a handful of tables — it introspects the remote catalog and creates all foreign table definitions automatically. Re-run it after remote schema changes to pick up new columns.

Pushdown Behavior

The single biggest performance lever in postgres_fdw is controlling what gets pushed to the remote server versus evaluated locally.

sql
-- This WHERE clause is pushed to the remote server
-- Only matching rows are transferred over the network
SELECT order_id, total_cents
FROM orders_remote
WHERE created_at >= '2025-01-01'
  AND total_cents > 10000;

-- Join pushdown: both tables on the same remote server,
-- the join executes remotely
SELECT o.order_id, c.email
FROM orders_remote o
JOIN customers_remote c ON o.customer_id = c.customer_id
WHERE o.created_at >= '2025-01-01';

-- Aggregate pushdown: GROUP BY and SUM run on the remote server
SELECT DATE_TRUNC('day', created_at) AS day,
       SUM(total_cents) AS daily_revenue
FROM orders_remote
GROUP BY 1
ORDER BY 1;
Tip

Enable use_remote_estimate = true on the server definition so PostgreSQL fetches remote row count statistics when planning queries. Without it, the planner uses a default estimate of 1000 rows and may choose poor join strategies.

sql
ALTER SERVER remote_pg OPTIONS (ADD use_remote_estimate 'true');
Warning

Functions that are not immutable or that depend on local session state (e.g., current_setting(), custom PL/pgSQL functions) will not be pushed down. The planner fetches all candidate rows and evaluates the function locally — potentially transferring millions of rows across the network for a filter that returns ten.


mysql_fdw: Bridging PostgreSQL and MySQL

mysql_fdw, maintained by EnterpriseDB, provides read and write access to MySQL and MariaDB tables. It is the most common FDW used during MySQL-to-PostgreSQL migrations because you can keep the MySQL instance running, query it live from PostgreSQL, and cut over at your own pace.

sql
-- Install the extension (must be compiled against libmysqlclient)
CREATE EXTENSION mysql_fdw;

-- Define the MySQL server
CREATE SERVER mysql_crm
  FOREIGN DATA WRAPPER mysql_fdw
  OPTIONS (host '10.0.1.50', port '3306');

-- Map credentials
CREATE USER MAPPING FOR local_app_user
  SERVER mysql_crm
  OPTIONS (username 'crm_reader', password 'hunter2');

-- Create a foreign table mirroring the MySQL schema
CREATE FOREIGN TABLE contacts_mysql (
  id          int,
  first_name  varchar(100),
  last_name   varchar(100),
  email       varchar(255),
  created_at  datetime
)
  SERVER mysql_crm
  OPTIONS (dbname 'crm', table_name 'contacts');

-- Write back to MySQL (INSERT/UPDATE/DELETE are supported)
INSERT INTO contacts_mysql (id, first_name, last_name, email, created_at)
VALUES (9001, 'Ada', 'Lovelace', 'ada@example.com', NOW());
Warning

MySQL's type system does not map perfectly to PostgreSQL. TINYINT(1) comes across as smallint, not boolean. DATETIME has no timezone, while PostgreSQL's timestamptz does. Always verify type mappings before assuming data integrity in migration scenarios.


file_fdw: Querying CSV Files with SQL

file_fdw ships with PostgreSQL and exposes files readable by the PostgreSQL server process as foreign tables. It is ideal for loading reference data, querying log exports, or giving analysts a SQL interface to CSV dumps without running an import.

sql
CREATE EXTENSION file_fdw;

CREATE SERVER csv_files
  FOREIGN DATA WRAPPER file_fdw;

-- The file must be readable by the postgres OS user
CREATE FOREIGN TABLE sales_export (
  region      text,
  product_sku text,
  quantity    integer,
  revenue     numeric(12,2),
  sale_date   date
)
  SERVER csv_files
  OPTIONS (
    filename '/var/lib/postgresql/exports/sales_2025.csv',
    format   'csv',
    header   'true',
    delimiter ','
  );

-- Query it like any other table
SELECT region, SUM(revenue) AS total_revenue
FROM sales_export
WHERE sale_date >= '2025-01-01'
GROUP BY region
ORDER BY total_revenue DESC;
Note

file_fdw is read-only. The file must reside on the PostgreSQL server's filesystem and must be accessible by the postgres OS user. It cannot read files from object storage directly — use parquet_s3_fdw or aws_s3 extensions for S3.


parquet_s3_fdw: Querying S3-Hosted Parquet Files

parquet_s3_fdw (the parquet-s3-fdw project) lets PostgreSQL query columnar Parquet files stored on Amazon S3, MinIO, or any S3-compatible object store. Because Parquet is columnar, the extension can perform column projection — only transferring the columns your query actually reads.

sql
CREATE EXTENSION parquet_s3_fdw;

CREATE SERVER parquet_s3
  FOREIGN DATA WRAPPER parquet_s3_fdw
  OPTIONS (
    use_minio 'false',   -- set 'true' for MinIO
    region    'us-east-1'
  );

CREATE USER MAPPING FOR analytics_role
  SERVER parquet_s3
  OPTIONS (
    access_key_id     'AKIAIOSFODNN7EXAMPLE',
    secret_access_key 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
  );

CREATE FOREIGN TABLE events_parquet (
  event_id   bigint,
  user_id    bigint,
  event_type text,
  properties jsonb,
  occurred_at timestamptz
)
  SERVER parquet_s3
  OPTIONS (filename 's3://data-lake-bucket/events/2025/*.parquet');

-- Column projection reduces data transfer significantly
SELECT user_id, COUNT(*) AS event_count
FROM events_parquet
WHERE occurred_at >= '2025-06-01'
  AND event_type = 'purchase'
GROUP BY user_id
HAVING COUNT(*) > 5;

Python-Based FDWs with Multicorn

When no compiled FDW exists for your target data source, Multicorn provides a Python API for writing custom wrappers. MongoDB, Redis, Elasticsearch, and REST APIs have all been implemented as Multicorn wrappers. The tradeoff is performance: Python function call overhead is significant compared to a compiled C extension. Use Multicorn for low-volume lookups and reference data, not for high-throughput analytical queries.


FDW Performance: What to Expect

FDWs are not a transparent performance layer. Every query against a foreign table incurs at minimum one network round trip to the remote server. Understanding where the overhead comes from lets you design around it.

Network and Serialization Overhead

Even with pushdown enabled, every result set crosses the network. A query returning 10 million rows from a remote table is slow — not because PostgreSQL is doing extra work, but because 10 million rows of serialized data must travel over the wire. Design FDW queries to return aggregates or narrow result sets.

Index Limitations

PostgreSQL cannot use indexes that exist on the remote server to satisfy local query plans. postgres_fdw pushes WHERE clauses to the remote server, which can then use its own indexes — but only for simple, pushdown-eligible predicates. Wrappers like file_fdw and parquet_s3_fdw have no index concept at all; they always perform full scans (though Parquet row group statistics can skip data blocks).

Warning

Avoid joining a large local table to a foreign table without a selective WHERE clause on the foreign side. PostgreSQL may choose a nested loop join that issues one remote query per local row — a classic N+1 pattern that can bring a database to its knees. Use EXPLAIN (ANALYZE, VERBOSE) to verify the join strategy before running in production.

Connection Pooling

postgres_fdw opens a persistent libpq connection per local backend, per remote server. On a busy application server with many short-lived connections, this multiplies the connection count on the remote server. Place a connection pooler (PgBouncer) in front of the remote server and keep keep_connections enabled on the server definition (the default since PostgreSQL 14).

sql
-- Verify pushdown decisions for a query
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT region, SUM(total_cents)
FROM orders_remote
WHERE created_at >= '2025-01-01'
GROUP BY region;

-- Look for "Remote SQL" in the output to confirm pushdown:
-- Foreign Scan on orders_remote
--   Remote SQL: SELECT region, total_cents FROM public.orders
--               WHERE ((created_at >= '2025-01-01 00:00:00+00'))

Use Cases and Limitations

Where FDWs Shine

  • Data federation — Combining data from a production OLTP database with an analytical warehouse without running an ETL pipeline. Run the join in SQL and let pushdown do the heavy lifting.
  • Migration phases — During a MySQL-to-PostgreSQL migration, keep the old MySQL instance as a foreign server. New code writes to PostgreSQL; legacy reads can still hit MySQL via mysql_fdw until cutover.
  • Read-only replicas — Expose a read-only replica of a remote PostgreSQL instance as a foreign server, giving reporting users a SQL interface without granting direct access to the replica's credentials.
  • Data lake queries — Query S3-hosted Parquet files from your PostgreSQL application without standing up a separate query engine like Athena or Trino.

Known Limitations

  • FDWs do not participate in distributed transactions. A local transaction that writes to both a local table and a write-enabled foreign table has no two-phase commit unless you implement it manually.
  • VACUUM and ANALYZE do not run on foreign tables. Statistics for the query planner must come from use_remote_estimate or be set manually with ALTER FOREIGN TABLE ... ALTER COLUMN ... SET STATISTICS.
  • Row-level security policies on local PostgreSQL do not apply to data returned from foreign servers — enforce access controls at the remote server level.
  • Not all FDWs support RETURNING clauses, ON CONFLICT, or TRUNCATE.

Key Takeaways

Key Takeaways
  • PostgreSQL FDWs expose remote data sources as local SQL tables using the SQL/MED standard — no ETL pipelines required.
  • The setup pattern is always the same: CREATE EXTENSIONCREATE SERVERCREATE USER MAPPINGCREATE FOREIGN TABLE (or IMPORT FOREIGN SCHEMA).
  • postgres_fdw supports WHERE, JOIN, and aggregate pushdown — always verify with EXPLAIN VERBOSE and look for the Remote SQL line.
  • mysql_fdw is write-enabled, making it the go-to choice for live MySQL-to-PostgreSQL migration phases.
  • file_fdw is the fastest path to querying CSV exports with SQL — read-only, no external dependencies.
  • parquet_s3_fdw enables columnar data lake queries directly from PostgreSQL without a separate query engine.
  • Network overhead is real — filter aggressively, return aggregates not raw rows, and monitor remote connection counts.
  • FDWs are a federation and migration tool, not a replacement for streaming replication or a dedicated data warehouse.

Optimize Your PostgreSQL Deployment with JusDB

Setting up FDWs is the straightforward part. The harder challenges — right-sizing connection pools, tuning use_remote_estimate, ensuring pushdown actually fires for your workload, and monitoring cross-server query latency — require deep PostgreSQL expertise and proper observability tooling.

JusDB provides managed PostgreSQL with expert DBA support, covering FDW configuration, performance tuning, and ongoing query optimization. Whether you are federating multiple PostgreSQL instances, migrating off MySQL, or building a data lake query layer, our team has done it at scale.

Start your free trial at jusdb.com and get your PostgreSQL environment production-ready from day one.

Share this article