Thursday, 14 May 2026

Multithreading in PostgreSQL Using dblink and FDW (Reality vs Myth)

First: The Critical Truth (Must Be Clear)

You cannot create true multithreading inside a single PostgreSQL backend, even with dblink or Foreign Data Wrappers (FDW).

What you can do: Achieve parallel execution across multiple PostgreSQL backends
Simulate multithreading by delegating work to other connections
Run independent queries concurrently using remote sessions

This is process‑level parallelism, not thread‑level multithreading.


1. What dblink and FDW Actually Do

dblink

  • Opens another PostgreSQL connection
  • Executes SQL on that connection
  • Returns results to the caller
  • Runs in a separate backend process

postgres_fdw

  • Creates foreign tables mapped to another PostgreSQL server
  • Uses remote connections
  • Query planner decides whether to push down work

📌 Key Point
Each dblink or FDW call = another PostgreSQL process, not a thread.


2. Why People Use dblink / FDW for “Multithreading”

Typical goals:

  • Parallel processing of large datasets
  • Running independent tasks at the same time
  • Avoiding blocking in long‑running operations
  • Batch processing / ETL parallelism

PostgreSQL does not allow THREAD {} blocks, so people use extra connections instead.


3. What You CANNOT Do (Important)

You cannot:

  • Run two SQL statements at the same time inside one function
  • Spawn threads inside PL/pgSQL
  • Use dblink calls concurrently from the same backend
  • Force dblink calls to run asynchronously

This will still execute sequentially in one backend:

SQL

SELECT dblink_exec(...);

SELECT dblink_exec(...);

Show more lines


4. Correct Way: Parallel Work Using dblink + Multiple Sessions

Architecture Concept

Plain Text

Coordinator Session

── DBLink Session 1 Task A

── DBLink Session 2 Task B

└── DBLink Session 3 → Task C

Show more lines

Each DBLink session:

  • Is a separate PostgreSQL backend
  • Can run concurrently
  • Uses a different CPU core

5. Example: Parallel Task Execution Using dblink

Step 1: Enable dblink

Plain Text

CREATE EXTENSION dblink;

Show more lines


Step 2: Open Multiple Connections

SQL

SELECT dblink_connect('w1', 'dbname=mydb');

SELECT dblink_connect('w2', 'dbname=mydb');

SELECT dblink_connect('w3', 'dbname=mydb');

Show more lines

Each connection runs in a separate backend process.


Step 3: Dispatch Work

SQL

SELECT dblink_exec('w1', 'CALL process_partition(1)');

SELECT dblink_exec('w2', 'CALL process_partition(2)');

SELECT dblink_exec('w3', 'CALL process_partition(3)');

Show more lines

Work runs in parallel Uses multiple CPU cores Each task isolated


Step 4: Disconnect

SQL

SELECT dblink_disconnect('w1');

SELECT dblink_disconnect('w2');

SELECT dblink_disconnect('w3');

Show more lines


6. Using FDW for Parallelism (Cleaner, Safer)

FDW is often better than dblink.

Setup postgres_fdw

SQL

CREATE EXTENSION postgres_fdw;

 

CREATE SERVER worker1 FOREIGN DATA WRAPPER postgres_fdw

OPTIONS (host 'localhost', dbname 'mydb', port '5432');

 

CREATE USER MAPPING FOR CURRENT_USER

SERVER worker1 OPTIONS (user 'postgres');

Show more lines


Parallel Reads via FDW

SQL

SELECT count(*) FROM remote_table_part1

UNION ALL

SELECT count(*) FROM remote_table_part2;

Show more lines

If:

  • use_remote_estimate = on
  • parallel_append = on

PostgreSQL can plan parallel execution across FDW connections.

📌 FDW parallelism is planner‑dependent, not guaranteed.


7. Best Pattern: Leader / Worker Model

This is how production systems do it.

Leader (Coordinator)

  • Assigns tasks
  • Uses dblink or FDW connections

Workers

  • Run heavy logic
  • Independent transactions
  • Fail independently

Example use cases:

  • ETL pipelines
  • Parallel data validation
  • Background batch jobs
  • Sharded processing

8. Performance and Safety Considerations

Pros

  • True CPU parallelism
  • Failure isolation
  • Scales well on multi‑core systems

⚠️ Cons

  • High connection cost
  • More memory usage
  • Needs connection pooling
  • Harder error handling

9. When NOT to Use dblink/FDW for Parallelism

For OLTP transaction processing
For row‑by‑row loops
For small queries
Instead of native parallel queries

Use PostgreSQL parallel queries first, then dblink only when needed.


10. Recommended Alternatives (Often Better)

Goal

Better Option

CPU parallelism

Native parallel query

Task concurrency

App‑level threads

Batch processing

Job queue + workers

Scheduling

pg_cron

Background jobs

Background workers

High throughput

Partitioning


Final Verdict (Very Important)

dblink and FDW do NOT create multithreading.
They create controlled, parallel multiprocessing using additional sessions.

This is: Safe
Supported
Production‑proven

But it must be designed carefully.


Simple One‑Line Summary

Multithreading in PostgreSQL using dblink/FDW = parallel work via multiple backend processes, not real threads.

 

No comments:

Post a Comment

Multithreading in PostgreSQL Using dblink and FDW (Reality vs Myth) First: The Critical Truth (Must Be Clear) You cannot create true mul...