PostgreSql

Multithreading in PostgreSQL Using dblink and FDW (Reality vs Myth)

First: The Critical Truth (Must Be Clear)

You cannot create true multithreading inside a single PostgreSQL backend, even with dblink or Foreign Data Wrappers (FDW).

What you can do: ✅ Achieve parallel execution across multiple PostgreSQL backends
✅ Simulate multithreading by delegating work to other connections
✅ Run independent queries concurrently using remote sessions

This is process‑level parallelism, not thread‑level multithreading.

1. What dblink and FDW Actually Do

dblink

Opens another PostgreSQL connection
Executes SQL on that connection
Returns results to the caller
Runs in a separate backend process

postgres_fdw

Creates foreign tables mapped to another PostgreSQL server
Uses remote connections
Query planner decides whether to push down work

📌 Key Point
Each dblink or FDW call = another PostgreSQL process, not a thread.

2. Why People Use dblink / FDW for “Multithreading”

Typical goals:

Parallel processing of large datasets
Running independent tasks at the same time
Avoiding blocking in long‑running operations
Batch processing / ETL parallelism

PostgreSQL does not allow THREAD {} blocks, so people use extra connections instead.

3. ❌ What You CANNOT Do (Important)

❌ You cannot:

Run two SQL statements at the same time inside one function
Spawn threads inside PL/pgSQL
Use dblink calls concurrently from the same backend
Force dblink calls to run asynchronously

This will still execute sequentially in one backend:

SQL

SELECT dblink_exec(...);

Show more lines

4. ✅ Correct Way: Parallel Work Using dblink + Multiple Sessions

Architecture Concept

Plain Text

Coordinator Session

├── DBLink Session 1 → Task A

├── DBLink Session 2 → Task B

└── DBLink Session 3 → Task C

Show more lines

Each DBLink session:

Is a separate PostgreSQL backend
Can run concurrently
Uses a different CPU core

5. Example: Parallel Task Execution Using dblink

Step 1: Enable dblink

Plain Text

CREATE EXTENSION dblink;

Show more lines

Step 2: Open Multiple Connections

SQL

SELECT dblink_connect('w1', 'dbname=mydb');

SELECT dblink_connect('w2', 'dbname=mydb');

SELECT dblink_connect('w3', 'dbname=mydb');

Show more lines

Each connection runs in a separate backend process.

Step 3: Dispatch Work

SQL

SELECT dblink_exec('w1', 'CALL process_partition(1)');

SELECT dblink_exec('w2', 'CALL process_partition(2)');

SELECT dblink_exec('w3', 'CALL process_partition(3)');

Show more lines

✅ Work runs in parallel ✅ Uses multiple CPU cores ✅ Each task isolated

Step 4: Disconnect

SQL

SELECT dblink_disconnect('w1');

SELECT dblink_disconnect('w2');

SELECT dblink_disconnect('w3');

Show more lines

6. Using FDW for Parallelism (Cleaner, Safer)

FDW is often better than dblink.

Setup postgres_fdw

SQL

CREATE EXTENSION postgres_fdw;

CREATE SERVER worker1 FOREIGN DATA WRAPPER postgres_fdw

OPTIONS (host 'localhost', dbname 'mydb', port '5432');

CREATE USER MAPPING FOR CURRENT_USER

SERVER worker1 OPTIONS (user 'postgres');

Show more lines

Parallel Reads via FDW

SQL

SELECT count(*) FROM remote_table_part1

UNION ALL

SELECT count(*) FROM remote_table_part2;

Show more lines

If:

use_remote_estimate = on
parallel_append = on

PostgreSQL can plan parallel execution across FDW connections.

📌 FDW parallelism is planner‑dependent, not guaranteed.

7. ✅ Best Pattern: Leader / Worker Model

This is how production systems do it.

Leader (Coordinator)

Assigns tasks
Uses dblink or FDW connections

Workers

Run heavy logic
Independent transactions
Fail independently

Example use cases:

ETL pipelines
Parallel data validation
Background batch jobs
Sharded processing

8. Performance and Safety Considerations

✅ Pros

True CPU parallelism
Failure isolation
Scales well on multi‑core systems

⚠️ Cons

High connection cost
More memory usage
Needs connection pooling
Harder error handling

9. When NOT to Use dblink/FDW for Parallelism

❌ For OLTP transaction processing
❌ For row‑by‑row loops
❌ For small queries
❌ Instead of native parallel queries

Use PostgreSQL parallel queries first, then dblink only when needed.

10. Recommended Alternatives (Often Better)

Goal	Better Option
CPU parallelism	Native parallel query
Task concurrency	App‑level threads
Batch processing	Job queue + workers
Scheduling	pg_cron
Background jobs	Background workers
High throughput	Partitioning

Final Verdict (Very Important)

dblink and FDW do NOT create multithreading.
They create controlled, parallel multiprocessing using additional sessions.

This is: ✅ Safe
✅ Supported
✅ Production‑proven

But it must be designed carefully.

Simple One‑Line Summary

Multithreading in PostgreSQL using dblink/FDW = parallel work via multiple backend processes, not real threads.

PostgreSql

Thursday, 14 May 2026

No comments:

Post a Comment