PostgreSQL Architecture & How SELECT, INSERT, UPDATE, DELETE Work Internally | Bright DBA

PostgreSQL Architecture & How Queries Work Inside
A complete technical guide to PostgreSQL's process, memory, and storage architecture — and a step-by-step explanation of exactly what happens when you run SELECT, INSERT, UPDATE, or DELETE. With Oracle comparisons throughout.

01
PostgreSQL Architecture Overview
Three-layer architecture: Client · Server Processes · Storage

PostgreSQL follows a client-server, multi-process architecture. Unlike MySQL's multi-threaded model or Oracle's mix of threads and processes, PostgreSQL spawns a dedicated OS process for each client connection. The architecture has three distinct layers: the client application layer, the server process layer, and the storage layer.

PostgreSQL Full Architecture Stack

🖥️ Client Layer — Applications, psql, JDBC/ODBC, PgBouncer
• Tool: psql — CLI client
• Driver: JDBC / ODBC — application connectivity
• Pooler: PgBouncer — connection pooling
• Protocol: TCP port 5432 — wire protocol

⚙️ Process Layer — Postmaster + Backend Processes + Background Daemons
• Supervisor: Postmaster — accepts connections, spawns backends, manages all processes
• Per connection: Backend #1, #2, … — parser · planner · executor · result
• Auth: pg_hba.conf — client authentication before the backend is spawned
• Daemon: Autovacuum — clean dead tuples
• Daemon: WAL Writer — flush WAL to disk
• Daemon: Checkpointer — flush dirty pages
• Daemon: Bgwriter — pre-write dirty pages
• Daemon: Archiver — copy WAL to archive
• Daemon: Stats Collector — pg_stat_* views

🧠 Memory Layer — Shared Memory + Per-Process Memory
• Shared (all processes): Shared Buffers — 8KB pages (≈ Oracle Buffer Cache)
• Shared (all processes): WAL Buffers — WAL before write (≈ Oracle Redo Buffer)
• Shared (all processes): Lock Table — all lock states, plus other shared state
• Per-process: work_mem — sorts, joins, hashes
• Per-process: Local buffers — temp tables, per session

💾 Storage Layer — $PGDATA Directory on Disk
• Data files: base/ — table & index heap files (8KB pages)
• WAL logs: pg_wal/ — redo log files, 16MB segments
• System: global/ — pg_database, pg_authid, etc.
• Config: postgresql.conf, pg_hba.conf — settings & auth rules
• Catalog: pg_catalog — system tables (pg_class, pg_attribute, …)
💡
Oracle DBA Note
Postmaster ≈ Oracle Listener + PMON. Each backend process ≈ an Oracle dedicated server process (dedicated server mode). Shared Buffers ≈ Oracle SGA Buffer Cache. WAL Buffers ≈ Oracle Redo Log Buffer. work_mem ≈ Oracle PGA sort area. Background processes (WAL writer, checkpointer, bgwriter) ≈ Oracle LGWR, CKPT, DBWn.
02
Process Architecture
Postmaster · Backend Processes · Background Workers

When PostgreSQL starts, the Postmaster process launches and listens on port 5432. For every new client connection, Postmaster calls fork() to create a dedicated Backend process. That backend handles the entire session lifecycle — parsing queries, planning, executing, and returning results — all within its own private process address space.

🏠
Postmaster (postgres)
The parent of all PostgreSQL processes. Authenticates connections via pg_hba.conf, then forks a backend for each client. Restarts crashed backends. The ancestor of every process in the cluster.
👤
Backend Process
One per client connection. Handles parsing, planning, and execution of SQL statements. Has access to shared memory but its own stack. Dies when client disconnects.
✍️
WAL Writer
Periodically flushes WAL buffers to pg_wal on disk. Ensures WAL is durable even if no transaction commits. Runs every wal_writer_delay (default 200ms).
Checkpointer
Writes all dirty pages from shared buffers to data files at checkpoint intervals. Reduces crash recovery time. Triggered by checkpoint_timeout or max_wal_size.
🧹
Autovacuum Launcher
Monitors tables for dead tuple accumulation. Spawns autovacuum worker processes to VACUUM/ANALYZE tables that cross the configured thresholds.
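The trigger condition is simple arithmetic — a table qualifies for autovacuum once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples (defaults 50 and 0.2). A quick sketch of the documented formula:

```python
def autovacuum_trigger_point(reltuples: int,
                             vacuum_threshold: int = 50,
                             scale_factor: float = 0.2) -> int:
    """Dead-tuple count at which autovacuum will VACUUM a table.

    Implements the documented formula
        autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
    using the default settings (50 and 0.2).
    """
    return int(vacuum_threshold + scale_factor * reltuples)

# A 1,000,000-row table is vacuumed once ~200,050 tuples are dead.
print(autovacuum_trigger_point(1_000_000))   # 200050
```

Note the practical consequence: on very large tables the default 20% scale factor lets millions of dead tuples accumulate, which is why DBAs often lower autovacuum_vacuum_scale_factor per table.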
📝
Background Writer
Proactively writes dirty buffers to disk between checkpoints, reducing I/O spikes at checkpoint time. Works alongside the checkpointer process.
📦
WAL Archiver
When archive_mode = on, copies completed WAL segments to the archive location using archive_command. Enables PITR and remote standby servers.
📊
Stats Collector
Gathers the activity statistics behind pg_stat_activity, pg_stat_user_tables, and related views, writing them to temporary files in pg_stat_tmp/. (In PostgreSQL 15+ this separate process was replaced by a shared-memory statistics subsystem.)
03
Memory Architecture
Shared Memory (all processes) + Local Memory (per backend)
🔵 Shared Memory — All Processes
Shared Buffers The main data cache. 8KB pages from data files. Default 128MB; commonly sized at ~25% of RAM. All backends share this pool. ≈ Oracle SGA Buffer Cache.
WAL Buffers In-memory buffer for WAL records before they are flushed to pg_wal on disk. Default wal_buffers = -1 (auto). ≈ Oracle Redo Log Buffer.
Lock Space Shared hash table of all active locks. Used by all processes to detect conflicts. Part of shared memory segment.
Proc Array Shared array of all running backend PIDs, their XIDs, and snapshot info. Used for MVCC visibility checks.
🟠 Local Memory — Per Backend Process
work_mem Memory for sort operations, hash joins, bitmap scans. Per-sort, not per-query. A query with 5 sorts uses up to 5 × work_mem. Default 4MB. ≈ Oracle PGA sort area.
maintenance_work_mem Memory for VACUUM, CREATE INDEX, ALTER TABLE. Default 64MB. Larger values speed up index builds significantly.
temp_buffers Cache for temporary table data. Per-session. Default 8MB. Only allocated when session uses temp tables.
Catalog Cache Per-backend cache of system catalog lookups (table definitions, index info). Avoids repeated catalog queries for the same objects.
💡
Key Insight
PostgreSQL has no hard SGA/PGA boundary like Oracle's. Instead, shared memory (shared_buffers + WAL buffers + locks) plays the role of the shared pool, and each backend process has its own private memory. Note that effective_cache_size is NOT an allocation — it's a hint to the query planner about how much OS file cache is likely available.
04
Storage Architecture
$PGDATA directory · Pages · Heap Files · TOAST · WAL

Every piece of data in PostgreSQL is ultimately stored as 8KB pages in heap files. A table is a collection of heap pages. An index is a separate collection of index pages. The storage is entirely on the filesystem — there is no raw device mode.

$PGDATA Directory Structure
• Database files: base/ — one subdirectory per database OID; heap files named by OID (e.g., 16384, 16385, …)
• Redo logs: pg_wal/ — WAL segment files (16MB each), named by LSN; the archive_status/ subdir tracks archiving
• Catalog: global/ — cluster-wide tables (pg_database, pg_tablespace, pg_authid), shared across databases
• Tablespaces: pg_tblspc/ — symlinks to tablespace locations; allows data on different disk volumes
• Config: postgresql.conf, pg_hba.conf, pg_ident.conf — configuration & auth files
Heap Page (8KB) Internal Layout

Byte Range | Content
0 – 23 | PageHeaderData — LSN, checksum, flags, free space pointers
24 – … | ItemId array — (offset, length) pairs pointing to each tuple ↓
… free space … | Free space (the two regions grow toward each other)
… end ↑ | Tuples (rows) — grow from the bottom up; each has a HeapTupleHeader + data
Special | Special area (used by indexes, e.g., the B-tree right link)

Each tuple (row) header contains: t_xmin (inserting XID), t_xmax (deleting XID), t_ctid (current tuple location) — the foundation of MVCC.
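A minimal Python model of those three header fields (illustrative only — the real HeapTupleHeaderData carries more, such as infomask hint bits and command IDs):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class HeapTupleHeader:
    """Toy model of the per-tuple MVCC fields described above."""
    t_xmin: int              # XID of the inserting transaction
    t_xmax: int              # XID of the deleting transaction (0 = not deleted)
    t_ctid: Tuple[int, int]  # (page number, item offset) of the current version

# A freshly inserted, never-deleted row version:
row = HeapTupleHeader(t_xmin=1001, t_xmax=0, t_ctid=(5, 1))
print(row.t_xmax == 0)   # True — the row is live
```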

05
How SELECT Works
Parse → Analyze → Rewrite → Plan → Execute → Return

A SELECT statement passes through 5 stages inside the backend process before results reach the client. No WAL is written for a pure SELECT. Data is served from shared buffers if cached, or read from disk if not.

SQL Text (client sends query)
→ Parser (syntax check → parse tree)
→ Analyzer / Rewriter (semantics, view expansion)
→ Planner / Optimizer (cost-based plan, pg_statistic)
→ Executor (walk plan tree, fetch pages)
→ Results (wire protocol → client)
1
Parser — SQL Text → Parse Tree
The backend receives the SQL string and runs it through gram.y (yacc grammar). It performs syntactic analysis only — checking for correct SQL syntax, not whether tables exist. Output: a raw parse tree (List of Node structures).
SELECT * FROM orders WHERE status = 'ACTIVE' → RawStmt → SelectStmt
2
Analyzer + Rewriter — Semantic Validation + View Expansion
The analyzer looks up table and column names in pg_catalog, resolves types, checks permissions, and builds a Query tree. The rewriter then expands any VIEWs (replacing the view reference with its defining SQL) and applies pg_rewrite rules.
3
Planner / Optimizer — Generate the Cheapest Execution Plan
The planner is the most complex stage. It generates candidate execution plans (sequential scan, index scan, bitmap scan, nested loop, hash join, merge join...) and estimates the cost of each using statistics from pg_statistic. It selects the plan with the lowest total cost. Output: a Plan tree.
EXPLAIN SELECT * FROM orders WHERE status = 'ACTIVE' → shows chosen plan
4
Executor — Walk the Plan Tree, Fetch Data
The executor traverses the plan tree node-by-node. For each data page needed:

1. Check shared_buffers — is the 8KB page already cached? If yes → read from RAM (buffer hit).
2. Cache miss — find a free buffer frame, read the 8KB page from disk into shared_buffers, then read from there.
3. MVCC visibility check — for each tuple (row) on the page, check if it is visible to the current transaction using t_xmin/t_xmax (see Section 9).
4. Filter — apply WHERE clause predicates.
5. Project — extract only the requested columns.
Results are streamed back to the client via the wire protocol (not buffered all at once).
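The buffer-lookup and filter/project loop above can be sketched as a toy model (read_page and select are illustrative names, not PostgreSQL internals):

```python
# Toy model of the executor's page fetch: check shared_buffers first,
# fall back to "disk", then apply the WHERE filter and column projection.
shared_buffers = {}                       # page_no -> list of row dicts (the cache)
disk = {5: [{"id": 1, "status": "ACTIVE", "note": "x"},
            {"id": 2, "status": "CLOSED", "note": "y"}]}

def read_page(page_no):
    if page_no in shared_buffers:         # buffer hit: serve from RAM
        return shared_buffers[page_no]
    page = disk[page_no]                  # buffer miss: "read" from disk...
    shared_buffers[page_no] = page        # ...and cache the page for next time
    return page

def select(page_no, predicate, columns):
    rows = read_page(page_no)
    # Filter (WHERE) then project (column list), streaming row by row.
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

print(select(5, lambda r: r["status"] == "ACTIVE", ["id"]))  # [{'id': 1}]
```

After the first call the page stays in shared_buffers, so repeating the query is a pure buffer hit — exactly the caching side effect described in step 5.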
5
No WAL Written — SELECT is Read-Only
A plain SELECT never writes WAL. No data files are modified. The only side effect is that shared_buffers may be populated with pages that weren't there before (they stay cached for future queries). Lock: typically an AccessShareLock on the relation — compatible with everything except ACCESS EXCLUSIVE (e.g., DROP TABLE).
💡
Oracle DBA Note
PostgreSQL SELECT ≈ Oracle SELECT. Both check their buffer caches first (Oracle: Buffer Cache; PostgreSQL: shared_buffers). Both use MVCC to ensure consistent reads without locking rows. Oracle uses UNDO tablespace for old row versions; PostgreSQL keeps them in the heap itself. Oracle's SQL Trace / AUTOTRACE ≈ PostgreSQL's EXPLAIN ANALYZE.
06
How INSERT Works
Parse → Plan → Execute → WAL → Shared Buffers → Commit

An INSERT must write both the WAL record and the tuple into a shared buffer page. The page is written to disk lazily (by bgwriter or checkpointer) — not immediately on INSERT. The WAL record however must be flushed to disk before the COMMIT returns (when synchronous_commit = on).

INSERT SQL (client sends)
→ Parse + Plan (same as SELECT stages 1–3)
→ Get XID (assign transaction ID → t_xmin)
→ Find Page (FSM → page with free space)
→ Write Tuple (insert into shared buffer)
→ Write WAL (WAL record into WAL buffers)
→ COMMIT (flush WAL to disk, mark visible)
1
Begin Transaction — Assign XID
When the first DML statement executes, PostgreSQL assigns a Transaction ID (XID) — a 32-bit monotonically increasing integer. This XID becomes the t_xmin value stamped into every new tuple this transaction inserts. The new row will only be visible to other transactions after this XID commits.
-- Internal: XID = 1001 assigned. New tuple will have t_xmin = 1001, t_xmax = 0
2
Free Space Map (FSM) — Find Target Page
The executor consults the Free Space Map (FSM) — a compact data structure (stored in the _fsm fork of each relation) that tracks how much free space each heap page has. It finds a page with enough free space to fit the new tuple. If no page has enough space, a new page is allocated (extends the file).
3
Acquire Buffer Page — Write Tuple
The executor loads the target 8KB heap page into shared_buffers (if not already there). It then writes the new HeapTuple into the page — stamping:
t_xmin = current XID (transaction that inserted it)
t_xmax = 0 (not deleted yet)
t_ctid = (page, offset) (physical location of this tuple)
The page is now marked dirty in shared_buffers. It is NOT written to disk yet.
New tuple: t_xmin=1001 | t_xmax=0 | data: ('Alice', 'DBA', 50000)
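Steps 2–3 — FSM lookup, tuple stamping, and dirtying the buffer — as a toy Python sketch (the FSM contents, sizes, and the heap_insert name are invented for illustration):

```python
# Toy sketch: find a page with free space, stamp the new tuple's MVCC
# fields, and mark the page dirty in the (simulated) buffer pool.
PAGE_SIZE = 8192

fsm = {3: 40, 5: 500}          # page_no -> free bytes (hypothetical FSM state)
pages = {3: [], 5: []}         # page_no -> tuples on that page
dirty = set()                  # pages modified in shared_buffers, not yet on disk

def heap_insert(xid, data, needed=120):
    # Consult the FSM for a page with enough room; extend the file if none.
    target = next((p for p, free in fsm.items() if free >= needed), None)
    if target is None:
        target = max(pages) + 1
        pages[target], fsm[target] = [], PAGE_SIZE
    tup = {"t_xmin": xid, "t_xmax": 0, "data": data}
    pages[target].append(tup)
    fsm[target] -= needed
    dirty.add(target)          # dirty in shared_buffers; written to disk later
    return target, tup

page, tup = heap_insert(1001, ("Alice", "DBA", 50000))
print(page, tup["t_xmin"])     # 5 1001  (page 3 had only 40 free bytes)
```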
4
Write WAL Record — Into WAL Buffers
A WAL record describing the INSERT is written into the WAL buffers (in shared memory). The WAL record contains enough information to redo the insert during crash recovery. On the first modification after a checkpoint, a Full Page Image (FPI) of the entire 8KB page is also included to protect against torn page writes.
5
COMMIT — Flush WAL to Disk, Make Visible
On COMMIT, PostgreSQL writes a COMMIT WAL record and calls fsync() (or equivalent) to ensure all WAL up to this LSN is durably written to disk in pg_wal. Only after this disk write returns successfully does COMMIT return success to the client. The page in shared_buffers remains dirty — it will be written to the actual data file later by bgwriter or checkpointer.
-- After COMMIT: WAL on disk. Tuple visible to new transactions. Data page still in RAM (dirty).
6
Index Update (if indexed columns)
For each index on the table, a corresponding index entry must be inserted pointing to the new tuple's ctid (physical location). Each index entry also has its own WAL record. HOT (Heap-Only Tuple) optimization: if no indexed column changed and the new tuple fits on the same page, PostgreSQL can skip the index insert and use a ctid chain instead.
Performance Insight
INSERT performance bottleneck is almost always WAL fsync latency. Setting synchronous_commit = off eliminates the fsync wait, often achieving 3–10× higher INSERT throughput at the cost of a small durability window (at most about 3 × wal_writer_delay, roughly 600 ms with defaults). Use it for non-critical bulk inserts. For critical data, keep synchronous_commit = on.
💡
Oracle DBA Note
PostgreSQL INSERT ≈ Oracle INSERT in flow: both write redo (WAL≈redo log) and update the buffer cache. Key difference: Oracle also writes UNDO data (to reconstruct the old row for other readers). PostgreSQL does NOT need UNDO because old row versions stay in the heap — they just become invisible to new transactions once t_xmax is set.
07
How UPDATE Works — The MVCC Magic
UPDATE = DELETE old + INSERT new. Old version kept for concurrent readers.

This is one of the most important things to understand about PostgreSQL: UPDATE does not modify the row in place. Instead, it creates a brand new row version and marks the old row as "dead". This is the foundation of MVCC — multiple transactions can read different versions of the same row concurrently without blocking each other.

UPDATE Step-by-Step: what happens to row versions in the heap

BEFORE UPDATE (salary from 50000 → 65000):

Row | t_xmin | t_xmax | t_ctid | name | salary | Visible to
Row v1 | 1001 (committed) | 0 (not deleted) | (5, 1) | Alice | 50000 | ALL transactions

DURING UPDATE (XID 2005 running: UPDATE emp SET salary=65000 WHERE name='Alice'):

Row | t_xmin | t_xmax | t_ctid | name | salary | Visible to
Row v1 | 1001 (committed) | 2005 (in-progress) | (5, 2) → | Alice | 50000 | Txns started before 2005
Row v2 | 2005 (in-progress) | 0 | (5, 2) | Alice | 65000 | Only XID 2005 itself

AFTER COMMIT (XID 2005 committed):

Row | t_xmin | t_xmax | t_ctid | name | salary | Visible to
Row v1 (dead) | 1001 | 2005 (committed) | (5, 2) → | Alice | 50000 | Dead — invisible to all new txns
Row v2 (live) | 2005 (committed) | 0 | (5, 2) | Alice | 65000 | ALL new transactions

⚠ Row v1 (salary=50000) remains on the page as a dead tuple. It will be cleaned up by VACUUM. This is why UPDATE-heavy tables need regular vacuuming.

1
Find the Old Row — Take Row-Level Lock
The executor locates the target row via an index or sequential scan. It checks that the row is visible and updatable (not already being updated by another transaction). It acquires an ExclusiveLock on the tuple (stored in t_xmax as the current XID) to prevent concurrent updates.
2
Mark Old Row Dead — Set t_xmax
PostgreSQL stamps the old row's t_xmax = current XID. This marks it as "deleted by this transaction". The old row is still physically present in the page — just logically invisible to transactions that start after this XID commits.
Old row: t_xmin=1001 | t_xmax=2005 | salary=50000 ← marked dead
3
Insert New Row Version — Set t_xmin
A completely new HeapTuple is inserted (same as INSERT): t_xmin = current XID, t_xmax = 0. It's stamped with the new column values. The old row's t_ctid is updated to point to the new row's location (ctid chain — allows readers of the old version to find the new one).
New row: t_xmin=2005 | t_xmax=0 | salary=65000 ← new live row
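Steps 2–3 can be condensed into a toy simulation of the version chain (heap_update is an illustrative name, not a real API):

```python
# Toy model of UPDATE-as-delete+insert: the old version gets t_xmax set,
# a new version is appended, and the old t_ctid chains to the new location.
heap = [{"t_xmin": 1001, "t_xmax": 0, "t_ctid": (5, 1),
         "name": "Alice", "salary": 50000}]

def heap_update(heap, old_idx, xid, **changes):
    old = heap[old_idx]
    new = {**old, **changes,
           "t_xmin": xid, "t_xmax": 0,
           "t_ctid": (5, len(heap) + 1)}   # next free slot on the page
    heap.append(new)
    old["t_xmax"] = xid                    # mark the old version dead
    old["t_ctid"] = new["t_ctid"]          # ctid chain: old version points to new
    return new

new = heap_update(heap, 0, 2005, salary=65000)
print(heap[0]["t_xmax"], new["salary"])    # 2005 65000
```

Both versions now sit in the heap, matching the DURING UPDATE table above; only VACUUM will ever remove the dead one.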
4
WAL Written for Both Changes
WAL describing the update is written, covering both changes: the old version's t_xmax being set and the new version's insertion. If this is the first page modification after a checkpoint, a Full Page Image of the affected page is also included in WAL.
5
Index Update — Unless HOT Optimization Applies
Every index on the table must have its entry updated to point from the old tuple's ctid to the new tuple's ctid. HOT Optimization (Heap-Only Tuple): if the updated columns are NOT part of any index AND the new row fits on the same page as the old one, PostgreSQL skips the index update entirely — dramatically reducing write amplification for frequently-updated rows with non-indexed column changes.
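The HOT decision reduces to a simple predicate — a hedged sketch with invented names:

```python
def can_use_hot(changed_cols, indexed_cols, new_tuple_fits_on_page):
    """HOT (Heap-Only Tuple) applies only if no indexed column changed
    AND the new version fits on the same heap page. Illustrative only —
    the real check lives in the heap_update() C code."""
    return new_tuple_fits_on_page and not (set(changed_cols) & set(indexed_cols))

print(can_use_hot({"salary"}, {"id", "email"}, True))   # True  -> skip index writes
print(can_use_hot({"email"},  {"id", "email"}, True))   # False -> must update indexes
```

This is why a lower fillfactor (e.g., 80) helps: it keeps free space on each page so the "fits on the same page" half of the predicate stays true.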
Write Amplification Warning
Every UPDATE writes 2 tuples (old dead + new live) + potentially 2 index entries + WAL for all of the above. On a table with many indexes, a single-column UPDATE can write 5–10× more data than the actual change. Use partial indexes, set appropriate fillfactor (e.g., 80) to leave room for HOT updates, and run VACUUM regularly to reclaim dead tuple space.
08
How DELETE Works
Logical deletion only — physical removal happens at VACUUM time

DELETE in PostgreSQL is purely logical. No data is physically removed from the page when you DELETE. The row is simply marked as deleted by setting t_xmax. The physical space is reclaimed only when VACUUM runs.

1
Locate the Row — Visibility Check
The executor finds the target row(s) via index scan or sequential scan. For each row, it confirms it is visible to the current transaction (t_xmin committed, t_xmax = 0 or t_xmax not yet committed).
2
Mark as Dead — Set t_xmax = Current XID
The row is marked deleted by setting t_xmax = current XID. No new row is inserted. The row data (column values) is still physically on the page — it just becomes invisible to transactions starting after this XID commits.
Before DELETE: t_xmin=1001 | t_xmax=0 | data: ('Alice', 50000) After DELETE: t_xmin=1001 | t_xmax=3007 | data: ('Alice', 50000) ← dead tuple
3
WAL Record Written for the Deletion
A WAL DELETE record is written, describing which page and which offset was marked dead. Much smaller than an INSERT WAL record since we're only recording the t_xmax change (unless a Full Page Image is required).
4
Index Entries Are NOT Immediately Removed
The index entries still point to the now-dead tuple. When a query uses the index and follows the pointer to the dead tuple, it performs a visibility check and skips it. Index cleanup happens during VACUUM (VACUUM cleans both the heap pages and the corresponding index entries).
5
VACUUM — Physical Space Reclamation
VACUUM scans heap pages looking for dead tuples (t_xmax set and that XID committed). For each dead tuple that is no longer needed by any active transaction, VACUUM:
• Clears the tuple slot (marks it available for reuse)
• Updates the Free Space Map (FSM) so future inserts can use that space
• Cleans corresponding index entries (index vacuum)
Note: VACUUM does NOT shrink the file — it just marks space as reusable. Only VACUUM FULL or pg_repack physically returns space to the OS.
SELECT n_dead_tup, n_live_tup FROM pg_stat_user_tables WHERE relname = 'emp';
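A toy model of what one vacuum pass does to a page (sizes and names are invented; real VACUUM also honors the oldest snapshot across the whole cluster and cleans the indexes):

```python
# Toy vacuum pass: reclaim the slots of dead tuples whose deleting XID is
# committed and older than every active snapshot, then grow the FSM count.
TUPLE_SIZE = 120   # pretend every tuple occupies 120 bytes

def vacuum_page(page, committed_xids, oldest_active_xid, fsm_free):
    kept = []
    for tup in page:
        dead = (tup["t_xmax"] != 0
                and tup["t_xmax"] in committed_xids
                and tup["t_xmax"] < oldest_active_xid)
        if dead:
            fsm_free += TUPLE_SIZE   # slot reusable — the file is NOT shrunk
        else:
            kept.append(tup)         # live, or still needed by some snapshot
    return kept, fsm_free

page = [{"t_xmax": 3007, "v": "old"}, {"t_xmax": 0, "v": "live"}]
page, free = vacuum_page(page, committed_xids={3007},
                         oldest_active_xid=4000, fsm_free=0)
print(len(page), free)   # 1 120
```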
Operation | Oracle Behavior | PostgreSQL Behavior
DELETE | Row marked in data block. Old version moved to UNDO segment. | t_xmax set in heap tuple. Old data stays in the page as a dead tuple.
Consistent reads | Readers reconstruct the old version from the UNDO tablespace. | Old tuple version is already in the heap — readers use t_xmin/t_xmax to pick the correct version.
Space reclamation | Automatic via the UNDO retention period. | Manual/auto via VACUUM. Must run regularly to prevent bloat.
TRUNCATE | Drops and recreates the segment. Instant; DDL with implicit commit. | Same concept. Drops and recreates relation files. Instant. Generates WAL.
09
MVCC — Multi-Version Concurrency Control
Transaction snapshots · Visibility rules · Isolation levels

MVCC is the mechanism that allows concurrent readers and writers to never block each other. Each transaction sees a consistent snapshot of the database as of its start time (or statement start time, depending on isolation level). The visibility of a tuple is determined entirely by its t_xmin and t_xmax values.

MVCC Visibility Rules — How PostgreSQL decides if a tuple is visible

TUPLE IS VISIBLE when:

  • t_xmin is committed and older than my snapshot (or t_xmin is my own XID)
  • AND t_xmax is 0 (not deleted), OR the deleting transaction aborted, OR it is still in progress and is not my own

TUPLE IS INVISIBLE when:

  • t_xmin is not yet committed (inserting txn still running)
  • t_xmin was rolled back (aborted)
  • t_xmax is committed and older than my snapshot (deleted before I started)
  • t_xmax is my own XID (deleted earlier in my own transaction)
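The visibility rules translate almost directly into code. This simplified check ignores command IDs, hint bits, and subtransactions (so it is a sketch of the idea, not HeapTupleSatisfiesMVCC):

```python
def is_visible(tup, my_xid, snapshot_xid, committed):
    """Simplified MVCC visibility. `committed` is the set of committed
    XIDs; `snapshot_xid` is the first XID NOT visible to my snapshot."""
    xmin, xmax = tup["t_xmin"], tup["t_xmax"]
    # Inserter must be me, or committed and older than my snapshot.
    if xmin != my_xid and not (xmin in committed and xmin < snapshot_xid):
        return False
    if xmax == 0:
        return True                    # never deleted
    if xmax == my_xid:
        return False                   # I deleted it myself earlier
    # Deleted by someone else: still visible unless that delete is
    # committed and older than my snapshot.
    return not (xmax in committed and xmax < snapshot_xid)

committed = {1001, 2005}
live = {"t_xmin": 1001, "t_xmax": 0}
dead = {"t_xmin": 1001, "t_xmax": 2005}
print(is_visible(live, my_xid=3000, snapshot_xid=2600, committed=committed))  # True
print(is_visible(dead, my_xid=3000, snapshot_xid=2600, committed=committed))  # False
print(is_visible(dead, my_xid=2002, snapshot_xid=2003, committed=committed))  # True (snapshot predates the delete)
```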
Transaction Snapshot Timeline — READ COMMITTED vs REPEATABLE READ
T1 START (snapshot taken; READ COMMITTED: per statement)
→ T2 INSERT (another txn inserts a row)
→ T2 COMMIT (row now committed, t_xmin = 2005)
→ T1 SELECT (READ COMMITTED: sees T2's row ✓ · REPEATABLE READ: does NOT see it ✗)
→ T1 COMMIT (T1 ends)
Isolation Level | Snapshot Taken At | Dirty Read | Non-Repeatable Read | Phantom Read
READ COMMITTED (default) | Start of each statement | ❌ No | ✅ Yes — sees committed changes | ✅ Yes
REPEATABLE READ | Start of transaction | ❌ No | ❌ No | ❌ No (PostgreSQL prevents)
SERIALIZABLE | Start of transaction | ❌ No | ❌ No | ❌ No (SSI algorithm)
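The difference between the first two levels is purely when the snapshot is taken — per statement vs per transaction. A toy timeline simulation (names and the timeline encoding are invented for illustration):

```python
# Simulate T1 running one SELECT at each point in time while T2's insert
# (XID 2005) commits in between. READ COMMITTED re-snapshots per statement;
# REPEATABLE READ keeps the snapshot from transaction start.
def run_t1(isolation, commits_over_time):
    tx_start_snapshot = set(commits_over_time[0])
    results = []
    for committed in commits_over_time:
        snap = set(committed) if isolation == "read committed" else tx_start_snapshot
        results.append(2005 in snap)   # can T1 see the row T2 inserted?
    return results

# Timeline: at t0 nothing is committed; at t1 T2's XID 2005 has committed.
timeline = [set(), {2005}]
print(run_t1("read committed", timeline))   # [False, True]
print(run_t1("repeatable read", timeline))  # [False, False]
```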
💡
Oracle DBA Note
Oracle defaults to READ COMMITTED isolation (same as PostgreSQL default). Oracle's "consistent read" using UNDO data ≈ PostgreSQL's MVCC using old tuple versions in the heap. Oracle's SCN (System Change Number) ≈ PostgreSQL's transaction ID (XID) for snapshot comparison. Oracle doesn't support REPEATABLE READ as a distinct level — PostgreSQL does.
10
WAL — Write-Ahead Logging
Durability · Crash Recovery · Replication · PITR

WAL is the backbone of PostgreSQL's durability and replication story. The principle is simple: write the log record before changing the data. This guarantees that even if PostgreSQL crashes mid-write, no committed data is lost — recovery replays the WAL to reconstruct any missing changes.

SQL DML (INSERT/UPDATE/DELETE)
→ WAL Record Written (into WAL buffers, shared memory)
→ Dirty Page in Buffers (8KB page changed in shared_buffers)
→ COMMIT (WAL flushed to pg_wal on disk)
→ COMMIT OK (client receives success)
→ Later: Checkpoint (dirty pages flushed to data files)
🔄
Crash Recovery
On startup after a crash, PostgreSQL replays WAL records from the last checkpoint to reconstruct all committed changes that weren't written to data files yet. Fast and automatic.
🏭
Streaming Replication
Standby servers connect to primary and receive a continuous stream of WAL records. They apply them to their own data files — staying in sync in near-real time. pg_stat_replication monitors this.
📅
Point-in-Time Recovery
When archive_mode=on, completed WAL segments are archived. You can restore a base backup + replay WAL to any specific timestamp — e.g., just before an accidental DROP TABLE.
📡
Logical Decoding
WAL records can be decoded into logical row-level changes (INSERT/UPDATE/DELETE) via output plugins (pgoutput, wal2json). Used for logical replication and CDC with Debezium/Kafka.
WAL File Naming — Understanding LSN and Segments
shell — WAL file structure
# WAL files in $PGDATA/pg_wal/
ls -la $PGDATA/pg_wal/

# Filename format: 000000TT XXXXXXXX YYYYYYYY
# TT = Timeline ID (increases on failover/PITR)
# XXXXXXXX YYYYYYYY = 64-bit LSN (Log Sequence Number) split

000000010000000000000091    # Timeline 1, LSN segment 0x91
000000010000000000000092    # Timeline 1, LSN segment 0x92
000000010000000000000093    # After DROP TABLE — segment 0x93

# Each WAL segment = 16MB (default wal_segment_size)

# Check current WAL position (LSN) — run the SELECTs below in psql
SELECT pg_current_wal_lsn();        -- e.g., 0/93007FF0
SELECT pg_walfile_name(pg_current_wal_lsn());  -- convert LSN to filename

# Measure WAL generated since a given LSN (e.g., the last checkpoint's)
SELECT pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), '0/91000000')
) AS wal_since_checkpoint;
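The filename arithmetic can be reproduced exactly for the default 16MB segment size. This local reimplementation mirrors what pg_walfile_name() computes (it is a sketch, not the server function):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024       # default wal_segment_size (16MB)

def wal_file_name(timeline: int, lsn: str) -> str:
    """Map a timeline + LSN ('X/Y', a split 64-bit position) to a WAL
    segment filename: TTTTTTTT XXXXXXXX YYYYYYYY in hex."""
    hi, lo = lsn.split("/")
    pos = (int(hi, 16) << 32) | int(lo, 16)   # 64-bit byte position in WAL
    segno = pos // WAL_SEG_SIZE               # which 16MB segment holds it
    segs_per_id = 0x100000000 // WAL_SEG_SIZE # 256 segments per 4GB "xlogid"
    return f"{timeline:08X}{segno // segs_per_id:08X}{segno % segs_per_id:08X}"

print(wal_file_name(1, "0/93007FF0"))   # 000000010000000000000093
```

This matches the example listing above: LSN 0/93007FF0 falls inside segment 0x93 on timeline 1.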
🔒
Full Page Images (FPI) — Write Amplification Source
After every checkpoint, the first write to any page includes a full 8KB copy of that page in the WAL record — even if only 10 bytes changed. This protects against torn page writes (partial OS writes). Enable wal_compression = lz4 or zstd to compress FPIs and reduce WAL volume by 50–80% with minimal CPU cost.
💡
Oracle DBA Note
PostgreSQL WAL ≈ Oracle Online Redo Logs. WAL archiving ≈ Oracle ARCHIVELOG mode. PITR ≈ Oracle RMAN point-in-time recovery. LSN (Log Sequence Number) ≈ Oracle SCN. WAL segments (16MB files) ≈ Oracle redo log members/groups. The WAL Writer process ≈ Oracle LGWR. The Checkpointer ≈ Oracle CKPT + DBWn combined.

Summary: PostgreSQL vs Oracle — Key Architecture Equivalents

PostgreSQL Component | Oracle Equivalent | Purpose
Postmaster | PMON + Listener | Master process, accepts connections
Backend process | Dedicated server process | Handles a single client session
Shared Buffers | SGA Buffer Cache | Main data page cache
WAL Buffers | Redo Log Buffer (SGA) | In-memory WAL before disk flush
work_mem | PGA (sort area) | Per-operation sort/hash memory
pg_wal/ (WAL files) | Online Redo Logs | Crash recovery, replication
WAL archiving | ARCHIVELOG mode | Historical redo for PITR
Checkpointer | CKPT + DBWn | Flush dirty pages to disk
WAL Writer | LGWR | Write redo log to disk
Autovacuum | SMON (partial) | Reclaim dead tuple space
MVCC (heap tuples) | MVCC (UNDO tablespace) | Consistent reads without locks
XID / LSN | SCN | Transaction ordering / point-in-time
pg_stat_activity | V$SESSION | Session monitoring
pg_stat_statements | V$SQL | SQL performance stats
EXPLAIN ANALYZE | AUTOTRACE / SQL Trace | Query execution plan + timings
VACUUM FULL / pg_repack | ALTER TABLE MOVE | Reclaim bloated table space