PostgreSQL Advanced DBA Interview
Expert-level questions covering query planner internals, JIT compilation, logical decoding & CDC, Citus sharding, custom extensions, WAL internals, parallel query, and Oracle-to-PostgreSQL migration strategy.
The PostgreSQL query planner uses a cost-based optimizer (CBO). A query passes through four stages:
- Parse — SQL text → parse tree (syntax check via gram.y). Produces an abstract syntax tree.
- Analyze/Rewrite — Semantic validation, name resolution, rule rewriting (views expand here). Produces a query tree.
- Plan/Optimize — Generates candidate execution plans, estimates their cost using pg_statistic data, and selects the cheapest plan.
- Execute — The executor walks the plan tree, calling each node's Init/Next/End functions.
The planner estimates cost using three components: seq_page_cost, random_page_cost, and cpu_tuple_cost. Row count estimates come from pg_statistic (populated by ANALYZE). The planner uses a dynamic programming algorithm for join ordering up to join_collapse_limit tables; beyond that it switches to a genetic algorithm (GEQO).
```sql
-- See full plan with cost breakdown
EXPLAIN (ANALYZE, BUFFERS, SETTINGS, WAL, FORMAT JSON)
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'ACTIVE';

-- Tune planner cost constants
SET random_page_cost = 1.1;  -- SSD (default 4.0 for HDD)
SET seq_page_cost = 1.0;     -- default
SET cpu_tuple_cost = 0.01;   -- default

-- Check statistics on a column
SELECT * FROM pg_statistic WHERE starelid = 'orders'::regclass;
```
PostgreSQL has no built-in query hints; if you need them, use the pg_hint_plan extension.

JIT (Just-In-Time) compilation uses LLVM to compile portions of an execution plan into native machine code at query runtime, rather than interpreting them through PostgreSQL's generic executor. It was introduced in PostgreSQL 11.
- What gets JIT-compiled: expression evaluation, tuple deforming, aggregate functions, projections.
- When it helps: CPU-intensive analytical queries with complex WHERE clauses, large aggregations, or many columns — typically OLAP workloads.
- When it hurts: OLTP queries with low row counts. JIT compilation has ~20–100ms overhead. For short queries this overhead exceeds any gain.
```sql
-- Check if JIT is enabled
SHOW jit;
SHOW jit_above_cost;          -- default: 100000
SHOW jit_inline_above_cost;   -- default: 500000
SHOW jit_optimize_above_cost; -- default: 500000

-- Disable JIT for OLTP workloads (set in postgresql.conf):
--   jit = off

-- Check if JIT was used in a plan
EXPLAIN (ANALYZE, FORMAT TEXT)
SELECT SUM(amount * discount * tax_rate)
FROM orders WHERE year = 2025;
-- Look for: JIT: Functions: N, Options: Inlining true...
```
For OLTP systems, raise jit_above_cost or disable JIT globally and enable it only for specific analytical sessions.

PostgreSQL can execute certain queries using multiple CPU cores via parallel workers (background processes). The leader process divides work among parallel workers using Parallel Seq Scan, Parallel Index Scan, Parallel Hash Join, and Parallel Aggregate.
```ini
max_parallel_workers = 8              # total parallel workers
max_parallel_workers_per_gather = 4   # per query
max_parallel_maintenance_workers = 4  # for CREATE INDEX
parallel_setup_cost = 1000
parallel_tuple_cost = 0.1
min_parallel_table_scan_size = '8MB'
```

```sql
-- Force parallel for testing
SET max_parallel_workers_per_gather = 4;
SET parallel_setup_cost = 0;
EXPLAIN ANALYZE SELECT COUNT(*) FROM large_table;
```
Limitations:
- Parallel query is NOT used inside transactions with SERIALIZABLE isolation level.
- Queries with FOR UPDATE/SHARE, CURSOR, or LIMIT (in some cases) don't parallelize.
- Functions marked PARALLEL UNSAFE (the default for user functions) prevent parallelism.
- Small tables: the planner's parallel_setup_cost threshold prevents parallel overhead on tiny scans.
- Worker spawning has overhead — not beneficial for queries under ~100ms.
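To illustrate the function restriction above, a user-defined function can be marked PARALLEL SAFE so the planner is allowed to parallelize queries that call it. A minimal sketch (the function and table names here are hypothetical):

```sql
-- User functions default to PARALLEL UNSAFE; mark pure functions SAFE
CREATE OR REPLACE FUNCTION net_price(amount numeric, tax numeric)
RETURNS numeric
LANGUAGE sql
IMMUTABLE PARALLEL SAFE
AS $$ SELECT amount * (1 + tax) $$;

-- Verify the parallel-safety flag (s = safe, r = restricted, u = unsafe)
SELECT proname, proparallel FROM pg_proc WHERE proname = 'net_price';
```

With the function marked SAFE, an aggregate such as SELECT SUM(net_price(amount, tax_rate)) over a large table becomes eligible for a Parallel Aggregate plan.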
By default, PostgreSQL collects statistics on each column independently. When columns are correlated (e.g., city + zip_code), the planner severely underestimates row counts for multi-column predicates. Extended statistics capture cross-column correlations.
```sql
-- Problem: city and zip_code are correlated
-- Planner underestimates: rows = 5 (actual: 500)
EXPLAIN ANALYZE
SELECT * FROM addresses
WHERE city = 'London' AND zip_code = 'EC1A';

-- Solution: create extended statistics on correlated columns
CREATE STATISTICS stat_city_zip (dependencies)
    ON city, zip_code FROM addresses;
ANALYZE addresses;
-- Now rerun EXPLAIN — row estimate dramatically improves

-- Types of extended statistics:
--   dependencies: functional dependency (A determines B)
--   ndistinct:    number of distinct value combinations
--   mcv:          most common value combinations (PostgreSQL 12+)
CREATE STATISTICS stat_full (dependencies, ndistinct, mcv)
    ON city, state, zip_code FROM addresses;
```
The Oracle equivalent is DBMS_STATS.CREATE_EXTENDED_STATS. In PostgreSQL: CREATE STATISTICS. Both solve the same correlated-column row estimation problem.

| Join Type | Algorithm | Memory | Best When |
|---|---|---|---|
| Nested Loop | For each outer row, scan inner table/index | Low | Small inner table, index available on join key, OLTP |
| Hash Join | Build hash table from smaller relation, probe with larger | High (work_mem) | Large tables, no useful index, OLAP aggregations |
| Merge Join | Sort both sides, merge-scan sorted results | Medium (sort memory) | Both sides already sorted or have btree index on join key |
```sql
-- Diagnose which join is being chosen
EXPLAIN (ANALYZE, FORMAT TEXT)
SELECT * FROM orders o JOIN customers c ON o.cid = c.id;

-- Force specific join types for testing
SET enable_hashjoin = off;
SET enable_nestloop = off;
SET enable_mergejoin = off;

-- Hash join spills to disk if work_mem is too low
-- Look for: Batches: N (if N > 1, it spilled to disk)
SET work_mem = '256MB';  -- raise per session for large joins
```
If you see Hash Batches: 8 in EXPLAIN output, the hash join spilled to disk 8 times. Increasing work_mem for that session can collapse it to 1 batch (in-memory) and cut the query time dramatically.

Every change in PostgreSQL generates a WAL record — a binary log entry describing exactly what changed. WAL records are appended sequentially to WAL segment files (16MB each by default).
A WAL record contains: header (total length, resource manager ID, CRC32), data (the actual change: block number, offset, new data), and optionally a full-page image (FPI) — a complete copy of the 8KB block written on the first modification after a checkpoint.
- LSN (Log Sequence Number) — A 64-bit byte offset into the WAL stream. Format: 0/A1B2C3D4. Monotonically increasing. Used to identify a precise WAL position for replication lag, PITR, and crash recovery.
- Full-Page Images (FPI) — Written to survive partial write scenarios (torn pages). Controlled by full_page_writes = on (default). This is the main driver of WAL write amplification.
- WAL compression — wal_compression = on compresses FPIs with LZ4/zstd, dramatically reducing WAL volume with minimal CPU overhead.
```sql
-- Current WAL write position (LSN)
SELECT pg_current_wal_lsn();

-- WAL generated between two points (bytes)
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/A1B2C3D4');

-- Decode WAL records (pg_walinspect extension, PG 15+)
CREATE EXTENSION pg_walinspect;
SELECT * FROM pg_get_wal_records_info('0/A1000000', '0/A2000000')
LIMIT 20;

-- Check WAL statistics
SELECT * FROM pg_stat_wal;
```
Write amplification is when PostgreSQL writes significantly more data to disk than the logical data change size. The primary sources are full-page images (FPIs) in WAL and autovacuum rewrites.
- FPI amplification: After every checkpoint, the first write to each page generates a full 8KB FPI in WAL — even if only 10 bytes changed. A table doing many random updates can generate 100x WAL volume vs actual data changes.
- Heap amplification: MVCC means every UPDATE writes a new row version — the old version stays until vacuumed. High-update tables swell in size.
- Index amplification: Every row version change requires an index entry update. Partial indexes and HOT (Heap-Only Tuple) updates help.
```sql
-- Measure WAL generation rate per statement
EXPLAIN (ANALYZE, WAL)
UPDATE orders SET status = 'CLOSED' WHERE id = 42;
-- Output shows: WAL Records: N, WAL Bytes: N
```

Ways to reduce write amplification:

```ini
# 1. Enable WAL compression (huge win for FPIs)
wal_compression = lz4          # or zstd (PG 15+)

# 2. Increase checkpoint_timeout (fewer FPI triggers)
checkpoint_timeout = '30min'
```

```sql
-- 3. Use HOT updates (HOT = Heap-Only Tuple: no index update needed)
-- Requires: update only non-indexed columns,
--           and sufficient fillfactor on the table
ALTER TABLE orders SET (fillfactor = 80);
```
A checkpoint is a point where PostgreSQL guarantees all dirty (modified) shared_buffer pages have been flushed to disk. After a crash, recovery only needs to replay WAL from the last checkpoint — reducing recovery time.
- Triggers: checkpoint_timeout elapsed, max_wal_size exceeded, or an explicit CHECKPOINT command.
- The checkpointer process writes dirty pages spread across the checkpoint window — controlled by checkpoint_completion_target.
- Checkpoints cause I/O spikes if they complete too quickly. Setting checkpoint_completion_target = 0.9 spreads the writes over 90% of the checkpoint interval.
```ini
checkpoint_timeout = '15min'         # default 5min — longer = less WAL
max_wal_size = '4GB'                 # default 1GB
min_wal_size = '1GB'
checkpoint_completion_target = 0.9   # spread I/O over 90% of interval
```

```sql
-- Monitor checkpoint frequency
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean,
       round(buffers_checkpoint::numeric /
             NULLIF(checkpoints_timed + checkpoints_req, 0)) AS avg_bufs
FROM pg_stat_bgwriter;
-- High checkpoints_req = max_wal_size too small
```
If checkpoints_req is much higher than checkpoints_timed, your max_wal_size is too small and checkpoints are being forced by WAL volume, not by time. Increase max_wal_size.

synchronous_commit controls when PostgreSQL considers a transaction committed relative to WAL durability. It is a per-transaction setting — no restart required.
| Value | WAL Written to Disk | Standby Received | Standby Applied | Risk |
|---|---|---|---|---|
| on (default) | ✅ Before COMMIT returns | — | — | None (fully durable) |
| off | ❌ Async (up to 3x wal_writer_delay later) | — | — | Last ~600ms of commits on crash |
| remote_write | ✅ | ✅ Written to OS buffer | — | Small: standby OS crash |
| remote_apply | ✅ | ✅ | ✅ Applied to standby | None — reads from standby guaranteed fresh |
| local | ✅ Primary only | — | — | Standby may lag |
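Because the setting is per-transaction, durability can be relaxed only where it is acceptable. A minimal sketch (the table name is hypothetical):

```sql
-- Relax durability for a single non-critical transaction only
BEGIN;
SET LOCAL synchronous_commit = off;  -- applies to this transaction alone
INSERT INTO audit_log_events (payload) VALUES ('page_view');
COMMIT;  -- returns without waiting for the WAL flush

-- Everything else on the connection stays fully durable
SHOW synchronous_commit;  -- back to the session/default value
```

SET LOCAL reverts at COMMIT/ROLLBACK, so there is no risk of the relaxed setting leaking into later transactions.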
synchronous_commit = off is safe for non-critical workloads (logging, analytics inserts) and can give 3–10x write throughput improvement with only a very small durability window. The database will never corrupt — you only risk losing the last ~600ms of commits on a hard crash.

Logical decoding interprets WAL records and emits them as a logical stream of row-level changes (INSERT/UPDATE/DELETE) using a pluggable output plugin. This powers Change Data Capture (CDC) without application code changes.
- Output plugins: pgoutput (built-in, used by logical replication), wal2json (JSON output), test_decoding (text, for debugging).
- Requires wal_level = logical and a logical replication slot to track consumer position.
- Debezium is the most popular CDC framework — it connects to PostgreSQL's logical decoding API and streams changes to Kafka.
```sql
-- Create a logical decoding slot with wal2json
SELECT pg_create_logical_replication_slot('my_cdc_slot', 'wal2json');

-- Peek at changes (non-consuming)
SELECT * FROM pg_logical_slot_peek_changes('my_cdc_slot', NULL, NULL);

-- Consume changes (advances slot position)
SELECT * FROM pg_logical_slot_get_changes(
    'my_cdc_slot', NULL, NULL, 'pretty-print', '1'
);

-- Always monitor slot lag!
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(
           pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots;
```
A production CDC pipeline from PostgreSQL to a data warehouse (Snowflake, BigQuery, Redshift) typically uses: PostgreSQL → Debezium → Apache Kafka → Kafka Connect sink.
- Step 1 — PostgreSQL config: Set wal_level = logical, create a dedicated replication user, add a pg_hba.conf entry.
- Step 2 — Debezium PostgreSQL connector: Creates a replication slot (pgoutput plugin), reads WAL changes, publishes to Kafka topics per table.
- Step 3 — Kafka topics: Each table gets a topic (e.g., pg.public.orders). Payload contains before/after row images in Avro or JSON.
- Step 4 — Sink connector: Kafka Connect sink writes to the target (Snowflake Kafka Connector, BigQuery Connector).
```json
{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "database.hostname": "192.168.1.51",
  "database.port": "5432",
  "database.user": "repl_user",
  "database.password": "secret",
  "database.dbname": "mydb",
  "database.server.name": "pg",
  "plugin.name": "pgoutput",
  "slot.name": "debezium_slot",
  "publication.name": "dbz_publication",
  "table.include.list": "public.orders,public.customers",
  "snapshot.mode": "initial"
}
```

Set max_slot_wal_keep_size = '10GB' as a safety valve — but note: if exceeded, the slot is invalidated and you must re-snapshot.

REPLICA IDENTITY controls what column data is included in the WAL record for UPDATE and DELETE operations. Without sufficient identity columns, subscribers cannot identify which row to update or delete.
| Setting | OLD Row Data in WAL | Use When |
|---|---|---|
| DEFAULT | Primary key columns only | Table has a primary key (recommended) |
| FULL | All columns (before image) | No PK — but doubles WAL volume! |
| NOTHING | None | Only INSERT replication needed |
| USING INDEX | Specified unique index columns | No PK but has unique index |
```sql
-- Check current replica identity
SELECT relname, relreplident FROM pg_class WHERE relname = 'orders';
-- d=DEFAULT, f=FULL, n=NOTHING, i=INDEX

-- Set FULL for tables without a PK (CDC requirement)
ALTER TABLE orders_legacy REPLICA IDENTITY FULL;

-- Use an index instead (less WAL overhead than FULL)
ALTER TABLE orders REPLICA IDENTITY USING INDEX idx_orders_uuid;
```
Citus is an open-source PostgreSQL extension (developed by Microsoft) that transforms a single PostgreSQL instance into a distributed database by adding a coordinator node and multiple worker nodes.
- Coordinator node: Receives all queries. Parses and plans them. Routes shard-specific parts to workers. Merges results.
- Worker nodes: Each holds a subset of shards. Executes pushed-down query fragments in parallel.
- Distribution column: A column used to hash rows across shards. Choose carefully — cross-shard JOINs are expensive.
- Shard count: Default 32 shards per table. Shards are regular PostgreSQL tables on workers.
```sql
-- On coordinator: add worker nodes
SELECT citus_add_node('worker1', 5432);
SELECT citus_add_node('worker2', 5432);

-- Distribute a table by tenant_id
SELECT create_distributed_table('orders', 'tenant_id');

-- Colocate related tables (JOIN-friendly)
SELECT create_distributed_table('order_items', 'tenant_id',
                                colocate_with := 'orders');

-- Reference table (replicated to all workers)
SELECT create_reference_table('countries');

-- Check shard placement
SELECT * FROM citus_shards;
SELECT * FROM pg_dist_placement;
```
For a multi-tenant SaaS application, tenant_id is ideal: all tenant data goes to the same shard, enabling efficient local JOINs without cross-node network hops.

Table colocation ensures that related tables (e.g., orders and order_items) that share the same distribution key have their corresponding shards placed on the same worker node. This allows JOINs between them to be executed locally on each worker — no network hops between workers.
```sql
-- Without colocation: JOIN requires cross-node data transfer
-- With colocation: JOIN runs locally on each worker (parallel)
SELECT create_distributed_table('orders', 'tenant_id');
SELECT create_distributed_table('order_items', 'tenant_id',
                                colocate_with := 'orders');

-- This JOIN now executes locally on workers (fast)
SELECT o.id, SUM(oi.amount)
FROM orders o
JOIN order_items oi
  ON o.id = oi.order_id AND o.tenant_id = oi.tenant_id
WHERE o.tenant_id = 42
GROUP BY o.id;

-- Verify colocation group assignment
SELECT logicalrelid, colocationid
FROM pg_dist_partition
WHERE logicalrelid::text IN ('orders', 'order_items');
```
- No cross-shard transactions with full ACID: Multi-shard writes use 2-phase commit (2PC) which is slower and can leave transactions in an uncertain state if coordinator crashes.
- Changing distribution column: Once a table is distributed, you cannot easily change the distribution column without redistributing all data.
- Cross-shard JOINs on non-distribution key: If you JOIN on a column other than the distribution key, Citus must repartition data on the fly — expensive and slow.
- Sequences: Citus uses sharded sequences that may produce gaps. Standard SERIAL sequences may not be globally unique across workers.
- DDL limitations: Some DDL operations require quorum across all nodes. Schema changes must be applied carefully.
- Aggregation pushdown: Only certain aggregates (SUM, COUNT, AVG, MIN, MAX) can be pushed down. Complex aggregates may pull data to coordinator.
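To see the repartition penalty from the list above in practice, EXPLAIN a JOIN whose condition lacks the distribution key. This sketch reuses the orders table from the earlier examples and assumes a customers table distributed by tenant_id; the plan shapes described in the comments are what Citus typically reports:

```sql
-- JOIN includes the distribution key: executed locally per shard
EXPLAIN SELECT o.id, c.name
FROM orders o
JOIN customers c
  ON o.tenant_id = c.tenant_id AND o.customer_id = c.id;

-- JOIN on a non-distribution column: Citus must repartition on the fly
-- (may require: SET citus.enable_repartition_joins = on;)
-- Look for repartition/merge task steps in the distributed plan
EXPLAIN SELECT o.id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
```

Comparing the two plans makes the cost visible: the first runs one fragment per colocated shard pair, the second adds intermediate-result shuffling between workers.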
Background workers (BGW) are custom processes that run inside the PostgreSQL server process tree — started at server startup or dynamically. They are used by extensions to run periodic tasks, listen to events, or implement custom replication.
- pg_cron — Uses a BGW to wake up every minute and check the cron schedule.
- pglogical — Uses BGWs per subscription to stream WAL changes.
- Patroni — Not a BGW; Patroni watches PostgreSQL as an external agent.
- BGWs run as normal postgres backend processes — they can execute SQL, access shared memory, and use background worker APIs.
```c
#include "postgres.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"
#include "postmaster/interrupt.h"
#include "storage/latch.h"

PG_MODULE_MAGIC;

/* In your extension's _PG_init() function */
void
_PG_init(void)
{
    BackgroundWorker worker;

    MemSet(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                       BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker.bgw_restart_time = 10;   /* restart after 10s on crash */
    snprintf(worker.bgw_name, BGW_MAXLEN, "my_extension worker");
    /* entry point is looked up by library + function name */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_worker_main");
    worker.bgw_main_arg = (Datum) 0;
    RegisterBackgroundWorker(&worker);
}

/* Worker main function: runs in its own process */
void
my_worker_main(Datum arg)
{
    /* arrange for ShutdownRequestPending to be set on SIGTERM */
    pqsignal(SIGTERM, SignalHandlerForShutdownRequest);
    BackgroundWorkerUnblockSignals();

    BackgroundWorkerInitializeConnection("mydb", NULL, 0);

    while (!ShutdownRequestPending)
    {
        /* do work every 5 seconds */
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         5000L, PG_WAIT_EXTENSION);
        ResetLatch(MyLatch);
    }
}
```
Wait events in pg_stat_activity tell you exactly what a session is waiting for — the single most important diagnostic tool for PostgreSQL performance problems.
```sql
-- Real-time wait event summary (run repeatedly)
SELECT wait_event_type, wait_event, COUNT(*) AS sessions
FROM pg_stat_activity
WHERE state != 'idle'
GROUP BY wait_event_type, wait_event
ORDER BY sessions DESC;
```
| Wait Event | Type | Meaning & Action |
|---|---|---|
| Lock:relation | Lock | Table-level lock contention. Find blocker with pg_blocking_pids(). |
| Lock:tuple | Lock | Row-level lock contention. Typical in high-concurrency OLTP. |
| IO:DataFileRead | IO | Reading data pages from disk. shared_buffers too small or missing indexes. |
| IO:WALWrite | IO | WAL writes slow. Check disk I/O, consider wal_compression. |
| CPU | CPU | Query is running. Normal — unless it never finishes. Check for full table scans. |
| Client:ClientRead | Client | Waiting for client to send next command. Idle connection pooling issue. |
| Lock:advisory | Lock | Application-level advisory lock contention. |
| LWLock:WALInsert | LWLock | Contention on WAL insertion. Very high write concurrency. |
PostgreSQL has no built-in ASH equivalent, but you can build one by periodically sampling pg_stat_activity into a history table. Extensions like pg_activity, pgBadger, and Percona Monitoring and Management (PMM) provide this.
```sql
-- Create ASH history table
CREATE TABLE ash_history (
    sample_time     TIMESTAMPTZ DEFAULT NOW(),
    pid             INT,
    usename         TEXT,
    datname         TEXT,
    state           TEXT,
    wait_event_type TEXT,
    wait_event      TEXT,
    query           TEXT
);

-- Schedule sampling every minute via pg_cron
SELECT cron.schedule('ash-sample', '* * * * *', $$
    INSERT INTO ash_history
    SELECT NOW(), pid, usename, datname, state,
           wait_event_type, wait_event, left(query, 200)
    FROM pg_stat_activity
    WHERE state != 'idle' AND pid != pg_backend_pid();
$$);

-- Query ASH: top wait events in the last hour
SELECT wait_event_type, wait_event, COUNT(*) AS samples
FROM ash_history
WHERE sample_time > NOW() - interval '1 hour'
GROUP BY 1, 2 ORDER BY 3 DESC;
```
A PostgreSQL extension packages custom SQL objects (functions, types, operators, views) with C or PL/pgSQL code. The minimum files are: a control file, an SQL script, and optionally a C shared library.
## Directory structure

```
my_extension/
├── my_extension.control   # extension metadata
├── my_extension--1.0.sql  # SQL definitions
├── my_extension.c         # C implementation
└── Makefile
```

## my_extension.control

```ini
comment = 'My custom extension'
default_version = '1.0'
relocatable = true
module_pathname = '$libdir/my_extension'
```

## my_extension--1.0.sql

```sql
CREATE FUNCTION my_add(a INT, b INT) RETURNS INT
AS 'MODULE_PATHNAME', 'my_add'
LANGUAGE C STRICT;
```

## my_extension.c

```c
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(my_add);

Datum
my_add(PG_FUNCTION_ARGS)
{
    int32 a = PG_GETARG_INT32(0);
    int32 b = PG_GETARG_INT32(1);
    PG_RETURN_INT32(a + b);
}
```

## Install

```shell
make install
psql -c "CREATE EXTENSION my_extension;"
```
| Feature | Oracle | PostgreSQL Equivalent |
|---|---|---|
| Empty string vs NULL | '' = NULL (same!) | '' ≠ NULL — critical behavioural difference |
| ROWNUM / ROWID | ROWNUM, ROWID | LIMIT/OFFSET, ctid (internal, not stable) |
| Packages | CREATE PACKAGE (state + code) | No direct equivalent — use schemas + functions |
| CONNECT BY (hierarchical) | CONNECT BY PRIOR | WITH RECURSIVE (standard SQL) |
| DECODE / NVL | DECODE(), NVL() | CASE WHEN, COALESCE() |
| Sequences | NEXTVAL on sequence object | SERIAL / IDENTITY columns or explicit sequences |
| SYSDATE / SYSTIMESTAMP | SYSDATE | NOW(), CURRENT_TIMESTAMP, CURRENT_DATE |
| VARCHAR2 | VARCHAR2(n BYTE/CHAR) | VARCHAR(n) — always character-based |
| DUAL table | SELECT 1 FROM DUAL | SELECT 1 (no FROM needed) |
| Outer join syntax | WHERE a.id = b.id(+) | Standard ANSI: LEFT JOIN ... ON |
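As one concrete conversion from the table above, Oracle's CONNECT BY hierarchy query maps to a standard recursive CTE. A sketch, assuming an illustrative employees(id, name, manager_id) table:

```sql
-- Oracle:
--   SELECT id, name FROM employees
--   START WITH manager_id IS NULL
--   CONNECT BY PRIOR id = manager_id;

-- PostgreSQL equivalent:
WITH RECURSIVE org AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL           -- START WITH
  UNION ALL
    SELECT e.id, e.name, e.manager_id, org.depth + 1
    FROM employees e
    JOIN org ON e.manager_id = org.id  -- CONNECT BY PRIOR
)
SELECT * FROM org;
```

The depth column stands in for Oracle's LEVEL pseudo-column; ordering by path requires carrying an array of ancestor ids in the CTE.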
Critical gotcha: in Oracle the empty string '' IS NULL, so code written for Oracle often treats the two interchangeably. In PostgreSQL they are completely different values. Applications that store '' in Oracle to mean "no value" will break silently if migrated without a code audit.

- ora2pg — Open-source tool that exports Oracle schema, data, and PL/SQL to PostgreSQL-compatible SQL. Best for schema conversion and data export. Handles tables, indexes, sequences, views, functions, triggers.
- AWS Schema Conversion Tool (SCT) — GUI tool, good for assessing migration complexity and converting stored procedures.
- AWS Database Migration Service (DMS) — For ongoing data replication from Oracle to PostgreSQL during cutover period.
- Striim / GoldenGate — For zero-downtime migration using CDC from Oracle to PostgreSQL.
```shell
# Step 1: Assess migration complexity
ora2pg --type SHOW_REPORT --estimate_cost \
       --oracle_dsn "dbi:Oracle:host=ora-server;sid=ORCL"

# Step 2: Export schema only (no data)
ora2pg --type TABLE -o tables.sql
ora2pg --type INDEX -o indexes.sql
ora2pg --type PACKAGE -o packages.sql   # converts to PL/pgSQL

# Step 3: Export data in COPY format (fast)
ora2pg --type COPY --table ORDERS -o orders_data.sql

# Step 4: Load into PostgreSQL
psql -d targetdb -f tables.sql
psql -d targetdb -f orders_data.sql
```
Oracle PL/SQL packages (package spec + body with global state) have no direct equivalent in PostgreSQL. The conversion strategy depends on what the package does:
- Package as namespace: Create a PostgreSQL schema with the package name. Put all functions inside that schema.
- Package global variables: No equivalent. Use a session-local temporary table, a custom GUC parameter, or pass state explicitly as function parameters.
- Package initialization code: Use a function called explicitly on session start, or a SET LOCAL in a transaction.
- EXECUTE IMMEDIATE: Convert to PostgreSQL EXECUTE format(...) in PL/pgSQL.
- Autonomous transactions: PostgreSQL has only limited support, via dblink or the pg_background extension.
```sql
-- Oracle package → PostgreSQL schema approach
-- Oracle: CREATE PACKAGE emp_pkg AS ...
-- PostgreSQL equivalent:
CREATE SCHEMA emp_pkg;

CREATE OR REPLACE FUNCTION emp_pkg.get_salary(p_id INT)
RETURNS NUMERIC AS $$
BEGIN
    RETURN (SELECT salary FROM employees WHERE id = p_id);
END;
$$ LANGUAGE plpgsql;

-- Oracle global package variable → custom GUC
-- In a C extension: DefineCustomIntVariable("myapp.user_id", ...)
-- Or purely in SQL:
SELECT set_config('myapp.user_id', '42', true);  -- true = transaction-local
SELECT current_setting('myapp.user_id')::INT;
```
Row-Level Security (RLS) allows you to define policies that restrict which rows a user can see or modify. The database enforces these policies transparently — the application doesn't need to add WHERE clauses.
```sql
-- Step 1: Enable RLS on the table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Step 2: Create isolation policy (tenants see only their data)
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant')::INT);

-- Step 3: Application sets tenant context on connection
SET app.current_tenant = '42';
SELECT * FROM orders;  -- tenant 42 now only sees their own rows!

-- Step 4: Superusers bypass RLS by default.
-- Force RLS even for the table owner:
ALTER TABLE orders FORCE ROW LEVEL SECURITY;

-- Separate policies for SELECT vs INSERT vs UPDATE
CREATE POLICY select_policy ON orders FOR SELECT
    USING (tenant_id = current_setting('app.current_tenant')::INT);
CREATE POLICY insert_policy ON orders FOR INSERT
    WITH CHECK (tenant_id = current_setting('app.current_tenant')::INT);
```
pgAudit is the standard PostgreSQL extension for detailed audit logging. It extends PostgreSQL's standard log_statement to provide fine-grained control over which operations are logged — required for SOX, PCI-DSS, HIPAA compliance.
## postgresql.conf

```ini
shared_preload_libraries = 'pgaudit'

# Session-level audit (all DDL + DML)
pgaudit.log = 'ddl, write, role'
# Options: read, write, function, role, ddl, misc, all

# Object-level audit (specific tables)
pgaudit.log_level = log
pgaudit.log_relation = on
pgaudit.log_parameter = on
```

```sql
-- Enable extension
CREATE EXTENSION pgaudit;

-- Audit a specific role
ALTER ROLE sensitive_user SET pgaudit.log = 'all';

-- Audit a specific table (object-level)
ALTER ROLE auditor SET pgaudit.log = 'read';
-- then grant privileges — pgaudit tracks access

-- Log format (in PostgreSQL log file):
-- AUDIT: SESSION,1,1,DDL,CREATE TABLE,,,CREATE TABLE secret_data...
```
This is a capstone architecture question. A 50TB OLTP database with five-nines availability (~5 minutes downtime/year) requires layered redundancy across every tier.
- HA Layer: Patroni cluster (1 primary + 2 synchronous replicas + 1 async replica in DR region). etcd 3-node cluster for leader election. HAProxy + Keepalived for VIP failover. Target RTO: <30 seconds automatic.
- Synchronous replication: synchronous_standby_names = 'FIRST 1 (replica1, replica2)' — RPO = 0 (zero data loss) for primary datacenter failure.
- Storage: NVMe SSDs with hardware RAID-10 or all-flash SAN. random_page_cost = 1.1, wal_compression = lz4.
- Connection pooling: PgBouncer tier (2+ instances behind load balancer) in transaction mode. Target: 2000 app connections → 200 PostgreSQL connections.
- Partitioning: Range-partition large tables by date. Detach old partitions for archival without locking.
- Backup: pgBackRest with WAL archiving to S3/object storage. Daily base backups, continuous WAL archiving. RPO target: 5 minutes. Monthly restore drills.
- Monitoring: Prometheus + postgres_exporter + Grafana. Alert on: replication lag >5s, XID age >1.5B, dead tuple ratio >10%, connection usage >80%, checkpoint frequency, WAL retention of replication slots.
- Maintenance: pg_repack for online bloat removal. REINDEX CONCURRENTLY for index maintenance. Patch upgrades via logical replication-based rolling upgrade.
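A minimal postgresql.conf fragment tying the durability and WAL settings above together. This is a sketch, not a definitive template; the standby names, sizes, and pgBackRest stanza name are illustrative assumptions:

```ini
# Primary node — illustrative values only
synchronous_standby_names = 'FIRST 1 (replica1, replica2)'  # RPO = 0
synchronous_commit = on
wal_compression = lz4
max_wal_size = '16GB'
checkpoint_timeout = '30min'
checkpoint_completion_target = 0.9
max_slot_wal_keep_size = '50GB'   # protect disk from stalled slots
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'
```

Patroni manages most of these on members' behalf, so in practice they would live in Patroni's DCS-backed configuration rather than a hand-edited postgresql.conf.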