PostgreSQL DBA
Intermediate Interview
Questions & Answers
25 carefully crafted intermediate-level questions covering table partitioning, logical replication, autovacuum tuning, connection pooling, Patroni HA, query optimization, and real-world DBA scenarios.
Table partitioning divides a large logical table into smaller physical pieces called partitions, while queries still work against the parent table. PostgreSQL has supported declarative partitioning since version 10.
PostgreSQL supports three partition types:
- Range Partitioning — Rows are distributed based on a range of values (e.g., date ranges). Best for time-series data.
- List Partitioning — Rows are distributed based on a discrete list of values (e.g., country code, region).
- Hash Partitioning — Rows are distributed using a hash function for even distribution when no natural range/list exists.
```sql
-- Create partitioned parent table
CREATE TABLE orders (
    order_id   BIGSERIAL,
    order_date DATE NOT NULL,
    amount     NUMERIC
) PARTITION BY RANGE (order_date);

-- Create partitions
CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Query works on parent — partition pruning happens automatically
SELECT * FROM orders WHERE order_date = '2025-06-01';
```
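The list and hash variants follow the same declarative pattern; a brief sketch with illustrative table names:

```sql
-- List partitioning: one partition per discrete value set
CREATE TABLE customers (
    customer_id  BIGSERIAL,
    country_code TEXT NOT NULL
) PARTITION BY LIST (country_code);

CREATE TABLE customers_emea PARTITION OF customers
    FOR VALUES IN ('DE', 'FR', 'GB');
CREATE TABLE customers_apac PARTITION OF customers
    FOR VALUES IN ('JP', 'SG', 'AU');

-- Hash partitioning: even spread across N partitions
CREATE TABLE events (
    event_id BIGINT NOT NULL
) PARTITION BY HASH (event_id);

CREATE TABLE events_p0 PARTITION OF events
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... REMAINDER 2 and 3 likewise
```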
Partition pruning is PostgreSQL's optimizer feature that skips scanning partitions which cannot contain rows matching the query's WHERE clause. This dramatically reduces I/O on large partitioned tables.
- Static pruning — Happens at plan time when the WHERE clause uses constant values.
- Dynamic pruning — Happens at execution time when values are not known until runtime (e.g., parameterized queries). Controlled by `enable_partition_pruning = on` (the default).
```sql
-- Check if partition pruning is working
EXPLAIN SELECT * FROM orders WHERE order_date = '2025-03-15';
-- Good output: only orders_2025 scanned
-- Bad output: all partitions scanned → no pruning

-- Ensure partition pruning is enabled
SHOW enable_partition_pruning;
SET enable_partition_pruning = on;
```
A common pruning killer is casting the partition key, e.g. `WHERE order_date::TEXT = '2025-03-15'`. Always query the partition key in its native type.

Table bloat occurs when dead tuples (from MVCC updates/deletes) accumulate faster than autovacuum can clean them, causing tables and indexes to consume more disk space than necessary and slowing queries.
```sql
-- Find tables with high dead tuple ratio
SELECT schemaname,
       relname AS table_name,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric
             / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
       last_autovacuum,
       last_vacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 15;

-- Fix 1: Standard VACUUM (no lock, space reusable)
VACUUM VERBOSE mytable;

-- Fix 2: VACUUM FULL (reclaims disk, takes EXCLUSIVE LOCK!)
VACUUM FULL mytable;

-- Fix 3: pg_repack (online defrag — no exclusive lock)
-- pg_repack -d mydb -t mytable
```
- A `dead_pct` above 10–15% indicates a table needing immediate attention.
- Prefer `pg_repack` over `VACUUM FULL` in production — it rebuilds the table without long locks.
- Index bloat is separate — use `REINDEX CONCURRENTLY` for bloated indexes.
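A minimal sketch of the index-side cleanup (index and table names are illustrative; `CONCURRENTLY` requires PostgreSQL 12+):

```sql
-- Rebuild one bloated index without blocking writes
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;

-- Or rebuild every index on a table in one pass
REINDEX TABLE CONCURRENTLY orders;
```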
TOAST (The Oversized-Attribute Storage Technique) is PostgreSQL's mechanism for storing large column values that exceed the 8KB page size. PostgreSQL automatically stores large values in a separate TOAST table linked to the main table.
- TOAST kicks in when a row exceeds roughly 2KB (the `TOAST_TUPLE_THRESHOLD`).
- TOAST can compress and/or move large values out of line (TEXT, BYTEA, JSONB, arrays).
- Four storage strategies: `PLAIN`, `EXTENDED` (default), `EXTERNAL`, `MAIN`.
- Each table with TOAST-able columns has a corresponding `pg_toast.pg_toast_NNNN` table.
```sql
-- Check TOAST strategy for each column
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'mytable'::regclass AND attnum > 0;

-- Change storage strategy for a column
ALTER TABLE mytable
    ALTER COLUMN doc_content SET STORAGE EXTERNAL;
-- EXTERNAL = no compression, fast access
```
PostgreSQL uses a 32-bit transaction ID (XID) counter. After ~2.1 billion transactions, the XID wraps around. Without preventive action, PostgreSQL will shut down the database to prevent data loss — this is one of the most critical operational risks in PostgreSQL.
```sql
-- Check XID age per database (alert if > 1.5 billion)
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(age(datfrozenxid)::numeric / 2100000000 * 100, 1) AS pct_used
FROM pg_database
ORDER BY xid_age DESC;

-- Check per-table (find which table is holding an old XID)
SELECT relname, age(relfrozenxid) AS table_xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;

-- Force freeze old XIDs
VACUUM FREEZE mytable;
```
- PostgreSQL automatically triggers an anti-wraparound autovacuum when XID age exceeds `autovacuum_freeze_max_age` (default: 200 million).
- Monitor and alert when XID age exceeds 1.5 billion.
- The root cause is often a long-running idle transaction holding back the oldest XID — check `pg_stat_activity`.
Default autovacuum settings are too conservative for high-write databases. The key is to make autovacuum trigger more frequently and work faster, especially for large tables.
```
── Global tuning (postgresql.conf) ──────────────────
# More workers for concurrent vacuuming
autovacuum_max_workers = 6              # default: 3

# Check tables more frequently
autovacuum_naptime = '30s'              # default: 1min

# Trigger vacuum when 5% of rows are dead (not 20%)
autovacuum_vacuum_scale_factor = 0.05   # default: 0.2

# Allow autovacuum to work faster
autovacuum_vacuum_cost_limit = 1000     # default: 200
autovacuum_vacuum_cost_delay = '2ms'    # default: 20ms

── Per-table override for very large/hot tables ─────
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 1000
);
```
- For a 100M-row table, the default 20% scale factor waits for 20M dead tuples — far too late.
- Use per-table overrides via `ALTER TABLE ... SET` for your hottest tables.
- Monitor with `SELECT * FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;`
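The trigger condition itself is simple arithmetic: autovacuum vacuums a table once `n_dead_tup` exceeds `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`. A monitoring sketch comparing each table's dead tuples against its trigger point under the global settings (per-table reloptions overrides are ignored here):

```sql
-- Dead tuples vs. the global-settings autovacuum trigger point
SELECT s.relname,
       s.n_dead_tup,
       round(current_setting('autovacuum_vacuum_threshold')::numeric
             + current_setting('autovacuum_vacuum_scale_factor')::numeric
               * c.reltuples) AS global_trigger_point
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC
LIMIT 10;
```

For a 100M-row table under the defaults this works out to 50 + 0.2 × 100,000,000 ≈ 20 million dead tuples before autovacuum even starts.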
Cost-based delay throttles autovacuum's I/O by making it sleep when it exceeds a cost threshold, preventing it from overwhelming the disk subsystem on production systems.
```
# Cost assigned per vacuum I/O operation
vacuum_cost_page_hit = 1      # buffer cache hit
vacuum_cost_page_miss = 10    # disk read
vacuum_cost_page_dirty = 20   # dirty page write

# When accumulated cost hits the limit, sleep for the delay duration
autovacuum_vacuum_cost_limit = 200      # default
autovacuum_vacuum_cost_delay = '20ms'   # default

# For NVMe/fast storage — let autovacuum run at full speed
autovacuum_vacuum_cost_limit = 2000
autovacuum_vacuum_cost_delay = '0ms'
```
- A high `cost_delay` with a low `cost_limit` means slow autovacuum, which means bloat accumulation.
- On fast NVMe storage, set `cost_delay = 0` and raise `cost_limit` significantly.
- On shared or spinning-disk environments, keep `cost_delay` higher to avoid I/O saturation.
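These settings translate into a concrete I/O ceiling. With the defaults, in the worst case where every page is a cache miss (cost 10), autovacuum reads 200/10 = 20 pages per cycle, then sleeps 20 ms: about 1,000 pages, or roughly 8 MB, per second. The arithmetic as a quick check:

```sql
-- pages/cycle (200/10) * cycles/sec (1000ms/20ms) * 8 KB page, in MB/s
SELECT round((200 / 10) * (1000.0 / 20) * 8192 / 1024.0 / 1024, 1)
       AS default_read_ceiling_mb_per_sec;  -- 7.8
```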
Long-running transactions and idle-in-transaction sessions prevent autovacuum from cleaning dead tuples because it cannot advance the oldest XID past the blocking transaction.
```sql
-- Find sessions blocking autovacuum (idle-in-transaction)
SELECT pid, usename, state, wait_event_type, wait_event,
       now() - xact_start AS xact_duration,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start;

-- Find the oldest running transaction (the XID bottleneck)
-- (xid has no ordering operator, so sort by age() instead)
SELECT pid, usename, state,
       now() - xact_start AS age,
       backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Terminate the blocker
SELECT pg_terminate_backend(pid);
```
Tip: set `idle_in_transaction_session_timeout = '10min'` in postgresql.conf to automatically terminate sessions that hold transactions open too long.

pg_repack is a PostgreSQL extension that removes bloat from tables and indexes online — without holding an exclusive lock for the full duration. It is the production-safe alternative to VACUUM FULL.
| Feature | VACUUM FULL | pg_repack |
|---|---|---|
| Table Lock | Exclusive (full duration) | Brief lock at start/end only |
| Production Safe | ❌ Blocks all reads/writes | ✅ Online operation |
| Reclaims Disk | ✅ Yes | ✅ Yes |
| Rebuilds Indexes | ✅ Yes | ✅ Yes |
| Requires Extension | No | Yes (pg_repack) |
```shell
# Install pg_repack
yum install pg_repack17

# Repack a specific table online
pg_repack -d mydb -t orders --no-superuser-check

# Repack all tables in a database
pg_repack -d mydb
```
| Feature | Streaming Replication | Logical Replication |
|---|---|---|
| Level | Physical (block/byte level) | Logical (row level) |
| Selective Tables | ❌ Entire cluster | ✅ Per table |
| Cross-Version | ❌ Same major version | ✅ Different versions |
| Cross-Platform | ❌ Same OS/arch | ✅ Flexible |
| DDL Replication | ✅ Automatic | ❌ Manual |
| wal_level needed | replica | logical |
| Standby writable | ❌ Read-only | ✅ Yes |
```sql
-- Step 1: Set wal_level = logical in postgresql.conf
-- wal_level = logical

-- Step 2: Create a replication user
CREATE ROLE repl_user WITH REPLICATION LOGIN PASSWORD 'secret';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO repl_user;

-- Step 3: Create a publication for specific tables
CREATE PUBLICATION pub_orders FOR TABLE orders, order_items;

-- Or publish ALL tables
CREATE PUBLICATION pub_all FOR ALL TABLES;
```
```sql
-- Step 4: Tables must exist on the subscriber first
CREATE TABLE orders ( ... );

-- Step 5: Create the subscription
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=192.168.1.51 port=5432 dbname=mydb user=repl_user password=secret'
    PUBLICATION pub_orders;

-- Step 6: Monitor subscription status
SELECT * FROM pg_stat_subscription;
SELECT * FROM pg_replication_slots;  -- on publisher
```
A replication slot is a durable state tracker that ensures WAL files needed by a subscriber or standby are NOT deleted — even if the consumer disconnects — until those WAL files have been confirmed consumed.
```sql
-- List all replication slots and WAL retained
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(
           pg_current_wal_lsn(), restart_lsn)) AS wal_retained
FROM pg_replication_slots;

-- Drop an inactive/stale replication slot
SELECT pg_drop_replication_slot('stale_slot_name');
```
Warning: a slot whose consumer has disappeared retains WAL indefinitely and can fill the disk. Use `max_slot_wal_keep_size` (PostgreSQL 13+) as a safety limit.

Logical replication allows replication between different major PostgreSQL versions, making it ideal for near-zero-downtime upgrades (e.g., PG 15 → PG 17).
- Step 1 — Install new PostgreSQL 17 on a new server and initialize a cluster.
- Step 2 — Dump the schema only from the old cluster: `pg_dump --schema-only -d mydb | psql -d mydb_new`
- Step 3 — Set up logical replication: create a publication on PG 15 and a subscription on PG 17.
- Step 4 — Wait for initial data copy to complete and replication to catch up (monitor lag).
- Step 5 — During maintenance window: stop application writes, let replication catch up fully, switchover application to PG17.
- Step 6 — Drop the subscription; sequences are not replicated, so they must be reset manually.
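Step 6 matters because logical replication does not carry sequence state. After switchover, each sequence on the new primary must be advanced past the highest value already replicated; a sketch (table and sequence names are illustrative):

```sql
-- Advance a serial column's sequence past the table's current max
SELECT setval('orders_order_id_seq',
              (SELECT COALESCE(max(order_id), 1) FROM orders));

-- Inventory of sequences needing the same treatment (PostgreSQL 10+)
SELECT schemaname, sequencename, last_value FROM pg_sequences;
```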
PgBouncer is a lightweight connection pooler for PostgreSQL. Since PostgreSQL spawns a new OS process per client connection (expensive), PgBouncer maintains a pool of server connections and multiplexes client connections onto them.
| Mode | Server Connection Released | Best For |
|---|---|---|
| Session Mode | When client disconnects | Legacy apps, safe default |
| Transaction Mode | After each transaction ends | Most web apps — best efficiency |
| Statement Mode | After each SQL statement | Rarely used — breaks multi-statement logic |
```ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = scram-sha-256
pool_mode = transaction   # recommended
max_client_conn = 1000
default_pool_size = 25
server_idle_timeout = 600
```

Caveat: transaction pooling breaks features that assume a stable session, such as session-level SET, prepared statements (without protocol-level support), advisory locks, and LISTEN/NOTIFY.

pg_stat_statements is a PostgreSQL extension that tracks execution statistics for every SQL statement. It is the equivalent of Oracle's V$SQL — your first stop for query performance analysis.
```sql
-- Enable in postgresql.conf (requires restart)
-- shared_preload_libraries = 'pg_stat_statements'

-- Enable the extension
CREATE EXTENSION pg_stat_statements;

-- Top 10 queries by total execution time
SELECT round(total_exec_time::numeric, 2) AS total_ms,
       calls,
       round(mean_exec_time::numeric, 2) AS avg_ms,
       round(stddev_exec_time::numeric, 2) AS stddev_ms,
       rows,
       left(query, 80) AS query_snippet
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();
```
- Partial Index — An index built on a subset of rows defined by a WHERE clause. Much smaller than a full index, making it faster and cheaper to maintain.
- Covering Index — An index that stores extra non-key columns via INCLUDE, so that matching queries can be answered entirely from the index (an Index-Only Scan).
```sql
-- Partial index: only index active orders (not all 50M rows)
CREATE INDEX idx_active_orders
    ON orders(customer_id)
    WHERE status = 'ACTIVE';

-- Covering index: include extra columns to avoid the table fetch
-- (an Index-Only Scan becomes possible)
CREATE INDEX idx_orders_covering
    ON orders(customer_id) INCLUDE (order_date, amount);

-- Query that can use the covering index (Index-Only Scan)
SELECT customer_id, order_date, amount
FROM orders
WHERE customer_id = 12345;
```
- Covering indexes using `INCLUDE` allow Index-Only Scans — no table heap access is needed, which is dramatically faster.
- Partial indexes shine on tables where only a small percentage of rows is queried frequently.
- Stale statistics — The planner estimates too many or too few rows. Fix: `ANALYZE tablename;`
- Low selectivity — The query returns a large percentage of rows (e.g., >10–20%), so a Seq Scan is cheaper than many random I/Os.
- Small table — If the entire table fits in a few pages, a Seq Scan is faster than the index lookup overhead.
- No index on the column — The index simply doesn't exist.
- Data type mismatch / implicit cast — Comparing an indexed INTEGER column against a value bound as TEXT or NUMERIC forces a cast that can prevent index use.
- Function on an indexed column — `WHERE LOWER(email) = 'x'` bypasses the index unless a matching functional index exists.
- Low effective_cache_size — The planner underestimates the OS cache, making index scans look more expensive than they are.
```sql
-- Force index scan for testing only
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 99;
SET enable_seqscan = on;  -- always reset after testing

-- Update statistics
ANALYZE orders;
```
```sql
-- EXPLAIN: plan only, no execution, no actual timings
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- EXPLAIN ANALYZE: executes the query + actual timings
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- EXPLAIN (ANALYZE, BUFFERS): shows buffer hits/misses
-- Shared hit  = from shared_buffers (fast)
-- Shared read = from disk (slow)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42;

-- Full diagnostic output for pgBadger / explain.dalibo.com
EXPLAIN (ANALYZE, BUFFERS, SETTINGS, WAL, FORMAT JSON)
SELECT * FROM orders WHERE customer_id = 42;
```
- `Shared hit` — Pages served from shared_buffers (no I/O). Good.
- `Shared read` — Pages read from disk. A high value means cache misses — consider increasing `shared_buffers`.
- Use explain.dalibo.com to visualise the JSON output as a graphical plan.
Patroni is an open-source HA solution that uses a distributed configuration store (etcd, ZooKeeper, or Consul) to manage leader election and automatic failover for PostgreSQL clusters.
- Patroni runs as a daemon on every node and monitors the local PostgreSQL instance.
- The current primary holds a leader lock in etcd (with TTL). If the primary fails, the lock expires.
- Replicas race to acquire the leader lock. The winner promotes itself to primary.
- HAProxy or Keepalived (VIP) routes application traffic to the current primary automatically.
- Full cluster configuration is stored in etcd — making it consistent across all nodes.
```
# Typical Patroni HA stack
┌─────────────────────────────────────────────┐
│  App → HAProxy (VIP via Keepalived)         │
│        ↓                    ↓               │
│  PostgreSQL Primary   PostgreSQL Replica    │
│  + Patroni            + Patroni             │
│        ↓                    ↓               │
│        └── etcd cluster (3 nodes) ──┘       │
└─────────────────────────────────────────────┘

# Check cluster status
patronictl -c /etc/patroni/patroni.yml list

# Manual switchover
patronictl -c /etc/patroni/patroni.yml switchover
```
| Feature | Patroni | repmgr |
|---|---|---|
| Auto Failover | ✅ Native, consensus-based | ⚠ Requires repmgrd + extra setup |
| Split-brain Protection | ✅ etcd/ZK consensus prevents it | ⚠ Risk without fencing |
| Complexity | Higher (needs etcd cluster) | Lower (simpler setup) |
| DCS Dependency | etcd / ZooKeeper / Consul | PostgreSQL itself |
| Cluster Management | patronictl (full REST API) | repmgr CLI |
| Industry Adoption | Widely used in enterprise | Common for simpler setups |
For production enterprise environments, Patroni is the recommended choice due to its reliable consensus-based failover and split-brain prevention.
Split-brain occurs when two PostgreSQL nodes both believe they are the primary and accept writes simultaneously — leading to data divergence and corruption.
- Patroni prevents split-brain using a distributed lock in etcd. Only the node holding the leader key can act as primary.
- The leader key has a TTL (Time-To-Live). The primary must renew it continuously. If the primary cannot reach etcd, it loses the key and demotes itself.
- Replicas will only promote if they successfully acquire the leader key — guaranteed by etcd's atomic compare-and-swap.
- Network partitions where a primary cannot reach etcd will cause self-demotion, never split-brain.
```sql
-- On the PRIMARY: lag per standby (bytes and estimated time)
SELECT application_name, client_addr, state, sync_state,
       pg_size_pretty(sent_lsn - replay_lsn) AS replay_lag_bytes,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;

-- On the STANDBY: how far behind in seconds
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay,
       pg_is_in_recovery() AS is_standby,
       pg_last_wal_receive_lsn() AS received_lsn,
       pg_last_wal_replay_lsn() AS replayed_lsn;
```
- < 1 second — Excellent. Standard for healthy async replication on LAN.
- 1–30 seconds — Acceptable for most OLTP workloads.
- > 60 seconds — Alert: investigate I/O bottleneck on standby, network issues, or standby under load.
- High lag + network failure = data loss window. Size your recovery point objective (RPO) accordingly.
pg_upgrade upgrades a PostgreSQL cluster to a new major version in-place, by copying only the catalog files and hard-linking or copying the data files — much faster than a dump/restore for large databases.
```shell
# Step 1: Install new PostgreSQL 17 binaries (don't init cluster)
yum install postgresql17-server

# Step 2: Initialize new (empty) cluster
/usr/pgsql-17/bin/initdb -D /pgdata/pgsql17/data

# Step 3: Stop old cluster
systemctl stop postgresql-15

# Step 4: Run pg_upgrade check first (dry run)
/usr/pgsql-17/bin/pg_upgrade \
  -b /usr/pgsql-15/bin \
  -B /usr/pgsql-17/bin \
  -d /pgdata/pgsql15/data \
  -D /pgdata/pgsql17/data \
  --check

# Step 5: Run actual upgrade (--link = hard-link, no copy)
/usr/pgsql-17/bin/pg_upgrade \
  -b /usr/pgsql-15/bin \
  -B /usr/pgsql-17/bin \
  -d /pgdata/pgsql15/data \
  -D /pgdata/pgsql17/data \
  --link

# Step 6: Start new cluster and run post-upgrade tasks
systemctl start postgresql-17
./analyze_new_cluster.sh  # generated by pg_upgrade
```
pg_cron is a PostgreSQL extension that enables scheduling of SQL queries or stored procedures using cron syntax — directly inside the database, without needing external OS cron jobs.
```sql
-- Enable in postgresql.conf (requires restart)
-- shared_preload_libraries = 'pg_cron'
-- cron.database_name = 'mydb'

CREATE EXTENSION pg_cron;

-- Schedule a job: delete old logs every day at 3 AM
SELECT cron.schedule(
    'cleanup-old-logs',
    '0 3 * * *',
    $$DELETE FROM app_logs
      WHERE created_at < NOW() - INTERVAL '90 days'$$
);

-- Schedule a job every 5 minutes
SELECT cron.schedule('refresh-stats', '*/5 * * * *',
    $$CALL refresh_materialized_views()$$);

-- List all scheduled jobs
SELECT * FROM cron.job;

-- View execution history
SELECT * FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;
```
A `FATAL: remaining connection slots are reserved for non-replication superuser connections` error means `max_connections` is exhausted. Here is a systematic approach to diagnose and resolve it:
```sql
-- Step 1: Check current connection usage
SELECT max_conn, used_conn, superuser_reserved,
       max_conn - used_conn AS free_conn
FROM (
    SELECT current_setting('max_connections')::int AS max_conn,
           COUNT(*) AS used_conn,
           current_setting('superuser_reserved_connections')::int
               AS superuser_reserved
    FROM pg_stat_activity
) t;

-- Step 2: Find connection counts per user/database
SELECT usename, datname, state, COUNT(*)
FROM pg_stat_activity
GROUP BY usename, datname, state
ORDER BY COUNT(*) DESC;

-- Step 3: Find idle-in-transaction sessions to terminate
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '30 minutes';
```
- Immediate fix — Terminate idle/stale sessions as shown above.
- Short-term fix — Temporarily raise `max_connections` (requires restart). Note: each connection uses roughly 5–10 MB of RAM.
- Permanent fix — Deploy PgBouncer in transaction pooling mode to dramatically reduce the number of actual PostgreSQL connections needed.
- Set `idle_in_transaction_session_timeout` to auto-kill stuck transactions.
- Set a `CONNECTION LIMIT` per role to prevent any single user from exhausting connections.
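The last two guardrails can be sketched as follows (the role name is illustrative):

```sql
-- Cap connections for a single application role
ALTER ROLE app_user CONNECTION LIMIT 50;

-- Auto-terminate sessions that sit idle inside a transaction
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();
```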