PostgreSQL Interview Questions
for Beginner DBAs
25 essential questions every beginner PostgreSQL DBA must know — with clear, practical answers. Ideal for Oracle DBAs transitioning to PostgreSQL.
PostgreSQL is a free, open-source object-relational database management system (ORDBMS) that has been actively developed for over 35 years. It is known for its reliability, standards compliance (SQL:2016), and powerful extensibility.
- Open Source & Free — No licensing cost, unlike Oracle which requires expensive licenses.
- ACID Compliant — Full support for Atomicity, Consistency, Isolation, and Durability.
- Extensible — You can define custom data types, operators, functions, and index methods.
- JSON Support — Supports both relational (SQL) and non-relational (JSONB) querying.
- Cross-Platform — Runs on Linux, Windows, and macOS.
PostgreSQL follows a client-server, multi-process architecture. The main components are:
- Postmaster Process — The master daemon that listens on port 5432, accepts connections, and spawns backend processes.
- Backend Process — A dedicated server process is spawned for each client connection to handle queries.
- Shared Buffers — Shared memory pool for caching data pages (like Oracle's SGA Buffer Cache).
- WAL Writer — Writes transaction logs (WAL) to disk for durability.
- Checkpointer — Flushes dirty pages from shared buffers to data files at regular intervals.
- Autovacuum Daemon — Cleans dead tuples and updates statistics automatically.
- Background Writer — Proactively writes dirty buffers to disk to reduce I/O spikes at checkpoints.
- WAL Archiver — Archives WAL files when continuous archiving is enabled.
- Stats Collector — Gathers activity statistics stored in
pg_stat_*views.
MVCC is the mechanism PostgreSQL uses to allow multiple transactions to read and write data concurrently without blocking each other. Instead of locking rows, PostgreSQL creates new versions of rows for each update or delete.
- When a row is updated, the old version is marked as dead and a new version is created.
- When a row is deleted, it is just marked as dead — not physically removed immediately.
- Each transaction sees a consistent snapshot of the data as of the transaction start time.
- Dead row versions are later cleaned up by the VACUUM process.
- This eliminates read-write conflicts — readers never block writers and writers never block readers.
WAL (Write-Ahead Logging) is PostgreSQL's mechanism for ensuring data durability and crash recovery. The core principle is: changes are written to the WAL log file first, before being applied to actual data files.
- WAL files are stored in the
$PGDATA/pg_waldirectory. - Each WAL file is 16 MB by default and has a name like
000000010000000000000001. - WAL is used for crash recovery, point-in-time recovery (PITR), and streaming replication.
- WAL level is configured via
wal_levelparameter:minimal,replica, orlogical.
In PostgreSQL, there is no difference between a role and a user at the database level. A USER is simply a ROLE with the LOGIN privilege.
- Roles can be granted to other roles, enabling role inheritance.
SUPERUSERroles have unrestricted access — equivalent to Oracle's SYSDBA.- Use
NOLOGINfor group roles that are never used to connect directly.
pg_hba.conf is the Host-Based Authentication configuration file in PostgreSQL. It controls which hosts are allowed to connect, which databases they can access, which users can connect, and what authentication method is used.
- local — Unix socket connections.
- host — TCP/IP connections (both SSL and non-SSL).
- hostssl — SSL-only connections.
- Authentication methods:
trust,md5,scram-sha-256,peer,ldap. - After editing, reload with:
SELECT pg_reload_conf();
A tablespace in PostgreSQL is a named location on disk where database objects (tables, indexes) are stored. It allows DBAs to control where data is physically stored — for example, placing large tables on a faster disk.
PostgreSQL supports two main types of backups:
- Logical Backup — Uses
pg_dumporpg_dumpall. Exports SQL statements. Human-readable, portable across versions. Cannot be used for PITR. Best for schema/table level recovery. - Physical Backup — Uses
pg_basebackup. Copies the actual data files. Can be combined with WAL archives for Point-in-Time Recovery (PITR). Used for full cluster recovery and standby server setup.
pg_dumpbacks up one database at a time. Supports custom (-Fc), plain (-Fp), directory (-Fd) formats.pg_dumpallbacks up all databases AND global objects (roles, tablespaces). Output is always plain SQL.- Neither supports PITR — for that, use
pg_basebackup+ WAL archiving.
pg_basebackup takes a physical binary backup of the entire PostgreSQL cluster while the database is running (online backup). It copies all data files, WAL files, and configuration files.
- Used for setting up standby (replica) servers for streaming replication.
- Used as the base for Point-in-Time Recovery (PITR) when combined with WAL archives.
- Requires
wal_level = replicaand replication privilege on the user.
PITR allows you to restore a PostgreSQL database to any specific point in time — for example, just before an accidental DROP TABLE. It requires a base backup + continuous WAL archives.
- Enable archiving: set
archive_mode = onandarchive_commandin postgresql.conf. - Take a base backup using
pg_basebackup. - During recovery, set
restore_commandandrecovery_target_timein postgresql.conf. - Create an empty
recovery.signalfile in PGDATA to trigger recovery mode. - After recovery, run
SELECT pg_wal_replay_resume();to promote the DB.
%pin archive_command = full path of the WAL file to be archived.%f= filename only.- The archive directory must be owned by the
postgresOS user. - Check
pg_stat_archiver.last_failed_walto diagnose archiving failures.
- Use
-Fc(custom format) for table-level restores — plain SQL format does not support selective restore. - If the table already exists, add
--cleanflag to drop it before restoring. - Use
-n schema_nameto specify schema if needed.
Due to MVCC, PostgreSQL does not immediately delete updated or deleted rows — it marks them as "dead tuples". Over time, these accumulate and cause table bloat, slowing down queries. VACUUM reclaims this dead space.
- Autovacuum runs VACUUM automatically in the background — it is enabled by default.
VACUUMdoes NOT lock the table.VACUUM FULLholds an exclusive lock.- VACUUM also prevents Transaction ID (XID) wraparound — a critical PostgreSQL risk.
Autovacuum is a background PostgreSQL daemon that automatically runs VACUUM and ANALYZE on tables when they accumulate enough dead tuples or changed rows, without DBA intervention.
- It is triggered when: dead tuples >
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × total_rows - Key parameters:
autovacuum_max_workers(default: 3),autovacuum_naptime(default: 1 min). - Never disable autovacuum on production — it prevents bloat and XID wraparound.
- B-Tree (default) — Best for equality and range queries. Used for most cases.
- Hash — Only for equality comparisons (
=). Faster than B-Tree for pure equality. - GIN (Generalized Inverted Index) — Best for full-text search, arrays, and JSONB data.
- GiST (Generalized Search Tree) — Used for geometric data, full-text search, and range types.
- BRIN (Block Range Index) — Very small index for large tables with naturally ordered data (e.g., timestamps).
- SP-GiST — Space-partitioned trees for non-balanced data structures.
- Seq Scan — Full table scan (no index used).
- Index Scan — Index used to fetch specific rows.
- cost= — Estimated startup cost .. total cost.
- actual time= — Actual execution time in milliseconds.
- rows= — Estimated vs actual row count. Large difference = stale statistics → run ANALYZE.
shared_buffers— Most impactful single parameter. Set to 25% of RAM.work_mem— Be careful: each sort/hash in each query can use this amount.effective_cache_size— Hint to the planner only; does not allocate memory.
Streaming replication allows a standby (replica) server to stay in sync with the primary by continuously receiving and applying WAL records over a TCP connection — in near real time.
- The standby connects to the primary using the replication protocol.
- By default it is asynchronous — primary does not wait for standby confirmation before committing.
- Synchronous replication is possible via
synchronous_standby_names— zero data loss but adds latency. - The standby can serve read-only queries (Hot Standby).
- Monitor replication lag via
pg_stat_replicationon primary.
- Asynchronous (default) — Primary commits immediately without waiting for standby confirmation. Better performance, but risk of small data loss if primary crashes before WAL is sent to standby.
- Synchronous — Primary waits for at least one standby to confirm WAL receipt before committing. Zero data loss, but adds latency to every write transaction.
- After promotion, the standby becomes a new primary on a new timeline.
- Old primary should NOT be brought back up without reinitializing as a new standby.
- Tools like Patroni or repmgr automate this process.
pg_stat_activity— Active sessions, queries, wait events. (≈ Oracle v$session)pg_stat_user_tables— Table-level stats: dead tuples, seq scans, last vacuum. (≈ Oracle dba_tab_modifications)pg_stat_user_indexes— Index usage statistics. Identify unused indexes.pg_stat_replication— Standby connection status and replication lag. (≈ Oracle v$managed_standby)pg_stat_archiver— WAL archiving status and failures.pg_stat_statements— Top SQL by execution count, total time. (≈ Oracle v$sql — requires extension)pg_locks— Active lock information. (≈ Oracle v$lock)pg_database— Database list with sizes. (≈ Oracle dba_data_files)