PostgreSQL Tablespace Internals

PostgreSQL Tablespace Internals
RAJASEKHAR AMUDALA
Storage Engine

PostgreSQL Tablespace
Internals

How tables become physical files — from logical abstraction to OS-level storage blocks.

PostgreSQL 14 – 17
Physical Storage Model
vs Oracle Comparison
01

What is a PostgreSQL Tablespace?

A tablespace is a named mapping from a logical label to a directory path on the operating system filesystem. It tells PostgreSQL where to physically store the files that back a database object — tables, indexes, sequences, or entire databases.

Unlike Oracle, where a tablespace is a container for datafiles which are themselves managed by the RDBMS, PostgreSQL tablespaces are essentially symbolic links to directories. The OS filesystem is the real container; PostgreSQL simply points at it.

Logical Layer (SQL / Catalog)
DATABASE
mydb
TABLE
orders
INDEX
orders_pkey
↓ OID resolution via pg_class / pg_tablespace
Tablespace (symlink)
pg_default
$PGDATA/base/
fast_ssd
/mnt/nvme/pg/
↓ Actual filesystem directory
Physical Layer (OS Files)
16384
heap file
16384_fsm
FSM fork
16384_vm
VM fork
16384.1
segment 2
Key Insight
PostgreSQL does NOT have its own block-device layer. There are no "datafiles" to manage. Each relation (table, index, etc.) maps directly to one or more ordinary OS files. This makes PostgreSQL storage maximally transparent and debuggable — you can inspect it with ls, du, or hexdump.
02

Physical Layout on Disk

The root of everything is $PGDATA — the data directory specified when initdb is run. Inside that directory the following structure exists:

$PGDATA/ ├── base/ ← pg_default tablespace root (all databases land here by default) │ ├── 1/ ← OID of template1 database │ ├── 4/ ← OID of template0 database │ └── 16384/ ← OID of your database "mydb" │ ├── PG_VERSION ← major version string, e.g. "17" │ ├── pg_filenode.map ← maps fixed-OID catalogs to relfilenodes │ ├── pg_internal.init ← cached relation descriptor (relcache init file) │ │ │ ├── ── system catalog tables (per-database) ────────────────── │ ├── 1259 ← pg_class heap (relation metadata) │ ├── 1259_fsm ← pg_class FSM fork │ ├── 1259_vm ← pg_class VM fork │ ├── 1249 ← pg_attribute heap (column metadata) │ ├── 1255 ← pg_proc heap (functions) │ ├── 2619 ← pg_statistic heap (planner statistics) │ │ │ ├── ── user tables ──────────────────────────────────────── │ ├── 16385 ← table "orders" main heap (segment 0, up to 1 GB) │ ├── 16385_fsm ← orders Free Space Map fork │ ├── 16385_vm ← orders Visibility Map fork │ ├── 16385.1 ← orders segment 1 (1 GB – 2 GB, auto-created) │ ├── 16385.2 ← orders segment 2 (2 GB – 3 GB, auto-created) │ │ │ ├── ── indexes ──────────────────────────────────────────── │ ├── 16390 ← index "orders_pkey" (segment 0, up to 1 GB) │ ├── 16390.1 ← orders_pkey segment 1 (if index grows past 1 GB) │ │ │ └── ── TOAST table (for large column values) ────────────── │ ├── 16392 ← pg_toast_16385 heap (out-of-line values) │ └── 16393 ← pg_toast_16385_index │ ├── global/ ← pg_global tablespace (shared cluster-wide catalogs) │ ├── pg_filenode.map ← OID → relfilenode map for global catalogs │ ├── 1260 ← pg_authid heap (roles / users) │ ├── 1260_fsm ← pg_authid FSM fork │ ├── 1261 ← pg_auth_members heap │ ├── 1262 ← pg_database heap (list of databases) │ ├── 2396 ← pg_shdepend heap (cross-DB dependencies) │ └── 2397 ← pg_shdescription heap (shared obj comments) │ ├── pg_tblspc/ ← symlinks to custom tablespace directories │ └── 24601 -> /mnt/nvme/pg/ ← OID 24601 = tablespace "fast_ssd" │ ├── pg_wal/ ← Write-Ahead Log segments (16 MB each by default) │ ├── 000000010000000000000001 ← WAL segment file (timeline 1, segment 1) │ └── archive_status/ ← tracks which WAL files have been archived │ ├── pg_xact/ ← transaction commit status bitmaps (CLOG) ├── pg_multixact/ ← multi-transaction lock state ├── pg_subtrans/ ← subtransaction parent XID log ├── pg_stat_tmp/ ← temporary stats collector files ├── pg_logical/ ← logical replication state & snapshots ├── pg_replslot/ ← replication slot state directories ├── pg_notify/ ← LISTEN/NOTIFY queue ├── PG_VERSION ← cluster major version ├── pg_hba.conf ← host-based authentication rules ├── pg_ident.conf ← OS username → PG role mapping └── postgresql.conf ← all GUC configuration parameters

Default Tablespace (pg_default) — base/ in Detail

The base/ directory is the physical backing of the pg_default tablespace. Every database gets its own numbered subdirectory named after its OID. Inside that directory live both system catalog files and all user-created tables and indexes that were not assigned to a custom tablespace.

How to find any database's OID
SELECT oid, datname FROM pg_database; — the OID is the directory name under $PGDATA/base/.

Custom Tablespace Directory Structure

When you create a custom tablespace at /mnt/nvme/pg/, PostgreSQL creates a versioned subdirectory inside it, then a database-OID subdirectory inside that:

/mnt/nvme/pg/ ← OS directory you provided in CREATE TABLESPACE └── PG_17_202307071/ ← PG_<major>_<catalog_version> — version isolation dir └── 16384/ ← OID of the database whose objects use this tablespace ├── 16400 ← table "orders" segment 0 (up to 1 GB) ├── 16400_fsm ← orders Free Space Map fork ├── 16400_vm ← orders Visibility Map fork ├── 16400.1 ← orders segment 1 (1 GB – 2 GB) ├── 16400.2 ← orders segment 2 (2 GB – 3 GB) ├── 16401 ← index "orders_pkey" segment 0 └── 16401.1 ← orders_pkey segment 1 (if index > 1 GB)
Version Isolation
The PG_17_202307071 subdirectory prevents tablespaces from being accidentally shared across incompatible PostgreSQL major versions. If you upgrade, the new version creates its own subdirectory inside the same OS path.

What Happens When a Table or Index Exceeds 1 GB?

PostgreSQL never makes a single OS file larger than 1 GB (= 8 192 bytes × 131 072 pages). When a relation file reaches that limit, a new segment file is automatically created with a numeric suffix appended to the relfilenode. All segments are the same relfilenode — only the suffix changes.

── Example: table "orders" growing from 0 → 3.5 GB ─────────────────────STEP 1 — table created, first rows inserted (size 0 → 1 GB): 16400 ← segment 0 │ pages 0 – 131071 │ size: 0 → 1 073 741 824 B 16400_fsm ← FSM fork │ created on 1st VACUUM 16400_vm ← VM fork │ created on 1st VACUUMSTEP 2 — segment 0 hits 1 GB; PG auto-creates segment 1 (size 1 GB → 2 GB): 16400 ← segment 0 │ pages 0 – 131071 │ exactly 1 073 741 824 B (full) 16400.1 ← segment 1 │ pages 131072 – 262143 │ size: 0 → 1 073 741 824 BSTEP 3 — segment 1 hits 1 GB; PG auto-creates segment 2 (size 2 GB → 3.5 GB): 16400 ← segment 0 │ pages 0 – 131071 │ 1 073 741 824 B (full) 16400.1 ← segment 1 │ pages 131072 – 262143 │ 1 073 741 824 B (full) 16400.2 ← segment 2 │ pages 262144 – 393215 │ 0 → 536 870 912 B (growing)
── Same rule applies to INDEXES ─────────────────────────────────────────Example: "orders_pkey" B-tree index growing past 1 GB:16401 ← index segment 0 │ B-tree root + internal nodes + leaf pages │ 0 → 1 GB 16401.1 ← index segment 1 │ continuation of B-tree leaf pages │ 1 GB → 2 GB 16401.2 ← index segment 2 │ continuation │ 2 GB → NNote: indexes do NOT have _fsm or _vm forks. Only the main fork segments.
── Full picture: table + index both over 1 GB, inside pg_default ────────$PGDATA/base/16384/ │ ├── ── table: orders (relfilenode 16400) — 2.5 GB total ── ├── 16400 ← segment 0 │ 0 – 1 GB │ 1 073 741 824 B ├── 16400.1 ← segment 1 │ 1 – 2 GB │ 1 073 741 824 B ├── 16400.2 ← segment 2 │ 2 – 2.5 GB │ 536 870 912 B (partial) ├── 16400_fsm ← FSM fork │ free-space map (always one file, no segments) ├── 16400_vm ← VM fork │ visibility map (always one file, no segments) │ ├── ── index: orders_pkey (relfilenode 16401) — 1.3 GB total ─ ├── 16401 ← segment 0 │ 0 – 1 GB │ 1 073 741 824 B ├── 16401.1 ← segment 1 │ 1 – 1.3 GB │ 322 122 547 B (partial) │ └── ── TOAST (out-of-line storage for large column values) ─── ├── 16402 ← TOAST heap │ stores values > ~2 KB (chunked & optionally compressed) └── 16403 ← TOAST index │ B-tree index on (chunk_id, chunk_seq)
Key Rules to Remember
1. Every segment is exactly 1 GB (1 073 741 824 bytes = 8192 B × 131 072 pages) except the last one which is partial.
2. Segment numbering starts at 0 (no suffix), then .1, .2, .3 … — there is no upper limit.
3. FSM and VM forks are never segmented — they always stay as a single file regardless of table size.
4. Indexes only have a main fork (segmented if > 1 GB) — no FSM or VM forks.
5. Check all files for a relation: ls -lh $PGDATA/$(psql -Atc "SELECT pg_relation_filepath('orders')")*
03

How a Table Gets Stored as a Physical File

Every relation has an OID (Object Identifier) in the system catalog. The actual filename on disk is the relation's relfilenode — initially identical to its OID, but may diverge after TRUNCATE, CLUSTER, or VACUUM FULL (all of which rewrite the file).

Step-by-Step: CREATE TABLE → physical file

-- 1. Create a table in a specific tablespace
CREATE TABLE orders (
    id     BIGSERIAL PRIMARY KEY,
    amount NUMERIC(12,2),
    status TEXT
) TABLESPACE fast_ssd;

-- 2. Find its OID and relfilenode
SELECT oid, relname, relfilenode, reltablespace
FROM   pg_class
WHERE  relname = 'orders';

--  oid  | relname | relfilenode | reltablespace
-- ------+---------+-------------+---------------
-- 16400 | orders  |       16400 |         24601

-- 3. Compute the actual file path
SELECT pg_relation_filepath('orders');
-- pg_tblspc/24601/PG_17_202307071/16384/16400

PostgreSQL resolves the path as:

Tablespace path
pg_tblspc/<spc_oid> → symlink → /mnt/nvme/pg/
Version dir
PG_17_202307071/
Database dir
<db_oid>/ — one per database using this tablespace
Relation file
<relfilenode> — the actual heap or index file
Useful Function
pg_relation_filepath(relname) gives you the complete relative path from $PGDATA. For the absolute path, prepend current_setting('data_directory').

What Happens on TRUNCATE or VACUUM FULL?

-- Before TRUNCATE: relfilenode = 16400
TRUNCATE orders;

-- After TRUNCATE: relfilenode = 16450 (new file, old one deleted)
SELECT relfilenode FROM pg_class WHERE relname = 'orders';
-- 16450

The old file 16400 is unlinked from the OS. A brand-new file 16450 is created. The OID stays 16400 but the physical filename changes. This is why relfilenode and oid can diverge.

04

Page / Block Structure Inside the File

Each relation file is divided into fixed-size pages (default 8 KB, set at compile time). A page is the unit of I/O — the smallest amount PostgreSQL reads from or writes to disk.

ONE PAGE = 8192 bytes
PageHeaderData (24 bytes) — LSN, flags, free space pointers
ItemId array (4 bytes × n) — offsets + lengths of tuples
· · · FREE SPACE · · ·
Tuple N (HeapTupleHeader + data)
Tuple 2 (HeapTupleHeader + data)
Tuple 1 (HeapTupleHeader + data) ← grows from bottom up
Special area (indexes only — e.g. B-tree sibling pointers)

Tuples grow upward from the bottom of the page; the ItemId array grows downward from just after the header. The free space in the middle is what INSERT consumes. The Free Space Map (FSM) fork tracks which pages have enough free space to avoid scanning the heap during inserts.

-- Inspect a page directly (requires pageinspect extension)
CREATE EXTENSION pageinspect;

SELECT lower, upper, special, pagesize
FROM   page_header(get_raw_page('orders', 0));

-- lower = end of ItemId array (bytes from start)
-- upper = start of tuple data (bytes from start)
-- free  = upper - lower

SELECT t_ctid, t_xmin, t_xmax, t_infomask, t_data
FROM   heap_page_items(get_raw_page('orders', 0))
LIMIT  5;
05

System Catalog Wiring

The catalog tables that track tablespace storage are:

CatalogKey Column(s)Role
pg_tablespaceoid, spcname, spcowner, spcaclLists all tablespaces; OID is used in filenames
pg_classoid, relfilenode, reltablespace, relkindMaps each relation to its file and tablespace
pg_databaseoid, dattablespaceDefault tablespace for a database
pg_namespaceoid, nspnameSchema — purely logical, no storage impact
-- All tablespaces and their OS paths
SELECT spcname, pg_tablespace_location(oid) AS location
FROM   pg_tablespace;

-- spcname    | location
-- -----------+---------------
-- pg_default |
-- pg_global  |
-- fast_ssd   | /mnt/nvme/pg/

-- All relations in a given tablespace
SELECT n.nspname, c.relname, c.relkind, c.relfilenode
FROM   pg_class c
JOIN   pg_namespace n ON n.oid = c.relnamespace
JOIN   pg_tablespace t ON t.oid = c.reltablespace
WHERE  t.spcname = 'fast_ssd'
ORDER BY c.relname;
Schemas vs Tablespaces
PostgreSQL schemas (namespaces) are purely logical — they are naming containers with no physical meaning. Two tables in different schemas in the same tablespace land in the same OS directory, distinguished only by their relfilenode numbers. This is the opposite of how Oracle users often think of schemas.
06

Segment Files & Fork Files

Every relation is stored across potentially multiple files due to two orthogonal splitting mechanisms: segments and forks.

Segments (Size-Based Splitting)

A single relation file grows until it hits the segment size limit (default 1 GB, defined at compile time as BLCKSZ × RELSEG_SIZE). At that point, PostgreSQL creates a new segment file:

16385 ← segment 0 (0 GB – 1 GB) 16385.1 ← segment 1 (1 GB – 2 GB) 16385.2 ← segment 2 (2 GB – 3 GB)

This segmentation ensures compatibility with filesystems that have file size limits, and also makes backup tools simpler.

Forks (Logical Data Splitting)

Each relation actually has up to four forks — separate files for different types of storage data:

16385 ← Main fork : actual tuple / index data 16385_fsm ← FSM fork : tracks available space per page 16385_vm ← VM fork : tracks all-visible and all-frozen pages 16385_init ← Init fork : unlogged tables only — recovery baseline
ForkSuffixPurposeCreated By
Main(none)Heap tuples or index entriesCREATE TABLE / INDEX
FSM_fsmFree space per page for INSERT routingFirst VACUUM
VM_vmAll-visible bits (skip dead tuple checks); freeze statusFirst VACUUM
Init_initEmpty baseline for crash recovery of UNLOGGED tablesUNLOGGED tables only
-- See all physical files for a relation
SELECT pg_relation_filepath('orders');             -- main
SELECT pg_relation_filepath('orders') || '_fsm';  -- fsm
SELECT pg_relation_filepath('orders') || '_vm';   -- vm

-- Or check from the shell
-- $ ls -lh $PGDATA/base/16384/16385*
-- -rw------- 1 postgres postgres  8.0M  orders main segment 0
-- -rw------- 1 postgres postgres  24K   orders FSM
-- -rw------- 1 postgres postgres   8K   orders VM
07

pg_global and pg_default

PostgreSQL ships with two built-in tablespaces that cannot be dropped:

pg_global
Maps to $PGDATA/global/. Stores cluster-wide system catalogs: pg_database, pg_authid, pg_tablespace, pg_shdepend, etc. Shared across all databases. You cannot place user objects here.
pg_default
Maps to $PGDATA/base/. The default tablespace for all objects unless overridden. Each database has its own subdirectory here named after its OID. Per-database system catalogs live here.
-- These have no filesystem path (returned as empty string)
SELECT spcname, pg_tablespace_location(oid)
FROM   pg_tablespace
WHERE  spcname IN ('pg_default', 'pg_global');

-- Change the default tablespace for a DATABASE
ALTER DATABASE mydb SET TABLESPACE fast_ssd;

-- Change the default for new objects in the SESSION
SET default_tablespace = fast_ssd;
08

PostgreSQL vs Oracle: Storage Model Differences

This is the conceptual crux that trips up DBAs moving between the two systems. They use the same word — "tablespace" — but mean fundamentally different things.

DimensionPostgreSQLOracle
Physical backingSymlink to an OS directory. Files are plain OS files.One or more datafiles (.dbf) — Oracle-managed block devices.
Table → File mapping1 table = 1 OS file (its relfilenode). Segments are separate files too.Many tables share one datafile. Oracle maps extents within that file.
Extent managementNone — PostgreSQL does not manage extents. The OS handles block allocation.Yes — LMT (Locally Managed Tablespaces) or DMT with bitmaps/data dictionary.
Multiple pathsOne directory per tablespace. Cannot span multiple paths.Multiple datafiles in different paths for one tablespace (striping).
Schema = Storage?No. Schemas are purely logical namespaces. No physical meaning.Schema ≈ user ≈ set of segments. Strongly coupled in older designs.
Resize / extendThe OS directory grows automatically. No DBA action needed.ALTER TABLESPACE ADD DATAFILE or AUTOEXTEND ON.
UNDO / TEMP tablespacePostgreSQL uses pg_wal for WAL and temp_tablespace for temp files. No UNDO tablespace concept — MVCC stores old versions in the heap (dead tuples).Dedicated UNDO tablespace for rollback segments. Separate TEMP tablespace.
Inspect on diskls, hexdump, pg_filedump. Fully transparent.Proprietary format. Must use RMAN, dbv, or Oracle tools.
BIGFILE tablespaceN/ASingle very large datafile (up to 128 TB) to reduce file count.
Block sizeDefault 8 KB, set at compile time, uniform cluster-wide.2–32 KB, configurable per tablespace.
Online tablespace opsALTER TABLE ... SET TABLESPACE rewrites the file. Requires ACCESS EXCLUSIVE lock.Can take tablespaces offline, rename datafiles online. More flexible.
Critical Conceptual Shift
In Oracle, a tablespace is a logical volume that aggregates one or more datafiles, and objects are allocated as extents within those files. In PostgreSQL, a tablespace is just a pointer to a directory. Each table IS a file. There is no intermediate "datafile" layer and no extent management — the OS filesystem is the only storage manager below PostgreSQL.
09

Managing Tablespaces

Create a Tablespace

-- Directory must exist, be owned by the OS postgres user, and be empty
-- $ mkdir -p /mnt/nvme/pg && chown postgres:postgres /mnt/nvme/pg

CREATE TABLESPACE fast_ssd
    OWNER postgres
    LOCATION '/mnt/nvme/pg';

Move Objects to a Different Tablespace

-- Move a single table (rewrites file; ACCESS EXCLUSIVE lock required)
ALTER TABLE orders SET TABLESPACE fast_ssd;

-- Move all tables in a database (run while connected to target db)
ALTER DATABASE mydb SET TABLESPACE fast_ssd;

-- Move all objects from one tablespace to another (superuser only)
ALTER TABLESPACE old_ssd RENAME TO archive_hdd;

Drop a Tablespace

-- Tablespace must be empty before dropping
DROP TABLESPACE old_ssd;

-- Check what's still in it
SELECT c.relname, n.nspname
FROM   pg_class c
JOIN   pg_namespace n ON n.oid = c.relnamespace
WHERE  c.reltablespace = (SELECT oid FROM pg_tablespace WHERE spcname = 'old_ssd');

Tablespace-Level Storage Parameters

-- Set random_page_cost hint per tablespace (optimizer advice)
ALTER TABLESPACE fast_ssd
    SET (random_page_cost = 1.1);   -- NVMe: nearly sequential

ALTER TABLESPACE archive_hdd
    SET (random_page_cost = 4.0);   -- spinning disk default

-- seq_page_cost can also be set per tablespace (PG 16+)
ALTER TABLESPACE fast_ssd
    SET (seq_page_cost = 0.1);
10

Performance Patterns & Best Practices

Separate Indexes from Tables

Place indexes on an NVMe tablespace and tables on a slower drive to maximize index scan bandwidth independently of sequential heap scans:

CREATE TABLE events (...) TABLESPACE hdd_bulk;
CREATE INDEX events_ts_idx ON events (created_at) TABLESPACE nvme_fast;

Partition Tables Across Tablespaces

CREATE TABLE logs_2024 PARTITION OF logs
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
    TABLESPACE nvme_hot;

CREATE TABLE logs_2022 PARTITION OF logs
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01')
    TABLESPACE hdd_archive;

TEMP Tablespace

-- Point sort/hash spill temp files to a fast local SSD
SET temp_tablespaces = 'local_nvme';

-- Or set cluster-wide in postgresql.conf
-- temp_tablespaces = 'local_nvme'
Watch Out
If the directory backing a tablespace is deleted, lost, or unmounted, PostgreSQL will crash or error on any query touching objects in that tablespace. There is no automatic failover. Always ensure tablespace directories are on reliable, monitored mounts, and include them in your backup strategy.

Monitoring Tablespace Size

-- Size of each tablespace
SELECT
    spcname,
    pg_size_pretty(pg_tablespace_size(oid)) AS size,
    pg_tablespace_location(oid)             AS location
FROM pg_tablespace
ORDER BY pg_tablespace_size(oid) DESC;

-- Top 10 largest relations in a tablespace
SELECT
    n.nspname || '.' || c.relname AS relation,
    c.relkind,
    pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM   pg_class c
JOIN   pg_namespace n   ON n.oid = c.relnamespace
JOIN   pg_tablespace t  ON t.oid = c.reltablespace
WHERE  t.spcname = 'fast_ssd'
ORDER BY pg_relation_size(c.oid) DESC
LIMIT  10;
Summary Cheatsheet
Tablespace = symlink to OS directory  ·  Table = relfilenode file  ·  Page = 8 KB block  ·  Segment = file per 1 GB  ·  Fork = _fsm / _vm / _init companions  ·  OID → path via pg_relation_filepath()