PostgreSQL Tablespace
Internals
How tables become physical files — from logical abstraction to OS-level storage blocks.
What is a PostgreSQL Tablespace?
A tablespace is a named mapping from a logical label to a directory path on the operating system filesystem. It tells PostgreSQL where to physically store the files that back a database object — tables, indexes, sequences, or entire databases.
Unlike Oracle, where a tablespace is a container for datafiles which are themselves managed by the RDBMS, PostgreSQL tablespaces are essentially symbolic links to directories. The OS filesystem is the real container; PostgreSQL simply points at it.
mydb
orders
orders_pkey
$PGDATA/base/
/mnt/nvme/pg/
heap file
FSM fork
VM fork
segment 2
ls, du, or hexdump.Physical Layout on Disk
The root of everything is $PGDATA — the data directory specified when initdb is run. Inside that directory the following structure exists:
Default Tablespace (pg_default) — base/ in Detail
The base/ directory is the physical backing of the pg_default tablespace. Every database gets its own numbered subdirectory named after its OID. Inside that directory live both system catalog files and all user-created tables and indexes that were not assigned to a custom tablespace.
SELECT oid, datname FROM pg_database; — the OID is the directory name under $PGDATA/base/.Custom Tablespace Directory Structure
When you create a custom tablespace at /mnt/nvme/pg/, PostgreSQL creates a versioned subdirectory inside it, then a database-OID subdirectory inside that:
PG_17_202307071 subdirectory prevents tablespaces from being accidentally shared across incompatible PostgreSQL major versions. If you upgrade, the new version creates its own subdirectory inside the same OS path.What Happens When a Table or Index Exceeds 1 GB?
PostgreSQL never makes a single OS file larger than 1 GB (= 8 192 bytes × 131 072 pages). When a relation file reaches that limit, a new segment file is automatically created with a numeric suffix appended to the relfilenode. All segments are the same relfilenode — only the suffix changes.
2. Segment numbering starts at 0 (no suffix), then .1, .2, .3 … — there is no upper limit.
3. FSM and VM forks are never segmented — they always stay as a single file regardless of table size.
4. Indexes only have a main fork (segmented if > 1 GB) — no FSM or VM forks.
5. Check all files for a relation:
ls -lh $PGDATA/$(psql -Atc "SELECT pg_relation_filepath('orders')")*How a Table Gets Stored as a Physical File
Every relation has an OID (Object Identifier) in the system catalog. The actual filename on disk is the relation's relfilenode — initially identical to its OID, but may diverge after TRUNCATE, CLUSTER, or VACUUM FULL (all of which rewrite the file).
Step-by-Step: CREATE TABLE → physical file
-- 1. Create a table in a specific tablespace
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
amount NUMERIC(12,2),
status TEXT
) TABLESPACE fast_ssd;
-- 2. Find its OID and relfilenode
SELECT oid, relname, relfilenode, reltablespace
FROM pg_class
WHERE relname = 'orders';
-- oid | relname | relfilenode | reltablespace
-- ------+---------+-------------+---------------
-- 16400 | orders | 16400 | 24601
-- 3. Compute the actual file path
SELECT pg_relation_filepath('orders');
-- pg_tblspc/24601/PG_17_202307071/16384/16400PostgreSQL resolves the path as:
pg_tblspc/<spc_oid> → symlink → /mnt/nvme/pg/PG_17_202307071/<db_oid>/ — one per database using this tablespace<relfilenode> — the actual heap or index filepg_relation_filepath(relname) gives you the complete relative path from $PGDATA. For the absolute path, prepend current_setting('data_directory').What Happens on TRUNCATE or VACUUM FULL?
-- Before TRUNCATE: relfilenode = 16400
TRUNCATE orders;
-- After TRUNCATE: relfilenode = 16450 (new file, old one deleted)
SELECT relfilenode FROM pg_class WHERE relname = 'orders';
-- 16450The old file 16400 is unlinked from the OS. A brand-new file 16450 is created. The OID stays 16400 but the physical filename changes. This is why relfilenode and oid can diverge.
Page / Block Structure Inside the File
Each relation file is divided into fixed-size pages (default 8 KB, set at compile time). A page is the unit of I/O — the smallest amount PostgreSQL reads from or writes to disk.
Tuples grow upward from the bottom of the page; the ItemId array grows downward from just after the header. The free space in the middle is what INSERT consumes. The Free Space Map (FSM) fork tracks which pages have enough free space to avoid scanning the heap during inserts.
-- Inspect a page directly (requires pageinspect extension)
CREATE EXTENSION pageinspect;
SELECT lower, upper, special, pagesize
FROM page_header(get_raw_page('orders', 0));
-- lower = end of ItemId array (bytes from start)
-- upper = start of tuple data (bytes from start)
-- free = upper - lower
SELECT t_ctid, t_xmin, t_xmax, t_infomask, t_data
FROM heap_page_items(get_raw_page('orders', 0))
LIMIT 5;System Catalog Wiring
The catalog tables that track tablespace storage are:
| Catalog | Key Column(s) | Role |
|---|---|---|
| pg_tablespace | oid, spcname, spcowner, spcacl | Lists all tablespaces; OID is used in filenames |
| pg_class | oid, relfilenode, reltablespace, relkind | Maps each relation to its file and tablespace |
| pg_database | oid, dattablespace | Default tablespace for a database |
| pg_namespace | oid, nspname | Schema — purely logical, no storage impact |
-- All tablespaces and their OS paths
SELECT spcname, pg_tablespace_location(oid) AS location
FROM pg_tablespace;
-- spcname | location
-- -----------+---------------
-- pg_default |
-- pg_global |
-- fast_ssd | /mnt/nvme/pg/
-- All relations in a given tablespace
SELECT n.nspname, c.relname, c.relkind, c.relfilenode
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_tablespace t ON t.oid = c.reltablespace
WHERE t.spcname = 'fast_ssd'
ORDER BY c.relname;Segment Files & Fork Files
Every relation is stored across potentially multiple files due to two orthogonal splitting mechanisms: segments and forks.
Segments (Size-Based Splitting)
A single relation file grows until it hits the segment size limit (default 1 GB, defined at compile time as BLCKSZ × RELSEG_SIZE). At that point, PostgreSQL creates a new segment file:
This segmentation ensures compatibility with filesystems that have file size limits, and also makes backup tools simpler.
Forks (Logical Data Splitting)
Each relation actually has up to four forks — separate files for different types of storage data:
| Fork | Suffix | Purpose | Created By |
|---|---|---|---|
| Main | (none) | Heap tuples or index entries | CREATE TABLE / INDEX |
| FSM | _fsm | Free space per page for INSERT routing | First VACUUM |
| VM | _vm | All-visible bits (skip dead tuple checks); freeze status | First VACUUM |
| Init | _init | Empty baseline for crash recovery of UNLOGGED tables | UNLOGGED tables only |
-- See all physical files for a relation
SELECT pg_relation_filepath('orders'); -- main
SELECT pg_relation_filepath('orders') || '_fsm'; -- fsm
SELECT pg_relation_filepath('orders') || '_vm'; -- vm
-- Or check from the shell
-- $ ls -lh $PGDATA/base/16384/16385*
-- -rw------- 1 postgres postgres 8.0M orders main segment 0
-- -rw------- 1 postgres postgres 24K orders FSM
-- -rw------- 1 postgres postgres 8K orders VMpg_global and pg_default
PostgreSQL ships with two built-in tablespaces that cannot be dropped:
$PGDATA/global/. Stores cluster-wide system catalogs: pg_database, pg_authid, pg_tablespace, pg_shdepend, etc. Shared across all databases. You cannot place user objects here.$PGDATA/base/. The default tablespace for all objects unless overridden. Each database has its own subdirectory here named after its OID. Per-database system catalogs live here.-- These have no filesystem path (returned as empty string)
SELECT spcname, pg_tablespace_location(oid)
FROM pg_tablespace
WHERE spcname IN ('pg_default', 'pg_global');
-- Change the default tablespace for a DATABASE
ALTER DATABASE mydb SET TABLESPACE fast_ssd;
-- Change the default for new objects in the SESSION
SET default_tablespace = fast_ssd;PostgreSQL vs Oracle: Storage Model Differences
This is the conceptual crux that trips up DBAs moving between the two systems. They use the same word — "tablespace" — but mean fundamentally different things.
| Dimension | PostgreSQL | Oracle |
|---|---|---|
| Physical backing | Symlink to an OS directory. Files are plain OS files. | One or more datafiles (.dbf) — Oracle-managed block devices. |
| Table → File mapping | 1 table = 1 OS file (its relfilenode). Segments are separate files too. | Many tables share one datafile. Oracle maps extents within that file. |
| Extent management | None — PostgreSQL does not manage extents. The OS handles block allocation. | Yes — LMT (Locally Managed Tablespaces) or DMT with bitmaps/data dictionary. |
| Multiple paths | One directory per tablespace. Cannot span multiple paths. | Multiple datafiles in different paths for one tablespace (striping). |
| Schema = Storage? | No. Schemas are purely logical namespaces. No physical meaning. | Schema ≈ user ≈ set of segments. Strongly coupled in older designs. |
| Resize / extend | The OS directory grows automatically. No DBA action needed. | ALTER TABLESPACE ADD DATAFILE or AUTOEXTEND ON. |
| UNDO / TEMP tablespace | PostgreSQL uses pg_wal for WAL and temp_tablespace for temp files. No UNDO tablespace concept — MVCC stores old versions in the heap (dead tuples). | Dedicated UNDO tablespace for rollback segments. Separate TEMP tablespace. |
| Inspect on disk | ls, hexdump, pg_filedump. Fully transparent. | Proprietary format. Must use RMAN, dbv, or Oracle tools. |
| BIGFILE tablespace | N/A | Single very large datafile (up to 128 TB) to reduce file count. |
| Block size | Default 8 KB, set at compile time, uniform cluster-wide. | 2–32 KB, configurable per tablespace. |
| Online tablespace ops | ALTER TABLE ... SET TABLESPACE rewrites the file. Requires ACCESS EXCLUSIVE lock. | Can take tablespaces offline, rename datafiles online. More flexible. |
Managing Tablespaces
Create a Tablespace
-- Directory must exist, be owned by the OS postgres user, and be empty
-- $ mkdir -p /mnt/nvme/pg && chown postgres:postgres /mnt/nvme/pg
CREATE TABLESPACE fast_ssd
OWNER postgres
LOCATION '/mnt/nvme/pg';Move Objects to a Different Tablespace
-- Move a single table (rewrites file; ACCESS EXCLUSIVE lock required)
ALTER TABLE orders SET TABLESPACE fast_ssd;
-- Move all tables in a database (run while connected to target db)
ALTER DATABASE mydb SET TABLESPACE fast_ssd;
-- Move all objects from one tablespace to another (superuser only)
ALTER TABLESPACE old_ssd RENAME TO archive_hdd;
Drop a Tablespace
-- Tablespace must be empty before dropping
DROP TABLESPACE old_ssd;
-- Check what's still in it
SELECT c.relname, n.nspname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.reltablespace = (SELECT oid FROM pg_tablespace WHERE spcname = 'old_ssd');Tablespace-Level Storage Parameters
-- Set random_page_cost hint per tablespace (optimizer advice)
ALTER TABLESPACE fast_ssd
SET (random_page_cost = 1.1); -- NVMe: nearly sequential
ALTER TABLESPACE archive_hdd
SET (random_page_cost = 4.0); -- spinning disk default
-- seq_page_cost can also be set per tablespace (PG 16+)
ALTER TABLESPACE fast_ssd
SET (seq_page_cost = 0.1);Performance Patterns & Best Practices
Separate Indexes from Tables
Place indexes on an NVMe tablespace and tables on a slower drive to maximize index scan bandwidth independently of sequential heap scans:
CREATE TABLE events (...) TABLESPACE hdd_bulk;
CREATE INDEX events_ts_idx ON events (created_at) TABLESPACE nvme_fast;Partition Tables Across Tablespaces
CREATE TABLE logs_2024 PARTITION OF logs
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
TABLESPACE nvme_hot;
CREATE TABLE logs_2022 PARTITION OF logs
FOR VALUES FROM ('2022-01-01') TO ('2023-01-01')
TABLESPACE hdd_archive;TEMP Tablespace
-- Point sort/hash spill temp files to a fast local SSD
SET temp_tablespaces = 'local_nvme';
-- Or set cluster-wide in postgresql.conf
-- temp_tablespaces = 'local_nvme'Monitoring Tablespace Size
-- Size of each tablespace
SELECT
spcname,
pg_size_pretty(pg_tablespace_size(oid)) AS size,
pg_tablespace_location(oid) AS location
FROM pg_tablespace
ORDER BY pg_tablespace_size(oid) DESC;
-- Top 10 largest relations in a tablespace
SELECT
n.nspname || '.' || c.relname AS relation,
c.relkind,
pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_tablespace t ON t.oid = c.reltablespace
WHERE t.spcname = 'fast_ssd'
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 10;pg_relation_filepath()