Architecture Overview¶

DecentDB is designed with a modular architecture emphasizing correctness, performance, and simplicity.

Design Principles¶

Correctness First: ACID guarantees are non-negotiable
Fast Reads: Optimized for read-heavy workloads
Simple Model: Single writer, many readers
Testability: Comprehensive testing at all layers
Portability: Single file database, cross-platform

High-Level Architecture¶

┌─────────────────────────────────────────┐
│              SQL Interface              │
│           (Parser, Binder)              │
├─────────────────────────────────────────┤
│           Query Planner                 │
│      (Index selection, Joins)           │
├─────────────────────────────────────────┤
│           Execution Engine              │
│    (Scan, Filter, Join, Sort)          │
├─────────────────────────────────────────┤
│          Storage Manager                │
│   (Tables, Indexes, Constraints)        │
├─────────────────────────────────────────┤
│            B+Tree Layer                 │
│   (Tables, Indexes ordered storage)     │
├─────────────────────────────────────────┤
│             Pager Layer                 │
│   (Page cache, Free space)              │
├─────────────────────────────────────────┤
│               WAL Layer                 │
│   (Write-ahead log, Recovery)           │
├─────────────────────────────────────────┤
│                VFS                      │
│   (File I/O abstraction)                │
└─────────────────────────────────────────┘

Module Responsibilities¶

SQL Module (`sql/`)¶

Parser: Converts SQL text to AST using libpg_query
Binder: Resolves table/column names, validates types
SQL Types: Expression trees, statement representations

Planner (`planner/`)¶

Query optimization
Index selection
Join ordering
Access path selection

Execution Engine (`exec/`)¶

Iterator-based (Volcano) model
Operators: Scan, Filter, Project, Join, Sort, Limit, Aggregate, Window, TvfScan (table-valued functions)
Expression evaluation
Row materialization

Storage (`storage/`)¶

Table operations (scan, insert, update, delete)
Index operations (seek, range scan)
Constraint enforcement (PK, FK, UNIQUE, NOT NULL, CHECK)
Auto-assigned IDs for a single INT64 PRIMARY KEY when omitted
Trigram index management
Bulk load operations

B+Tree (`btree/`)¶

Ordered key-value storage
Page-based tree structure
Cursor iteration
Node split/merge
Overflow handling

Pager (`pager/`)¶

Page cache management
Page allocation/deallocation
Free list management
Database header

WAL (`wal/`)¶

Write-ahead logging
Transaction commit markers
Crash recovery
Checkpointing
Snapshot isolation for readers

VFS (`vfs/`)¶

OS file abstraction
Error injection for testing
Platform independence

Data Flow¶

Query Execution Flow¶

Parse: SQL text → AST
Bind: Resolve names, validate
Plan: Choose access paths
Execute: Run operators
Return: Results to client

Example:

SELECT * FROM users WHERE id = 1

Parser creates SELECT statement with WHERE clause
Binder resolves "users" table and "id" column
Planner sees PK lookup, chooses IndexSeek
Executor runs IndexSeek on users PK index
Returns row to client

Write Flow¶

Begin Transaction: Acquire write lock
Modify Pages: Write to page cache
Log Changes: Append to WAL
Commit: Write commit marker, fsync WAL
Optional Checkpoint: Copy pages to main file

Read Flow¶

Begin Read: Capture snapshot LSN
Query: Read pages from cache or disk
Check WAL: Overlay WAL frames if newer than snapshot
Return: Consistent snapshot view

Concurrency Model¶

Single Writer¶

Only one write transaction at a time
Implemented via WAL write mutex
Simple, no deadlocks
Writers don't block readers

Multiple Readers¶

Many concurrent read transactions
Each sees snapshot from transaction start
Readers use snapshot LSN for consistency
Readers don't block each other

Snapshot Isolation¶

Readers capture snapshot_lsn at start: - See all changes committed before snapshot_lsn - Don't see changes committed after - Uncommitted changes never visible

Storage Format¶

Database File Structure¶

Page 1: Header (128 bytes) + Page 1 data
Page 2-N: B+Tree pages, overflow pages, etc.

Header Contents¶

Magic bytes: "DECENTDB"
Format version
Page size
Schema cookie
Root page IDs
Freelist head
Last checkpoint LSN
Checksum

WAL File Structure¶

Frame 1: [header][payload][checksum]
Frame 2: [header][payload][checksum]
...
Frame N: [commit marker]

Frame types: - PAGE: Modified page data - COMMIT: Transaction commit - CHECKPOINT: Checkpoint completed

Memory Management¶

Page Cache¶

Fixed-size pool of page buffers
LRU eviction policy
Pin/unpin for active pages
Write-through for dirty pages

Sort Buffers¶

External merge sort for large ORDER BY
Spills to disk when memory limit exceeded
Default: 16MB buffer, 64 spill files max

Row Buffers¶

Reusable per-operator buffers
Avoids per-row heap allocation
Bounded memory usage

Testing Strategy¶

Each layer has comprehensive tests:

Unit Tests: Individual module correctness
Property Tests: Invariants hold under random operations
Crash Tests: WAL recovery verified
Differential Tests: Match PostgreSQL behavior
Benchmarks: Performance regression detection

See Testing Guide for details.

Performance Characteristics¶

Read Performance¶

Point lookup: O(log n) via B+Tree
Range scan: O(log n + m) where m = results
Index seek: Direct page access
Table scan: Sequential page reads

Write Performance¶

Insert: O(log n) for index + O(1) for append
Update: O(log n) for lookup + O(log n) for re-index
Delete: O(log n) for lookup + O(log n) for index update
WAL append: O(1) amortized

Space Overhead¶

B+Tree internal fragmentation: ~20-30%
WAL: 0-2x database size (depending on checkpoint frequency)
Indexes: ~20-100% overhead depending on key size

Extension Points¶

Custom VFS¶

Implement Vfs interface for: - Custom storage backends - Encryption - Compression - Network storage

Custom Operators¶

Add to execution engine: - New join algorithms - Custom aggregations - Domain-specific operations

C API and Language Bindings¶

DecentDB exposes a C API (src/c_api.nim) with 35+ functions for use by language bindings:

Binding	Interface	Notes
.NET	ADO.NET + MicroOrm + EF Core	P/Invoke via `DecentDB.Native`
Go	`database/sql` driver + Direct API	CGO linking
Python	DB-API 2.0 + SQLAlchemy	ctypes loading
Node.js	N-API addon + Knex	Dynamic loading

All bindings support: schema introspection (list tables, columns, indexes), checkpoint, all data types, and auto-assigned IDs for a single INT64 primary key when omitted from INSERT.