PostgreSQL Storage Engine Survey

PostgreSQL and RocksDB Overview

System Properties Comparison PostgreSQL vs. RocksDB

Name PostgreSQL RocksDB
Description Widely used open source RDBMS Embeddable persistent key-value store optimized for fast storage (flash and RAM)
Primary database model Relational DBMS Key-value store
Implementation language C C++
Data schema yes schema-free
Typing yes no
Secondary indexes yes no
SQL yes no
Server-side scripts UDF no
Triggers yes no
Partitioning partitioning by range, list and (since PostgreSQL 11) by hash none
Replication methods Master-slave replication none
Foreign keys yes no
Transaction concepts ACID yes
Concurrency yes yes
Durability yes yes
In-memory capabilities no yes

RocksDB

Pros

Performance

  • Multithread compaction
  • Multithread memtable inserts
  • Reduced DB mutex holding
  • Optimized level-based compaction style and universal compaction style
  • Prefix bloom filter
  • Memtable bloom filter
  • Single bloom filter covering the whole SST file
  • Write lock optimization
  • Improved Iter::Prev() performance
  • Fewer comparator calls during SkipList searches
  • Allocate memtable memory using huge page

Features

  • Column Families
  • Transactions and WriteBatchWithIndex
  • Backup and Checkpoints
  • Merge Operators
  • Compaction Filters
  • RocksDB Java
  • Manual Compactions Run in Parallel with Automatic Compactions
  • Persistent Cache
  • Bulk loading
  • Forward Iterators/ Tailing iterator
  • Single delete
  • Delete files in range
  • Pin iterator key/value

Alternative Data Structures And Formats

  • Plain Table format for memory-only use cases
  • Vector-based and hash-based memtable format
  • Clock-based cache (coming soon)
  • Pluggable information log
  • Annotate transaction log write with blob (for replication)

Tunability

  • Rate limiting
  • Tunable Slowdown and Stop threshold
  • Option to keep all files open
  • Option to keep all index and bloom filter blocks in block cache
  • Multiple WAL recovery modes
  • Fadvise hints for readahead and to avoid caching in OS page cache
  • Option to pin indexes and bloom filters of L0 files in memory
  • More Compression Types: zlib, lz4, zstd
  • Compression Dictionary
  • Checksum Type: xxhash
  • Different level size multiplier and compression type for each level.

Manageability

  • Statistics
  • Thread-local profiling
  • More commands in command-line tools
  • User-defined table properties
  • Event listeners
  • More DB Properties
  • Dynamic option changes
  • Get options from a string or map
  • Persistent options to option files

Cons

  • Merging process were manageable hardly, leaded performance unstable and writing amplification
  • Reading and writing needed to access Memtable mutually, R / W mixture mode using multiple threads concurrency had bottleneck
  • WALs were shared among column families which required single-thread writing mode and limited high-speed devices

PostgreSQL

Pros

  • Transaction management
  • SQL support
  • Data importing speed
  • Geographic data type, multi-dimension index

Cons

  • Code complexity and maintain effort
  • Cannot be compiled as static library and need to compile on sepecific OS versions

Postgres based storage engine draft

  • Defined communication specification between client and server based on pg_protocol, get rid of gRPC
  • Postgres background process extension
  • Data import and export
  • Data encryption

Reference

System Properties Comparison PostgreSQL vs. RocksDB
In-Memory Microbenchmark
MyRocks: A space- and write-optimized MySQL database
Benchmarking InnoDB and MyRocks Performance using Sysbench
Pushdown computations in PolyBase
The Internals of PostgreSQL - Process and Memory Architecture
RocksDB introduction:RocksDB use cases and features (Chinese)
RocksDB Pros and Cons in application scenarios (Chinese)

Kick off

This blog used JekyII theme Hyde.

Hyde is a brazen two-column Jekyll theme that pairs a prominent sidebar with uncomplicated content. It’s based on Poole, the Jekyll butler.

Thanks!