Log Structured Storage
Log-structured storage is a powerful approach to organizing data that prioritizes sequential writes for efficiency and durability. In this post, I delve into the inner workings of log-structured storage, focusing on LSM-trees. We'll explore their write and read paths, the mechanics of compaction, and compare leveled and tiered compaction strategies. Additionally, I'll discuss the differences between LSM-trees and B-trees, and talk about variations like Jungle and Bitcask, highlighting their unique optimizations and trade-offs. Basics Write Path Write are first appended to a Write-Ahead-Log (WAL) for durability, then they are written to a MemTable which is basically an in-memory sorted data structure such as a balanced tree (e.g. red-black or AVL tree) or a skip list (I have a detailed post about skip list on this blog). After writing a write to WAL and the MemTable, we can return success to the client. We can have multiple MemTables, but at any given time, ...