NuRaftOnTheRocks: A Replicated Key-value Store using RocksDB and NuRaft
NuRaft is an open source C++ Raft implementation. NuRaft is highly plugable; it allows you to provide your own persistent log store and state manager and integrate it with your state machine. That allows you to integrate NuRaft with any application whenever you need log replication. In this post, I show how you can integrate NuRaft with RocksDB to build a replicated key-value store.
Raft
Raft is a consensus and log replication protocol. Raft allows multiple processes in a distributed system to agree on the entries of a shared log. Using such a guarantee, we can achieve state machine replication as follows: having a set of state machines starting at the same initial state, we record operations on the state machine on a shared log, and have each state machine perform operations specified in the shared log. This way all state machines remain consistent with each other. State machine replication is very useful. We can create replicated data stores. In this post, we want to create a replicated key-value store using Raft. In this example, our state machine is a key-value store for which we use RocksDB.
I have talked in detail about Raft in an
earlier post in this blog. If you like to know more about the internals of Raft you can refer to that
post.
NuRaft
NuRaft is an implementation of Raft in C++ that is open source by my ex-team
at eBay. NuRaft is highly plugable; you can easily integrate it with your own
persistent log implementation, state manager and most importantly your
own state machine. Specifically, to use NuRaft, you need to implement the
following interfaces:
Once you have implementation for these interfaces, you don't need to worry
about replication, as NuRaft takes care of that for you and guarantees
sequential consistency, i.e., all state machines will apply operations in the
log in the exact same order.
RocksDB
RocksDB is a popular embedded key-value store. RocksDB is not a distributed or
replicated key-value store. Instead, you can use it as a storage engine for a
distributed/replicated key-value store. RocksDB is an LSM-tree storage.
Figure 2. An LSM-tree with leveled compaction.
I have talked in detail about log-structured storage and LSM-trees in an earlier post in this blog. If you like to know more about the
internals of LSM-trees you can refer to that post.
Replicating RocksDB with NuRaft
In this post, I want to show how you can integrate NuRaft with RocksDB to
build a replicated key-value store.
I use the in-memory implementations for log store and state manager provided
in the NuRaft repository for examples. Note that with an in-memory log store
we actually don't have the consistency guarantees. In a real system you need
persistent implementations for these interfaces. You can refer to
ClickHouse's log store for an example of real persistent log store.
Source Code
You can access the source code of the replicated key-value store explained in
this post on Github:
Before reading the next section, refer to the repository's Readme and
follow the instructions for building and running the tests.
Code Walkthrough
The source code is organized as follows:
- api: This folder contains a simple gRPC service definition. The service has two sets of operations: 1) key-value CRUD operations (e.g. put or get a key) and 2) Raft operations (e.g., for adding a new server to the raft group)
- scripts: to run tests
- src: The main code.
- tests: includes tests for all implementations
Let's walk through the src folder. I have four different implementations,
under src folder:
- single in-memory: a single in-memory key-value store.
- single RocksDB: a single persistent key-value store backed by RockDB.
- multi in-memory: a replicate in-memory key-value store backed by NuRaft.
- multi RocksDB: a replicated persistent key-value store backed by NuRaft and RocksDB.
All of these implementations share code in the src/common folder. So let's see what we have in src/common.
- grpc_server: Defines GrpcServer class that implements the gRPC service. It has an instance of KeyValueServer interface.
- kv_server_interface: Defines KeyValueServer interface that include the key-value CRUD operations and raft operations.
- raft_kv_server: Implements the KeyValueServer interface and has an instance of KeyValueStateMachine interface.
- state_machine_interface: Defines KeyValueStateMachine which is a virtual class that implements the nuraft::state_machine interface and requires its own pure virtual functions.
- getKey
- scan
- applyPayload
For single key-value stores (i.e. single in-memory and single RocksDB), we
directly implement the KeyValueServer interface. For replicated key-value
stores (i.e., multi in-memory and multi RocksDB), we implement the
KeyValueStateMachine interface.
Figure 3 shows the design and how classes and interfaces are related to each
other.
Figure 3. Interfaces and classes
run_server.hxx file has the factory logic to create the appropriate GrpcServer
instance based on the input arguments passed to the main function.
Conclusion
In this post, we saw how we can use NuRaft to create a replicated state machine system. Here, we used RocksDB as our state machine and built a replicated key-value store by integrating it with NuRaft, but you can plug NuRaft any state machine to achieve strong consistency and log replication.
Comments
Post a Comment