In-memory vs. On-disk Databases
Figure 1 shows the main difference between in-memory and on-disk databases:
An in-memory database stores the data in memory and uses disk for backup,
while an on-disk database stores the data on disk and uses memory for
caching.
Of course, we all know memory (RAM) is much faster than disk (by multiple
orders of magnitude), so the advantage of an in-memory database is clear.
However, the first question that comes up about an in-memory database is:
what about the durability of our data? We know that data in memory is not
durable. The answer is: we use disk for backing up the data.
But wait! Doesn't backing up data on disk defeat the purpose of using
memory for faster operations?! If we synchronously back up the data,
operations must wait for the disk. On the other hand, if we do it
asynchronously, we might lose data when our system crashes after writing to
memory and returning the operation, but before backing up on disk. So, how
does an in-memory database enjoy the higher speed of RAM while guaranteeing
the durability of our data?
If we want to guarantee the durability of our data, we have to write it to
non-volatile memory. That is something that we cannot avoid. Thus, even an
in-memory database must make sure that data has been written to some sort of
non-volatile memory such as disk before returning the operation to the client.
However, note that the latency overhead of an on-disk database does not come
only from writing to the disk; rather, it comes from maintaining a data
structure such as a B-tree on disk. If we simply append data to a file, we are
still faster than an on-disk database. We can batch multiple writes together
and quickly write a whole disk block. Don't think all disk accesses are the
same: there is a huge difference between quickly writing blocks one after
another to a file and seeking to different locations to update data in them.
Thus, to guarantee durability, an in-memory database first simply appends the
data to an append-only log, and then writes it to memory and returns
the operation. Now, when the system recovers from a crash, it fills the memory
with the data it has on disk.
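The write path described above can be sketched as a tiny key-value store. This is a minimal illustration, not a real database: the class name `InMemoryKV`, the file name `wal.log`, and the tab-separated log format are all assumptions made for the sketch.

```python
import os

class InMemoryKV:
    """Minimal sketch of a durable in-memory key-value store:
    append to a log first, then update memory, then return."""

    def __init__(self, log_path="wal.log"):
        self.log_path = log_path
        self.data = {}          # the in-memory store
        self._recover()         # rebuild memory from the log on startup
        self.log = open(self.log_path, "a")

    def set(self, key, value):
        # 1) Append to the log and force it to disk for durability.
        self.log.write(f"{key}\t{value}\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2) Only then update memory and return to the client.
        self.data[key] = value

    def get(self, key):
        # Reads are served entirely from memory; the disk is never touched.
        return self.data.get(key)

    def _recover(self):
        # After a crash, replay the append-only log to refill memory.
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = line.rstrip("\n").split("\t", 1)
                    self.data[key] = value
```

Note that the sequential append plus `fsync` is the only disk work on the write path; no on-disk data structure is updated per operation.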
Figure 2. Durability in an in-memory database
But doesn't this log become very long over time? If we simply append data
instead of updating the existing data, our data will keep growing.
What is the end game?!
That's a valid point. Having a very long log has two big issues: 1) it
wastes storage and 2) it makes recovery very slow, as the system has to
read the whole log to rebuild the database in memory. To prevent that,
in-memory databases maintain an on-disk data structure such as a B-tree and
periodically apply the log entries to it. After applying log entries to
the B-tree, we purge them from the log. This is called checkpointing.
Now, when the system restarts, we first load the checkpoint and then apply
the more recent log entries that are not included in the checkpoint to the
memory.
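The checkpoint-then-replay recovery described above can be sketched as follows. To keep the sketch short, a flat JSON snapshot stands in for the on-disk B-tree that a real database would maintain; the file names `wal.log` and `checkpoint.json` are assumptions.

```python
import json
import os

LOG_PATH = "wal.log"                 # hypothetical file names for the sketch
CHECKPOINT_PATH = "checkpoint.json"

def checkpoint(data):
    """Persist the current in-memory state, then purge the applied log."""
    tmp = CHECKPOINT_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, CHECKPOINT_PATH)  # atomically swap in the new checkpoint
    open(LOG_PATH, "w").close()       # entries are now redundant: purge them

def recover():
    """Load the last checkpoint, then replay log entries written after it."""
    data = {}
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            data = json.load(f)
    if os.path.exists(LOG_PATH):
        with open(LOG_PATH) as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                data[key] = value     # newer log entries win
    return data
```

The atomic rename matters: the old checkpoint stays valid until the new one is fully on disk, so a crash mid-checkpoint never leaves the database without a usable recovery point.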
Figure 3. Recovery from a checkpoint
In an on-disk database we also have a WAL and a B-tree, and we
cache the data in RAM. Can't we conclude that an in-memory database is
nothing but a normal on-disk database with a big cache?!
In an on-disk
database, we still have serialization and data-layout overhead compared with
an in-memory database. In an in-memory database, we assume the whole dataset
is always in RAM. We do not have such an assumption in an on-disk database.
This changes how we design things like data structures and indexing for
our database.