Serializability is the strongest isolation level that a transactional system
can offer. This level of isolation removes all possible anomalies due to
concurrently running transactions. It provides a powerful abstraction
for application developers. The letter "I" in ACID stands for isolation, and
when people talk about ACID transactions, they most likely mean serializable
transactions. However, note that many systems that claim to provide ACID
transactions, in reality, provide weaker isolation levels. In this post, we
want to review serializability and see how a database system can provide
serializability for single-node and distributed transactions.
Serializability
Serializability simply means that the database should appear to run transactions in some sequential order. In other words, although we might run transactions
concurrently, to the clients, it should appear we are running transactions
one-after-another.
Strict Serializability: Serializability + Linearizability
There is something interesting in the definition of serializability;
serializability only requires "some" order. Thus, "any" order is acceptable under
serializability. As a result, if I run transaction T1 that changes the value
of x from 40 to 50, and then run transaction T2 that reads x, it is OK under
serializability to read 40 instead of 50 because we can say the system appears
to first execute T2 and then T1.
This might not be acceptable for some systems. Those systems require a
stronger guarantee called strict serializability. Strict serializability requires the serialization order to
respect the real-time order of transactions, i.e., if T1 commits before T2
starts, then the system should appear to run T2 after T1. With strict
serializability, in the example above, we are guaranteed to read 50 instead of 40.
Obviously, strict serializability is stronger than serializability. Then why do
people say serializability is the strongest isolation level?
Well, when we talk about isolation,
we are talking about the concurrency of the transactions. With regard to
isolation, serializability is the strongest level that we can achieve, as it
provides the ultimate isolation possible: running transactions one-after-another.
No isolation stronger than that is possible. What strict serializability adds to
serializability does not concern isolation. Instead, it concerns the recency of
the data. Strict serializability requires linearizability, which
requires the reads to respect the real-time order of events.
Thus, it is true to say that, regarding isolation alone, serializability is the
strongest level possible.
Serializability and Replica Consistency
Note that when we have only a single copy of our data, the traditional ways to
achieve serializability such as 2PL (as we will see next) actually provide
strict serializability by default. That's why strict serializability is also
known as strong one-copy serializability. The linearizability
aspect of strict serializability typically becomes important when we want to
manage transactions for replicated systems. In replicated systems, we can have
various levels of consistency between replicas. Linearizability is one of
those consistency levels, and it is actually the strongest level possible.
Weaker consistency levels include pure eventual consistency, session
guarantees, causal consistency, and sequential consistency. I will try to
cover them in a separate post. On the other hand, we have weaker isolation
levels such as read uncommitted, read committed, and snapshot isolation.
Figure 1. Isolation and replica consistency are orthogonal requirements. Strict serializability requires the strongest levels of both of these requirements.
How to Achieve Serializability
Concurrency control mechanisms can be classified into pessimistic and optimistic. Pessimistic solutions provide isolation by conservatively preventing transactions from accessing the same items at the same time. To do that, pessimistic solutions usually use some kind of locking mechanism. Optimistic solutions, on the other hand, do not restrict transactions from accessing objects. Instead, they let transactions do whatever they want, and only at commit time do they check whether committing a transaction violates the desired isolation level or not.
2-Phase Locking (2PL)
One pessimistic way of guaranteeing serializability is to actually run transactions sequentially, i.e., run transactions one by one in a single thread. While this approach is used for some in-memory databases, generally databases do not use this solution as it results in zero parallelism. Imagine you have a machine with multiple cores and you can easily afford running transactions in parallel. By running transactions sequentially, you lose all the processing power that is available to you. It would be great if we could allow transactions to run in parallel while guaranteeing serializability. Using 2-Phase Locking (2PL), we can do exactly that. It
works as follows: We assign a lock for each object in the database. This lock
can be acquired in two modes: shared or exclusive.
Operations acquire locks as follows, and once a transaction obtains a lock, it
never releases it until the end of the transaction (a minimal code sketch follows these rules):
- A read operation obtains the lock in the shared mode if no other
transaction has already obtained it in the exclusive mode.
- Note that if another transaction has obtained it in the
shared mode, it is OK, i.e., more than one transaction can obtain a
lock in the shared mode simultaneously.
- A write operation obtains the lock in the exclusive mode if no other
transaction has already obtained it in either shared or exclusive
mode.
- When a transaction first obtains a lock in the shared mode, but later
it wants to write the object, it must upgrade its lock from shared to
exclusive mode. Note that, just like getting the lock in exclusive mode,
upgrading the lock to exclusive mode requires that the lock is not held by
any other transaction.
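To make these rules concrete, here is a minimal sketch in Python of a lock table with the shared and exclusive modes described above. All class and method names are illustrative assumptions, not any particular database's API; a real lock manager would also handle blocking, queuing, and fairness.

```python
# A minimal sketch of a 2PL lock table (illustrative names only).
# Under 2PL, locks are held until the end of the transaction.

class LockTable:
    def __init__(self):
        # object key -> {"mode": "S" or "X", "holders": set of txn ids}
        self.locks = {}

    def acquire_shared(self, txn, key):
        entry = self.locks.setdefault(key, {"mode": "S", "holders": set()})
        # Shared access is allowed unless another transaction holds the
        # lock exclusively.
        if entry["mode"] == "X" and entry["holders"] != {txn}:
            return False  # caller must wait or abort
        entry["holders"].add(txn)
        return True

    def acquire_exclusive(self, txn, key):
        entry = self.locks.setdefault(key, {"mode": "S", "holders": set()})
        # Exclusive access (or an upgrade from shared) is allowed only if
        # no other transaction holds the lock in any mode.
        if entry["holders"] - {txn}:
            return False  # caller must wait or abort
        entry["mode"] = "X"
        entry["holders"].add(txn)
        return True

    def release_all(self, txn):
        # Called at commit/abort time: release every lock the transaction holds.
        for key, entry in list(self.locks.items()):
            entry["holders"].discard(txn)
            if not entry["holders"]:
                del self.locks[key]
```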
Phantoms
In the method explained above, we assign a lock to each object. To satisfy serializability with 2PL, we have to assign locks not only to objects that already exist in the database, but
also to those that are not in the database yet.
Let's see an example. Suppose two patients are trying to book an appointment
with a physician. They both want to book their appointments for Monday, at 1
pm. The application executes a transaction for each patient. Each transaction
first checks if the physician is available on Monday at 1pm by checking the
rows in the appointments table, then adds a row to this table only if it didn't
find any row for Monday at 1 pm.
Now, let’s see how the system may end up in a state where both patients are booked for the same time slot: the first transaction checks the rows. It does not find anything and does not lock anything. The second transaction also checks the rows and does not find or lock anything. Then, each of them adds a row for Monday 1 pm to the table. Now we have two appointments booked for the same time. That is not desirable and is a violation of serializability. This anomaly is called a phantom write.
We can prevent phantoms by locking non-existing rows. To do that, we can lock ranges using indexes. Suppose we have an index on the date of the
appointments table. Now any record that is for this Monday will fall under the same index entry. Thus, by locking that index entry we are basically locking those
objects whether they already exist or not. Any future object that falls under
this index value is associated with that lock, so we are basically locking
future and non-existing objects too.
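As a rough illustration of this idea, the booking transaction from the example above could lock the index entry for the requested slot (reusing the hypothetical LockTable sketched earlier) before scanning, so that two concurrent bookings for the same slot conflict even though no matching row exists yet.

```python
# Sketch: prevent phantoms by locking the index entry for a time slot
# rather than individual rows. Both bookings for "Monday 1pm" map to the
# same index key, so the two transactions conflict on that lock.

def book_appointment(txn, lock_table, appointments, slot):
    index_key = ("appointments.date_index", slot)  # e.g., the "Monday 1pm" entry
    # Read under a shared lock on the index entry; this also covers rows
    # for this slot that do not exist yet.
    if not lock_table.acquire_shared(txn, index_key):
        raise RuntimeError("lock conflict: wait or abort")
    if any(row["slot"] == slot for row in appointments):
        return False  # slot already taken
    # Upgrade to exclusive before inserting the new row.
    if not lock_table.acquire_exclusive(txn, index_key):
        raise RuntimeError("lock conflict: wait or abort")
    appointments.append({"slot": slot, "booked_by": txn})
    return True
```

Note that if two transactions both take the shared lock and then try to upgrade, neither can proceed, which is exactly the situation discussed next.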
Deadlocks
In the example above, suppose both transactions acquire the lock for index "date =
this Monday" in the shared mode to read. Now, they want to write, so they have
to upgrade their shared locks to exclusive locks. However, since the lock is
held in the shared mode by the other transaction, neither of them can proceed.
This situation is called a deadlock. There are two general
approaches to deal with deadlocks: 1) deadlock detection and 2) deadlock avoidance.
Deadlock Detection: In this approach, we try to detect and resolve
deadlocks. A simple method falling into this category is to introduce timeouts and retries. When a transaction sees that it cannot make progress, it aborts after
some time, and retries. However, this is not efficient, as anything that a
transaction has done so far will be lost due to the deadlock. Also, our
detection mechanism may have false positives, i.e., we might abort a transaction due to
timeout thinking it is in deadlock, while in reality it is not in deadlock and
it is just taking a long time to finish. In addition, there is no guarantee
that the transaction won't face deadlock again. Our two transactions may get
really unlucky and repeat the same loop over and over. This is called a livelock.
A better method to detect and resolve deadlocks is to use the wait-for
graph. In this graph, we have one vertex for each in-flight transaction. When
transaction A requests a lock held by transaction B, we add an edge from A to
B. Any time we have a cycle in the graph, we have a deadlock. Thus, we have a
way to detect deadlocks. After detection, we have to abort one of the
transactions involved in the cycle to break the cycle. Usually, we abort the
most recent transaction, as we prefer to let older transactions proceed.
Otherwise, a transaction might starve and never commit if it gets really
unlucky. We can do the detection either periodically or
continuously (i.e., on every edge we add to the graph).
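As a minimal sketch of this idea (with illustrative names), the wait-for graph can be represented as a dictionary from each transaction to the set of transactions it is waiting for, and a depth-first search can look for a cycle:

```python
# Sketch of deadlock detection with a wait-for graph. An edge A -> B means
# transaction A is waiting for a lock held by transaction B.

def find_deadlock(wait_for):
    """wait_for: dict mapping a txn id to the set of txn ids it waits for.
    Returns a list of transactions forming a cycle, or None."""
    def dfs(txn, path, visited):
        visited.add(txn)
        path.append(txn)
        for holder in wait_for.get(txn, ()):
            if holder in path:  # back edge: we found a cycle
                return path[path.index(holder):]
            if holder not in visited:
                cycle = dfs(holder, path, visited)
                if cycle:
                    return cycle
        path.pop()
        return None

    visited = set()
    for txn in list(wait_for):
        if txn not in visited:
            cycle = dfs(txn, [], visited)
            if cycle:
                return cycle
    return None

# Example: T1 waits for T2 and T2 waits for T1, so there is a deadlock.
# The detector would then abort one of them (typically the youngest).
print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # ['T1', 'T2']
```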
Deadlock Avoidance: In this approach, we eliminate the possibility of
deadlocks. One way to avoid deadlocks is through conservative 2PL. In
normal 2PL, transactions gradually get the locks, hold them, and release them
at the end of the transaction. In conservative 2PL, a transaction gets
all the locks it might need during its lifetime right at the beginning. This
way, it might wait to start, but once started, it never waits for any lock
again, so it never gets into any deadlock. Conservative 2PL is not very
efficient, as it reduces the concurrency in our system, i.e., there might be
cases where it would be OK to let a transaction start, but due to the conservative
nature of this approach, we delay it.
Instead of conservative 2PL, transaction managers usually define priority for
transactions according to their timestamps. We typically honor the older
transactions (to avoid starvation). Thus older transactions (with smaller
timestamps) have higher priority than younger transactions (with larger
timestamps).
Now, suppose transaction T1 requests a lock held by transaction T2. The
transaction manager can take either of the following approaches to avoid deadlocks (a small sketch of both rules follows these lists):
Wait-die: If T1 is older than T2, T1 blocks and waits for T2 to release the lock. If T1 is younger than T2, T1 aborts.
- Older transactions wait for younger ones.
- Younger transactions never wait for older ones; they just abort right away.
Wound-wait: If T1 is older, T2 aborts right away. If T1 is younger, it waits for T2.
- Older transactions wound younger ones by aborting them.
- Younger transactions wait for older ones.
A better way to remember these two is: we know lower-priority transactions
must be sacrificed for higher-priority transactions somewhere.
- In wait-die, lower-priority transactions die when they request
higher-priority transactions' locks.
- In wound-wait, lower-priority transactions die when higher-priority
transactions request their locks.
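The two rules can be summarized in a few lines. This is only an illustrative sketch with assumed names and return values; a real lock manager would also handle restarting aborted transactions (typically with their original timestamps, so they eventually become the oldest and avoid starvation).

```python
# Sketch of the wait-die and wound-wait rules. A smaller timestamp means an
# older transaction, i.e., higher priority. Names are illustrative only.

def on_lock_conflict(policy, requester_ts, holder_ts):
    """Decide what happens when the requester asks for a lock held by the holder."""
    requester_is_older = requester_ts < holder_ts
    if policy == "wait-die":
        # Older requesters wait; younger requesters die (abort).
        return "requester waits" if requester_is_older else "requester aborts"
    if policy == "wound-wait":
        # Older requesters wound the holder (abort it); younger requesters wait.
        return "holder aborts" if requester_is_older else "requester waits"
    raise ValueError("unknown policy")

# T1 (timestamp 1, older) requests a lock held by T2 (timestamp 2, younger):
print(on_lock_conflict("wait-die", 1, 2))    # requester waits
print(on_lock_conflict("wound-wait", 1, 2))  # holder aborts
```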
Optimistic Concurrency Control
We don't want to go into the details of these methods, but basically, OCC systems
work as follows (a minimal sketch of the validation step follows this list):
- The database can provide a snapshot of the data, i.e., we can read all of
our objects in the context of a transaction at a frozen point in time. OCC
systems usually provide this feature by keeping multiple versions for each
object, a technique known as Multi-Version Concurrency Control
(MVCC). So, all reads of a transaction read from a snapshot.
- All writes are buffered without actually writing to the database.
- When the client wants to commit, the transaction manager checks what
versions the transaction has read. If it finds out that any of the versions
that the transaction has read has been overwritten by another
transaction, it aborts the transaction. Otherwise, it commits and applies
the buffered writes to the store.
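As a minimal sketch of that commit-time check (all names here are assumptions, not any particular system's API): the transaction records the version of every object it has read from its snapshot and buffers its writes; at commit time, the transaction manager validates the read set against the current versions in the store.

```python
# Sketch of OCC commit-time validation over a multi-versioned store.
# `read_set` records the version of each object the transaction read from
# its snapshot; `write_buffer` holds writes that have not yet been applied.

def try_commit(store, read_set, write_buffer):
    # Validation: if any object we read was overwritten by another
    # committed transaction, we must abort.
    for key, read_version in read_set.items():
        if store.current_version(key) != read_version:
            return False  # conflict detected: abort the transaction
    # No conflicts: apply the buffered writes (each creates a new version)
    # and report the transaction as committed.
    for key, value in write_buffer.items():
        store.write(key, value)
    return True
```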
Phantoms
Just like 2PL, with OCC too, there is a subtlety regarding non-existing
objects. As we said above, in 2PL, we have to assign locks not only to the
existing objects but also to the non-existing objects to avoid the phantom
anomaly. Similarly, in OCC, we have to keep track of both existing and non-existing
objects. For example, if a transaction queries the database to find
any booking row for Monday at 1 pm, and at commit time we see there
is such a record, we have to abort the transaction. Just as in 2PL, we can use
indexes to approximate the predicates that transactions have read.
Serializability for Distributed Transactions
In practice, many applications cannot fit their entire data in a single node. Thus,
they partition their data and host it on multiple machines. In these applications,
a transaction may span multiple nodes. Such transactions are called
distributed transactions.
We can use the same approaches explained above to provide serializability for
distributed transactions too. For example, to use 2PL, each node can manage
the locks for the keys that it hosts and respond to the requests to acquire and release locks. With regard to isolation, nothing is different
between single-node and distributed transactions. What makes distributed
transactions different from single-node transactions is the way to provide
atomicity, which requires a transaction to either apply completely or cancel completely. For single-node transactions, atomicity is usually provided by writing the updates to stable storage (in a write-ahead log) before committing the transactions. For distributed transactions, we need more.
2-Phase Commit (2PC)
A well-known algorithm to provide atomicity for distributed transactions is
called 2-Phase Commit (2PC). Once a client is ready, it requests a coordinator to commit the transaction. The
coordinator works with the participants, each hosting part of the objects included
in the transaction, to commit the transaction. As the name suggests, the
algorithm runs in two phases (a small coordinator sketch follows these two steps):
- Phase 1: In the first phase, the coordinator sends
Prepare messages to the participants. Each participant checks if it can
commit the transaction or not.
A participant's answer to the Prepare message cannot change.
Thus, before a participant sends YES back to the coordinator, it must make
sure that under any circumstance (even a crash) it will be able to commit the
transaction. This is crucial for the correctness of 2PC. Thus, usually,
each participant appends the writes of the transaction to a log on
persistent storage before sending YES to the coordinator.
- Phase 2: Once the coordinator receives YES from all the participants, it
sends Commit messages to all of them. If any of the participants sends NO,
the coordinator sends Abort messages instead. The participants commit/abort the
transaction based on the decision by the coordinator.
The final decision by the coordinator cannot change.
Thus, before sending a Commit/Abort message, the coordinator writes its decision to persistent storage to remember it after recovery from a crash.
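Here is a minimal sketch of the coordinator's side of 2PC. The `participants` and `log` objects and their methods are assumptions for illustration; a real implementation must also deal with timeouts, retries, and recovery of both the coordinator and the participants.

```python
# Sketch of a 2PC coordinator. Each participant persists the transaction's
# writes before voting YES in prepare(); the coordinator persists its
# decision before announcing it in phase 2.

def two_phase_commit(txn_id, participants, log):
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare(txn_id) for p in participants]  # True means YES
    decision = "commit" if all(votes) else "abort"

    # The decision must survive a coordinator crash, so log it first.
    log.write(txn_id, decision)

    # Phase 2: broadcast the final decision; participants apply it.
    for p in participants:
        if decision == "commit":
            p.commit(txn_id)
        else:
            p.abort(txn_id)
    return decision
```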
Mixing 2PC with Serializability Approaches
As explained earlier, 2PC concerns another letter in ACID: the "A", which stands for atomicity. This requirement is orthogonal to isolation. Thus, we can use 2PC with any of the serializability approaches explained above. For example, Google Spanner uses
2PC+2PL (with wound-wait). Another system called
GRIT uses 2PC + OCC.
Distributed Transactions for Replicated Systems
To provide high availability, many modern systems use replication. As soon as we replicate our data, another dimension is added to our transactional systems. As mentioned earlier, there are various levels of replica consistency. Providing strict serializability for partitioned and replicated systems is an interesting and challenging problem. The following systems are examples of transactional systems that solve
this problem using different approaches: