Genefication: Generative AI + Formal Verification
Generative AI is a powerful tool that
can significantly enhance developer productivity by generating code
quickly and efficiently. However, a major challenge when using
generative AI is ensuring the correctness of the generated code,
especially in critical systems where errors can have significant
consequences. I have heard of companies dealing with critical
systems have banned the use of any code generation technologies, including generative AI, to minimize
the risk of introducing bugs. In addition, there are
reports suggesting using AI coding assistants has led to a 40%
increase in the bug rate.
A promising way to address the correctness challenges of AI-generated
code, particularly in the context of high-level distributed system
design, is by combining generative AI with formal verification. This
approach could be referred to as Genefication, a term proposed
here to capture the synergy between these two technologies. It involves
using generative AI to draft code or specifications, followed by
applying formal verification to rigorously ensure that the design
satisfies critical safety and correctness properties. This creates a
workflow that merges the efficiency of AI with the dependability of
formal methods. In this post, the interplay between these technologies
is explored, with TLA+ serving as the formal specification language and
ChatGPT providing specification generation.
TLA+ and PlusCal
TLA+ is a formal specification language that allows you to precisely
describe the behavior of your system or program. Once you have written a
specification, you can use a model checker, such as TLC, to exhaustively
explore all possible behaviors and verify the correctness of your design.
TLA+ specs uses mathematical symbols and expressions. For programmers,
PlusCal offers a more intuitive approach, as its syntax resembles
programming languages like C++ or Java. You can write a PlusCal
specification, translate it automatically into TLA+, and use TLC to model
check it.
TLA+ is particularly well-suited for verifying the correctness of
distributed algorithms. It helps ensure that your design satisfies
critical safety and liveness properties under all possible executions,
even in scenarios with unusual timings or failures common in distributed
systems. By identifying design flaws early, TLA+ provides confidence in
the reliability of your system before implementation begins.
To learn more about application of TLA+ and formal methods in building
distributed system, I encourage you read
Formal Methods: Just Good Engineering Practice?
by Marc Brooker. You might also want to check out some Practical Tips on using TLA+ and P that I shared in this blog before.
Genefication
Figure 1 presents the flowchart illustrating the integration of generative
AI tools with formal verification—an approach proposed here as
Genefication. The genefication process begins with an initial or
partial specification, which can be drafted manually by the developer or
generated by a generative AI tool, such as an LLM, from an English
description of the system and its desired behavior. Once the initial draft
is ready, we enter a verification-modification cycle to refine and
validate the specification.
In each iteration of the cycle, a model checker is used to verify the
specification. If the model checker confirms the specification is correct,
the process is complete. However, if the specification is found to violate
the desired behavior, the model checker provides a counterexample
demonstrating the issue. This counterexample is then submitted to the
generative AI tool to generate a revised version of the specification
aimed at resolving the identified problem. The cycle repeats until the
model checker verifies that the specification satisfies all desired
properties, ensuring correctness.
Example: Mutual Exclusion
Let’s apply the above procedure to develop a correct specification that
satisfies a simple mutual exclusion property.
Initial Specification
Spec 1 shows my initial specification. In this specification, we have a race condition between two
concurrent processes A and B over in_critical_section variable. Each
process has an assert statement that verify that when the process wants to
access the variable the other process is not accessing the variable.
Spec 1. Initial specification for mutual exclusion. The failure of
the assertion means the mutual exclusion is violated.
Verification-Modification Cycle
The current specificaiton clearly violates mutual exclusion, as the two
process may access the in_critical_section at the same time. When I run the TLC on my spec I get this error:
Figure 2. The initial spec violate the mutual exclusion assertion |
Now, let's use ChatGPT to fix the spec.
Prompt:
The following is my PlusCal specification. Currently, the assertion fails
with this counterexample
2: A_1 in mutual_exclusion
>>
in_critical_section M
"A"
pc (2)M
<<"A_2", "B_1">>
1 M
"A_2"
2
"B_1"
Fix my spec!
ChatGPT Response:
The issue in your PlusCal specification lies in the lack of mutual
exclusion enforcement. The in_critical_section variable is used to
detect when a process is already in the critical section, but there is
no mechanism to prevent both processes from entering the critical
section simultaneously.
Peterson's algorithm typically employs two flags and a turn variable to
enforce mutual exclusion. Here's how to fix your specification to
implement Peterson's algorithm correctly:
Spec 2 . TLA+ specification generated by ChatGPT after fixing my
initial specification
Note that the ChatGPT generated spec was missing A_3 and B_3 labels shown in
Spec 2. After fixing this minor issue. I ran the TLC, and this time it
passed the model checking. Figure 3 shows the output.
Figure 3. TLC output after revising the specification with
ChatGPT
|
As demonstrated in ChatGPT's response, the well-known Peterson's algorithm was employed to achieve mutual exclusion. I intentionally selected a straightforward problem to illustrate the concept concisely in this blog post. However, a more compelling example would involve generating and verifying a solution to more complex problem—one that lacks solutions in the training data, such as those not commonly found on the web.
Note that having a correct TLA+ specification does not mean we are done. After having the spec, we have to code it in a practical programming language such as C++/Java where we may introduce implementation bugs. Having a correct TLA+ spec, however, we can be sure of the correctness of our high-level design.
Conclusion
In this post, I talked about genefication, an approach to combine generative AI with formal verification to generate correct specifications. I used
an example of creating a simple specification for mutual exclusion. The same process can be used for more complicated cases.
Future Work and Collaboration Opportunity
In this post, I manually executed the genefication proccess shown in
Figure 1. An interesting future work is developing a tool to
automate this process. If you're interested in collaborating to build such a tool, feel free to reach out to me on
LinkedIn.
Comments
Post a Comment