Genefication: Generative AI + Formal Verification

Generative AI is a powerful tool that can significantly enhance developer productivity by generating code quickly and efficiently. However, a major challenge when using generative AI is ensuring the correctness of the generated code, especially in critical systems where errors can have significant consequences. I have heard of companies dealing with critical systems that have banned the use of any code generation technologies, including generative AI, to minimize the risk of introducing bugs. In addition, there are reports suggesting that the use of AI coding assistants has led to a 40% increase in bug rates.

A promising way to address the correctness challenges of AI-generated code, particularly in the context of high-level distributed system design, is to combine generative AI with formal verification. This approach could be referred to as Genefication, a term proposed here to capture the synergy between the two technologies. It involves using generative AI to draft code or specifications, then applying formal verification to rigorously ensure that the design satisfies critical safety and correctness properties. This creates a workflow that merges the efficiency of AI with the dependability of formal methods. In this post, I explore the interplay between these technologies, with TLA+ serving as the formal specification language and ChatGPT generating the specifications.

TLA+ and PlusCal

TLA+ is a formal specification language that allows you to precisely describe the behavior of your system or program. Once you have written a specification, you can use a model checker, such as TLC, to exhaustively explore all possible behaviors and verify the correctness of your design. TLA+ specifications use mathematical symbols and expressions. For programmers, PlusCal offers a more intuitive approach, as its syntax resembles programming languages like C++ or Java. You can write a PlusCal specification, translate it automatically into TLA+, and use TLC to model check it.
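As a quick taste of the notation, here is a minimal, hypothetical PlusCal algorithm (the module, process, and label names are my own, purely for illustration) in which two processes each increment a shared counter; the translator expands the commented algorithm into TLA+ that TLC can then check:

```tla
---- MODULE Counter ----
EXTENDS Integers

(* --algorithm counter
variable x = 0;

\* Two processes, each adding 1 to the shared variable x.
process Inc \in 1..2
begin
  Add:
    x := x + 1;
end process;

end algorithm; *)

\* The translator inserts the generated TLA+ here,
\* between BEGIN TRANSLATION and END TRANSLATION markers.
====
```

Running the translator (for example, from the TLA+ Toolbox) produces the TLA+ next-state relation, and TLC then exhaustively explores all interleavings of the two Add steps.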

TLA+ is particularly well-suited for verifying the correctness of distributed algorithms. It helps ensure that your design satisfies critical safety and liveness properties under all possible executions, even in scenarios with unusual timings or failures common in distributed systems. By identifying design flaws early, TLA+ provides confidence in the reliability of your system before implementation begins.

To learn more about the application of TLA+ and formal methods in building distributed systems, I encourage you to read Formal Methods: Just Good Engineering Practice? by Marc Brooker. You might also want to check out some Practical Tips on using TLA+ and P that I shared on this blog before.

Genefication

Figure 1 presents the flowchart illustrating the integration of generative AI tools with formal verification—an approach proposed here as Genefication. The genefication process begins with an initial or partial specification, which can be drafted manually by the developer or generated by a generative AI tool, such as an LLM, from an English description of the system and its desired behavior. Once the initial draft is ready, we enter a verification-modification cycle to refine and validate the specification.

In each iteration of the cycle, a model checker is used to verify the specification. If the model checker confirms the specification is correct, the process is complete. However, if the specification is found to violate the desired behavior, the model checker provides a counterexample demonstrating the issue. This counterexample is then submitted to the generative AI tool to generate a revised version of the specification aimed at resolving the identified problem. The cycle repeats until the model checker verifies that the specification satisfies all desired properties, ensuring correctness.

Figure 1. Genefication Flowchart

Example: Mutual Exclusion

Let’s apply the above procedure to develop a correct specification that satisfies a simple mutual exclusion property.

Initial Specification

Spec 1 shows my initial specification. In this specification, we have a race condition between two concurrent processes, A and B, over the in_critical_section variable. Each process has an assert statement that verifies that, when the process wants to access the variable, the other process is not already accessing it.

Spec 1. Initial specification for mutual exclusion. A failure of the assertion means that mutual exclusion is violated.
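The original post shows Spec 1 as an image. Based on the counterexample that appears later in the post, a buggy specification in this spirit might look like the following PlusCal sketch (identifiers such as in_critical_section and the A_1/B_1 labels are inferred from the trace; the actual Spec 1 may differ):

```tla
(* --algorithm mutual_exclusion
\* "" means no process is in the critical section.
variable in_critical_section = "";

process A = "A"
begin
  A_1:
    \* Fails if B is already in the critical section.
    assert in_critical_section /= "B";
    in_critical_section := "A";
  A_2:
    in_critical_section := "";
end process;

process B = "B"
begin
  B_1:
    \* Fails if A is already in the critical section.
    assert in_critical_section /= "A";
    in_critical_section := "B";
  B_2:
    in_critical_section := "";
end process;

end algorithm; *)
```

Nothing prevents A_1 from executing and then B_1 firing before A leaves: A enters first, and B's assertion then fails, which is exactly the kind of interleaving TLC finds.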

Verification-Modification Cycle

The current specification clearly violates mutual exclusion, as the two processes may access in_critical_section at the same time. When I run TLC on my spec, I get this error:

Figure 2. The initial spec violates the mutual exclusion assertion


Now, let's use ChatGPT to fix the spec. 

Prompt: The following is my PlusCal specification. Currently, the assertion fails with this counterexample:

2: A_1 in mutual_exclusion
/\ in_critical_section = "A"
/\ pc = <<"A_2", "B_1">>

Fix my spec!

ChatGPT Response
The issue in your PlusCal specification lies in the lack of mutual exclusion enforcement. The in_critical_section variable is used to detect when a process is already in the critical section, but there is no mechanism to prevent both processes from entering the critical section simultaneously.

Peterson's algorithm typically employs two flags and a turn variable to enforce mutual exclusion. Here's how to fix your specification to implement Peterson's algorithm correctly:

Spec 2. TLA+ specification generated by ChatGPT after fixing my initial specification
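Spec 2 also appears as an image in the original post. A Peterson-style PlusCal sketch along the lines ChatGPT describes, using the two flags and the turn variable (the label and helper names here are illustrative, not ChatGPT's exact output), would be:

```tla
(* --algorithm peterson
variables
  flag = [p \in {"A", "B"} |-> FALSE],  \* flag[p]: p wants to enter
  turn = "A";                           \* which process must wait

define
  Other(p) == IF p = "A" THEN "B" ELSE "A"
end define;

process Proc \in {"A", "B"}
begin
  P_1:  \* announce intent to enter
    flag[self] := TRUE;
  P_2:  \* politely give the other process the turn
    turn := Other(self);
  P_3:  \* wait until the other is not interested, or it is our turn
    await flag[Other(self)] = FALSE \/ turn = self;
  CS:   \* critical section: mutual exclusion holds here
    skip;
  P_4:  \* leave the critical section
    flag[self] := FALSE;
end process;

end algorithm; *)
```

The key point is that setting the flag, yielding the turn, and the await are separate labeled steps, so TLC explores every interleaving and can confirm that both processes are never at CS simultaneously.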

Note that the ChatGPT-generated spec was missing the A_3 and B_3 labels shown in Spec 2. After fixing this minor issue, I ran TLC, and this time it passed the model checking. Figure 3 shows the output.

Figure 3. TLC output after revising the specification with ChatGPT

As demonstrated in ChatGPT's response, the well-known Peterson's algorithm was employed to achieve mutual exclusion. I intentionally selected a straightforward problem to illustrate the concept concisely in this blog post. However, a more compelling example would involve generating and verifying a solution to a more complex problem, one that lacks solutions in the training data, such as problems not commonly found on the web.

Note that having a correct TLA+ specification does not mean we are done. After writing the spec, we still have to implement it in a practical programming language such as C++ or Java, where we may introduce implementation bugs. With a correct TLA+ spec, however, we can be confident in the correctness of our high-level design.

Conclusion

In this post, I talked about genefication, an approach that combines generative AI with formal verification to produce correct specifications. I used the example of creating a simple specification for mutual exclusion. The same process can be applied to more complicated cases.

Future Work and Collaboration Opportunity

In this post, I manually executed the genefication process shown in Figure 1. An interesting direction for future work is developing a tool to automate this process. If you're interested in collaborating to build such a tool, feel free to reach out to me on LinkedIn.
