Model-based Testing Distributed Systems with P Language
P is the name of a domain-specific language for asynchronous event-driven programming. With P, we can specify a system as a set of interacting state machines which talk to each other by sending events. That makes P a suitable language for modeling distributed systems where nodes talk to each other asynchronously via messages. P originally developed by Microsoft and according to its Github repository, it is currently being used extensively inside Amazon AWS for model-checking complex distributed systems. In this post, we want to see how we can use P to catch bugs in our protocols.
The word P is for protocols [2]. Although one of the main goals of P is to integrate the model-checking and code generation, in this post, we only focus on the model-checking part. It is important to note that unlike other model-checking tools for distributed systems like TLA+, P is not designed for exhaustive model-checking that we use to make sure of the correctness of our protocol in every possible execution. Instead, P is designed for model-based testing. Thus, P cannot be used to prove that a given protocol is correct, and just like any other testing method, its goal is actually to prove that our system is incorrect by finding bugs.
Outline
- Concepts
- Interacting State Machines
- Handling Events
- Call Transitions
- Non-determinism
- Modeling Faults
- Modeling the Environment
- Writing P Programs
- Examples
Concepts
Interacting State Machines
Handling Events
Call Transitions
Sketch 2. An example of a call transition. When the state machine enters state Working, it does not leave state Waiting. Thus, it is still responsive to PING events. |
Non-determinism
if ($) {//code for going up} else {//code for going down}
if ($$) {//code for going up} else {//code for going down}
Modeling Faults
Sketch 3. Adding a fault injector machine to the protocol shown in Sketch 2. The fault injector machine non-deterministically chooses one of the machines and sends a RESTART event to it. |
Modeling the Environment
While checking our protocol, often time, we also need to model the
context/environment where our protocol executes. For example, we may
need to model clients or third-party systems. In P, we model those
also as state machines, and they interact with the program state
machines like other state machines.
Sketch 4 adds the timeout mechanism to the protocol shown in Sketch 3. When the main machine transitions to state Waiting, it sends a REQ event to the timer machine. Upon receiving this REQ event, the timer machine transitions to the TimerOn state. In state TimerOn, the timer machine spontaneously sends TIMEOUT events to the main machine and also raises the event which will cause itself to transition back to the Waiting state. Upon receiving a TIMEOUT event, the main machine stops waiting for the worker machines, and transitions back to the SendRequests state.
Specifying Requirements
Temperature and Specifying Liveness
Liveness or safety?
Writing P Programs
Types
Primitive Types
Enums
enum STATUS {SUCCESS,ERROR}
Tuples
var myTuple : (Type1, Type2, Type3);myTuple.0 //an instance of Type1myTuple.1 //an instance of Type2myTuple.2 //an instance of Type3
Structs
var myTuple : (f1: Type1, f2: Type2, f3: Type3);myTuple.f1 //an instance of Type1myTuple.f2 //an instance of Type2myTuple.f3 //an instance of Type3
Sequences
var mySeq : seq[Type1];
mySeq += (index, value)
mySeq -= index
mySeq[i]
To get the size:
sizeof(mySeq)
sizeof(mySeq)
To clear:
mySeq = default(seq[int])
mySeq = default(seq[int])
Maps
var myMap: map[Type1, Type2];
myMap[key] = value//ormyMap += (key, value)
myMap -= key
sizeof(myMap)
To clear:
myMap = default(map[Type1, Type2])
myMap = default(map[Type1, Type2])
Membership Check
if (mykey in myMap) {}
Custom Types
type Message = (value: string, timestamp: int);
Events
event MessageReceived: Message;
event MessageReceived assert N: Message;
UnhandledEvent Exception
halt Event
send MachineTobeHalted, halt;
null Event
on null do { send requester_machine, TIMEOUT, term; raise TIMEOUT, term; }
State Machines
The structure of a state machine is:annotation machine StateMachineName {
- variable defintions
- state definitions
}
line 6:0 extraneous input 'main' expecting {<EOF>, 'enum', 'event', 'eventset', 'machine', 'interface', 'fun', 'spec', 'type', 'module', 'implementation', 'test'}
test Test0 [main=SM1]: {SM1, SM2, SM3, SM4,...};
var myVariable : int;
annotation state StateName {
- entry {}
- defer
- ignore
- event handlers
- exit {}
}
entry (payload: PayLoadType) {}
defer event1, event2, ...;
ignore event1, event2, ...;
on event1 do (payload: PayLoadType) {//code for handling event1}on event2 do (payload: PayLoadType) {//code for handling event2}
Local Variables
entry (writeResp: tWriteTransResp) { var i: int; i = 0;//rest of the code block}
new
We can create a new instance of a machine using
the new keyword:
server = new Server();
We can also pass arguments to the constructor. These arguments will
be received by the entry block of the start state.
server = new Server();
this
return
entry (writeResp: tWriteTransResp) { // if the write was a time out lets not confirm it if(writeResp.status == TIMEOUT)return;//rest of the code block}
If
if (condition) {} else {}
Non-determinism
if ($) {//option 1} else {//option 2}
choose (mySeq) //non-deterministically choose one of the entries of mySeq
choose (5) //choose an int in [0-4]
Fair Non-determinism
if ($$) {//option 1} else {//option 2}
While Loop
while (condition) {}
goto
goto StateName, payload;
goto [with {}]
on EventName goto StateName
Optionally, we can provide a code block to be executed before transitioning to the destination:
on EventName goto StateName with {}
push/pop
on EventName push StateName;
send
send stateMachineInstance, EventName, eventPayload
send stateMachineInstance, EventName, (value = "my message", timestamp = 123456)
send chooose(seqOfStateMachineInstances), EventName, eventPayload
raise
raise EvenName [, payload];
state State1 {entry {raise SUCCESS;}on SUCCESS goto State2;}state State2 {on BEAT goto State1;}
state State1 {entry {goto State2;}}state State2 {on BEAT goto State1;}
state State1 {entry {send this, SUCCESS;}on SUCCESS goto State2;}state State2 {on BEAT goto State1;}
announce
Functions
fun myFunction (param1: Type1, param2: Type2) [: return_type] {}
Foreign functions
Functions without body are foreign functions. Their body will be implemented outside in C#.
fun ChooseTransaction(): tRecord;
print format
print format("Message value is {0} and its timestamp is {1}", myMessage.value, myMessage.timestamp)
assert
assert condition, format("message")
receive case
It causes the current code block to await an event.
receive { case eventName1 : (payload: PayLoadType) {
}
case eventName2 : (payload: PayLoadType) { }}
receive {case eventName1 : (payload: PayLoadType) {}case eventName2 : (payload: PayLoadType) {}}
Safety and Liveness Properties
Specification Machines
spec SpecMachineName observers EventName1, EventName2 {
- variable defintions
- state definitions
}
Enforcing Safety
spec MySafetyProperty observes M_PING, PONG { var pending: map[machine, int]; start state Init { on M_PING do (payload: machine) { if (!(payload in pending)) pending[payload] = 0; pending[payload] = pending[payload] + 1; assert (pending[payload] <= 3); }on PONG do (payload: machine) { assert (payload in pending); assert (0 < pending[payload]); pending[payload] = pending[payload] - 1; } } }
Enforcing Liveness
spec MyLivenessProperty observes M_START, OBS_EVENT_1, OBS_EVENT_2, ... { //variable declaration start state Init { on M_START goto Wait; } hot state Wait { //hot means it is not a good state to remain at forever entry { //initialized your vars } on OBS_EVENT_1 do {//suppose here the desired property is satisfied:raise UNIT;}on OBS_EVENT_2 do {} on UNIT goto Done; //leaving the hot state. } state Done { } //since it is not hot here, it is a good state to remain at forever }
Examples
HelloWorld
machine HelloWorld {start state Init {entry {assert (1==2), format("Hello {0}", "World");}}}test Test0 [main=HelloWorld]: {HelloWorld};
SimpleClientServer
- defining events,
- instantiating new instances of a machine with the new operator,
- using a receive case statement,
- using raise and goto, and
- finally, using functions.
event EVENT1: int;event Request: machine;event Response: int;machine Client {var myMap : map[string, int];var server: machine;start state Init {entry {server = new Server();send server, Request, this;receive {case Response: (payload: int) {raise EVENT1, payload;}}}on EVENT1 do (payload: int) {goto State2, testFunction(payload);}}state State2 {entry (payload: int){assert (1==2), format ("received EVENT1 with param {0}", payload);}}}machine Server {start state init {on Request do (payload: machine){send payload, Response, 34;}}}fun testFunction (num: int) :int {return num;}test Test0 [main=Client]: {Client, Server};
WorkerMachines
event REQ;event WORK_DONE;event ALL_WORK_DONE;event REQ_DONE;machine MainMachine {var workers : seq[machine];var workers_num: int;var received_num: int;start state Init {entry {var i: int;workers_num = 10;i = 0;while (i < workers_num) {workers += (i, new Worker(this));i = i + 1;}goto SendRequests;}}state SendRequests {entry {var i: int;received_num = 0;i = 0;while (i < workers_num) {send workers[i], REQ;i = i + 1;}send this, REQ_DONE;}on REQ_DONE goto Waiting;}state Waiting {on WORK_DONE do {received_num = received_num + 1;assert received_num <= workers_num, format ("unexpected number of WORK_DONES: max {0}, but received {1}", workers_num, received_num);if (received_num == workers_num) {send this, ALL_WORK_DONE;}}on ALL_WORK_DONE goto SendRequests;}}machine Worker {var requester_machine: machine;start state Init {entry (id: machine){requester_machine = id;goto Waiting;}}state Waiting {on REQ goto Working;}state Working {entry {send requester_machine, WORK_DONE;goto Waiting;}}}
<ExceptionLog> PImplementation.MainMachine(2) running action '' in state 'SendRequests' threw exception 'UnhandledEventException'. <ErrorLog> PImplementation.MainMachine(2) received event 'PImplementation.DONE' that cannot be handled.
state SendRequests {entry {var i: int;received_num = 0;i = 0;while (i < workers_num) {send workers[i], REQ;i = i + 1;}send this, REQ_DONE;}on REQ_DONE goto Waiting;on WORK_DONE do {received_num = received_num + 1;}}
WorkerMachines with Call Transitions
event REQ;event WORK_DONE;event CALL_DONE;event ALL_WORK_DONE;event REQ_DONE;event PING;event PONG;machine MainMachine {var workers : seq[machine];var workers_num: int;var received_num: int;start state Init {entry {var i: int;workers_num = 10;i = 0;while (i < workers_num) {workers += (i, new Worker(this));i = i + 1;}goto SendRequests;}}state SendRequests {entry {var i: int;received_num = 0;i = 0;while (i < workers_num) {send workers[i], REQ;i = i + 1;}send this, REQ_DONE;}on REQ_DONE goto Waiting;on WORK_DONE do {received_num = received_num + 1;}ignore PONG;}state Waiting {on WORK_DONE do {received_num = received_num + 1;assert received_num <= workers_num, format ("unexpected number of WORK_DONES: max {0}, but received {1}", workers_num, received_num);if (received_num == workers_num) {send this, ALL_WORK_DONE;}}on ALL_WORK_DONE goto SendRequests;on null do {send workers[0], PING;}ignore PONG;}}machine Worker {var requester_machine: machine;start state Init {entry (id: machine){requester_machine = id;goto Waiting;}}state Waiting {on REQ push Working;on PING do {send requester_machine, PONG;}on WORK_DONE do {send requester_machine, WORK_DONE;}}state Working {entry {send this, CALL_DONE;}on CALL_DONE do {raise WORK_DONE;}}}
WorkerMachines - Faulty
event REQ;event WORK_DONE;event ALL_WORK_DONE;event REQ_DONE;event RESTART;event PUSH_START;event INJECT_FAULT;machine FaultInjector {var machines : seq[machine];var fault_count: int;var max_faults_num: int;start state Init {entry (targets: seq[machine]){var i : int;fault_count = 0;max_faults_num = 1000;i=0;while (i < sizeof(targets)){machines += (i ,targets[i]);i = i + 1;}goto InjectFaults;}}state InjectFaults {entry {send this, INJECT_FAULT;}on INJECT_FAULT do {var m : machine;m = choose (machines);send m, RESTART;fault_count = fault_count + 1;if (fault_count < max_faults_num)send this, INJECT_FAULT;}}}machine MainMachine {var workers : seq[machine];var workers_num: int;var received_num: int;start state Init {entry {var i: int;var targets : seq[machine];targets += (0, this);workers_num = 10;i = 0;while (i < workers_num) {workers += (i, new Worker(this));targets += (i+1, workers[i]);i = i + 1;}new FaultInjector(targets);raise RESTART;}on RESTART do {received_num = 0;raise PUSH_START;}on PUSH_START push SendRequests;}state SendRequests {entry {var i: int;received_num = 0;i = 0;while (i < workers_num) {send workers[i], REQ;i = i + 1;}send this, REQ_DONE;}on REQ_DONE goto Waiting;on WORK_DONE do {received_num = received_num + 1;}}state Waiting {on WORK_DONE do {received_num = received_num + 1;assert received_num <= workers_num, format ("unexpected number of WORK_DONES: max {0}, but received {1}", workers_num, received_num);if (received_num == workers_num) {send this, ALL_WORK_DONE;}}on ALL_WORK_DONE goto SendRequests;ignore REQ_DONE; //bug fix}}machine Worker {var requester_machine: machine;start state Init {entry (id: machine){requester_machine = id;raise RESTART;}on RESTART push Waiting;}state Waiting {on REQ goto Working;}state Working {entry {send requester_machine, WORK_DONE;raise WORK_DONE;}on WORK_DONE goto Waiting;}}
<ErrorLog> Assertion Failed: unexpected number of WORK_DONES: max 10, but received 11
WorkerMachines - Fault-tolerant
event REQ: int;event WORK_DONE: int;event ALL_WORK_DONE: int;event REQ_DONE: int;event RESTART;event PUSH_START;event INJECT_FAULT;machine FaultInjector {var machines : seq[machine];var fault_count: int;var max_faults_num: int;start state Init {entry (targets: seq[machine]){var i : int;fault_count = 0;max_faults_num = 10;i=0;while (i < sizeof(targets)){machines += (i ,targets[i]);i = i + 1;}goto InjectFaults;}}state InjectFaults {entry {send this, INJECT_FAULT;}on INJECT_FAULT do {var m : machine;m = choose (machines);send m, RESTART;fault_count = fault_count + 1;if (fault_count < max_faults_num)send this, INJECT_FAULT;}}}machine MainMachine {var workers : seq[machine];var workers_num: int;var received_num: int;var term : int;start state Init {entry {var i: int;var targets : seq[machine];term = 0;targets += (0, this);workers_num = 10;i = 0;while (i < workers_num) {workers += (i, new Worker(this));targets += (i+1, workers[i]);i = i + 1;}new FaultInjector(targets);raise RESTART;}on RESTART do {term = term + 1 ;received_num = 0;raise PUSH_START;}ignore ALL_WORK_DONE;on PUSH_START push SendRequests;}state SendRequests {entry {var i: int;received_num = 0;i = 0;while (i < workers_num) {send workers[i], REQ, term;i = i + 1;}send this, REQ_DONE, term;}on REQ_DONE do (t: int) {if (t == term)goto Waiting;}on WORK_DONE do (t: int) {if (t == term) //bug fix (2)received_num = received_num + 1;}ignore ALL_WORK_DONE;}state Waiting {on WORK_DONE do (t: int) {if (t == term) {received_num = received_num + 1;assert received_num <= workers_num, format ("unexpected number of WORK_DONES: max {0}, but received {1}", workers_num, received_num);if (received_num == workers_num) {send this, ALL_WORK_DONE, term;}}}on ALL_WORK_DONE do (t: int){if (t == term)goto SendRequests;}ignore REQ_DONE; //bug fix (1)}}machine Worker {var requester_machine: machine;start state Init {entry (id: machine){requester_machine = id;raise RESTART;}on RESTART push Waiting;}state Waiting {on REQ goto Working;}state Working {entry (t: int) {send requester_machine, WORK_DONE, t;raise WORK_DONE, t;}on WORK_DONE goto Waiting;}}
WorkerMachines with Timeout
machine Timer {var requester_machine: machine;var term: int;start state Init {entry (id: machine){requester_machine = id;goto Waiting;}}state Waiting {on REQ goto TimerOn;}state TimerOn {entry (t: int){term = t;}on null do {send requester_machine, TIMEOUT, term;raise TIMEOUT, term;}on TIMEOUT goto Waiting;ignore REQ;}}
state Waiting {entry {send timer, REQ, term;}on WORK_DONE do (t: int) {if (t == term) {received_num = received_num + 1;assert received_num <= workers_num, format ("unexpected number of WORK_DONES: max {0}, but received {1}", workers_num, received_num);if (received_num == workers_num) {send this, ALL_WORK_DONE, term;}}}on ALL_WORK_DONE do (t: int){if (term == t && term < max_term){goto SendRequests;}on TIMEOUT do (t: int){if (term == t){raise RESTART;}}ignore REQ_DONE; //bug fix (1)}
WorkerMachines -- Checking Liveness
spec Liveness observes REQ, ALL_WORK_DONE {start cold state Init {on REQ goto Waiting;}hot state Waiting {on ALL_WORK_DONE goto Init;ignore REQ;}}
test Test0 [main=MainMachine]: assert Liveness in {MainMachine, Worker, ...};
<ErrorLog> PImplementation.Liveness detected liveness bug in hot state 'Waiting_2' at the end of program execution.
state Waiting {entry { //Liveness bug fixif (received_num == workers_num) {send this, ALL_WORK_DONE, term;}}...
Model-Checking P Programs
Configuring P Projects
Following the official tutorial examples, we can organize our P
project files this way:
-
PSrc: the source code of our state machines
describing our protocol.
-
PSpec: the source code for our specification state
machines defining our desired requirements.
-
PTst: the code of the test driver. It basically
describes the deployment and properties to be
model-checked.
-
Foreign code written in C#.
-
ProjeName.pproj file that describes the project
files.
Following is an example for a .pproj file:
<Project>
<ProjectName>TwoPhaseCommit</ProjectName>
<InputFiles>
<PFile>./PSrc/</PFile>
<PFile>./PSpec/</PFile>
<PFile>./PTst/</PFile>
</InputFiles>
<Target>CSharp</Target>
<OutputDir>./PGenerated/</OutputDir>
</Project>
- PSrc: the source code of our state machines describing our protocol.
- PSpec: the source code for our specification state machines defining our desired requirements.
- PTst: the code of the test driver. It basically describes the deployment and properties to be model-checked.
- Foreign code written in C#.
- ProjeName.pproj file that describes the project files.
<Project> <ProjectName>TwoPhaseCommit</ProjectName> <InputFiles> <PFile>./PSrc/</PFile> <PFile>./PSpec/</PFile> <PFile>./PTst/</PFile> </InputFiles> <Target>CSharp</Target> <OutputDir>./PGenerated/</OutputDir> </Project>
Writing Test Scripts
test Test0 [main=MainMachine]:assert SpecMachine1, SpecMachine 2, ... in {MainMachine, Worker};
Using the Compiler and Model-checker
pc -proj:ProjectName.pproj
function pmc { dotnet %USERPROFILE%\.dotnet\tools\microsoft.coyote\1.0.5\lib\netcoreapp3.1\coyote.dll test $args}
coyote.exe test [path to my generated dll] -i 10000 --coverage activity -m PImplementation.Test0.Execut
Summary
References
[2] P: A Domain-Specific Language for Asynchronous Event-Driven Programming [video], https://www.youtube.com/watch?v=qjwrPMwNrFo
[3] P Github repository, https://github.com/p-org/P
[4] P Examples, https://github.com/roohitavaf/P/tree/master/Tutorial
Comments
Post a Comment