This assignment is to complete the high-level design of a directory-based multiprocessor cache coherence protocol, while using Murphi to verify the design.
A completed solution will consist of the following:
I encourage you to discuss this problem and your solutions with others, but I don't want you to share code. So don't borrow from the text of others solutions.
This is hard. Start as soon as you can.
Please direct all questions to me (call 5-3642, drop by, or email me at "dill@cs.stanford.edu").
If there are few processors, one solution is to have all of them "snoop" the memory bus, so everyone can see what everyone else is doing. If a processor asks for a writable copy of a memory value, other processors that have readable copies can invalidate them immediately, because they see the request from the first processor.
In a large multiprocessor, it becomes too expensive to broadcast all requests to all processors (which are no longer sharing a common bus). Instead, a directory-based protocol sends messages to only those processors who need to know about a transaction. For example, when a processor asks for a writable copy, only those processors that have read-only copies are notified (so they can invalidate their copies). To do this, the multiprocessor needs to know which processors have copies of each memory location. This information is kept in a directory. (In reality, "memory locations" are multi-word blocks of memory, called "cache lines" when they are in the cache.)
These protocols can be very tricky, especially when they are optimized. Murphi has been used to verify a number of them, and now is pretty well suited for the task. Your assignment is to take a core protocol and add some missing cases to it, with optimizations.
Each memory location in the cache has some associated state information. It can be invalid (I, meaning the location is not cached), read-only (RO), meaning that the value can be read but not written, or writable (W), meaning that the value can be updated. There are additional states that keep track of transactions in progress; for example, Wait_Read means that the cache has sent out a request for a readable copy of an address, but hasn't received it yet.
When a writable value is updated, it is not automatically written back to the main memory, so the value in the cache may be newer than the main memory.
The main memory has a copy of each memory location and its value. In addition, there is a directory entry for each memory location. The directory entry has a state, which says whether the location is writable in some cache (the state is called Excl, since that cache should have exclusive access to the location), readable (the Shared state), or not cached (Invalid). In addition, the directory entry has a field called "users" holds the set of processor ids that have the location cached read-only, represented as a bit-vector, and a field called "owner," which is the id of the latest processor to have an exclusive copy (I would like to have avoided the "owner" field, but I couldn't find an efficient and correct protocol without it, given the other decisions I made).
This protocol is strongly influenced by the nature of the communications network that handles the messages. It is reliable (no messages are lost or duplicated), but it provides no guarantees about ordering. Two messages can be sent by the same processor, and the first can be received long after the second. This assumption would be appropriate if the communication network could route consecutive messages differently. (Many multiprocessors assume message ordering will be preserved, though.)
Unfortunately, this is a hard problem. Instead, we will specify some simple invariant conditions which will catch many errors. The most important conditions are assertions and error messages about illegal combinations of states (in the processors or directory) and messages. There should also be global invariants to check that there is no more than one exclusive copy of an address at any time, and that there are never both shared and exclusive copies, that the directory state correctly represents the states of the caches, etc.
Another major limitation of Murphi is that it can't check "liveness" properties. For example, we would like to be able to prove that every transaction eventually completes, but there is no way to do that. Murphi can check for deadlocks, whether the protocol gets totally stuck (you'll probably have some of these), but it will miss problems where a message is ping-ponged back and forth forever, or part of the system freezes while the rest keeps going, even though some activity will fail to make further progress. The problem is that these conditions can be fatal design errors. They can also hide other problems: e.g., the protocol would have fail catastrophically, but it didn't get to that point because something else got stuck.
The assignment is to do the following:
Work on one transaction at a time, until you get it working.
Think of Murphi as an astute critic who can always find the example where your protocol doesn't work. Fixing it by trial and error only works for minor problems. At several points, you should stop and re-think using pencil and paper.
At first, verify with the smallest scaling parameters you can get away with. There are several advantages to this:
You also need to run large instances of the description to uncover errors that happen, say, with 3 processors instead of 2. You should try this as soon as you can (I found such a bug that caused me to rewrite a lot of the protocol). Then go back to the smaller scaling parameters when you are revising the protocol.
Even when you think you understand an error, look through the counterexample carefully (at least, whenever you get a new error). You will probably see a number of weird things that will increase your understanding of the protocol, uncover conditions you need to check, or reveal bugs that you need to fix.
More on the previous point: remember that Murphi doesn't check every property your protocol should have. It can't check for "livelock", where part of the system is making no progress but there is some other activity that prevents it from being a deadlock. It can't detect most performance problems in your protocol. You need to keep thinking, and not rely on Murphi for everything. Specification suggestions:
Don't use the multiset type for anything except the message network. There are a number of surprising restrictions that will cause you problems.
Murphi sometimes reports "internal error". This looks like a system problem but usually isn't. Run-time type checks, bounds checks, etc. are treated like invariants, and Murphi is reporting that they are violated. One such message is something about the network overflowing. If your protocol goes berserk and generates lots of messages, or if you forget to remove messages from the network after processing them, you get this problem (or if you make your network too small).
I've gotten it running on the saga systems. I will make this publically accessible right after the lecture on Monday.
Murphi sources are in ~dill/Murphi3.1 on DCC (e.g. epic) machines. It only runs on Solaris at the moment.
~dill/list5.mto your directory. Run
~dill/mu list5.mRun
g++ -I ~dill/murphi_include list5.CIgnore the warning about an abstract declarator.
Run
a.out -tfCheck
a.out -hfor more information about options.
Core description of the protocol
Copyright © by David L. Dill, 2001, 2002, 2003, 2004. All rights reserved. This material may be freely distributed for educational purposes, so long as (1) this copyright notice remains, (2) the notes are not modified in any way, and (3) no fee is charged. Other uses require the express written consent of the authors.