Here are some preliminary benchmarks for running a model checking application implemented in Haskell (GHC) on Intel's Manycore-Testing-Lab. The diagrams show the relative speed-ups when running the program compiled for multi-threading compared to the same program compiled with the non-threaded GHC runtime system.
These benchmarks were run as a quick test of how much speed-up is possible for our application. We were just trying to pick the low hanging fruit, here. The project under consideration, consists of ca 9000 lines of Haskell code and it can easily make up something in the order of one working year or more. (This is hard to estimate, but this is not a toy program) On the other hand, extending our program for multi-threading took less than three days.
Given how easy it was to parallelize the program, the achieved speed-ups are very reasonable. For our application, it is fair to say that Haskell and parallelism work together really well.
- The model checker enumerates the state space of a specification. The expensive part is to compute the transition relation, i.e. how to move from one state to the next. Computing the transitions of several states can easily run in parallel. To parallelize the model checker we just had to replace the depth-first traversal of the state space with a breadth-first traversal and add some small annotations.
- The breadth-first traversal can only provide speed-ups if the state space of the specification offers enough branching. The specifications that were selected for the diagrams all have highly branching state spaces. We did not see much speed-up for single threaded specifications. On the other hand, the state explosion problem, i.e. highly branching state spaces is exactly what makes model checking expensive.
- In our application the transition relation can naturally be modeled as a pure function and indeed the specification language underlying our model checker (CSP-M) is itself a pure functional programming language. When writing the model checker we did anticipate that using pure functions would facilitate parallelization.
- Our model checker is a niche application and it is hard to estimate the base performance of the program. Still, we have compared the performance of our program with two competitors and it overall seems to be reasonable. It is not the case that we start with an extra slow program in first place.
The underlying formalism of our model checker is called CSP (Communicating Sequential Processes). The heart of the model checker is an interpreter for a programming language called CSP-M (machine readable CSP), which is basically a syntax for CSP extended with a functional sub-language.
Currently there exist four different tools which implement CSP-M.
The example specifications are the hanoi puzzle, a specification of a rail road crossing and a specification of a scheduler (Note: The specifications are NOT Haskell syntax, but CSP-M).
The benchmarks were run for GHC-7.0.20101021. The implementation uses so called semi-explicit parallelism. Just to show some source code, here is the actual parallel breadth-first search, that was written for the benchmark:
mkLtsPar :: Sigma INT-> Process INT -> LTS
mkLtsPar events process
= wave [mkLtsNode process] Map.empty
wave :: [LtsNode] -> LTS -> LTS
wave  lts = lts
wave w lts = wave (Set.toList uniqueProcesses) newLts
!transitions = parRules $ map processNext w
processes = concatMap (\(_,_,r) -> r) transitions
processNext :: LtsNode -> (LtsNode, [Rule INT], [LtsNode])
processNext p = (p, rules, map (mkLtsNode . viewProcAfter) rules)
where rules = computeTransitions events $ nodeProcess p
!newLts = List.foldl' insertTransition lts transitions
insertTransition :: LTS -> (LtsNode, [Rule INT], [LtsNode]) -> LTS
insertTransition l (p, rules, _) = Map.insert p rules l
uniqueProcesses :: Set LtsNode
!uniqueProcesses = List.foldl' insertProcess Set.empty processes
insertProcess :: Set LtsNode -> LtsNode -> Set LtsNode
insertProcess s p = if Map.member p newLts
else p `Set.insert` s
[(LtsNode, [Rule INT], [LtsNode ])]
-> [(LtsNode, [Rule INT], [LtsNode ])]
parRules = withStrategy $ parList $ seqTriple r0 (parList rwhnf) (parList rwhnf)
The full source code can be found on hackage.