Tool for model checking large, distributed C++ projects such as KDE?

Asked 10/11, 2010 at 6:42 Answered 10/2, 2011 at 1:45

Solved c++static-analysis formal-methods formal-verification model-checking

Is there a tool which can handle model checking large, real-world, mostly-C++, distributed systems, such as KDE?

(KDE is a distributed system in the sense that it uses IPC, although typically all of the processes are on the same machine. Yes, by the way, this is a valid usage of "distributed system" - check Wikipedia.)

The tool would need to be able to deal with intraprocess events and inter-process messages.

(Let's assume that if the tool supports C++, but doesn't support other stuff that KDE uses such as moc, we can hack something together to workaround that.)

I will happily accept less general (e.g. static analysers specialised for finding specific classes of bugs) or more general static analysis alternatives, in lieu of actual model checkers. But I am only interested in tools that can actually handle projects of the size and complexity of KDE.

Vestpocket answered 10/11, 2010 at 6:42 Comment(3)

Are there large non-real-world mostly C++ distributed systems? :-) – Laryngotomy 10/11, 2010 at 6:54

Can you clarify what you mean by model checking? – Heidi 10/11, 2010 at 6:54

I'm sure he means state-space checking for properties (some specific instance of en.wikipedia.org/wiki/Model_checking), and he has given us a hint by telling us he has a distributed system using messaging primitives, so there's an implied set of states formed by process interactions. But the details matter. – Mccandless 10/11, 2010 at 11:21

You're obviously looking for a static analysis tool that can

parse C++ on scale
locate code fragments of interest
extract a model
pass that model to a model checker
report that result to you

A significant problem is that everybody has a different idea about what model they'd like to check. That alone likely kills your chance of finding exactly what you want, because each model extraction tool has generally made a choice as to what it wants to capture as a model, and the chances that it matches what you want precisely are IMHO close to zero.

You aren't clear on what specifically you want to model, but I presume you want to find the communication primitives and model the process interactions to check for something like deadlock?

The commercial static analysis tool vendors seem like a logical place to look, but I don't think they are there, yet. Coverity would seem like the best bet, but it appears they only have some kind of dynamic analysis for Java threading issues.

This paper claims to do this, but I have not looked at in any detail: Compositional analysis of C/C++ programs with VeriSoft. Related is [PDF] Computer-Assisted Assume/Guarantee Reasoning with VeriSoft. It appears you have to hand-annotate the source code to indicate the modelling elements of interest. The Verifysoft tool itself appears to be proprietary to Bell Labs and is likely hard to obtain.

This paper also makes interesting claims, but doesn't process C++ in spite of the title: Runtime Model Checking of Multithreaded C/C++ Programs.

While all the parts of this are difficult, an issue they all share is parsing C++ (as exemplified by the previously quoted paper) and finding the code patterns that provide the raw information for the model. You also need to parse the specific dialect of C++ you are using; its not nice that the C++ compilers all accept different languages. And, as you have observed, processing large C++ codes is necessary. Model checkers (SPIN and friends) are relatively easy to find.

Our DMS Software Reengineering Toolkit provides for general purpose parsing, with customizable pattern matching and fact extraction, and has a robust C++ Front End that handles many dialects of C++ (EDIT Feb 2019: including C++17 in Ansi, GCC and MS flavors). It could likely be configured to find and extract the facts that correspond to the model you care about. But it doesn't do this this off the shelf.

DMS with its C front end have been used to process extremely large C applications (19,000 compilation units!). The C++ front end has been used in anger on a variety of large-scale C++ projects (EDIT Feb 2019: including large scale refactoring of APIs across 3000+ compilation units). Given DMS's general capability, I think it likely capable of handling fairly big chunks of code. YMMV.

Mccandless answered 10/11, 2010 at 11:18 Comment(0)

Static code analyzers when used against large code base first time usually produce so many warnings and alerts that you won't be able to analyze all of them in reasonable amount of time. It is hard to single out real problems from code that just look suspicious to a tool.

You can try to use automatic invariant discovery tools like "Daikon" that capture perceived invariants at run time. You can validate later if discovered invariants (equivalence of variables "a == b+1" for example) make sense and then insert permanent asserts into your code. This way when invariant is violated as result of your change you will get a warning that perhaps you broke something by your change. This method helps to avoid restructuring or changing your code to add tests and mocks.

Kilohertz answered 17/12, 2010 at 6:51 Comment(0)

The usual way of applying formal techniques to large systems is to modularise them and write specifications for the interfaces of each module. Then you can verify each module independently (while verifying a module, you import the specifications - but not the code - of the other modules it calls). This approach makes verification scalable.

Holierthanthou answered 10/2, 2011 at 1:45 Comment(1)

This would be better as a comment on the question than an answer. I appreciate that this is one approach, but it's actually kind of an expensive approach, and anyway it doesn't answer the concrete question: which tool would be appropriate for this kind of real-world use? – Vestpocket 12/2, 2011 at 9:38

Recommended topics

Hot tags