C# Static Analysis, possible values for a variable/parameter
In code similar to each of the following examples, I would like to be able to statically analyze code to determine the list of possible values that are passed into SpecialFunction().

SpecialFunction(5); // A

int x = 5;
SpecialFunction(x); // B

int x = 5;
x = condition ? 3 : 19;
SpecialFunction(x); // C

I can already parse the C# into an abstract syntax tree, and I can already handle cases like A. I suppose I could track initial assignments of values to handle case B, but even cases as simple as C seem to get complicated quickly.

I'm almost certain that we won't be able to solve for x in all cases statically, and that's OK. I would like to know strategies for attempting it, and ways to recognize when it can't be done. What if we need to include class-level fields and multi-threading? Closures? Would it help if we knew that, for the set X of all possible values of x, |X| < 50?
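One workable strategy for cases like A, B, and C is flow-sensitive value-set propagation: track a set of possible constants per variable, take the union of both arms at a branch merge, and fall back to "unknown" when the set cannot be bounded. Below is a minimal sketch in Python; since the target language is C#, treat it as executable pseudocode over a hypothetical three-address form that a real tool would derive from the AST (the instruction encoding and the SpecialFunction-call representation are invented for illustration):

```python
# Toy flow-sensitive value-set analysis illustrating cases A-C.
# The "program" is a hand-built three-address form; a real tool would
# derive it from the C# AST. TOP means "could be anything".

TOP = None   # abstract value: unknown / unbounded
LIMIT = 50   # give up if a set grows past this bound (the |X| < 50 idea)

def join(a, b):
    """Merge two abstract values (set union, saturating to TOP)."""
    if a is TOP or b is TOP:
        return TOP
    merged = a | b
    return TOP if len(merged) > LIMIT else merged

def analyze(instrs):
    """Return the possible-value set for each SpecialFunction call site."""
    env, calls = {}, []
    for op in instrs:
        kind = op[0]
        if kind == "const":        # x = 5
            _, var, val = op
            env[var] = {val}
        elif kind == "phi":        # x = condition ? 3 : 19 (merge of both arms)
            _, var, a, b = op
            env[var] = join({a}, {b})
        elif kind == "call":       # SpecialFunction(x)
            _, var = op
            calls.append(env.get(var, TOP))
    return calls

prog = [
    ("const", "x", 5),
    ("call", "x"),        # case B: reaches here as {5}
    ("phi", "x", 3, 19),  # case C: both branches assign a constant
    ("call", "x"),        # reaches here as {3, 19}
]
```

The key move is that "what values can x have here?" becomes a per-program-point environment that is updated by assignments and merged at control-flow joins; loops would require iterating this to a fixed point.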

Following @Vladimir Perevalov's suggestion: how could the concepts in Pex be applied to finding the possible values at targeted code points (rather than what Pex seems to do, which is to discover code paths and values that lead to unchecked(?) exceptional cases)?

Nectarine answered 13/4, 2012 at 17:33 Comment(2)
+1. I don't think it's the "duty" of static analysis to determine the possible failure/success of a specified parameter value range; that is most likely the "duty" of dynamic analysis. But even so, in this case you are dealing with only a couple of parameters. What if you deal with a function like, say, IEnumerable<int> GetValuesForX(...)?Gomulka
@Gomulka - Running or simulating the program (or a fragment of it) isn't an option. Also, clearly there are SOME cases where static analysis can provide the answer, and clearly there are SOME cases where it cannot. I'm trying to identify and handle the cases where it can.Nectarine

What you want is both global data flow analysis ("what value assignments/side effects reach what usage points") [which requires control flow analysis as a prerequisite] and some kind of range analysis ("summarizing the set of values that can reach a point").

Computing data flow requires a full C# front end, local control and data flow analysis, and then stitching those local answers together into a global data flow analysis.
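To make the "stitching together" step concrete: the classic intraprocedural building block is a fixed-point dataflow computation such as reaching definitions. Here is a minimal worklist sketch in Python, over a hand-built CFG standing in for what a C# front end would produce; the block names and gen/kill sets are illustrative, modeling case C from the question:

```python
# Minimal reaching-definitions analysis over a toy CFG (worklist fixed point).
# A real C# front end would supply per-basic-block gen/kill sets; here they
# are written by hand. Definition IDs are (variable, assigned-value) pairs.

def reaching_definitions(cfg, gen, kill):
    """cfg: {block: [successor blocks]}; gen/kill: {block: set of def IDs}."""
    in_, out = {b: set() for b in cfg}, {b: set() for b in cfg}
    preds = {b: [p for p in cfg if b in cfg[p]] for b in cfg}
    work = list(cfg)
    while work:
        b = work.pop()
        # IN[b] = union of OUT over predecessors; OUT[b] = gen + (IN - kill)
        in_[b] = set().union(*(out[p] for p in preds[b])) if preds[b] else set()
        new_out = gen[b] | (in_[b] - kill[b])
        if new_out != out[b]:
            out[b] = new_out
            work.extend(cfg[b])   # re-examine successors until stable
    return in_, out

# Case C as a diamond CFG: entry defines x=5, the two branch arms
# redefine it (x=3, x=19), and the merge block calls SpecialFunction(x).
cfg  = {"entry": ["then", "else"], "then": ["merge"], "else": ["merge"], "merge": []}
gen  = {"entry": {("x", 5)}, "then": {("x", 3)}, "else": {("x", 19)}, "merge": set()}
kill = {"entry": set(),
        "then": {("x", 5), ("x", 19)},
        "else": {("x", 5), ("x", 3)},
        "merge": set()}
```

At the merge block, the definitions x=3 and x=19 reach the call while x=5 is killed on both paths, which is exactly the {3, 19} answer the question asks for; the "global" version chains these summaries across procedure boundaries.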

Range analysis requires that you first define how you intend to encode the set of possible values; what system of specifications is allowed? The simplest, an explicit set of values, tends to explode. An intermediate specification scheme would be something like the OP's single relational operator against a constant, e.g., "x < 50". The trouble with any such limited scheme is that the richness of the actual value sets may give you useless answers, especially if other predicates are of interest (if x is always odd, the single-relational-to-constant scheme can only model this as "x < infinity", which is clearly not helpful). So you want to choose a specification scheme rich enough to model the kinds of values that interest you. However, as your specification scheme gets more sophisticated, the machinery needed to infer those facts correctly gets more complicated, so you can't make it too complicated either.
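The precision trade-off described above shows up even in miniature when you compare an explicit value set with an interval summary. A small Python illustration (both domains are deliberately toy ones, not any tool's actual representation):

```python
# Two abstraction choices for "the set of values reaching a point":
# an explicit value set vs. a single interval [lo, hi]. The interval
# summary is cheap, but it can only say "1 <= x <= 9" and loses
# finer-grained facts such as "x is always odd".

def to_interval(values):
    """Collapse an explicit value set to its bounding interval."""
    return (min(values), max(values))

def interval_join(a, b):
    """Merge two intervals at a control-flow join (widest bounds win)."""
    return (min(a[0], b[0]), max(a[1], b[1]))

odds = {1, 3, 5, 7, 9}      # the exact set: every value is odd
iv = to_interval(odds)      # -> (1, 9): admits 2, 4, 6, 8 as well
assert all(v % 2 == 1 for v in odds)
assert iv[0] <= 2 <= iv[1]  # the interval cannot exclude even values
```

A congruence domain (tracking values mod k) or a relational domain would recover the oddness fact, at the cost of more complicated inference machinery, which is exactly the trade-off the paragraph above describes.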

Most of the analysis tools available do not have such analyses, let alone expose them to you. Pex may indeed have such machinery; if you are lucky, it is exposed, too.

Our DMS Software Reengineering Toolkit has generic parsing, symbol table building, control/data flow analysis, and indeed even range analysis machinery (specification: x < k1*a + k2*b, where k1 and k2 are constants and a and b are other program variables visible where x is consumed). DMS has C#, Java, GNU C, and COBOL front ends, and we have in fact instantiated this machinery for GNU C and IBM Enterprise COBOL (and partially for Java 7) by collecting (static analysis!) facts specific to those languages and feeding those facts to the generic machinery. We have not instantiated this machinery for C# yet. But if you can't get a good answer from another source, this is likely pretty close.

Simony answered 13/4, 2012 at 23:13 Comment(4)
I was able to get pretty far with MSR Roslyn. I used it just as far as parsing to the AST and did my own (flawed) reasoning without much effort. I think Roslyn can do more for me, but it isn't documented well. It was enough for a proof of concept. I might check with your company if this project goes forward.Nectarine
I would guess that Roslyn does little global flow analysis. As I understand it, C# is mostly JIT-compiled.Simony
What does JITed-ness have to do with global flow analyzability?Nectarine
@Jason: 1) If you're JITing, the tool ("C# compiler", which I thought Roslyn was) that converts source code to pCode (CIL) doesn't need to compile highly optimized pCode, 2) often you get really good separate compilation, which works against global optimization. One expects the runtime JIT'er to do more sophisticated analysis based on dynamic measurements; it may or may not have global optimizations but it at least has access to the "whole" program (modulo dynamically loaded stuff).Simony
