Extracting the complete call graph of a scala project (tough one)

Asked 22/4, 2015 at 0:15 Answered 27/4, 2015 at 20:28

scala scalac scala-compiler scala-macro-paradise

I would like to extract from a given Scala project, the call graph of all methods which are part of the project's own source.

As I understand, the presentation compiler doesn't enable that, and it requires going down all the way down to the actual compiler (or a compiler plugin?).

Can you suggest complete code, that would safely work for most scala projects but those that use the wackiest dynamic language features? for the call graph, I mean a directed (possibly cyclic) graph comprising class/trait + method vertices where an edge A -> B indicates that A may call B.

Calls to/from libraries should be avoided or "marked" as being outside the project's own source.

EDIT:

See my macro paradise derived prototype solution, based on @dk14's lead, as an answer below. Hosted on github at https://github.com/matanster/sbt-example-paradise.

Rossner answered 22/4, 2015 at 0:15 Comment(5)

If you really like, to augment it with code for separately extracting type/class hierarchies and mixin relationships, it is also, very much appreciated. – Rossner 24/4, 2015 at 8:57

The tool Degraph does this on class level, not (yet) on method level. But I see no conceptual problem to extend it to method level. But the comment/edit of @Andrey Tyukin make me think I am missing something. (Note: I'm the author of Degraph and Degraph analyses class files using ASM) – Remains 24/4, 2015 at 9:5

The Scala.js linker contains a call-graph analyzer with method-level granularity. However, it acts on JVM style classes as of now (single class, multiple interface inheritance), but has knowledge of Scala modules (aka objects). Further, it is unable to handle files from your project only (but a-posteriori name matching could be applied). Lastly, it will not be able to handle cyclic *.java/*.scala dependencies. If this is acceptable, I'll draft a full response on how to do this. – Coca 25/4, 2015 at 7:53

In what sense can't it handle cyclic dependencies? will it never halt? – Rossner 27/4, 2015 at 10:54

@matt Your link does not work – Sjoberg 29/4, 2015 at 15:0

Here's the working prototype, which prints the necessary underlying data to the console as a proof of concept. http://goo.gl/oeshdx.

How This Works

I have adapted the concepts from @dk14 on top boilerplate from macro paradise.

Macro paradise lets you define an annotation that will apply your macro over any annotated object in your source code. From there you have access to the AST that the compiler generates for the source, and scala reflection api can be used to explore the type information of the AST elements. Quasiquotes (the etymology is from haskell or something) are used to match the AST for the relevant elements.

More about Quasiquotes

The generally important thing to note is that quasiquotes work over an AST, but they are a strange-at-first-glance api and not a direct representation of the AST (!). The AST is picked up for you by paradise's macro annotation, and then quasiquotes are the tool for exploring the AST at hand: you match, slice and dice the abstract syntax tree using quasiquotes.

The practical thing to note about quasiquotes is that there are fixed quasiquote templates for matching each type of scala AST - a template for a scala class definition, a template for a scala method definition, etc. These tempaltes are all provided here, making it very simple to match and deconstruct the AST at hand to its interesting constituents. While the templates may look daunting at first glance, they are mostly just templates mimicking the scala syntax, and you may freely change the $ prepended variable names within them to names that feel nicer to your taste.

I still need to further hone the quasiquote matches I use, which currently aren't perfect. However, my code seems to produce the desired result for many cases, and honing the matches to 95% precision may be well doable.

Sample Output

found class B
class B has method doB
found object DefaultExpander
object DefaultExpander has method foo
object DefaultExpander has method apply
  which calls Console on object scala of type package scala
  which calls foo on object DefaultExpander.this of type object DefaultExpander
  which calls <init> on object new A of type class A
  which calls doA on object a of type class A
  which calls <init> on object new B of type class B
  which calls doB on object b of type class B
  which calls mkString on object tags.map[String, Seq[String]](((tag: logTag) => "[".+(Util.getObjectName(tag)).+("]")))(collection.this.Seq.canBuildFrom[String]) of type trait Seq
  which calls map on object tags of type trait Seq
  which calls $plus on object "[".+(Util.getObjectName(tag)) of type class String
  which calls $plus on object "[" of type class String
  which calls getObjectName on object Util of type object Util
  which calls canBuildFrom on object collection.this.Seq of type object Seq
  which calls Seq on object collection.this of type package collection
  .
  .
  .

It is easy to see how callers and callees can be correlated from this data, and how call targets outside the project's source can be filtered or marked out. This is all for scala 2.11. Using this code, one will need to prepend an annotation to each class/object/etc in each source file.

The challenges that remain are mostly:

Challenges remaining:

This crashes after getting the job done. Hinging on https://github.com/scalamacros/paradise/issues/67
Need to find a way to ultimately apply the magic to entire source files without manually annotating each class and object with the static annotation. This is rather minor for now, and admittedly, there are benefits for being able to control classes to include and ignore anyway. A preprocessing stage that implants the annotation before (almost) every top level source file definition, would be one nice solution.
Honing the matchers such that all and only relevant definitions are matched - to make this general and solid beyond my simplistic and cursory testing.

Alternative Approach to Ponder

acyclic brings to mind a quite opposite approach that still sticks to the realm of the scala compiler - it inspects all symbols generated for the source, by the compiler (as much as I gather from the source). What it does is check for cyclic references (see the repo for a detailed definition). Each symbol supposedly has enough information attached to it, to derive the graph of references that acyclic needs to generate.

A solution inspired by this approach may, if feasible, locate the parent "owner" of every symbol rather than focus on the graph of source files connections as acyclic itself does. Thus with some effort it would recover the class/object ownership of each method. Not sure if this design would not computationally explode, nor how to deterministically obtain the class encompassing each symbol.

The upside would be that there is no need for macro annotations here. The downside is that this cannot sprinkle runtime instrumentation as the macro paradise rather easily allows, which could be at times useful.

Rossner answered 27/4, 2015 at 20:28 Comment(2)

Btw the etymology of the word "quasiquote" in programming languages goes way back to Lisps. – Niobe 8/5, 2015 at 11:30

I later implemented it all as a compiler plugin. In retrospect, it was really silly to choose macro paradise. – Rossner 20/10, 2015 at 16:9

It requires more precise analysis, but as a start this simple macro will print all possible applyies, but it requires macro-paradise and all traced classess should have @trace annotation:

class trace extends StaticAnnotation { 
  def macroTransform(annottees: Any*) = macro tracerMacro.impl
}

object tracerMacro {

  def impl(c: Context)(annottees: c.Expr[Any]*): c.Expr[Any] = {

    import c.universe._
    val inputs = annottees.map(_.tree).toList
    def analizeBody(name: String, method: String, body: c.Tree) = body.foreach {
      case q"$expr(..$exprss)" => println(name + "." + method + ": " + expr)
      case _ =>
    }


    val output = inputs.head match {
      case q"class $name extends $parent { ..$body }" =>
        q"""
            class $name extends $parent {
              ..${
                   body.map {
                       case x@q"def $method[..$tt] (..$params): $typ = $body" =>
                         analizeBody(name.toString, method.toString, body)
                         x

                       case x@q"def $method[..$tt]: $typ = $body" =>
                         analizeBody(name.toString, method.toString, body)
                        x

                   }
                 }
            }
          """
      case x => sys.error(x.toString)
    }




    c.Expr[Any](output)
  }
}

Input:

  @trace class MyF {
    def call(param: Int): Int = {
      call2(param)
      if(true) call3(param) else cl()
    }
    def call2(oaram: Int) = ???
    def cl() = 5
    def call3(param2: Int) = ???
  }

Output (as compiler's warnings, but you may output to file intead of println):

  Warning:scalac: MyF.call: call2
  Warning:scalac: MyF.call: call3
  Warning:scalac: MyF.call: cl

Of course, you might want to c.typeCheck(input) it (as now expr.tpe on found trees is equals null) and find which class this calling method belongs to actually, so the resulting code may not be so trivial.

P.S. macroAnnotations give you unchecked tree (as it's on earlier compiler stage than regular macroses), so if you want something typechecked - the best way is surround the piece of code you want to typecheck with call of some your regular macro, and process it inside this macro (you can even pass some static parameters). Every regular macro inside tree produced by macro-annotation - will be executed as usual.

Tracheostomy answered 24/4, 2015 at 12:12 Comment(16)

If I understand correctly, this macro matches on method definition syntax (?) but only catches a subset of the ways a method may be defined in Scala. E.g. double parameter lists, a method that skips the equals sign, and other cases that even simple scala projects (e.g. my own) use. So this is more like reverse engineering the syntax rather than safely using the wisdom of a compiler/plugin isn't it? the macro annotations are nice though, good to know. – Rossner 24/4, 2015 at 15:16

Can you use typeCheck inside a macro? does it work at compile time? – Rossner 24/4, 2015 at 15:20

compiler/plugin will deal with same AST as the macro. Actually macro deals with small subset of compiler-API, so there is no unified way to do that - you have to deal with all nuances by yourself. I think this macro will be fine with "no equals sign" situation – Tracheostomy 24/4, 2015 at 15:21

2) yes it should, but the AST should be trully compilable - you can't just pass a piece of class' AST – Tracheostomy 24/4, 2015 at 15:21

Thanks :-) which versions of Scala and macro-paradise did you use to run this? nice approach. – Rossner 24/4, 2015 at 15:22

Ideally though there should be an approach that uses the compiler architecture, I assume that is how IDE's that are half-adapted to scala try to come up with the requested intelligence. Beyond AST, the compiler ultimately writes bytecode for methods - so I would assume it knows more than just an AST (along its 27 steps of processing!). – Rossner 24/4, 2015 at 15:27

Paradise 2.0.1 + scala 2.10/2.11, I used my other project as prototype: github.com/dk14/println-tracer – Tracheostomy 24/4, 2015 at 15:27

AST is result of parsing, which goes to next levels of compilers, macros is a handler on one of these levels. After that (maybe few levels more) - AST goes to the stage, which generates bytecode, but as far as I know this stage is unavailable even for compiler-plugin. If you want to work directly with untyped bytecode - asm may help – Tracheostomy 24/4, 2015 at 15:31

just to mention - in order to be fast, scala's compiler is pretty messy, mutable, unsafe and not pure. It reminds me programming on C/C++, so, just to mention – Tracheostomy 24/4, 2015 at 15:33

Well if it's not the slow pacing elephant in the room of Scala philosophy becoming noticeable here and there... I wonder if macro-paradise won't crash larger beasts like libraries that themselves use macros (for providing DSL or otherwise). Since here you only annotate each class as the hookpoint, maybe there is no adverse interaction. Will library macros be expanded by the compiler before or after your own? – Rossner 24/4, 2015 at 15:47

It first expands macro-annotation (it's on earlier compiler level) - that's why you have unchecked tree here (in comparision with regular macro). Actually typechecking might be a problem here - but it's solvable by surrounding the pieces of code you want to process typechecked with your regular macro (just surround it with application of your macro-function). After macro-annotation expansion - all regular macroses (including added by you) will be executed. This trick should work - at least I've recommnded it to jscala project and they still use it – Tracheostomy 24/4, 2015 at 15:53

Boy are quasiquotes one cute api for manipulating AST. I guess this here shows off how to deconstruct a method definition - from which you draw the confidence that the match in your code will catch all method definitions. – Rossner 24/4, 2015 at 22:56

you may also want to catch finction definitions, like val a = (z: Int) => call4 and bodies of the val/var in general (maybe even bodies of the inner objects and classes). P.S. One more surprisely good thing about is Tree.foreach (and also Tree.find) use deep traversion so can find a requested structure everywhere in your code. – Tracheostomy 25/4, 2015 at 0:41

I'm a bit confused about scala 2.11 support in github.com/dk14/println-tracer and macro paradise. As cloned, print-tracer seems to build against 2.10. Switching to build against 2.11.5 yields few compilation errors. How much can the 2.10 build result of it analyze 2.11 projects? I'd prefer trying macro-paradise in scala 2.11 code, but see no stable version of macro-paradise built with 2.11 on sonatype. – Rossner 27/4, 2015 at 10:48

oops. I've commited scala 2.11.0-compatible version now. 2.0.1-paradise is still fine as it's crossbuild (cross CrossVersion.full) here, you may try to upgrade to newer version, but I didn't check them yet. – Tracheostomy 27/4, 2015 at 11:52

answering question, I don't think macro compiled for 2.10.0 can be normally executed in 2.11.0 compiler. At least, my println-tracer wouldn't work at all without the last commit. – Tracheostomy 27/4, 2015 at 12:5

Edit
The basic idea in this answer was to bypass the (pretty complex) Scala compiler completely, and extract the graph from the generated .class files in the end. It appeared that a decompiler with sufficiently verbose output could reduce the problem to basic text manipulation. However, after a more detailed examination it turned out that this is not the case. One would just get back to square one, but with obfuscated Java code instead of the original Scala code. So this proposal does not really work, although there is some rationale behind working with the final .class files instead of intermediate structures used internally by the Scala compiler.
/Edit

I don't know whether there are tools out there that do it out of the box (I assume that you have checked that). I have only a very rough idea what the presentation compiler is. But if all that you want is to extract a graph with methods as nodes and potential calls of methods as edges, I have a proposal for a quick-and-dirty solution. This would work only if you want to use it for some sort of visualization, it doesn't help you at all if you want to perform some clever refactoring operations.

In case that you want to attempt building such a graph-generator yourself, it might turn out much simpler than you think. But for this, you need to go all the way down, even past the compiler. Just grab your compiled .class files, and use something like the CFR java decompiler on it.

When used on a single compiled .class file, CFR will generate list of classes that the current class depends on (here I use my little pet project as example):

import akka.actor.Actor;
import akka.actor.ActorContext;
import akka.actor.ActorLogging;
import akka.actor.ActorPath;
import akka.actor.ActorRef;
import akka.actor.Props;
import akka.actor.ScalaActorRef;
import akka.actor.SupervisorStrategy;
import akka.actor.package;
import akka.event.LoggingAdapter;
import akka.pattern.PipeToSupport;
import akka.pattern.package;
import scala.Function1;
import scala.None;
import scala.Option;
import scala.PartialFunction;
...
(very long list with all the classes this one depends on)
...
import scavenger.backend.worker.WorkerCache$class;
import scavenger.backend.worker.WorkerScheduler;
import scavenger.backend.worker.WorkerScheduler$class;
import scavenger.categories.formalccc.Elem;

Then it will spit out some horribly looking code, that might look like this (small excerpt):

public PartialFunction<Object, BoxedUnit> handleLocalResponses() {
    return SimpleComputationExecutor.class.handleLocalResponses((SimpleComputationExecutor)this);
}

public Context provideComputationContext() {
    return ContextProvider.class.provideComputationContext((ContextProvider)this);
}

public ActorRef scavenger$backend$worker$MasterJoin$$_master() {
    return this.scavenger$backend$worker$MasterJoin$$_master;
}

@TraitSetter
public void scavenger$backend$worker$MasterJoin$$_master_$eq(ActorRef x$1) {
    this.scavenger$backend$worker$MasterJoin$$_master = x$1;
}

public ActorRef scavenger$backend$worker$MasterJoin$$_masterProxy() {
    return this.scavenger$backend$worker$MasterJoin$$_masterProxy;
}

@TraitSetter
public void scavenger$backend$worker$MasterJoin$$_masterProxy_$eq(ActorRef x$1) {
    this.scavenger$backend$worker$MasterJoin$$_masterProxy = x$1;
}

public ActorRef master() {
    return MasterJoin$class.master((MasterJoin)this);
}

What one should notice here is that all methods come with their full signature, including the class in which they are defined, for example:

Scheduler.class.schedule(...)
ContextProvider.class.provideComputationContext(...)
SimpleComputationExecutor.class.fulfillPromise(...)
SimpleComputationExecutor.class.computeHere(...)
SimpleComputationExecutor.class.handleLocalResponses(...)

So if you need a quick-and-dirty solution, it might well be that you could get away with just ~10 lines of awk,grep,sort and uniq wizardry to get nice adjacency lists with all your classes as nodes and methods as edges.

I've never tried it, it's just an idea. I cannot guarantee that Java decompilers work well on Scala code.

Nickolenicks answered 22/4, 2015 at 0:53 Comment(4)

Thanks, but the idea is to get a graph of calls, not a graph that maps classes to their methods. For correct results you really need to use the compiler, to come up with the class to which each method is applied (or get that from the class files as an alternative, following your approach) – Rossner 22/4, 2015 at 8:18

I think I understood your question correctly, I hoped that it would be possible for each method f to generate a list of methods g1,...,gN, such that f is implemented in terms of g1,...,gN (that is, that f can call each gK at runtime). However, I took a closer look at what the decompiler outputs, and now I have to admit that my proposal probably would not work. – Nickolenicks 22/4, 2015 at 11:23

it was a good direction of thought (!) I highly appreciate both the attitude and the spirit of further re-examination!! – Rossner 22/4, 2015 at 14:7

If you want to work at the bytecode level, two tools that analyze the call graph are Classycle and ProGuard. Both of them work fine on the bytecode generated by the Scala compiler. – Tassel 26/4, 2015 at 14:17

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++