"@Golo: So what you want is the ability to specify how whitespace occurs between every kind of language construct, in every kind of context? (e.g., how if-then-else is laid out inside a do loop vs. inside the top level of a function)?
Golo: That's correct :-)"
Then what you need is access to the structure of the language at each point in the code, and precise position information for each language element (starting/ending line/column). For linting, you want a way to write tests against combinations of those things. For repair, you want a way to regenerate text that meets your constraints. You obviously want all of that to be easy to configure.
The "structure" you want is what is produced by a parser in a syntax tree. The context is the syntax structure around the structure of interest. You don't want an abstract syntax tree, because that loses the concrete tokens whose positions you want to inspect/control, so you want a full concrete parse tree.
Parsers aren't interested in precise source positions, but a lexer (needed to break the input stream into language tokens to feed to the parser) is in a position to collect this precise information. You have to worry about some complicating issues regarding "what constitutes a column adjustment, and by how much". Some examples: Tab characters: tab to the next 8-character boundary? 4 characters? To prespecified tab columns? On Linux, "LF" advances the line number and resets the column count to 1. On Windows, it is "CR/LF" as a pair. On other operating systems I have encountered, it is "CR" only; on really modern systems, the Unicode newline character should do this. So, if on Linux, how should you treat CR? How about null characters found in the text? ^Z? Other control characters (e.g., ^L [formfeed])?
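To make those choices concrete, here is a minimal Python sketch of the position bookkeeping a lexer might do. It is not taken from any particular tool; the names PositionPolicy, positions and position_of are made up for illustration. It assumes tabs advance to the next multiple of a configurable width, that CR/LF counts as one line break, that a lone CR is a line break only if you say so, and that every other character (including other control characters) occupies one column. Unicode line separators are not handled.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class PositionPolicy:
    """Hypothetical knobs for the column-counting questions raised above."""
    tab_width: int = 8                 # tabs advance to the next multiple of this
    lone_cr_is_newline: bool = False   # treat a CR not followed by LF as a line break


def positions(text: str,
              policy: PositionPolicy = PositionPolicy()) -> Iterator[Tuple[str, int, int]]:
    """Yield (character, line, column) for each character, both 1-based."""
    line, col = 1, 1
    for i, ch in enumerate(text):
        yield ch, line, col
        if ch == "\n":
            line, col = line + 1, 1
        elif ch == "\r":
            followed_by_lf = text[i + 1:i + 2] == "\n"
            if policy.lone_cr_is_newline and not followed_by_lf:
                line, col = line + 1, 1
            # Otherwise CR is zero-width; in CR/LF the LF advances the line.
        elif ch == "\t":
            # Jump to the column just past the next tab stop.
            col = ((col - 1) // policy.tab_width + 1) * policy.tab_width + 1
        else:
            col += 1  # assumption: everything else is one column wide


def position_of(text: str, needle: str,
                policy: PositionPolicy = PositionPolicy()) -> Tuple[int, int]:
    """Return (line, column) of the first occurrence of needle in text."""
    target = text.index(needle)
    for i, (_, line, col) in enumerate(positions(text, policy)):
        if i == target:
            return line, col
    raise ValueError("needle not found")
```

With this, position_of("\tif (x)", "if") reports column 9 under 8-column tabs and column 5 under PositionPolicy(tab_width=4), which is exactly the kind of disagreement a layout checker has to settle up front.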
Given a source file, precisely parsed into a CST with captured source positions, now you want to check that a structure is aligned the way you want. First, you need to specify the structure; do loop? constructor? data declaration? Then you need predicates on the column position to give you precise control.
Virtually all tools that provide syntax trees do not provide any easy way to refer to such structures. You are pretty much stuck writing classic compiler-like procedural code that knows the shape of the syntax tree and climbs over it looking for a tree node of interest, and then looks around to see if other relevant tree nodes are present. Once you are in this mode, you can recognize the trees you want, and then write more procedural code to check the spacing conventions.
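For a feel of what that procedural style looks like, here is a small Python sketch over a toy CST. The Node class, the node kinds and the child order ('if', '(', expression, ')', statement) are all assumptions made up for this illustration, not the tree shape of any real parser. The check is that the body of an if statement starts in the same column as the "(".

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Toy CST node; leaves (tokens) carry a start line and column."""
    kind: str
    line: int = 0
    column: int = 0
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Visit the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def leftmost_leaf(node: Node) -> Node:
    """The first token of a subtree holds the subtree's start position."""
    while node.children:
        node = node.children[0]
    return node


def check_if_layout(root: Node) -> List[str]:
    """Complain when an if-body does not line up with the '(' column."""
    complaints = []
    for node in walk(root):
        # Hard-coded knowledge of the tree shape: 'if' '(' expression ')' statement
        if node.kind != "if_statement" or len(node.children) != 5:
            continue
        keyword, lparen, _expr, _rparen, body = node.children
        body_start = leftmost_leaf(body)
        if body_start.column != lparen.column:
            complaints.append(
                f"line {keyword.line}: statement starts in column "
                f"{body_start.column}, expected {lparen.column}"
            )
    return complaints
```

Every new convention means another function like check_if_layout with its own hard-wired knowledge of child order, which is why this approach gets tedious quickly.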
Program transformation systems (PTS) often provide "source-to-source" rewrites, in which you can directly write patterns using the surface syntax of the language. That's far more convenient than climbing around the tree procedurally. Some only do source-to-source pattern pairs; some offer the ability to specify just a single pattern. The PTS must also be able to parse the language of interest, and enable you to add custom checks for your specific task.
Our DMS Software Reengineering Toolkit parses ECMAScript, and offers such source-pattern specifications, along with the ability to attach custom conditions and actions. As an example:
domain ECMAScript;
pattern ideal_if_statement_layout(e:expression,s:statement):statement =
" if (\e)
\s" if diagnose_not_equal(column(s),parentheses_column(e));
expresses interest in "if then" statements (you'd use a different pattern for "if then else"), and a constraint over custom column comparison functions that check the position of the statement elements. The "diagnose_not_equal" custom function would produce lint complaints. The quote marks are meta-quotes; they are part of the pattern matching language, not the underlying language. e and s are metavariables, and match any expression and any statement, respectively. Because these are being applied to the CST, they cannot mismatch their intended targets. The custom function "column" merely picks up the starting column information associated with the left-most subtree of s; the tree management APIs in DMS make this essentially trivial to get. "parentheses_column" is needed because the pattern only tells you where e is; the "(" is in the tree node above e, so some slight navigation of the tree is needed to find the "(" and then extract its rightmost column, also easily done with the DMS tree API.
You can build arbitrarily complex patterns; you can also make a condition in one pattern depend on the match of another. So, with a modest number of custom column extraction functions, you could write a variety of linting checks.
What this won't easily get you is a check that the "if" keyword is one space to the left of the "(" token. You could express that to some degree with additional custom checks, e.g., "statement_keyword_column", etc., but this is starting to get awkward.
You might notice the layout of the pattern; it would be nice to use that as a constraint, too. DMS doesn't provide a direct way to do this. However, it is perfectly capable of reading its own pattern descriptions as trees. Using that, one could extract the apparent layout of the pattern, and use that to check the structural layout. This requires some sophistication in the use of DMS, but is a matter of sweat, not theory or missing mechanisms.
I personally don't like linting on layout much; I'd prefer the file simply get reshaped.
DMS does have prettyprinting rules that will convert your CST, whatever its layout was, into a layout controlled by its prettyprinting rules. At the moment, those rules are specific to tree nodes, and encoded with the grammar, so they are somewhat limited. One can write (in the grammar):
stmt = 'if' expression stmt ';'
<<PrettyPrinter>>: { V(H('if',expression),I(stmt[1])) }
This will cause all if-then statements to be regenerated as:
if expression
    stmt
[V means "vertical box" of two subboxes; H means "horizontal box", I means "indented box"]
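If box-based layout is unfamiliar, the idea can be sketched in a few lines of Python. This is a generic illustration of vertical/horizontal/indented boxes, not how DMS implements its prettyprinter; the function names mirror the rule above, a box is simply a list of text lines, and the indent width of 4 and the single-space separator in H are assumptions.

```python
from typing import List, Union

Box = Union[str, List[str]]   # a plain string is a one-line box


def box(b: Box) -> List[str]:
    """Normalize a string or list of lines into a list of lines."""
    return [b] if isinstance(b, str) else list(b)


def V(*boxes: Box) -> List[str]:
    """Vertical box: stack the sub-boxes one below the other."""
    lines: List[str] = []
    for b in boxes:
        lines.extend(box(b))
    return lines


def H(*boxes: Box, sep: str = " ") -> List[str]:
    """Horizontal box: each sub-box starts to the right of the previous one."""
    result = [""]
    for b in boxes:
        rows = box(b)
        if not rows:
            continue
        prefix = result[-1] + (sep if result[-1] else "")
        pad = " " * len(prefix)
        result[-1] = prefix + rows[0]
        # Later rows of a multi-line box keep their left edge aligned.
        result.extend(pad + row for row in rows[1:])
    return result


def I(b: Box, width: int = 4) -> List[str]:
    """Indented box: shift every line right by `width` columns (assumed 4)."""
    return [" " * width + row for row in box(b)]


def render(b: Box) -> str:
    """Turn a box into the final text."""
    return "\n".join(box(b))
```

Here render(V(H("if", "expression"), I("stmt"))) produces "if expression" on the first line and "stmt" indented by four spaces on the second, the same shape as the regenerated statement above.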
Careful use of such prettyprinting rules can do a pretty nice job of reformatting code. It isn't perfect, because you can't control the layout of multiple statements this way. But this is part of DMS and actually pretty easy to modify.
An ideal solution would be to use the pattern language, and to use the layout within the pattern to control the prettyprinting. This is in our plans, but alas, not yet in DMS.
I think other PTS can express patterns to some degree as above, and most of them have some way to specify prettyprinting, something like what DMS has. So the good news is that these tools do much of what you want. The not-so-good news is that it is quite an effort to pick one of the tools up and learn to use it; an afternoon doesn't cut it, by a long shot.