R language aware code reformatting/refactoring tools?

Recently I've found myself working with R code that is all over the map in terms of coding style - multiple authors and individual authors who aren't rigorous about sticking to a single structure. There are certain tasks that I'd like to automate better than I currently do.

I'm looking for a tool (or tools) that can handle the following tasks, listed in increasing order of how much I want them, which is also roughly increasing order of how skeptical I am that they exist.

  • Basic formatting. Things like converting "if( foo )" to "if (foo)" and achieving uniformity in terms of brace location and that sort of thing.

  • Converting "foo$blah" to "foo[["blah"]]" for list access. Ideally it'd be able to at least make a guess if an object was really a list and not a data.frame and only convert lists.

  • Converting '=' to '<-'. Yes, this is a simple search and replace - but not really. The tool (or regexp) needs to be language aware such that it knows to convert "x = 5" but not "foo(x=5)". It'd also be really nice to not simply replace the symbol but also to ensure a single whitespace on both sides of the assignment operator.

  • Variable renaming, particularly across functions & files. For instance, suppose a list has an element "foo", I'd love to be able to change it to "foobar" once and not have to track down every usage of that list throughout the entire code flow. I'd imagine this would require the tool to be able to follow the entire flow of control in order to identify things such as that list existing under another name in a different function.

  • Naming conventions. I'd love to be able to define some standard naming convention (e.g. Google's or whatever) and have it identify all of the functions, variables, etc and convert them. Note that this ties in with the previous entry for things like list elements.

Feel free to list basic unix processing commands (e.g. sed) as long as it'll really be smart enough to at least usually not screw things up (e.g. converting "foo(x=5)" to "foo(x<-5)").
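
As a rough illustration of what "language aware" could mean for the '=' case: R's own parser tags each `=` with a different token depending on its role, so a sketch like the one below can tell "x = 5" from "foo(x=5)". It uses only base R (`parse()` and `utils::getParseData()`). The helper name `eq_to_arrow` is made up for this example, and it assumes ASCII source that is indented with spaces rather than tabs.

```r
# Hypothetical helper: rewrite `=` assignments as `<-` using parser tokens.
eq_to_arrow <- function(code) {
  exprs <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(exprs)
  # EQ_ASSIGN marks `=` used as assignment. EQ_SUB (named argument) and
  # EQ_FORMALS (default value in a function definition) are left alone.
  hits <- tokens[tokens$token == "EQ_ASSIGN", c("line1", "col1")]
  lines <- strsplit(code, "\n", fixed = TRUE)[[1]]
  # Work right to left within a line so earlier column positions stay valid.
  hits <- hits[order(hits$line1, -hits$col1), ]
  for (i in seq_len(nrow(hits))) {
    ln <- hits$line1[i]
    col <- hits$col1[i]
    left <- sub("[ ]+$", "", substr(lines[ln], 1, col - 1))
    right <- sub("^[ ]+", "", substr(lines[ln], col + 1, nchar(lines[ln])))
    # Exactly one space on each side of the assignment operator.
    lines[ln] <- paste0(left, " <- ", right)
  }
  paste(lines, collapse = "\n")
}

cat(eq_to_arrow("x = 5\ny <- foo(x=5)"), sep = "\n")
```

Because the decision is made on tokens rather than on text, the `=` inside `foo(x=5)` is never a candidate for replacement in the first place.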

I'm guessing that if such a tool already existed in a perfect state that I'd have heard of it by now, and I'm also realizing that with a language like R it's difficult to do some of these sorts of changes automagically, but one can dream, right? Does anyone have pointers on some/all of these?

Suited answered 2/2, 2012 at 0:12 Comment(8)
Reformatting was discussed here: https://mcmap.net/q/443144/-any-r-style-guide-checker - Adkins
IIRC formatR is one package that cleans up R code and covers a few points in your list. - Giselagiselbert
The second goal could be a tad messy: a data.frame returns TRUE for is.list(). You could try something like is.list(myObject) & (!is.data.frame(myObject)). - Doby
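
A small sketch of that check wrapped in a helper (the name `is_plain_list` is invented here). It uses `&&` rather than `&` because both tests return a single value:

```r
# Hypothetical helper: TRUE for lists that are not data frames.
is_plain_list <- function(x) is.list(x) && !is.data.frame(x)

is_plain_list(list(a = 1))        # TRUE
is_plain_list(data.frame(a = 1))  # FALSE: a data.frame is a list underneath
is_plain_list(1:3)                # FALSE: atomic vector
```

This only works on objects that exist at run time, so a static tool would still have to guess.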
The fourth item, which I highly desire, could be next to impossible, as a list item's name could be a value provided by a character string variable. E.g. myItem <- "address"; myIx <- 4; myContacts[[myIx]][[myItem]] <- "123 Main St." - Doby
As baptiste mentioned, the formatR package can do the first and third tasks: github.com/yihui/formatR/wiki - Paderewski
Cool, I hadn't seen the formatR package. It's definitely a start - those are the tasks least likely to hose something up when doing them manually but it's still a PITA to do them manually. - Suited
@Doby - yeah, I realize that's not likely to happen, at least not in a foolproof way. Combined w/ R's feature of simply returning NULL when one asks for a non-existent list element I find it introduces a lot of subtle bugs when changing list element names so I end up having to balance being annoyed by inconsistent naming schemes or being annoyed by subtle bugs. - Suited
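
To make the silent-NULL problem concrete, here is a minimal base R sketch. The list `contacts` and the accessor `get_element` are hypothetical names used only for illustration:

```r
contacts <- list(address = "123 Main St.")

# After a rename (or a typo), the stale name fails silently with NULL.
contacts$street        # NULL
contacts[["street"]]   # NULL

# Hypothetical strict accessor that turns the silent NULL into an error.
get_element <- function(x, name) {
  if (!name %in% names(x)) {
    stop("no element named '", name, "'", call. = FALSE)
  }
  x[[name]]
}

get_element(contacts, "address")   # "123 Main St."
# get_element(contacts, "street") would stop with an error instead of NULL
```

Routing list access through something like this at least makes a missed rename fail loudly instead of propagating NULL.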
The StatET plugin for Eclipse will rename across files ("Source > Rename in workspace" and friends), provides a very helpful outline of R files (e.g. linking to the line that defined the selected variable), stepped debugging, an expand/collapse object browser, and str() tooltips for the variable at your cursor. I have never written entire packages myself, but StatET has vastly simplified my writing and debugging of R scripts. - Curie

Since this still seems relevant, I thought I'd mention styler, which reformats R code according to the tidyverse style guide.

It ticks some of your boxes, such as basic formatting, but doesn't rename variables (although the linter lintr can at least flag the names that break your convention).
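
A hedged sketch of that lintr check, assuming the lintr package is installed (the block does nothing otherwise). It writes two lines to a temporary file and asks `object_name_linter()` to check names against snake_case; the variable names are only sample input:

```r
# Assumes the lintr package is available; skipped otherwise.
if (requireNamespace("lintr", quietly = TRUE)) {
  path <- tempfile(fileext = ".R")
  writeLines(c("myVar <- 2", "my_var <- 3"), path)
  # Only the object name linter is run, configured for snake_case names.
  lints <- lintr::lint(path, linters = lintr::object_name_linter(styles = "snake_case"))
  print(lints)  # myVar should be flagged; my_var should pass
}
```

lintr only reports the offending names. The renaming itself is still manual.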

styler comes as an R package with functions that accept code (e.g. style_text()), but it can be used on the command line as well.

For example, given this code in tmp.r:

a <-c(1,2,3) 
if(foo) {
  b=2 }
myVar=2

and running:

Rscript -e 'styler::style_file("tmp.r")'

would overwrite tmp.r with this:

a <- c(1, 2, 3)
if (foo) {
  b <- 2
}
myVar <- 2
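
The same kind of cleanup can be tried from inside an R session without touching a file, using the style_text() function mentioned above. A minimal sketch, assuming styler is installed (the block is skipped otherwise):

```r
# Assumes the styler package is available; skipped otherwise.
if (requireNamespace("styler", quietly = TRUE)) {
  messy <- c("a <-c(1,2,3)", "myVar=2")
  # style_text() returns the restyled code instead of overwriting a file.
  tidy <- styler::style_text(messy)
  print(tidy)
}
```

This is handy for previewing what styler would do before letting style_file() overwrite anything.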
Teishateixeira answered 15/4, 2018 at 11:34 Comment(0)

IMHO, write your own. Writing a pretty printer is actually quite difficult. It requires understanding tokenizing, parsing, building ASTs or other IRs, tracking symbol tables and scopes, templating, etc.
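
To get a feel for those pieces before writing a parser from scratch, base R already exposes its own parser and a basic printer. The sketch below parses a snippet into an AST, prints it back, and renames a symbol by walking the tree. The helper `rename_symbol` is hypothetical, and the caveats are real: `deparse()` drops comments and imposes its own layout, and this walker ignores scoping and does not handle empty arguments such as `x[1, ]`.

```r
# Parse one expression into R's own AST (a nested call object).
ast <- parse(text = "if( foo ){ b=2 }", keep.source = FALSE)[[1]]
class(ast)                      # "if"
cat(deparse(ast), sep = "\n")   # reprinted with deparse()'s own layout

# Hypothetical helper: replace every symbol named `from` with `to`.
rename_symbol <- function(expr, from, to) {
  if (is.symbol(expr)) {
    if (identical(as.character(expr), from)) as.symbol(to) else expr
  } else if (is.call(expr)) {
    # Recurse into the function position and every argument of the call.
    as.call(lapply(as.list(expr), rename_symbol, from = from, to = to))
  } else {
    expr  # constants and anything else are returned unchanged
  }
}

renamed <- rename_symbol(ast, "foo", "foobar")
cat(deparse(renamed), sep = "\n")
```

A real tool would also need symbol tables and scopes, as noted above, to decide which occurrences of a name refer to the same thing.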

But if you can do it, you'll really learn a lot about programming languages in general. You'll also look pretty impressive to your coworkers and it's amazing to put on a resume. It's also a lot of fun.

I'd recommend "Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages" by Terence Parr. It's a little rough to read, but the content is pretty good. It's written at an introductory level to parsers and it's pretty short, but it contains all the parts you'd need to write this tool yourself.

If you do build it, open source it, come back here and tell us about it, and put up a site with a few ads to make yourself a few bucks. That way everyone can use your awesome creation and you'll get a few dollars in the process.

Best of luck...

Psychotomimetic answered 28/3, 2012 at 7:57 Comment(6)
I build language-accurate prettyprinters for dozens of languages doing exactly what you say (well, I use DMS instead of ANTLR). I can tell you from bitter experience that unless your formatter includes every formatting option imaginable, is free and runs on Mac/Linux/Windows, people will slam it for all those reasons. Best of luck indeed. - Jackjackadandy
Sorry to hear about that. People can be lame sometimes. Would you agree, though, that it's a useful programming exercise for learning? By the way, Ira is one of the godfathers of parsers and language translations. His opinion carries more weight than Gates or Torvalds on this subject. Honestly. - Psychotomimetic
Ira, do you have any other book recommendations? - Psychotomimetic
Yes, it is a fine learning exercise if you want to learn about parsing and prettyprinting :-} You can learn about parsing from almost any compiler book, and you need that background to start doing it well. But the best learning about parsers comes, like most learning, from getting down in the mud and wrestling with the alligators. I heartily recommend Yacc, Bison, or ANTLR for basic learning. If you want insights while you learn, I've never encountered a better reference than the 1963 (not a typo) paper in my SO answer here: stackoverflow.com/a/1142034/120163 - Jackjackadandy
What you won't find in the books or hardly even in the literature is how to go about doing prettyprinting. Fortunately SO'ers can benefit from experience from another SO answer of mine: https://mcmap.net/q/83019/-compiling-an-ast-back-to-source-code. This latter is based on our design and 15 years of experience with DMS, which is IMHO [but I'm biased] a heck of a device for building such tools. I thought about responding to OP's question with, "If you want your own language-specific prettyprinter, DMS would be the fastest way to get one that was reliable." - Jackjackadandy
PS: I didn't know I had a fan club, especially of venerated elders :-} - Jackjackadandy

Two that I'm aware of are:

https://github.com/ropensci/Rclean

https://github.com/moodymudskipper/refactor

Kippie answered 13/9, 2021 at 16:56 Comment(0)