What to do with imperfect-but-useful functions?

I could equally have titled this question, "Is it good enough for CRAN?"

I have a collection of functions that I've built up for specific tasks. Some of these are convenience functions:

# Return the odd / even elements of an integer vector
# (vec %% 2 is the remainder after dividing by 2)
odds <- function(vec) {
    stopifnot(is.integer(vec))
    vec[vec %% 2 != 0]
}
evens <- function(vec) {
    stopifnot(is.integer(vec))
    vec[vec %% 2 == 0]
}

Some are minor additions that have proven useful in answering common SO questions:

# Shift a vector over by n spots (n > 0 shifts left, n < 0 shifts right)
# wrap: move the entries shifted off one end around to the other end
# pad: used only when wrap is FALSE; if TRUE, fill the vacated spots with NAs
#      so the result keeps the length of vec, otherwise they are dropped
shift <- function(vec, n = 1, wrap = TRUE, pad = FALSE) {
    len <- length(vec)
    if (abs(n) > len) {
        stop("The magnitude of n must not exceed the length of the vector")
    }
    if (n == 0) {
        return(vec)
    }
    k <- abs(n)
    if (n > 0) {
        # keep elements k+1..len, then deal with the first k
        returnvec <- vec[seq_len(len - k) + k]
        if (wrap) {
            returnvec <- c(returnvec, vec[seq_len(k)])
        } else if (pad) {
            returnvec <- c(returnvec, rep(NA, k))
        }
    } else {
        # keep elements 1..len-k, then deal with the last k
        returnvec <- vec[seq_len(len - k)]
        if (wrap) {
            returnvec <- c(vec[seq_len(k) + (len - k)], returnvec)
        } else if (pad) {
            returnvec <- c(rep(NA, k), returnvec)
        }
    }
    returnvec
}
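For the wrapping case alone, base R's head() and tail() allow a more compact sketch. The name rotate() is hypothetical; this version also reduces n modulo the vector length, so shifts larger than the vector simply wrap around instead of being an error.

```r
# Hypothetical helper: circular shift of a vector by n positions.
# n > 0 rotates left (first elements move to the end), n < 0 rotates right.
rotate <- function(vec, n = 1) {
    len <- length(vec)
    if (len == 0) {
        return(vec)
    }
    k <- n %% len          # normalise n into 0..len-1 (handles negative and large n)
    if (k == 0) {
        return(vec)        # tail(vec, -0) would return an empty vector, so stop here
    }
    c(tail(vec, -k), head(vec, k))   # drop the first k, then append them
}
```

For example, rotate(1:5, 2) gives 3 4 5 1 2 and rotate(1:5, -2) gives 4 5 1 2 3. The trade-off is that it covers only the wrap = TRUE behaviour; padding with NAs still needs the longer function.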

The most important are extensions to existing classes that can't be found anywhere else (e.g. a CDF panel function for lattice plots, various xtable and LaTeX output functions, classes for handling and converting between geospatial object types and performing various GIS-like operations such as overlays).

I would like to make these available somewhere on the internet in R-ized form (e.g. posting them on a blog as plain text functions is not what I'm looking for), so that maintenance is easier and so that I and others can access them from any computer that I go to. The logical thing to do is to make a package out of them and post them to CRAN--and indeed I already have them packaged up. But is this collection of functions suitable for a CRAN package?

I have two main concerns:

The functions don't seem to have any coherent overarching theme. It's just a collection of functions that do lots of different things.
My code isn't always the prettiest. I've tried to clean it up as I learned better coding practices, but producing R Core-worthy beautiful code is not in the cards.

The CRAN webpage is surprisingly bereft of guidelines on posting. Should I post to CRAN, given that some people will find it useful but that it will in some sense forever lock R into having some pretty basic function names taken up? Or is there another place I can use an install.packages-like command to install from? Note I'd rather avoid posting the package to a webpage and having people have to memorize the URL to install the package (not least for version control issues).

Viceroy asked 26/7, 2011 at 10:53

Most packages should be collections of related functions with an obvious purpose, so a useful thing to do would be to try and group what you have together, and see if you can classify them. Several smaller packages are better than one huge incoherent package.

That said, there are some packages that are collections of miscellaneous utility functions, most notably Hmisc and gregmisc, so it is okay to do that sort of thing. If you just have a few functions like that, it might be worth contacting the author of some of the misc packages and seeing if they'll let you include your code in their package.

As for writing pretty code, the most important thing you can do is to use a style guide.

Rousing answered 26/7, 2011 at 11:02

I've already been grouping the functions when I created help files, so I could add another level of aggregation and release them as a series of small packages to CRAN, and write to the maintainers of some of the other packages that my classes extend to see if they'd add one or two functions. I'd worry about flooding CRAN with small, irrelevant packages, but I can see how that might be preferred. – Viceroy 26/7, 2011 at 15:25

@AriB.Friedman I wouldn't worry about smallness of packages. Small packages are easier to maintain and less likely to break in the future. The biggest hassle for CRAN is code that breaks and, especially, breaks code in the package's reverse dependencies. – Fibre 17/8, 2016 at 16:25

I would use http://r-forge.r-project.org/. From the top of the page:

R-Forge offers a central platform for the development of R packages, R-related software and further projects. It is based on FusionForge offering easy access to the best in SVN, daily built and checked packages, mailing lists, bug tracking, message boards/forums, site hosting, permanent file archival, full backups, and total web-based administration.

Schroder answered 26/7, 2011 at 11:18

This looks like it would nicely take care of the versioning issues. Here's my key concern. When I teach someone, I often will say, "And then you just use this function, in this package. To install it, type ____." How complicated is ____ if my package is on R-Forge? – Viceroy 26/7, 2011 at 15:23

install.packages("mypackage",repos="http://r-forge.r-project.org"). The only issue I have encountered in using R-forge for teaching packages etc. is that changes to r-forge don't propagate to the built packages for 24 hours, so I have sometimes resorted to posting the very most recent versions of the packages on my own repository. – Lime 26/7, 2011 at 15:35
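A minimal sketch of such a personal repository, assuming you already have a source tarball built with R CMD build. The helper make_local_repo(), the tarball name and the paths are hypothetical; tools::write_PACKAGES() and the src/contrib layout are what install.packages() expects to find in a repository of source packages. The resulting directory is static, so it can be used locally via a file:// URL or served by any web server.

```r
# Sketch: turn a directory into a minimal CRAN-like repository for source packages.
# make_local_repo() is a hypothetical helper, not part of any package.
make_local_repo <- function(repo_dir, tarballs = character(0)) {
    # install.packages() looks for source packages under <repo>/src/contrib
    contrib <- file.path(repo_dir, "src", "contrib")
    dir.create(contrib, recursive = TRUE, showWarnings = FALSE)
    if (length(tarballs) > 0) {
        file.copy(tarballs, contrib, overwrite = TRUE)
    }
    # Write the PACKAGES index files that install.packages() reads
    tools::write_PACKAGES(contrib, type = "source")
    invisible(contrib)
}

# Usage (hypothetical tarball name and path):
# make_local_repo("/srv/myrepo", "mypackage_0.1.tar.gz")
# install.packages("mypackage", repos = "file:///srv/myrepo", type = "source")
```

Rerunning the helper after copying in a newer tarball refreshes the index, which sidesteps the 24-hour build delay at the cost of hosting the files yourself.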

@Ben: Definitely better than plopping it on a server somewhere. Thanks. I'll either do that or just release to CRAN, possibly dividing into smaller packages as per Richie's hint. – Viceroy 26/7, 2011 at 18:04

@Ben Some time ago I found that hosting the code in googlecode and linking the svn to r-forge was a great option. You could provide binaries on googlecode when r-forge was taking too long; otherwise you got the convenience of install.packages(). However, it seems that r-forge does not offer this option anymore. Perhaps the developers could re-enable it? – Kosiur 26/7, 2011 at 21:04

@gsk3: I have mixed feelings about division into a horde of smaller packages. Gigantic incoherent packages are annoying, but so (to me) is installing a whole bunch of tiny cross-dependent packages (although that's better than packages that pull in a whole series of big fat dependencies, especially those that require platform-specific binary components ...) – Lime 26/7, 2011 at 21:07

@Ben: I suspect there's a good balance in between the two. I don't quite have the heart to go through and divide it up into pieces anyway. I've been sitting on this code for years and would just like to get it out there. There's much to be said for Stata's approach where you can release one function at a time (but even more to be said against it as well!). – Viceroy 26/7, 2011 at 21:16

Frankly, I recommend liberal submission. Whether it's 1 or 2 or 3 packages isn't a big concern. In time, things will get cleaned up through communal refactoring. Access to good code is a great time saver. I have 10-20 helper functions that I'll eventually release in a package. Some of these things should be in base R, but may not have occurred to the R Core developers. At least having the code available in one (or two or three) central repositories makes it easier to review and incorporate. – Aruwimi 1/8, 2011 at 14:29

R

4

Most packages should be collections of related functions with an obvious purpose, so a useful thing to do would be to try and group what you have together, and see if you can classify them. Several smaller packages are better than one huge incoherent package.

That said, there are some packages that are collections of miscellaneous utility functions, most notably Hmisc and gregmisc, so it is okay to do that sort of thing. If you just have a few functions like that, it might be worth contacting the author of some of the misc packages and seeing if they'll let you include your code in their package.

As for writing pretty code, the most important thing you can do is to use a style guide.

Rousing answered 26/7, 2011 at 11:2 Comment(2)

I've already been grouping the functions when I created help files, so I could add another level of aggregation and release them as a series of small packages to CRAN, and write some of the other packages which my classes extend and see if they'd add one or two functions in. I'd worry that I'd be flooding CRAN with small irrelevant packages, however, but I can see how that might be preferred. – Viceroy 26/7, 2011 at 15:25

@AriB.Friedman I wouldn't worry about smallness of packages. Small packages are easier to maintain and less likely to break in the future. The biggest hassle for CRAN is code that breaks and, especially, breaks code in the package's reverse dependencies. – Fibre 17/8, 2016 at 16:25

In my opinion, it is not a good idea to turn this type of material into packages.
Misc packages do exist, but mostly for historical reasons and/or because of their authoritative contributors; see Frank Harrell's Hmisc.

I see three main reasons why this choice does not fit a disparate collection of functions:

There are roughly 7000 packages on CRAN alone. It is unlikely that your package will be chosen if it does not target a specific field and, even when it does, it is quite possible that established packages already do the same. Therefore your package should also offer an original or better solution to the problem it deals with.
Repositories, and CRAN in particular, are task-oriented, which suggests that a package's functions should address a coherent task. And for a good reason: there is no point in downloading a whole package with, say, 50 unrelated functions when I need just a couple of them. Instead, if a package solves a specific data problem of mine, then I will most likely need most (if not all) of its functions.
R repositories tend to hide the content. Unlike tech blogs, you do not immediately see the functions' source. You need to download a separate source package, and the package structure adds a lot of overhead, which buries the functions you want to show and others need to read.

In my opinion, the best places for general convenience functions are sites like GitHub. In fact:

One immediately reads them with the comfort of syntax highlighting. If they are interesting, they can be pasted into R to give them a try and possibly kept; otherwise one simply moves on to the next function.
There is the possibility of organising code, but without all the constraints of an actual package. Similar functions might go in the same file and coherent files in the same subfolder.
You can show your ideas to others in a simple way. The readme file can immediately become a sort of mini webpage (via markdown). In comparison, CRAN is quite rigid.
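If the functions live in a local clone of such a repository, organised into files and subfolders as described, a small helper can load them all without building a package. source_dir() and the example path are hypothetical; the sketch relies only on base R's list.files() and sys.source().

```r
# Sketch: load every .R file under a directory tree (e.g. a local clone of a
# GitHub repository of utility functions) into one environment.
# source_dir() is a hypothetical helper, not an existing package function.
source_dir <- function(path, envir = new.env(parent = globalenv())) {
    files <- list.files(path, pattern = "\\.[Rr]$", recursive = TRUE,
                        full.names = TRUE)
    # Sort so the files are loaded in a predictable order
    for (f in sort(files)) {
        sys.source(f, envir = envir)
    }
    invisible(envir)
}

# Usage (hypothetical path):
# utils_env <- source_dir("~/r-utils")
# ls(utils_env)   # the functions defined in those files
```

Loading into a separate environment rather than the global workspace keeps the helpers from silently overwriting objects of the same name.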

There are a lot of other benefits (revision history, accepting contributions, GitHub pages), which may or may not interest you.

Of course, once several functions have grown in a stable, coherent direction, you can turn them into an actual CRAN package, not least because copying and pasting them to try them out then becomes inconvenient.

EDIT: Nowadays there are alternatives to GitHub that are worth considering too, and GitHub has become a common way to distribute packages that are not yet ready for CRAN, or to complement a package's official CRAN page.

Astound answered 14/8, 2016 at 14:28
