To celebrate the 100,000th question in the r tag, I'd like to create a list of the names of all package authors on CRAN.
Initially, I thought I could do this using available.packages(), but sadly its result doesn't contain an author column:
pdb <- available.packages()
colnames(pdb)
[1] "Package" "Version" "Priority"
[4] "Depends" "Imports" "LinkingTo"
[7] "Suggests" "Enhances" "License"
[10] "License_is_FOSS" "License_restricts_use" "OS_type"
[13] "Archs" "MD5sum" "NeedsCompilation"
[16] "File" "Repository"
This information is available in the DESCRIPTION file of each package, so I can think of two brute-force approaches, neither of which is very elegant:
1. Download each of the 6,878 packages and read its DESCRIPTION file using base::read.dcf().
2. Scrape each package's page on CRAN. For example, https://cran.r-project.org/web/packages/MASS/index.html tells me that Brian Ripley is the author of MASS.
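For reference, the per-package step of the first approach looks roughly like this. It is a minimal sketch that assumes a hypothetical DESCRIPTION (the package `examplepkg` and its authors are made up) supplied as text, so it runs without downloading anything; for a real package you would pass the path of its unpacked DESCRIPTION file to read.dcf() instead.

```r
# Hypothetical DESCRIPTION content; a real one would come from a package tarball
desc_lines <- c(
  "Package: examplepkg",
  "Version: 0.1.0",
  "Author: Ada Lovelace [aut, cre], Alan Turing [ctb]",
  "Maintainer: Ada Lovelace <ada@example.org>"
)

# read.dcf() accepts a connection and returns a character matrix:
# one row per record, one column per requested field (NA if absent)
con <- textConnection(desc_lines)
desc <- read.dcf(con, fields = c("Package", "Author", "Maintainer"))
close(con)

# The free-text Author field of the single record
print(desc[1, "Author"])
```

Repeating this for thousands of packages is exactly the "download all of CRAN" cost the question wants to avoid; the parsing itself is cheap.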
I don't want to download all of CRAN to answer this question. I don't want to scrape the HTML either, since the DESCRIPTION file already holds this information as a neatly formatted list of person objects (in the Authors@R field, where present; see ?person).
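Once Authors@R strings are in hand, turning them into a single list of names is straightforward with the person class. The sketch below assumes the strings are already available as a character vector; the helper `authors_from_authors_at_r()` and the example strings are hypothetical. Because it uses eval(parse()), it executes each string as R code, so it should only be applied to trusted DESCRIPTION content.

```r
# Hypothetical helper: collect unique author names from Authors@R strings.
# Each string is R code that builds one or more person objects.
# Assumption: input comes from trusted DESCRIPTION files (the code is evaluated).
authors_from_authors_at_r <- function(x, roles = "aut") {
  x <- x[!is.na(x)]                      # packages without Authors@R give NA
  names_out <- character(0)
  for (code in x) {
    people <- eval(parse(text = code))   # returns an object of class "person"
    for (i in seq_along(people)) {
      p <- people[i]                     # a single person
      if (any(roles %in% p$role)) {      # keep only the requested roles
        names_out <- c(names_out, format(p, include = c("given", "family")))
      }
    }
  }
  sort(unique(names_out))
}

# Made-up Authors@R values standing in for three packages
example_fields <- c(
  'c(person("Ada", "Lovelace", role = c("aut", "cre")), person("Alan", "Turing", role = "ctb"))',
  'person("Grace", "Hopper", role = "aut")',
  NA
)

all_authors <- authors_from_authors_at_r(example_fields)
print(all_authors)
```

Filtering on role "aut" separates authors from contributors ("ctb"); passing a different `roles` vector widens or narrows the list. As one possible source of such strings without downloading every tarball, recent versions of R (3.4.0 and later) provide tools::CRAN_package_db(), which returns CRAN's package metadata including DESCRIPTION fields such as Author and, for packages that declare it, Authors@R; that call needs network access and is not exercised here. Packages that only have a free-text Author field would need separate handling.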
How can I use the information on CRAN to easily build a list of package authors?