Zip Code Demographics in R

I could reach my goals "the long way," but I am hoping to stay completely within R. I am looking to append Census demographic data by zip code to records in my database. I know that R has a few Census-based packages, but unless I am missing something, these data do not seem to exist at the zip code level, nor is it intuitive to merge them onto an existing data frame.

In short, is it possible to do this within R, or is my best approach to grab the data elsewhere and read it into R?

Any help will be greatly appreciated!

Rind asked 1/6, 2011 at 0:07

In short, no. Census-to-zip translations are generally created from proprietary sources.

It's unlikely that you'll find anything at the zip code level from a census perspective (for privacy reasons). However, that doesn't mean you're left out in the cold. You can take the zip codes you have and append census data at the MSA, muSA or CSA level. All you need is a listing of the postal codes within each MSA, muSA or CSA so that you can merge. There are a bunch available online that are pretty cheap if you don't already have such a list.
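A minimal base-R sketch of that two-step merge (zip to MSA, then MSA to demographics), assuming you already have a zip-to-MSA list. The data frames records, zip_msa and msa_stats, their column names and every value in them are made up for illustration; swap in your own list and whatever MSA-level figures you pull.

```r
# Hypothetical inputs: your records, a zip-to-MSA list, and MSA-level figures.
# Zip codes are kept as character so leading zeros survive.
records <- data.frame(
  id  = 1:4,
  zip = c("02139", "02140", "10001", "99999"),
  stringsAsFactors = FALSE
)
zip_msa <- data.frame(
  zip = c("02139", "02140", "10001"),
  msa = c("MSA_A", "MSA_A", "MSA_B"),
  stringsAsFactors = FALSE
)
msa_stats <- data.frame(
  msa           = c("MSA_A", "MSA_B"),
  median_income = c(70000, 65000),   # illustrative numbers only
  stringsAsFactors = FALSE
)

# Step 1: attach the MSA code to each record (all.x keeps unmatched records).
with_msa <- merge(records, zip_msa, by = "zip", all.x = TRUE)

# Step 2: attach the MSA-level figures through the MSA code.
enriched <- merge(with_msa, msa_stats, by = "msa", all.x = TRUE)

# merge() re-sorts rows, so restore the original record order.
enriched <- enriched[order(enriched$id), c("id", "zip", "msa", "median_income")]
print(enriched)
```

Keeping all.x = TRUE means records whose zip is missing from the list survive with NA instead of silently disappearing, which makes gaps in the list easy to spot.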

For example, in Canada, we can get income data from the CRA at the FSA level (the first three characters of a postal code in the form A1A 1A1). I'm not sure whether the IRS provides similar information, and I'm not too familiar with US Census data, but I imagine they provide information at the CSA level at the very least.
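Since an FSA is just the first three characters of the postal code, deriving the merge key from a raw postal-code column is a one-liner in base R. A sketch, assuming the codes may arrive with inconsistent spacing and case; the sample values are only illustrative strings.

```r
# Canadian postal codes in mixed spacing/case styles (illustrative values).
postal <- c("K1A 0B1", "k1a0b1", " M5V 3L9 ")

# Drop all whitespace, upper-case, then keep the first 3 characters (the FSA).
fsa <- substr(toupper(gsub("[[:space:]]", "", postal)), 1, 3)
print(fsa)
```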

If you're bewildered by all these acronyms:

  1. MSA: http://en.wikipedia.org/wiki/Metropolitan_Statistical_Area
  2. CSA: http://en.wikipedia.org/wiki/Combined_statistical_area
  3. muSA: http://en.wikipedia.org/wiki/Micropolitan_Statistical_Area
Superman answered 1/6, 2011 at 1:13
That said, if someone knows of a non-proprietary zip-to-MSA list, I'd be more than happy to see it. - Superman
The Census Bureau likes to say "we don't touch zip codes, don't ask us," but check out census.gov/population/www/metroareas/metroarea.html: at the very bottom is a mapping of zip codes to CBSAs (metropolitan + micropolitan statistical areas), though it is a few years old. It's still messy as hell, since zip codes don't map cleanly onto MSA boundaries, but it's a start. Ahh, fond memories of when I used to muck with this for a living... - Tenaille
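Because a zip code can straddle more than one CBSA, a zip-to-CBSA mapping like the one in the comment above is many-to-many. One simple way to force a single assignment is to keep, for each zip, the CBSA holding the largest share of it. A base-R sketch; the share column (for example a fraction of addresses or population) and all codes and values are hypothetical, so check which weighting, if any, your crosswalk actually provides.

```r
# Hypothetical many-to-many crosswalk: one row per (zip, cbsa) pair,
# with the fraction of the zip that falls in each CBSA.
xwalk <- data.frame(
  zip   = c("11111", "11111", "22222", "33333", "33333"),
  cbsa  = c("CBSA_1", "CBSA_2", "CBSA_2", "CBSA_3", "CBSA_1"),
  share = c(0.80, 0.20, 1.00, 0.45, 0.55),
  stringsAsFactors = FALSE
)

# Sort so the largest share comes first within each zip...
xwalk <- xwalk[order(xwalk$zip, -xwalk$share), ]
# ...then keep only the first row per zip: its dominant CBSA.
dominant <- xwalk[!duplicated(xwalk$zip), c("zip", "cbsa", "share")]
print(dominant)
```

Ties fall back to the original row order. Zips whose dominant share is close to 0.5 are exactly the messy cases, so it can be worth keeping the share column to flag them.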

As others in this thread have mentioned, the Census Bureau's American FactFinder is a free source of comprehensive and detailed data. Unfortunately, it’s not particularly easy to use in its raw format.

We’ve pulled, cleaned, consolidated, and reformatted the Census Bureau data. The details of this process and how to use the data files can be found on our team blog.

None of these tables actually has a field called “ZIP code.” Rather, they have a field called “ZCTA5”. A ZCTA5 (or ZCTA) can be thought of as interchangeable with a zip code, given the following caveats:

  • There are no ZCTAs for PO Box ZIP codes; this means that for roughly 42,000 US ZIP codes there are only about 32,000 ZCTAs.
  • ZCTA stands for ZIP Code Tabulation Area. ZCTAs are based on ZIP codes but don’t necessarily follow exact ZIP code boundaries. If you would like to read more about ZCTAs, please refer to this link. The Census Bureau also provides an animation that shows how ZCTAs are formed.
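In practice the join key is where things tend to break: zip codes stored as numbers lose their leading zeros, and some databases keep ZIP+4. A minimal sketch of normalising a zip column to the 5-character ZCTA5 form and flagging records with no matching ZCTA (for example PO Box-only zips). The function name to_zcta5 and the short known_zcta5 list are hypothetical stand-ins for your own data and a full ZCTA-level table.

```r
# Hypothetical helper: turn messy zip values into 5-character ZCTA5 keys.
to_zcta5 <- function(z) {
  z <- sub("-.*$", "", trimws(z))                # drop a "-NNNN" ZIP+4 suffix
  z <- gsub("[^0-9]", "", z)                     # keep digits only
  z <- ifelse(nchar(z) > 5, substr(z, 1, 5), z)  # undashed 9-digit ZIP+4
  sprintf("%05d", as.integer(z))                 # restore lost leading zeros
}

# Zips as they might come out of a database (the first lost its leading zero).
raw_zip <- c("1001", "01002-1234", "01003", "99999")
zcta5 <- to_zcta5(raw_zip)

# Hypothetical list of codes present in your ZCTA-level table.
known_zcta5 <- c("01001", "01002", "01003")
has_zcta <- zcta5 %in% known_zcta5
print(data.frame(raw_zip, zcta5, has_zcta))
```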
Tea answered 11/12, 2018 at 23:27

I just wrote an R package called totalcensus (https://github.com/GL-Li/totalcensus), with which you can easily extract any data from the decennial census and the ACS surveys.

For this old question, if you still care: you can get total population (returned by default) and population by race from the national data of the 2010 decennial census or the 2015 ACS 5-year survey.

From the 2015 ACS 5-year survey: download the national data with download_census("acs5year", 2015, "US") and then run:

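library(totalcensus)

zip_acs5 <- read_acs5year(
    year = 2015,
    states = "US",
    geo_headers = "ZCTA5",
    table_contents = c(
        "white = B02001_002",    # B02001 (race): White alone
        "black = B02001_003",    # Black or African American alone
        "asian = B02001_005"     # Asian alone
    ),
    summary_level = "860"        # 860 = 5-digit ZCTA
)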

#               GEOID        lon      lat ZCTA5 state population white black asian GEOCOMP SUMLEV        NAME
#     1: 86000US01001  -72.62827 42.06233 01001    NA      17438 16014   230   639     all    860 ZCTA5 01001
#     2: 86000US01002  -72.45851 42.36398 01002    NA      29780 23333  1399  3853     all    860 ZCTA5 01002
#     3: 86000US01003  -72.52411 42.38994 01003    NA      11241  8967   699  1266     all    860 ZCTA5 01003
#     4: 86000US01005  -72.10660 42.41885 01005    NA       5201  5062    40    81     all    860 ZCTA5 01005
#     5: 86000US01007  -72.40047 42.27901 01007    NA      14838 14086   104   330     all    860 ZCTA5 01007
# ---                                                                                                     
# 32985: 86000US99923 -130.04103 56.00232 99923    NA         13    13     0     0     all    860 ZCTA5 99923
# 32986: 86000US99925 -132.94593 55.55020 99925    NA        826   368     7     0     all    860 ZCTA5 99925
# 32987: 86000US99926 -131.47074 55.13807 99926    NA       1711   141     0     2     all    860 ZCTA5 99926
# 32988: 86000US99927 -133.45792 56.23906 99927    NA        123   114     0     0     all    860 ZCTA5 99927
# 32989: 86000US99929 -131.60683 56.41383 99929    NA       2365  1643     5    60     all    860 ZCTA5 99929

From the 2010 Census: download the national data with download_census("decennial", 2010, "US") and then run:

zip_2010 <- read_decennial(
    year = 2010,
    states = "US",
    table_contents = c(
        "white = P0030002",    # P3 (race): White alone
        "black = P0030003",    # Black or African American alone
        "asian = P0030005"     # Asian alone
    ),
    geo_headers = "ZCTA5",
    summary_level = "860"      # 860 = 5-digit ZCTA
)

#               lon      lat ZCTA5 state population white black asian GEOCOMP SUMLEV
#     1:  -66.74996 18.18056 00601    NA      18570 17285   572     5     all    860
#     2:  -67.17613 18.36227 00602    NA      41520 35980  2210    22     all    860
#     3:  -67.11989 18.45518 00603    NA      54689 45348  4141    85     all    860
#     4:  -66.93291 18.15835 00606    NA       6615  5883   314     3     all    860
#     5:  -67.12587 18.29096 00610    NA      29016 23796  2083    37     all    860
# ---                                                                            
# 33116: -130.04103 56.00232 99923    NA         87    79     0     0     all    860
# 33117: -132.94593 55.55020 99925    NA        819   350     2     4     all    860
# 33118: -131.47074 55.13807 99926    NA       1460   145     6     2     all    860
# 33119: -133.45792 56.23906 99927    NA         94    74     0     0     all    860
# 33120: -131.60683 56.41383 99929    NA       2338  1691     3    33     all    860
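To get from here to the original goal of appending demographics to existing records, a left join on the ZCTA5 column is enough. A sketch in base R: zip_acs5_sample is a hand-made stand-in holding three rows copied from the ACS output above, and the records data frame is hypothetical. With the real result you would pass zip_acs5 itself to merge() in the same way (the printout above looks like a data.table, and merge() works on those too).

```r
# Stand-in for zip_acs5: three rows copied from the 2015 ACS output above.
zip_acs5_sample <- data.frame(
  ZCTA5      = c("01001", "01002", "01003"),
  population = c(17438, 29780, 11241),
  white      = c(16014, 23333, 8967),
  black      = c(230, 1399, 699),
  asian      = c(639, 3853, 1266),
  stringsAsFactors = FALSE
)

# Hypothetical records keyed by a 5-character zip string.
records <- data.frame(
  customer = c("a", "b", "c"),
  zip      = c("01002", "01001", "99999"),
  stringsAsFactors = FALSE
)

# Left join: keep every record, attach ZCTA-level counts where the zip matches.
out <- merge(records, zip_acs5_sample, by.x = "zip", by.y = "ZCTA5", all.x = TRUE)

# Express a count as a percentage of the ZCTA population (NA when unmatched).
out$pct_white <- round(100 * out$white / out$population, 1)
print(out[order(out$customer), c("customer", "zip", "population", "pct_white")])
```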
Kaitlin answered 5/12, 2017 at 14:00

Your best bet is probably the U.S. Census Bureau TIGER/Line shapefiles. They include 2010 ZIP Code Tabulation Area (ZCTA5) shapefiles at the state level, which may be sufficient for your purposes.

Census data itself can be found at American FactFinder. For example, you can get population estimates at the sub-county level (i.e., city/town), but not straightforward population estimates at the zip code level. I don't know the details of your data set, but one solution might use the relationship tables that are also available as part of the TIGER/Line data or, alternatively, spatially join the sub-county shapefiles (whose place names link to the census data) with the ZCTA5 shapefiles.
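If you go the relationship-table route, one common approach is to apportion each sub-county figure across the ZCTAs it overlaps, weighting by the fraction of the sub-county that falls in each ZCTA. A base-R sketch; the column names (place, ZCTA5, frac_of_place) and all values are hypothetical, so map them onto whatever weight fields the relationship file you download actually contains. Splitting counts this way assumes they are spread evenly within each place, which is only an approximation.

```r
# Hypothetical sub-county (place) totals.
place_pop <- data.frame(
  place      = c("Town_A", "Town_B"),
  population = c(10000, 4000),
  stringsAsFactors = FALSE
)

# Hypothetical relationship table: one row per place/ZCTA overlap, with the
# fraction of the place's population in that ZCTA (sums to 1 per place).
rel <- data.frame(
  place         = c("Town_A", "Town_A", "Town_B"),
  ZCTA5         = c("00001", "00002", "00002"),
  frac_of_place = c(0.75, 0.25, 1.00),
  stringsAsFactors = FALSE
)

# Attach each place's total to its overlaps, then split it by the weight.
pieces <- merge(rel, place_pop, by = "place")
pieces$pop_part <- pieces$population * pieces$frac_of_place

# Sum the pieces within each ZCTA to get an estimated ZCTA total.
zcta_est <- aggregate(pop_part ~ ZCTA5, data = pieces, FUN = sum)
print(zcta_est)
```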

Note from the metadata: "These products are free to use in a product or publication, however acknowledgement must be given to the U.S. Census Bureau as the source."

HTH

Bufordbug answered 1/6, 2011 at 3:12

Here is a simple for loop to get zip-level population. You need to get an app token (key), though. It works for the US only for now.

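library(data.table)
library(jsonlite)

ziplist   <- c("01001", "01002")   # your zip codes, as 5-character strings
app_token <- "YOURKEYHERE"         # your Open Data Network app token

results <- vector("list", length(ziplist))

for (z in seq_along(ziplist)) {
  message(z)  # progress
  textt <- paste0("http://api.opendatanetwork.com/data/v1/values?variable=demographics.population.count&entity_id=8600000US",
                  ziplist[z],
                  "&forecast=3&describe=false&format=&app_token=", app_token)

  # Call the API once; skip zip codes that return an error
  resp <- tryCatch(fromJSON(textt), error = function(e) NULL)
  if (is.null(resp)) next

  data <- as.data.table(resp$data)
  zipcode <- data[[2]][1]   # first row is a header; its 2nd cell is the zip label
  data <- data[-1]          # keep the value rows only
  setnames(data, c("Year", "Population", "Forecasted"))
  data[, ZipCodeQuery := zipcode]
  data[, ZipCodeData := ziplist[z]]
  results[[z]] <- data
}

# Bind once at the end (NULL entries from skipped zips are dropped)
masterdata <- rbindlist(results)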
Labroid answered 12/1, 2019 at 1:43
