Simple way to subset SpatialPolygonsDataFrame (i.e. delete polygons) by attribute in R
Asked Answered
S

5

80

I would like simply delete some polygons from a SpatialPolygonsDataFrame object based on corresponding attribute values in the @data data frame so that I can plot a simplified/subsetted shapefile. So far I haven't found a way to do this.

For example, let's say I want to delete all polygons from this world shapefile that have an area of less than 30000. How would I go about doing this?

Or, similarly, how can I delete Antartica?

require(maptools)

getinfo.shape("TM_WORLD_BORDERS_SIMPL-0.3.shp") 
# Shapefile type: Polygon, (5), # of Shapes: 246
world.map <- readShapeSpatial("TM_WORLD_BORDERS_SIMPL-0.3.shp")

class(world.map)
# [1] "SpatialPolygonsDataFrame"
# attr(,"package")
# [1] "sp"

head(world.map@data)
#   FIPS ISO2 ISO3 UN                NAME   AREA  POP2005 REGION SUBREGION     LON     LAT
# 0   AC   AG  ATG 28 Antigua and Barbuda     44    83039     19        29 -61.783  17.078
# 1   AG   DZ  DZA 12             Algeria 238174 32854159      2        15   2.632  28.163
# 2   AJ   AZ  AZE 31          Azerbaijan   8260  8352021    142       145  47.395  40.430
# 3   AL   AL  ALB  8             Albania   2740  3153731    150        39  20.068  41.143
# 4   AM   AM  ARM 51             Armenia   2820  3017661    142       145  44.563  40.534
# 5   AO   AO  AGO 24              Angola 124670 16095214      2        17  17.544 -12.296

If I do something like this, the plot does not reflect any changes.

world.map@data = world.map@data[world.map@data$AREA > 30000,]
plot(world.map)

same result if I do this:

world.map@data = world.map@data[world.map@data$NAME != "Antarctica",]
plot(world.map)

Any help is appreciated!

Sulcate answered 18/11, 2012 at 18:50 Comment(0)
B
77

looks like you're overwriting the data, but not removing the polygons. If you want to cut down the dataset including both data and polygons, try e.g.

world.map <- world.map[world.map$AREA > 30000,]
plot(world.map)

[[Edit 19 April, 2016]] That solution used to work, but @Bonnie reports otherwise for a newer R version (though perhaps the data has changed too?): world.map <- world.map[world.map@data$AREA > 30000, ] Upvote @Bonnie's answer if that helped.

Borchers answered 18/11, 2012 at 19:31 Comment(3)
So simple, but now I know. Thanks!Sulcate
Exactly what I spent the last two days trying to figure out. Thanks!Baucis
both solutions (yours & Bonnie's) work for me (R version 3.2.2)Purifoy
P
45

When I tried to do this in R 3.2.1, tim riffe's technique above did not work for me, although modifying it slightly fixed the problem. I found that I had to specifically reference the data slot as well before specifying the attribute to subset on, as below:

world.map <- world.map[world.map@data$AREA > 30000, ]
plot(world.map)

Adding this as an alternative answer in case others come across the same issue.

Pervious answered 25/8, 2015 at 0:4 Comment(3)
With the latest R version I'm losing my @data when I subset the data in this way. The Formal Class SpatialPolygonsDataFrame becomes Formal Class SpatialPolygons. Is this something that happens to you as well?Unicellular
Same problem here. Did you find a solution?Amphibole
I haven't used the newer R versions, but you might try Erick Chacon's suggestion below (using subset)Pervious
S
17

Just to mention that subset also makes the work avoiding to write the data's name in the condition.

world.map <- subset(world.map, AREA > 30000)
plot(world.map)
Scarrow answered 5/8, 2016 at 15:11 Comment(2)
yeah exactly that command didn't work for me. Does it still for you?Gerrigerrie
using the subset command for me was the only way I was able to subset my spatial data under a parallelized (snowfall) environment.Snick
S
12

I used the above technique to make a map of just Australia:

australia.map < - world.map[world.map$NAME == "Australia",]
plot(australia.map)

The comma after "Australia" is important, as it turns out.

One flaw with this method is that it appears to retain all of the attribute columns and rows for all of the other countries, and just populates them with zeros. I found that if I wrote out a .shp file, then read it back in using readOGR (rgdal package), it automatically removes the null geographic data. Then I could write another shape file with only the data I want.

writeOGR(australia.map,".","australia",driver="ESRI Shapefile")
australia.map < - readOGR(".","australia")
writeOGR(australia.map,".","australia_small",driver="ESRI Shapefile")

On my system, at least, it's the "read" function that removes the null data, so I have to write the file after reading it back once (and if I try to re-use the filename, I get an error). I'm sure there's a simpler way, but this seems to work good enough for my purposes anyway.

Sufficient answered 13/8, 2013 at 1:36 Comment(3)
As noted in this comment R doesn't automatically drop unused factor levels, instead you have to re-factor: feature$ATTR <- factor(feature$ATTR).Huddersfield
The comma is important, since it implies a selection: Before the comma we select rows (all rows where NAME equals Austrialia), after the comma we select columns, here: all columns since no particular column is specified.Pact
@adifferentben wouldn't droplevels() be a better choice?Paulenepauletta
H
1

As a second pointer: this does not work for shapefiles with "holes" in the shapes, because it is subsetting by index.

Hardship answered 30/1, 2017 at 23:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.