spatial clustering in R (simple example)

I have this simple data.frame

 lat<-c(1,2,3,10,11,12,20,21,22,23)
 lon<-c(5,6,7,30,31,32,50,51,52,53)
 data=data.frame(lat,lon)

The idea is to find spatial clusters based on the distance between points.

First, I plot the points (lon, lat):

plot(data$lon,data$lat)

[image: scatter plot of the ten points]

So clearly there are three clusters, based on the distances between the points.

To find them, I tried this code in R:

d= as.matrix(dist(cbind(data$lon,data$lat))) #Create distance matrix
d=ifelse(d<5,d,0) #keep only distance < 5
d=as.dist(d)
hc<-hclust(d) # hierarchical clustering
plot(hc)
data$clust <- cutree(hc,k=3) # cut the dendrogram to generate 3 clusters

This gives :

[image: dendrogram from plot(hc)]

Now I plot the same points, colored by cluster:

plot(data$lon,data$lat, col=c("red","blue","green")[data$clust],pch=19)

Here is the result:

[image: the points colored by the resulting clusters]

This is not what I'm looking for.

What I actually want is something like this plot:

[image: the desired result]

Thank you for your help.

Concertgoer answered 23/2, 2015 at 11:11 Comment(2)
I've tried to follow: #21096143 - Concertgoer
I'm not quite sure why you are clustering the distances that way... If you use hc <- hclust(dist(data)); clust <- cutree(hc, 3) it works as expected. - Unrealizable
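A minimal sketch of the suggestion in the comment above, using the data from the question. The likely reason the original attempt misbehaves is that replacing every distance of 5 or more with 0 makes far-apart points look identical to hclust; this sketch simply clusters the unmodified Euclidean distances. It assumes the coordinates can be treated as planar.

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
data <- data.frame(lat, lon)

# Cluster on the untouched Euclidean distances (no thresholding to 0)
hc <- hclust(dist(data[, c("lon", "lat")]))

# Cut the dendrogram into 3 groups
data$clust <- cutree(hc, k = 3)

# Color each point by its cluster
plot(data$lon, data$lat, col = c("red", "blue", "green")[data$clust], pch = 19)
```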

What about something like this:

lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)

km <- kmeans(cbind(lat, lon), centers = 3)
plot(lon, lat, col = km$cluster, pch = 20)

[image: the points colored by k-means cluster]
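Because kmeans() starts from randomly chosen centers, repeated runs can label the clusters differently or occasionally settle on a poorer split. A hedged variant of the code above fixes the seed and tries several random starts; the seed value and nstart = 25 are arbitrary choices for illustration, not part of the original answer. Like the original, it treats lat/lon as planar coordinates.

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)

set.seed(42)  # arbitrary seed, only so that repeated runs give the same labels

# nstart = 25: run 25 random initialisations and keep the best one
km <- kmeans(cbind(lat, lon), centers = 3, nstart = 25)

plot(lon, lat, col = km$cluster, pch = 20)
```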

Indorse answered 23/2, 2015 at 11:26 Comment(0)

Here's a different approach. First, it assumes that the coordinates are WGS-84 (longitude/latitude in degrees) and not UTM (flat). It then puts all neighbors within a given radius into the same cluster using hierarchical clustering (with method = "single", which adopts a 'friends of friends' clustering strategy).

To compute the distance matrix, I'm using the rdist.earth function from the package fields. The default earth radius for this package is 6378.388 km (the equatorial radius), which might not be what one is looking for, so I've changed it to 6371 km. See this article for more info.

library(fields)
lon <- c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
lat <- c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)
threshold.in.km <- 40
coors <- data.frame(lon, lat)  # longitude first, then latitude

# distance matrix (great-circle distances in km)
dist.in.km.matrix <- rdist.earth(coors, miles = FALSE, R = 6371)

# clustering: cut the single-linkage tree at the distance threshold
fit <- hclust(as.dist(dist.in.km.matrix), method = "single")
clusters <- cutree(fit, h = threshold.in.km)

plot(lon, lat, col = clusters, pch = 20)
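For readers who want to see what such a distance matrix contains without installing fields, here is a base-R sketch that computes great-circle distances with the haversine formula. The helper name haversine_km is hypothetical, and this is not the implementation used by rdist.earth; it is only expected to agree closely with it on a sphere of the same radius.

```r
# Hypothetical helper: great-circle distance in km on a sphere of radius R
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

lon <- c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
lat <- c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)

# Pairwise distance matrix in km
n <- length(lon)
dist_km <- outer(seq_len(n), seq_len(n),
                 function(i, j) haversine_km(lon[i], lat[i], lon[j], lat[j]))

# Same clustering step as above: single linkage, cut at 40 km
clusters <- cutree(hclust(as.dist(dist_km), method = "single"), h = 40)
```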

This could be a good solution if you don't know the number of clusters in advance (which the k-means option requires), and it is closely related to the dbscan option with minPts = 1.
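To illustrate the 'friends of friends' idea: cutting a single-linkage tree at height h should give the same groups as linking every pair of points no farther apart than h and taking the connected components. The sketch below checks this in base R on the question's data, using plain Euclidean distances and h = 2 as assumptions for illustration.

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
h <- 2  # illustrative threshold, in coordinate units

d <- as.matrix(dist(cbind(lon, lat)))
n <- nrow(d)

# 1) Single-linkage hierarchical clustering, cut at height h
single <- as.integer(cutree(hclust(as.dist(d), method = "single"), h = h))

# 2) Friends of friends: connected components of the graph "distance <= h"
adj <- d <= h             # each point is also its own neighbor (distance 0)
comp <- seq_len(n)        # start with one label per point
repeat {
  # every point takes the smallest label found among its neighbors
  new <- vapply(seq_len(n), function(i) min(comp[adj[i, ]]), integer(1))
  if (all(new == comp)) break
  comp <- new
}
comp <- match(comp, unique(comp))  # renumber labels as 1, 2, 3, ...

all(single == comp)  # compare the two partitions
```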

---EDIT---

With the original data:

lat <- c(1,2,3,10,11,12,20,21,22,23)
lon <- c(5,6,7,30,31,32,50,51,52,53)
data <- data.frame(lon, lat)  # longitude first, as rdist.earth expects

d <- rdist.earth(data, miles = FALSE, R = 6371)  # d <- dist(data) if data is UTM
fit <- hclust(as.dist(d), method = "single")
clusters <- cutree(fit, h = 1000)  # h = 2 if data is UTM
plot(lon, lat, col = clusters, pch = 20)
Copyholder answered 6/10, 2015 at 9:12 Comment(2)
I found your answer very, very helpful. Could you add the same for the k-means method also? - Sterilize
@user3875610 The transition to k-means is not very intuitive, since one can't use a distance matrix as an input to k-means (it can't compute means with only distances). Additionally, in this use case you often don't know the number of clusters, and you prefer to use a density based approach like hclust or dbscan. Having said that, if you want to use k-medoids (similar to kmeans) check out this answer: stats.stackexchange.com/questions/32925/… - Copyholder

Since you have spatial data to cluster, DBSCAN is well suited to your data. You can do this clustering with the dbscan() function provided by fpc, an R package.

library(fpc)

lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)

DBSCAN <- dbscan(cbind(lat, lon), eps = 1.5, MinPts = 3)
plot(lon, lat, col = DBSCAN$cluster, pch = 20)

Plot of DBSCAN Clustering
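The answer does not say how eps = 1.5 was chosen. One rough way to sanity-check such a value, sketched here in base R, is to look at each point's distance to its nearest neighbor: eps should be at least that large for points meant to share a cluster, and well below the gaps between groups. This is a heuristic under the assumption of planar coordinates, not a rule taken from fpc.

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)

d <- as.matrix(dist(cbind(lat, lon)))
diag(d) <- Inf                  # ignore each point's distance to itself

nn <- apply(d, 1, min)          # nearest-neighbor distance per point
sort(nn)                        # inspect the spread of these distances

# Sorted distinct pairwise distances: a large jump separates
# within-group spacing from between-group gaps
head(sort(unique(round(d[upper.tri(d)], 3))), 5)
```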

Colloquialism answered 12/4, 2015 at 17:25 Comment(1)
How do you get/guess the eps parameter? - Unmeriting
