My task is to perform clustering on a data set.
The variables have been scaled and centered. I am using the following code to find the optimal number of clusters:
library(cluster)  # provides pam()
library(fpc)      # provides pamk()
d <- dist(df, method = "euclidean")
pamk.best <- pamk(d)
plot(pam(d, pamk.best$nc))
str(df)
161976 obs. of 11 variables
R version: 3.2.4
RStudio version: 0.99.893
Windows 7
Intel i7
480 GB RAM
I have noticed that the system never uses more than 22% of the CPU's processing power.
I have taken the following actions so far:
- I tried, unsuccessfully, to change the Set Priority and Set Affinity settings for rsession.exe in the Processes tab of the Windows Task Manager. For some reason, the priority always reverts to Low even when I set it to High, Realtime, or anything else on that list. The Set Affinity setting shows that the system is allowing R to use all of the cores.
- I adjusted the High Performance settings on my machine by going to Control Panel -> Power Options -> Change advanced power settings -> Processor Power Management and setting it to 100%.
- I have read the CRAN Task View for High Performance Computing on parallel processing. I may be wrong, but I don't think that calculating the distances between observations in a data set is a task that should be parallelized, in the sense of dividing the data set into subsets and performing the distance calculations on those subsets in parallel on different cores. Please correct me if I am wrong.
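To make concrete what I mean by dividing up the distance calculation, here is a minimal sketch using the base `parallel` package. The toy matrix `x`, the choice of 4 chunks, and the 2 workers are all assumptions for illustration and are not taken from my real data; I have not tried this at full scale.

```r
library(parallel)

set.seed(1)
# Hypothetical stand-in for the scaled and centered data (200 x 11)
x <- scale(matrix(rnorm(200 * 11), ncol = 11))

# Split the row indices into 4 contiguous chunks
chunks <- split(seq_len(nrow(x)), cut(seq_len(nrow(x)), 4, labels = FALSE))

# Socket cluster, which also works on Windows (no forking required)
cl <- makeCluster(2)
blocks <- parLapply(cl, chunks, function(rows, x) {
  # Euclidean distances from the rows in this chunk to all rows of x
  a <- x[rows, , drop = FALSE]
  sq <- outer(rowSums(a^2), rowSums(x^2), "+") - 2 * tcrossprod(a, x)
  sqrt(pmax(sq, 0))  # clamp tiny negative values caused by rounding
}, x = x)
stopCluster(cl)

# Stack the blocks back into one square distance matrix
d_full <- do.call(rbind, blocks)
```

Note that this builds the full square matrix rather than the lower triangle that `dist()` stores, so it is only meant to show the splitting idea on a small example.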
One option I have is to perform clustering on a subset of the data set and then predict cluster membership for the rest of the data set. But if I have the processing power and the memory available, why can't I perform the clustering on the whole data set?
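For reference, the subset option could be sketched roughly as follows. The toy matrix `x`, the subset size of 500, and `k = 3` are arbitrary assumptions for illustration; "predicting" membership here simply means assigning each row to its nearest medoid.

```r
library(cluster)

set.seed(42)
# Hypothetical stand-in for the scaled and centered data (2000 x 11)
x <- scale(matrix(rnorm(2000 * 11), ncol = 11))

# 1. Cluster a random subset with PAM (k = 3 is an arbitrary choice here)
idx <- sample(nrow(x), 500)
fit <- pam(x[idx, ], k = 3)

# 2. Squared Euclidean distance from every row to each medoid
d2 <- sapply(seq_len(nrow(fit$medoids)), function(j) {
  rowSums(sweep(x, 2, fit$medoids[j, ])^2)
})

# 3. Assign every row to the cluster of its nearest medoid
cluster_all <- max.col(-d2, ties.method = "first")
```

As far as I understand, `clara()` in the cluster package is built around a similar sampling idea, and `pamk()` accepts `usepam = FALSE` to use it, but that requires passing the data rather than a precomputed `dist` object.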
Is there a way to have the machine or R use a higher percentage of the processing power and complete the task quicker?
EDIT: I think that my issue is different from the one described in Multithreading in R because I am not trying to run different functions in R. Rather, I am running only one function on one data set and would like the machine to use more of the processing power that is available to it.