Correlation of one variable to all the others in R

I want to calculate the correlation between my dependent variable y and all of my x variables. I use the code below:

   cor(loan_data_10v[sapply(loan_data_10v, is.numeric)],use="complete.obs")

The result is a full correlation matrix. How can I get just the one column for my variable y?

Weeks answered 26/8, 2017 at 5:37 Comment(7)
cor(loan_data_10v)[,1]? - Plonk
It throws an error, "'x' must be numeric", since not all the variables in my data set are numeric. - Weeks
My point: add [,1] to the outside/end of your command, as in cor(...)[,1]. - Plonk
Yes, this works too. Thank you! - Weeks
Since my y is the 10th column, I used [,10] and it works. What if I don't know that y is the 10th column, only that it is the last one? I tried [,-1], but this gave me everything. Why is that? I thought it would slice out the last column. - Weeks
Sorry, I have no idea what you are talking about. [,-1] gives you all columns except the first; that's what negative indices are intended to do. - Plonk
Okay, got it, thank you! - Weeks
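
To illustrate the point about negative indices, here is a minimal sketch using the built-in mtcars data (not the asker's loan_data_10v, which is not available here). In R, [, -1] drops the first column rather than selecting the last one; ncol() or a column name picks a specific column. Treating 'mpg' as the dependent variable is an assumption made only for illustration.

```r
# Correlation matrix of a built-in data set (all columns of mtcars are numeric)
m <- cor(mtcars)

# A negative index drops columns: this is every column except the first
all_but_first <- m[, -1]
dim(all_but_first)

# The last column, whatever its position
last_col <- m[, ncol(m)]

# Usually safer: pick the column by name
# ("mpg" stands in for the dependent variable here)
by_name <- m[, "mpg"]

# Drop the variable's correlation with itself
by_name[names(by_name) != "mpg"]
```

Selecting by name keeps working if the column order of the data changes, which a positional index does not.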

If we are looking for the correlation between 'x' and 'y', both arguments of cor can be a vector, a matrix, or a data frame. As a reproducible example, take mtcars and suppose 'y' is 'mpg' and 'x' is the set of all other variables ('mpg' is the first column, so we use mtcars[-1] for 'x'):

cor(mtcars[-1], mtcars$mpg) 
#          [,1]
#cyl  -0.8521620
#disp -0.8475514
#hp   -0.7761684
#drat  0.6811719
#wt   -0.8676594
#qsec  0.4186840
#vs    0.6640389
#am    0.5998324
#gear  0.4802848
#carb -0.5509251

If the data has both numeric and non-numeric columns, create a logical index of the numeric columns ('i1'), use it to get the names of the 'x' variables (excluding 'y'), and then apply cor:

i1 <- sapply(loan_data_10v, is.numeric)
y1 <- "dep_column" #change it to actual column name
x1 <- setdiff(names(loan_data_10v)[i1], y1)
cor(loan_data_10v[x1], loan_data_10v[[y1]])
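
A small self-contained sketch of the same steps, since loan_data_10v is not available here. The data frame df and its column names (grade, income, amount, rate) are made up for illustration, and 'rate' is assumed to be the dependent variable. It also shows the use argument of cor for columns that contain missing values.

```r
# Hypothetical toy data: one character column, and NAs in two numeric columns
df <- data.frame(
  grade  = c("A", "B", "A", "C", "B", "C"),
  income = c(40, 55, NA, 70, 65, 80),
  amount = c(10, 12, 9, NA, 15, 18),
  rate   = c(5.1, 4.8, 5.5, 4.0, 4.2, 3.9),
  stringsAsFactors = FALSE
)

i1 <- sapply(df, is.numeric)       # logical index of numeric columns
y1 <- "rate"                       # assumed dependent variable
x1 <- setdiff(names(df)[i1], y1)   # numeric columns other than y1

# Rows with an NA in any of the selected columns are dropped first
res <- cor(df[x1], df[[y1]], use = "complete.obs")

# Pairwise deletion only drops rows missing for the pair being correlated
res_pw <- cor(df[x1], df[[y1]], use = "pairwise.complete.obs")

# cor() returns a one-column matrix here; drop() gives a named vector
drop(res)
```

With use = "complete.obs" a row is discarded if any selected column is NA, whereas "pairwise.complete.obs" keeps as many rows as possible for each pair, so the two results can differ.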
Raimondo answered 26/8, 2017 at 5:54 Comment(2)
Thank you! It works. I added use="complete.obs" to the cor call, since I have NA values in each of my variables. - Weeks
Hey akrun, would there be any way to achieve this solution with data.table? I posted a question about it here: #56182771 - Defelice

Another option is the corrr package, which lets you easily specify the variable you want to focus on and returns the result as a data frame (a tibble):

library(tidyverse)
library(corrr)
mtcars %>% 
  correlate() %>% 
  focus(mpg) 
# Correlation computed with
# • Method: 'pearson'
# • Missing treated using: 'pairwise.complete.obs'
# # A tibble: 10 × 2
#    term     mpg
#    <chr>  <dbl>
#  1 cyl   -0.852
#  2 disp  -0.848
#  3 hp    -0.776
#  4 drat   0.681
#  5 wt    -0.868
#  6 qsec   0.419
#  7 vs     0.664
#  8 am     0.600
#  9 gear   0.480
# 10 carb  -0.551

It's also useful if you want to remove non-numeric variables first, e.g.:

iris %>% 
  select_if(~!is.factor(.)) %>% 
  correlate() %>% 
  focus(Petal.Width)
# Correlation computed with
# • Method: 'pearson'
# • Missing treated using: 'pairwise.complete.obs'
# # A tibble: 3 × 2
#   term         Petal.Width
#   <chr>              <dbl>
# 1 Sepal.Length       0.818
# 2 Sepal.Width       -0.366
# 3 Petal.Length       0.963
Goldsberry answered 7/6, 2023 at 13:34 Comment(0)
