Getting "NA" when I run a standard deviation
F

4

19

Quick question. I read my csv file into the variable data. It has a column labeled var, which holds numeric values.

When I run the command

sd(data$var)

I get

[1] NA 

instead of my standard deviation.

Could you please help me figure out what I am doing wrong?

Frieze answered 21/4, 2011 at 4:28 Comment(0)
A
35

Try sd(data$var, na.rm=TRUE); any NAs in the column var will then be ignored. It also pays to check your data to make sure the NAs really should be NAs and that nothing went wrong when the file was read in. Commands like head(data), tail(data), and str(data) should help with that.
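A minimal sketch of that check, using a small made-up data frame in place of the CSV (the name dat and its values are assumptions for illustration, not the asker's data):

```r
# Hypothetical stand-in for the data read from the CSV
dat <- data.frame(var = c(2, 4, 4, NA, 5))

sd_raw <- sd(dat$var)                  # NA, because one value is missing
sd_clean <- sd(dat$var, na.rm = TRUE)  # missing value dropped before computing

# Count the missing values and inspect how the column was read in
n_missing <- sum(is.na(dat$var))
str(dat)
summary(dat$var)
```

If n_missing is larger than expected, the problem is more likely in how the file was read than in sd() itself.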

Addieaddiego answered 21/4, 2011 at 4:29 Comment(3)
I added str() to your answer as I find it helpful for these sorts of debugging tasks, but didn't feel it warranted its own answer. Hope you don't mind, feel free to roll back. - Barbed
summary(data) is probably the easiest way to see whether there are NAs in the data. - Len
Sometimes is.numeric() may help. - Botha
K
13

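I've made the mistake a time or two of reusing variable names in dplyr chains, which has caused issues. In the example below, sd comes back as NA because ave has already been overwritten with a single value per group.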

mtcars %>%
  group_by(gear) %>%
  mutate(ave = mean(hp)) %>%
  ungroup() %>%
  group_by(cyl) %>%
  summarise(med = median(ave),
            ave = mean(ave), # should've named this variable something different
            sd = sd(ave)) # this is the sd of my newly created variable "ave", not the original one.
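One way to avoid the clash, sketched under the assumption that dplyr is installed, is to give each summary its own name so that ave keeps referring to the original column. The names ave_mean and ave_sd are illustrative only:

```r
library(dplyr)

result <- mtcars %>%
  group_by(gear) %>%
  mutate(ave = mean(hp)) %>%
  ungroup() %>%
  group_by(cyl) %>%
  summarise(med = median(ave),
            ave_mean = mean(ave), # new name, so "ave" is not overwritten
            ave_sd = sd(ave))     # sd of the per-row "ave" values in each cyl group
```

Because sd() now sees every row of the group instead of a single summarised value, it no longer returns NA.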
Krys answered 29/5, 2020 at 3:44 Comment(1)
My problem was I named my variable "mean" which seemed like a good idea at the time! I was wondering why na.rm=T wasn't working. - Messy
C
6

You probably have missing values in var, or the column is not numeric, or there's only one row.

Try removing missing values which will help for the first case:

sd(dat$var, na.rm = TRUE)

If that doesn't work, check that

class(dat$var)

is "numeric" or "integer" (the second case) and that

nrow(dat)

is greater than 1 (the third case).

Finally, data is a function in R, so it's best to use a different name, which I've done here.
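The three checks above can be rolled into one small helper. This is only a sketch: diagnose_sd is a hypothetical name, not a base R function, and the example vectors are made up.

```r
# Hypothetical helper: report the first likely reason sd() would return NA
diagnose_sd <- function(x) {
  if (!is.numeric(x)) return("not numeric")
  if (anyNA(x)) return("has missing values")
  if (length(x) < 2) return("fewer than two values")
  "ok"
}

diagnose_sd(c("1.5", "2", "n/a"))  # a column read in as character
diagnose_sd(c(1.5, NA, 3))         # missing values present
diagnose_sd(4.2)                   # only one observation
diagnose_sd(c(1.5, 2, 3))          # nothing obviously wrong
```

With the asker's data this would be called as diagnose_sd(dat$var).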

Citrine answered 21/4, 2011 at 4:32 Comment(0)
S
0

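There may be Inf or -Inf as values in the column. In that case sd() returns NaN, and na.rm = TRUE does not help because infinite values are not treated as missing.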

Try

all(is.finite(data$var))

or

min(data$var, na.rm = TRUE)
max(data$var, na.rm = TRUE)

to check if that is indeed the case.
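A short sketch of that check on a made-up vector (x and its values are assumptions for illustration):

```r
# Hypothetical values containing one infinite entry
x <- c(1, 2, Inf)

sd_with_inf <- sd(x)            # NaN, since the mean is already infinite
all_finite <- all(is.finite(x)) # FALSE here
range(x, na.rm = TRUE)          # the Inf shows up as the maximum

# Keep only the finite values before computing the standard deviation
sd_finite <- sd(x[is.finite(x)])
```

Whether dropping the infinite values is appropriate depends on how they got into the data in the first place, for example through a division by zero upstream.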

Stamata answered 2/1, 2019 at 20:49 Comment(0)
