Warning message: In `...` : invalid factor level, NA generated
S

5

146

I don't understand why I got this warning message.

> fixed <- data.frame("Type" = character(3), "Amount" = numeric(3))
> fixed[1, ] <- c("lunch", 100)
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "lunch") :
  invalid factor level, NA generated
> fixed
  Type Amount
1 <NA>    100
2           0
3           0
Scrope answered 29/5, 2013 at 17:5 Comment(0)
K
223

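The warning appears because your "Type" column was created as a factor, and "lunch" is not one of its defined levels. Pass stringsAsFactors = FALSE when creating the data frame to keep "Type" as a character column. (Since R 4.0.0, stringsAsFactors = FALSE is the default, so the warning only occurs on older versions or when the option is set to TRUE.)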

> fixed <- data.frame("Type" = character(3), "Amount" = numeric(3))
> fixed[1, ] <- c("lunch", 100)  # triggers the warning shown in the question
> str(fixed)
'data.frame':   3 obs. of  2 variables:
 $ Type  : Factor w/ 1 level "": NA 1 1
 $ Amount: chr  "100" "0" "0"
> 
> fixed <- data.frame("Type" = character(3), "Amount" = numeric(3), stringsAsFactors = FALSE)
> fixed[1, ] <- c("lunch", 100)
> str(fixed)
'data.frame':   3 obs. of  2 variables:
 $ Type  : chr  "lunch" "" ""
 $ Amount: chr  "100" "0" "0"
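Note that "Amount" ends up as chr in both outputs above: c("lunch", 100) is a single character vector, so the number is coerced to "100" before it is assigned. A minimal sketch of one way around that, assuming you want "Amount" to stay numeric, is to assign a list instead, since a list keeps each element's own type:

```r
# stringsAsFactors = FALSE is spelled out so this behaves the same on any R version
fixed <- data.frame(Type = character(3), Amount = numeric(3),
                    stringsAsFactors = FALSE)

# A list keeps "lunch" as character and 100 as numeric,
# whereas c("lunch", 100) would coerce both to character
fixed[1, ] <- list("lunch", 100)

str(fixed)
```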
Knowling answered 29/5, 2013 at 17:9 Comment(2)
@Knowling Why does R convert it into a factor?Clack
Because that is the default setting in the data.frame() function (and it is the default because that is what most users want the vast majority of the time).Knowling
E
49

If you are reading directly from a CSV file, pass stringsAsFactors = FALSE to read.csv like this:

myDataFrame <- read.csv("path/to/file.csv", header = TRUE, stringsAsFactors = FALSE)
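To try this without a file on disk, here is a small sketch that feeds inline CSV text to read.csv through its text argument. The column names and values are made up for illustration:

```r
# Inline CSV text stands in for "path/to/file.csv" (hypothetical data)
csv_text <- "Type,Amount\nbreakfast,40\ndinner,250\n"

myDataFrame <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Type is a character column, so any new string can be assigned without a warning
myDataFrame[1, ] <- list("lunch", 100)

str(myDataFrame)
```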
Enwrap answered 4/3, 2016 at 8:16 Comment(2)
stringAsFactors is throwing an error: unused argument (stringAsFactors=FALSE)Webster
stringsAsFactors - strings needs to be plural (@Coliban)Lasseter
L
28

Here is a flexible approach that can be used in all cases, in particular when:

  1. you want to affect only one column, or
  2. the data frame is the result of previous operations (that is, it was not just read from a file or newly created).

First, convert the factor to character with the as.character function; then, after the assignment, convert it back with the as.factor (or simply factor) function:

fixed <- data.frame("Type" = character(3), "Amount" = numeric(3))

# Un-factorize (for a factor holding numbers, use as.numeric(as.character(x));
#               as.numeric(x) alone returns the integer level codes)
#              (as.vector can be used for other objects - not tested)
fixed$Type <- as.character(fixed$Type)
fixed[1, ] <- c("lunch", 100)

# Re-factorize with the as.factor function or simply factor(fixed$Type)
fixed$Type <- as.factor(fixed$Type)
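When un-factorizing a factor that holds numbers, it matters which conversion you use. A short sketch of the difference, using a made-up factor f:

```r
# A factor whose labels happen to look like numbers (illustrative data)
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                # integer level codes, not the values
values <- as.numeric(as.character(f))  # the numbers the labels represent

print(codes)
print(values)
```

Going through as.character first recovers the labels; calling as.numeric directly on the factor returns only the position of each label in levels(f).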
Libation answered 2/8, 2016 at 16:38 Comment(0)
C
7

The easiest way to fix this is to add a new level to the factor column. Use the levels function to see which levels you have, and then add the new one.

    > levels(data$Fireplace.Qu)
    [1] "Ex" "Fa" "Gd" "Po" "TA"
    > levels(data$Fireplace.Qu) <- c("Ex", "Fa", "Gd", "Po", "TA", "None")
    > levels(data$Fireplace.Qu)
    [1] "Ex"   "Fa"   "Gd"   "Po"   "TA"   "None"
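Retyping the full level vector is easy to get wrong, because a misordered or misspelled entry silently relabels existing values. A minimal sketch that appends the new level instead, using a hypothetical vector quality in place of data$Fireplace.Qu:

```r
# Hypothetical stand-in for data$Fireplace.Qu, with one missing value
quality <- factor(c("Ex", "TA", NA, "Gd"),
                  levels = c("Ex", "Fa", "Gd", "Po", "TA"))

# Append the new level; the existing levels keep their order and spelling
levels(quality) <- c(levels(quality), "None")

# "None" is now a defined level, so this assignment does not generate NA
quality[is.na(quality)] <- "None"

print(quality)
```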
Custodian answered 27/7, 2017 at 5:14 Comment(0)
A
0

I had a similar issue with data read from an .xlsx file and could not find a suitable answer here. I handled it with the xlsx package as below, which might help others:

#install.packages("xlsx")
library(xlsx)
extracted_df <- read.xlsx("test.xlsx", sheetName='Sheet1', stringsAsFactors=FALSE)
# Replace all NAs in a data frame with "G" character
extracted_df[is.na(extracted_df)] <- "G"

However, I could not handle it with the readxl package, which does not have a parameter similar to stringsAsFactors. For that reason, I moved to the xlsx package.

Auburn answered 11/6, 2020 at 14:34 Comment(0)
