Any way to automatically correct all variable classes in a dataframe

I have a dataframe with about 250 variables. Unfortunately, all of them were imported as character from a SQL database using sqldf. The problem is that not all of them should be character: there are numeric variables, integers, and dates. I'd like to build a model that runs over all the variables, and for that I need each variable to have the right class. Fixing them one by one is probably the most reliable way, but it is very manual.

How could I automatically correct all classes? Perhaps there is a way to detect whether a column contains alphabetic characters or only digits?

I don't expect an automatic approach to get every class right. But if it corrects most of them, I can fix the remaining ones manually.

I am adding a sqldf tag in case anybody knows of any way to correct this when importing the data, but I assume it's not sqldf's fault but rather the database's.

The closest thing to "automatic" type conversion on a data frame would probably be

df[] <- lapply(df, type.convert, as.is = TRUE)

where df is your data set. The function type.convert()

Converts a character vector to logical, integer, numeric, complex or factor as appropriate.

Have a read of help(type.convert); it might be just what you want.

In my experience, type.convert() is very reliable. It is also used internally by read.table(), so it is well exercised. Use as.is = TRUE if you don't want character columns coerced to factors, and as.is = FALSE if you do. Newer versions of R warn when as.is is not supplied, so it is best to pass it explicitly.
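To make the effect of as.is concrete, here is a minimal sketch on a small made-up data frame (the column names and values are hypothetical, not from the question's data). It shows that number-like and logical-like columns are converted either way, and only the leftover text column differs.

```r
# Toy data frame with every column stored as character (hypothetical data)
df <- data.frame(
  id    = c("1", "2", "3"),
  score = c("1.5", "2.25", "NA"),   # the string "NA" is treated as missing
  flag  = c("TRUE", "FALSE", "TRUE"),
  name  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# as.is = TRUE: columns that are not numeric or logical stay character
keep_chr <- df
keep_chr[] <- lapply(keep_chr, type.convert, as.is = TRUE)
sapply(keep_chr, class)

# as.is = FALSE: those columns become factors instead
to_fct <- df
to_fct[] <- lapply(to_fct, type.convert, as.is = FALSE)
sapply(to_fct, class)
```

Here id should come back as integer, score as numeric, and flag as logical in both cases; name is character in the first result and factor in the second.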

Here's a quick example of it working on iris. First we'll change all columns to character, then run type.convert() on it.

## Original column classes in iris
sapply(iris, class)
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
#    "numeric"    "numeric"    "numeric"    "numeric"     "factor" 

## Change all columns to character
iris[] <- lapply(iris, as.character)
sapply(iris, class)
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
#  "character"  "character"  "character"  "character"  "character" 

## Run type.convert(); as.is = FALSE turns character columns into factors
iris[] <- lapply(iris, type.convert, as.is = FALSE)
sapply(iris, class)
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
#    "numeric"    "numeric"    "numeric"    "numeric"     "factor"

We can see that the columns were returned to their original classes. This is because type.convert() coerces columns to the "most appropriate" type.
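One thing type.convert() does not do is detect dates, which the question also mentions; date columns are left as character (or turned into factors). A possible follow-up step is sketched below. The helper convert_if_date() is hypothetical, and the "%Y-%m-%d" format is an assumption about how the database stores dates, so adjust it to your data.

```r
# Hypothetical helper: convert a character column to Date only if every
# non-missing value parses with the given format (assumed here: YYYY-MM-DD).
convert_if_date <- function(x, format = "%Y-%m-%d") {
  if (!is.character(x)) return(x)
  parsed <- as.Date(x, format = format)
  # A value that was present but failed to parse means this is not a date column
  if (all(is.na(x)) || any(is.na(parsed) & !is.na(x))) return(x)
  parsed
}

# Made-up example data, all columns stored as character
df <- data.frame(
  amount  = c("10", "20.5", "30"),
  created = c("2015-01-31", "2015-02-28", NA),
  label   = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

df[] <- lapply(df, type.convert, as.is = TRUE)  # numbers and logicals first
df[] <- lapply(df, convert_if_date)             # then try dates on what is left
sapply(df, class)
```

Running type.convert() with as.is = TRUE first matters here, because it leaves the remaining text columns as character for the date check. Any column that still looks wrong after both passes can then be fixed by hand.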
