I have been trying to do stepwise selection on my variables with R. This is my code:
library(lattice)#to get the matrix plot, assuming this package is already installed
library(ftsa) #to get the out-of sample performance metrics, assuming this package is already installed
library(car)
mydata=read.csv("C:/Users/jgozal1/Desktop/Multivariate Project/Raw data/FINAL_alldata_norowsunder90_subgroups.csv")
names(mydata)
str(mydata)
mydata$country_name=NULL
mydata$country_code=NULL
mydata$year=NULL
mydata$Unemployment.female....of.female.labor.force...modeled.ILO.estimate.=NULL
mydata$Unemployment.male....of.male.labor.force...modeled.ILO.estimate.=NULL
mydata$Life.expectancy.at.birth.male..years.= NULL
mydata$Life.expectancy.at.birth.female..years. = NULL
str(mydata)
Full_model=lm(mydata$Fertility.rate.total..births.per.woman. + mydata$Immunization.DPT....of.children.ages.12.23.months. + mydata$Immunization.measles....of.children.ages.12.23.months. + mydata$Life.expectancy.at.birth.total..years. + mydata$Mortality.rate.under.5..per.1000.live.births. + mydata$Improved.sanitation.facilities....of.population.with.access. ~ mydata$Primary.completion.rate.female....of.relevant.age.group. + mydata$School.enrollment.primary....gross. + mydata$School.enrollment.secondary....gross. + mydata$School.enrollment.tertiary....gross. + mydata$Internet.users..per.100.people. + mydata$Primary.completion.rate.male....of.relevant.age.group. + mydata$Mobile.cellular.subscriptions..per.100.people. + mydata$Foreign.direct.investment.net.inflows..BoP.current.US.. + mydata$Unemployment.total....of.total.labor.force...modeled.ILO.estimate., data= mydata)
summary(Full_model) #this provides the summary of the model
Reduced_model=lm(mydata$Fertility.rate.total..births.per.woman. + mydata$Immunization.DPT....of.children.ages.12.23.months. + mydata$Immunization.measles....of.children.ages.12.23.months. + mydata$Life.expectancy.at.birth.total..years. + mydata$Mortality.rate.under.5..per.1000.live.births. + mydata$Improved.sanitation.facilities....of.population.with.access. ~1,data= mydata)
step(Reduced_model,scope=list(lower=Reduced_model, upper=Full_model), direction="forward", data=mydata)
step(Full_model, direction="backward", data=mydata)
step(Reduced_model,scope=list(lower=Reduced_model, upper=Full_model), direction="both", data=mydata)
This is the link to the dataset that I am using: http://speedy.sh/YNXxj/FINAL-alldata-norowsunder90-subgroups.csv
After setting the scope for my stepwise I get this error:
Error in step(Reduced_model, scope = list(lower = Reduced_model, upper = Full_model), : number of rows in use has changed: remove missing values? In addition: Warning messages: 1: In add1.lm(fit, scope$add, scale = scale, trace = trace, k = k, : using the 548/734 rows from a combined fit 2: In add1.lm(fit, scope$add, scale = scale, trace = trace, k = k, : using the 548/734 rows from a combined fit
I have looked at other posts with the same error and the solutions usually is to omit the NAs from the data used, but that hasn't solved my problem and I am still getting exactly the same error.
mydata<-na.omit(mydata)
before fitting the full model makes the error go away in my test. (Though it does reduce the number of observations from 825 to 548.) – Principality