I looked at the post about the same problem in Python, but I want a solution in R. I'm working on the Titanic dataset from Kaggle, and my training set looks like this:
'data.frame': 891 obs. of 13 variables:
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : num 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : Factor w/ 3 levels "1","2","3": 3 1 3 1 3 3 1 3 3 2 ...
$ Age : num 22 38 26 35 35 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Child : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 1 1 ...
$ Embarked.C : num 0 1 0 0 0 0 0 0 0 1 ...
$ Embarked.Q : num 0 0 0 0 0 1 0 0 0 0 ...
$ Embarked.S : num 1 0 1 1 1 0 1 1 1 0 ...
$ Sex.female : num 0 1 1 1 0 0 0 0 1 1 ...
$ Sex.male : num 1 0 0 0 1 1 1 1 0 0 ...
This is after creating dummy variables. My test set looks like this:
'data.frame': 418 obs. of 12 variables:
$ PassengerId: int 892 893 894 895 896 897 898 899 900 901 ...
$ Pclass : Factor w/ 3 levels "1","2","3": 3 3 2 3 3 3 3 2 3 3 ...
$ Age : num 34.5 47 62 27 22 14 30 26 18 21 ...
$ SibSp : int 0 1 0 0 1 0 0 1 0 2 ...
$ Parch : int 0 0 0 0 1 0 0 1 0 0 ...
$ Fare : num 7.83 7 9.69 8.66 12.29 ...
$ Child : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Embarked.C : num 0 0 0 0 0 0 0 0 1 0 ...
$ Embarked.Q : num 1 0 1 0 0 0 1 0 0 0 ...
$ Embarked.S : num 0 1 0 1 1 1 0 1 0 1 ...
$ Sex.female : num 0 1 0 0 1 0 1 0 1 0 ...
$ Sex.male : num 1 0 1 1 0 1 0 1 0 1 ...
I ran xgboost using the following code:
> param <- list("objective" = "multi:softprob",
+ "max.depth" = 25)
> xgb = xgboost(param, data = trmat, label = y, nround = 7)
[0] train-rmse:0.350336
[1] train-rmse:0.245470
[2] train-rmse:0.171994
[3] train-rmse:0.120511
[4] train-rmse:0.084439
[5] train-rmse:0.059164
[6] train-rmse:0.041455
trmat is:
trmat = data.matrix(train)
and temat is:
temat = data.matrix(test)
and y is the Survived column:
y = train$Survived
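Since `data.matrix()` keeps every column of the data frame it is given (converting factors to their integer codes), a quick base-R check of how the training and test matrices line up can be sketched as follows. The two small frames are hypothetical stand-ins holding only a few of the columns listed above, named with a `_toy` suffix so they don't overwrite the real `train`/`test` objects.

```r
# Hypothetical stand-ins with a subset of the columns from the str() output
train_toy <- data.frame(
  PassengerId = 1:3,
  Survived = c(0, 1, 1),
  Pclass = factor(c("3", "1", "3"), levels = c("1", "2", "3")),
  Age = c(22, 38, 26)
)
test_toy <- data.frame(
  PassengerId = 892:894,
  Pclass = factor(c("3", "3", "2"), levels = c("1", "2", "3")),
  Age = c(34.5, 47, 62)
)

trmat_toy <- data.matrix(train_toy)
temat_toy <- data.matrix(test_toy)

# Columns present in the training matrix but missing from the test matrix
extra <- setdiff(colnames(trmat_toy), colnames(temat_toy))
print(extra)

# Side-by-side comparison of column names by position
print(rbind(train = colnames(trmat_toy)[seq_len(ncol(temat_toy))],
            test  = colnames(temat_toy)))

# data.matrix() stores factor columns as their integer level codes
print(trmat_toy[, "Pclass"])
```

In this toy version the only extra column is Survived, so position 2 holds Survived in the training matrix but Pclass in the test matrix. The same `setdiff()` and `rbind()` calls can be run directly on the real `trmat` and `temat`.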
But when I run predict():
> x = predict(xgb, newdata = temat)
> x[1:10]
[1] 0.9584613 0.9584613 0.9584613 0.9584613 0.9584613 0.9584613 0.9584613
[8] 0.9584613 0.9584613 0.9584613
Every row gets the same predicted probability. In the Python question, someone suggested increasing max.depth, but that didn't help. What am I doing wrong?