xgboost: The meaning of the base_score parameter

In the documentation of xgboost I read:

base_score [default=0.5] : the initial prediction score of all instances, global bias

What is the meaning of this phrase? Is the base score the prior probability of the Event of Interest in the Dataset? I.e. in a dataset of 1,000 observations with 300 Positives and 700 Negatives the base score would be 0.3?

If not, what would it be?

Your advice will be appreciated.

Sphagnum answered 1/12, 2017 at 15:24 Comment(1)
That's only the interpretation for two-class/binary. It wouldn't make any sense in multiclass. - Hageman

I think your understanding is correct. In your example the base score could be set to 0.3, or you can simply leave it at the default of 0.5. For highly imbalanced data, initializing it to the observed positive rate gives boosting a more meaningful starting point and can improve the learning process. In theory, as long as you choose a suitable learning rate and train for enough rounds, the starting base score shouldn't affect the final result. See the author's answer in the issue linked below.

Reference: https://github.com/dmlc/xgboost/issues/799
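As a minimal sketch of the arithmetic in the question (300 positives out of 1,000 observations), the snippet below computes the positive-class rate and places it in a parameter dictionary. The helper names `prior_base_score` and `logit` are hypothetical, introduced only for illustration. The log-odds value is shown on the assumption that, for the `binary:logistic` objective, the base score is given on the probability scale and corresponds to a raw margin of `logit(base_score)`.

```python
import math


def prior_base_score(labels):
    """Return the fraction of positive (1) labels in a binary 0/1 sequence."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(1 for y in labels if y == 1) / len(labels)


def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# The question's example: 300 positives and 700 negatives.
labels = [1] * 300 + [0] * 700
base_score = prior_base_score(labels)

# Parameter dictionary in the shape used by xgboost's native training API.
# Nothing here imports xgboost; the dictionary is only constructed.
params = {"objective": "binary:logistic", "base_score": base_score}

print(base_score)                   # positive-class rate
print(round(logit(base_score), 4))  # same prior expressed as log-odds
```

With xgboost installed, `params` could then be passed to `xgb.train(params, dtrain)`, where `dtrain` is an `xgb.DMatrix` built from the training data. This applies to the binary case only.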

Dissipate answered 20/3, 2018 at 20:27 Comment(1)
Your answer covers only the two-class (binary) case; it wouldn't make any sense for multiclass. See the discussion they linked to on the equivalent base_margin default in multiclass (#1380): before 2017, xgboost made the default assumption that base_score = 1/nclasses, which is a dubious prior if there's a class imbalance. Their response was "if you use enough training steps this goes away", which is not good for out-of-the-box performance in data exploration, etc. Anyway, they fixed that in 2017. - Hageman
