How is the parameter "weight" (DMatrix) used in the gradient boosting procedure (xgboost)?
In xgboost it is possible to set the parameter weight for a DMatrix. This is apparently a list of weights in which each value is the weight for the corresponding sample. I can't find any information on how these weights are actually used in the gradient boosting procedure. Are they related to eta?

For example, if I would set weight to 0.3 for all samples and eta to 1, would this be the same as setting eta to 0.3 and weight to 1?

Octet answered 14/3, 2016 at 9:20 Comment(1)
The docs are really lacking on this, but I have been using instance weights a little and dug up a few links. Good question. – Testis
xgboost allows for instance weighting during the construction of the DMatrix, as you noted. This weight is directly tied to the instance and travels with it throughout training: it enters the calculations of the gradients and hessians, and so directly impacts the split points and the fitting of an xgboost model.
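To make that concrete, here is a small numpy sketch (illustrative only, not xgboost's actual code): with squared-error loss, each instance's weight simply scales its gradient and hessian, which in turn scale its contribution to every split-gain computation.

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0, 0.0])       # targets
pred = np.array([0.5, 0.5, 0.5, 0.5])    # current predictions
w = np.array([1.0, 0.5, 0.5, 1.0])       # instance weights from the DMatrix

grad = w * (pred - y)   # weighted first-order gradient of 0.5*(pred - y)^2
hess = w * 1.0          # weighted second-order hessian (1 for squared error)

def split_gain(mask, lam=1.0):
    """Gain of splitting instances into `mask` vs. its complement,
    using the G^2/(H + lambda) structure score from the XGBoost paper."""
    G, H = grad.sum(), hess.sum()
    GL, HL = grad[mask].sum(), hess[mask].sum()
    GR, HR = G - GL, H - HL
    return GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)

left = np.array([True, True, False, False])
print(split_gain(left))  # weights enter through grad and hess
```

Changing any single weight changes grad and hess for that instance, and therefore the gain of every candidate split that separates it.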

see here and here

Instance Weight File

XGBoost supports giving each instance a weight to differentiate the importance of instances. For example, we can provide an instance weight file for the "train.txt" file in the example as below:

train.txt.weight

1
0.5
0.5
1
0.5

This means that XGBoost will put more emphasis on the first and fourth instances (that is, the positive instances) while training. The configuration is similar to configuring group information. If the instance file name is "xxx", XGBoost will check whether a file named "xxx.weight" exists in the same directory and, if it does, will use those weights while training the model.
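The weight-file convention above is just one weight per line, aligned with the instances in the companion data file. A minimal stdlib-only sketch of writing and reading such a file (the xgboost lines are left as comments so the snippet has no external dependency; they assume the Python package is installed):

```python
import os
import tempfile

# One weight per line, aligned with the instances in "train.txt".
weights = [1, 0.5, 0.5, 1, 0.5]

d = tempfile.mkdtemp()
path = os.path.join(d, "train.txt.weight")
with open(path, "w") as f:
    f.write("\n".join(str(w) for w in weights) + "\n")

# XGBoost would pick this file up automatically next to "train.txt".
# The Python API accepts the same weights directly:
#   dtrain = xgboost.DMatrix("train.txt")                  # reads train.txt.weight
#   dtrain = xgboost.DMatrix(X, label=y, weight=weights)   # or pass explicitly

# Round-trip check: the file parses back to the weights we intended.
parsed = [float(line) for line in open(path)]
print(parsed)
```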

It is very different from eta.

eta simply tells xgboost how much of the newly trained tree to blend into the ensemble; it is a measure of how greedy the ensemble should be at each iteration.

For example, if I would set weight to 0.3 for all samples and eta to 1, would this be the same as setting eta to 0.3 and weight to 1?

  • A constant weight of 1 for all instances is the default, so changing that to a constant of 0.3 for all instances is still equal weighting and shouldn't impact things much. However, raising eta from 0.3 up to 1 would make the training much more aggressive.
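A small sketch of why the two are not interchangeable (again illustrative, not xgboost internals): the optimal leaf value is -G/(H + lambda), and a constant instance weight scales both G and H, so with lambda = 0 it cancels and the fitted tree is unchanged; eta, by contrast, directly scales the tree's contribution to the running prediction.

```python
import numpy as np

g = np.array([-0.5, -0.25, -0.25, -0.5])  # per-instance gradients
h = np.array([1.0, 1.0, 1.0, 1.0])        # per-instance hessians

def leaf_value(grad, hess, lam=0.0):
    """Optimal leaf output -G/(H + lambda) for the instances in the leaf."""
    return -grad.sum() / (hess.sum() + lam)

v_unweighted = leaf_value(g, h)
v_uniform    = leaf_value(0.3 * g, 0.3 * h)   # weight = 0.3 for every instance
print(v_unweighted, v_uniform)                # identical: the constant cancels

pred = 0.0
pred_eta1  = pred + 1.0 * v_unweighted   # eta = 1: take the full step
pred_eta03 = pred + 0.3 * v_unweighted   # eta = 0.3: shrunken step
print(pred_eta1, pred_eta03)             # different predictions
```

(With lambda > 0 a uniform weight does shift leaf values slightly, since only G and H are scaled, not lambda; but the effect is nothing like changing eta.)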

Testis answered 17/3, 2016 at 13:22 Comment(2)
When you say "This weight is directly tied to the instance and travels with it throughout the entire training", what do you mean in detail? Is there any reference for better understanding how the weights impact the training procedure? – Strikebreaker
Do weights have to be between 0 and 1? – Siegbahn
weight in xgboost's DMatrix is the only correct way to enter an exposure variable (e.g. insurance policy duration) for Poisson-distributed targets such as insurance claims frequency (i.e. when 'objective': 'count:poisson').

More info

So what are the incorrect ways? Contrary to replies on Stack Exchange: base_margin and set_base_margin. I benchmarked all three options against a GLM's sample_weight (exposed by PoissonRegressor's fit method), and only weight produced similarly unbiased models (as seen from actual-vs-predicted plots for individual features). Not to mention that the early-stopping metrics (Poisson negative log-likelihood or RMSE) were markedly improved (lower) by switching to weight, down to a level that improves over the GLM.
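One way to see why weight fits exposure naturally (a sketch of the algebra, not the author's benchmark): with a log link, the exposure-weighted gradient of the Poisson loss for the frequency label c/e is identical to the count-scale gradient with an offset of log(e).

```python
import numpy as np

# Poisson loss with log link: raw score p means mu = exp(p) is the
# predicted *frequency*. The per-unit Poisson NLL is exp(p) - y*p,
# with gradient exp(p) - y.
rng = np.random.default_rng(0)
e = rng.uniform(0.1, 2.0, size=5)   # exposures
c = rng.poisson(1.5 * e)            # observed claim counts
p = rng.normal(size=5)              # current raw scores (log frequency)

freq = c / e                               # frequency label
grad_weighted = e * (np.exp(p) - freq)     # weight * (exp(p) - y)
grad_counts   = e * np.exp(p) - c          # count gradient with offset log(e)

# e * (exp(p) - c/e) == e*exp(p) - c, so the two gradients agree exactly:
print(np.allclose(grad_weighted, grad_counts))
```

So label = frequency with weight = exposure drives the booster with the same gradients as modelling raw counts with a log-exposure offset, which is the classical GLM treatment of exposure.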

Papillote answered 1/8, 2021 at 12:11 Comment(1)
Thanks, that's very interesting! I'm using the exposure in the XGBoost weights, as you suggested. How should I use the weights to emphasize the frequency? I calculated weights based on my dataset's occurrence rate (e.g. the claims rate in the past year). Do you have an idea of how to use those weights, given that I'm already using the sample weights for the exposure (which makes a lot of sense)? – Santana