Vowpal Wabbit - How to get prediction probabilities from contextual bandit model on a test sample

Asked 16/1, 2017 at 4:17 Answered 27/7, 2022 at 17:14

Given a trained contextual bandit model, how can I retrieve a prediction vector on test samples?

For example, let's say I have a train set named "train.dat" containing lines formatted as below

1:-1:0.3 | a b c  # <action:cost:probability | features> 
2:2:0.3 | a d d 
3:-1:0.3 | a b e
....

And I run the command below.

vw -d train.dat --cb 30 -f cb.model --save_resume

This produces a file, 'cb.model'. Now, let's say I have a test dataset as below

| a d d 
| a b e

I'd like to see probabilities as below

0.2 0.7 0.1

The interpretation of these probabilities would be that action 1 should be picked 20% of the time, action 2 - 70%, and action 3 - 10% of the time.

Is there a way to get something like this?

Honestly answered 16/1, 2017 at 4:17 Comment(2)

I'm not sure about the answer to this since I haven't used --cb, but the vowpal-wabbit source tree on github has several --cb examples in test/RunTests with data-sets and results, so perhaps you should start there? Another trick that I often use is the option -a (aka --audit) which outputs the weights of features on stderr as vw runs. This can help gain deep visibility into the model in real-time. HTH. – Triolet 17/1, 2017 at 19:30

@Triolet Thank you for your reply as always! I will check out the --audit option. The relevant test seemed to be Test #121, where they use "--cb_explore k" with the -p flag to output predictions, but I'm not sure what exactly the predictions are. More precisely, I'm not sure whether the predictions represent probabilities over each of the "k" actions or over each of "k" policies. – Honestly 21/1, 2017 at 20:56

When you use "--cb K", the prediction is the single best arm/action according to the argmax policy, which is a static (deterministic) policy.

When you use "--cb_explore K", the prediction output contains a probability for each arm/action. How those probabilities are calculated depends on the exploration policy you pick.
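As one illustration of how an exploration policy turns a single best action into a probability vector, here is a minimal sketch of the textbook epsilon-greedy rule: every action gets epsilon / K, and the greedy action gets the remaining 1 - epsilon on top. The function name and arguments are hypothetical, and this is not VW's own code; it only shows the arithmetic, with epsilon playing the role of the value passed to VW's --epsilon option.

```python
def epsilon_greedy_pmf(best_action, num_actions, epsilon=0.05):
    """Return a probability vector over actions 1..num_actions.

    Illustrative helper (not part of VW): the greedy action is favoured,
    and epsilon of the probability mass is spread uniformly over all actions.
    """
    if not 1 <= best_action <= num_actions:
        raise ValueError("best_action must be in 1..num_actions")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")

    # Uniform exploration share for every action.
    pmf = [epsilon / num_actions] * num_actions
    # The greedy action also receives the exploitation share.
    pmf[best_action - 1] += 1.0 - epsilon
    return pmf


def demo():
    # Three actions, action 2 is currently believed to be best.
    print(epsilon_greedy_pmf(best_action=2, num_actions=3, epsilon=0.3))
```

Calling demo() prints one vector of three probabilities that sum to 1. Other exploration policies would assign the mass differently.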

Allusion answered 5/2, 2018 at 2:35 Comment(1)

With "--cb_explore K", which estimator does VW use to predict the rewards for each arm: dr, ips, iwr, or dm? How do I specify the type of estimator to use with "--cb_explore K"? Can I specify the --cb_type flag, something like "--cb_explore k --cb_type dm"? – Brain 17/3, 2022 at 17:28

If you send those lines to a daemon running your model, you'd get just that. You send a context, and the reply is a probability distribution across the number of allowed actions, presumably comprising the "recommendation" provided by the model.

Say you have 3 actions, like in your example. Start a contextual bandits daemon:

vowpalwabbit/vw -d train.dat --cb_explore 3 -t --daemon --quiet --port 26542

Then send a context to it:

| a d d

You'll get just what you want as the reply.
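Sending a context to the daemon can be done from any TCP client. Below is a minimal Python sketch that assumes the daemon started above is listening on localhost port 26542 and answers each example with one line of text. The function name query_vw_daemon is hypothetical, and the reply is returned as raw text because its exact format depends on the VW version and options.

```python
import socket


def query_vw_daemon(example, host="localhost", port=26542, timeout=5.0):
    """Send one VW-format example line to a daemon and return its reply line.

    Assumes the daemon replies with exactly one newline-terminated line
    per example; the reply is returned unparsed.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # VW reads newline-terminated examples from the socket.
        sock.sendall((example.rstrip("\n") + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()


def demo():
    # Requires a running daemon on the host and port assumed above.
    print(query_vw_daemon("| a d d"))
```

If the reply is a whitespace-separated list of numbers, reply.split() gives one entry per action.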

Centimeter answered 25/3, 2018 at 16:14 Comment(0)

In the Python bindings, create a Workspace object and call its predict method on the example, optionally passing a prediction_type. The possible prediction_type values are listed below:

class PredictionType(IntEnum):
    SCALAR = pylibvw.vw.pSCALAR
    SCALARS = pylibvw.vw.pSCALARS
    ACTION_SCORES = pylibvw.vw.pACTION_SCORES
    ACTION_PROBS = pylibvw.vw.pACTION_PROBS
    MULTICLASS = pylibvw.vw.pMULTICLASS
    MULTILABELS = pylibvw.vw.pMULTILABELS
    PROB = pylibvw.vw.pPROB
    MULTICLASSPROBS = pylibvw.vw.pMULTICLASSPROBS
    DECISION_SCORES = pylibvw.vw.pDECISION_SCORES
    ACTION_PDF_VALUE = pylibvw.vw.pACTION_PDF_VALUE
    PDF = pylibvw.vw.pPDF
    ACTIVE_MULTICLASS = pylibvw.vw.pACTIVE_MULTICLASS
    NOPRED = pylibvw.vw.pNOPRED
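Putting this together with the question's data, a rough sketch might look like the following. It assumes the vowpalwabbit Python package is installed, that the workspace is created with "--cb_explore 3", and that predict then returns one probability per action; the helper names predict_action_probs and demo are made up for illustration, and the exact return type may differ between VW versions.

```python
def predict_action_probs(workspace, example):
    """Return the per-action probabilities predicted for one example.

    `workspace` is expected to be a vowpalwabbit.Workspace created with
    --cb_explore K; any object with a compatible predict() also works.
    """
    return list(workspace.predict(example))


def demo():
    # Imported lazily so the helper above can be used without VW installed.
    from vowpalwabbit import Workspace

    vw = Workspace("--cb_explore 3 --quiet")

    # Training lines use the question's format: action:cost:probability | features
    for line in ["1:-1:0.3 | a b c", "2:2:0.3 | a d d", "3:-1:0.3 | a b e"]:
        vw.learn(line)

    # Test lines carry only the context (no label).
    for context in ["| a d d", "| a b e"]:
        print(context, predict_action_probs(vw, context))

    vw.finish()
```

Calling demo() should print one probability vector per test context, with position i corresponding to action i + 1.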

Sufferable answered 27/7, 2022 at 17:14 Comment(0)
