Questions
Is there a best value to stay on so that I win the greatest percentage of games possible? If so, what is it?
Edit: Is there an exact probability of winning that can be calculated for a given limit, independent of whatever the opponent does? (I haven't done probability & statistics since college). I'd be interested in seeing that as an answer to contrast it with my simulated results.
Edit: Fixed bugs in my algorithm, updated result table.
Background
I've been playing a modified blackjack game with some rather annoying rule tweaks from the standard rules. I've italicized the rules that are different from the standard blackjack rules, as well as included the rules of blackjack for those not familiar.
Modified Blackjack Rules
- Exactly two human players (dealer is irrelevant)
- Each player is dealt two cards face down
- Neither player _ever_ knows the value of _any_ of the opponent's cards
- Neither player knows the value of the opponent's hand until _both_ have finished the hand
- Goal is to come as close to score of 21 as possible. Outcomes:
- If player's A & B have identical score, game is a draw
- If player's A & B both have a score over 21 (a bust), game is a draw
- If player A's score is <= 21 and player B has busted, player A wins
- If player A's score is greater than player B's, and neither have busted, player A wins
- Otherwise, player A has lost (B has won).
- Cards are worth:
- Cards 2 through 10 are worth the corresponding amount of points
- Cards J, Q, K are worth 10 points
- Card Ace is worth 1 or 11 points
- Each player may request additional cards one at a time until:
- The player doesn't want any more (stay)
- The player's score, with any Aces counted as 1, exceeds 21 (bust)
- Neither player knows how many cards the other has used at any time
- Once both players have either stayed or busted the winner is determined per rule 3 above.
- After each hand the entire deck is reshuffled and all 52 cards are in play again
What is a deck of cards?
A deck of cards consists of 52 cards, four each of the following 13 values:
2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A
No other property of the cards are relevant.
A Ruby representation of this is:
CARDS = ((2..11).to_a+[10]*3)*4
Algorithm
I've been approaching this as follows:
- I will always want to hit if my score is 2 through 11, as it is impossible to bust
- For each of the scores 12 through 21 I will simulate N hands against an opponent
- For these N hands, the score will be my "limit". Once I reach the limit or greater, I will stay.
- My opponent will follow the exact same strategy
- I will simulate N hands for every permutation of the sets (12..21), (12..21)
- Print the difference in wins and losses for each permutation as well as the net win loss difference
Here is the algorithm implemented in Ruby:
#!/usr/bin/env ruby
class Array
def shuffle
sort_by { rand }
end
def shuffle!
self.replace shuffle
end
def score
sort.each_with_index.inject(0){|s,(c,i)|
s+c > 21 - (size - (i + 1)) && c==11 ? s+1 : s+c
}
end
end
N=(ARGV[0]||100_000).to_i
NDECKS = (ARGV[1]||1).to_i
CARDS = ((2..11).to_a+[10]*3)*4*NDECKS
CARDS.shuffle
my_limits = (12..21).to_a
opp_limits = my_limits.dup
puts " " * 55 + "opponent_limit"
printf "my_limit |"
opp_limits.each do |result|
printf "%10s", result.to_s
end
printf "%10s", "net"
puts
printf "-" * 8 + " |"
print " " + "-" * 8
opp_limits.each do |result|
print " " + "-" * 8
end
puts
win_totals = Array.new(10)
win_totals.map! { Array.new(10) }
my_limits.each do |my_limit|
printf "%8s |", my_limit
$stdout.flush
opp_limits.each do |opp_limit|
if my_limit == opp_limit # will be a tie, skip
win_totals[my_limit-12][opp_limit-12] = 0
print " --"
$stdout.flush
next
elsif win_totals[my_limit-12][opp_limit-12] # if previously calculated, print
printf "%10d", win_totals[my_limit-12][opp_limit-12]
$stdout.flush
next
end
win = 0
lose = 0
draw = 0
N.times {
cards = CARDS.dup.shuffle
my_hand = [cards.pop, cards.pop]
opp_hand = [cards.pop, cards.pop]
# hit until I hit limit
while my_hand.score < my_limit
my_hand << cards.pop
end
# hit until opponent hits limit
while opp_hand.score < opp_limit
opp_hand << cards.pop
end
my_score = my_hand.score
opp_score = opp_hand.score
my_score = 0 if my_score > 21
opp_score = 0 if opp_score > 21
if my_hand.score == opp_hand.score
draw += 1
elsif my_score > opp_score
win += 1
else
lose += 1
end
}
win_totals[my_limit-12][opp_limit-12] = win-lose
win_totals[opp_limit-12][my_limit-12] = lose-win # shortcut for the inverse
printf "%10d", win-lose
$stdout.flush
end
printf "%10d", win_totals[my_limit-12].inject(:+)
puts
end
Usage
ruby blackjack.rb [num_iterations] [num_decks]
The script defaults to 100,000 iterations and 4 decks. 100,000 takes about 5 minutes on a fast macbook pro.
Output (N = 100 000)
opponent_limit
my_limit | 12 13 14 15 16 17 18 19 20 21 net
-------- | -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- --------
12 | -- -7666 -13315 -15799 -15586 -10445 -2299 12176 30365 65631 43062
13 | 7666 -- -6962 -11015 -11350 -8925 -975 10111 27924 60037 66511
14 | 13315 6962 -- -6505 -9210 -7364 -2541 8862 23909 54596 82024
15 | 15799 11015 6505 -- -5666 -6849 -4281 4899 17798 45773 84993
16 | 15586 11350 9210 5666 -- -6149 -5207 546 11294 35196 77492
17 | 10445 8925 7364 6849 6149 -- -7790 -5317 2576 23443 52644
18 | 2299 975 2541 4281 5207 7790 -- -11848 -7123 8238 12360
19 | -12176 -10111 -8862 -4899 -546 5317 11848 -- -18848 -8413 -46690
20 | -30365 -27924 -23909 -17798 -11294 -2576 7123 18848 -- -28631 -116526
21 | -65631 -60037 -54596 -45773 -35196 -23443 -8238 8413 28631 -- -255870
Interpretation
This is where I struggle. I'm not quite sure how to interpret this data. At first glance it seems like always staying at 16 or 17 is the way to go, but I'm not sure if it's that easy. I think it's unlikely that an actual human opponent would stay on 12, 13, and possibly 14, so should I throw out those opponent_limit values? Also, how can I modify this to take into account the variability of a real human opponent? e.g. a real human is likely to stay on 15 just based on a "feeling" and may also hit on 18 based on a "feeling"