There are many different ways of thinking about the XOR operation. Here are two of them and how they'd lead to the solution.
Intuition 1: XOR as Self-Inverting Addition
Let's suppose you have an array of numbers where all values but one appear twice and one value appears once. What happens if you add up all those numbers? You'd end up with something of the form 2a + b, where b is the value that appears once and a is the sum of all the values that appear twice. We'd like to somehow go from the sum 2a + b to just b, which means we need the 2a term to disappear. How can we do this?
With regular addition, you can't. But what if we had some sort of magical addition operator that had the property that x + x = 0 for any choice of x? In that case, the expression 2a + b would magically turn into b, the value you were looking for.
A very useful, helpful intuition to have for XOR is that it behaves kinda sorta ish like addition, except that adding the same value twice cancels out. This is a supremely useful property of XOR, and once you know it exists, you'll see it pop up from time to time. Here's sampler of where else it arises:
- In explaining the solution to the impossible chessboard problem, Grant Sanderson of 3blue1brown fame describes solving the problem by recognizing he needed a weird form of addition where adding or subtracting a value are the same operation.
- In the cuckoo filter data structure, this property of XOR is used to move items between different slots without necessarily being able to compute what those two slots are.
- In an XOR linked list, this property is used to advance forward or backward in the linked list.
- In cryptography, one-time pads, a theoretically unbreakable form of encryption, are often implemented using XOR (the Vernam cipher). XORing the message with the key generates the ciphertext, and XORing again produces the plaintext.
- Fountain codes are a way of transmitting a message to a collection of receivers who aren't listening the whole time. XOR is used to pack multiple blocks together and then extract individual blocks when enough messages are received.
In that sense, if you're familiar with this particular perspective on the XOR operation, it's somewhat natural to arrive at the idea to apply it to this problem.
Intuition 2: XOR as Bit Parities
Here's another way to solve the problem. Let's try a concrete example. Suppose I have this list of numbers, where, as in the problem statement, all numbers appear twice except for one, which appears once.
45, 21, 21, 89, 41, 89, 41,
32, 96, 52, 30, 42, 76, 17,
36, 59, 96, 25, 15, 52, 36,
59, 76, 32, 15, 85, 95, 35,
25, 85, 30, 45, 95, 35, 17
For convenience, I've made everything exactly two digits long. How might we find which number appears exactly once? Well, since we know the number is two digits long, perhaps we could work it out one digit at a time. For example, what's its first digit? The main insight we need is to build a frequency histogram of the first digits of these numbers. If we do that, we get the following:
Digit: 0 1 2 3 4 5 6 7 8 9
Frequency: 0 2 4 8 5 4 0 2 4 6
Interesting - all of these frequencies are even except for the frequency of the digit 4. That must mean that the repeated number has a first digit of 4. Why? Well, whichever number appears only once will count once toward the frequency of its first digit. All other numbers, which appear twice, will count twice toward their digits. So the number that appears an odd number of times must be the one that corresponds to our mystery number. Neat! So that means our mystery number starts with a 4.
What about the second digit? We can work that out as well using the same trick. Here's the frequency histogram:
Digit: 0 1 2 3 4 5 6 7 8 9
Frequency: 2 4 5 0 0 12 6 2 0 4
Since 2 is the odd one out (literally!), it's the digit we want. So our number must be 42. Very cool!
This insight gives us an algorithm we can use to figure out what the mystery number is. Let's assume all our numbers are d digits long. We can then do the following:
for i = 1 to d:
make a frequency histogram of the ith digits of each number
find the one that appears an odd number of times
write that down as the ith digit of the number
return the number we made this way
This algorithm has a runtime of O(nd) and uses O(d) auxiliary storage space. (There are d passes through the array, each of which visits n numbers, and overall we write down d digits.) If d is a constant - say, the numbers are all 10 digits or fewer, this runs in O(n) time and uses O(1) space.
This seems to bear little resemblance to our XOR solution, but it turns out that with just a little bit of tweaking we'll arrive at exactly our XOR algorithm. For starters, I was working with decimal digits in the example because that's how we typically think of numbers. But computers do all their work in binary (base 2), so perhaps we should meet the hardware where it is and use that instead. In that case, our algorithm looks like this, assuming our numbers are b bits long
for i = 1 to b:
look at the ith bit of each number.
if there's an odd number of 1 bits,
write down that the ith bit of our number is 1
else
write down that the ith bit of our number is 0
return the number we made
This algorithm is basically the same as the one from before, just specialized for binary.
This is where XOR comes in, because another powerful intuition for XOR is that the XOR of a collection of bits is 1 if there's an odd number of 1 bits and 0 otherwise. So to implement the above algorithm, we could start by XORing together all the first bits of the numbers in our array, which will give us the first bit of the unique number, then XOR the second bits of our array, which will give us the second bit of our number, etc.
But we have a neat trick up our sleeves - we don't need to work one bit at a time! The bitwise XOR operator basically does this for all the bits of the number in a single go, and so we can just XOR all the numbers together and we'll be left with our mystery number.
This technique - using a bitwise XOR to determine the parity of a collection of bits - is another common application of XOR. You can see this same idea used in the following other places:
- Parity bits are a simple way of checking for errors in data transmission. They work by writing down the XOR of all the bits in a message, which can detect if a single bit has been flipped. (This generalizes to Hamming codes), which uses a similar idea in a more elaborate way.
- The odd sketch is a space-efficient data structure for estimating how many items are in a set by tracking whether an odd or even number of balls have landed in various bins. XORing them together combines the sketch together for two sets by adding the parities of the number of balls in each bin.
Hope this helps!
result
that is an illegal value. – Afterdinner[2,3,4,5]
, for example, gives 0 but does not consist of pairs. – Sleave