How can I find the missing value more concisely?
Asked Answered
M

11

75

The following code checks if x and y are distinct values (the variables x, y, z can only have values a, b, or c) and if so, sets z to the third character:

if x == 'a' and y == 'b' or x == 'b' and y == 'a':
    z = 'c'
elif x == 'b' and y == 'c' or x == 'c' and y == 'b':
    z = 'a'
elif x == 'a' and y == 'c' or x == 'c' and y == 'a':
    z = 'b'

Is is possible to do this in a more, concise, readable and efficient way?

Mertiemerton answered 9/1, 2012 at 17:21 Comment(2)
The short answer is "Yes!" Python's sets are excellent for checking distinctness and for computing unused elements.Custody
Thanks all for the answers ,I guess i'll use the solution using set as its reasonably fast and readable,The lookup table based answer by Óscar López is also intresting.Mertiemerton
T
62
z = (set(("a", "b", "c")) - set((x, y))).pop()

I am assuming that one of the three cases in your code holds. If this is the case, the set set(("a", "b", "c")) - set((x, y)) will consist of a single element, which is returned by pop().

Edit: As suggested by Raymond Hettinger in the comments, you could also use tuple unpacking to extract the single element from the set:

z, = set(("a", "b", "c")) - set((x, y))
Tales answered 9/1, 2012 at 17:23 Comment(7)
If you're using Python 2.7/3.1 or later, you can write it even more concisely using set literals, like so: z = ({'a', 'b', 'c'} - {x, y}).pop()Labdanum
The pop() is unnecessary and slow. Use tuple unpacking instead. Also, the set(("a", "b", "c")) is invariant so it can be precomputed one time, leaving only the set differencing to be used in a loop (if it isn't used in a loop, then we don't care as much about speed).Custody
OP Bunny made it clear, by "checks if distinct values" and "if so", that you can't count on x and y being different.Axe
@Ed: I know, but the OP did not specify what to do when x == y, so I omitted the test. It's easy enough to add if x != y: if needed.Tales
Personally, I would opt for the first one as it's more obvious than a random comma.Mammary
@EdStaub: I might actually have strengthened the question slightly since I added the "if so" part in my first edit.Trailblazer
You may want to replace set(("a", "b", "c")) by set("abc").Saurel
S
47

The strip method is another option that runs quickly for me:

z = 'abc'.strip(x+y) if x!=y else None
Slashing answered 9/1, 2012 at 19:15 Comment(3)
+1 It's also very transparent, and unlike most of the answers, it deals with x==y.Axe
Nice idea, +1; though I actually think that "a", "b" and "c" in the original post are just placeholders for the real values. This solution does not generalise to any other value type than strings of length 1.Tales
@Slashing thats creative! Thanks for answering chepner.Mertiemerton
C
27

Sven's excellent code did just a little too much work and chould have used tuple unpacking instead of pop(). Also, it could have added a guard if x != y to check for x and y being distinct. Here is what the improved answer looks like:

# create the set just once
choices = {'a', 'b', 'c'}

x = 'a'
y = 'b'

# the main code can be used in a loop
if x != y:
    z, = choices - {x, y}

Here are the comparative timings with a timing suite to show the relative performance:

import timeit, itertools

setup_template = '''
x = %r
y = %r
choices = {'a', 'b', 'c'}
'''

new_version = '''
if x != y:
    z, = choices - {x, y}
'''

original_version = '''
if x == 'a' and y == 'b' or x == 'b' and y == 'a':
    z = 'c'
elif x == 'b' and y == 'c' or x == 'c' and y == 'b':
    z = 'a'
elif x == 'a' and y == 'c' or x == 'c' and y == 'a':
    z = 'b'
'''

for x, y in itertools.product('abc', repeat=2):
    print '\nTesting with x=%r and y=%r' % (x, y)
    setup = setup_template % (x, y)
    for stmt, name in zip([original_version, new_version], ['if', 'set']):
        print min(timeit.Timer(stmt, setup).repeat(7, 100000)),
        print '\t%s_version' % name

Here are the results of the timings:

Testing with x='a' and y='a'
0.0410830974579     original_version
0.00535297393799    new_version

Testing with x='a' and y='b'
0.0112571716309     original_version
0.0524711608887     new_version

Testing with x='a' and y='c'
0.0383319854736     original_version
0.048309803009      new_version

Testing with x='b' and y='a'
0.0175108909607     original_version
0.0508949756622     new_version

Testing with x='b' and y='b'
0.0386209487915     original_version
0.00529098510742    new_version

Testing with x='b' and y='c'
0.0259420871735     original_version
0.0472128391266     new_version

Testing with x='c' and y='a'
0.0423510074615     original_version
0.0481910705566     new_version

Testing with x='c' and y='b'
0.0295209884644     original_version
0.0478219985962     new_version

Testing with x='c' and y='c'
0.0383579730988     original_version
0.00530385971069    new_version

These timings show that the original-version performance varies quite a bit depending on which if-statements are triggered by the various the input values.

Custody answered 9/1, 2012 at 19:6 Comment(6)
Your test seems biased. The so-called "set_version" is only sometimes faster because it is guarded by an additional if statement.Marcellusmarcelo
@Marcellusmarcelo That was what the problem specification called for: "checks if x and y are distinct values and if so, sets z to the third character". The fastest (and most straight-forward) way to check whether values are distinct is x != y. Only when they are distinct do we the set-difference to determine the third-character :-)Custody
The point I was making is that your tests don't show that the set_version performs better because it is based on sets; it only performs better because of the guarding if statement.Marcellusmarcelo
@Marcellusmarcelo That's a weird reading of the test results. The code does what the OP asked for. The timings show the comparative performance with all possible groups of inputs. It's up to you how you want to interpret those. The timings for the version using if x != y: z, = choices - {x, y} fares reasonably well compared to the OP's original code. I don't know where your notion of bias comes in -- the timings are what they are, and AFAICT, this is still the best of the answers that has been posted. It is both clean and fast.Custody
Several optimisations were added to Sven's "set-version", but the same was not done for the "if-version". Adding an if x != y guard to the "if-version" would probably make it more consistent and better performing than all of the other solutions that have been offered so far (although obviously not as readable and concise). Your "set_version" is a very good solution - it's just not quite as good as the tests make it seem ;-)Marcellusmarcelo
@All I already had a if x != y in my code , I think i should have included that in the questionMertiemerton
A
18
z = (set('abc') - set(x + y)).pop()

Here are all of the scenarios to show that it works:

>>> (set('abc') - set('ab')).pop()   # x is a/b and y is b/a
'c'
>>> (set('abc') - set('bc')).pop()   # x is b/c and y is c/b
'a'
>>> (set('abc') - set('ac')).pop()   # x is a/c and y is c/a
'b'
Arri answered 9/1, 2012 at 17:24 Comment(0)
T
15

If the three items in question weren't "a", "b" and "c", but rather 1, 2 and 3, you could also use a binary XOR:

z = x ^ y

More generally, if you want to set z to the remaining one of three numbers a, b and c given two numbers x and y from this set, you can use

z = x ^ y ^ a ^ b ^ c

Of course you can precompute a ^ b ^ c if the numbers are fixed.

This approach can also be used with the original letters:

z = chr(ord(x) ^ ord(y) ^ 96)

Example:

>>> chr(ord("a") ^ ord("c") ^ 96)
'b'

Don't expect anyone reading this code to immediately figure out what it means :)

Tales answered 9/1, 2012 at 18:2 Comment(2)
+1 This solution seems nice and elegant; and if you make it its own function and give the magic number 96 a name the logic is pretty easy to follow/maintain (xor_of_a_b_c = 96 # ord('a') ^ ord('b') ^ ord('c') == 96). However, in terms of raw speed this is about 33% slower than the long chain of if / elifs; but 500% faster than the set method.Molly
@sven Thanks for introducing the XOR operator to me,Your solution is clean and elegant, I think this example will make it stick to my brain,Thanks again :)Mertiemerton
T
13

I think the solution by Sven Marnach and F.J is beautiful, but it's not faster in my little test. This is Raymond's optimized version using a pre-computed set:

$ python -m timeit -s "choices = set('abc')" \
                   -s "x = 'c'" \
                   -s "y = 'a'" \
                      "z, = choices - set(x + y)"
1000000 loops, best of 3: 0.689 usec per loop

This is the original solution:

$ python -m timeit -s "x = 'c'" \
                   -s "y = 'a'" \
                      "if x == 'a' and y == 'b' or x == 'b' and y == 'a':" \
                      "    z = 'c'" \
                      "elif x == 'b' and y == 'c' or x == 'c' and y == 'b':" \
                      "    z = 'a'" \
                      "elif x == 'a' and y == 'c' or x == 'c' and y == 'a':" \
                      "    z = 'b'"
10000000 loops, best of 3: 0.310 usec per loop

Note that this is the worst possible input for the if-statements since all six comparisons will have to be tried out. Testing with all values for x and y gives:

x = 'a', y = 'b': 0.084 usec per loop
x = 'a', y = 'c': 0.254 usec per loop
x = 'b', y = 'a': 0.133 usec per loop
x = 'b', y = 'c': 0.186 usec per loop
x = 'c', y = 'a': 0.310 usec per loop
x = 'c', y = 'b': 0.204 usec per loop

The set-based variant shows the same performance for different inputs, but it is consistently between 2 and 8 times slower. The reason is that the if-based variant runs much simpler code: equality tests compared to hashing.

I think both types of solutions are valuable: it's important to know that creating "complicated" data structures like sets cost you something in performance — while they give you a lot in readability and development speed. The complex data types are also much better when the code change: it's easy to extend the set-based solution to four, five, ... variables whereas the if-statements quickly turn into a maintenance nightmare.

Trailblazer answered 9/1, 2012 at 17:39 Comment(7)
@martinGeisler thanks a lot for your reply i'd had no idea that we can time things like this in python.I have a gut feeling that Chessmasters solution would just work fine and efficient, I'll try to test it like you did with other answers and let you know.Mertiemerton
agreed that readability counts , but i am just trying to solve a programming challenge that might land me job somewhere , so for now its efficiency first readability second me for now :PMertiemerton
The set-based solution optimize conciseness and readability (and elegance). But efficiency was also mentioned, so I went and investigated the performance of the proposed solutions.Trailblazer
@MartinGeisler: Yes, when I noticed this, I removed my comment. And I usually do find it interesting at least to know what is faster.Tales
@BunnyRabbit: the timeit module is great for micro-benchmarks like this. You should of course profile your overall program first to determine where the bottlenecks are, but when they're identified, then timeit can be a great way to quickly try different implementations against each other.Trailblazer
+1 -- benchmark test that proofs simple series of compares is logical and fast.Ampliate
I can't imagine the performance difference would make any noticeable difference, unless this is being called in a tight-loop.Presentationism
P
8
z = 'a'*('a' not in x+y) or 'b'*('b' not in x+y) or 'c'

or less hackish and using Conditional Assignment

z = 'a' if ('a' not in x+y) else 'b' if ('b' not in x+y) else 'c'

but probably the dict solution is faster... you'd have to time it.

Progressive answered 9/1, 2012 at 17:44 Comment(0)
E
8

Try this option, using dictionaries:

z = {'ab':'c', 'ba':'c', 'bc':'a', 'cb':'a', 'ac':'b', 'ca':'b'}[x+y]

Of course, if the x+y key is not present in the map, it'll produce a KeyError which you'll have to handle.

If the dictionary is precomputed a single time and stored for future use, the access will be much faster, since no new data structures will have to be created for each evaluation, only a string concatenation and a dictionary lookup are needed:

lookup_table = {'ab':'c', 'ba':'c', 'bc':'a', 'cb':'a', 'ac':'b', 'ca':'b'}
z = lookup_table[x+y]
Enfilade answered 9/1, 2012 at 17:45 Comment(5)
Just for fun, here is another dict option: {1: 'c', 2: 'b', 3: 'a'}[ord(x)+ord(y)-ord('a')*2], extra complexity probably isn't worth the saved space though.Arri
@F.J: z = {1: 'a', 2: 'b', 3: 'c'}[2*('a' in x+y)+('b' in x+y)] this is fun...Progressive
Is it faster than the OP original code? if so, why? How can calculation of hash values be faster than simple comparison?Ashbaugh
@Ashbaugh it a single hash calculation, not a whole bunch of comparisons and conditional expressionsSoubrette
Cool.. Didn't realize how fast the hash function is!Ashbaugh
P
2

I think it should looks like that:

z = (set(("a", "b", "c")) - set((x, y))).pop() if x != y else None
Proptosis answered 9/1, 2012 at 17:38 Comment(2)
len(set((x, y))) == 2 is the most unreadable way to write x != y I've ever seen :)Tales
Yes, Sven))) Thanks about your comment. This script had an another basic idea when I started write it)) An finally I forgot to edit that.Proptosis
A
1

Using list comprehension, assuming like others that one of the three cases in your code holds:

l = ['a', 'b', 'c']
z = [n for n in l if n not in [x,y]].pop()

Or, like in the accepted answer, taking advantage of the tuple to unpack it,

z, = [n for n in l if n not in [x,y]]
Arzola answered 9/1, 2012 at 18:57 Comment(0)
P
0

See if this works

if a not in xy
    z= 'a'
if b not in xy
    z='b'
if c not in xy
    z='c'
Predict answered 10/1, 2012 at 0:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.