You really don't need to use two loops.
Correct way to use dicts
Let's say you have a dict:
my_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 5, 'g': 6}
Your code is basically equivalent to:
for key, value in my_dict.items():
    if key == 'c':
        print(value)
        break
#=> 3
But the whole point of dict (and set, Counter, ...) is to be able to get the desired value directly:
my_dict['c']
#=> 3
If your dict has 1000 entries, the loop has to examine about 500 of them on average before it finds the key, whereas the direct lookup takes roughly the same time whatever the size of the dict. Here's a simple description I've found on Reddit:
A dict is like a magic coat check room. You hand your coat over and
get a ticket. Whenever you give that ticket back, you immediately get
your coat. You can have a lot of coats, but you still get your coat
back immediately. There is a lot of magic going on inside the coat
check room, but you don't really care as long as you get your coat
back immediately.
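To put rough numbers on the loop-versus-lookup comparison above, here is a minimal timing sketch. The 1000-entry dict, its key names and the two helper functions are made up for illustration, and the actual timings will vary from machine to machine.

```python
import timeit

# Hypothetical dict with 1000 entries, keyed 'key0' ... 'key999'.
big_dict = {f"key{i}": i for i in range(1000)}
target = "key500"


def find_with_loop(d, wanted):
    """Scan the items one by one until the key matches (linear time)."""
    for key, value in d.items():
        if key == wanted:
            return value
    return None


def find_directly(d, wanted):
    """Let the dict hash the key and jump to the value (constant time on average)."""
    return d.get(wanted)


# Run each lookup 2000 times and compare the total elapsed time.
loop_time = timeit.timeit(lambda: find_with_loop(big_dict, target), number=2000)
direct_time = timeit.timeit(lambda: find_directly(big_dict, target), number=2000)

print(f"loop:   {loop_time:.4f} s")
print(f"direct: {direct_time:.4f} s")
```

Both functions return the same value; only the amount of work differs, and the gap grows with the size of the dict.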
Refactored code
You just need to find a common signature between "Today is a good day!" and "Is today a good day?". One way would be to extract the words, convert them to lowercase, sort them and join them. What's important is that the output is hashable, which in practice means an immutable type (e.g. tuple, string, frozenset). This way, it can be used inside sets, Counters or dicts directly, without needing to iterate over every key.
from collections import Counter
sentences = ["Today is a good day", 'a b c', 'a a b c', 'c b a', "Is today a good day"]
vocab = Counter()
for sentence in sentences:
    sorted_words = ' '.join(sorted(sentence.lower().split(" ")))
    vocab[sorted_words] += 1
vocab
#=> Counter({'a day good is today': 2, 'a b c': 2, 'a a b c': 1})
or even shorter:
from collections import Counter
sentences = ["Today is a good day", 'a b c', 'a a b c', 'c b a', "Is today a good day"]
def sorted_words(sentence):
    return ' '.join(sorted(sentence.lower().split(" ")))
vocab = Counter(sorted_words(sentence) for sentence in sentences)
# Counter({'a day good is today': 2, 'a b c': 2, 'a a b c': 1})
This code should be much faster than what you've tried so far, since each sentence is processed only once and every lookup in the Counter is direct.
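Note that the example sentences above carry no punctuation, while the ones in the question end with "!" and "?". Splitting on spaces would leave those characters attached to the last word, so the two signatures would differ. Here is a sketch of one possible way around that, assuming a word is any run of letters, digits or underscores; the function name signature is only illustrative, and it returns a tuple instead of a joined string.

```python
import re
from collections import Counter


def signature(sentence):
    """Return a hashable, order-independent signature for a sentence."""
    # Assumption: a "word" is any run of \w characters, so punctuation is dropped.
    words = re.findall(r"\w+", sentence.lower())
    return tuple(sorted(words))


sentences = ["Today is a good day!", "Is today a good day?", "a a b c", "c b a"]
vocab = Counter(signature(s) for s in sentences)

# A differently ordered, differently cased query maps to the same key.
print(vocab[signature("is TODAY a good day")])
#=> 2
```

If repeated words should not matter, frozenset(words) would also work as a signature, at the cost of treating 'a a b c' and 'a b c' as the same sentence.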
Yet another alternative
If you want to keep the original sentences in a list, you can use setdefault:
sentences = ["Today is a good day", 'a b c', 'a a b c', 'c b a', "Is today a good day"]
def sorted_words(sentence):
    return ' '.join(sorted(sentence.lower().split(" ")))
vocab = {}
for sentence in sentences:
    vocab.setdefault(sorted_words(sentence), []).append(sentence)
vocab
#=> {'a day good is today': ['Today is a good day', 'Is today a good day'],
# 'a b c': ['a b c', 'c b a'],
# 'a a b c': ['a a b c']}
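The same grouping can also be written with collections.defaultdict, which creates the empty list automatically the first time a key is seen. This is only a sketch of an equivalent variant; the names groups and duplicates are illustrative.

```python
from collections import defaultdict

sentences = ["Today is a good day", 'a b c', 'a a b c', 'c b a', "Is today a good day"]


def sorted_words(sentence):
    """Lowercase the sentence, sort its words and join them into one string."""
    return ' '.join(sorted(sentence.lower().split(" ")))


# Missing keys start out as an empty list, so no setdefault call is needed.
groups = defaultdict(list)
for sentence in sentences:
    groups[sorted_words(sentence)].append(sentence)

# Keep only the signatures shared by more than one sentence.
duplicates = {key: found for key, found in groups.items() if len(found) > 1}
print(duplicates)
#=> {'a day good is today': ['Today is a good day', 'Is today a good day'],
#    'a b c': ['a b c', 'c b a']}
```

Filtering on the length of each list is a quick way to see which sentences are rearrangements of one another.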
Should "to to" match "to to to"? – Spinifex
[...] set and test if the two sets are equal. – Spinifex