If you use Numpy, the unique
function can tell you how many times each value appeared by passing return_counts=True
:
>>> data = ['apple', 'red', 'apple', 'red', 'red', 'pear']
>>> np.unique(data, return_counts=True)
(array(['apple', 'pear', 'red'], dtype='<U5'), array([2, 1, 3]))
The counts are in the same order as the distinct elements that were found; thus we can use the usual trick to create the desired dictionary (passing the two elements as separate arguments to zip
):
>>> dict(zip(*np.unique(data, return_counts=True)))
{'apple': 2, 'pear': 1, 'red': 3}
If you specifically have a large input Numpy array of small integers, you may get better performance from bincount
:
>>> data = np.random.randint(10, size=100)
>>> data
array([1, 0, 0, 3, 3, 4, 2, 4, 4, 0, 4, 8, 7, 4, 4, 8, 7, 0, 0, 2, 4, 2,
0, 9, 0, 2, 7, 0, 7, 7, 5, 6, 6, 8, 4, 2, 7, 6, 0, 3, 6, 3, 0, 4,
8, 8, 9, 5, 2, 2, 5, 1, 1, 1, 9, 9, 5, 0, 1, 1, 9, 5, 4, 9, 5, 2,
7, 3, 9, 0, 1, 4, 9, 1, 1, 5, 4, 7, 5, 0, 3, 5, 1, 9, 4, 8, 8, 9,
7, 7, 7, 5, 6, 3, 2, 4, 3, 9, 6, 0])
>>> np.bincount(data)
array([14, 10, 9, 8, 14, 10, 6, 11, 7, 11])
The n
th value in the output array indicates the number of times that n
appeared, so we can create the dictionary if desired using enumerate
:
>>> dict(enumerate(np.bincount(data)))
{0: 14, 1: 10, 2: 9, 3: 8, 4: 14, 5: 10, 6: 6, 7: 11, 8: 7, 9: 11}