Say I have a text file (which doesn't necessarily contain every possible symbol). I'd like to count how often each symbol occurs and then visit each symbol, together with its frequency, from most frequent to least frequent. The symbols are not necessarily ASCII characters; they could be arbitrary byte sequences, though all of the same length.
I was considering doing something like this (in pseudocode):
    function add_to_heap(symbol)
        entry = heap.find(symbol)
        if entry exists
            entry.frequency++
        else
            symbol.frequency = 1
            heap.insert(symbol)

    MaxBinaryHeap heap

    while not EOF(somefile)
        symbol = read_symbol(somefile)   // one fixed-length symbol, not just one byte
        add_to_heap(symbol)

    heap.sort_by_frequency()

    while heap is not empty
        root = heap.extract_root()
        do_stuff(root)
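To make the pseudocode concrete, here is a rough Python sketch of the same idea. The 2-byte symbol width and the names `count_symbols` and `drain` are placeholders I made up for illustration. The linear scan stands in for `heap.find`, since a plain binary heap has no fast lookup by symbol, and `heapq.heapify` stands in for `heap.sort_by_frequency()`.

```python
import heapq
import io

SYMBOL_SIZE = 2  # assumed symbol width in bytes; placeholder value


def count_symbols(stream, symbol_size=SYMBOL_SIZE):
    """Return heap-ordered [negated_count, symbol] entries for a binary stream."""
    entries = []  # plays the role of `heap` in the pseudocode
    while True:
        symbol = stream.read(symbol_size)
        if len(symbol) < symbol_size:
            break  # EOF (a trailing partial symbol is ignored)
        # heap.find(symbol): no fast lookup in a binary heap, so scan linearly.
        for entry in entries:
            if entry[1] == symbol:
                entry[0] -= 1  # counts are stored negated: heapq is a min-heap
                break
        else:
            entries.append([-1, symbol])
    heapq.heapify(entries)  # heap.sort_by_frequency()
    return entries


def drain(entries):
    """Yield (symbol, count) pairs from most to least frequent."""
    while entries:
        neg_count, symbol = heapq.heappop(entries)
        yield symbol, -neg_count


if __name__ == "__main__":
    sample = io.BytesIO(b"ababcdab")  # stand-in for the real file
    for symbol, count in drain(count_symbols(sample)):
        print(symbol, count)
```

The scan inside the read loop costs time proportional to the number of distinct symbols for every symbol read, which is the part that makes me suspect the heap is the wrong structure for the counting step.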
I was wondering: is there a better, simpler way to calculate and store how many times each symbol occurs in a file?