I am currently reading up on SSD (Single Shot Detector), and there is one term I am struggling to understand: "head". When I hear this word, I think of the head of the network, as in the beginning.
I looked at the object detection API created by Google and found a "heads" folder with different head types, one for the box encoding and another for the class predictions.
The documentation for the abstract "head" class was not super enlightening:
All the different kinds of prediction heads in different models will inherit from this class. What is in common between all head classes is that they have a `predict` function that receives `features` as its first argument.
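To make the question concrete, here is how I currently picture that description, as a minimal Python sketch. The names `Head`, `LinearHead`, `BoxHead` and `ClassHead`, the plain linear layer, and the extra background class are all my own assumptions for illustration; this is not the actual code from the API.

```python
from abc import ABC, abstractmethod
import random


class Head(ABC):
    """Hypothetical stand-in for the abstract head class quoted above."""

    @abstractmethod
    def predict(self, features):
        """Map a list of per-anchor feature vectors to per-anchor predictions."""


class LinearHead(Head):
    """Assumed helper: one linear layer applied to every feature vector."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # One weight row per output value, small random initial values.
        self.weights = [
            [rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(out_dim)
        ]
        self.bias = [0.0] * out_dim

    def predict(self, features):
        # features: list of vectors of length in_dim, produced by the shared
        # part of the network (not modelled here).
        return [
            [
                sum(w * x for w, x in zip(row, feature)) + b
                for row, b in zip(self.weights, self.bias)
            ]
            for feature in features
        ]


class BoxHead(LinearHead):
    """Predicts 4 box-encoding values per anchor."""

    def __init__(self, in_dim):
        super().__init__(in_dim, 4)


class ClassHead(LinearHead):
    """Predicts one score per class, plus an assumed background class."""

    def __init__(self, in_dim, num_classes):
        super().__init__(in_dim, num_classes + 1)


# Both heads consume the same features but produce different outputs.
features = [[0.1] * 8 for _ in range(3)]  # 3 anchors, 8 features each
boxes = BoxHead(in_dim=8).predict(features)
scores = ClassHead(in_dim=8, num_classes=20).predict(features)
print(len(boxes), len(boxes[0]))
print(len(scores), len(scores[0]))
```

If this picture is roughly right, then a head is just a small output module sitting on top of shared features, and the only difference between a box head and a class head is what its outputs are trained to mean.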
I guess I understand them at a high level, but I don't have a concrete definition. Can someone define a "head" and explain how one can have a "box prediction head" or a "classification head"?