From what I've understood, tracking algorithms predict where a given object will be in the next frame (after object detection has already been performed). The object is then detected again in the next frame. What isn't clear to me is how the tracker then knows that an object detected in the 2nd frame is the same object as one in the 1st frame, especially when there are multiple objects in the frame.
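To make the prediction step concrete, here's how I picture it for a single object. This is only a minimal sketch that assumes the simplest possible motion model: constant velocity, with each object reduced to its centre point. Real trackers often use something like a Kalman filter here instead. The `Track` class and its fields are hypothetical names of my own.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """A tracked object (hypothetical): last known centre plus per-frame velocity."""
    track_id: int
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0

    def predict(self) -> tuple[float, float]:
        """Assumed constant-velocity model: guess the centre in the next frame."""
        return (self.x + self.vx, self.y + self.vy)

    def update(self, x: float, y: float) -> None:
        """Correct the track with the detection it was matched to."""
        self.vx, self.vy = x - self.x, y - self.y  # velocity = displacement per frame
        self.x, self.y = x, y
```

For example, a track at (10, 20) that gets matched to a detection at (12, 21) would then predict (14, 22) for the following frame.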
I've seen in a few places that a cost matrix is built from the Euclidean distances between each predicted position and every detection, and the matching is then framed as an assignment problem (solved with the Hungarian algorithm).
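Here's a minimal sketch of that association step as I understand it, in plain Python so it runs without dependencies. `hungarian` is a textbook Kuhn-Munkres (Hungarian) implementation using row/column potentials; in practice a library routine such as SciPy's `scipy.optimize.linear_sum_assignment` would normally be used instead. The `associate` function, its `max_dist` gate, and the use of (x, y) points for positions are my own hypothetical choices, not taken from any specific tracker.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Kuhn-Munkres (Hungarian) algorithm with row/column potentials, O(n^2 * m).
    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials; column 0 is a dummy
    p = [0] * (m + 1)     # p[j] = row matched to column j (0 = unmatched)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # adjust potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            result[p[j] - 1] = j - 1
    return result


def associate(predictions, detections, max_dist=50.0):
    """Match predicted (x, y) positions to detected (x, y) positions.

    Returns (matches, unmatched_tracks, unmatched_detections), where matches
    holds (track_index, detection_index) pairs.
    """
    if not predictions or not detections:
        return [], list(range(len(predictions))), list(range(len(detections)))
    # Cost matrix: Euclidean distance from every prediction to every detection.
    cost = [[math.dist(p, d) for d in detections] for p in predictions]
    transposed = len(predictions) > len(detections)
    if transposed:  # hungarian() expects rows <= columns
        cost = [list(col) for col in zip(*cost)]
    pairs = list(enumerate(hungarian(cost)))
    if transposed:  # rows were detections, so swap back to (track, detection)
        pairs = [(t, d) for d, t in pairs]
    # Assumed gating step: drop pairs that are optimal overall but too far apart.
    matches = sorted((t, d) for t, d in pairs
                     if math.dist(predictions[t], detections[d]) <= max_dist)
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_t = [t for t in range(len(predictions)) if t not in matched_t]
    unmatched_d = [d for d in range(len(detections)) if d not in matched_d]
    return matches, unmatched_t, unmatched_d
```

With predictions at (10, 10) and (50, 50) and detections at (52, 49), (11, 12) and (300, 300), `associate` returns the matches [(0, 1), (1, 0)] and leaves detection 2 unmatched. The point is that the assignment minimises the total distance over all pairs at once, rather than greedily giving each track its nearest detection.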
Is my understanding of tracking correct? Are there other ways of establishing that an object in one frame is the same as an object in the next frame?