If you know the template is really only rotated and scaled, and everything else stays constant, there are easier ways to match it in a scale- and rotation-invariant way than going via feature detection and homographies.
For true object detection, the keypoint-based approaches suggested above work better.
If you know it's the same template and there is no perspective change involved, build an image pyramid for scale-space detection and match your template on the different levels of that pyramid (using something simple, for example SSD, the sum of squared differences, or NCC, normalized cross-correlation). Finding rough matches on the higher (= lower-resolution) levels of the pyramid is cheap. In fact, it is cheap enough that you can also try coarsely stepped rotations of the template on the low-resolution levels and then, as you trace the match back down to the higher-resolution levels, switch to a more finely grained rotation stepping. That's a pretty standard template matching technique and works well in practice.
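To make the coarse-to-fine idea concrete, here is a minimal sketch in plain Python (no OpenCV, lists of floats as grayscale images). All names (`downsample`, `rotate`, `ssd`, `search`, `coarse_to_fine`) and the parameters `levels` and `coarse_step` are made up for illustration. It assumes the template is at the same scale as the image and only searches position and rotation; to handle scale as well you would additionally match the template against several pyramid levels. It also assumes the image and template sizes are divisible by `2 ** (levels - 1)`.

```python
import math

def downsample(img):
    """One pyramid step: halve the resolution by averaging 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*y][2*x] + img[2*y][2*x+1] +
              img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
             for x in range(w)] for y in range(h)]

def rotate(tpl, angle_deg):
    """Rotate about the centre (nearest neighbour); None marks pixels with no source."""
    h, w = len(tpl), len(tpl[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: which template pixel lands on output (x, y)?
            sx = int(round(c * (x - cx) + s * (y - cy) + cx))
            sy = int(round(-s * (x - cx) + c * (y - cy) + cy))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = tpl[sy][sx]
    return out

def ssd(img, tpl, x0, y0):
    """Mean squared difference over the valid (non-None) template pixels."""
    total, n = 0.0, 0
    for y, row in enumerate(tpl):
        for x, t in enumerate(row):
            if t is not None:
                d = img[y0 + y][x0 + x] - t
                total += d * d
                n += 1
    return total / n if n else float("inf")

def search(img, tpl, xs, ys, angles):
    """Brute-force search over the given candidates; returns (score, x, y, angle)."""
    max_x, max_y = len(img[0]) - len(tpl[0]), len(img) - len(tpl)
    best = (float("inf"), 0, 0, 0.0)
    for a in angles:
        rotated = rotate(tpl, a)
        for y in ys:
            for x in xs:
                if 0 <= x <= max_x and 0 <= y <= max_y:
                    best = min(best, (ssd(img, rotated, x, y), x, y, a))
    return best

def coarse_to_fine(img, tpl, levels=2, coarse_step=30.0):
    """Find the template at the coarsest level, then refine level by level."""
    imgs, tpls = [img], [tpl]
    for _ in range(levels - 1):
        imgs.append(downsample(imgs[-1]))
        tpls.append(downsample(tpls[-1]))
    # Coarsest level: every position, coarsely stepped rotations.
    top = imgs[-1]
    angles = [k * coarse_step for k in range(int(360 / coarse_step))]
    best = search(top, tpls[-1], range(len(top[0])), range(len(top)), angles)
    step = coarse_step
    for lvl in range(levels - 2, -1, -1):
        _, x, y, a = best
        step /= 3.0  # finer rotation stepping at each finer level
        # Only look in a small window around the upscaled coarse estimate.
        xs = range(2 * x - 2, 2 * x + 3)
        ys = range(2 * y - 2, 2 * y + 3)
        best = search(imgs[lvl], tpls[lvl], xs, ys,
                      [a + k * step for k in range(-2, 3)])
    return best
```

Calling `coarse_to_fine(img, tpl)` returns a `(score, x, y, angle)` tuple for the best match at full resolution. In real code you would likely swap the hand-written pieces for library routines (an optimized pyramid, resampling, and SSD/NCC matching); the point here is only the structure: an exhaustive but cheap search at low resolution, then a narrow search window with finer rotation steps at each higher-resolution level.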