There are two places in video codecs where these filters are typically used:
Motion estimation/compensation
Video codecs compress much better than still-image codecs because they also remove the redundancy between frames. They do this using motion estimation and motion compensation. The encoder splits the image into rectangular blocks of image data (typically 16x16) and then tries to find the block in a previously coded frame that is as similar as possible to the block that's currently being coded. The encoder then only transmits the difference, plus a motion vector pointing to where it found this good match. This is the main reason why video codecs reach roughly 1:100 compression, where still-image codecs reach about 1:10.

Now, you can imagine that sometimes the camera or an object in the scene didn't move by a full pixel, but by half or a quarter of a pixel. A better match can then be found if the reference frame is interpolated to those sub-pixel positions, and these filters are used to do exactly that. The exact way they do this filtering often differs per codec.
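To make that concrete, here is a minimal sketch in C of a horizontal half-pixel interpolation of the kind H.264/AVC uses for luma samples: a 6-tap filter with coefficients 1, -5, 20, 20, -5, 1, normalized by 32. The function name and the assumption that the frame borders are already padded are mine; other codecs use different tap counts and coefficients.

```c
#include <stdint.h>

/* Clamp an intermediate value back to the valid 8-bit sample range. */
static uint8_t clip_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/*
 * Horizontal half-pixel interpolation with a 6-tap filter
 * (coefficients 1, -5, 20, 20, -5, 1, normalized by 32), the kind of
 * filter H.264/AVC uses for luma half-pel positions.  `src` is assumed
 * to have at least 2 valid samples to the left and 3 to the right of
 * the interpolated range (real codecs pad/extend the frame borders).
 */
static void halfpel_interp_row(const uint8_t *src, uint8_t *dst, int width)
{
    for (int x = 0; x < width; x++) {
        int sum = 1  * src[x - 2]
                - 5  * src[x - 1]
                + 20 * src[x]
                + 20 * src[x + 1]
                - 5  * src[x + 2]
                + 1  * src[x + 3];
        dst[x] = clip_u8((sum + 16) >> 5);   /* round, then divide by 32 */
    }
}
```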
Deblocking
Another reason for using such a filter is to remove artifacts from the transform that's being used. Just like in still-image coding, there's a transform that maps the image data into a different space that "compacts the energy". For instance, after this transform, an image section with a uniform color, like a blue sky, results in data that has just a single number for the color and zeros for the rest. Compared to the original data, which stores that same blue value for every pixel, a lot of redundancy has been removed. After the transform (Google for DCT, KLT, integer transform), the zeros are typically thrown away, and the less significant data that's left is coded with fewer bits than in the original.

During decoding, since data has been thrown away, this often results in visible edges at the boundaries between neighboring 8x8 or 16x16 blocks. There's a separate smoothing filter, the deblocking filter, that then smooths these edges out again.
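To see that "energy compaction" in numbers, here is a tiny C sketch (my own illustration, not actual codec code) that runs an unnormalized 8-point DCT-II over a flat row of pixels: everything ends up in the first (DC) coefficient and the other seven come out essentially zero, which is what makes them so cheap to code.

```c
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Unnormalized 8-point DCT-II: X[k] = sum_n x[n] * cos(pi/8 * (n+0.5) * k). */
static void dct8(const double x[8], double X[8])
{
    for (int k = 0; k < 8; k++) {
        double sum = 0.0;
        for (int n = 0; n < 8; n++)
            sum += x[n] * cos(M_PI / 8.0 * (n + 0.5) * k);
        X[k] = sum;
    }
}

int main(void)
{
    /* Eight identical samples, like a row from a flat blue sky. */
    double flat[8] = { 120, 120, 120, 120, 120, 120, 120, 120 };
    double coeff[8];

    dct8(flat, coeff);

    /* Only coeff[0] (the DC term) is nonzero; the rest are ~0 and
       disappear after quantization. */
    for (int k = 0; k < 8; k++)
        printf("coeff[%d] = %.2f\n", k, coeff[k]);
    return 0;
}
```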
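And a deliberately simplified sketch of the deblocking idea: smooth the samples on either side of a block boundary. Real in-loop deblocking filters (H.264, HEVC, etc.) are adaptive, looking at the local gradient and the quantization parameter so they don't blur genuine edges in the picture; the fixed 3-tap smoothing below (function name made up) is only there to show the principle.

```c
#include <stdint.h>

/*
 * Smooth the two samples on either side of a vertical block boundary
 * in one row of pixels.  `edge_x` is the x position of the first
 * sample of the right-hand block; at least two samples must exist on
 * each side of the edge.
 */
static void deblock_vertical_edge(uint8_t *row, int edge_x)
{
    int p1 = row[edge_x - 2];
    int p0 = row[edge_x - 1];   /* last sample of the left block   */
    int q0 = row[edge_x];       /* first sample of the right block */
    int q1 = row[edge_x + 1];

    /* Pull the two boundary samples toward each other with a
       (1, 2, 1)/4 weighted average. */
    row[edge_x - 1] = (uint8_t)((p1 + 2 * p0 + q0 + 2) >> 2);
    row[edge_x]     = (uint8_t)((p0 + 2 * q0 + q1 + 2) >> 2);
}
```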