MPEG-4 Part 2 had some awesome face- and body- motion concepts, but they disappeared in MPEG-4 Part 10 (H.264). Why?

Asked 21/3, 2012 at 7:5 Answered 29/3, 2012 at 10:53

During the last few weeks, I had the opportunity to read two documents:

The MPEG-4 Part 2 specification (ISO/IEC 14496-2), which people just call "mpeg-4"
The MPEG-4 Part 10 specification (ISO/IEC 14496-10), which is also called "h.264" or "AVC"

After having read all the cool ideas in "mpeg-4" like identifying facial expression, motion of limbs of people, and sprites, I got really excited. The ideas sound very fun, maybe even fantastic, for an idea from 1999.

But then I read the "h.264" standard, and none of those ideas were there. There was a lot of discussion on how to encode pixels, but none of the really cool ideas.

What happened? Why were these ideas removed?

This is not a code question, but as a programmer I feel I should try to understand as much of the intent behind a specification as I can. If the code I write adheres to the spirit in which the specification was meant to be used, it is more likely to be in a position to take advantage of the entire specification.

Simeon answered 21/3, 2012 at 7:5 Comment(0)

You seem to be assuming that the MPEG-4 Part 10 specification improves on MPEG-4 Part 2. In fact, the two are separate specifications with different goals, and they were even developed by different groups: MPEG developed Part 2, while Part 10 was developed by the Joint Video Team, a collaboration between ITU-T (VCEG) and ISO/IEC (MPEG).

Keep in mind that the ISO/IEC 14496 standard is a collection of specifications that apply to different aspects of audiovisual encoding. The goal of the Part 2 specification is to encode different kinds of visual objects (video, 3D objects, etc.). The goal of Part 10 is to provide a very efficient, high-quality encoding for video. Other parts of the standard deal with other aspects. For example, Part 3 deals with audio encoding, and Parts 12 and 14 define the container file format most typically used to wrap Part 10 video (i.e. H.264) and Part 3 audio (i.e. AAC) into a single file, the so-called .mp4 format. Part 15 specifies how H.264 streams are stored inside that container.
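To make the container idea concrete, here is a minimal sketch (not a full parser) of how the top level of an .mp4 file is laid out. It assumes the box structure of the ISO base media file format: each box starts with a 32-bit big-endian size and a four-character type, a size of 1 means a 64-bit size follows, and a size of 0 means the box runs to the end of the file. The names `iter_top_level_boxes` and `make_box`, and the synthetic sample bytes, are made up for illustration and are not part of any spec or library.

```python
import io
import struct


def iter_top_level_boxes(stream):
    """Yield (box_type, size_in_bytes) for each top-level box in the stream."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # end of data, or a truncated header
        size, box_type = struct.unpack(">I4s", header)
        name = box_type.decode("ascii", "replace")
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type.
            ext = stream.read(8)
            if len(ext) < 8:
                return
            (size,) = struct.unpack(">Q", ext)
            header_len = 16
        elif size == 0:
            # The box extends to the end of the stream.
            start = stream.tell()
            end = stream.seek(0, io.SEEK_END)
            yield name, end - start + header_len
            return
        if size < header_len:
            return  # malformed box, stop rather than loop forever
        yield name, size
        # Skip the payload; the codec data (e.g. H.264, AAC) lives inside.
        stream.seek(size - header_len, io.SEEK_CUR)


def make_box(box_type, payload=b""):
    """Build a synthetic box for the demo (hypothetical helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic stand-in for a real file: file type, metadata, media data.
sample = (
    make_box(b"ftyp", b"isom" + b"\x00\x00\x02\x00" + b"isomavc1")
    + make_box(b"moov", b"\x00" * 16)
    + make_box(b"mdat", b"\x00" * 32)
)
boxes = list(iter_top_level_boxes(io.BytesIO(sample)))
print(boxes)
```

With a real file you would pass `open(path, "rb")` instead of the `BytesIO` object. The point is only that the container does not care which codec sits inside the media data; the video and audio parts of the standard are plugged into it.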

I hope this helps!

Aspergillum answered 27/3, 2012 at 6:46 Comment(2)

I guess I kinda expected the Part 10 to replace everything about the Part 2, since they are mutually exclusive. That is to say, in a single MPEG-4 stream you can only choose to either use Part 10 or Part 2, but not both. Kinda makes me sad to see all those features go :-( – Simeon 27/3, 2012 at 10:29

Well, don't consider those features gone. The Part 2 standard has not been deprecated or superseded by Part 10; it is still a valid option for implementors of audiovisual encoding products. In fact, the video encoding aspects of Part 2 are still widely used. The Part 10 specification provides an ultra-efficient (and more complex) encoding format for video, but that shouldn't prevent you from implementing Part 2 if that gives you what you need. – Aspergillum 27/3, 2012 at 14:26

A little bit of history might help.

MPEG-4 was designed as a carrier/container specification for different types of media-related data communication. To be compliant, a device only had to recognize the content and ignore what it could not handle.

This was a reaction to the short lifetime of the MPEG-1 specs, which were obsolete before they were formalized.

MPEG-4 can be divided into:

Mechanisms to transport image-generating data

These included the obvious things, like:

compression
motion compensation and explicit sprites

and the experimental, such as:

transporting and reconstructing 3D and 3D + time data from an image stream (video) to provide compression and feature expansion.

Rate adaptation mechanisms

In 1999 there was a huge range of relevant bit rates, from 128K dial-up to 1000 Mbit LANs/MANs/WANs, and the spec had many special cases and efforts to provide interoperability.

This produced a lot of committee work, which became redundant as the range of network performance narrowed to a minimum of 1 Mbit and a maximum of 100 Mbit.

Initially, every spec under the sun, and some still only in their creators' minds, was attached to the MPEG-4 framework, except for competing specs such as H.264.

Some of the specs faded out of existence as money dried up in the dot-com collapse, and H.264 and others merged into MPEG-4.

One thing I learned from this was that reading a spec without at least an example implementation, while often interesting, was rarely productive.

I guess "use the source, Luke" could apply:

"Specs taste bad without source".

Hanforrd answered 29/3, 2012 at 10:53 Comment(0)
