Can anyone tell me where metadata is stored in common video file formats, and whether it is located towards the start of the file or scattered throughout?
I'm working with a remote object store containing a lot of video files, and I want to extract metadata (in particular video duration and video dimensions) from those files without streaming the entire file contents to the local machine.
I'm hoping that this metadata will be stored in the first X bytes of each file, so that I can fetch just a byte range starting at the beginning instead of the whole file, and pass this partial file data to ffprobe.
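To make the idea concrete, here is a minimal sketch of the kind of fetch I have in mind. It assumes the object store is reachable over plain HTTP and honours the `Range` request header, which may not hold for every store. The function names `build_range_request` and `fetch_prefix` are hypothetical, made up for this illustration.

```python
import urllib.request


def build_range_request(url: str, num_bytes: int) -> urllib.request.Request:
    """Build a GET request asking for only the first num_bytes bytes of url."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    # HTTP byte ranges are inclusive, so the last index is num_bytes - 1.
    headers = {"Range": f"bytes=0-{num_bytes - 1}"}
    return urllib.request.Request(url, headers=headers)


def fetch_prefix(url: str, num_bytes: int) -> bytes:
    """Fetch the first num_bytes bytes of a remote file (assumes HTTP access)."""
    with urllib.request.urlopen(build_range_request(url, num_bytes)) as resp:
        # A server that ignores Range replies with the full body, so cap
        # the read at num_bytes instead of draining the whole response.
        return resp.read(num_bytes)
```

The returned bytes could then be piped to ffprobe on standard input, in the same way as the `head` test below.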
For testing purposes I created a 22MB MP4 file, and used the following command to supply only the first 1MB of data to ffprobe:
head -c1024K '2013-07-04 12.20.07.mp4' | ffprobe -
It prints:
avprobe version 0.8.6-4:0.8.6-0ubuntu0.12.04.1, Copyright (c) 2007-2013 the Libav developers
built on Apr 2 2013 17:02:36 with gcc 4.6.3
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1a6b7a0] stream 0, offset 0x10beab: partial file
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pipe:':
Metadata:
major_brand : isom
minor_version : 0
compatible_brands: isom3gp4
creation_time : 1947-07-04 11:20:07
Duration: 00:00:09.84, start: 0.000000, bitrate: N/A
Stream #0.0(eng): Video: h264 (High), yuv420p, 1920x1080, 20028 kb/s, PAR 65536:65536 DAR 16:9, 29.99 fps, 30 tbr, 90k tbn, 180k tbc
Metadata:
creation_time : 1947-07-04 11:20:07
Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 189 kb/s
Metadata:
creation_time : 1947-07-04 11:20:07
So I see the first 1MB was enough to extract video duration 9.84 seconds and video dimensions 1920x1080, even though ffprobe printed the warning about detecting a partial file. If I supply less than 1MB, it fails completely.
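To repeat this experiment with different prefix sizes, something like the following sketch could be used. It assumes an `ffprobe` binary is on the PATH and accepts `-` to read from standard input, as in the command above. The helper names `read_prefix` and `probe_prefix` are hypothetical, and the probe command is a parameter so that another tool can be swapped in.

```python
import subprocess


def read_prefix(path: str, num_bytes: int) -> bytes:
    """Return the first num_bytes bytes of a local file."""
    with open(path, "rb") as f:
        return f.read(num_bytes)


def probe_prefix(path: str, num_bytes: int, probe_cmd=("ffprobe", "-")):
    """Pipe a file prefix to the probe command on standard input.

    Returns (exit_code, stderr_text). ffprobe writes its report to
    stderr, which is why that stream is the one returned here.
    """
    data = read_prefix(path, num_bytes)
    result = subprocess.run(
        list(probe_cmd),
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode, result.stderr.decode("utf-8", errors="replace")
```

Calling `probe_prefix('2013-07-04 12.20.07.mp4', size)` for a few sizes (for example 256 KiB, 512 KiB and 1 MiB) would show the smallest prefix at which the duration and dimensions still appear in the output.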
Would this approach work for other common video file formats to reliably extract metadata, or do any common formats scatter metadata throughout the file?
I'm aware of the concept of container formats and that various codecs may be used to represent the audio/video data inside those containers, but I'm not familiar with the details. So I guess the question may apply to common combinations of containers and codecs. Thanks in advance.