Phil noted that these are embedded H.264 frames, but I don't know how he deduced that. I extracted the APP0 segment and tried to parse it as raw H.264, and it didn't decode.
$ exiv2 -pS Logitech-C270-003.jpg
STRUCTURE OF JPEG FILE: Logitech-C270-003.jpg
address | marker | length | data
0 | 0xffd8 SOI
2 | 0xffe0 APP0 | 33 | AVI1.....x.x..................
37 | 0xffdb DQT | 67
106 | 0xffdb DQT | 67
175 | 0xffdd DRI | 4
181 | 0xffe0 APP0 | 4 | .
187 | 0xffc0 SOF0 | 17
206 | 0xffda SOS
$ dd if=Logitech-C270-003.jpg bs=1 skip=6 count=31 of=Logitech-C270-003.h264
33+0 records in
33+0 records out
33 bytes copied, 0.000154041 s, 214 kB/s
$ ffplay -f h264 -i Logitech-C270-003.h264
[h264 @ 0x7f1794009d00] missing picture in access unit with size 31
[extract_extradata @ 0x7f1794021a40] No start code is found.
Logitech-C270-003.h264: could not find codec parameters
Another anomaly I noticed is that every MJPEG frame contains an APP0 segment of length 33 (same as his), which I find to be at odds with his assertion that the stream consists of key and delta frames.
$ exiv2 -pS Logitech-C270-001.jpg
STRUCTURE OF JPEG FILE: Logitech-C270-001.jpg
address | marker | length | data
0 | 0xffd8 SOI
2 | 0xffe0 APP0 | 33 | AVI1.....x.x..................
37 | 0xffdb DQT | 67
106 | 0xffdb DQT | 67
175 | 0xffdd DRI | 4
181 | 0xffe0 APP0 | 4 | .
187 | 0xffc0 SOF0 | 17
206 | 0xffda SOS
$ exiv2 -pS Logitech-C270-002.jpg
STRUCTURE OF JPEG FILE: Logitech-C270-002.jpg
address | marker | length | data
0 | 0xffd8 SOI
2 | 0xffe0 APP0 | 33 | AVI1.....x.x..................
37 | 0xffdb DQT | 67
106 | 0xffdb DQT | 67
175 | 0xffdd DRI | 4
181 | 0xffe0 APP0 | 4 | .
187 | 0xffc0 SOF0 | 17
206 | 0xffda SOS
$ exiv2 -pS Logitech-C270-003.jpg
STRUCTURE OF JPEG FILE: Logitech-C270-003.jpg
address | marker | length | data
0 | 0xffd8 SOI
2 | 0xffe0 APP0 | 33 | AVI1.....x.x..................
37 | 0xffdb DQT | 67
106 | 0xffdb DQT | 67
175 | 0xffdd DRI | 4
181 | 0xffe0 APP0 | 4 | .
187 | 0xffc0 SOF0 | 17
206 | 0xffda SOS
...
If his assertion were true, every keyframe would have a delta frame attached to it, which makes no sense. Simultaneously encoding H.264 and MJPEG makes no sense either.
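To check the marker layout across many frames without running exiv2 on each one, a short script can walk the segments itself. This is only a sketch: `list_segments` is a name I made up, and it assumes every marker between SOI and SOS carries a length field, which holds for the frames listed above but not for every possible JPEG.

```python
import struct


def list_segments(data: bytes):
    """Return (offset, marker, length) tuples from SOI up to and including SOS.

    Assumption: every marker before SOS is followed by a 16-bit big-endian
    length that counts itself but not the marker.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments = [(0, 0xFFD8, None)]
    pos = 2
    while pos + 4 <= len(data):
        marker, length = struct.unpack_from(">HH", data, pos)
        if marker >> 8 != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        if marker == 0xFFDA:
            # SOS: entropy-coded data follows, so stop walking here.
            segments.append((pos, marker, None))
            break
        segments.append((pos, marker, length))
        pos += 2 + length
    return segments


# Hypothetical usage on the captured frames:
# for name in ("Logitech-C270-001.jpg", "Logitech-C270-002.jpg"):
#     with open(name, "rb") as f:
#         print(name, [(o, hex(m), n) for o, m, n in list_segments(f.read())])
```

If the layouts come out identical for every frame, as in the listings above, that supports the point that each frame carries the same APP0 segments.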
Inspecting the APP0 segment further shows that it is not a compliant JFIF file.
$ xxd Logitech-C270-003.h264
00000000: ffe0 0021 4156 4931 0001 0101 0078 0078 ...!AVI1.....x.x
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 00 ...
Let's break it down. FF E0 is the APP0 marker. 00 21 is the segment length (16-bit big-endian), which counts the length field itself but not the marker: 33 in decimal. The only significant anomaly is AVI1 (and a null byte) in place of JFIF (and a null byte), which is likely a FourCC. A quick inspection of the FFmpeg source code confirms it, as does this. I'm not sure why it is here; my best guess is that it indicates an MJPEG file. The next two bytes, 01 01, give the version, major and minor respectively, which translates to JFIF version 1.01. The next byte, 01, gives the density units (dots per inch). The next two bytes, 00 78 (repeated again), give a density of 120 DPI in each direction. The next two bytes, 00 00, give the thumbnail width and height respectively and indicate the absence of a thumbnail. The rest are perhaps padding null bytes.
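The breakdown above can be reproduced with a few lines of Python. This is a minimal sketch that reads the segment as if it followed the JFIF APP0 layout, which is my interpretation and not necessarily how the camera vendor defines it. The names `SEGMENT` and `parse_app0` are made up for illustration, and the bytes are copied from the hex dump.

```python
import struct

# The 35 bytes from the hex dump: marker, length, then 31 payload bytes.
SEGMENT = bytes.fromhex("ffe0 0021 4156 4931 0001 0101 0078 0078") + bytes(19)


def parse_app0(segment: bytes) -> dict:
    """Decode an APP0 segment, assuming JFIF-style field order."""
    marker, length = struct.unpack_from(">HH", segment, 0)
    if marker != 0xFFE0:
        raise ValueError("not an APP0 segment")
    # 5-byte identifier, version major/minor, density units,
    # X/Y density (16-bit each), thumbnail width/height (8-bit each).
    ident, major, minor, units, xdens, ydens, tw, th = struct.unpack_from(
        ">5sBBBHHBB", segment, 4
    )
    return {
        "length": length,
        "identifier": ident.rstrip(b"\0").decode("ascii"),
        "version": f"{major}.{minor:02d}",
        "units": units,
        "density": (xdens, ydens),
        "thumbnail": (tw, th),
        # Whatever follows the 14 JFIF-style bytes, up to the declared length.
        "trailing": segment[18 : 2 + length],
    }


info = parse_app0(SEGMENT)
print(info["identifier"], info["version"], info["density"], len(info["trailing"]))
```

Keep in mind that FFmpeg's debug log prints "polarity 0" for this segment, so it evidently reads the bytes after AVI1 differently from the JFIF interpretation assumed here.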
As per FFmpeg's MJPEG APPx decoding routine, the segment containing AVI1 pertains to some weird proprietary convention for storing extradata, rather than an H.264 frame.
The APP0 segment is not at all compliant. For reference, this is the APP0 segment of a compliant JFIF file encoded by FFmpeg.
$ ffmpeg -i Logitech-C270-003.jpg -bsf:v mjpeg2jpeg Logitech-C270-003-duplicate.jpg
...
[mjpeg @ 0x5d3e1cea4680] unable to decode APP fields: Invalid data found when processing input
Input #0, image2, from 'Logitech-C270-003.jpg':
Duration: 00:00:00.04, start: 0.000000, bitrate: 9521 kb/s
Stream #0:0: Video: mjpeg (Baseline), yuvj422p(pc, bt470bg/unknown/unknown), 1280x720, 25 fps, 25 tbr, 25 tbn
Stream mapping:
Stream #0:0 -> #0:0 (mjpeg (native) -> mjpeg (native))
Press [q] to stop, [?] for help
[mjpeg @ 0x5d3e1ceab600] unable to decode APP fields: Invalid data found when processing input
Output #0, image2, to 'Logitech-C270-003-duplicate.jpg':
Metadata:
encoder : Lavf60.16.100
Stream #0:0: Video: mjpeg, yuvj422p(pc, bt470bg/unknown/unknown, progressive), 1280x720, q=2-31, 200 kb/s, 25 fps, 25 tbn
Metadata:
encoder : Lavc60.31.102 mjpeg
Side data:
cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A
[image2 @ 0x5d3e1ceab9c0] The specified filename 'Logitech-C270-003-duplicate.jpg' does not contain an image sequence pattern or a pattern is invalid.
[image2 @ 0x5d3e1ceab9c0] Use a pattern such as %03d for an image sequence or use the -update option (with -frames:v 1 if needed) to write a single image.
[out#0/image2 @ 0x5d3e1cea64c0] video:40kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
frame= 1 fps=0.0 q=5.8 Lsize=N/A time=00:00:00.00 bitrate=N/A speed= 0x
$ exiv2 -pS Logitech-C270-003-duplicate.jpg
STRUCTURE OF JPEG FILE: Logitech-C270-003-duplicate.jpg
address | marker | length | data
0 | 0xffd8 SOI
2 | 0xffe0 APP0 | 16 | JFIF.........
20 | 0xffc4 DHT | 418
440 | 0xfffe COM | 16 | Lavc60.31.102
458 | 0xffdb DQT | 67
527 | 0xffc4 DHT | 159
688 | 0xffc0 SOF0 | 17
707 | 0xffda SOS
$ xxd -s 2 -l 18 Logitech-C270-003-duplicate.jpg
00000002: ffe0 0010 4a46 4946 0001 0100 0000 0000 ....JFIF........
00000012: 0000 ..
Interestingly, FFmpeg has no problem with the first APP0 segment, which appeared to be the cause of the error. The culprit is actually the second segment, as running FFmpeg with -loglevel debug shows.
[AVFormatContext @ 0x5c53e38f3480] Opening 'Logitech-C270-003.jpg' for reading
[file @ 0x5c53e38f3b40] Setting default whitelist 'file,crypto,data'
[image2 @ 0x5c53e38f3480] Format image2 probed with size=2048 and score=50
[image2 @ 0x5c53e38f3480] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0 nb_streams:1
[mjpeg @ 0x5c53e38f4680] marker=d8 avail_size_in_buf=47603
[mjpeg @ 0x5c53e38f4680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x5c53e38f4680] marker=e0 avail_size_in_buf=47601
[mjpeg @ 0x5c53e38f4680] polarity 0
[mjpeg @ 0x5c53e38f4680] marker parser used 32 bytes (256 bits)
[mjpeg @ 0x5c53e38f4680] marker=db avail_size_in_buf=47566
[mjpeg @ 0x5c53e38f4680] index=0
[mjpeg @ 0x5c53e38f4680] qscale[0]: 6
[mjpeg @ 0x5c53e38f4680] marker parser used 67 bytes (536 bits)
[mjpeg @ 0x5c53e38f4680] marker=db avail_size_in_buf=47497
[mjpeg @ 0x5c53e38f4680] index=1
[mjpeg @ 0x5c53e38f4680] qscale[1]: 13
[mjpeg @ 0x5c53e38f4680] marker parser used 67 bytes (536 bits)
[mjpeg @ 0x5c53e38f4680] marker=dd avail_size_in_buf=47428
[mjpeg @ 0x5c53e38f4680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x5c53e38f4680] marker=e0 avail_size_in_buf=47422
[mjpeg @ 0x5c53e38f4680] unable to decode APP fields: Invalid data found when processing input
[mjpeg @ 0x5c53e38f4680] marker parser used 2 bytes (16 bits)
[mjpeg @ 0x5c53e38f4680] marker=c0 avail_size_in_buf=47416
[mjpeg @ 0x5c53e38f4680] Changing bps from 0 to 8
[mjpeg @ 0x5c53e38f4680] sof0: picture: 1280x720
[mjpeg @ 0x5c53e38f4680] component 0 2:1 id: 1 quant:0
[mjpeg @ 0x5c53e38f4680] component 1 1:1 id: 2 quant:1
[mjpeg @ 0x5c53e38f4680] component 2 1:1 id: 3 quant:1
[mjpeg @ 0x5c53e38f4680] pix fmt id 21111100
[mjpeg @ 0x5c53e38f4680] Format yuvj422p chosen by get_format().
[mjpeg @ 0x5c53e38f4680] marker parser used 17 bytes (136 bits)
[mjpeg @ 0x5c53e38f4680] escaping removed 772 bytes
[mjpeg @ 0x5c53e38f4680] marker=da avail_size_in_buf=47397
[mjpeg @ 0x5c53e38f4680] marker parser used 46625 bytes (373000 bits)
[mjpeg @ 0x5c53e38f4680] marker=d3 avail_size_in_buf=754
[mjpeg @ 0x5c53e38f4680] restart marker: 3
[mjpeg @ 0x5c53e38f4680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x5c53e38f4680] marker=d4 avail_size_in_buf=714
[mjpeg @ 0x5c53e38f4680] restart marker: 4
[mjpeg @ 0x5c53e38f4680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x5c53e38f4680] marker=d5 avail_size_in_buf=673
[mjpeg @ 0x5c53e38f4680] restart marker: 5
[mjpeg @ 0x5c53e38f4680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x5c53e38f4680] marker=d6 avail_size_in_buf=630
[mjpeg @ 0x5c53e38f4680] restart marker: 6
[mjpeg @ 0x5c53e38f4680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x5c53e38f4680] marker=d9 avail_size_in_buf=577
[mjpeg @ 0x5c53e38f4680] decode frame unused 577 bytes
[image2 @ 0x5c53e38f3480] After avformat_find_stream_info() pos: 47605 bytes read:47605 seeks:0 frames:1
Notice these lines at the beginning.
[mjpeg @ 0x5c53e38f4680] marker=e0 avail_size_in_buf=47601
[mjpeg @ 0x5c53e38f4680] polarity 0
[mjpeg @ 0x5c53e38f4680] marker parser used 32 bytes (256 bits)
But at the second APP0 marker (the APP0 extension marker), it errors out.
[mjpeg @ 0x5c53e38f4680] marker=e0 avail_size_in_buf=47422
[mjpeg @ 0x5c53e38f4680] unable to decode APP fields: Invalid data found when processing input
[mjpeg @ 0x5c53e38f4680] marker parser used 2 bytes (16 bits)
This second segment isn't compliant at all, because it doesn't immediately follow the first APP0 and the required fields are missing.
$ xxd -s 181 -l 6 Logitech-C270-003.jpg
000000b5: ffe0 0004 0000 ......
I found that removing the second APP0 segment with a hex editor (this could be automated as well) makes the error go away.
[AVFormatContext @ 0x603be737e480] Opening 'Logitech-C270-003.jpg' for reading
[file @ 0x603be737eb40] Setting default whitelist 'file,crypto,data'
[image2 @ 0x603be737e480] Format image2 probed with size=2048 and score=50
[image2 @ 0x603be737e480] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0 nb_streams:1
[mjpeg @ 0x603be737f680] marker=d8 avail_size_in_buf=47597
[mjpeg @ 0x603be737f680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x603be737f680] marker=e0 avail_size_in_buf=47595
[mjpeg @ 0x603be737f680] polarity 0
[mjpeg @ 0x603be737f680] marker parser used 32 bytes (256 bits)
[mjpeg @ 0x603be737f680] marker=db avail_size_in_buf=47560
[mjpeg @ 0x603be737f680] index=0
[mjpeg @ 0x603be737f680] qscale[0]: 6
[mjpeg @ 0x603be737f680] marker parser used 67 bytes (536 bits)
[mjpeg @ 0x603be737f680] marker=db avail_size_in_buf=47491
[mjpeg @ 0x603be737f680] index=1
[mjpeg @ 0x603be737f680] qscale[1]: 13
[mjpeg @ 0x603be737f680] marker parser used 67 bytes (536 bits)
[mjpeg @ 0x603be737f680] marker=dd avail_size_in_buf=47422
[mjpeg @ 0x603be737f680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x603be737f680] marker=c0 avail_size_in_buf=47416
[mjpeg @ 0x603be737f680] Changing bps from 0 to 8
[mjpeg @ 0x603be737f680] sof0: picture: 1280x720
[mjpeg @ 0x603be737f680] component 0 2:1 id: 1 quant:0
[mjpeg @ 0x603be737f680] component 1 1:1 id: 2 quant:1
[mjpeg @ 0x603be737f680] component 2 1:1 id: 3 quant:1
[mjpeg @ 0x603be737f680] pix fmt id 21111100
[mjpeg @ 0x603be737f680] Format yuvj422p chosen by get_format().
[mjpeg @ 0x603be737f680] marker parser used 17 bytes (136 bits)
[mjpeg @ 0x603be737f680] escaping removed 772 bytes
[mjpeg @ 0x603be737f680] marker=da avail_size_in_buf=47397
[mjpeg @ 0x603be737f680] marker parser used 46625 bytes (373000 bits)
[mjpeg @ 0x603be737f680] marker=d3 avail_size_in_buf=754
[mjpeg @ 0x603be737f680] restart marker: 3
[mjpeg @ 0x603be737f680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x603be737f680] marker=d4 avail_size_in_buf=714
[mjpeg @ 0x603be737f680] restart marker: 4
[mjpeg @ 0x603be737f680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x603be737f680] marker=d5 avail_size_in_buf=673
[mjpeg @ 0x603be737f680] restart marker: 5
[mjpeg @ 0x603be737f680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x603be737f680] marker=d6 avail_size_in_buf=630
[mjpeg @ 0x603be737f680] restart marker: 6
[mjpeg @ 0x603be737f680] marker parser used 0 bytes (0 bits)
[mjpeg @ 0x603be737f680] marker=d9 avail_size_in_buf=577
[mjpeg @ 0x603be737f680] decode frame unused 577 bytes
[image2 @ 0x603be737e480] After avformat_find_stream_info() pos: 47599 bytes read:47599 seeks:0 frames:1
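The hex-editor step could be automated along these lines. This is a rough sketch, not a tested tool: `strip_short_app0` is a made-up name, and the rule it applies (drop any APP0 whose payload is shorter than a 5-byte identifier) is my own heuristic for catching the 4-byte segment, not something taken from the JFIF specification or FFmpeg.

```python
import struct


def strip_short_app0(data: bytes) -> bytes:
    """Return a copy of a JPEG with undersized APP0 segments removed.

    Assumption: every marker between SOI and SOS carries a 16-bit
    big-endian length that counts itself but not the marker.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        marker, length = struct.unpack_from(">HH", data, pos)
        if marker >> 8 != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        if marker == 0xFFDA:
            # SOS: the rest is entropy-coded data, copied untouched below.
            break
        end = pos + 2 + length
        # Heuristic: an APP0 payload under 5 bytes cannot hold an identifier.
        if not (marker == 0xFFE0 and length - 2 < 5):
            out += data[pos:end]
        pos = end
    out += data[pos:]
    return bytes(out)


# Hypothetical usage:
# with open("Logitech-C270-003.jpg", "rb") as f:
#     fixed = strip_short_app0(f.read())
# with open("Logitech-C270-003-fixed.jpg", "wb") as f:
#     f.write(fixed)
```

On a frame like the one above this would drop exactly the 6 bytes FF E0 00 04 00 00, which matches the size difference between the two debug logs (47605 versus 47599 bytes).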
Removing all APP0 segments works too. For reference, FFmpeg's MJPEG encoder doesn't produce any APP0 segments.
$ ffmpeg -i Logitech-C270-003.jpg Logitech-C270-003-duplicate.jpg
...
Input #0, image2, from 'Logitech-C270-003.jpg':
Duration: 00:00:00.04, start: 0.000000, bitrate: 9512 kb/s
Stream #0:0: Video: mjpeg (Baseline), yuvj422p(pc, bt470bg/unknown/unknown), 1280x720, 25 fps, 25 tbr, 25 tbn
File 'Logitech-C270-003-duplicate.jpg' already exists. Overwrite? [y/N] y
Stream mapping:
Stream #0:0 -> #0:0 (mjpeg (native) -> mjpeg (native))
Press [q] to stop, [?] for help
Output #0, image2, to 'Logitech-C270-003-duplicate.jpg':
Metadata:
encoder : Lavf60.16.100
Stream #0:0: Video: mjpeg, yuvj422p(pc, bt470bg/unknown/unknown, progressive), 1280x720, q=2-31, 200 kb/s, 25 fps, 25 tbn
Metadata:
encoder : Lavc60.31.102 mjpeg
Side data:
cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A
[image2 @ 0x5d6aa3c189c0] The specified filename 'Logitech-C270-003-duplicate.jpg' does not contain an image sequence pattern or a pattern is invalid.
[image2 @ 0x5d6aa3c189c0] Use a pattern such as %03d for an image sequence or use the -update option (with -frames:v 1 if needed) to write a single image.
[out#0/image2 @ 0x5d6aa3c134c0] video:40kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
frame= 1 fps=0.0 q=5.8 Lsize=N/A time=00:00:00.00 bitrate=N/A speed= 0x
$ exiv2 -pS Logitech-C270-003-duplicate.jpg
STRUCTURE OF JPEG FILE: Logitech-C270-003-duplicate.jpg
address | marker | length | data
0 | 0xffd8 SOI
2 | 0xfffe COM | 16 | Lavc60.31.102
20 | 0xffdb DQT | 67
89 | 0xffc4 DHT | 159
250 | 0xffc0 SOF0 | 17
269 | 0xffda SOS