I am trying to show H.264-encoded RTSP video on an Android device. The stream is coming from a Raspberry Pi, using vlc to encode /dev/video1, which is a "Pi NoIR Camera Board":
vlc-wrapper -vvv v4l2:///dev/video1 --v4l2-width $WIDTH --v4l2-height $HEIGHT --v4l2-fps ${FPS}.0 --v4l2-chroma h264 --no-audio --no-osd --sout "#rtp{sdp=rtsp://:8000/pi.sdp}" :demux=h264 > /tmp/vlc-wrapper.log 2>&1
I am using very minimal Android code right now:
final MediaPlayer mediaPlayer = new MediaPlayer();
mediaPlayer.setDisplay(holder);
try {
    mediaPlayer.setDataSource(url);
    mediaPlayer.prepare();
} catch (IOException e) {
    // The "Prepare failed.: status=0x1" exception lands here.
}
and getting a "Prepare failed.: status=0x1" IOException. When I look at the logs, I see lines like
06-02 16:28:05.566 W/APacketSource( 316): Format:video 0 RTP/AVP 96 / MIME-Type:H264/90000
06-02 16:28:05.566 W/MyHandler( 316): Unsupported format. Ignoring track #1.
06-02 16:28:05.566 I/MyHandler( 316): SETUP(1) completed with result -1010 (Unknown error 1010)
coming from a system process. Grepping for these messages points to the libstagefright/rtsp sources, and seems to mean that the ASessionDescription::getDimensions call in the APacketSource::APacketSource constructor is failing. This doesn't seem like it should be happening, because VLC certainly knows what dimensions to output:
[0x1c993a8] v4l2 demux debug: trying specified size 800x600
[0x1c993a8] v4l2 demux debug: Driver requires at most 262144 bytes to store a complete image
[0x1c993a8] v4l2 demux debug: Interlacing setting: progressive
[0x1c993a8] v4l2 demux debug: added new video es h264 800x600
What seems to be happening is that ASessionDescription::getDimensions is looking for a framesize attribute in the (seemingly well-formed) DESCRIBE results:
06-02 16:28:05.566 I/MyHandler( 316): DESCRIBE completed with result 0 (Success)
06-02 16:28:05.566 I/ASessionDescription( 316): v=0
06-02 16:28:05.566 I/ASessionDescription( 316): o=- 15508012299902503225 15508012299902503225 IN IP4 pimple
06-02 16:28:05.566 I/ASessionDescription( 316): s=Unnamed
06-02 16:28:05.566 I/ASessionDescription( 316): i=N/A
06-02 16:28:05.566 I/ASessionDescription( 316): c=IN IP4 0.0.0.0
06-02 16:28:05.566 I/ASessionDescription( 316): t=0 0
06-02 16:28:05.566 I/ASessionDescription( 316): a=tool:vlc 2.0.3
06-02 16:28:05.566 I/ASessionDescription( 316): a=recvonly
06-02 16:28:05.566 I/ASessionDescription( 316): a=type:broadcast
06-02 16:28:05.566 I/ASessionDescription( 316): a=charset:UTF-8
06-02 16:28:05.566 I/ASessionDescription( 316): a=control:rtsp://192.168.1.35:8000/pi.sdp
06-02 16:28:05.566 I/ASessionDescription( 316): m=video 0 RTP/AVP 96
06-02 16:28:05.566 I/ASessionDescription( 316): b=RR:0
06-02 16:28:05.566 I/ASessionDescription( 316): a=rtpmap:96 H264/90000
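To make the suspected check concrete, here is a minimal Java sketch of scanning an SDP body for a framesize attribute. It assumes the attribute has the form `a=framesize:<payload type> <width>-<height>`, which is my reading of what getDimensions wants; the class and method names are hypothetical and this is not the Stagefright code itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SdpFramesize {
    // Assumed syntax: "a=framesize:96 800-600" (payload type, width, height).
    private static final Pattern FRAMESIZE =
            Pattern.compile("a=framesize:(\\d+)\\s+(\\d+)-(\\d+)");

    /** Returns {width, height} for the given payload type, or null if absent. */
    static int[] findDimensions(String sdp, int payloadType) {
        for (String line : sdp.split("\\r?\\n")) {
            Matcher m = FRAMESIZE.matcher(line.trim());
            if (m.matches() && Integer.parseInt(m.group(1)) == payloadType) {
                return new int[] {
                    Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))
                };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The media section from the DESCRIBE result above, with no framesize line.
        String sdp = "m=video 0 RTP/AVP 96\nb=RR:0\na=rtpmap:96 H264/90000\n";
        int[] dims = findDimensions(sdp, 96);
        System.out.println(dims == null ? "no framesize attribute" : dims[0] + "x" + dims[1]);
    }
}
```

Run against the SDP that VLC returns here, a check like this finds nothing for payload type 96, which would be consistent with the "Unsupported format" log line if the reading of the sources is right.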
This looks like it may be a Stagefright bug: It knows (or should know) that it has an H.264 encoded stream, yet it seems to be expecting an H.263 framesize attribute. Hence my questions:
- Am I reading the sources right? Is the problem in the ASessionDescription::getDimensions call? (Does stagefright only actually support H.263 streaming?)
- Or is the Pi-side code wrong in some way?
- Or am I just missing a key step or two in my client-side code?
Update, 20140606:
The MediaPlayer docs say that -1010 is MEDIA_ERROR_UNSUPPORTED: "Bitstream is conforming to the related coding standard or file spec, but the media framework does not support the feature." This makes me wonder if the problem is the 'standard' progressive download issue. That is, Supported Media Formats says
"For video content that is streamed over HTTP or RTSP [in a] MPEG-4 [container] the moov atom must precede any mdat atoms, but must succeed the ftyp atom"
while most streams put the moov atom last.
I am not at all sure how to verify this, though!
- I see no moov or ftyp atoms in the vlc source. (I am told that vlc is just streaming, here; that the actual H264 content is coming out of the camera driver.)
- I see no moov or ftyp atoms in the https://github.com/raspberrypi linux or userland branches. (Maybe I'm just grepping for the wrong things, though.)
- When I have vlc save the stream, I get an mp4 file with moov before mdat, but of course vlc could be doing some transcoding, here.
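One way to check the atom order in a saved mp4 file is to walk its top-level boxes. The following is a minimal Java sketch under the usual MP4 layout (a 32-bit big-endian size followed by a four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning "to end of file"). The class and method names are hypothetical, and it reads the whole file into memory, so it only suits small captures.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class Mp4BoxOrder {
    /** Lists the four-character types of the top-level boxes, in file order. */
    static List<String> topLevelBoxes(byte[] file) {
        List<String> types = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(file); // big-endian by default
        while (buf.remaining() >= 8) {
            int start = buf.position();
            long size = buf.getInt() & 0xFFFFFFFFL; // 32-bit box size
            byte[] type = new byte[4];
            buf.get(type);
            types.add(new String(type, StandardCharsets.US_ASCII));
            long headerLen = 8;
            if (size == 1) { // a 64-bit size follows the type
                if (buf.remaining() < 8) break;
                size = buf.getLong();
                headerLen = 16;
            } else if (size == 0) { // box runs to the end of the file
                break;
            }
            long next = start + size;
            if (size < headerLen || next > file.length) break; // truncated or corrupt
            buf.position((int) next);
        }
        return types;
    }

    /** True if a moov box exists and comes before the first mdat box (if any). */
    static boolean moovBeforeMdat(List<String> types) {
        int moov = types.indexOf("moov");
        int mdat = types.indexOf("mdat");
        return moov >= 0 && (mdat < 0 || moov < mdat);
    }

    public static void main(String[] args) throws IOException {
        List<String> types = topLevelBoxes(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(types + " moov before mdat: " + moovBeforeMdat(types));
    }
}
```

This only says something about the file vlc writes to disk. It cannot show whether any MP4 container is involved in the RTSP stream at all.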
Update, 20140610:
The GPAC "Osmo4" player can display the stream on an Android 4.3 tablet. Badly (more lag than VLC on a laptop, and prone to lockups) but it can display it.
Update, 20140616:
When I tried grepping the VLC sources again (case-insensitive and without word-orientation, this time) I did find the FOURCC macros defining the moov and ftyp atoms in modules/mux/mp4.c, which quickly led to the --sout-mp4-faststart (and --no-sout-mp4-faststart) switches ... which don't make any difference.
So, it looks like it may actually not be an atom-ordering issue. That's good to know, if it closes off a whole class of dead-ends, but it does leave me banging my head against the wall (which always seems to do more damage to my head than to the wall) without a clue.
Update, 20140702:
I compiled VLC for Android, and it can display the stream generated by VLC on the pi. It puts the image in the top-left of the screen; I tried writing my own skin for their .so, and couldn't find any 'knobs' that would let me zoom-to-surface or whatever. (Plus the .apk came to about 12M!)
So, I found the relevant RFCs and wrote my own RTSP client. Or tried to: I can parse the SDP and generate enough valid RTSP to get RTP and RTCP datagrams, and I can parse the RTP and RTCP headers. But even though the SDP claims to deliver m=video 0 RTP/AVP 96 and a=rtpmap:96 H264/90000, the MediaCodec won't display video on my surface, no matter which of the three H264 codecs on my tablet I pass to MediaCodec.createByCodecName(), and when I look at the RTP payloads, I'm not too surprised: I don't see the NAL sync pattern anywhere in any of the packets.
Instead, they all start with either 21 9A __ 22 FF (usually) or occasionally 3C 81 9A __ 22 FF, where the __ seems to always be an even number that goes up by 2 each packet. I don't recognize this pattern - do you?
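For reference, the RTP header parsing mentioned above can be sketched roughly as follows. This is a minimal Java sketch of the fixed 12-byte header from RFC 3550, plus skipping of any CSRC list and header extension to find where the payload starts. The class and field names are hypothetical, and padding is not handled.

```java
final class RtpHeader {
    final int version, payloadType, sequenceNumber, payloadOffset;
    final boolean marker;
    final long timestamp, ssrc;

    private RtpHeader(int version, boolean marker, int payloadType, int sequenceNumber,
                      long timestamp, long ssrc, int payloadOffset) {
        this.version = version;
        this.marker = marker;
        this.payloadType = payloadType;
        this.sequenceNumber = sequenceNumber;
        this.timestamp = timestamp;
        this.ssrc = ssrc;
        this.payloadOffset = payloadOffset;
    }

    private static long u32(byte[] p, int i) {
        return ((p[i] & 0xFFL) << 24) | ((p[i + 1] & 0xFFL) << 16)
             | ((p[i + 2] & 0xFFL) << 8) | (p[i + 3] & 0xFFL);
    }

    /** Parses the header of one RTP datagram of the given length. */
    static RtpHeader parse(byte[] p, int length) {
        if (length < 12) throw new IllegalArgumentException("shorter than the fixed RTP header");
        int version = (p[0] >> 6) & 0x03;
        boolean extension = (p[0] & 0x10) != 0;
        int csrcCount = p[0] & 0x0F;
        boolean marker = (p[1] & 0x80) != 0;
        int payloadType = p[1] & 0x7F;
        int seq = ((p[2] & 0xFF) << 8) | (p[3] & 0xFF);
        int offset = 12 + 4 * csrcCount; // skip the CSRC identifiers
        if (extension) {
            if (length < offset + 4) throw new IllegalArgumentException("truncated extension");
            int words = ((p[offset + 2] & 0xFF) << 8) | (p[offset + 3] & 0xFF);
            offset += 4 + 4 * words; // extension header plus its 32-bit words
        }
        if (offset > length) throw new IllegalArgumentException("header longer than packet");
        return new RtpHeader(version, marker, payloadType, seq, u32(p, 4), u32(p, 8), offset);
    }

    public static void main(String[] args) {
        byte[] pkt = {(byte) 0x80, (byte) 0x60, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0x21, (byte) 0x9A};
        RtpHeader h = parse(pkt, pkt.length);
        System.out.println("pt=" + h.payloadType + " seq=" + h.sequenceNumber
                + " payload starts at " + h.payloadOffset);
    }
}
```

The byte at `payloadOffset` is where the patterns quoted above (21 9A ... or 3C 81 9A ...) begin.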
Update, 20140711:
Turns out that H264 packets don't have to start with the NAL sync pattern - that's only necessary where NAL Units may be embedded in a larger data stream. My RTP packets are in RFC 6184 format.
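Read as RFC 6184 payloads, the leading bytes above make sense: the low five bits of the first byte are the NAL unit type, so 0x21 is type 1 (a non-IDR slice sent as a single NAL unit) and 0x3C is type 28 (an FU-A fragment), with the following 0x81 being an FU header that has the start bit set and carries type 1. Below is a minimal Java sketch of turning such payloads back into NAL units with a 00 00 00 01 start code in front, which is the form I believe most decoders want. The class name is hypothetical, and the sketch assumes in-order delivery with no loss and ignores aggregation packets such as STAP-A.

```java
import java.io.ByteArrayOutputStream;

final class H264Depacketizer {
    private static final byte[] START_CODE = {0, 0, 0, 1};
    private final ByteArrayOutputStream fragment = new ByteArrayOutputStream();
    private boolean inFragment = false;

    /** NAL unit type: the low five bits of the first payload byte. */
    static int nalType(byte first) {
        return first & 0x1F;
    }

    /**
     * Feeds one RTP payload. Returns a complete start-code-prefixed NAL unit,
     * or null if more fragments are needed or the packet type is not handled.
     */
    byte[] push(byte[] payload, int offset, int length) {
        if (length < 1) return null;
        int type = nalType(payload[offset]);
        if (type >= 1 && type <= 23) { // single NAL unit packet
            byte[] out = new byte[4 + length];
            System.arraycopy(START_CODE, 0, out, 0, 4);
            System.arraycopy(payload, offset, out, 4, length);
            return out;
        }
        if (type == 28 && length >= 2) { // FU-A fragment
            int fuHeader = payload[offset + 1] & 0xFF;
            boolean start = (fuHeader & 0x80) != 0;
            boolean end = (fuHeader & 0x40) != 0;
            if (start) {
                fragment.reset();
                fragment.write(START_CODE, 0, 4);
                // Rebuild the NAL header: F and NRI bits from the FU indicator,
                // the real NAL type from the FU header.
                fragment.write((payload[offset] & 0xE0) | (fuHeader & 0x1F));
                inFragment = true;
            }
            if (!inFragment) return null; // never saw the start fragment
            fragment.write(payload, offset + 2, length - 2);
            if (end) {
                inFragment = false;
                return fragment.toByteArray();
            }
        }
        return null; // STAP-A and other types are not handled in this sketch
    }

    public static void main(String[] args) {
        System.out.println("0x21 -> type " + nalType((byte) 0x21));
        System.out.println("0x3C -> type " + nalType((byte) 0x3C));
    }
}
```

A caller would pass each RTP payload (the bytes after the RTP header) to `push` and hand every non-null result to the decoder. Whether a particular MediaCodec decoder also needs the SPS and PPS up front is a separate question that this sketch does not address.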