Sync Audio/Video in MP4 using AutoGen FFmpeg library

I'm currently having problems making my audio and video streams stay synced.

These are the AVCodecContexts I'm using:

For Video:

AVCodec* videoCodec = ffmpeg.avcodec_find_encoder(AVCodecID.AV_CODEC_ID_H264);
AVCodecContext* videoCodecContext = ffmpeg.avcodec_alloc_context3(videoCodec);
videoCodecContext->bit_rate = 400000;
videoCodecContext->width = 1280;
videoCodecContext->height = 720;
videoCodecContext->gop_size = 12;
videoCodecContext->max_b_frames = 1;
videoCodecContext->pix_fmt = videoCodec->pix_fmts[0];
videoCodecContext->codec_id = videoCodec->id;
videoCodecContext->codec_type = videoCodec->type;
videoCodecContext->time_base = new AVRational
{
    num = 1,
    den = 30
};

For Audio:

AVCodec* audioCodec = ffmpeg.avcodec_find_encoder(AVCodecID.AV_CODEC_ID_AAC);
AVCodecContext* audioCodecContext = ffmpeg.avcodec_alloc_context3(audioCodec);
audioCodecContext->bit_rate = 1280000;
audioCodecContext->sample_rate = 48000;
audioCodecContext->channels = 2;
audioCodecContext->channel_layout = ffmpeg.AV_CH_LAYOUT_STEREO;
audioCodecContext->frame_size = 1024;
audioCodecContext->sample_fmt = audioCodec->sample_fmts[0];
audioCodecContext->profile = ffmpeg.FF_PROFILE_AAC_LOW;
audioCodecContext->codec_id = audioCodec->id;
audioCodecContext->codec_type = audioCodec->type;

When writing the video frames, I set up the PTS position as follows:

outputFrame->pts = frameIndex;  // The current index of the image frame being written

I then encode the frame using avcodec_encode_video2(). After this, I call the following to set up the timestamps:

ffmpeg.av_packet_rescale_ts(&packet, videoCodecContext->time_base, videoStream->time_base);

This plays perfectly.
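
For anyone following along, here is a rough sketch of the arithmetic behind that rescale call. It does not use FFmpeg at all: Rescale is a hypothetical plain-arithmetic stand-in for av_rescale_q, and the stream time base of 1/90000 is an assumed value for illustration only (the muxer chooses the real one).

```csharp
using System;

static class VideoTimestampSketch
{
    // Hypothetical stand-in for av_rescale_q: converts ts from a source time
    // base (srcNum/srcDen seconds per tick) to a destination time base,
    // rounding to nearest. Assumes non-negative ts and no overflow.
    public static long Rescale(long ts, int srcNum, int srcDen, int dstNum, int dstDen)
    {
        long numerator = ts * srcNum * dstDen;
        long denominator = (long)srcDen * dstNum;
        return (numerator + denominator / 2) / denominator;
    }

    static void Main()
    {
        // Codec time base 1/30: pts is simply the frame index.
        // Stream time base 1/90000 is an assumption, not taken from the question.
        foreach (long frameIndex in new long[] { 0, 1, 30 })
        {
            long streamPts = Rescale(frameIndex, 1, 30, 1, 90000);
            Console.WriteLine($"frame {frameIndex} -> stream pts {streamPts}");
        }
    }
}
```

With these assumed values, frame 30 maps to 90000 ticks, which is exactly one second in a 1/90000 time base. The frame index only means "time" once it is paired with the time base it was counted in.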

However, when I do the same for audio, the video plays in slow motion: the audio plays first, and then the video carries on afterwards with no sound.

I cannot find an example anywhere of how to set pts/dts positions for video/audio in an MP4 file. Any examples or help would be great!

Also, I'm writing the video frames first, after which (once they are all written) I write the audio. I've updated this question with the adjusted values suggested in the comments.

I've uploaded a test video to show my results here: http://www.filedropper.com/test_124

Tower answered 5/7, 2016 at 7:54 Comment(3)
Aristate: wrong tag, it must be c++
Tower: I'm using the AutoGen library, which uses Invoke in C# to access the libraries!
Wreckfish: I don't use the FFmpeg API, only the compiled .exe as a process (std in/out). Unfortunately I can't test your code but... Let's hope the advice in my answer can be useful to you in some way.

Solved the problem. I've added a new function to set the video/audio positions after setting the frame's PTS.

Video is just the usual increment (+1 for each frame), whereas audio is done as follows:

outputFrame->pts = ffmpeg.av_rescale_q(m_audioFrameSampleIncrement, new AVRational { num = 1, den = 48000 }, m_audioCodecContext->time_base);

m_audioFrameSampleIncrement += outputFrame->nb_samples;

After the frame is encoded, I call my new function:

private static void SetPacketProperties(ref AVPacket packet, AVCodecContext* codecContext, AVStream* stream)
{
    packet.pts = ffmpeg.av_rescale_q_rnd(packet.pts, codecContext->time_base, stream->time_base, AVRounding.AV_ROUND_NEAR_INF | AVRounding.AV_ROUND_PASS_MINMAX);
    packet.dts = ffmpeg.av_rescale_q_rnd(packet.dts, codecContext->time_base, stream->time_base, AVRounding.AV_ROUND_NEAR_INF | AVRounding.AV_ROUND_PASS_MINMAX);
    packet.duration = (int)ffmpeg.av_rescale_q(packet.duration, codecContext->time_base, stream->time_base);
    packet.stream_index = stream->index;
}
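
To illustrate why counting samples keeps the two streams aligned, here is a minimal sketch without any FFmpeg calls. It assumes the audio codec time base is 1/48000 (one tick per sample), which the code above does not state; ToSeconds and AudioPts are hypothetical helper names used only for this example.

```csharp
using System;

static class AudioTimestampSketch
{
    const int SampleRate = 48000; // matches sample_rate in the question
    const int FrameSize = 1024;   // samples per channel in one AAC frame

    // Seconds represented by a timestamp in a num/den time base.
    public static double ToSeconds(long ts, int num, int den)
    {
        return (double)ts * num / den;
    }

    // PTS in 1/SampleRate units of the audio frame with the given index:
    // the number of samples that precede it.
    public static long AudioPts(int audioFrameIndex)
    {
        return (long)audioFrameIndex * FrameSize;
    }

    static void Main()
    {
        long samplesWritten = 0; // plays the role of m_audioFrameSampleIncrement
        for (int i = 0; i < 3; i++)
        {
            double seconds = ToSeconds(samplesWritten, 1, SampleRate);
            Console.WriteLine($"audio frame {i}: pts {samplesWritten} = {seconds:F4} s");
            samplesWritten += FrameSize;
        }

        // Video frame 30 at 1/30 and audio sample 48000 at 1/48000
        // both represent the one-second mark.
        Console.WriteLine(ToSeconds(30, 1, 30) == ToSeconds(48000, 1, SampleRate));
    }
}
```

Under that assumption the audio PTS advances by 1024 per frame (0, 1024, 2048, ...), while the video PTS advances by 1 per frame in its own 1/30 time base. Both describe the same clock once each is rescaled to its stream time base.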
Tower answered 12/7, 2016 at 10:31 Comment(0)

PS: Check out this article/tutorial on A/V Sync with FFmpeg. It might help you if the below doesn't.

1) Regarding the video & audio timestamps...

Rather than using the current frameIndex as the timestamp and rescaling it later, just skip the rescale if possible.

The alternative would then be to make sure PTS values (in outputFrame->pts) are created correctly in the first place by using the video's frames-per-second (FPS). To do this...

For each video frame: outputFrame->pts = (1000 / FPS) * frameIndex;
(For a 30 FPS video, the first frame (frameIndex 0) has time 0, and by frame 30 the "clock" has reached 1 second. 1000 / 30 gives each video frame a presentation interval of 33.333 ms; when frameIndex is 30, 33.333 x 30 = 1000 ms, or 1 second, confirming 30 frames per second.)

For each audio frame: outputFrame->pts = ((1024 / 48000) * 1000) * frameIndex;
(Since a 48 kHz AAC frame has a duration of 21.333 ms, the timestamp increases by that amount for each frame. The formula is (1024 PCM samples / sample rate) x 1000 ms per second, multiplied by the frame index.)
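
One caveat if these formulas are typed with int operands in C#: integer division truncates, so 1000 / 30 becomes 33 and 1024 / 48000 becomes 0. The sketch below compares the divide-first form with a multiply-first form. All method names are hypothetical, and the results are milliseconds, so they only make sense as PTS values if the time base in use is 1/1000 (an assumption, not something the question sets).

```csharp
using System;

static class MillisecondPtsSketch
{
    // Divide-first forms, as the formulas read when typed with int operands.
    public static long NaiveVideoPtsMs(int frameIndex, int fps)
        => (1000 / fps) * frameIndex;

    public static long NaiveAudioPtsMs(int frameIndex, int samplesPerFrame, int sampleRate)
        => ((samplesPerFrame / sampleRate) * 1000) * frameIndex;

    // Multiply-first forms: only the final division truncates.
    public static long VideoPtsMs(long frameIndex, int fps)
        => frameIndex * 1000 / fps;

    public static long AudioPtsMs(long frameIndex, int samplesPerFrame, int sampleRate)
        => frameIndex * samplesPerFrame * 1000 / sampleRate;

    static void Main()
    {
        // Frame 30 of a 30 FPS video should sit at 1000 ms.
        Console.WriteLine($"video naive: {NaiveVideoPtsMs(30, 30)}, multiply-first: {VideoPtsMs(30, 30)}");

        // Audio frame 30 at 1024 samples per frame and 48000 Hz should sit at 640 ms.
        Console.WriteLine($"audio naive: {NaiveAudioPtsMs(30, 1024, 48000)}, multiply-first: {AudioPtsMs(30, 1024, 48000)}");
    }
}
```

The divide-first video form drifts (990 ms instead of 1000 ms at frame 30) and the divide-first audio form is always 0, so multiplying before dividing, or using floating point, is the safer way to write them.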

2) Regarding the audio settings...

Bit-rate :
audioCodecContext->bit_rate = 64000; seems odd if your sample_rate is 48000Hz (and I assume, your bit-depth is 16-bits per sample?).

Try either 96000 or 128000 as lowest starting values.

Frame Size :

int AVCodecContext::frame_size means "Number of samples per channel in an audio frame".

Considering the above quote from the docs: an AAC frame contains the data for both L/R channels, and each frame holds 1024 PCM samples per channel, so the frame size is 1024 regardless of the channel count.

Instead of audioCodecContext->frame_size = 88200; you could try audioCodecContext->frame_size = 1024;

Profile :
I noticed you've used MAIN for the AAC profile. I'm used to seeing Low Complexity in videos. I tried a few random MP4 files from various sources on my HDD and I cannot find one using the "Main" profile. As a last resort, testing "Low Complexity" won't hurt.

Try using audioCodecContext->profile = ffmpeg.FF_PROFILE_AAC_LOW;

PS: Check this for a possible AAC issue (depending on your FFmpeg version).

Wreckfish answered 6/7, 2016 at 18:1 Comment(5)
Wreckfish: Oops, had forgotten to make my point about a/v timestamps. Hope it helps.
Tower: This is a very useful answer. Need to try it and go from there.
Tower: outputFrame->pts = (1000 / FPS) * frameIndex (for video frames) causes a 19 second video to play back fast, in 1 second.
Wreckfish: Can you provide a temp link to a sample video file? I'll try to check the bytes (to find a fixing value). Try to use 44khz and 128 bitrate. FFmpeg won't make a file when I use your settings, but it auto-defaults to a working video using 44100 samplerate + 16-bit depth + 128kbps bit-rate.
Tower: Sorry for the late reply. I've uploaded the video here: filedropper.com/test_124
