Transcode HLS segments individually using FFmpeg

I am recording a continuous, live stream to a high-bitrate HLS stream. I then want to asynchronously transcode this to different formats/bitrates. I have this working, mostly, except audio artefacts are appearing between each segment (gaps and pops).

Here is an example ffmpeg command line:

ffmpeg -threads 1 -nostdin -loglevel verbose \
   -y -i input.ts -c:a libfdk_aac \
   -ac 2 -b:a 64k -vn output.ts

Inspecting an example sound file shows that there is a gap at the end of the audio:

[Waveform screenshot: end of the transcoded audio, showing the gap]

And the start of the file looks suspiciously attenuated (although this may not be an issue):

[Waveform screenshot: start of the transcoded audio]

My suspicion is that these artefacts appear because each segment is transcoded without the context of the stream as a whole.

Any ideas on how to convince FFmpeg to produce audio that will fit back into an HLS stream?

** UPDATE 1 **

Here are the start and end of the original segment. As you can see, the start still appears the same, but the end stops cleanly at 30s. I expect some degree of padding with lossy encoding, but there must be some way that HLS manages to do gapless playback (is this related to the iTunes method with custom metadata?)

[Waveform screenshots: start and end of the original segment]

** UPDATE 2 **

So, I converted both the original (128k AAC in an MPEG-TS container) and the transcoded version (64k AAC in an ADTS container) to WAV and put the two side by side. This is the result:

[Waveform screenshots: side-by-side comparison at the start and end]
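
(For reference, the WAV conversion was a plain decode with ffmpeg; the file names here are placeholders:)

ffmpeg -i original.ts original.wav
ffmpeg -i transcoded.aac transcoded.wav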

I'm not sure if this is representative of how a client will play it back, but it seems a bit odd that decoding the transcoded one introduces a gap at the start and makes the segment longer. Given they are both lossy encodings, I would have expected padding to be equally present in both (if at all).

** UPDATE 3 **

According to http://en.wikipedia.org/wiki/Gapless_playback, only a handful of encoders support gapless playback. For MP3, I've switched to LAME in ffmpeg, and the problem, so far, appears to be gone.
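
For reference, the change is essentially just a codec swap on the command above; a sketch (writing raw MP3 rather than a TS container here is my choice):

ffmpeg -threads 1 -nostdin -loglevel verbose \
   -y -i input.ts -c:a libmp3lame \
   -ac 2 -b:a 64k -vn output.mp3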

For AAC (see http://en.wikipedia.org/wiki/FAAC), I have tried libfaac (as opposed to libfdk_aac) and it also seems to produce gapless audio. However, the quality of libfaac isn't that great and I'd rather use libfdk_aac if possible.
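
The libfaac attempt was the same kind of swap (again a sketch; the output name is assumed):

ffmpeg -threads 1 -nostdin -loglevel verbose \
   -y -i input.ts -c:a libfaac \
   -ac 2 -b:a 64k -vn output.aac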

Badr answered 13/5, 2013 at 11:42 Comment(2)
And how does the waveform compare with the input file? – Crayton
Updated with original and compared waveforms. – Badr

This is more of a conceptual answer than a set of explicit tools, sorry, but it may be of some use in any case: it removes the problem of audio artifacts at the expense of more complexity in your processing layer.

My suggestion would be not to split your uncompressed input audio at all, but to produce a single contiguous compressed stream that you pipe into an audio proxy such as an Icecast2 server (or similar, if Icecast doesn't support AAC), and then do the split/recombine on the client side of the proxy using chunks of compressed audio.
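
The encoder side would then be one long-running process feeding the proxy. A minimal sketch, assuming MP3 and an invented mount and credentials; "input_source" stands in for however the live feed is captured, and newer ffmpeg builds can write to icecast:// directly (otherwise a source client such as ices fills the same role):

ffmpeg -nostdin -i input_source -vn -ac 2 -b:a 128k -c:a libmp3lame \
   -f mp3 icecast://source:hackme@localhost:8000/live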

So, the method here would be to connect to the proxy regularly (say, every 60 sec?) and collect a chunk of audio a little bit bigger than the period you are polling at (say, 75 sec worth?). This needs to be set up to run in parallel, since at some points there will be two clients running; it could even be run from cron if need be, or backgrounded from a shell script, as in the sketch below ...
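
A collection client along those lines might look like this (a sketch only; the URL, timings, and naming scheme are all invented):

#!/bin/sh
# Run every 60s (e.g. from cron). Each run captures ~75s of the
# compressed stream, so successive captures overlap by ~15s.
curl -s -m 75 http://localhost:8000/live \
    -o "chunk_$(date +%Y%m%d%H%M%S).mp3"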

Once that's working, you will have a series of chunks of audio that overlap a little; you'd then need to do some processing work to compare these and isolate the section of audio in the middle which is unique to each chunk ...
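
A naive sketch of that comparison, assuming the proxy output is byte-identical across overlapping captures (the file names and the ~15s overlap guess are hypothetical):

#!/bin/bash
a=chunk_0001.mp3
b=chunk_0002.mp3

# At 128 kbit/s, ~15s of overlap is roughly 240000 bytes; start a
# little above that and shrink until the tail of A equals the head of B.
n=250000
while [ "$n" -gt 0 ]; do
    if cmp -s <(tail -c "$n" "$a") <(head -c "$n" "$b"); then
        break  # the last n bytes of A repeat at the start of B
    fi
    n=$((n - 1))
done

# Keep only the part of B that is new. (This byte-by-byte scan is slow;
# a frame-aware search would do far better.)
tail -c +"$((n + 1))" "$b" > unique_0002.mp3
echo "overlap: $n bytes"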

Obviously this is a simplification, but assuming that the proxy does not add any metadata (i.e. ICY data or hinting), splitting up the audio this way should allow the processed chunks to be concatenated without any audio artifacts, since there is only one set of output for the original audio input; and comparing them will be a doddle, since you really don't care one whit about the format - it's just bytes at that point.
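
Recombining is then, in principle, plain byte concatenation of the unique parts (continuing the invented naming from the sketch above):

cat unique_0001.mp3 unique_0002.mp3 unique_0003.mp3 > recombined.mp3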

The benefit here is that you've disconnected the audio encoder from the client, so if you want to run some other process in parallel to transcode to different formats or bit rates, or to chunk the stream more aggressively for some other consumer, nothing changes on the encoder side of the proxy - you just add another client to the proxy using a tool chain similar to the above.

Jughead answered 27/5, 2013 at 5:32 Comment(3)
I like the idea of having a simple proxy that would buffer the audio data from the device; this would allow restarting of encoding without losing data, especially if it understood samples and could chunk the data on sample boundaries. – Badr
However, without solving the original issue, transcoding in 60s chunks will just introduce those issues at the boundary of the chunks; the artefacts appear to be the result of AAC encoding, so they would likely affect any whizzily merged audio files as well. – Badr
Probably ancient history by now, sorry, but this was why I suggested cutting the compressed audio at the frame boundary (which, admittedly, may not be exactly divisible where you want it, but won't be far off) ... now, if you take two disparate chunks of compressed audio and run them together you will still get artifacts, but not if they were originally contiguous. – Jughead
