How to determine if an audio track is a Dolby Pro Logic II mixdown
I'm trying to find out if there's a way to determine if an AAC-encoded audio track is encoded with Dolby Pro Logic II data. Is there a way of examining the file such that you can see this information? I have for example encoded a media file in Handbrake with (truncated to audio options) -E av_aac -B 320 --mixdown dpl2 and this is the audio track output that mediainfo shows:

Audio #1
ID                                       : 2
Format                                   : AAC
Format/Info                              : Advanced Audio Codec
Format profile                           : LC
Codec ID                                 : 40
Duration                                 : 2h 5mn
Bit rate mode                            : Variable
Bit rate                                 : 321 Kbps
Channel(s)                               : 2 channels
Channel positions                        : Front: L R
Sampling rate                            : 48.0 KHz
Compression mode                         : Lossy
Stream size                              : 288 MiB (3%)
Title                                    : Stereo / Stereo
Language                                 : English
Encoded date                             : UTC 2017-04-11 22:21:41
Tagged date                              : UTC 2017-04-11 22:21:41

but I can't tell if there's anything in this output that would suggest that it's encoded with DPL2 data.

Pneumatics answered 13/4, 2017 at 9:32 Comment(1)
"I have for example encoded a media file in Handbrake" Got a small sample file (seconds not 2 hrs long)? If it's mentioned in the bytes of output file then we can try advising how to retrieve such infoUnsavory
tl;dr: it's probably possible; it may be easier if you're a programmer.

Because the information encoded is just a stereo analog pair, there is no guaranteed way of detecting a Dolby Pro Logic II (DPL2) signal therein, unless you specifically store your own metadata saying "this is a DPL2 file." But you can probably make a pretty good guess.

All of the old analog Dolby Surround formats, including DPL2, store surround information in two channels by inverting the phase of the surround channel(s) and then mixing them into the original left and right channels. Dolby Surround family decoders, including DPL2, attempt to recover this information by inverting the phase of one of the two channels and then looking for similarities in the resulting signal pair. This is done either trivially, as in Dolby Surround, or with those similarities artificially biased so content is pushed much further to the left or right, or to the left or right surround, as in DPL2.
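To make the matrixing concrete, here is a toy sketch in Python. It uses the standard -3 dB mix coefficients, but unlike a real DPL2 encoder it applies no 90-degree phase shift to the surround, so it is an illustration of the sum/difference principle only, not of the actual Dolby matrix:

```python
import math

def encode_lt_rt(L, R, C, S):
    """Simplified matrix encode: the surround is mixed in anti-phase
    between the two transmitted channels (no 90-degree phase shift,
    unlike a real DPL2 encoder)."""
    g = 1 / math.sqrt(2)  # -3 dB
    Lt = [l + g * c - g * s for l, c, s in zip(L, C, S)]
    Rt = [r + g * c + g * s for r, c, s in zip(R, C, S)]
    return Lt, Rt

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

# Surround-only test signal: ten cycles of a 1 kHz tone at 48 kHz.
n = 480
tone = [math.sin(2 * math.pi * 1000 * i / 48000) for i in range(n)]
zero = [0.0] * n
Lt, Rt = encode_lt_rt(zero, zero, zero, tone)

# Passive decode: the sum recovers the center feed, the
# difference recovers the surround feed.
center_feed = [(a + b) / 2 for a, b in zip(Lt, Rt)]
surround_feed = [(a - b) / 2 for a, b in zip(Lt, Rt)]
```

With a surround-only input, the summed (center) feed cancels to silence while the difference feed carries the tone, which is exactly the asymmetry the detection method below exploits.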

So the trick is to detect whether important data is being stored in the surround channel(s). I'll sketch out for you a method that might work, and I'll try to express it without writing code, but it's up to you to implement and refine it to your liking.

  1. Crop the first N seconds or so of program content into a stereo file, where N is between one and thirty. Call this file Input.
  2. Mix down the Input stereo channels to a new mono file at -3dB per channel. Call this file Center.
  3. Split the left and right channels of Input into separate files. Call these Left and Right.
  4. Invert the right channel. Call this file RightInvert.
  5. Mix down the Left and RightInvert channels to a new mono file at -3dB per channel. Call this file Surround.
  6. Determine the RMS and peak dB of the Surround file.
  7. If the RMS or peak dB of the Surround file is below "a tolerance," stop; the original file is either mono or center-panned and hence contains no surround information. You'll have to experiment with several DPL2 and non-DPL2 sources to see what these tolerances are, but after a dozen or so files the numbers should become clear. I'm guessing around -30 dB or so.
  8. Invert the Center file into a new file. Call this file CenterInvert.
  9. Mix the CenterInvert file into the Surround file at 0 dB (both CenterInvert and Surround should be mono). Call this new file SurroundInvert.
  10. Determine the RMS and peak dB of the SurroundInvert file.
  11. If either the RMS or peak dB of SurroundInvert is below "a tolerance," stop; your original source contains panned left or right front information, not surround information. You'll have to experiment with several DPL2 and non-DPL2 sources to see what these tolerances are, but after a dozen or so files the numbers should become clear -- I'm guessing around -35 dB or so.
  12. If you've gotten this far, your original Input probably contains surround information, and hence is probably a member of the Dolby Surround family of encodings.
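The numbered steps above can be sketched in pure Python, operating on sample lists already decoded to floats rather than on files (the tolerance values are the guesses from steps 7 and 11, and the function names are mine, not anything standard):

```python
import math

def rms_db(x):
    """RMS level in dBFS (full scale = 1.0)."""
    r = math.sqrt(sum(v * v for v in x) / len(x))
    return 20 * math.log10(r) if r > 0 else float("-inf")

def peak_db(x):
    """Peak level in dBFS."""
    p = max(abs(v) for v in x)
    return 20 * math.log10(p) if p > 0 else float("-inf")

def probably_matrix_surround(left, right,
                             surround_tol_db=-30.0,  # step 7, a guess
                             pan_tol_db=-35.0):      # step 11, a guess
    g = 1 / math.sqrt(2)  # -3 dB per channel
    # Steps 2-5: Center is the L+R mixdown, Surround the L-R mixdown
    # (the right channel inverted before mixing).
    center = [g * (l + r) for l, r in zip(left, right)]
    surround = [g * (l - r) for l, r in zip(left, right)]
    # Step 7: nothing in the difference channel -> mono or center-panned.
    if rms_db(surround) < surround_tol_db or peak_db(surround) < surround_tol_db:
        return False
    # Steps 8-11: subtract the Center; if the remainder vanishes, the
    # difference energy was really just hard-panned front content.
    surround_invert = [s - c for s, c in zip(surround, center)]
    if rms_db(surround_invert) < pan_tol_db or peak_db(surround_invert) < pan_tol_db:
        return False
    # Step 12: the difference channel carries independent information,
    # so the source probably belongs to the Dolby Surround family.
    return True
```

A mono source fails at step 7, a hard-panned source fails at step 11, and an anti-phase (matrixed) source survives both checks.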

I've written this algorithm out such that you can do each of these steps with a specific command in sox. If you want to be fancier, instead of doing the RMS/peak value step in sox, you could run an ebur128 program and check your levels in LUFS against a tolerance. If you want to be even fancier, after you create the Surround and Center files, you could filter out all frequencies higher than 7kHz and do de-emphasis on them, just like a real DPL2 decoder would.

To keep this algorithm simple, I've sketched it out entirely in the amplitude domain. The calculation of the Surround file would probably be a lot more accurate in the frequency domain, if you know how to calculate the magnitude and angle of FFT bins and you use windows of 30 to 100 ms. But the cheapo version above should get you started.
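As a minimal illustration of that frequency-domain refinement, here is per-bin magnitude and angle from a naive DFT (a real implementation would use an FFT library over overlapping 30-100 ms windows; this O(n²) version is only for clarity):

```python
import cmath
import math

def dft_bins(window):
    """Naive O(n^2) DFT of a real window; returns a list of
    (magnitude, phase) pairs for bins 0 .. n/2."""
    n = len(window)
    bins = []
    for k in range(n // 2 + 1):
        acc = sum(window[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        bins.append((abs(acc), cmath.phase(acc)))
    return bins
```

With bins in hand for matching windows of the left and right channels, bins whose phases sit near 180 degrees apart are candidates for matrixed surround content, which is a sharper test than the broadband amplitude trick above.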

One last caution. AAC is a modern psychoacoustic codec, which means it likes to play games with stereo phasing and imaging to achieve its compression. So the mere act of encapsulating DPL2 into an AAC stream will likely hose some of the imaging present in DPL2. To be candid, neither DPL2 nor AAC belongs anywhere in this pipeline. If you must store an analog stream originally encoded with DPL2, do it in a lossless format like WAV or FLAC, not AAC.

As of this writing, operational concepts behind Dolby Pro Logic (I) are here. These basic concepts still apply to DPL2; operational concepts for DPL2 are here.

Nematic answered 30/4, 2017 at 21:5 Comment(9)
+1 from me, although this method would yield many false positives. You can get phase inversion from recording with badly placed microphones, for instance. Happens all the time. – Denman
Dalen's correct that crappy recordings can have phase inversion, but there's no cure for bad music; nor does phase inversion "happen all the time." – Nematic
No? What happens when an echo effect with stereo broadening bleeds out? Even a simple echo can sometimes look like an inverted signal if it's a long note. What about synthesized material that actually uses phase inversion? What about when simple M/S encoding is done? What about phase vocoders used for time stretching or pitch shifting, or in fact any filter using a big overlap? There is always a possibility of some inversion at window edges. You cannot avoid it all just to be able to say that something wasn't originally on the right front and so should go to the right rear speaker instead. – Denman
Also, just different instruments playing the same note, one on the left channel, one on the right, may show up in your algorithm although there is no actual inversion, because the harmonics need not produce a near-zero result when subtracted. And it doesn't necessarily sound crappy when it happens while recording. Usually does, though. – Denman
You will have to take care about the relaxation time after the examined window as well, to eliminate spurious detections, something like introducing a noise gate. Probably use windows of variable size too. – Denman
Thanks for this, great information presented in such a concise way; I should be able to work through this with sox and see what I can come up with. So do decoders and audio players go through this kind of analysis themselves in order to determine whether to play back to the surround speakers? Or do they have a chip or something licensed from Dolby that can determine it in a different way? – Pneumatics
They do something very similar, and they also guess based on frequency distribution. There are chips for that, and I do not know whether they have to be licensed from Dolby, as this is actually not Dolby. – Denman
The Dolby Surround tech, and variations thereof, are sufficiently old that they may be (though IANAL) public domain. Additionally, they were sufficiently simple that they could be implemented cheaply in hardware by consumer hardware manufacturers of the 1980s. Anyway, back in the day, people knew that Dolby Surround was patented, so they danced around it by saying their encoders and decoders were "surround compatible." I don't know whether there were any decoders that tried to automatically detect a surround signal. The ones I remember waited for you to push a Dolby button first. – Nematic
Is there an app I can use that will do this for me? I want to verify the type of Dolby Surround or Pro Logic downmix in the recording if possible, so I know what mode to set my receiver. – Parette
If the file has more than one channel, you can assume with some certainty that they are used for surround purposes, although they could just be multiple tracks. In that case it falls to the playing system to do with the channels as it "thinks" best (if the file header doesn't say what to do).

But your file is stereo. If you want to know whether it is a virtual surround file, you can look in the header for an encoder field to see which encoder was used. This may help somewhat, although not much. Usually the encoder field is left empty, and in any case the encoder need not be the same tool that mixed down the surround data: the mixer first creates raw PCM data, then feeds it to some encoder to produce the compressed file (AAC or whatever). Also, there are many applications, and as versions vary, so might the encoder field, so tracking all of them would be nasty work.

However, you can, with over 60% certainty, deduce whether something is virtual surround by examining the data. This would be advanced DSP and, for speed, machine learning might even be involved. You would have to find out whether the stereo signals contain certain features of the HRTF (head-related transfer function). This may be achieved by examining intensity-difference and delay features between the same sounds appearing in the time domain, and harmonic features (characteristic frequency changes) in the frequency domain. You would have to do both, because one without the other may just tell you that something is a very good stereo recording, not virtual surround. I don't know whether HRTF-specific features are mapped somewhere already, or whether you would need to do it yourself.
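As a toy illustration of the intensity-difference and delay features mentioned above (nowhere near a full HRTF analysis, where both cues are frequency-dependent): interchannel level difference in dB, plus a brute-force cross-correlation delay estimate. Function names and the lag range are my own choices:

```python
import math

def level_difference_db(left, right):
    """Interchannel intensity difference in dB (positive = left louder)."""
    def rms(x):
        return math.sqrt(sum(v * v for v in x) / len(x))
    rl, rr = rms(left), rms(right)
    if rl == 0 or rr == 0:
        return float("inf") if rl > rr else float("-inf")
    return 20 * math.log10(rl / rr)

def delay_samples(left, right, max_lag=48):
    """Brute-force cross-correlation: returns the lag (in samples)
    at which right best matches left; positive means right lags
    behind left. max_lag=48 is ~1 ms at 48 kHz."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = sum(left[i - lag] * right[i]
                   for i in range(max_lag, len(left) - max_lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return best_lag
```

Run over short windows, these two features give the crude time-domain cues; the frequency-domain (harmonic) features would still need to be extracted separately, as the answer says.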

It's a very complicated solution that takes a lot of time to build properly. Its performance would also be problematic.

With this method you can also break the stereo mixdown back into nearly the original surround channels. But for stereo-to-surround conversion other methods are used, and they sound good.

If you are determined to attempt such detection, dedicate half a year or more of hard work if no HRTF features are mapped, a few weeks if they are, brace yourself for big stress, and I wish you luck. I have done something similar. It is a killer.

If you want an out-of-the-box solution, then the answer to your question is no, unless the header provides you with an encoder field and the encoder is distinctive and known to be used only for surround-to-stereo conversion. I do not think anyone has done this from the actual data as I described, or if they have, it is part of a commercial product. Doing what you want is not usually needed, but it can be done.

Ow, BTW, try googling HRTF inversion, it might give some help.

Denman answered 30/4, 2017 at 13:27 Comment(3)
One, Dolby PL2 has no header; two, Dolby PL2 has nothing to do with HRTF except in the broadest possible sense; three, saying that solving the problem would be "advanced DSP" and "artificial intelligence would be involved" is neither an answer nor correct. – Nematic
One: we are talking about whether a stereo file was once Dolby or not, and an AAC container most certainly has a header. Two: yes, I would use DSP on the data to extract features from the signals, hoping to get those of a surround (which ARE most similar to changes induced by the HRTF). I would feed those features to a classifier like a NN to see whether the signals belong to a surround mix or not. – Denman
If AI was too broad for anyone, I am changing AI to machine learning. In the method I propose, I'd use it. And I am not a big supporter. In fact, I would use both my method and yours to achieve 90% certainty or more. – Denman
