How do you do bicubic (or other non-linear) interpolation of re-sampled audio data?
Asked Answered
L

6

22

I'm writing some code that plays back WAV files at different speeds, so that the wave is either slower and lower-pitched, or faster and higher-pitched. I'm currently using simple linear interpolation, like so:

            int newlength = (int)Math.Round(rawdata.Length * lengthMultiplier);
            float[] output = new float[newlength];

            for (int i = 0; i < newlength; i++)
            {
                float realPos = i / lengthMultiplier;
                int iLow = (int)realPos;
                int iHigh = iLow + 1;
                float remainder = realPos - (float)iLow;

                float lowval = 0;
                float highval = 0;
                if ((iLow >= 0) && (iLow < rawdata.Length))
                {
                    lowval = rawdata[iLow];
                }
                if ((iHigh >= 0) && (iHigh < rawdata.Length))
                {
                    highval = rawdata[iHigh];
                }

                output[i] = (highval * remainder) + (lowval * (1 - remainder));
            }

This works fine, but it tends to sound OK only when I lower the frequency of the playback (i.e. slow it down). If I raise the pitch on playback, this method tends to produce high-frequency artifacts, presumably because of the loss of sample information.

I know that bicubic and other interpolation methods resample using more than just the two nearest sample values as in my code example, but I can't find any good code samples (C# preferably) that I could plug in to replace my linear interpolation method here.

Does anyone know of any good examples, or can anyone write a simple bicubic interpolation method? I'll bounty this if I have to. :)

Update: here are a couple of C# implementations of interpolation methods (thanks to Donnie DeBoer for the first one and nosredna for the second):

    public static float InterpolateCubic(float x0, float x1, float x2, float x3, float t)
    {
        float a0, a1, a2, a3;
        a0 = x3 - x2 - x0 + x1;
        a1 = x0 - x1 - a0;
        a2 = x2 - x0;
        a3 = x1;
        return (a0 * (t * t * t)) + (a1 * (t * t)) + (a2 * t) + (a3);
    }

    public static float InterpolateHermite4pt3oX(float x0, float x1, float x2, float x3, float t)
    {
        float c0 = x1;
        float c1 = .5F * (x2 - x0);
        float c2 = x0 - (2.5F * x1) + (2 * x2) - (.5F * x3);
        float c3 = (.5F * (x3 - x0)) + (1.5F * (x1 - x2));
        return (((((c3 * t) + c2) * t) + c1) * t) + c0;
    }

In these functions, x1 is the sample value ahead of the point you're trying to estimate and x2 is the sample value after your point. x0 is left of x1, and x3 is right of x2. t goes from 0 to 1 and is the distance between the point you're estimating and the x1 point.

The Hermite method seems to work pretty well, and appears to reduce the noise somewhat. More importantly it seems to sound better when the wave is sped up.

Lupe answered 14/7, 2009 at 14:15 Comment(8)
Isn't bicubic what you do to a 2D signal (ie image)? Surely you mean cubic for a 1D (ie audio) signal? I may be wrong though ...Delve
@Goz: it might just be "cubic" for 1D audio (I dunno). Part of my problem is that all the code samples I've seen are for 2D graphics, and I just need the one D.Lupe
@MusiGenesis, yeah you DON'T want to go with a solution for graphics. The ear is very picky. You want to use a solution that gives you a large signal-to-noise ratio. See my answer below.Seedy
I just reread and noticed that you were going to offer a bounty. Man, why did I answer so quickly?Seedy
I haven't picked an answer yet, so the bounty may still come up. I may put my entire rep up.Lupe
I don't think you should put your entire rep up. The best you can do to take an existing sample and create an oversampled version is to use a curve that adds as little noise to your original sample as possible. You can't bring back any missing information. Therefore, you're unlikely to do any better than the wonderful Elephant formulae. However, you could ask the question, "Why does my audio sound 'cheap' when I raise the frequency of it?"Seedy
I've been asking that question for years. :)Lupe
Many instruments have characteristic sounds that are screwed up when you raise their pitch. The human voice is one. Male and female voices have formants. If you just raise the pitch, it can sound cheap or goofy. But if you raise the pitch and keep the formants, it can sound better. Some synths have formant filter sections.Seedy
S
25

My favorite resource for audio interpolating (especially in resampling applications) is Olli Niemitalo's "Elephant" paper.

I've used a couple of these and they sound terrific (much better than a straight cubic solution, which is relatively noisy). There are spline forms, Hermite forms, Watte, parabolic, etc. And they are discussed from an audio point-of-view. This is not just your typical naive polynomial fitting.

And code is included!

To decide which to use, you probably want to start with the table on page 60 which groups the algorithms into operator complexity (how many multiplies, and how many adds). Then choose among the best signal-to-noise solutions--use your ears as a guide to make the final choice. Note: Generally, the higher SNR, the better.

Seedy answered 14/7, 2009 at 15:26 Comment(25)
Great link, again. I'm digging into it now. I just implemented the cubic interpolation from another answer, and it sounded terrible.Lupe
If you use any of these with a SNR over, say, 60, you should not be able to hear artifacts. If you do, you've probably made a mistake implementing. And don't underestimate the ease with which you can screw up which point is which and what your "between" value is. I screwed up my first try. You may have even messed up the cubic. It helps to graph sections of your input and output. :-)Seedy
Huh, like I evver make mistaks!Lupe
I think I implemented the cubic right - it was only a slight modification of my original code.Lupe
A bad result from the cubic would not surprise me. But it's incredibly easy to be off by one or get your "between value" at 1-x when it should be x, etc.Seedy
I might have missed this in the text, but what's the difference between an x-form implementation and a z-form implementation?Lupe
That's discussed on 35 and 36. If I recall correctly, they are equivalent in result, but one can be cheaper to calculate than the other depending on the specific interpolator. To you it should not make a difference. He does both for each equation and presents the cheaper solution.Seedy
Also, pardon my density here, but the linked article recommends these interpolators for oversampled data, but I'm working on pre-recorded WAV files which can't be oversampled (unless I'm misunderstanding the term?). The author says oversampling is left to the reader, so maybe I need to ask another question here.Lupe
It's been 3 or 4 years since I used one of these, though. So I'm highly likely to be wrong on any given point. The only thing I'm sure of is that the interpolators in the paper are better than anything else I ever found. I had a moment of panic when I searched for the paper, as it's not where i used to be. I got a 404 and almost feinted. Then I searched for the pdf's filename and finally found it. Scary.Seedy
The author mentions high-N oversampling with simple linear interpolation as a high-quality approach to this problem. This may be a better approach for my purposes, since I could do the oversampling once on the original audio, and then produce different playback speeds on the fly quickly with linear interpolation.Lupe
I've saved a copy on my laptop - that one's a keeper.Lupe
It's ideal to have oversampled data. If you don't, there's nothing you can do about it. I've used these in real-time ring modulators, oscillators, etc.Seedy
Is it possible to produce oversampled data from a pre-recorded WAV file? I'll ask this again as a real question - I just don't want to look especially stupid.Lupe
@MusiGnesis, your situation is different from mine. Whatever you end up doing, if it gets good results I want to know what you did. What I like about these equations is you won't get noise or weird overtones.Seedy
No, you can't get oversampled data from a prerecorded WAV file. Nyquist, essentially, says you don't have any real info for any frequencies above 1/2 the sampling rate.Seedy
Since you have existing data (and it's the same problem as real-time generated data, except that we can choose to be running our oscillators at 2x, 4x, 8x, or whatever the processor will handle), all you can do is draw a curve through your points that adds the least amount of noise possible. You can't get any extra data from your sample--that data no longer exists--it may as well be in a black hole.Seedy
I'm going to implement the 4-point Hermite, which he seems to say is the best for un-oversampled data. The funny thing is, I've never really found linear interpolation to be all that awful, but I'm using WAV file data for pitched synthesizer sounds and I can usually get away with only lowering the frequency. I'd like to be able to raise the pitch without sounding like a 1993 soundcard FM synth.Lupe
Is it possible that what's bothering you about raising the pitch is the Alvin and the Chipmunks effect? Do you know anything about formant filters? (But the linear interpolation is absolutely adding noise 10dB under your signal.)Seedy
I'm doing a version of wavetable synthesis, but my code forms the amplitude envelopes for all the notes, so the overall shape is the same regardless of pitch. The particular problem I'm having is with the sustain part of a note sounding kind of "scratchy" (if that makes sense) when I create a note at a higher frequency. The Hermite interpolation seems to make this part sound much better and clearer, but it might just be that my audio neurons are dying at a faster rate today.Lupe
How are you doing sustain? Looping the same section over and over? Where does that section come from? You make be lifting some low frequency noise into audible range.Seedy
It's a loop approach. I don't think the problem is low frequency noise lifted into the audible. The "scratchiness" is very high frequency - I can get rid of it with a lowpass filter, but I don't like the overall dulling effect and I'm trying to avoid having to do an FFT.Lupe
Any frequencies near Nyquist could do that to you when you speed up. You have to do the lowpass before you speed up, or else you'll be cutting out sounds that belong there, not just the ones that folded over (that's why you're getting a lack of brightness, probably). Suppose you have a sample you want to play at 44100. You have a sample you want to play at double speed. Put the lowpass cutoff at 11025, do the filter, then speed up. You don't want any frequencies going over Nyquist.Seedy
Thank you. This is the updated address, which contains a revised paper and accompanying source code zip file: yehar.com/blog/?p=197Kolomna
Hi i checked all algorithms, and i settled on B-spline for simple and good quality, but it has an error, because when i calculate sample between x0,x1,x2,x3, all the values between x1 to x2 are having a little offset, coz if i try to draw audio wave curves from this algo, it shows that the curve never reaches the values x1 & x2, but is a little lower than the original waveform. Currently using Hermite with no such error.Can anybody guide me how & when to use the Optimal algo, it it useful? it has many variation 2x,4x,8x how & when to use it?Absolve
@Seedy And please can anybody suggest, which algorithm is best for downsampling if i use Optimal 2x for all resampling from any sample rate to any sample rate will it be ok, or should i use all the algorithms and switch to them 2x,4x,8x..whenever needed? and is Optimal 2x algo good for downsampling?Absolve
U
8
double InterpCubic(double x0, double x1, double x2, double x3, double t)
{
   double a0, a1, a2, a3;

   a0 = x3 - x2 - x0 + x1;
   a1 = x0 - x1 - a0;
   a2 = x2 - x0;
   a3 = x1;

   return a0*(t^3) + a1*(t^2) + a2*t + a3;
}

where x1 and x2 are the samples being interpolated between, x0 is x1's left neighbor, and x3 is x2's right neighbor. t is [0, 1], denoting the interpolation position between x1 and x2.

Unsocial answered 14/7, 2009 at 15:18 Comment(6)
Me likey. I'm going to try this out.Lupe
One note: since cubic interpolation uses 4 samples (the 2 being interpolated between and their 2 closest neighbors), you must figure out how to handle the very first interpolation interval and the very last interpolation interval of the waveform data. Often, people just invent phantom samples to the left and right.Unsocial
No problem. My example above uses a phantom sample to the right.Lupe
Well, it works, but unfortunately it sounds even worse than linear interpolation. Linear interpolation seems to produce more of a general low-level hiss, whereas this cubic formula tends to produce a high-pitched ringing.Lupe
If you look at your processed sample with a spectrum analyzer, you'll see the hiss as a noise floor. The ringing is probably the result of a few added frequencies from the cubic that modulate in amplitude.Seedy
Be careful when copying this code, the ^operator is often used outside of C-languages to indicate "to the power of" but in most languages it is the XOR operator. So replace t^3 with ttt and t^2 with t*t.Dome
K
3

Honestly, cubic interpolation isn't generally much better for audio than linear. A simple suggestion for improving your linear interpolation would be to use an antialiasing filter (before or after the interpolation, depending on whether you are shortening the signal or lengthening it). Another option (though more computationally expensive) is sinc-interpolation, which can be done with very high quality.

We have released some simple, LGPL resampling code that can do both of these as part of WDL (see resample.h).

Korry answered 2/7, 2010 at 4:8 Comment(1)
yeah, I agree. I ultimately went back to straight interpolation, as the improvement over linear interpolation was essentially imperceptible.Lupe
G
2

You're looking for polynomial interpolation. The idea is that you pick a number of known data points around the point you want to interpolate, compute an interpolated polynomial using the data points, and then find out the value of the polynomial and the interpolation point.

There are other methods. If you can stomach the math, look at signal reconstruction, or google for "signal interpolation".

Ghazi answered 14/7, 2009 at 15:6 Comment(1)
I can stomach the math, but mainly I'm looking for code samples (preferably C# or C).Lupe
H
1

I don't have enough reputation to comment on Donnie's answer so I hope it's okay if I at least partially reference it here in addition to answering the question. I co-maintain the Godot engine's audio system which used the same coefficients as the polynomial provided in that answer for a while and just wanted to throw it out there that the coefficients are wrong in that code snippet. The code given has some pretty severe artifacts that show up especially with low-frequency audio. And for that matter, at least one of the algorithms in the paper Nosredna's answer gives have some pretty severe low-pass. Godot has switched back to a simple cubic resampling scheme and it seems to be working well for most users.

I highly recommend using cubic resampling with the following polynomial:

a0 = 3 * y1 - 3 * y2 + y3 - y0;
a1 = 2 * y0 - 5 * y1 + 4 * y2 - y3;
a2 = y2 - y0;
a3 = 2 * y1;

out = (a0 * mu * mu2 + a1 * mu2 + a2 * mu + a3) / 2;

Where y0, y1, y2, and y3 are successive samples in the original audio, from earliest to latest, mu is the fractional component of the time in samples, and mu2 is the square of mu (which exists entirely to save a multiplication if the compiler fails to optimize correctly).

The math is beyond me but these coefficients have been working well in Godot for some time now with no user complaints.

Hixon answered 8/4, 2022 at 18:12 Comment(1)
These coefficients correspond to cubic Hermite interpolation.Blackamoor
B
1

This version of cubic Hermite interpolation uses fewer instructions than the others, both with and without Fused Multiply-Add.

float InterpolateHermite(float x0, float x1, float x2, float x3, float t)
{
    float diff = x1 - x2;
    float c1 = x2 - x0;
    float c3 = x3 - x0 + 3 * diff;
    float c2 = -(2 * diff + c1 + c3);
    return 0.5f * ((c3 * t + c2) * t + c1) * t + x1;
}

Hermite interpolation is faster than Lagrange interpolation but has worse phase accuracy. Their amplitude responses have the same range, but Hermite is closer to flat away from 0.5n. Libsoxr's low-quality option has an optimized cubic Lagrange interpolator cubic_stage_fn, under LGPL license, with 6 multiplies, 12 adds. This can be compared to Olli Niemitalo's Lagrange code, which has 9 multiplies, 11 adds.

Hermite is also implemented on this page in Nosredna's InterpolateHermite4pt3oX, in Olli Niemitalo's 2001 paper, and in Ellen Poe's answer.

Blackamoor answered 5/5, 2022 at 5:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.