Efficient Bicubic filtering code in GLSL?
Asked Answered
S

8

17

I'm wondering if anyone has complete, working, and efficient code to do bicubic texture filtering in glsl. There is this:

http://www.codeproject.com/Articles/236394/Bi-Cubic-and-Bi-Linear-Interpolation-with-GLSL or https://github.com/visionworkbench/visionworkbench/blob/master/src/vw/GPU/Shaders/Interp/interpolation-bicubic.glsl

but both do 16 texture reads where only 4 are necessary:

https://groups.google.com/forum/#!topic/comp.graphics.api.opengl/kqrujgJfTxo

However the method above uses a missing "cubic()" function that I don't know what it is supposed to do, and also takes an unexplained "texscale" parameter.

There is also the NVidia version:

https://developer.nvidia.com/gpugems/gpugems2/part-iii-high-quality-rendering/chapter-20-fast-third-order-texture-filtering

but I believe this uses CUDA, which is specific to NVidia's cards. I need glsl.

I could probably port the nvidia version to glsl, but thought I'd ask first to see if anyone already has a complete, working glsl bicubic shader.

Sidra answered 21/11, 2012 at 19:56 Comment(6)
"but both do 16 texture reads where only 4 are necessary:" That post is lying to you. Bicubic interpolation is not doing 4 bilinear samplings; that's just linear filtering on a larger scale. Bicubic interpolation requires doing cubic interpolation of the values, not linear interpolation. And you can't do cubic interpolation by doing a bunch of linear interpolations. It's like the difference between a Bezier curve and the lines created by connecting the 4 Bezier points. It's not quite the same thing, is it?Latimore
Shader that i posted doing something like texcoord = cubic(lerp(texcoord)) so it basically applying extra function on top of texture coordinate interpolation. This type of filtering can be used for image resizing.Constrain
@NicolBolas: you are mistaken; as documented in the GPUGems 2 chapter and written below by JAre and Maf, it is perfectly possible to perform bicubic lookup using 4 linear interpolations.Aframe
@NicolBolas And you can't do cubic interpolation by doing a bunch of linear interpolations. That's not technically correct, this how De Casteljau's algorithm worksKornegay
The GPUGems article is not CUDA. It's either Cg or HLSL.Homs
It's Cg, "The Cg code of the fragment program for one-dimensional cubic filtering is shown in Listing 20-1", developer.nvidia.com/cg-toolkit, en.wikipedia.org/wiki/Cg_(programming_language), now deprecated.Homs
S
12

I decided to take a minute to dig my old Perforce activities and found the missing cubic() function; enjoy! :)

vec4 cubic(float v)
{
    vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
    vec4 s = n * n * n;
    float x = s.x;
    float y = s.y - 4.0 * s.x;
    float z = s.z - 4.0 * s.y + 6.0 * s.x;
    float w = 6.0 - x - y - z;
    return vec4(x, y, z, w);
}
Scutari answered 21/5, 2014 at 13:32 Comment(1)
This appears to be identical to the one in @masterxilo's answer, and similar to others except in coefficients, but differs in result by a factor of 6. I expect that this version assumes the caller will divide by 6 as part of using it. Just something to be aware of if you're copy/pasting.Scalade
H
24

I found this implementation which can be used as a drop-in replacement for texture() (from http://www.java-gaming.org/index.php?topic=35123.0 (one typo fixed)):

// from http://www.java-gaming.org/index.php?topic=35123.0
vec4 cubic(float v){
    vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
    vec4 s = n * n * n;
    float x = s.x;
    float y = s.y - 4.0 * s.x;
    float z = s.z - 4.0 * s.y + 6.0 * s.x;
    float w = 6.0 - x - y - z;
    return vec4(x, y, z, w) * (1.0/6.0);
}

vec4 textureBicubic(sampler2D sampler, vec2 texCoords){

   vec2 texSize = textureSize(sampler, 0);
   vec2 invTexSize = 1.0 / texSize;
   
   texCoords = texCoords * texSize - 0.5;

   
    vec2 fxy = fract(texCoords);
    texCoords -= fxy;

    vec4 xcubic = cubic(fxy.x);
    vec4 ycubic = cubic(fxy.y);

    vec4 c = texCoords.xxyy + vec2 (-0.5, +1.5).xyxy;
    
    vec4 s = vec4(xcubic.xz + xcubic.yw, ycubic.xz + ycubic.yw);
    vec4 offset = c + vec4 (xcubic.yw, ycubic.yw) / s;
    
    offset *= invTexSize.xxyy;
    
    vec4 sample0 = texture(sampler, offset.xz);
    vec4 sample1 = texture(sampler, offset.yz);
    vec4 sample2 = texture(sampler, offset.xw);
    vec4 sample3 = texture(sampler, offset.yw);

    float sx = s.x / (s.x + s.y);
    float sy = s.z / (s.z + s.w);

    return mix(
       mix(sample3, sample2, sx), mix(sample1, sample0, sx)
    , sy);
}

Example: Nearest, bilinear, bicubic:

enter image description here

The ImageData of this image is

{{{0.698039, 0.996078, 0.262745}, {0., 0.266667, 1.}, {0.00392157, 
   0.25098, 0.996078}, {1., 0.65098, 0.}}, {{0.996078, 0.823529, 
   0.}, {0.498039, 0., 0.00392157}, {0.831373, 0.00392157, 
   0.00392157}, {0.956863, 0.972549, 0.00784314}}, {{0.909804, 
   0.00784314, 0.}, {0.87451, 0.996078, 0.0862745}, {0.196078, 
   0.992157, 0.760784}, {0.00392157, 0.00392157, 0.498039}}, {{1., 
   0.878431, 0.}, {0.588235, 0.00392157, 0.00392157}, {0.00392157, 
   0.0666667, 0.996078}, {0.996078, 0.517647, 0.}}}

I tried to reproduce this (many other interpolation techniques)

enter image description here

but they have clamped padding, while I have repeating (wrapping) boundaries. Therefore it is not exactly the same.

It seems this bicubic business is not a proper interpolation, i.e. it does not take on the original values at the points where the data is defined.

Homs answered 11/2, 2017 at 18:55 Comment(2)
I found this algorithm to be artifact free as compared to the combo of the two functions offered in Maf and JAre's answers. If you'd like to see for yourself, the easiest way is to go to this shader shadertoy.com/view/XtKfRV and swap out the BicubicTexture() and cubic() functions for the ones in this post. The former produces fairly apprent block-like artifacts and this one is quite smooth. Thanks, masterxilo! +1Thekla
Indeed, this is not interpolation, it looks more like a kind of blur applied to the linearly-interpolated image.Leptospirosis
S
12

I decided to take a minute to dig my old Perforce activities and found the missing cubic() function; enjoy! :)

vec4 cubic(float v)
{
    vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
    vec4 s = n * n * n;
    float x = s.x;
    float y = s.y - 4.0 * s.x;
    float z = s.z - 4.0 * s.y + 6.0 * s.x;
    float w = 6.0 - x - y - z;
    return vec4(x, y, z, w);
}
Scutari answered 21/5, 2014 at 13:32 Comment(1)
This appears to be identical to the one in @masterxilo's answer, and similar to others except in coefficients, but differs in result by a factor of 6. I expect that this version assumes the caller will divide by 6 as part of using it. Just something to be aware of if you're copy/pasting.Scalade
S
8

Wow. I recognize the code above (I can not comment w/ reputation < 50) as I came up with it in early 2011. The problem I was trying to solve was related to old IBM T42 (sorry the exact model number escapes me) laptop and it's ATI graphics stack. I developed the code on NV card and originally I used 16 texture fetches. That was kinda of slow but fast enough for my purposes. When someone reported it did not work on his laptop it became apparent that they did not support enough texture fetches per fragment. I had to engineer a work-around and the best I could come up with was to do it with number of texture fetches that would work.

I thought about it like this: okay, so if I handle each quad (2x2) with linear filter the remaining problem is can the rows and columns share the weights? That was the only problem on my mind when I set out to craft the code. Of course they could be shared; the weights are same for each column and row; perfect!

Now I had four samples. The remaining problem was how to correctly combine the samples. That was the biggest obstacle to overcome. It took about 10 minutes with pencil and paper. With trembling hands I typed the code in and it worked, nice. Then I uploaded the binaries to the guy who promised to check it out on his T42 (?) and he reported it worked. The end. :)

I can assure that the equations check out and give mathematically identical results to computing the samples individually. FYI: with CPU it's faster to do horizontal and vertical scan separately. With GPU multiple passes is not that great idea, especially when it's probably not feasible anyway in typical use case.

Food for thought: it is possible to use a texture lookup for the cubic() function. Which is faster depends on the GPU but generally speaking, the sampler is light on the ALU side just doing the arithmetic would balance things out. YMMV.

Scutari answered 21/5, 2014 at 10:34 Comment(0)
S
7

The missing function cubic() in JAre's answer could look like this:

vec4 cubic(float x)
{
    float x2 = x * x;
    float x3 = x2 * x;
    vec4 w;
    w.x =   -x3 + 3*x2 - 3*x + 1;
    w.y =  3*x3 - 6*x2       + 4;
    w.z = -3*x3 + 3*x2 + 3*x + 1;
    w.w =  x3;
    return w / 6.f;
}

It returns the four weights for cubic B-Spline.

It is all explained in NVidia Gems.

Sarcophagus answered 9/2, 2013 at 19:47 Comment(0)
C
5

(EDIT)

Example:

cubic spline

  • Texscale is sampling window size coefficient. You can start with 1.0 value.

vec4 filter(sampler2D texture, vec2 texcoord, vec2 texscale)
{
    float fx = fract(texcoord.x);
    float fy = fract(texcoord.y);
    texcoord.x -= fx;
    texcoord.y -= fy;

    vec4 xcubic = cubic(fx);
    vec4 ycubic = cubic(fy);

    vec4 c = vec4(texcoord.x - 0.5, texcoord.x + 1.5, texcoord.y -
0.5, texcoord.y + 1.5);
    vec4 s = vec4(xcubic.x + xcubic.y, xcubic.z + xcubic.w, ycubic.x +
ycubic.y, ycubic.z + ycubic.w);
    vec4 offset = c + vec4(xcubic.y, xcubic.w, ycubic.y, ycubic.w) /
s;

    vec4 sample0 = texture2D(texture, vec2(offset.x, offset.z) *
texscale);
    vec4 sample1 = texture2D(texture, vec2(offset.y, offset.z) *
texscale);
    vec4 sample2 = texture2D(texture, vec2(offset.x, offset.w) *
texscale);
    vec4 sample3 = texture2D(texture, vec2(offset.y, offset.w) *
texscale);

    float sx = s.x / (s.x + s.y);
    float sy = s.z / (s.z + s.w);

    return mix(
        mix(sample3, sample2, sx),
        mix(sample1, sample0, sx), sy);
}

Source

Constrain answered 21/11, 2012 at 21:37 Comment(5)
That's the exact same link I gave in my original post (see the 3rd link). Which would be fine... if you can explain what I need to write for the missing cubic() function, and what to pass in the texscale parameters. If you can explain that, I'm happy to mark your answer as the final answer.Sidra
However, I'm not sure how you would write a cubic function in C giving a single input parameter. All examples I found when Googling it required many input parameters.Sidra
@VernJensen it's most likely generic cubic spline (like on picture) + you have scale to adjust. In Gaussian blur shaders lookup matrix often also generic constant value.Constrain
Note that exploiting linear hardware interpolation only works for non-negative splines (other than e.g. the Catmull-Rom spline in Codeproject's sample), since you can't adjust tex coords in a way that bias a texel below zero.Dejection
What's the purpose of 'fract' here?Myall
A
2

For anybody interested in GLSL code to do tri-cubic interpolation, ray-casting code using cubic interpolation can be found in the examples/glCubicRayCast folder in: http://www.dannyruijters.nl/cubicinterpolation/CI.zip

edit: The cubic interpolation code is now available on github: CUDA version and WebGL version, and GLSL sample.

Aframe answered 5/8, 2013 at 9:21 Comment(0)
K
2

I've been using @Maf 's cubic spline recipe for over a year, and I recommend it, if a cubic B-spline meets your needs.

But I recently realized that, for my particular application, it is important for the intensities to match exactly at the sample points. So I switched to using a Catmull-Rom spline, which uses a slightly different recipe like so:

// Catmull-Rom spline actually passes through control points
vec4 cubic(float x) // cubic_catmullrom(float x)
{
    const float s = 0.5; // potentially adjustable parameter
    float x2 = x * x;
    float x3 = x2 * x;
    vec4 w;
    w.x =    -s*x3 +     2*s*x2 - s*x + 0;
    w.y = (2-s)*x3 +   (s-3)*x2       + 1;
    w.z = (s-2)*x3 + (3-2*s)*x2 + s*x + 0;
    w.w =     s*x3 -       s*x2       + 0;
    return w;
}

I found these coefficients, plus those for a number of other flavors of cubic splines, in the lecture notes at: http://www.cs.cmu.edu/afs/cs/academic/class/15462-s10/www/lec-slides/lec06.pdf

Keheley answered 18/9, 2015 at 18:35 Comment(3)
Interesting and useful thread. Unfortunately, the trick in the higher level textureBicubic to reduce the number of texture lookups from 16 to 4 relies on the fact that all the weights returned by cubic are non-negative. The combination of textureBicubic and the cubic_catmullrom is almost equivalent to linear interpolation.Urial
Caught by 'only edit for 5 mins rule .... Interesting and useful thread. Very late comment on an old thread ... Unfortunately, the trick in the higher level textureBicubic to reduce the number of texture lookups from 16 to 4 relies on the fact that all the weights returned by cubic() are non-negative. This is not so for cubic_catmullrom, and would not be so for alternative smooth interpolations through the points. This prevents the combination of textureBicubic and the cubic_catmullrom working as it should; it degrades to be (almost?) equivalent to linear interpolation.Urial
@StephenTodd You are right. I did not understand that when I wrote this answer, and I do not have time at the moment to properly update this answer. If I remember correctly, Catmull-Rom can be done using 9 linear lookups instead of 16, but that approach involves modifying more that just this cubic() function.Keheley
U
0

I think it is possible that the Catmull version could be done with 4 texture lookups by (a) arranging the input texture like a chessboard with alternate slots saved as positives and as negatives, and (b) an associated modification of textureBicubic. That would rely on the contributions/weights w.x/w.w always being negative, and the contributions w.y/w.z always being positive. I haven't double-checked if this is true, or exactly how the modified textureBicubic would look.

... I have verified that w contributions do satisfy the +ve -ve rules.

Urial answered 21/12, 2018 at 10:0 Comment(1)
It would be very cool if you could work out this idea further. ( E.g. is just one single texture with checkered sign enough or not ).Sprung

© 2022 - 2024 — McMap. All rights reserved.