Fast Gaussian blur at pause [closed]

In cocos2d-x I need to implement a fast Gaussian blur. Here is how it should look (I found a game on the App Store, made in Unity, that already has such a blur):

enter image description here

So, it's a nice fade-in/fade-out blur when the user pauses the game.

GPUImage already has the fast blur I need, but I can't find a solution for cocos2d-x.

Here is the result of a live camera view using GPUImage2, tested on an iPod Touch 5G. It runs fast even on this slow, old device.

I'm looking for a solution with a similarly fast Gaussian blur for cocos2d-x.

Impi answered 14/10, 2017 at 13:38 Comment(6)
I don't know about cocos2d, but you can easily do this with: 1. capture the original screen, 2. for each frame, apply a small-radius gaussian blur, using the previous frame as input. Store the resulting picture. At resume, display the stored pictures in reverse order.Sociometry
If you need animated image blur, then you can use a variable-width Gaussian blur: for small blurs, you can blur the input by simply applying a Gaussian blur to it. For larger blurs, first downscale the input, and then apply a smaller Gaussian blur to it (2D Gaussian blur is separable so this won't be slow).Sociometry
@Sociometry Amusingly, most decent quality downsampling mathematically consists of blurring your input then point sampling (admittedly not a Gaussian blur). For even larger blurs, you can calculate the kernel of your blur, FFT both your image and the kernel, pointwise multiply, then reverse-FFT the result (which is O(1) in Gaussian radius, instead of O(n), so starts being practical when your blur needs to be more than a few dozen pixels in size).Sweeten
@Yakk: yeah, if your input image has a lot of high-frequency content, then a simple box filter for downscaling might not be adequate. But usually, for real-time mobile applications, a box filter is okay (many years ago, I experimented with truncated sinc filters, and most of the time there was no visual difference, so it isn't worth the hassle and the extra computational cost). And I've never seen anyone using FFT for real-time blur in games :). Downscale + Gaussian blur is usually a very good approximation of a real Gaussian blur (at least, it is visually as appealing as the real one).Sociometry
Your question is still too broad, and questions about code can't depend on external links like your Github repo. The code necessary to reproduce your problem needs to go in the question.Tarp
@Impi You can edit your question forever, but you should not edit it in response to the answers you've already gotten. Your question needs to remain the same so that the answers remain valid.Tarp

After studying "Post-Processing Effects in Cocos2d-X" and "RENDERTEXTURE + BLUR", I arrived at the following solution.

The common way to achieve post-processing effects in cocos2d-x is to implement layers. The scene is one layer, and a post process is another layer, which uses the scene layer as an input. With this technique, the post process can manipulate the rendered scene.

The blur algorithm is implemented in a shader. A common way to apply a blur effect to a scene is to blur first along the X-axis of the viewport and, in a second pass, along the Y-axis of the viewport (see ShaderLesson5). This is an acceptable approximation, which gives a massive gain in performance.
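Why two 1D passes can stand in for one 2D pass: a 2D Gaussian kernel is the outer product of two 1D Gaussian kernels, so blurring along X and then along Y gives the same result up to rounding, while the texture reads per pixel drop from (2r+1)^2 to 2(2r+1). The following minimal C++ sketch checks this on the CPU. The function names are illustrative only and are not part of cocos2d-x or of the code in this answer. Note that the shader shown further down uses smoothstep-shaped weights rather than true Gaussian ones, so its combined kernel is not an exact Gaussian.

```cpp
#include <cmath>
#include <vector>

// Normalized 1D Gaussian kernel with 2*radius+1 taps (illustrative helper).
std::vector<double> gaussian1D(int radius, double sigma)
{
    std::vector<double> k(2 * radius + 1);
    double sum = 0.0;
    for (int i = -radius; i <= radius; ++i) {
        k[i + radius] = std::exp(-(i * i) / (2.0 * sigma * sigma));
        sum += k[i + radius];
    }
    for (double& w : k) w /= sum;
    return k;
}

// Normalized 2D Gaussian kernel weight at (x, y), computed directly.
double gaussian2D(int x, int y, int radius, double sigma)
{
    double sum = 0.0;
    for (int j = -radius; j <= radius; ++j)
        for (int i = -radius; i <= radius; ++i)
            sum += std::exp(-(i * i + j * j) / (2.0 * sigma * sigma));
    return std::exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / sum;
}

// Largest difference between the 2D kernel and the outer product of two 1D kernels.
double maxSeparabilityError(int radius, double sigma)
{
    const std::vector<double> k = gaussian1D(radius, sigma);
    double maxErr = 0.0;
    for (int y = -radius; y <= radius; ++y)
        for (int x = -radius; x <= radius; ++x) {
            const double separable = k[x + radius] * k[y + radius];
            maxErr = std::fmax(maxErr, std::fabs(separable - gaussian2D(x, y, radius, sigma)));
        }
    return maxErr;
}

// Texture reads per pixel: one 2D pass versus two 1D passes.
int tapsSinglePass(int radius) { return (2 * radius + 1) * (2 * radius + 1); }
int tapsTwoPass(int radius)    { return 2 * (2 * radius + 1); }
```

For a radius of 10, tapsSinglePass gives 441 reads per pixel and tapsTwoPass gives 42, which is where the performance gain comes from.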

Two blur passes mean that we need 2 post-process layers in cocos2d-x. So we need 3 layers, one for the scene and 2 for the post processes:

// scene (game) layer
m_gameLayer = Layer::create();
this->addChild(m_gameLayer, 0);

// blur X layer
m_blurX_PostProcessLayer = PostProcess::create("shader/blur.vert", "shader/blur.frag");
m_blurX_PostProcessLayer->setAnchorPoint(Point::ZERO);
m_blurX_PostProcessLayer->setPosition(Point::ZERO);
this->addChild(m_blurX_PostProcessLayer, 1);

// blur y layer
m_blurY_PostProcessLayer = PostProcess::create("shader/blur.vert", "shader/blur.frag");
m_blurY_PostProcessLayer->setAnchorPoint(Point::ZERO);
m_blurY_PostProcessLayer->setPosition(Point::ZERO);
this->addChild(m_blurY_PostProcessLayer, 2);

Note that the sprites and resources of the scene have to be added to m_gameLayer.

In the update method, the post processes have to be applied to the scene (I'll describe the setup of the uniforms later):

// blur in X direction

cocos2d::GLProgramState &blurXstate = m_blurX_PostProcessLayer->ProgramState();
blurXstate.setUniformVec2( "u_blurOffset", Vec2( 1.0f/visibleSize.width, 0.0 ) ); 
blurXstate.setUniformFloat( "u_blurStrength", (float)blurStrength );

m_blurX_PostProcessLayer->draw(m_gameLayer);

// blur in Y direction

cocos2d::GLProgramState &blurYstate = m_blurY_PostProcessLayer->ProgramState();
blurYstate.setUniformVec2( "u_blurOffset", Vec2( 0.0, 1.0f/visibleSize.height ) );
blurYstate.setUniformFloat( "u_blurStrength", (float)blurStrength );

m_blurY_PostProcessLayer->draw(m_blurX_PostProcessLayer);
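The snippet above uses blurStrength without showing how it changes over time. One possible way to get the fade-in/fade-out on pause is to move blurStrength toward 1.0 while the game is paused and toward 0.0 otherwise, by a fixed amount per second. This is only a sketch under that assumption: stepBlurStrength and fadeSeconds are hypothetical names, not part of this answer's code or of cocos2d-x.

```cpp
#include <algorithm>

// Moves the current blur strength toward its target (1.0 when paused, 0.0 otherwise).
// dt is the frame time in seconds; fadeSeconds is the duration of a full fade.
float stepBlurStrength(float current, bool paused, float dt, float fadeSeconds)
{
    const float target = paused ? 1.0f : 0.0f;
    if (fadeSeconds <= 0.0f)
        return target;                       // no fade: jump straight to the target
    const float step = dt / fadeSeconds;     // maximum change allowed this frame
    if (current < target)
        return std::min(current + step, target);
    return std::max(current - step, target);
}
```

In the update method this could be called as blurStrength = stepBlurStrength(blurStrength, isPaused, dt, 0.5f) before the uniforms are set, where isPaused stands for whatever pause flag the game already has.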


For the management of the post process I implemented a class PostProcess, where I tried to keep things as simple as possible:

PostProcess.hpp

#include <string>
#include "cocos2d.h"

class PostProcess : public cocos2d::Layer
{
private:
    PostProcess(void) {}
    virtual ~PostProcess() {}
public:
    static PostProcess* create(const std::string& vertexShaderFile, const std::string& fragmentShaderFile);
    virtual bool init(const std::string& vertexShaderFile, const std::string& fragmentShaderFile);
    void draw(cocos2d::Layer* layer);
    cocos2d::GLProgram      & Program( void )      { return *_program; }
    cocos2d::GLProgramState & ProgramState( void ) { return *_progState; }
private:
    cocos2d::GLProgram       *_program;
    cocos2d::GLProgramState  *_progState;
    cocos2d::RenderTexture   *_renderTexture;
    cocos2d::Sprite          *_sprite;
};

PostProcess.cpp

#include "PostProcess.hpp"

using namespace cocos2d;

bool PostProcess::init(const std::string& vertexShaderFile, const std::string& fragmentShaderFile)
{
    if (!Layer::init()) {
        return false;
    }

    _program = GLProgram::createWithFilenames(vertexShaderFile, fragmentShaderFile);
    _program->bindAttribLocation(GLProgram::ATTRIBUTE_NAME_POSITION, GLProgram::VERTEX_ATTRIB_POSITION);
    _program->bindAttribLocation(GLProgram::ATTRIBUTE_NAME_COLOR, GLProgram::VERTEX_ATTRIB_COLOR);
    _program->bindAttribLocation(GLProgram::ATTRIBUTE_NAME_TEX_COORD, GLProgram::VERTEX_ATTRIB_TEX_COORD);
    _program->bindAttribLocation(GLProgram::ATTRIBUTE_NAME_TEX_COORD1, GLProgram::VERTEX_ATTRIB_TEX_COORD1);
    _program->bindAttribLocation(GLProgram::ATTRIBUTE_NAME_TEX_COORD2, GLProgram::VERTEX_ATTRIB_TEX_COORD2);
    _program->bindAttribLocation(GLProgram::ATTRIBUTE_NAME_TEX_COORD3, GLProgram::VERTEX_ATTRIB_TEX_COORD3);
    _program->bindAttribLocation(GLProgram::ATTRIBUTE_NAME_NORMAL, GLProgram::VERTEX_ATTRIB_NORMAL);
    _program->bindAttribLocation(GLProgram::ATTRIBUTE_NAME_BLEND_WEIGHT, GLProgram::VERTEX_ATTRIB_BLEND_WEIGHT);
    _program->bindAttribLocation(GLProgram::ATTRIBUTE_NAME_BLEND_INDEX, GLProgram::VERTEX_ATTRIB_BLEND_INDEX);
    _program->link();

    _progState = GLProgramState::getOrCreateWithGLProgram(_program);

    _program->updateUniforms();

    auto visibleSize = Director::getInstance()->getVisibleSize();

    _renderTexture = RenderTexture::create(visibleSize.width, visibleSize.height);
    _renderTexture->retain();

    _sprite = Sprite::createWithTexture(_renderTexture->getSprite()->getTexture());
    _sprite->setTextureRect(Rect(0, 0, _sprite->getTexture()->getContentSize().width,
                                 _sprite->getTexture()->getContentSize().height));
    _sprite->setAnchorPoint(Point::ZERO);
    _sprite->setPosition(Point::ZERO);
    _sprite->setFlippedY(true);
    _sprite->setGLProgram(_program);
    _sprite->setGLProgramState(_progState);
    this->addChild(_sprite);

    return true;
}

void PostProcess::draw(cocos2d::Layer* layer)
{
    _renderTexture->beginWithClear(0.0f, 0.0f, 0.0f, 0.0f);
    layer->visit();
    _renderTexture->end();
}

PostProcess* PostProcess::create(const std::string& vertexShaderFile, const std::string& fragmentShaderFile)
{
    auto p = new (std::nothrow) PostProcess();
    if (p && p->init(vertexShaderFile, fragmentShaderFile)) {
        p->autorelease();
        return p;
    }
    delete p;
    return nullptr;
}


The shader needs a uniform which contains the offset for the blur algorithm (u_blurOffset). This is the distance between 2 texels along the X-axis for the first blur pass and the distance between 2 texels along the Y-axis for the second blur pass.
The strength of the blur effect is set by the uniform variable u_blurStrength, where 0.0 means that blurring is off and 1.0 means maximum blurring. The maximum blur effect is defined by the value of MAX_BLUR_WIDHT, which defines the range of texels that are sampled in each direction, so it is more or less the blur radius. If you increase the value, the blur effect increases at the cost of performance. If you decrease the value, the blur effect decreases, but you gain performance. The relation between cost and the value of MAX_BLUR_WIDHT is thankfully linear (and not quadratic), because of the 2-pass implementation.
I decided to avoid precalculating Gauss weights and passing them to the shader (the Gauss weights would depend on MAX_BLUR_WIDHT and u_blurStrength). Instead I used a smooth Hermite interpolation similar to the GLSL function smoothstep:

blur.vert

attribute vec4 a_position;
attribute vec2 a_texCoord;
attribute vec4 a_color;

varying vec4 v_fragmentColor;
varying vec2 v_texCoord;

void main()
{
    gl_Position     = CC_MVPMatrix * a_position;
    v_fragmentColor = a_color;
    v_texCoord      = a_texCoord;
}

blur.frag

varying vec4 v_fragmentColor;
varying vec2 v_texCoord;

uniform vec2  u_blurOffset;
uniform float u_blurStrength;

#define MAX_BLUR_WIDHT 10

void main()
{
    vec4 color   = texture2D(CC_Texture0, v_texCoord);

    float blurWidth = u_blurStrength * float(MAX_BLUR_WIDHT);
    vec4 blurColor  = vec4(color.rgb, 1.0);
    for (int i = 1; i <= MAX_BLUR_WIDHT; ++ i)
    {
        if ( float(i) >= blurWidth )
            break;

        float weight = 1.0 - float(i) / blurWidth;
        weight = weight * weight * (3.0 - 2.0 * weight); // smoothstep

        vec4 sampleColor1 = texture2D(CC_Texture0, v_texCoord + u_blurOffset * float(i));
        vec4 sampleColor2 = texture2D(CC_Texture0, v_texCoord - u_blurOffset * float(i));
        blurColor += vec4(sampleColor1.rgb + sampleColor2.rgb, 2.0) * weight; 
    }

    gl_FragColor = vec4(blurColor.rgb / blurColor.w, color.a);
}


The full C++ and GLSL source code can be found on GitHub (The implementation can be activated by bool HelloWorld::m_blurFast = false).

See the preview:
preview


Separate shader for each blur radius

A high-performance version of a Gaussian blur algorithm is the solution presented at GPUImage-x. In this implementation a separate blur shader is created for each blur radius. The source code of the full cocos2d-x demo implementation can be found at GitHub. The implementation provides 2 variants, the standard implementation and the optimized implementation (like the implementation in the link), which can be selected by bool GPUimageBlur::m_optimized. The implementation generates a shader for each radius from 0 to int GPUimageBlur::m_maxRadius, using the sigma float GPUimageBlur::m_sigma.
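To illustrate what "a separate shader for each blur radius" can mean, here is a minimal sketch that builds the source text of a one-directional blur fragment shader with the Gaussian weights for one radius and sigma baked in as constants. It is not the code from the GitHub demo or from GPUImage-x: makeBlurFragmentShader is a hypothetical function, it corresponds to a plain (not optimized) variant, it assumes sigma > 0, and it reuses the names v_texCoord, u_blurOffset and CC_Texture0 from the shaders above.

```cpp
#include <cmath>
#include <sstream>
#include <string>
#include <vector>

// Builds the source of a one-directional blur fragment shader with the
// Gaussian weights for the given radius and sigma written out as constants.
std::string makeBlurFragmentShader(int radius, double sigma)
{
    // One-sided weights; `total` is the sum over all 2*radius+1 taps.
    std::vector<double> w(radius + 1);
    double total = 0.0;
    for (int i = 0; i <= radius; ++i) {
        w[i] = std::exp(-(i * i) / (2.0 * sigma * sigma));
        total += (i == 0) ? w[i] : 2.0 * w[i];
    }

    std::ostringstream src;
    src << std::fixed;  // always print a decimal point so GLSL sees float literals
    src << "varying vec2 v_texCoord;\n"
        << "uniform vec2 u_blurOffset;\n"
        << "void main()\n{\n"
        << "    vec4 sum = texture2D(CC_Texture0, v_texCoord) * " << w[0] / total << ";\n";
    for (int i = 1; i <= radius; ++i) {
        // One tap on each side of the center, both with the same weight.
        src << "    sum += texture2D(CC_Texture0, v_texCoord + u_blurOffset * "
            << double(i) << ") * " << w[i] / total << ";\n"
            << "    sum += texture2D(CC_Texture0, v_texCoord - u_blurOffset * "
            << double(i) << ") * " << w[i] / total << ";\n";
    }
    src << "    gl_FragColor = sum;\n}\n";
    return src.str();
}
```

Because the loop is unrolled when the text is generated, the shader for a small radius contains only the few texture reads it needs, with no loop or branch at run time.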

See the preview:
preview


Fast limited quality blur

A much faster solution, but with obviously lower quality, is to use the shader presented at Optimizing Gaussian blurs on a mobile GPU. The blurring is not dynamic and can only be switched on or off:

update method:

// blur pass 1
cocos2d::GLProgramState &blurPass1state = m_blurPass1_PostProcessLayer->ProgramState();
blurPass1state.setUniformVec2( "u_blurOffset", Vec2( blurStrength/visibleSize.width, blurStrength/visibleSize.height ) );
m_gameLayer->setVisible( true );
m_blurPass1_PostProcessLayer->draw(m_gameLayer);
m_gameLayer->setVisible( false );

// blur pass 2
cocos2d::GLProgramState &blurPass2state = m_blurPass2_PostProcessLayer->ProgramState();
blurPass2state.setUniformVec2( "u_blurOffset", Vec2( blurStrength/visibleSize.width, -blurStrength/visibleSize.height ) );
m_blurPass1_PostProcessLayer->setVisible( true );
m_blurPass2_PostProcessLayer->draw(m_blurPass1_PostProcessLayer);
m_blurPass1_PostProcessLayer->setVisible( false );

Vertex shader:

attribute vec4 a_position;
attribute vec2 a_texCoord;

varying vec2 blurCoordinates[5];

uniform vec2  u_blurOffset;

void main()
{
    gl_Position     = CC_MVPMatrix * a_position;

    blurCoordinates[0] = a_texCoord.xy;
    blurCoordinates[1] = a_texCoord.xy + u_blurOffset * 1.407333;
    blurCoordinates[2] = a_texCoord.xy - u_blurOffset * 1.407333;
    blurCoordinates[3] = a_texCoord.xy + u_blurOffset * 3.294215;
    blurCoordinates[4] = a_texCoord.xy - u_blurOffset * 3.294215;
}

Fragment shader:

varying vec2 blurCoordinates[5];

uniform float u_blurStrength;

void main()
{
    vec4 sum = vec4(0.0);
    sum += texture2D(CC_Texture0, blurCoordinates[0]) * 0.204164;
    sum += texture2D(CC_Texture0, blurCoordinates[1]) * 0.304005;
    sum += texture2D(CC_Texture0, blurCoordinates[2]) * 0.304005;
    sum += texture2D(CC_Texture0, blurCoordinates[3]) * 0.093913;
    sum += texture2D(CC_Texture0, blurCoordinates[4]) * 0.093913;
    gl_FragColor = sum;
}
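The constants in these two shaders can be reproduced by merging neighboring Gaussian taps into one fetch: two adjacent texels with weights w1 and w2 are replaced by a single sample placed between them, with weight w1 + w2 and offset (o1*w1 + o2*w2) / (w1 + w2). This only works if the texture is sampled with linear filtering, so that the hardware does the blend. The sketch below assumes a radius-4 kernel with sigma 2; that choice is my inference (it reproduces 0.204164, 0.304005, 0.093913, 1.407333 and 3.294215 to about four decimals) and is not stated in the answer. All names are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct LinearTap { double offset; double weight; };

// Normalized one-sided Gaussian weights w[0..radius] (w[0] is the center tap).
std::vector<double> gaussianWeights(int radius, double sigma)
{
    std::vector<double> w(radius + 1);
    double sum = 0.0;
    for (int i = 0; i <= radius; ++i) {
        w[i] = std::exp(-(i * i) / (2.0 * sigma * sigma));
        sum += (i == 0) ? w[i] : 2.0 * w[i];   // side taps occur on both sides
    }
    for (double& v : w) v /= sum;
    return w;
}

// Merges taps (1,2), (3,4), ... into single bilinear fetches.
// Assumes an even radius and linear texture filtering.
std::vector<LinearTap> mergeTaps(const std::vector<double>& w)
{
    std::vector<LinearTap> taps;
    for (std::size_t i = 1; i + 1 < w.size(); i += 2) {
        const double weight = w[i] + w[i + 1];
        const double offset = (i * w[i] + (i + 1) * w[i + 1]) / weight;
        taps.push_back({offset, weight});
    }
    return taps;
}
```

Calling mergeTaps(gaussianWeights(4, 2.0)) yields two taps per side, so a 9-texel kernel is covered with 5 texture reads per pass.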

See the preview:
enter image description here


The full C++ and GLSL source code can be found on GitHub (The implementation can be switched by bool HelloWorld::m_blurFast).


Progressive solution with two layers (frame buffers)

The idea of this solution is to do a smooth, progressive, high-quality blur of the scene. For this, a weak but fast and high-quality blur algorithm is needed. A blurred sprite is not deleted; it is stored for the next refresh of the game engine and used as the source for the next blurring step. This means the weakly blurred sprite gets blurred again, so it is a little more blurry than the last one. This is a progressive process which ends in a strongly and accurately blurred sprite.
To set up this process 3 layers are needed: the game layer and 2 blur layers (even and odd).

m_gameLayer = Layer::create();
m_gameLayer->setVisible( false );
this->addChild(m_gameLayer, 0);

// blur layer even
m_blur_PostProcessLayerEven = PostProcess::create("shader/blur_fast2.vert", "shader/blur_fast2.frag");
m_blur_PostProcessLayerEven->setVisible( false );
m_blur_PostProcessLayerEven->setAnchorPoint(Point::ZERO);
m_blur_PostProcessLayerEven->setPosition(Point::ZERO);
this->addChild(m_blur_PostProcessLayerEven, 1);

// blur layer odd
m_blur_PostProcessLayerOdd = PostProcess::create("shader/blur_fast2.vert", "shader/blur_fast2.frag");
m_blur_PostProcessLayerOdd->setVisible( false );
m_blur_PostProcessLayerOdd->setAnchorPoint(Point::ZERO);
m_blur_PostProcessLayerOdd->setPosition(Point::ZERO);
this->addChild(m_blur_PostProcessLayerOdd, 1);

Note that initially all 3 layers are invisible.

In the update method one layer is set to visible. If there is no blurring, then the game layer is visible. Once blurring starts, the game layer is rendered to the even layer with the blur shader. The game layer becomes invisible and the even layer becomes visible. In the next cycle the even layer is rendered to the odd layer with the blur shader. The even layer becomes invisible and the odd layer becomes visible. This process continues until blurring is stopped. Meanwhile, the scene becomes more and more strongly blurred, at high quality. If the original scene has to be shown again, then the game layer has to be set to visible and the even and odd layers have to be set to invisible.

update method:

bool even = (m_blurTick % 2) == 0;
if ( m_blur )
{
    cocos2d::GLProgramState &blurFaststate1 = m_blur_PostProcessLayerEven->ProgramState();
    blurFaststate1.setUniformVec2( "u_texelOffset", Vec2( 1.0f/visibleSize.width, 1.0f/visibleSize.height ) );
    cocos2d::GLProgramState &blurFaststate2 = m_blur_PostProcessLayerOdd->ProgramState();
    blurFaststate2.setUniformVec2( "u_texelOffset", Vec2( -1.0f/visibleSize.width, -1.0f/visibleSize.height ) );

    if ( m_blurTick == 0 )
    {
        m_gameLayer->setVisible( true );
        m_blur_PostProcessLayerEven->draw(m_gameLayer);
    }
    else if ( even )
    {
        m_blur_PostProcessLayerEven->draw(m_blur_PostProcessLayerOdd);
    }
    else
    {
        m_blur_PostProcessLayerOdd->draw(m_blur_PostProcessLayerEven);
    }
    ++m_blurTick;
}
else
{
    m_blurTick = 0;
}

m_gameLayer->setVisible( !m_blur );
m_blur_PostProcessLayerEven->setVisible( m_blur && even );
m_blur_PostProcessLayerOdd->setVisible( m_blur && !even );

The shader is a simple and exact 3*3 blur shader:

Vertex shader:

attribute vec4 a_position;
attribute vec2 a_texCoord;

varying vec2 blurCoordinates[9];

uniform vec2 u_texelOffset;

void main()
{
    gl_Position     = CC_MVPMatrix * a_position;

    blurCoordinates[0] = a_texCoord.st + vec2( 0.0,  0.0) * u_texelOffset.st;
    blurCoordinates[1] = a_texCoord.st + vec2(+1.0,  0.0) * u_texelOffset.st;
    blurCoordinates[2] = a_texCoord.st + vec2(-1.0,  0.0) * u_texelOffset.st;
    blurCoordinates[3] = a_texCoord.st + vec2( 0.0, +1.0) * u_texelOffset.st;
    blurCoordinates[4] = a_texCoord.st + vec2( 0.0, -1.0) * u_texelOffset.st;
    blurCoordinates[5] = a_texCoord.st + vec2(-1.0, -1.0) * u_texelOffset.st;
    blurCoordinates[6] = a_texCoord.st + vec2(+1.0, -1.0) * u_texelOffset.st;
    blurCoordinates[7] = a_texCoord.st + vec2(-1.0, +1.0) * u_texelOffset.st;
    blurCoordinates[8] = a_texCoord.st + vec2(+1.0, +1.0) * u_texelOffset.st;
}

Fragment shader:

varying vec2 blurCoordinates[9];

void main()
{
    vec4 sum = vec4(0.0);
    sum += texture2D(CC_Texture0, blurCoordinates[0]) * 4.0;
    sum += texture2D(CC_Texture0, blurCoordinates[1]) * 2.0;
    sum += texture2D(CC_Texture0, blurCoordinates[2]) * 2.0;
    sum += texture2D(CC_Texture0, blurCoordinates[3]) * 2.0;
    sum += texture2D(CC_Texture0, blurCoordinates[4]) * 2.0;
    sum += texture2D(CC_Texture0, blurCoordinates[5]) * 1.0;
    sum += texture2D(CC_Texture0, blurCoordinates[6]) * 1.0;
    sum += texture2D(CC_Texture0, blurCoordinates[7]) * 1.0;
    sum += texture2D(CC_Texture0, blurCoordinates[8]) * 1.0;
    sum /= 16.0; 
    gl_FragColor = sum;
}
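The 3*3 kernel above is the outer product of the 1D kernel 1-2-1 (divided by 4) with itself, so each pass applies a 1-2-1 blur along both axes. Applying it repeatedly is the same as applying one wider kernel: two passes give 1-4-6-4-1, three passes give 1-6-15-20-15-6-1, and so on, which approaches a Gaussian whose variance grows by 0.5 texels squared per pass. A minimal CPU sketch of that composition, with illustrative function names:

```cpp
#include <cstddef>
#include <vector>

// Full discrete convolution of two 1D kernels.
std::vector<long> convolve(const std::vector<long>& a, const std::vector<long>& b)
{
    std::vector<long> out(a.size() + b.size() - 1, 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            out[i + j] += a[i] * b[j];
    return out;
}

// Effective 1D kernel (unnormalized) after applying the 1-2-1 blur `passes` times.
std::vector<long> repeatedKernel(int passes)
{
    const std::vector<long> base = {1, 2, 1};
    std::vector<long> k = {1};            // identity: zero passes leave the image unchanged
    for (int p = 0; p < passes; ++p)
        k = convolve(k, base);
    return k;
}
```

The kernel after n passes has 2n+1 taps, so the blur radius grows by one texel per frame while each frame still costs only 9 texture reads per pixel.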


Again, the full C++ and GLSL source code can be found on GitHub.

See the preview:
enter image description here

Gynophore answered 21/10, 2017 at 10:25 Comment(13)
A quick description of why the "approximation" of separating X and Y works (Gaussian blurs are separable; the only loss is from rounding), and why repeatedly doing small blurs works (Gaussian blur kernels "multiply" into larger Gaussian blur kernels). As an aside, your "Optimized" one appears to have bugs, as that doesn't look like any Gaussian blur I've seen.Sweeten
@Yakk Do you mean that I improve the answer if I delete the optimized one?Gynophore
@Gynophore No, I mean I suspect the optimized one has an implementation bug. Maybe in your test suite or in your implementation. I mean, it looks like a diagonal-only ring effect, not a Gaussian blur. In my experience, getting a kernel wrong so that a Gaussian blur becomes a diagonal blur seems more likely than that website's optimized blur being that bad. The samples shown on the linked page don't have the same kind of artifacts.Sweeten
@Yakk Yes, I know. But the diagonal-only ring effect is because of the vertex shader, which I copied from the page I referred to. I'll search for a better implementation.Gynophore
@Gynophore Did you have texture interpolation on? Yes, the sample positions do seem a bit asymmetric. Oh wait -- the post says it is a two-pass algorithm. Try a second pass with texelHeightOffset passed in negated.Sweeten
@Yakk Oh. Of course you are right. I have to change my answer. This part of the answer is completely unimportant and wrong. So I'll remove it for now.Gynophore
@Gynophore I would disagree; the optimized version may be the only one "fast enough" for old, low-end devices. Doing a radius-4 blur using the progressive one requires 9*4=36 pixel read/writes, 9*2=18 pixel read/writes using the layered one, and 5*2=10 using the optimized one. And the optimized one claims that calculated pixel samples in the shader are slow, which means it could easily be more than 2x faster than the layered one. KAMIKAZE has a video showing the solutions above aren't fast enough on low-end hardware.Sweeten
@Impi Possibly the error you get is caused by the hardware limit on the size of varying parameters which can be passed from one shader stage to the next (from vertex to fragment). This is what GL_MAX_VARYING_VECTORS should handle. See my last commit.Gynophore
Nice. But I suspect that the lower quality of the optimized blur may be because of the lack of hardware interpolation in the sampling function. The math of the optimized blur seems to indicate it doesn't have a quality loss; and I'd expect that if we lacked interpolation we'd get artifacts like your image above. I lack sufficient skill in cocos2d to see where that would go, I'm just basing it off a black-box understanding of what the linked optimized code is doing. (As an aside, this is rockstar and above and beyond! Great answer Rabbid76)Sweeten
@Gynophore All this time I've investigated about this blur, never stopped.. I even asked some game developers.. got some responses. So, from my point of view result is: creation of live fast blur for iOS and android is like impossible, but mostly because of android devices - low performance.Impi
@Gynophore Actually, you really helped me, best ever of anyone helped me for all my life :D I don't even know how to thank you for all that in the best way..Impi
But what I wanted also to say - it's only possible to create for static sprite image. This is example of blur for static image from very low performance android device. Can I ask you a last request about this question? Like a new blur method just for static sprite? I can just render all screen after my game scene fully freezes at pause dialog and blur background, however - gradually. Well, same as on gif before.Impi
@Gynophore I understand that. Family is the most important thing. If you suddenly have time.. I'll be here always to wait for the answer. My game still under development..Impi

© 2022 - 2024 — McMap. All rights reserved.