Making C# mandelbrot drawing more efficient
Asked Answered
R

3

9

First of all, I am aware that this question really sounds as if I didn't search, but I did, a lot.

I wrote a small Mandelbrot drawing code for C#, it's basically a windows form with a PictureBox on which I draw the Mandelbrot set.

My problem is, is that it's pretty slow. Without a deep zoom it does a pretty good job and moving around and zooming is pretty smooth, takes less than a second per drawing, but once I start to zoom in a little and get to places which require more calculations it becomes really slow.

On other Mandelbrot applications my computer does really fine on places which work much slower in my application, so I'm guessing there is much I can do to improve the speed.

I did the following things to optimize it:

  • Instead of using the SetPixel GetPixel methods on the bitmap object, I used LockBits method to write directly to memory which made things a lot faster.

  • Instead of using complex number objects (with classes I made myself, not the built-in ones), I emulated complex numbers using 2 variables, re and im. Doing this allowed me to cut down on multiplications because squaring the real part and the imaginary part is something that is done a few time during the calculation, so I just save the square in a variable and reuse the result without the need to recalculate it.

  • I use 4 threads to draw the Mandelbrot, each thread does a different quarter of the image and they all work simultaneously. As I understood, that means my CPU will use 4 of its cores to draw the image.

  • I use the Escape Time Algorithm, which as I understood is the fastest?

Here is my how I move between the pixels and calculate, it's commented out so I hope it's understandable:

        //Pixel by pixel loop:
        for (int r = rRes; r < wTo; r++)
        {
            for (int i = iRes; i < hTo; i++)
            {

                //These calculations are to determine what complex number corresponds to the (r,i) pixel.
                double re = (r - (w/2))*step + zeroX ;
                double im = (i - (h/2))*step - zeroY;

                //Create the Z complex number
                double zRe = 0;
                double zIm = 0;

                //Variables to store the squares of the real and imaginary part.
                double multZre = 0;
                double multZim = 0;

                //Start iterating the with the complex number to determine it's escape time (mandelValue)
                int mandelValue = 0;
                while (multZre + multZim < 4 && mandelValue < iters)
                {
                    /*The new real part equals re(z)^2 - im(z)^2 + re(c), we store it in a temp variable
                    tempRe because we still need re(z) in the next calculation
                        */
                    double tempRe = multZre - multZim + re; 

                    /*The new imaginary part is equal to 2*re(z)*im(z) + im(c)
                        * Instead of multiplying these by 2 I add re(z) to itself and then multiply by im(z), which
                        * means I just do 1 multiplication instead of 2.
                        */
                    zRe += zRe; 
                    zIm = zRe * zIm + im;

                    zRe = tempRe; // We can now put the temp value in its place.

                    // Do the squaring now, they will be used in the next calculation.
                    multZre = zRe * zRe; 
                    multZim = zIm * zIm; 

                    //Increase the mandelValue by one, because the iteration is now finished.
                    mandelValue += 1;
                }


                //After the mandelValue is found, this colors its pixel accordingly (unsafe code, accesses memory directly):
                //(Unimportant for my question, I doubt the problem is with this because my code becomes really slow
                //    as the number of ITERATIONS grow, this only executes more as the number of pixels grow).
                Byte* pos = px + (i * str) + (pixelSize * r);
                byte col = (byte)((1 - ((double)mandelValue / iters)) * 255);
                pos[0] = col;
                pos[1] = col;
                pos[2] = col;

            }
        }

What can I do to improve this? Do you find any obvious optimization problems in my code?

Right now there are 2 ways I know I can improve it:

  1. I need to use a different type for numbers, double is limited with accuracy and I'm sure there are better non-built-in alternative types which are faster (they multiply and add faster) and have more accuracy, I just need someone to point me where I need to look and tell me if it's true.

  2. I can move processing to the GPU. I have no idea how to do this (OpenGL maybe? DirectX? is it even that simple or will I need to learn a lot of stuff?). If someone can send me links to proper tutorials on this subject or tell me in general about it that would be great.

Thanks a lot for reading that far and hope you can help me :)

Royce answered 1/7, 2013 at 14:7 Comment(1)
float is usually faster, although I think it depends what processor you use. float is typically faster than double if you use a gpu.Turquoise
T
5

If you decide to move the processing to the gpu, you can choose from a number of options. Since you are using C#, XNA will allow you to use HLSL. RB Whitaker has the easiest XNA tutorials if you choose this option. Another option is OpenCL. OpenTK comes with a demo program of a julia set fractal. This would be very simple to modify to display the mandlebrot set. See here Just remember to find the GLSL shader that goes with the source code.

About the GPU, examples are no help for me because I have absolutely no idea about this topic, how does it even work and what kind of calculations the GPU can do (or how is it even accessed?)

Different GPU software works differently however ...

Typically a programmer will write a program for the GPU in a shader language such as HLSL, GLSL or OpenCL. The program written in C# will load the shader code and compile it, and then use functions in an API to send a job to the GPU and get the result back afterwards.

Take a look at FX Composer or render monkey if you want some practice with shaders with out having to worry about APIs.

If you are using HLSL, the rendering pipeline looks like this.

pipeline

The vertex shader is responsible for taking points in 3D space and calculating their position in your 2D viewing field. (Not a big concern for you since you are working in 2D)

The pixel shader is responsible for applying shader effects to the pixels after the vertex shader is done.

OpenCL is a different story, its geared towards general purpose GPU computing (ie: not just graphics). Its more powerful and can be used for GPUs, DSPs, and building super computers.

Turquoise answered 13/8, 2013 at 7:18 Comment(0)
Q
3

WRT coding for the GPU, you can look at Cudafy.Net (it does OpenCL too, which is not tied to NVidia) to start getting an understanding of what's going on and perhaps even do everything you need there. I've quickly found it - and my graphics card - unsuitable for my needs, but for the Mandelbrot at the stage you're at, it should be fine.

In brief: You code for the GPU with a flavour of C (Cuda C or OpenCL normally) then push the "kernel" (your compiled C method) to the GPU followed by any source data, and then invoke that "kernel", often with parameters to say what data to use - or perhaps a few parameters to tell it where to place the results in its memory.

When I've been doing fractal rendering myself, I've avoided drawing to a bitmap for the reasons already outlined and deferred the render phase. Besides that, I tend to write massively multithreaded code which is really bad for trying to access a bitmap. Instead, I write to a common store - most recently I've used a MemoryMappedFile (a builtin .Net class) since that gives me pretty decent random access speed and a huge addressable area. I also tend to write my results to a queue and have another thread deal with committing the data to storage; the compute times of each Mandelbrot pixel will be "ragged" - that is to say that they will not always take the same length of time. As a result, your pixel commit could be the bottleneck for very low iteration counts. Farming it out to another thread means your compute threads are never waiting for storage to complete.

I'm currently playing with the Buddhabrot visualisation of the Mandelbrot set, looking at using a GPU to scale out the rendering (since it's taking a very long time with the CPU) and having a huge result-set. I was thinking of targetting an 8 gigapixel image, but I've come to the realisation that I need to diverge from the constraints of pixels, and possibly away from floating point arithmetic due to precision issues. I'm also going to have to buy some new hardware so I can interact with the GPU differently - different compute jobs will finish at different times (as per my iteration count comment earlier) so I can't just fire batches of threads and wait for them all to complete without potentially wasting a lot of time waiting for one particularly high iteration count out of the whole batch.

Another point to make that I hardly ever see being made about the Mandelbrot Set is that it is symmetrical. You might be doing twice as much calculating as you need to.

Quelpart answered 9/8, 2013 at 13:22 Comment(1)
kluge.in-chemnitz.de/documents/fractal/node9.html The answers are out there :) Chaos does not mean Random, there is a high degree of predictability in the Mandelbrot set.Quelpart
S
1

For moving the processing to the GPU, you have lots of excellent examples here:

https://www.shadertoy.com/results?query=mandelbrot

Note that you need an WebGL capable browser to view that link. Works best in Chrome.

I'm no expert on fractals but you seem to have come far already with the optimizations. Going beyond that may make the code much harder to read and maintain so you should ask yourself it is worth it.

One technique I've often observed in other fractal programs is this: While zooming, calculate the fractal at a lower resolution and stretch it to full size during render. Then render at full resolution as soon as zooming stops.

Another suggestion is that when you use multiple threads you should take care that each thread don't read/write memory of other threads because this will cause cache collisions and hurt performance. One good algorithm could be split the work up in scanlines (instead of four quarters like you did now). Create a number of threads, then as long as there as lines left to process, assign a scanline to a thread that is available. Let each thread write the pixel data to a local piece of memory and copy this back to main bitmap after each line (to avoid cache collisions).

Sleety answered 1/7, 2013 at 17:13 Comment(3)
Thanks a lot for taking the time to answer :) About the GPU, examples are no help for me because I have absolutely no idea about this topic, how does it even work and what kind of calculations the GPU can do (or how is it even accessed?). I was hoping for something with basic information first. About the further optimizations, I don't mind about code readability. The low resolution zooming is something I've considered but I hoped maybe there were other things I can do first.Royce
About the cache collisions: I don't really get it, why would there be cache collisions? If I make sure every thread writes exactly to the memory it should would there still be cache collisions? Why are scan-lines a better option (aren't they just another way to split up the image?)Royce
@Omer Scanlines are good because they are one continuous block in memory, which again is good for CPU cache. It is always best to write in continuous memory (this is why it is better to traverse pixels in y/x order instead of x/y). Collisions occur because caches overlap, several threads can have the same 4096 (say) bytes memory in cache so they collide even when they write different parts of that memory.Sleety

© 2022 - 2024 — McMap. All rights reserved.