How can I calculate the FLOPS of my application? If I had the total number of executed floating-point instructions, I could divide it by the execution time. But how do I count the instructions that were actually executed?
My question is general, and an answer for any language would be highly appreciated. However, I am specifically looking for a solution for my application, which is written in C/C++ and CUDA.
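To show the division I have in mind, here is a minimal CPU-only sketch for a loop whose operation count can be worked out by hand. The function names `flops`, `saxpy_flop_count`, and `measure_saxpy_flops` are made up for illustration, and the count of 2 operations per element is my own assumption for this particular loop (one multiply and one add, ignoring any fused multiply-add the compiler might emit).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// FLOPS = floating-point operations / elapsed seconds.
double flops(double op_count, double seconds) {
    assert(seconds > 0.0);
    return op_count / seconds;
}

// Hand-counted cost of y[i] = a * x[i] + y[i]:
// one multiply plus one add per element (an assumption, not a measurement).
double saxpy_flop_count(std::size_t n) {
    return 2.0 * static_cast<double>(n);
}

// Times a plain CPU SAXPY loop and returns the achieved FLOPS.
double measure_saxpy_flops(std::size_t n) {
    assert(n > 0);
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    const float a = 3.0f;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
    volatile float sink = y[n / 2];  // read a result so the loop is not optimized away
    (void)sink;
    const auto stop = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(stop - start).count();
    return flops(saxpy_flop_count(n), seconds);
}
```

Calling `measure_saxpy_flops(1 << 24)` from `main` and printing the result gives a number for this one loop. It only works because I know the cost per element in advance; for my real application I have no such count, and that is the part I am asking about.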
I am not sure whether the tags are appropriate; please correct them if they are wrong.