I am writing a feed forward net in VC++ using AVX intrinsics. I am invoking this code via PInvoke in C#. My performance when calling a function that calculates a large loop including the function exp() is ~1000ms for a loopsize of 160M. As soon as I call any function that uses AVX intrinsics, and then subsequently use exp(), my performance drops to about ~8000ms for the same operation. Note that the function calculating the exp() is standard C, and the call that uses the AVX intrinsics can be completely unrelated in terms of data being processed. Some kind of flag is getting tripped somewhere at runtime.
In other words,
A(); // 1000ms calculates 160M exp()
B(); // completely unrelated but contains AVX
A(); // 8000ms
or, curiously,
C(); // contains 128 bit SSE SIMD expressions
A(); // 1000ms
I am lost as to what possible mechanism is going on here, or how to pursue a sol'n. I'm on an Intel 2500K cpu\Win 7. Express versions of VS.
Thanks.