OpenCL Intel Iris Integrated Graphics exits with Abort Trap 6: Timeout Issue

I am writing a program that runs Monte Carlo simulations using OpenCL, and I have run into an issue involving exponentials. When the variable steps becomes large (approximately 20000), the calculation of the exponent fails unexpectedly and the program quits with "Abort Trap: 6". This seems bizarre, given that steps should not affect memory allocation. Setting normal, alpha, and beta to 0 does not resolve the problem, but commenting out the exponent and replacing it with the constant 1 does. The same code runs on an AWS GPU instance without any issues. Does anybody have an idea why this might be a problem on an integrated graphics card?

SOLUTION

Execute the kernel multiple times over smaller ranges so that each kernel execution stays under 5 seconds.
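As a rough illustration of splitting the work into smaller ranges, here is a minimal plain-C sketch that computes the sub-ranges for the launches. The names SubRange, launch_count, and sub_range are hypothetical and not part of my program. In real host code, each offset and size pair would presumably be passed as the global_work_offset and global_work_size arguments of clEnqueueNDRangeKernel, with a clFinish after each launch so the launches do not merge into one long submission.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: one kernel launch covering
   work items [offset, offset + size). */
typedef struct {
    size_t offset;
    size_t size;
} SubRange;

/* Number of launches needed to cover `total` work items
   when each launch handles at most `chunk` of them. */
size_t launch_count(size_t total, size_t chunk) {
    assert(chunk > 0);
    return total / chunk + (total % chunk != 0);
}

/* Fill `out` with the index-th sub-range.
   Returns 1 if that sub-range exists, 0 once index is past the end. */
int sub_range(size_t total, size_t chunk, size_t index, SubRange *out) {
    if (index >= launch_count(total, chunk)) {
        return 0;
    }
    out->offset = index * chunk;
    /* The last sub-range may be shorter than `chunk`. */
    size_t remaining = total - out->offset;
    out->size = remaining < chunk ? remaining : chunk;
    return 1;
}
```

With a global offset, get_global_id(0) still returns the offset-adjusted index, so a seed derived from it stays distinct per simulation. The chunk size itself has to be found by experiment on the target device.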

Code Snippet

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static uint MWC64X(uint2 *state) {
  enum { A = 4294883355U };
  uint x = (*state).x, c = (*state).y; 
  uint res = x ^ c;                    
  uint hi = mul_hi(x, A);              
  x = x * A + c;
  c = hi + (x < c);
  *state = (uint2)(x, c); 
  return res;          
}

__kernel void discreteMonteCarloKernel(...) {

  float cumulativeWalk = stockPrice;
  float currentValue = stockPrice;

  ...

  uint n = get_global_id(0);
  uint2 seed2 = (uint2)(n, seed);
  uint random1 = MWC64X(&seed2);
  uint2 seed3 = (uint2)(random1, seed);
  uint random2 = MWC64X(&seed3);

  float alpha = (interestRate - 0.5 * sigma * sigma) * dt;
  float beta = sigma * sqrt(dt);

  float u1;
  float u2;
  float a;
  float b;
  float normal;

  for (int j = 0; j < steps; j++) {

    random1 = MWC64X(&seed2);

    if (random1 == 0) {
      random1 = MWC64X(&seed2);
    }

    random2 = MWC64X(&seed3);

    u1 = (float)random1 / (float)0xffffffff;
    u2 = (float)random2 / (float)0xffffffff;

    a = sqrt(-2 * log(u1));
    b = 2 * M_PI * u2;

    normal = a * sin(b);

    exponent = exp(alpha + beta * normal);
    currentValue = currentValue * exponent;
    cumulativeWalk += currentValue;

    ...
}

Problem Report

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
abort() called

Application Specific Signatures:
Graphics hardware encountered an error and was reset: 0x00000813


Thread 0 Crashed:: Dispatch queue: opencl_runtime
0   libsystem_kernel.dylib          0x00007fffb14bad42 __pthread_kill + 10
1   libsystem_pthread.dylib         0x00007fffb15a85bf pthread_kill + 90
2   libsystem_c.dylib               0x00007fffb1420420 abort + 129
3   libGPUSupportMercury.dylib      0x00007fffa98e6fbf gpusGenerateCrashLog + 158
4   com.apple.driver.AppleIntelHD5000GraphicsGLDriver   0x000000010915f13b gpusKillClientExt + 9
5   libGPUSupportMercury.dylib      0x00007fffa98e7983 gpusQueueSubmitDataBuffers + 168
6   com.apple.driver.AppleIntelHD5000GraphicsGLDriver   0x00000001091aa031 IntelCLCommandBuffer::getNew(GLDQueueRec*) + 31
7   com.apple.driver.AppleIntelHD5000GraphicsGLDriver   0x00000001091a9f99 intelSubmitCLCommands(GLDQueueRec*, unsigned int) + 65
8   com.apple.driver.AppleIntelHD5000GraphicsGLDriver   0x00000001091b00a1 CHAL_INTEL::ChalContext::ChalFlush() + 83
9   com.apple.driver.AppleIntelHD5000GraphicsGLDriver   0x00000001091aa2c3 gldFinishQueue + 43
10  com.apple.opencl                0x00007fff9ffeeb37 0x7fff9ffed000 + 6967
11  com.apple.opencl                0x00007fff9ffef000 0x7fff9ffed000 + 8192
12  com.apple.opencl                0x00007fffa000ccca 0x7fff9ffed000 + 130250
13  com.apple.opencl                0x00007fffa001029d 0x7fff9ffed000 + 144029
14  libdispatch.dylib               0x00007fffb13568fc _dispatch_client_callout + 8
15  libdispatch.dylib               0x00007fffb1357536 _dispatch_barrier_sync_f_invoke + 83
16  com.apple.opencl                0x00007fffa001011d 0x7fff9ffed000 + 143645
17  com.apple.opencl                0x00007fffa000bda6 0x7fff9ffed000 + 126374
18  com.apple.opencl                0x00007fffa00011df clEnqueueReadBuffer + 813
19  simplisticComparison            0x0000000107b953cf BinomialMultiplication::execute(int) + 1791
20  simplisticComparison            0x0000000107b9ec7f main + 767
21  libdyld.dylib                   0x00007fffb138c235 start + 1

Thread 1:
0   libsystem_pthread.dylib         0x00007fffb15a50e4 start_wqthread + 0
1   ???                             0x000070000eed6b30 0 + 123145552751408

Thread 2:
0   libsystem_pthread.dylib         0x00007fffb15a50e4 start_wqthread + 0

Thread 3:
0   libsystem_pthread.dylib         0x00007fffb15a50e4 start_wqthread + 0
1   ???                             0x007865646e496d65 0 + 33888479226719589

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x0000000000000006  rcx: 0x00007fff58074078  rdx: 0x0000000000000000
  rdi: 0x0000000000000307  rsi: 0x0000000000000006  rbp: 0x00007fff580740a0  rsp: 0x00007fff58074078
   r8: 0x0000000000000000   r9: 0x00007fffb140ba50  r10: 0x0000000008000000  r11: 0x0000000000000206
  r12: 0x00007f92de80a7e0  r13: 0x00007f92e0008c00  r14: 0x00007fffba29e3c0  r15: 0x00007f92de801a00
  rip: 0x00007fffb14bad42  rfl: 0x0000000000000206  cr2: 0x00007fffba280128

Logical CPU:     0
Error Code:      0x02000148
Trap Number:     133
Guillerminaguillermo answered 15/5, 2017 at 11:3 Comment(3)
Which/whose OpenCL implementation is this? - Exasperate
Looks like Intel HD5000 (from the stack trace) - Wye
It's OpenCL 1.2 running on Intel HD5000 (Iris) - Guillerminaguillermo

I have a guess. The driver can crash in two ways:

  1. We reference a bad buffer address. This is probably not your case.
  2. We time out (exceed the TDR, the driver's timeout detection and recovery limit). A kernel has a few seconds to complete.

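My money is on #2. If the larger value (steps) makes the GPU run too long, the system resets the GPU and kills the process, which matches the "Graphics hardware encountered an error and was reset" line in your crash report.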

I am not familiar with the guts of Apple's Intel driver, but typically there is a way to disable the TDR in extreme cases. For example, see the Windows documentation on TDRs to get the gist. (Linux drivers have a way to disable this too.)

Normally we want to avoid running kernels that take very long, so it might be a good idea to decompose the workload so that you naturally don't hit this kill switch. For example, split "steps" into smaller chunks, passing in and saving the state for the parts you can't recompute.
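To make the "pass in and save your state" idea concrete, here is a minimal plain-C sketch rather than actual OpenCL code. WalkState, walk_chunk, and run_walk are hypothetical names. The multiplier inside the loop is a simple placeholder standing in for exp(alpha + beta * normal), so the numbers are not a real price path. In an OpenCL version the state would presumably live in a global buffer with one entry per work item, read at the start of each launch and written back at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Per-simulation state that has to survive between launches. */
typedef struct {
    uint32_t x, c;        /* MWC64X generator state */
    float currentValue;   /* running price */
    float cumulativeWalk; /* running sum of prices */
} WalkState;

/* The MWC64X step from the question, written in plain C. */
static uint32_t mwc64x(WalkState *s) {
    const uint32_t A = 4294883355U;
    uint32_t x = s->x, c = s->c;
    uint32_t res = x ^ c;
    uint64_t prod = (uint64_t)x * A;
    uint32_t hi = (uint32_t)(prod >> 32); /* equivalent of mul_hi(x, A) */
    x = (uint32_t)prod + c;               /* wraps modulo 2^32 */
    c = hi + (x < c);
    s->x = x;
    s->c = c;
    return res;
}

/* Advance one simulation by `count` steps, resuming from *s. */
static void walk_chunk(WalkState *s, int count) {
    for (int j = 0; j < count; j++) {
        /* Placeholder multiplier, NOT the real exp(alpha + beta * normal). */
        float u = (float)mwc64x(s) / 4294967296.0f;
        float factor = 1.0f + 1e-4f * (u - 0.5f);
        s->currentValue *= factor;
        s->cumulativeWalk += s->currentValue;
    }
}

/* Host-side loop: run `steps` steps in pieces of at most `chunk`. */
WalkState run_walk(float stockPrice, uint32_t id, uint32_t seed,
                   int steps, int chunk) {
    assert(chunk > 0);
    WalkState s = { id, seed, stockPrice, stockPrice };
    for (int done = 0; done < steps; ) {
        int count = (steps - done < chunk) ? steps - done : chunk;
        walk_chunk(&s, count); /* one "kernel launch" */
        done += count;
    }
    return s;
}
```

Because the generator state and both running values are carried across calls, run_walk(100.0f, 7, 12345, 20000, 1500) walks through the same sequence as a single 20000-step call; only the length of each individual launch changes.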

Wye answered 16/5, 2017 at 2:17 Comment(1)
I've looked at the memory from every angle, so I agree: I strongly doubt that the driver is crashing because of a bad buffer address. I noticed that as the number of Monte Carlo simulations decreases, the number of steps it is able to execute increases. I put a timer around the kernel execution, and the evidence suggests there is a 5 second timeout. Thanks so much for your help! - Guillerminaguillermo
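One way such a timing could be used, sketched in plain C: time a short probe launch, then estimate how many steps fit inside a budget safely below the observed 5 seconds. The name steps_per_launch is hypothetical, and the sketch assumes run time grows linearly with the step count, which I have not verified. On a real OpenCL queue the probe time could come from a wall-clock timer around clFinish, or from clGetEventProfilingInfo with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END if the queue was created with CL_QUEUE_PROFILING_ENABLE.

```c
#include <assert.h>

/* Hypothetical helper: a probe launch of `probeSteps` steps took
   `probeSeconds`. Estimate the largest step count per launch that
   stays under `budgetSeconds`, assuming time is linear in steps. */
int steps_per_launch(int probeSteps, double probeSeconds,
                     double budgetSeconds) {
    assert(probeSteps > 0 && probeSeconds > 0.0 && budgetSeconds > 0.0);
    double fit = budgetSeconds * probeSteps / probeSeconds;
    if (fit < 1.0) {
        return 1;            /* always make some progress */
    }
    if (fit > 2147483647.0) {
        return 2147483647;   /* clamp to the range of int */
    }
    return (int)fit;         /* round down to stay under budget */
}
```

For example, if a 20000-step launch were measured at 6 seconds, a 3 second budget would give 10000 steps per launch. The budget is a judgment call; leaving headroom below the timeout seems safer than aiming right at it.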
