I have a very frustrating problem. My application has been running flawlessly on a few machines for a month. However, there is one machine on which it crashes nearly every day because of a segfault. It always crashes at the same instruction address:
segfault at 7fec33ef36a8 ip 000000000041c16d sp 00007fec50a55c80 error 6 in myapp[400000+f8000]
This address points to a memcpy call.
Below is excerpt #1 from my app:
....
uint32_t size = messageSize - sizeof(uint64_t) + 1;
stack->trcData = (char*)Realloc(stack->trcData,(stack->trcSize + size + sizeof(uint32_t)));
char* buffer = stack->trcData + stack->trcSize;
uint32_t n_size = htonl(size);
memcpy(buffer,&n_size,sizeof(uint32_t)); /* ip 000000000041c16d points here*/
buffer += sizeof(uint32_t);
....
stack->trcSize += size + sizeof(uint32_t);
....
where stack is a structure:
struct Stack{
char* trcData;
uint32_t trcSize;
/* ... some other elements */
};
and Realloc is a realloc wrapper:
#define Realloc(x,y) _Realloc((x),(y),__LINE__)
void* _Realloc(void* ptr, size_t size, int line){
    void *tmp = realloc(ptr, size);
    if(tmp == NULL){
        /* %zu is the matching conversion for size_t */
        fprintf(stderr, "R%i: Out of memory: trying to allocate: %zu.\n", line, size);
        exit(EXIT_FAILURE);
    }
    return tmp;
}
messageSize is of type uint32_t and its value is always greater than 44 bytes. Code #1 runs in a loop. stack->trcData is just a buffer which collects data until some condition is fulfilled. stack->trcData is always initialized to NULL. The application is compiled with gcc with -O3 optimization enabled. When I ran it in gdb, of course it did not crash, as I expected ;)
I have run out of ideas as to why myapp crashes during the memcpy call. Realloc returns with no error, so I guess it allocated enough space and I can write to this area. Running Valgrind with
valgrind --leak-check=full --track-origins=yes --show-reachable=yes myapp
shows absolutely no invalid reads/writes.
Is it possible that the memory itself is faulty on this particular machine and that this causes the frequent crashes? Or maybe I corrupt memory somewhere else in myapp, but if that is the case, why does it not crash earlier, when the invalid write is made?
Thanks in advance for any help.
Assembly piece:
41c164: 00
41c165: 48 01 d0 add %rdx,%rax
41c168: 44 89 ea mov %r13d,%edx
41c16b: 0f ca bswap %edx
41c16d: 89 10 mov %edx,(%rax)
41c16f: 0f b6 94 24 47 10 00 movzbl 0x1047(%rsp),%edx
41c176: 00
I'm not sure whether this information is relevant, but all the machines on which my application runs successfully have Intel processors, whilst the one causing the problem has an AMD processor.
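A note on reading the disassembly above: the faulting instruction at 41c16d is mov %edx,(%rax), a plain 4-byte store preceded by bswap. That is consistent with gcc at -O3 inlining the fixed-size memcpy and expanding htonl in place, so the fault would be a write through a bad buffer address rather than a crash inside the library memcpy. The sketch below shows source that typically compiles to this pattern on little-endian x86-64; the function name store_len_be32 is hypothetical and the exact output depends on the compiler version and flags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> /* htonl (POSIX) */

/* Hypothetical helper mirroring the memcpy line from the excerpt:
 * writes size at buffer as a 4-byte big-endian length prefix. */
void store_len_be32(char *buffer, uint32_t size)
{
    assert(buffer != NULL);
    /* On little-endian x86-64 this is usually a single bswap. */
    uint32_t n_size = htonl(size);
    /* A copy with a small constant length is usually inlined to one mov. */
    memcpy(buffer, &n_size, sizeof n_size);
}
```

If buffer points outside any mapped region, the single store faults, which would match an ip that lands on the mov rather than on a call instruction.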
stack->trcData initially? How/where is messageSize set? Your segfault could be due to a memory management bug in your code, but you don't have enough pieces here to determine that. – Adalbertoadalheid
stack->trcData is set to NULL initially. A value is assigned to messageSize and it's always checked. – Crabstick
memcpy would crash at the call site. – Hyper
Realloc() is called with only two instead of three parameters. Is that the case in the original code as well? – Thorny
Realloc is a macro. I've just edited my post. – Crabstick
void *_Realloc() in your code? Thanks to the cast you have made, the code would compile without it as well. But on some 64-bit architectures you would only store the last four bytes of the eight-byte address in stack->trcData – Thorny
_Realloc in the code. – Crabstick
htonl since stack->trcData is eventually sent to another application over the network. On the other side of the communication, the size is decoded by calling ntohl. – Crabstick
messageSize has a value making stack->trcSize + size + sizeof(uint32_t) = 0? Making realloc return NULL? (For example with messageSize = 4 and trcSize = 0, if my calculations are correct...) – Assertion
messageSize is always greater than 44 bytes. Its value is always checked before the Realloc call. – Crabstick
stack->trcData before and after the Realloc() call in an instance where it crashes? What is the value of the rax register when it crashes? What are all of the regions of memory mapped into your program when it crashes (cat /proc/<PID>/maps)? – Mylan
MALLOC_CHECK_ to 1 (errors go to stderr), 2 (error calls abort()), or 3 (error is printed to stderr and calls abort()). – Erythrocyte
MALLOC_CHECK_ gave no extra information. The application crashed exactly the same as before. – Crabstick
memcpy malfunction. Even better, replace the memcpy line by this statement: *((uint32_t*)buffer) = htonl(size) – Favrot
stack->trcSize + size exceeds UINT32_MAX. That means Realloc in fact shrinks stack->trcData. Next, I define buffer, which now points far beyond the allocated area. Hence, when I write to buffer I get a segfault. What do you think? – Crabstick
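The overflow hypothesis in the last comment can be checked in isolation. The sketch below assumes the common case where uint32_t is unsigned int, so trcSize + size is evaluated in 32 bits and wraps before sizeof(uint32_t) widens the sum to size_t. The names question_alloc_size and trace_append, and the payload parameter, are hypothetical and only illustrate one possible guard; they do not come from the original application.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h> /* htonl (POSIX) */

/* Same fields as the Stack struct in the question (other members omitted). */
struct Stack {
    char    *trcData;
    uint32_t trcSize;
};

/* Reproduces the size expression passed to Realloc in the question.
 * trcSize + size wraps modulo 2^32 before sizeof() widens the result. */
size_t question_alloc_size(uint32_t trcSize, uint32_t size)
{
    return trcSize + size + sizeof(uint32_t);
}

/* Hypothetical checked append: writes a 4-byte big-endian length followed
 * by size payload bytes. Returns 0 on success, -1 if the new total would
 * not fit in uint32_t or if realloc fails; the stack is unchanged on -1. */
int trace_append(struct Stack *stack, const void *payload, uint32_t size)
{
    assert(stack != NULL);
    /* Do the arithmetic in 64 bits so it cannot wrap. */
    uint64_t total = (uint64_t)stack->trcSize + size + sizeof(uint32_t);
    if (total > UINT32_MAX)
        return -1;
    char *tmp = realloc(stack->trcData, (size_t)total);
    if (tmp == NULL)
        return -1;
    stack->trcData = tmp;
    char *buffer = stack->trcData + stack->trcSize;
    uint32_t n_size = htonl(size);
    memcpy(buffer, &n_size, sizeof n_size);
    if (size > 0)
        memcpy(buffer + sizeof n_size, payload, size);
    stack->trcSize = (uint32_t)total;
    return 0;
}
```

For example, question_alloc_size(0xFFFFFFF0, 0x20) evaluates to 0x14, so realloc would be asked for 20 bytes while the write offset trcSize is still close to 4 GiB. The checked variant refuses that append instead. Whether to flush the buffer, widen trcSize to size_t, or report an error at that point is a design choice left open here.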