An SSCCE would be very helpful.
That said, I'll try to answer as well as I can:
I have a malloc'd array of integers that I fill with MPI_Recv
MPI_Recv(d.current, n, MPI_INT, 0, TAG_CURRENT_ARRAY, MPI_COMM_WORLD, &status);
How large is that array? How exactly did you malloc() it? What is n in this case, and how is it related to the malloc()ed size?
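A frequent cause of exactly this symptom, assuming it applies to your code, is mixing up bytes and elements: the count passed to MPI_Recv() is a number of MPI_INT elements, not bytes, so the buffer must be n * sizeof(int) bytes long. A minimal sketch; alloc_int_buffer() and recv_bytes_for_ints() are hypothetical helpers, not part of MPI:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n ints, i.e. n * sizeof(int) bytes.
 * MPI_Recv(buf, n, MPI_INT, ...) may write up to that many bytes. */
int *alloc_int_buffer(size_t n)
{
    assert(n > 0);
    /* calloc takes the element count and element size separately,
     * so the byte size cannot be forgotten. */
    return calloc(n, sizeof(int));
}

/* Hypothetical helper: bytes a receive of `count` MPI_INT elements may write,
 * assuming MPI_INT corresponds to the C type int. */
size_t recv_bytes_for_ints(size_t count)
{
    return count * sizeof(int);
}
```

With `d.current = malloc(n)` instead, receiving n elements would write n * sizeof(int) bytes into an n-byte block; with a 4-byte int that overruns it by 3n bytes.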
Your observations show that MPI_Recv() causes this error. For it to occur, MPI_Recv() must have written beyond the end of the malloc()ed memory area, which it isn't allowed to do. This corrupts the linked list used internally by the memory manager, the size fields of the blocks behind yours, or both, which leads to the error you see.
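If you want to confirm this diagnosis, one common debugging trick is a guard zone ("canary") behind the array: allocate a few extra bytes, fill them with a known pattern, and check the pattern after the receive. The sketch below is hypothetical (guarded_alloc(), guard_intact(), GUARD_BYTES and GUARD_VALUE are made-up names); you would call `MPI_Recv(buf, n, ...)` and then `guard_intact(buf, n)`. It only catches writes that land in the guard zone; Valgrind's memcheck or AddressSanitizer (`-fsanitize=address`) check more thoroughly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 16   /* hypothetical size of the guard zone */
#define GUARD_VALUE 0xA5 /* hypothetical fill pattern */

/* Allocate n ints plus a guard zone filled with GUARD_VALUE right behind them.
 * The returned pointer is the start of the block, so free(buf) works as usual. */
int *guarded_alloc(size_t n)
{
    assert(n > 0);
    unsigned char *raw = malloc(n * sizeof(int) + GUARD_BYTES);
    if (raw == NULL)
        return NULL;
    memset(raw + n * sizeof(int), GUARD_VALUE, GUARD_BYTES);
    return (int *)raw;
}

/* Returns true if no byte of the guard zone behind the n ints was modified. */
bool guard_intact(const int *buf, size_t n)
{
    const unsigned char *guard = (const unsigned char *)buf + n * sizeof(int);
    for (size_t i = 0; i < GUARD_BYTES; i++)
        if (guard[i] != GUARD_VALUE)
            return false;
    return true;
}
```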
I have tested the value of d.current both before and after the MPI_Recv and it doesn't change (which is correct).
(How could it? You pass the pointer itself to the function, not its address, so MPI_Recv() cannot change the pointer; it can only change the data it points to.)
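A small illustration of that point: C passes arguments by value, so a function gets its own copy of the pointer. fill_buffer() is a hypothetical stand-in for MPI_Recv(), not its real implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MPI_Recv(): it receives a copy of the pointer.
 * Writing through the copy changes the caller's data; reassigning the copy
 * does not change the caller's pointer variable. */
void fill_buffer(int *buf, int n)
{
    assert(buf != NULL);
    for (int i = 0; i < n; i++)
        buf[i] = i * 10;   /* visible to the caller */
    buf = NULL;            /* only the local copy becomes NULL */
}
```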
However if I try to free the data I get an error:
* Error in `./bin/obddhe-mpi': free(): invalid next size (fast): 0x0965e988 *
The exact same free before the receive works perfectly.
That is another clue supporting what I wrote above: the memory behind the block you use is free and contains a pointer to the next free area. When you free() your memory, the library tries to merge the adjacent free blocks; the second of these is corrupt, which leads to this error.
Imagine you have the following situation:
- Your memory manager prepends each memory block, be it free or allocated, with its length.
- Free blocks store the address of the next free block at their start; this is the linked list I mentioned.
- Your allocated block, prepended with its length, is followed by
- a free block, prepended with its length and containing the address of the next free block, or NULL if there is none.
Then, if you write past the end of your memory block, you overwrite the length and the content of the next block.
This has no visible effect yet.
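The layout described above can be modelled with a toy arena of words. This is a hypothetical sketch of the simplified scheme in this answer, not how glibc actually lays out its chunks; all names (arena, setup_arena(), unchecked_write() and the accessors) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy heap (hypothetical, not glibc); sizes are counted in words:
 *   arena[0]     length header of the allocated block (2)
 *   arena[1..2]  allocated payload, handed to the user
 *   arena[3]     length header of the free block behind it (4)
 *   arena[4]     first payload word of the free block: its "next free" link
 */
#define ARENA_WORDS 16
#define NO_NEXT SIZE_MAX   /* plays the role of NULL for the link */

size_t arena[ARENA_WORDS];

void setup_arena(void)
{
    memset(arena, 0, sizeof arena);
    arena[0] = 2;
    arena[3] = 4;
    arena[4] = NO_NEXT;
}

size_t *user_block(void) { return &arena[1]; }
size_t free_block_size(void) { return arena[3]; }
size_t free_block_next(void) { return arena[4]; }

/* Writes `count` words without checking the user block's length, like a
 * receive into an undersized buffer. It only checks that the write stays
 * inside the toy arena so the demo itself remains well-defined C. */
void unchecked_write(size_t *dst, size_t count, size_t value)
{
    assert((size_t)(dst - arena) + count <= ARENA_WORDS);
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
}
```

Writing 2 words leaves the neighbour alone; writing 4 words silently replaces the free block's length and its "next" link, and nothing complains yet.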
But if you call free() on your block, it will be merged with the free block after it.
In order to do so, the following actions must occur:
- Traverse the linked list to find adjacent free blocks. This alone may already trigger the error, because the "next" pointer of the second free block is garbage.
- Calculate the size of the merged free block from the sizes of the blocks involved. If one of these contains garbage, the garbage goes into the new size, and the heap's bookkeeping is corrupted.
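Continuing the same simplified model, here is a hypothetical toy_free() that merges the block with its free neighbour, but first runs a plausibility check on the neighbour's header and refuses to merge if it is garbage. glibc performs a comparable sanity check on the size of the following chunk, and "invalid next size" is the message it prints when that check fails; the code below is only an illustration of the idea, not glibc's logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy heap (hypothetical, not glibc); sizes in words. arena[ALLOC_HDR] is the
 * allocated block's length, arena[FREE_HDR] the free neighbour's length and
 * arena[FREE_HDR + 1] the neighbour's "next free" link. */
#define ARENA_WORDS 16
#define NO_NEXT SIZE_MAX
#define ALLOC_HDR 0
#define FREE_HDR 3

size_t arena[ARENA_WORDS];

void setup_arena(void)
{
    memset(arena, 0, sizeof arena);
    arena[ALLOC_HDR] = 2;          /* payload arena[1..2] */
    arena[FREE_HDR] = 4;           /* payload arena[4..7] */
    arena[FREE_HDR + 1] = NO_NEXT; /* no further free block */
}

/* Frees the allocated block and merges it with the free block behind it.
 * Returns false (our "invalid next size") if the neighbour looks corrupt. */
bool toy_free(void)
{
    size_t next_size = arena[FREE_HDR];
    size_t next_link = arena[FREE_HDR + 1];

    /* The neighbour must be non-empty and fit in the arena; written as a
     * subtraction so a huge garbage size cannot wrap around. */
    if (next_size == 0 || next_size > ARENA_WORDS - (FREE_HDR + 1))
        return false;
    /* Its link must be NO_NEXT or point inside the arena. */
    if (next_link != NO_NEXT && next_link >= ARENA_WORDS)
        return false;

    /* Merged payload = own payload + neighbour's header word + its payload. */
    arena[ALLOC_HDR] = arena[ALLOC_HDR] + 1 + next_size;
    arena[ALLOC_HDR + 1] = next_link; /* merged free block inherits the link */
    return true;
}
```

On an intact arena the merge yields one free block of 7 payload words; after overflowing the user block by two words, the same call detects the garbage instead of computing a nonsense size.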