MPI matrix-vector multiplication sometimes returns correct values, sometimes garbage

I have the following code:

//Start MPI...
MPI_Init(&argc, &argv);

int size = atoi(argv[1]);
int delta = 10;
int rnk;
int p;
int root = 0;

MPI_Status mystatus;
MPI_Comm_rank(MPI_COMM_WORLD, &rnk);
MPI_Comm_size(MPI_COMM_WORLD, &p);

//Checking compatibility of size and number of processors
assert(size % p == 0);

//Initialize vector...
double *vector = NULL;
vector = malloc(size*sizeof(double));
double *matrix = NULL;

//Rank 0 -----------------------------------
if (rnk == 0) {

    //Initialize vector...
    srand(1);
    for (int i = 0; i < size; i++) {
        vector[i] = rand() % delta + 1;
    }
    printf("Initial vector:");
    print_vector(vector, size);

    //Initialize matrix...
    matrix = malloc(size*size*sizeof(double));
    srand(2);
    for (int i = 0; i < (size*size); i++) {
        matrix[i] = rand() % delta + 1;
    }

    //Print matrix...
    printf("Initial matrix:");
    print_flat_matrix(matrix, size);

}

//Calculating chunk_size...
int chunk_size = size/p;

//Initialize submatrix..
double *submatrix = malloc(size*chunk_size*sizeof(double));

//Initialize result vector...
double *result = malloc(chunk_size*sizeof(double));

//Broadcasting vector...
MPI_Bcast(vector, size, MPI_DOUBLE, root, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);

//Scattering matrix...
MPI_Scatter(matrix, (size*chunk_size), MPI_DOUBLE, submatrix, (size*chunk_size), MPI_DOUBLE, root, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);

printf("I am rank %d and first element of my vector is: %f and of my matrix1: %f/matrix2: %f/matrix3: %f/matrix4: %f\n", rnk, vector[0], submatrix[0], submatrix[1], submatrix[2], submatrix[3]);

//Calculating...
for (int i = 0; i < chunk_size; i++) {
    for (int j = 0; j < size; j++) {
        result[i] += (submatrix[(i*size)+j] * vector[j]);
        printf("Rank %d; current result: %f, ", rnk, result[i]);
    }
    printf("\n");
    printf("Rank %d; result: %f...\n", rnk, result[i]);
}

printf("Rank: %d; first result: %f\n", rnk, result[0]);


double *final_result = NULL;
//Rank 0 -----------------------------------
if (rnk == 0) {
    final_result = malloc(size*sizeof(double));
}

//Gather...
MPI_Gather(result, chunk_size, MPI_DOUBLE, final_result, chunk_size, MPI_DOUBLE, root, MPI_COMM_WORLD);


//Rank 0 -----------------------------------
if (rnk == 0) {
    printf("Final result:\n");
    print_vector(final_result, size);

    free(matrix);
    free(final_result);
}

free(submatrix);
free(result);
free(vector);

MPI_Finalize();

The program runs to completion without errors, but the values I print at the end aren't always correct. Sometimes the result vector is entirely correct, sometimes it's only partially correct, and sometimes it's completely wrong. The wrong values are either off by exactly 2 or are some very long, useless string of digits (which makes me suspect an out-of-bounds memory access, but I can't find one, and it's strange that it sometimes works).

I also always choose size so that it matches the number of processes MPI creates. MPI creates 4 processes on my machine (tested and checked), so for testing my algorithm I've always chosen 4 as the value for size. The same problem also occurs with larger sizes.

Looking forward to your help and input, thanks in advance!

PS: I'm working in C.

Glabrate answered 2/3, 2015 at 14:7 Comment(4)
Why do you use a Barrier before the Gather? – Megasporophyll
Oops... they shouldn't actually be there. Removing them changes nothing; I know they do nothing right before the Gather. – Glabrate
Do you take into account that size is not necessarily divisible by p, and that part of your matrix might therefore not be processed? – Megasporophyll
Yes I do; for testing purposes I always set size to a value divisible by p. In my case that's 4, the smallest possibility on my quad-core machine (I also verified that MPI starts 4 processes for me). – Glabrate

Are you familiar with valgrind? It will draw your attention to the problematic line straight away.
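
For example, assuming your compiled binary is called matvec (the name here is just a placeholder) and you run with 4 processes, each rank can be run under its own valgrind instance:

mpirun -np 4 valgrind --track-origins=yes ./matvec 4

--track-origins=yes makes valgrind report where an uninitialised value originally came from, which in this case points straight at the malloc'd buffer.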

Your trouble appears to be this line:

result[i] += (submatrix[(i*size)+j] * vector[j]);

What was result[] initially? It was pulled off the heap. Sometimes, if you are lucky, it will be zero. Don't count on luck with C.

There are many ways to initialize the array. Here are a few approaches, roughly in order from most to least likely to be optimized:

Allocate result[] with calloc:

double *result = calloc(chunk_size, sizeof(double));

Or, initialize the array with memset:

double *result = malloc(chunk_size * sizeof(double));
memset(result, 0, chunk_size * sizeof(double));

Or, one could loop over the array:

for (int i = 0; i < chunk_size; i++)
    result[i] = 0.0;
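
Putting it together, here is a minimal sketch of the compute part of your program with the calloc fix applied (variable names are taken from your question; nothing else is changed):

//Allocate the local result zero-initialized so += starts from 0...
double *result = calloc(chunk_size, sizeof(double));

//Local matrix-vector product on this rank's rows...
for (int i = 0; i < chunk_size; i++) {
    for (int j = 0; j < size; j++) {
        result[i] += submatrix[(i*size)+j] * vector[j];
    }
}
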
Estes answered 2/3, 2015 at 16:1 Comment(3)
Yeah, thanks so much, this made it work!!! You saved my day! With all my printf statements I had already figured out that the line you mentioned must be causing the problem, but I couldn't figure out why. Thank you very much! – Glabrate
So basically, if I understand you correctly, I could still allocate my memory with malloc() if I then, for example, set result[i] = 0 before starting to add up? - Answered my own question: of course it would work, because that's exactly the difference between malloc and calloc; calloc() zero-initializes the memory it allocates, whereas malloc() doesn't. – Glabrate
Sure. calloc guarantees the buffer you get back is set to zero (the "c" is for "clear", I guess?). You can certainly set it to zero yourself in a loop, but memset will likely do so faster. In other situations (though probably not this one), you might want to initialize the array to some crazy value so you can tell whether code that was supposed to overwrite values in the array actually did so. – Estes
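
A minimal sketch of that debugging trick, reusing chunk_size from the question and using NAN as the "crazy value" (any value the computation would never produce works just as well):

//NAN comes from <math.h>; fill the buffer so any slot the loop never wrote to is obvious in the output.
double *result = malloc(chunk_size * sizeof(double));
for (int i = 0; i < chunk_size; i++)
    result[i] = NAN;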
