Asked 14/4, 2015 at 21:13 Answered 15/3, 2022 at 11:0

Background

I have to implement Cannon's Algorithm which is a parallel algorithm for multiplying matrices that are square in dimension and the dimension is divisible by the square root of the number of processors. I wrote this code which runs perfectly fine but in life running without crashing is just half the story. It doesn't multiply A x B into a new matrix C, correctly. Here is the code, please help me and steer me to helping me solve what I am doing wrong. As it should be obvious this is homework.

Code

void shift_left(datatype** mat, int s, int row, int n, int amount) {
    datatype* temp_buffer = malloc(sizeof(datatype) * n);
    for(int col = 0; col < n; col++) {
        datatype temp = mat[row][(col+amount)%s];
        temp_buffer[(col+amount)%s] = mat[row][col];
        temp_buffer[col] = temp;
    }
    memcpy(mat[row], temp_buffer, n);
    free(temp_buffer);
}

void shift_up(datatype** mat, int s, int col, int n, int amount) {
    datatype* temp_buffer = malloc(sizeof(datatype) * n);
    for(int row = 0; row < n; row++) {
        datatype temp = mat[(row+amount)%s][col];
        temp_buffer[(row+amount)%s] = mat[row][col];
        temp_buffer[row] = temp;
    }
    memcpy(&mat[0][col], temp_buffer, n);
    free(temp_buffer);
}

void cannon_mul(int p_sqrt,datatype** a, datatype** b, datatype** c, int n) {
    /* 2D matrices and n^2 sized only!*/
    int i = 0, j = 0, k = 0;
    int s = p_sqrt;
    for(i = 0; i < (s-1); i++) {
        shift_left(a, s, i, s-1, i); // Skew matrix a
    }
    for (i = 0; i < (s-1); i++) {
        shift_up(b, s, i, s-1, i); // Skew matrix b
    }
    for(k = 0; k < (s-1); k++) {
        for(i = 0; i < (s-1); i++) {
            for(j = 0; j < (s-1); j++) {
                c[i][j] += a[i][j]*b[i][j];
                shift_left(a, s, i, s-1, 1);
                shift_up(b, s, i, s-1, 1);  
            }                       
        }
    }  
}

What I believe is going wrong?

My hunch is that the shifting is wrong or I am missing an essential portion of the algorithm. My original shift function wasn't using a temp buffer so I figured to use a temp buffer this time but it didn't make a difference. I could show some sample output if it helps but the result is not even remotely close to the desired result. The good news it runs fast :)

Results

1.48 0.14 9.47 8.99 8.06 0.06 6.68 1.04 4.44 7.50
7.26 8.87 2.21 6.27 2.12 7.91 0.65 5.24 0.45 4.94
0.47 4.13 1.87 2.25 6.83 1.52 6.41 9.14 9.22 8.91
7.34 2.70 6.78 2.78 3.51 4.95 5.27 0.85 9.51 6.82
0.28 6.73 0.70 8.88 7.14 9.09 2.36 5.38 6.43 9.00
7.13 6.71 6.92 9.81 5.13 9.35 7.50 5.16 4.68 3.62
1.30 6.26 4.55 4.27 0.51 2.23 3.19 8.75 6.57 9.07
7.49 6.41 1.04 7.78 7.16 2.78 2.25 6.23 9.42 0.32
3.21 3.60 2.04 2.93 4.29 3.88 2.78 8.01 4.57 6.47
7.52 3.77 0.63 5.97 7.32 4.90 9.63 4.90 8.46 1.90

multiplying the matrix above with itself produces with my code this:

2.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
50.81 0.00 0.00 0.00 0.00 87.51 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

The sequential program produces this result:

163.41 212.17 144.32 227.10 251.03 205.60 245.63 277.33 368.99 334.11
257.85 230.82 203.60 314.08 246.02 240.12 228.37 197.90 264.38 228.24
234.13 272.10 110.75 294.84 263.16 242.07 209.54 316.13 339.23 260.51
185.33 215.59 192.26 283.31 270.80 208.38 265.08 291.49 312.24 319.73
313.23 301.95 182.04 348.11 283.20 337.49 266.54 284.57 355.28 281.07
293.25 323.29 281.35 393.92 325.24 313.62 313.48 342.95 418.37 401.91
255.88 238.25 122.17 254.52 243.58 204.49 217.69 273.03 314.89 214.45
219.26 239.07 200.18 309.98 262.21 242.68 190.02 245.85 297.96 308.56
209.03 213.11 126.24 266.48 233.88 199.33 193.28 228.92 277.50 202.27
210.31 264.67 227.59 337.79 261.40 250.35 225.77 295.00 331.92 352.17

It is important to note I am showing relevant portions of my program if you feel I need to show more please do so and I will provide more code. Lastly why is the Homework tag missing?

Edit

One person noted that the buffer was to small and was missing sizeof a foolish mistake and now corrected. I tried and I get the same result, so apparently the problem is nothing to do with that. Hopefully in 2 days I can open a bounty to entice some person to at least give me a clue where the issue is. Its a bug that I can't seem to debug, and my understanding I must admit of this algorithm is zero to null. I am relying on web sources that do very little to increase my understanding.

Edit 2

Tried using calloc to zero allocate the buffer but it doesn't change the results. So strange but thanks for suggesting that; I forget memory doesn't allocate zeroed.

Edit 3

I tried this:

void shift_left(datatype** mat, int s, int row, int n, int amount) {

    datatype* temp_buffer = calloc(n, sizeof(datatype) * n);
    for(int col = 0; col < n; col++) {
        /* temp_buffer[(col+amount)%s] = mat[row][col];
        temp_buffer[col] =  mat[row][(col+amount)%s]; */
        temp_buffer[(col+amount)%s] = 0;
        temp_buffer[col] =  0;
    }
    memcpy(mat[row], temp_buffer, sizeof(datatype) * n);
    //free(temp_buffer); 

}

void shift_up(datatype** mat, int s, int col, int n, int amount) {
    datatype* temp_buffer = calloc(n, sizeof(datatype) * n);
    for(int row = 0; row < n; row++) {
      /* temp_buffer[(row+amount)%s] = mat[row][col];
      temp_buffer[row] = mat[(row+amount)%s][col]; */
      temp_buffer[(row+amount)%s] = 0;
      temp_buffer[row] = 0;
    }
    memcpy(&mat[0][col], temp_buffer, sizeof(datatype) * n);
    free(temp_buffer); 
}

And surprisingly the same result. When it should print all zero since I commented the code and replaced it with zero. My guess memcpy isn't working.

Edit 4

I have confirmed that memcpy is the culprit. But why I do not know I am stumped, if it helps datatype is just an alias for double the professor wrote that for some odd reason as it really doesn't help make the code more readable.

But if I do solve it on my own will be glad to show the solution to my issue.

Idyll answered 14/4, 2015 at 21:13 Comment(17)

Can somebody explain why this going to be closed? This is a unique question and there isn't one in StackOverflow that answers my question. I would appreciate that, whoever just tried to attempt a close. – Idyll 14/4, 2015 at 21:16

Seems like CodeReview might be a better forum for this sort of question? BTW, not my vote to close. – Congregate 14/4, 2015 at 21:29

@MichaelDorgan Thanks for suggesting Code Review; however, this would be off-topic on CR since the code returns incorrect results. Once the code works as intended, it could be a good fit there. – Knorr 14/4, 2015 at 21:32

Well if it does close where do you think would be the best forum for this type of question. It seems the person who voted for it still hasn't come with the explanation. I can gladly phrase the question in a way that fits into SO if that is the issue. – Idyll 14/4, 2015 at 21:33

@DanielLopez "why is the Homework tag missing?" A tag for homework would not be particularly useful, it's too "meta". It doesn't help others understand the problem itself better. – Knorr 14/4, 2015 at 21:35

It doesn't exist Phrancis I wanted to add it but it doesn't exist. I believe they removed it a long time ago. – Idyll 14/4, 2015 at 21:36

Stack Overflow is definitely the best place for helping with bugs/issues with the code. Perhaps some expected vs. actual (wrong) results would help. – Knorr 14/4, 2015 at 21:36

I added the wrong result with the expected results. – Idyll 14/4, 2015 at 21:39

This is how to ask a question. It's quite refreshing given the number of bad questions we see here. – Coburn 14/4, 2015 at 22:6

Filling the malloc'ed data with random data before using it may flesh out an incorrectly fill buffer. – Margiemargin 14/4, 2015 at 22:24

I forgot to mention the buffers are zeroed before being processed. Well not the temp. Good point. – Idyll 14/4, 2015 at 22:25

1) I talking about datatype* temp_buffer = malloc(sizeof(datatype) * n);, this allocated data should work regardless of what data is initially in it. Forcing various patterns into it first and see if you get consistent results. If not, code is not using these allocated buffers correctly. 2) Confident about p_sqrt has the correct value? – Margiemargin 14/4, 2015 at 22:27

If you doubt the professor it wouldn't be hard to provide your own output print function. The input might be trickier but in life you have to better your betters, if you can. – Dominicdominica 14/4, 2015 at 22:29

p_sqrt is valid I just printed it on the screen. – Idyll 14/4, 2015 at 22:30

Could it be that the mat isn't being modified once it returns? I seem to get consistent results which might be the issue you raise a good point. The mat may not be modified in the code. – Idyll 14/4, 2015 at 22:33

temp_buffer[(col+amount)%s] = mat[row][col]; temp_buffer[col] = temp; looks curious. Code populates temp_buffer one at a time with temp_buffer[col] = temp , but that overwrites what it did with temp_buffer[(col+amount)%s] = mat[row][col] (which has no bounds.) Hmmmm? GTG. – Margiemargin 14/4, 2015 at 22:34

Hmmm If the size of the matrix was 1, should not the result be a[0][0] * a[0][0]. See if that works. Maybe k < (s-1) --> k < s? Also hinted by @AakashM – Margiemargin 15/4, 2015 at 22:35

Copy is too small.

// memcpy(mat[row], temp_buffer, n);
memcpy(mat[row], temp_buffer, n * sizeof(datatype) );

Same for memcpy(&mat[0][col], temp_buffer, n);

Margiemargin answered 14/4, 2015 at 22:8 Comment(5)

@Daniel Lopez Yes, code is only copying n bytes. Needs to copy n numbers. – Margiemargin 14/4, 2015 at 22:13

It doesn't change the result at all. It seems the error is not that. But thank you for noticing the buffer was to small. I am sure that does help make the code more correct. – Idyll 14/4, 2015 at 22:15

@Daniel Lopez Minor : consider this malloc style: datatype* temp_buffer = malloc(n * sizeof *temp_buffer); Easier to code and maintain. Same for memcpy(mat[row], temp_buffer, n * sizeof *(mat[row])); – Margiemargin 14/4, 2015 at 22:15

@Daniel Lopez As code that fills/prints data is not shown - perhaps there? – Margiemargin 14/4, 2015 at 22:17

The code that is written by the professor? I doubt it, that should be assumed working or else the professor needs to get a new job :) But in all seriousness the code is supposed to implement the cannon algo the rest is done by the professor which means reading the two input matrices and producing the final matrices as a file is done by the professor's code. The program doesn't print matrices I use another program provided by the professor. – Idyll 14/4, 2015 at 22:21

You have what looks to me like an off-by-one error.

Your cannon_mul has these loops:

for(i = 0; i < (s-1); i++)

for (i = 0; i < (s-1); i++)

for(k = 0; k < (s-1); k++) 
    for(i = 0; i < (s-1); i++) 
        for(j = 0; j < (s-1); j++)

However, in what I believe is the relevant source material at http://www.cs.berkeley.edu/~demmel/cs267/lecture11/lecture11.html#link_5, the pseudocode says:

for all (i=0 to s-1)

for all (i=0 to s-1)

for k=0 to s-1
    for all (i=0 to s-1, j=0 to s-1)

which I'm pretty sure means that the loop variables should take the value s-1 in the last iteration of the loop. Your loop variables never take the value s-1.

Try changing your loops to <= (s-1) and see if that helps.

Osborne answered 15/4, 2015 at 7:44 Comment(1)

I think the algorithm is completly wrong. I tried that before you posted your answer. But thank you for the help. Yes that is the link of where I based my code from. – Idyll 15/4, 2015 at 14:36

Here is my solution

void shift_left(int** mat, int i, int n, int amount) {
    int* temp_buffer = (int*)malloc(sizeof(int) * n);
    for (int j = 0; j < n; j++) 
        temp_buffer[j] = mat[i][(j + amount) % n];

    for (int j = 0; j < n; j++)
        mat[i][j] = temp_buffer[j];

    free(temp_buffer);
}

void shift_up(int** mat, int j, int n, int amount) {
    int* temp_buffer = (int*)malloc(sizeof(int) * n);
    for (int i = 0; i < n; i++) 
        temp_buffer[i] = mat[(i + amount) % n][j];

    for (int i = 0; i < n; i++)
        mat[i][j] = temp_buffer[i];

    free(temp_buffer);
}

void cannon_mul(int** mat1, int** mat2, int** rez, int n, int threads) {
    
    int i, j, k, i1, j1;
    for (i = 0; i < n; i++)
        shift_left(mat1, i, n, i);

    for (j = 0; j < n; j++)
        shift_up(mat2, j, n, j);

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) 
            rez[i][j] = 0;   

    

    for (k = 0; k < n; k++) {
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                rez[i][j] += mat1[i][j] * mat2[i][j];

        for (i1 = 0; i1 < n; i1++)
            shift_left(mat1, i1, n, 1);

        for (j1 = 0; j1 < n; j1++)
            shift_up(mat2, j1, n, 1);
    }
      
}

Schafer answered 15/3, 2022 at 11:0 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++