August 24, 2009 10:23 AM PDT
Strassen Matrix Multiplication source code
The attached zip archive has the serial source files for Strassen's Algorithm as well as the standard Matrix Multiplication algorithm. Use this code as a starting point for your submission to the first problem in the Phase 2 part of the Intel Threading Challenge Contest 2009.
The code was developed under Windows, so Linux users may need to adjust the include files (or use better ones).
The attached zip archive has the serial source files for Strassen's Algorithm as well as the standard Matrix Multiplication algorithm. Use this code as a starting point for your submission to the first problem in the Phase 2 part of the Intel Threading Challenge Contest 2009.
The code was developed under Windows, so Linux users may need to adjust the include files (or use better ones).
--clay
Can we just thread this code or are we supposed to bring a completely new implementation to the table?
-------- 'Everything is proceeding as I have foreseen.'
In the problem description and the above post I'm reading language such as "This source file should be used as a starting point. Your entry should keep the body of the main function, the matrix generation function, the sequential multiplication code, and the function to compare the two matrix product results."
I'm reading this as: implement Strassen's Algorithm as I see fit; but dont change the code for:
void matmultleaf(int mf, int ml, int nf, int nl, int pf, int pl, double **A, double **B, double **C) /* subroutine that uses the simple triple loop to multiply a submatrix from A with a submatrix from B and store the result in a submatrix of C. */ // mf, ml; /* first and last+1 i index */ // nf, nl; /* first and last+1 j index */ // pf, pl; /* first and last+1 k index */ { for (int i = mf; i < ml; i++) for (int j = nf; j < nl; j++) { C[i][j] = 0; for (int k = pf; k < pl; k++) C[i][j] += A[i][k]*B[k][j]; } }
void matmultleaf(int mf, int ml, int nf, int nl, int pf, int pl, double **A, double **B, double **C) /* subroutine that uses the simple triple loop to multiply a submatrix from A with a submatrix from B and store the result in a submatrix of C. */ // mf, ml; /* first and last+1 i index */ // nf, nl; /* first and last+1 j index */ // pf, pl; /* first and last+1 k index */ { for (int i = mf; i < ml; i++) for (int j = nf; j < nl; j++) { C[i][j] = 0; for (int k = pf; k < pl; k++) C[i][j] += A[i][k]*B[k][j]; } }
Thanks, 邓辉. This can fix the problem when the parameters are the exact power of 2.
Thanks, 邓辉. This can fix the problem when the parameters are the exact power of 2.
I think there was never a problem with parameters being powers of 2. I ran with m=n=p=32,64,128,256,512,1024.. and there were no problems whatsoever.
The code given by intel does not support all parameters.
I'll explain how. The code says that if m*n*p is < 1024 they will perform normal matrix multiplication that is using 3 for loops. In this case m,n and p can be anything even, odd, multiple of 2.. anything.
If m*n*p is greater than 1024 then they will divides m,n,p by 2 and tries to create 4 small matrices out of the big matrix and then use Strassens Algo. That means if m*n*p is greater than 1024 then odd numbers cannot be used for m,n and p. So m=n=p=1914 worked. If this value was big enough to get divided/2 twice then that would have failed.
So i would say the current code given by intel best works with powers of 2. Anyways for the actual solution you are free to change and support any type of parameter you like!
See below code snippets from StrassenMMult.cpp for more understanding.
#define GRAIN 1024 /* product size below which matmultleaf is used */
void strassenMMult(int mf, int ml, int nf, int nl, int pf, int pl, double **A, double **B, double **C)
In the problem description and the above post I'm reading language such as "This source file should be used as a starting point. Your entry should keep the body of the main function, the matrix generation function, the sequential multiplication code, and the function to compare the two matrix product results."
I'm reading this as: implement Strassen's Algorithm as I see fit; but dont change the code for:
main()
seqMatMult()
CheckResults()
Is this correct?
Yes, this is the intention. The sequential multiplication and checking function should be untouched (or interpretted in the programming language used). The matrix intialization routine should remain intact or at least be shown to generate equivalent random matrices. Changes to the main() routine to support threading are allowed. All of these details should be part of the submitted write-up in order to understand the accompanying source code.
You must use Strassen's Algorithm or variation of that algorithm. You can change the memory allocation model, you can use different data structures as needed for threading or performance, and you can change the order of independent computations. Again, all the changes from the serial code should be documented in the write-up.
Can we just thread this code or are we supposed to bring a completely new implementation to the table?
You can thread the given code, if you wish. If you write equivalent functions for those routines that need to be kept intact, you can use a different programming language. If you need to modify the underlying data structures used for threading or performance, you can rewrite the "fixed" routines to incorporate those changes.
Any modifications that are done would best be described in the submitted write-up to show what was done and explain why it was done. It needn't be too involved; something like "Since I changed the data structure for the matrices to be linked lists, the initialization, sequential multiplication, and results checking function were altered to use the linked list structure" would be sufficient
The attached zip archive has the serial source files for Strassen's Algorithm as well as the standard Matrix Multiplication algorithm. Use this code as a starting point for your submission to the first problem in the Phase 2 part of the Intel Threading Challenge Contest 2009.
The code was developed under Windows, so Linux users may need to adjust the include files (or use better ones).
--clay
Hi, I have some questions: 1) If we win the contest for a particular problem, then along with the $100 cash prize, do we get any paper certificate stating that we have won the contest? 2) Can two or more people collaborate to write the code? 3) How many people normally participate in this contest? Thanks! -A B
Hi, I have some questions: 1) If we win the contest for a particular problem, then along with the $100 cash prize, do we get any paper certificate stating that we have won the contest? 2) Can two or more people collaborate to write the code? 3) How many people normally participate in this contest? Thanks! -A B
querty2009,
Thank you for your interest in this contest. Below are the answers to your questions.
1. There is not paper certificate for winning in this contest however your name will be announced as the winner within the contest pages, on ISN, in the forums and your entry (code and write-up) will be posted for others to see.
2. We do not have any restrictions on more than one person working on a problem but only one person can enter and submit the code and write-up to be eligible to win.
3. We had nearly a 100 different people enter 1 or more of the 6 problems available in phase 1 of the Threading Challenge this year so far. How many participate in each problem varies.
void matmultleaf(int mf, int ml, int nf, int nl, int pf, int pl, double **A, double **B, double **C) /* subroutine that uses the simple triple loop to multiply a submatrix from A with a submatrix from B and store the result in a submatrix of C. */ // mf, ml; /* first and last+1 i index */ // nf, nl; /* first and last+1 j index */ // pf, pl; /* first and last+1 k index */ { for (int i = mf; i < ml; i++) for (int j = nf; j < nl; j++) { C[i][j] = 0; for (int k = pf; k < pl; k++) C[i][j] += A[i][k]*B[k][j]; } }
In addition, The CheckResults function should use fabsand not abs.
-------- Andrew Marshall
http://www.planetmarshall.co.uk
Thank you for your interest in this contest. Below are the answers to your questions.
1. There is not paper certificate for winning in this contest however your name will be announced as the winner within the contest pages, on ISN, in the forums and your entry (code and write-up) will be posted for others to see.
2. We do not have any restrictions on more than one person working on a problem but only one person can enter and submit the code and write-up to be eligible to win.
3. We had nearly a 100 different people enter 1 or more of the 6 problems available in phase 1 of the Threading Challenge this year so far. How many participate in each problem varies.
Are you saying that each of M, N, P can be, at most, 1024 so that the maximum size of each matrix will be 2^20 = 1048576 or do you mean that M, N, P can each be 1048576 so that the maximum size of each matrix could be as much as 2^40 (i.e. 8TB for double values)?
void matmultleaf(int mf, int ml, int nf, int nl, int pf, int pl, double **A, double **B, double **C) /* subroutine that uses the simple triple loop to multiply a submatrix from A with a submatrix from B and store the result in a submatrix of C. */ // mf, ml; /* first and last+1 i index */ // nf, nl; /* first and last+1 j index */ // pf, pl; /* first and last+1 k index */ { for (int i = mf; i < ml; i++) for (int j = nf; j < nl; j++) { C[i][j] = 0; for (int k = pf; k < pl; k++) C[i][j] += A[i][k]*B[k][j]; } }
I agree 邓辉. Clay, can you confirm this modification, please?
Are you saying that each of M, N, P can be, at most, 1024 so that the maximum size of each matrix will be 2^20 = 1048576 or do you mean that M, N, P can each be 1048576 so that the maximum size of each matrix could be as much as 2^40 (i.e. 8TB for double values)?
D'oh!! <hit self on forehead>
I was going for the maximum limit of something close to one million (1000000). Trying to be too clever, I messed up my math.
So, your latter statement is what was intended. M, N, and P can each be up to 1048576.
Of course, we don't have machines with 8TB of memory (much more needed for the memory-hogging Strassen's Algorithm) and the time to compute the answer for matrices of that size could be prohibitive for judging. More likely would be the case where M=N=4 and P=1048576. (I'm thinking that is 128MB of doubles, but you might want to check that for yourself).
I agree 邓辉. Clay, can you confirm this modification, please?
Yes, that modification to matmultleaf (inserting C[i][j] = 0.0; as first statement in j-loop body) should be done. My original code initialized the C matrices in the main() routine. Somehow I removed that section before posting the serial version for this problem.
My apologies for (again) trying to be as simple as possible.
Yes, that modification to matmultleaf (inserting C[i][j] = 0.0; as first statement in j-loop body) should be done. My original code initialized the C matrices in the main() routine. Somehow I removed that section before posting the serial version for this problem.
My apologies for (again) trying to be as simple as possible.
--clay
Attached is a version of the Strassen code with Akki's corrections. 've also created a Makefile that works on Linux with gcc 4.3.2. I turned on all the compiler warnings I could think of, and the compiler is happy. I ran valgrind on it and there are no memory leaks or uses of uninitialized memory. It seems to work on non-square matrices, but it doesn't work on non-power-of-two matrix sizes. Except for the power-of-two issue, this version seems pretty good.
Looking at the implementation of main(), I see the use of the clock() function. I don't know what that does in Windows, but it is almost certainly the wrong function for Linux. clock() returns the CPU time used. I expect that we're more interested in the elapsed time, since this is a multithreading challenge.
Attached is a version that uses gettimeofday(). I reorganized the code that measures performance so that the clock manipulation code and reporting is done only once (instead of once for MM and once for Strassen). I also changed the code to report the effective megaflops rate (that is, tkae the number of floating point operations for the standard implementation and divide it by the run time.) It also does a little bit more error checking on the arguments (makes sure that they are positive integers ) And since I'm running on Linux, I changed the end-of-line characters from CRLF to LF.
I hope that some of you find these changes useful.