Forum Jump

Select Group :
Select Forum :
Sorted By :
Sort Order :
From The :
 
Thread Tools  Search this thread 
Clay Breshears (Intel)
Total Points:
15,225
Status Points:
15,225
Black Belt
August 24, 2009 10:23 AM PDT
Strassen Matrix Multiplication source code
The attached zip archive has the serial source files for Strassen's Algorithm as well as the standard Matrix Multiplication algorithm.  Use this code as a starting point for your submission to the first problem in the Phase 2 part of the Intel Threading Challenge Contest 2009.

The code was developed under Windows, so Linux users may need to adjust the include files (or use better ones).

--clay
 Attachments 
Tudor
Total Points:
945
Status Points:
445
Brown Belt
August 24, 2009 12:58 PM PDT
Rate
 
#1
The attached zip archive has the serial source files for Strassen's Algorithm as well as the standard Matrix Multiplication algorithm.  Use this code as a starting point for your submission to the first problem in the Phase 2 part of the Intel Threading Challenge Contest 2009.

The code was developed under Windows, so Linux users may need to adjust the include files (or use better ones).

--clay

Can we just thread this code or are we supposed to bring a completely new implementation to the table?

--------
'Everything is proceeding as I have foreseen.'


mrmarshallman
August 24, 2009 9:38 PM PDT
Rate
 
#2 Reply to #1
In the problem description and the above post I'm reading language such as "This source file should be used as a starting point.  Your entry should keep the body of the main function, the matrix generation function, the sequential multiplication code, and the function to compare the two matrix product results."

I'm reading this as: implement Strassen's Algorithm as I see fit; but dont change the code for:
  • main()
  • seqMatMult()
  • CheckResults()
Is this correct?

邓辉
Total Points:
3,850
Status Points:
3,350
Brown Belt
August 25, 2009 4:43 AM PDT
Rate
 
#3 Reply to #2

M, N, P must be 2 ^ n?  Their range is how much? thanks!
--------
写字楼里写字间,写字间里程序员
程序人员写程序,又拿程序换酒钱
酒醒只在网上坐,酒醉还来网下眠
酒醉酒醒日复日,网上网下年复年


jfguo
Total Points:
80
Status Points:
30
Green Belt
August 25, 2009 8:48 AM PDT
Rate
 
#4

It seems the seqMatMult() and matmultS() give different result.
When I ran with the argement "11 11 11", the ouput is,


Execute Standard matmult

Standard matrix function done in    0.00 secs


Strassen matrix function done in    0.00 secs


-0.810000  13.140000
Error in matmultS


jfguo
Total Points:
80
Status Points:
30
Green Belt
August 25, 2009 9:18 AM PDT
Rate
 
#5 Reply to #4

Even when M, N and P are all exact power of 2, the problem still exists.
For eaxmple for "32 32 32", the output is,


Execute Standard matmult

Standard matrix function done in    0.00 secs


Strassen matrix function done in    0.00 secs


-28.700000  -17.900000
Error in matmultS


邓辉
Total Points:
3,850
Status Points:
3,350
Brown Belt
August 25, 2009 9:38 AM PDT
Rate
 
#6 Reply to #5

void matmultleaf(int mf, int ml, int nf, int nl, int pf, int pl, double **A, double **B, double **C)   
/* 
  subroutine that uses the simple triple loop to multiply 
  a submatrix from A with a submatrix from B and store the 
  result in a submatrix of C.  
*/ 
// mf, ml; /* first and last+1 i index */ 
// nf, nl; /* first and last+1 j index */ 
// pf, pl; /* first and last+1 k index */ 
{     
 for (int i = mf; i < ml; i++)  
  for (int j = nf; j < nl; j++)
  {
   C[i][j] = 0;
   for (int k = pf; k < pl; k++)  
    C[i][j] += A[i][k]*B[k][j];
  }
}   
 


--------
写字楼里写字间,写字间里程序员
程序人员写程序,又拿程序换酒钱
酒醒只在网上坐,酒醉还来网下眠
酒醉酒醒日复日,网上网下年复年


jfguo
Total Points:
80
Status Points:
30
Green Belt
August 26, 2009 8:55 AM PDT
Rate
 
#7 Reply to #6
Quoting - 邓辉

void matmultleaf(int mf, int ml, int nf, int nl, int pf, int pl, double **A, double **B, double **C)   
/* 
subroutine that uses the simple triple loop to multiply 
a submatrix from A with a submatrix from B and store the 
result in a submatrix of C.  
*/ 
// mf, ml; /* first and last+1 i index */ 
// nf, nl; /* first and last+1 j index */ 
// pf, pl; /* first and last+1 k index */ 
{     
for (int i = mf; i < ml; i++)  
for (int j = nf; j < nl; j++)
{
C[i][j] = 0;
for (int k = pf; k < pl; k++)  
C[i][j] += A[i][k]*B[k][j];
}
}



Thanks, 邓辉.
This can fix the problem when the parameters are the exact power of 2.



tushar.dedhia
Total Points:
20
Registered User
August 27, 2009 9:04 AM PDT
Rate
 
#8 Reply to #7
Quoting - jfguo


Thanks, 邓辉.
This can fix the problem when the parameters are the exact power of 2.


I think there was never a problem with parameters being powers of 2. I ran with m=n=p=32,64,128,256,512,1024.. and there were no problems whatsoever.

The code given by intel does not support all parameters.

I'll explain how. The code says that if m*n*p is < 1024 they will perform normal matrix multiplication that is using 3 for loops. In this case m,n and p can be anything even, odd, multiple of 2.. anything.

If m*n*p is greater than 1024 then they will divides m,n,p by 2 and tries to create 4 small matrices out of the big matrix and then use Strassens Algo. That means if m*n*p is greater than 1024 then odd numbers cannot be used for m,n and p. So m=n=p=1914 worked. If this value was big enough to get divided/2 twice then that would have failed.

So i would say the current code given by intel best works with powers of 2. Anyways for the actual solution you are free to change and support any type of parameter you like!

See below code snippets from StrassenMMult.cpp for more understanding.

#define GRAIN 1024 /* product size below which matmultleaf is used */

void strassenMMult(int mf, int ml, int nf, int nl, int pf, int pl, double **A, double **B, double **C)

{

if ((ml-mf)*(nl-nf)*(pl-pf) < GRAIN)

matmultleaf(mf, ml, nf, nl, pf, pl, A, B, C);

else {

int m2 = (ml-mf)/2;

int n2 = (nl-nf)/2;

int p2 = (pl-pf)/2;
...
...
...



Clay Breshears (Intel)
Total Points:
15,225
Status Points:
15,225
Black Belt
August 27, 2009 10:13 AM PDT
Rate
 
#9 Reply to #2
Quoting - mrmarshallman
In the problem description and the above post I'm reading language such as "This source file should be used as a starting point.  Your entry should keep the body of the main function, the matrix generation function, the sequential multiplication code, and the function to compare the two matrix product results."

I'm reading this as: implement Strassen's Algorithm as I see fit; but dont change the code for:
  • main()
  • seqMatMult()
  • CheckResults()
Is this correct?

Yes, this is the intention.  The sequential multiplication and checking function should be untouched (or interpretted in the programming language used).  The matrix intialization routine should remain intact or at least be shown to generate equivalent random matrices.  Changes to the main() routine to support threading are allowed.  All of these details should be part of the submitted write-up in order to understand the accompanying source code.

You must use Strassen's Algorithm or variation of that algorithm.  You can change the memory allocation model, you can use different data structures as needed for threading or performance, and you can change the order of independent computations.  Again, all the changes from the serial code should be documented in the write-up.

--clay

Clay Breshears (Intel)
Total Points:
15,225
Status Points:
15,225
Black Belt
August 27, 2009 10:22 AM PDT
Rate
 
#10 Reply to #3
Quoting - 邓辉

M, N, P must be 2 ^ n?  Their range is how much? thanks!

Let's set the upper limit at 2^10 = 1048576. 

--clay

Clay Breshears (Intel)
Total Points:
15,225
Status Points:
15,225
Black Belt
August 27, 2009 10:35 AM PDT
Rate
 
#11 Reply to #1
Quoting - Tudor

Can we just thread this code or are we supposed to bring a completely new implementation to the table?

You can thread the given code, if you wish.  If you write equivalent functions for those routines that need to be kept intact, you can use a different programming language.  If you need to modify the underlying data structures used for threading or performance, you can rewrite the "fixed" routines to incorporate those changes.

Any modifications that are done would best be described in the submitted write-up to show what was done and explain why it was done.  It needn't be too involved; something like "Since I changed the data structure for the matrices to be linked lists, the initialization, sequential multiplication, and results checking function were altered to use the linked list structure" would be sufficient

--clay

qwerty2009
Total Points:
40
Registered User
August 27, 2009 3:07 PM PDT
Rate
 
#12
The attached zip archive has the serial source files for Strassen's Algorithm as well as the standard Matrix Multiplication algorithm.  Use this code as a starting point for your submission to the first problem in the Phase 2 part of the Intel Threading Challenge Contest 2009.

The code was developed under Windows, so Linux users may need to adjust the include files (or use better ones).

--clay

Hi,
I have some questions:
1) If we win the contest for a particular problem, then along with the $100 cash prize, do we get any paper certificate stating that we have won the contest?
2) Can two or more people collaborate to write the code?
3) How many people normally participate in this contest?
Thanks!
-A B




Diana Lanius (Intel)
Total Points:
2,557
Status Points:
2,557
Community Manager
August 28, 2009 8:43 AM PDT
Rate
 
#13 Reply to #12
Quoting - qwerty2009

Hi,
I have some questions:
1) If we win the contest for a particular problem, then along with the $100 cash prize, do we get any paper certificate stating that we have won the contest?
2) Can two or more people collaborate to write the code?
3) How many people normally participate in this contest?
Thanks!
-A B



querty2009,

Thank you for your interest in this contest.  Below are the answers to your questions.

1.  There is not paper certificate for winning in this contest however your name will be announced as the winner within the contest pages, on ISN, in the forums and your entry (code and write-up) will be posted for others to see.

2.  We do not have any restrictions on more than one person working on a problem but only one person can enter and submit the code and write-up to be eligible to win.

3.  We had nearly a 100 different people enter 1 or more of the 6 problems available in phase 1 of the Threading Challenge this year so far.  How many participate in each problem varies.

Thanks,

Diana

planetmarshall
Total Points:
440
Status Points:
390
Green Belt
August 28, 2009 8:50 AM PDT
Rate
 
#14 Reply to #6
Quoting - 邓辉

void matmultleaf(int mf, int ml, int nf, int nl, int pf, int pl, double **A, double **B, double **C)   
/* 
subroutine that uses the simple triple loop to multiply 
a submatrix from A with a submatrix from B and store the 
result in a submatrix of C.  
*/ 
// mf, ml; /* first and last+1 i index */ 
// nf, nl; /* first and last+1 j index */ 
// pf, pl; /* first and last+1 k index */ 
{     
for (int i = mf; i < ml; i++)  
for (int j = nf; j < nl; j++)
{
C[i][j] = 0;
for (int k = pf; k < pl; k++)  
C[i][j] += A[i][k]*B[k][j];
}
}


In addition, The CheckResults function should use fabs and not abs.

--------
Andrew Marshall
http://www.planetmarshall.co.uk


qwerty2009
Total Points:
40
Registered User
August 29, 2009 6:16 PM PDT
Rate
 
#15 Reply to #13

querty2009,

Thank you for your interest in this contest.  Below are the answers to your questions.

1.  There is not paper certificate for winning in this contest however your name will be announced as the winner within the contest pages, on ISN, in the forums and your entry (code and write-up) will be posted for others to see.

2.  We do not have any restrictions on more than one person working on a problem but only one person can enter and submit the code and write-up to be eligible to win.

3.  We had nearly a 100 different people enter 1 or more of the 6 problems available in phase 1 of the Threading Challenge this year so far.  How many participate in each problem varies.

Thanks,

Diana



akki
Total Points:
2,720
Status Points:
2,220
Brown Belt
August 30, 2009 12:14 AM PDT
Rate
 
#16 Reply to #10

Let's set the upper limit at 2^10 = 1048576. 

--clay

Clay, I'm a little confused here.

2^10 = 1024

Are you saying that each of M, N, P can be, at most, 1024 so that the maximum size of each matrix will be 2^20 = 1048576 or do you mean that M, N, P can each be 1048576 so that the maximum size of each matrix could be as much as 2^40 (i.e. 8TB for double values)?


akki
Total Points:
2,720
Status Points:
2,220
Brown Belt
August 30, 2009 12:18 AM PDT
Rate
 
#17 Reply to #6
Quoting - 邓辉

void matmultleaf(int mf, int ml, int nf, int nl, int pf, int pl, double **A, double **B, double **C)   
/* 
subroutine that uses the simple triple loop to multiply 
a submatrix from A with a submatrix from B and store the 
result in a submatrix of C.  
*/ 
// mf, ml; /* first and last+1 i index */ 
// nf, nl; /* first and last+1 j index */ 
// pf, pl; /* first and last+1 k index */ 
{     
for (int i = mf; i < ml; i++)  
for (int j = nf; j < nl; j++)
{
C[i][j] = 0;
for (int k = pf; k < pl; k++)  
C[i][j] += A[i][k]*B[k][j];
}
}


I agree 邓辉. Clay, can you confirm this modification, please?


akki
Total Points:
2,720
Status Points:
2,220
Brown Belt
August 30, 2009 12:22 AM PDT
Rate
 
#18 Reply to #14
Quoting - planetmarshall

In addition, The CheckResults function should use fabs and not abs.

I agree, planetmarshall. Clay, can you confirm this modification, please?


邓辉
Total Points:
3,850
Status Points:
3,350
Brown Belt
August 30, 2009 8:56 AM PDT
Rate
 
#19 Reply to #18
I tested M = 2048 N = 2048 P = 2048 the original algorithm is almost slow enough to not accept my test data to estimate this value can not be
--------
写字楼里写字间,写字间里程序员
程序人员写程序,又拿程序换酒钱
酒醒只在网上坐,酒醉还来网下眠
酒醉酒醒日复日,网上网下年复年


nickbes
Total Points:
1,030
Status Points:
530
Brown Belt
August 31, 2009 6:07 AM PDT
Rate
 
#20 Reply to #14
Quoting - planetmarshall

In addition, The CheckResults function should use fabs and not abs.

Or you can include the math.h and mantain abs.

Clay Breshears (Intel)
Total Points:
15,225
Status Points:
15,225
Black Belt
August 31, 2009 9:22 AM PDT
Rate
 
#21 Reply to #16
Quoting - akki

Clay, I'm a little confused here.

2^10 = 1024

Are you saying that each of M, N, P can be, at most, 1024 so that the maximum size of each matrix will be 2^20 = 1048576 or do you mean that M, N, P can each be 1048576 so that the maximum size of each matrix could be as much as 2^40 (i.e. 8TB for double values)?

D'oh!! <hit self on forehead>

I was going for the maximum limit of something close to one million (1000000).  Trying to be too clever, I messed up my math.

So, your latter statement is what was intended.  M, N, and P can each be up to 1048576. 

Of course, we don't have machines with 8TB of memory (much more needed for the memory-hogging Strassen's Algorithm) and the time to compute the answer for matrices of that size could be prohibitive for judging.  More likely would be the case where M=N=4 and P=1048576.  (I'm thinking that is 128MB of doubles, but you might want to check that for yourself).

--clay

Clay Breshears (Intel)
Total Points:
15,225
Status Points:
15,225
Black Belt
August 31, 2009 9:39 AM PDT
Rate
 
#22 Reply to #14
Quoting - planetmarshall

In addition, The CheckResults function should use fabs and not abs.

Yes, use of fabs is fine and probably the better choice.  (Sometimes my simple codes are too simple.)

--clay

Clay Breshears (Intel)
Total Points:
15,225
Status Points:
15,225
Black Belt
August 31, 2009 9:45 AM PDT
Rate
 
#23 Reply to #17
Quoting - akki

I agree 邓辉. Clay, can you confirm this modification, please?

Yes, that modification to matmultleaf (inserting C[i][j] = 0.0; as first statement in j-loop body) should be done.   My original code initialized the C matrices in the main() routine.  Somehow I removed that section before posting the serial version for this problem. 

My apologies for (again) trying to be as simple as possible.

--clay

BradleyKuszmaul
September 1, 2009 7:39 AM PDT
Rate
 
#24 Reply to #23

Yes, that modification to matmultleaf (inserting C[i][j] = 0.0; as first statement in j-loop body) should be done.   My original code initialized the C matrices in the main() routine.  Somehow I removed that section before posting the serial version for this problem. 

My apologies for (again) trying to be as simple as possible.

--clay

Attached is a version of the Strassen code with Akki's corrections.  've also created a Makefile that works on Linux with gcc 4.3.2.  I turned on all the compiler warnings I could think of, and the compiler is happy.  I ran valgrind on it and there are no memory leaks or uses of uninitialized memory.  It seems to work on non-square matrices, but it doesn't work on non-power-of-two matrix sizes.  Except for the power-of-two issue, this version seems pretty good.


 Attachments 
BradleyKuszmaul
September 1, 2009 8:46 AM PDT
Rate
 
#25 Reply to #24
Looking at the implementation of main(), I see the use of the clock() function.  I don't know what that does in Windows, but it is almost certainly the wrong function for Linux.  clock() returns the CPU time used.   I expect that we're more interested in the elapsed time, since this is a multithreading challenge. 

Attached is a version that uses gettimeofday().   I reorganized the code that measures performance so that the clock manipulation code and reporting is done only once (instead of once for MM and once for Strassen).  I also changed the code to report the effective megaflops rate (that is, tkae the number of floating point operations for the standard implementation and divide it by the run time.)
It also does a little bit more error checking on the arguments (makes sure that they are positive integers )   And since I'm running on Linux, I changed the end-of-line characters from CRLF to LF.

I hope that some of you find these changes useful.


 Attachments 


Intel Software Network Forums Statistics

8479 users have contributed to 31611 threads and 100667 posts to date.
In the past 24 hours, we have 31 new thread(s) 108 new posts(s), and 167 new user(s).

In the past 3 days, the most popular thread for everyone has been gemm(A,A,A) like possible? The most posts were made to gemm(A,A,A) like possible? The post with the most views is Dear Steve, excuse me for a d

Please welcome our newest member zhpn