Unable to generate vectorized loops

Hi,

I'm using Parallel Studio with Visual Studio Pro 2008.
Parallel Studio includes the C++ Compiler 11.1.061. It seems to be unable to generate VECTORIZED (!= PARALLELIZED) loops. The compiler says that /Qunroll is not supported (it doesn't say it is an unknown option)!

If I install the C++ Compiler 11.1.072 over Parallel Studio, loops are vectorized automatically, but this partly breaks the integration of Parallel Studio in VS and I can no longer debug threads.

Global optimizations and SSE instructions are enabled in both cases.
How can I get vectorized loops with Parallel Studio?

Jennifer J. (Intel)

"11.1.072"? You probably mean 11.0.072.

1. Vectorization: some extra inlining may be necessary before the loops can vectorize. The inlining control values in Composer are different, so you may want to try different values for the following options:

INLINING OPTION VALUES:
-Qinline-factor: 100
-Qinline-min-size: 20
-Qinline-max-size: 230
-Qinline-max-total-size: 2000
-Qinline-max-per-routine: disabled
-Qinline-max-per-compile: disabled

Or if the inlining is not a factor here, could you attach the loop?

2. IDE integration: Composer is newer than 11.0, so if you install 11.0.072 after Composer, the IDE integration will probably not work.
The best approach is to uninstall both 11.0.072 and Composer,
then install 11.0.072 first and Composer after that.

Jennifer

It's 11.0.072 of course, not 11.1.072.

I have uninstalled 11.0.072, then uninstalled Composer and reinstalled Composer.
All the /Qinline options you mentioned "are not supported" (for instance, I got this message at the beginning of a build: "remark #10148: option '-Qinline-max-size' not supported").

Another problem is that in the Debug menu, the Intel Debugger Parallel Extension => Windows => SSE Registers and OpenMP => ... options are grayed out, even with /ZI, /debug:parallel, /DEBUG, /Qopenmp, Thread Data Sharing Detection and IPP enabled...

PS: I had the French version of VS Pro 2008; it may be the cause of all this mess, I don't know...

Jennifer J. (Intel)

Sorry about the /Qinline options. They're for the Pro edition only.

About the SSE Registers and OpenMP windows issue, please check with the "debug" configuration to see if it works. It works on double-byte Japanese and Chinese.

Let me know how it goes.

Jennifer

Brandon Hewitt (Intel)

Can you try adding the option /Qvec-report3, and provide an example loop that doesn't vectorize and the output that explains why it doesn't vectorize? Thanks!

Brandon Hewitt Technical Consulting Engineer Tools Knowledge Base: "http://software.intel.com/en-us/articles/tools" Software Product Support info: "http://www.intel.com/software/support"


I tried /Qvec-report3 and didn't get any remarks about vectorized loops.

Example loops:

for(int i = 0; i < M; i++)
{
   foo (i,...);
   for(int j = 0; j < N; j++)
   {
       Output[i][j] = Input[i][j] + (const_value * myInstanceClass.getanothervalue(j));
   }
}

//////////////////////////////////////
// a trivial one
for(int i = 0; i < N; i++)
{
   B[i] = A[i];
}

I want the inner loop to be (partially) vectorized and the outer one to be parallelized.
=> No vectorization, even with the "trivial" loop :/. It worked perfectly with C++ Compiler 11.0.072...
The code is in a .h file, not a .cpp file, because I use templates.
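
For anyone who wants to reproduce this outside the project, a minimal standalone sketch of the "trivial" loop might look like the following. The function name copy_floats and the use of __restrict are assumptions, not part of the original code. __restrict is a compiler extension (accepted by the Intel, Microsoft, GCC and Clang compilers) that promises the two arrays do not overlap, which removes one common obstacle to vectorization. Whether it changes anything with this particular Composer build is untested.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standalone version of the "trivial" loop from the post.
// __restrict is a compiler extension: it tells the compiler that A and B
// never overlap, so it does not have to guard against aliasing.
void copy_floats(const float* __restrict A, float* __restrict B, std::size_t N)
{
    assert(A && B);  // both pointers must be valid
    for (std::size_t i = 0; i < N; ++i)
    {
        B[i] = A[i];
    }
}
```

Building a file like this on its own with /Qvec-report3 gives a small test case whose report (or lack of one) can be posted alongside the compiler version.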

Jennifer, no change with the debug configuration: the options are still grayed out :/

Hmm, so the version of the C++ Compiler included in Parallel Studio is not a Pro version? The Compiler v11.0.072 I have is a Pro version. But I assumed a "basic" version of the compiler would still be able to vectorize loops...

Jennifer J. (Intel)

Did you add /Qipo or /GL? If so, you should add "/qvec-report3" to the linker (lower case "q").

About the grey icon issue, I'd like to confirm your Composer version. Open Visual Studio, go to Help->About, and check the Intel Parallel Composer version there. It should be something like:
Intel Parallel Composer (Package ID: composer.061)

Also, do you have the same issue with the NQ-sample under the sample dir?

Jennifer

/qvec-report3 is not recognized.
The Package ID is correct: 061

NQ sample: you mean NQueens? I get the same problem with this sample, and the Inspector doesn't seem to find threading errors (nrOfSolutions is unprotected, yet no threading errors are found).

Amplifier and Thread Data Sharing both work fine.

Jennifer J. (Intel)

You didn't tell me whether you used "/Qipo". This will affect the report.

Also could you let me know the data types? I'd like to try it as well.

for(int i = 0; i < M; i++)  
{ 
   foo (i,...); // is this small func?
   for(int j = 0; j < N; j++) 
   {                                   
       Output[i][j] = Input[i][j] + (const_value *    myInstanceClass.getanothervalue(j)); 
   } 
} 
// need to know the data types of "Output", "Input", "const_value" and "myInstanceClass"

With or without /Qipo, it doesn't change anything.

- foo(i, input[i], arg3,...) is a large function which should be able to run in parallel (for each iteration of "i" every argument is different, so no deadlocks are possible)
- Input and Output are float[M][N];
- const_value is a const float
- myInstanceClass is a statically allocated instance of a concrete class.
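
With those types, one thing that may be worth trying is to hoist the member-function call out of the inner loop, so that the inner loop only touches plain float arrays. The sketch below is a guess at what that could look like, not a confirmed fix: the thread does not establish that the call is what blocks vectorization. The Coeffs struct, the float return type of getanothervalue(), the sizes M and N, and the function name process are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the poster's class; the real getanothervalue()
// is not shown in the thread, so its float return type is an assumption.
struct Coeffs
{
    std::vector<float> values;
    float getanothervalue(int j) const { return values[j]; }
};

const int M = 4;  // placeholder sizes, chosen only for this sketch
const int N = 8;

void process(float Input[M][N], float Output[M][N],
             const float const_value, const Coeffs& myInstanceClass)
{
    assert(myInstanceClass.values.size() >= static_cast<std::size_t>(N));

    // Hoist the member call out of the hot loop: the product depends only on j.
    float scaled[N];
    for (int j = 0; j < N; j++)
        scaled[j] = const_value * myInstanceClass.getanothervalue(j);

    // Outer loop: candidate for threading. The pragma is simply ignored
    // when OpenMP is not enabled (for example, without /Qopenmp).
    #pragma omp parallel for
    for (int i = 0; i < M; i++)
    {
        // The poster's foo(i, ...) call would go here; it is left out
        // because its signature is not shown in the thread.
        const float* in = Input[i];
        float* out = Output[i];

        // Inner loop: plain arrays only, no function calls.
        for (int j = 0; j < N; j++)
            out[j] = in[j] + scaled[j];
    }
}
```

The per-row pointers are declared inside the outer loop so each thread gets its own copies, and the scaled array is only read inside the parallel region.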
