General Code Questions

Hi,

I'm unable to use qmic. I am getting the following error message in STDIN:

JOB 5210.cfxcluster PROLOGUE REPORT: could not connect to Intel Xeon Phi coprocessor(s) in job host
Bad owner or permissions on /home/mcdc0998/.ssh/config

It was working fine yesterday. Please let me know if I need to refresh the SSH config file in any way.

Thanks.

Quote:

Pablo G. wrote:

Quote:

Iman Saleh (Intel) wrote:

 

Hi Pablo,

Yes, the profiler shows you where most of the CPU time is spent in your code, so it's still relevant.

Thanks,

Iman

 

 

Hmm, I still have my doubts. Couldn't a loop that is a bottleneck in sequential execution stop being one once it runs in parallel or is vectorized? If we profile on a serial CPU with a non-vectorizing compiler, couldn't the profile be misleading?

Thanks for your help.

Hi Pablo,

Good point, and I agree with you if you are depending on the compiler's automatic optimization. While that's part of the optimizations, you get the best results for this contest (and in most cases) when you apply explicit optimization techniques. VTune can still help you figure out where to do so.

Thanks,

Iman

Hi, when I combine the OpenMP parallel for pragma (with collapse) with pragma simd, ivdep, and a fixed schedule, I get weird results.

Do you think this could be a limitation of the scheduler or, on the other hand, a problem with my code?

Regards.

The challenge deadline is extended!!

This past weekend, we experienced unexpected technical issues on the cluster, and we are therefore extending the deadline for submissions to November 1, 2015 at 11:59 p.m. GMT.

Good luck!

Iman

Quote:

Iman Saleh (Intel) wrote:

Quote:

Pablo G. wrote:

 

Hi, could someone officially clarify question #1 from jremmons, please? getEnergy consumes a huge chunk of the execution time, and I'd like to know if I can get rid of it.

Thanks.

 

 

Hi Pablo,

The optimized code needs to generate the same output as the serial version. Values may not be the same for some parameters but you still need to calculate them.

Thanks,

Iman
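
Editor's note: one way to read "same output, but values may differ for some parameters" is that a parallelized floating-point reduction adds its terms in a different order than the serial loop, so the result can differ in the last few bits. The sketch below illustrates that reading only. The thread does not show getEnergy or the contest's actual acceptance check, so sum_serial, sum_parallel, close_enough, and the 1e-9 tolerance are all hypothetical names and values chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Serial reference: plain left-to-right summation.
double sum_serial(const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Optimized variant: OpenMP reduction. Each thread sums a chunk and the
// partial sums are combined, so the order of additions differs from the
// serial loop. If OpenMP is not enabled, the pragma is ignored and this
// runs serially.
double sum_parallel(const std::vector<double>& v) {
    double s = 0.0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < n; ++i) s += v[i];
    return s;
}

// Hypothetical check (an assumption, not the contest's rule): accept the
// candidate if it is within rel_tol of the reference, measured relative to
// the larger magnitude, with a floor of 1.0 so values near zero still compare.
bool close_enough(double reference, double candidate, double rel_tol = 1e-9) {
    assert(rel_tol > 0.0);
    const double scale = std::fmax(std::fabs(reference), std::fabs(candidate));
    return std::fabs(reference - candidate) <= rel_tol * std::fmax(scale, 1.0);
}
```

Calling close_enough(sum_serial(v), sum_parallel(v)) on the same input is then a quick sanity check that an optimized routine still computes the quantity the serial one does, rather than skipping it.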

Do we also have to compute energy and criterion at timestep 0?

I guess when you say we have to compute them ... we have to do it *correctly*, right? ;-P (that is, in a way equivalent to the serial version).

Regards.

Hi Pablo,

Yes, you need to preserve the same computations as the serial version.

Thanks,

Iman

Quote:

Pablo G. wrote:

Hi, when I combine the OpenMP parallel for pragma (with collapse) with pragma simd, ivdep, and a fixed schedule, I get weird results.

Do you think this could be a limitation of the scheduler or, on the other hand, a problem with my code?

Regards.

Hi Pablo,

Using ivdep and simd tells the compiler to assume the loop is safe to vectorize, regardless of what its own dependence analysis says. You may need to double-check that this is actually true for your code. There is no indication that this is a scheduler issue.

Thanks,

Iman
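
Editor's note: a minimal sketch of the distinction described above, under some assumptions. The thread mentions the Intel-specific pragma simd and ivdep; the sketch uses the standard OpenMP 4.0 form, #pragma omp simd, instead. The function names (add_arrays, prefix_sum, scale_grid) and the flat grid layout are hypothetical and are not taken from the contest code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Safe to vectorize: iteration i reads and writes only index i, so the
// promise made by the simd pragma is true.
void add_arrays(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& out) {
    assert(a.size() == b.size() && a.size() == out.size());
    const long n = static_cast<long>(out.size());
    #pragma omp simd
    for (long i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// NOT safe to force: x[i] needs the already-updated x[i-1] (a loop-carried
// dependence). Adding a simd or ivdep pragma here would be a false promise
// and could silently produce wrong values, so the loop is left scalar.
void prefix_sum(std::vector<float>& x) {
    for (std::size_t i = 1; i < x.size(); ++i) x[i] += x[i - 1];
}

// Combining the clauses on a loop nest: collapse(2) merges the two outer
// loops into one iteration space that is split statically across threads,
// and the innermost loop is left for simd. The i and j loops must be
// perfectly nested for collapse(2) to be valid.
void scale_grid(std::vector<float>& g, int nx, int ny, int nz, float f) {
    assert(nx >= 0 && ny >= 0 && nz >= 0);
    assert(g.size() == static_cast<std::size_t>(nx) * ny * nz);
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 0; i < nx; ++i) {
        for (int j = 0; j < ny; ++j) {
            // Start of the contiguous run of nz elements for this (i, j).
            float* row = g.data() +
                (static_cast<std::size_t>(i) * ny + j) * static_cast<std::size_t>(nz);
            #pragma omp simd
            for (int k = 0; k < nz; ++k) row[k] *= f;
        }
    }
}
```

If results change when a pragma is added or removed, comparing against the pragma-free build on a small input is a cheap way to find the loop where the safety assumption does not hold.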

Hi Amr,

I am not aware of any plans yet. We will definitely announce any similar future challenges on the site. Stay tuned :)

Thanks,

Iman

Dear Iman;

For this competition, can we see the winner's optimized code? I think it would be helpful for us to learn from it.

BR

Hi BR,

We haven't published it yet. I'll let you know when we do.

Thanks,

Iman

Dear Iman;

When I run the original code that you provided (cell_clustering, huge.cdc) on the Xeon Phi card or on the CPU alone to get a baseline, every run ends in either a segmentation fault or a killed process. What could be the reason? Also, how long does the baseline take to run?

Best Regards

Hi,

The challenge is over. FWIW, the huge data set won't run unoptimized.

Iman
