execution problem ( offload / windows host / mkl / phi )

Hi,

I have an execution problem; see the attached ecr5.jpg.

I have attached some more information, but it looks like a library link problem.

I am using VS2012 with the latest Composer XE 2013 SP1 updates.

The host is Windows x64 Enterprise SP1, with the latest MPSS.

Any idea?

Thanks a lot,

Bertrand

Attachments:
ecr1.jpg (image/jpeg, 115.67 KB)
ecr2.jpg (image/jpeg, 139.45 KB)
ecr3jpg.jpg (image/jpeg, 127.68 KB)
ecr4jpg.jpg (image/jpeg, 100.85 KB)
ecr5.jpg (image/jpeg, 122.96 KB)
env_0.txt (text/plain, 2.79 KB)
mklAdvisor.jpg (image/jpeg, 196.13 KB)
dftsample.c (text/x-csrc, 10.5 KB)

Hi Bertrand,

The sample you are using is one of the MKL examples shipped with the Intel(R) C Compiler: C:\Program Files (x86)\Intel\Composer XE 2013 SP1\mkl\examples\example_mic\mic_offload\dftc

The simplest way to compile and run an MKL sample is to use the makefile shipped with it.

To run the sample code, open the Intel 64 Visual Studio 2012 mode command prompt (Start -> Intel Parallel Studio XE 2013 -> Parallel Studio XE 2013 with VS2012 Command Prompt -> Parallel Studio XE with Intel Compiler XE -> Intel 64 Visual Studio 2012 mode). This console sets all the necessary environment variables. To build and run the sample:

% nmake libintel64

I don't know whether MKL offload programs can be run from the Visual Studio IDE. If you want to use the IDE, please let me know and I will ask the Windows developers here.

 

Hi Loc,

Thanks.

Hello,

I think I have finally made progress on this.

I dropped VS2012, which I think was causing my problems, and called the Intel compiler directly (icl under Windows).

After running "iclvars intel64":

icl /Qmkl /Qstd=c99 dftsample.c

I get a large 12 MB exe, which should also contain the offload code.

I get the trace below, which clearly shows the offload (OFFLOAD_REPORT=2) and the line
"MIC0 using Intel(R) Math Kernel Library Version 11.1.3 Product Build 20140416 for Intel(R) 64 architecture applications"

The final results are OK.
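
For anyone who wants a check that is independent of the MKL sample, here is a minimal sketch of a function that reports whether an offloaded region really executed on the coprocessor. It relies on the Intel compiler's `#pragma offload target(mic)` and on the `__MIC__` and `__INTEL_OFFLOAD` predefined macros; the function name `ran_on_coprocessor` is only an illustrative choice and is not part of dftsample.c. With a compiler that has no offload support, the pragma is skipped and the function simply returns 0.

```c
#include <assert.h>

/* Returns 1 if the offloaded block executed on the coprocessor, 0 if it
   ran on the host. The function name is hypothetical, for illustration. */
int ran_on_coprocessor(void)
{
    int on_mic = 0;

    /* __INTEL_OFFLOAD is defined by the Intel compiler when offload is
       enabled; other compilers skip the pragma and run the block locally. */
#ifdef __INTEL_OFFLOAD
    #pragma offload target(mic) inout(on_mic)
#endif
    {
#ifdef __MIC__
        /* This assignment exists only in the coprocessor-side build. */
        on_mic = 1;
#endif
    }

    assert(on_mic == 0 || on_mic == 1);
    return on_mic;
}
```

Calling this from `main` and printing the result, in a file built the same way as the sample (icl under Windows, after iclvars), should be enough to see which side ran the block.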

So from now on I will not use Visual Studio.

Regards,

Bertrand

Here is the trace:

C:\XeonPhi\dftsample\dftsample\dftsample>icl /Qmkl  /Qstd=c99  dftsample.c
dftsample.c
Microsoft (R) Incremental Linker Version 11.00.50727.1
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:dftsample.exe
"-libpath:C:/Program Files (x86)/Intel/Composer XE 2013 SP1/mkl\lib\intel64"
dftsample.obj
C:\Users\ADMINI~1\AppData\Local\Temp\33564.obj
-defaultlib:liboffload
-defaultlib:libiomp5md
ofldbegin.obj
ofldend.obj
[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            229
[Offload] [MIC 0] [Tag]             Tag 0
MIC0 using Intel(R) Math Kernel Library Version 11.1.3 Product Build 20140416 for Intel(R) 64 architecture applications
[Offload] [HOST]  [Tag 0] [CPU Time]        3.084845(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   0 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        0.000357(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   4 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            112
[Offload] [MIC 0] [Tag]             Tag 1
[Offload] [HOST]  [Tag 1] [CPU Time]        0.165761(seconds)
[Offload] [MIC 0] [Tag 1] [CPU->MIC Data]   4 (bytes)
[Offload] [MIC 0] [Tag 1] [MIC Time]        0.152473(seconds)
[Offload] [MIC 0] [Tag 1] [MIC->CPU Data]   4 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            148
[Offload] [MIC 0] [Tag]             Tag 2
[Offload] [HOST]  [Tag 2] [CPU Time]        0.005004(seconds)
[Offload] [MIC 0] [Tag 2] [CPU->MIC Data]   8192 (bytes)
[Offload] [MIC 0] [Tag 2] [MIC Time]        0.000142(seconds)
[Offload] [MIC 0] [Tag 2] [MIC->CPU Data]   8192 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            170
[Offload] [MIC 0] [Tag]             Tag 3
[Offload] [HOST]  [Tag 3] [CPU Time]        0.000929(seconds)
[Offload] [MIC 0] [Tag 3] [CPU->MIC Data]   8212 (bytes)
[Offload] [MIC 0] [Tag 3] [MIC Time]        0.000230(seconds)
[Offload] [MIC 0] [Tag 3] [MIC->CPU Data]   8196 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            170
[Offload] [MIC 0] [Tag]             Tag 4
[Offload] [HOST]  [Tag 4] [CPU Time]        0.000624(seconds)
[Offload] [MIC 0] [Tag 4] [CPU->MIC Data]   8212 (bytes)
[Offload] [MIC 0] [Tag 4] [MIC Time]        0.000029(seconds)
[Offload] [MIC 0] [Tag 4] [MIC->CPU Data]   8196 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            170
[Offload] [MIC 0] [Tag]             Tag 5
[Offload] [HOST]  [Tag 5] [CPU Time]        0.000614(seconds)
[Offload] [MIC 0] [Tag 5] [CPU->MIC Data]   8212 (bytes)
[Offload] [MIC 0] [Tag 5] [MIC Time]        0.000028(seconds)
[Offload] [MIC 0] [Tag 5] [MIC->CPU Data]   8196 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            203
[Offload] [MIC 0] [Tag]             Tag 6
[Offload] [HOST]  [Tag 6] [CPU Time]        0.000507(seconds)
[Offload] [MIC 0] [Tag 6] [CPU->MIC Data]   0 (bytes)
[Offload] [MIC 0] [Tag 6] [MIC Time]        0.000042(seconds)
[Offload] [MIC 0] [Tag 6] [MIC->CPU Data]   0 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            210
[Offload] [MIC 0] [Tag]             Tag 7
[Offload] [HOST]  [Tag 7] [CPU Time]        0.001257(seconds)
[Offload] [MIC 0] [Tag 7] [CPU->MIC Data]   8 (bytes)
[Offload] [MIC 0] [Tag 7] [MIC Time]        0.000034(seconds)
[Offload] [MIC 0] [Tag 7] [MIC->CPU Data]   0 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            112
[Offload] [MIC 0] [Tag]             Tag 8
[Offload] [HOST]  [Tag 8] [CPU Time]        0.003690(seconds)
[Offload] [MIC 0] [Tag 8] [CPU->MIC Data]   4 (bytes)
[Offload] [MIC 0] [Tag 8] [MIC Time]        0.001275(seconds)
[Offload] [MIC 0] [Tag 8] [MIC->CPU Data]   4 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            148
[Offload] [MIC 0] [Tag]             Tag 9
[Offload] [HOST]  [Tag 9] [CPU Time]        0.013611(seconds)
[Offload] [MIC 0] [Tag 9] [CPU->MIC Data]   4194304 (bytes)
[Offload] [MIC 0] [Tag 9] [MIC Time]        0.000041(seconds)
[Offload] [MIC 0] [Tag 9] [MIC->CPU Data]   4194304 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            170
[Offload] [MIC 0] [Tag]             Tag 10
[Offload] [HOST]  [Tag 10] [CPU Time]        0.705929(seconds)
[Offload] [MIC 0] [Tag 10] [CPU->MIC Data]   4194324 (bytes)
[Offload] [MIC 0] [Tag 10] [MIC Time]        0.683595(seconds)
[Offload] [MIC 0] [Tag 10] [MIC->CPU Data]   4194308 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            170
[Offload] [MIC 0] [Tag]             Tag 11
[Offload] [HOST]  [Tag 11] [CPU Time]        0.002820(seconds)
[Offload] [MIC 0] [Tag 11] [CPU->MIC Data]   4194324 (bytes)
[Offload] [MIC 0] [Tag 11] [MIC Time]        0.000694(seconds)
[Offload] [MIC 0] [Tag 11] [MIC->CPU Data]   4194308 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            170
[Offload] [MIC 0] [Tag]             Tag 12
[Offload] [HOST]  [Tag 12] [CPU Time]        0.002755(seconds)
[Offload] [MIC 0] [Tag 12] [CPU->MIC Data]   4194324 (bytes)
[Offload] [MIC 0] [Tag 12] [MIC Time]        0.000628(seconds)
[Offload] [MIC 0] [Tag 12] [MIC->CPU Data]   4194308 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            203
[Offload] [MIC 0] [Tag]             Tag 13
[Offload] [HOST]  [Tag 13] [CPU Time]        0.000657(seconds)
[Offload] [MIC 0] [Tag 13] [CPU->MIC Data]   0 (bytes)
[Offload] [MIC 0] [Tag 13] [MIC Time]        0.000081(seconds)
[Offload] [MIC 0] [Tag 13] [MIC->CPU Data]   0 (bytes)

[Offload] [MIC 0] [File]            C:\XeonPhi\dftsample\dftsample\dftsample\dftsample.c
[Offload] [MIC 0] [Line]            210
[Offload] [MIC 0] [Tag]             Tag 14
[Offload] [HOST]  [Tag 14] [CPU Time]        0.002910(seconds)
[Offload] [MIC 0] [Tag 14] [CPU->MIC Data]   8 (bytes)
[Offload] [MIC 0] [Tag 14] [MIC Time]        0.000037(seconds)
[Offload] [MIC 0] [Tag 14] [MIC->CPU Data]   0 (bytes)

Prepare DFTI descriptor for N=1024 on the target
Allocate space for data on the host.
For best performance in offload mode align data on host to 4096 bytes
and do not set any special aligment for data on target - it will be the same automatically
Preallocate buffers on the target
Computation is performed on the target, H=1
 Verifying the result, errthr = 5.96e-006
 Verified, maximum error was 1.91e-008
Computation is performed on the target, H=2
 Verifying the result, errthr = 5.96e-006
 Verified, maximum error was 4.11e-008
Computation is performed on the target, H=3
 Verifying the result, errthr = 5.96e-006
 Verified, maximum error was 2.28e-008
Cleanup resources
Test for N=1024 passed

Prepare DFTI descriptor for N=524288 on the target
Allocate space for data on the host.
For best performance in offload mode align data on host to 4096 bytes
and do not set any special aligment for data on target - it will be the same automatically
Preallocate buffers on the target
Computation is performed on the target, H=6
 Verifying the result, errthr = 1.13e-005
 Verified, maximum error was 1.26e-008
Computation is performed on the target, H=1
 Verifying the result, errthr = 1.13e-005
 Verified, maximum error was 1.14e-008
Computation is performed on the target, H=7
 Verifying the result, errthr = 1.13e-005
 Verified, maximum error was 1.7e-008
Cleanup resources
Test for N=524288 passed

All tests passed
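
A trace like this gets long quickly, so it can help to total the data moved in each direction. The following is a rough sketch in C that pulls the byte count out of the `[CPU->MIC Data]` and `[MIC->CPU Data]` lines. It assumes the line layout shown in the trace above, and the function names `offload_bytes` and `total_offload_bytes` are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extracts the byte count from a report line such as
   "[Offload] [MIC 0] [Tag 2] [CPU->MIC Data]   8192 (bytes)".
   Returns 1 and stores the count in *bytes when the line contains
   `marker` followed by a number, otherwise returns 0. */
int offload_bytes(const char *line, const char *marker, long long *bytes)
{
    assert(line != NULL && marker != NULL && bytes != NULL);

    const char *p = strstr(line, marker);
    if (p == NULL)
        return 0;
    p += strlen(marker);               /* skip past the marker itself */
    return sscanf(p, "%lld", bytes) == 1;
}

/* Sums the byte counts for one direction over an array of report lines. */
long long total_offload_bytes(const char *const *lines, size_t n,
                              const char *marker)
{
    long long total = 0;
    long long b = 0;

    for (size_t i = 0; i < n; ++i) {
        if (offload_bytes(lines[i], marker, &b))
            total += b;
    }
    return total;
}
```

Passing the captured report lines with the marker "[CPU->MIC Data]" gives the total sent to the card, and "[MIC->CPU Data]" gives the total returned; timing lines and program output are ignored because they do not contain either marker.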

 
