Optimization

internal error: bad pointer

My code is this:

#include
class TEST{
public:
    double *A;
public:
    TEST(double * _A){
        A = _A;
        #pragma offload_transfer target(mic:0) nocopy(this : alloc_if(1) free_if(0)) in(A:length(2*3) alloc_if(1) free_if(0))
    }
    void run(){
        A[1] = 0;
        // double *B = A;
        std::cout<

Intel C++ compiler produces a HUGE code

Hi,

I've been spending way too much time comparing MSVC and the Intel C++ compiler. My current results are that MSVC sometimes generates better code and sometimes worse: when it is better, it is only slightly better, but when it is worse, the gap is much larger. Since my code depends heavily on floating-point signal processing, I assume better vectorization and AVX dispatching could be the reason. So I'm keen on switching to the Intel compiler.

Is pointer aliasing a problem if the pointers are the same?

Hi,

Consider this function, intended for vectorization:

void AddSqr(float* restrict dst, float* restrict src, int cnt)
{
    for (int i=0; i<cnt; i++) dst[i] = src[i] * src[i];
}

This works, of course, if src and dst are not aliased. But what if src == dst? Partially overlapping cases such as src == dst+1 are of course not allowed. But if the pointers are identical, there shouldn't be a problem, or am I missing something?
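To make the question concrete, here is a minimal sketch of the same loop written without restrict, so that calling it with src == dst is well defined. The name Sqr is illustrative and not from the original post. Under the C99 definition of restrict (which compilers offering it as a C++ extension generally follow), modifying an object through one restrict-qualified pointer while also reading it through another is undefined behavior even when the two pointers are equal, so the restrict version formally does not permit src == dst. Whether a particular compiler still generates correct code for that case is not something this sketch shows.

```cpp
#include <cstddef>

// Hypothetical variant of the posted loop without restrict:
// the compiler must assume that dst and src may overlap.
void Sqr(float* dst, const float* src, std::size_t cnt)
{
    for (std::size_t i = 0; i < cnt; ++i) {
        // Element i is read before it is written, and no other element
        // is touched in this iteration, so src == dst is safe here.
        dst[i] = src[i] * src[i];
    }
}
```

Calling Sqr(buf, buf, n) squares buf in place and gives the same values as squaring into a separate buffer. The cost of dropping restrict is that the compiler may have to add a runtime overlap check or skip vectorization.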

Too much memory was occupied with sample_decode

Hi,

I have a problem with the sample_decode example.

When sample_decode decodes a 1080p video stream, it uses a lot of memory (about 90 MB). When 10 streams are decoded in parallel, usage reaches about 1 GB, so memory consumption seems to increase linearly with the number of streams.

Is this abnormal? Am I calling the example without the appropriate dispatching? Is there a reasonable way to decode multiple streams in a single process with lower memory consumption? That is what confuses me.

timing is different each time

Hello,

I wrote a simple application for the CPU and I am using offload pragmas for the parts I want to run on the coprocessors.

Since I compile on the CPU and use offloads, I am using:

<code>export MIC_ENV_PREFIX=MIC
export MIC_OMP_NUM_THREADS=120
</code>

to set the number of threads.

My problems:

1) Running the code always shows 40 threads being used.

2) Running the code again and again without recompiling gives different timing results.
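To tell whether the differences in problem 2 are ordinary run-to-run noise, one option is to time the same region several times within one run and compare the individual samples with their median. This is a minimal sketch under that assumption. TimeRuns and Median are hypothetical helper names, and the callable stands in for the offloaded region. The first offload in a process typically includes one-time initialization cost, so a discarded warm-up call is included.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls work() once as an untimed warm-up,
// then times `runs` further calls and returns each duration in milliseconds.
template <typename Work>
std::vector<double> TimeRuns(Work work, std::size_t runs)
{
    work();  // warm-up, not measured
    std::vector<double> ms;
    ms.reserve(runs);
    for (std::size_t r = 0; r < runs; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}

// Hypothetical helper: median of the samples (upper middle for even counts).
// Takes the vector by value because nth_element reorders it.
double Median(std::vector<double> samples)
{
    if (samples.empty()) return 0.0;
    const std::size_t mid = samples.size() / 2;
    std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
    return samples[mid];
}
```

If the samples inside a single run already spread widely around the median, the variation between separate runs is probably measurement noise rather than a change in the program.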

Error Return of SyncOperation

Hi all,

SyncOperation returns the error MFX_ERR_DEVICE_FAILED during a long-running H.264 encode.

No bitstream is output when the error appears.

Below is the platform info.

Is this a known issue on this platform?

OS: Windows 7 Ultimate sp1 64bit

Motherboard : Intel DH61AG

BIOS version : ACRSYS - 28

Processor : Intel(R) Core(TM) i5-3470S CPU @ 2.90GHz

RAM : 6 GB

Graphic Driver version : 10.18.10.3945

Thanks,

James

Licensing issue with amplxe-cl

I am able to use the VTune GUI without issue, and am also able to collect using amplxe-cl from the command line.  What I cannot do is generate a report from the command line.  amplxe-cl -report hotspots -r <results dir> produces:

amplxe: Error: 0x4000001f (No valid license) -- Cannot connect to the license server. Make sure your license daemon process is running, the used port@host or license file is correct, and the port or hostname in the license file has not been changed.
