Optimization

Surface sharing between OpenCL and DirectX

I am working on a Decode-OpenCL-Encode pipeline on an Intel processor. There is sample code provided by Intel for media interop.

If we look at the DecodeOneFrame() function below: 

    // decode next frame and put result to output surface
    mfxStatus CDecodingPipeline::DecodeOneFrame(int Width, int Height, IDirect3DSurface9 *pDstSurface, IDirect3DDevice9* pd3dDevice)

Error when expanding benchmarks to two or more nodes

Hello, I'm running mp_linpack on two nodes (I have already figured out how to run on one node :) ). I changed into the mp_linpack/bin_intel/intel64 directory and modified HPL.dat to set P=1, Q=2. Then I created a hosts file with the following content:

    mic01:1
    mic03:1

mic01 and mic03 are the two nodes where I want to run Linpack. Then I used the command from MKL_userguide.pdf:

    mpirun --prefix 1 -n 2 -hosts Node1,Node2 \
    -genv MIC_LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
    -genv LD_LIBRARY_PATH $LD_LIBRARY_PATH ./xhpl_offload_intel64

How to install VTune Amplifier 2015 vtsspp on MPSS 2.1.6720

I'm using mpss_gold_update_3-2.1.6720, with a uOS version of 2.6.38.8-g5f2543d.

I'm now trying to install VTune Amplifier 2015 on this MPSS, so I need sep3_15-k1om-2.6.38.8-g5f2543dsmp.ko and vtsspp-k1om-2.6.38.8-g5f2543dsmp.ko. However, I can only find sep3_15-k1om-2.6.38.8-g5f2543dsmp.ko and vtsspp-k1om-2.6.38.8-gd50f2a5smp.ko; the vtsspp module matching my kernel, vtsspp-k1om-2.6.38.8-g5f2543dsmp.ko, is missing.

Question on memory comparison optimization

Hi All,

I am using AVX/SSE instructions to replace memcmp; our workload mostly compares 32 bytes, and occasionally 64 or 128 bytes. I am using the following function, cmp32, for 32-byte comparisons, extending it 2× for 64 bytes or 4× for 128 bytes, but I am barely getting a 1% performance improvement. Testing was done on an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu 14.04 x86_64.

I tried replacing the following lines:

    vcmp = _mm256_cmpeq_epi64(xmm0, xmm1);
    vmask = _mm256_movemask_epi8(vcmp);

Media SDK Decode vs. FFMPEG Decode

Hi,

I have been doing some performance testing with your encode and decode samples, and comparing results with the same operations done on the same system using FFMPEG.

Encoding is significantly faster: about 4× the speed of FFMPEG, while using only 10% of the CPU.

Decoding: FFMPEG wins. Overall, it takes twice as long to decode using the Intel GPU as it does using FFMPEG. Would you have any ideas why that might be? My guess is that the large amount of YUV data produced is bottlenecked returning from the GPU to the main data bus.
