Checking GPU performance with Intel GPA

Mikhail Smirnov

Hi, I need to check the performance of my application on Intel HD4000, nVidia and AMD GPU architectures. The code is OpenCL only. Is it possible to run such tests with Intel GPA, or should I use different tools for each platform? Thank you. Regards, Mikhail

BR, Mikhail
Neal P (Intel)

Hello,

Could you please provide some more information about what you are trying to do?

Even though you mention "checking GPU performance", it's pretty hard to
separate out the GPU without understanding what the CPU is doing as
well.

So when you mention trying to check the performance of your application, are you trying to determine system performance, detailed frame analysis, or "platform" analysis (that is, operation of your app across all CPU cores)? Intel GPA has three main tools, one each for the different analysis operations you want (Intel GPA System Analyzer, Intel GPA Frame Analyzer, and Intel GPA Platform Analyzer), and it's not clear what kind of performance measurement you want.

You also mention "OpenCL code only" -- can you help me understand what you mean by this? Does this mean you only want to analyze the OpenCL threads, and you don't care about the rest of the system performance? Also, is this for the CPU, GPU, or both? Again, "only GPU" may hide some important aspects of your overall performance, so please help me understand what you want to do here.

Thanks!

Neal

Mikhail Smirnov

Hi Neal, sorry if this is a very common request. We have well-parallelized software (at least we think so) running on the CPU. It does real-time image processing, and at least 80% of the code works on different parts of the images, so there are no intersections in data or control flow. We currently use an Intel Core i7-3770 and TBB for parallel computing.

Our current task is to port all of the software (or the major part) to the GPU. Our target GPU is the HD4000 for now (we want to fit the ultrabook platform). But I'm worried about HD4000 performance (it has far fewer graphics cores than embedded AMD graphics), so I'd like to check the performance of other GPUs as well (i.e. low-cost nVidia or AMD cards).

I plan to have OpenCL code that is as portable as possible, and I'd like to know whether it is possible to run it and measure its performance on all of these GPUs in Intel GPA, or whether it is better to use separate tools for each platform. The platform will be a Core i7-3770 + HD4000, or a Core i7-3770 + the external card under test. I hope this answers at least part of your questions. Looking forward to your comments. Thank you.

BR, Mikhail
Neal P (Intel)

Hello Mikhail,

So re-reading your comments, I believe that what you really care about is the time to complete a specific image-processing task; that is, how long from "start" to "end" for a specific task on different architectures. You also indicated that you hope to improve performance by pushing more of the processing to the GPU -- with the CPU kept fixed, what's the overall performance of this task on Intel, nVidia, and AMD GPUs?
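As an illustration of that "start" to "end" measurement, here is a minimal sketch of a vendor-neutral timing harness. It is not part of Intel GPA, and the names `time_task` and `process_image_stub` are hypothetical. It assumes the task is wrapped in a callable that returns only when the work has really completed.

```python
import statistics
import time


def time_task(task, warmup=2, repeats=5):
    """Return the median wall-clock time, in seconds, of one call to task()."""
    for _ in range(warmup):
        task()  # warm-up runs absorb one-time costs (e.g. kernel compilation)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()  # must block until the work is actually finished
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def process_image_stub():
    # Hypothetical stand-in for the real image-processing step.
    total = 0
    for i in range(100_000):
        total += i * i
    return total


if __name__ == "__main__":
    seconds = time_task(process_image_stub)
    print(f"median time per run: {seconds * 1e3:.3f} ms")
```

For an OpenCL workload, the callable would need to wait for the command queue to drain (for example with `clFinish`) before returning; otherwise only the enqueue time is measured. Running the same harness with the CPU fixed and only the GPU swapped gives directly comparable numbers.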

But I'm still not quite clear about your original question: are you looking for benchmarks for the various GPUs with OpenCL, or do you really want to analyze and optimize your workloads? I think you are really looking for both. If you want benchmarks, your favorite search engine can help find various OpenCL benchmark tools; if you want help understanding what's happening across both the CPU and GPU cores in order to improve performance, then that's where Intel GPA can help -- this link provides information about using Intel GPA with OpenCL.

One last thing -- have you seen IPP (Intel® Integrated Performance Primitives)? It may help in optimizing the total image processing task (independent of the GPU).

Is this the kind of information you wanted?

Regards,

Neal

Mikhail Smirnov

Hi Neal, thank you very much for the information provided. This is very close to what I was looking for!

BR, Mikhail
Neal P (Intel)

Hello,

I'm glad that you found this helpful -- after you've read through some of this information please let me know if you have any follow-up questions.

Regards,

Neal

sureshgupta22

I would like some insurance during the render that at least the current
progress is recoverable by performing a periodic write to disk. In
outputmode it is not rendering the entire image each pass and instead
rendering individual tiles to the max SPP

Neal P (Intel)
Quoting sureshgupta22: I would like some insurance during the render that at least the current
progress is recoverable by performing a periodic write to disk. In
outputmode it is not rendering the entire image each pass and instead
rendering individual tiles to the max SPP

Hello,

I'm not sure how your comment/question is related to the topic of this thread (that is, "checking gpu performance") -- did you mean to post this here, or to start a new thread?

Also, can you provide more specific information on your comment/question -- including your hardware and graphics platform, your software and APIs in use, and more details on the exact problem that you are seeing (screen shots, error logs, etc.)?

Thanks!

Neal
