Expected performance on ArBB samples?

Hello, I just installed ArBB on my system (OS = fc12.x86_64, CPU = 2 quad-core 2.8 GHz Xeons). As a test of installation quality, I executed the run scripts for the samples (build_run-icc.sh and build_run-gcc.sh). For some tests I observed significant C-to-ArBB speedups (such as an 89.1x speedup on ArBB1 for the monte-carlo test [O3, icc, non-debug]). However, for other tests I saw only nominal speedups (such as back_projection) or even slowdowns (such as a 0.016x performance factor on ArBB3 for raytracing2 [O3, icc, non-debug]).

Is this the expected performance, or is something likely wrong with my installation? If you have a link to expected performance results for different platforms, that would be particularly useful for comparison.



Thanks for your message. From your description, you haven't done anything wrong. Our samples are being improved significantly in preparation for the next release. Some of the samples report unusually high speedups or slowdowns, as you mentioned. We are looking into the possibility of a bug in how timings are recorded for a few of the workloads, in particular the ones you mentioned. The slowdown for raytracing2 is a known issue. Also, some of the samples default to very small data sets, which do not represent the full capabilities of ArBB. We highly recommend running each of the samples individually, making sure that O3 is enabled (vectorization + threading), and increasing the size of the data sets in question. We will have these issues fixed in the upcoming beta update release.

The samples that come with the installation package all have very small problem sizes by default, which are only good for quick functional testing. For a performance comparison against the C baseline implementation (included in the package), please define the BIG_DATA_SET preprocessor macro at compile time so that appropriate problem sizes are used.
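For anyone following along, here is a minimal sketch of how a compile-time macro such as BIG_DATA_SET typically selects a problem size. Only the macro name comes from this thread; PROBLEM_SIZE, problem_size(), uses_big_data_set() and the two sizes are hypothetical and not taken from the ArBB samples.

```c
#include <stddef.h>

/* Hypothetical sizes for illustration; the real samples pick their own. */
#if defined(BIG_DATA_SET)
#define PROBLEM_SIZE ((size_t)1 << 20)  /* large run for performance comparison */
#else
#define PROBLEM_SIZE ((size_t)1 << 8)   /* small default for quick functional tests */
#endif

/* Number of elements the benchmark would process in this build. */
size_t problem_size(void)
{
    return PROBLEM_SIZE;
}

/* Returns 1 if the file was compiled with BIG_DATA_SET defined, else 0. */
int uses_big_data_set(void)
{
#if defined(BIG_DATA_SET)
    return 1;
#else
    return 0;
#endif
}
```

With gcc or icc, the macro can be defined without editing the source by adding -DBIG_DATA_SET to the compile command.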

Thanks for exploring Intel ArBB. Let us know if you have any more questions.

Thanks Noah and Zhang for the replies. As you mentioned, I defined BIG_DATA_SET and saw considerably better performance on raytracing2. For compiler=icc and O3, I got:

For #define SMALL_DATA_SET

Version   Time (s)   Speed Up
C         0.000004   1.000
ArBB1     0.000143   0.028
ArBB2     0.000154   0.026
ArBB3     0.000202   0.020

For #define BIG_DATA_SET

Version   Time (s)   Speed Up
C         0.189003   1.000
ArBB1     0.098823   1.913
ArBB2     0.072550   2.605
ArBB3     0.073078   2.586
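
The Speed Up column in both tables is consistent with dividing the C baseline time by each variant's time. I am assuming that is how the sample harness computes it; I have not checked its source. A small sketch of that arithmetic, with speedup() being a hypothetical helper name:

```c
#include <assert.h>

/* Speed-up relative to the C baseline: baseline time divided by variant time.
 * Values above 1 mean the variant is faster than the baseline. */
double speedup(double baseline_s, double variant_s)
{
    assert(baseline_s > 0.0 && variant_s > 0.0);
    return baseline_s / variant_s;
}
```

For example, speedup(0.189003, 0.098823) reproduces the ArBB1 entry of about 1.913 for BIG_DATA_SET, and speedup(0.000004, 0.000202) reproduces the ArBB3 entry of about 0.020 for SMALL_DATA_SET. In the small case the baseline runs for only a few microseconds.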
