I have an application that performs a large number of rtcPointQueryV() calls over moderately sized triangle meshes. I am seeing significant differences in performance depending on the hardware it runs on. I'm using Embree 3.8.0 and ISPC 1.12.0.
On my i9 MacBook Pro, these jobs reliably finish in a handful of seconds or less (usually much less). Perfectly acceptable performance.
On less capable hardware (e.g. CPUs not reporting avx2 support), the same binary running the same job can take well over a minute. I certainly expected some slowdown, but a >10X drop seems excessive (but maybe it isn't???). And it isn't consistent -- some jobs are reasonably performant while others seem to get lost somewhere in the rtcPointQueryV() calls (some of these jobs make 10,000+ calls).
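One next step I could take to narrow down the inconsistency is to time each call individually and compare the median against the worst case, to see whether a few calls dominate or everything is uniformly slow. Below is a minimal sketch of that idea, assuming the query is wrapped in a callable. The names `time_calls`, `summarize`, and `LatencyStats` are hypothetical helpers for illustration, not part of Embree or ISPC.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Summary of per-call latencies, in microseconds (hypothetical helper type).
struct LatencyStats {
    double median_us;   // upper median of the samples
    double max_us;      // slowest single call
    std::size_t count;  // number of samples
};

// Calls `query(i)` for i in [0, n) and records how long each call takes.
// `query` is a placeholder for whatever wraps the real point query.
template <typename Query>
std::vector<double> time_calls(std::size_t n, Query&& query) {
    using clock = std::chrono::steady_clock;
    std::vector<double> us;
    us.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const auto t0 = clock::now();
        query(i);
        const auto t1 = clock::now();
        us.push_back(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    return us;
}

// Reduces the samples to their (upper) median and maximum.
// Takes the vector by value because it sorts its own copy.
LatencyStats summarize(std::vector<double> us) {
    LatencyStats s{0.0, 0.0, us.size()};
    if (us.empty()) return s;
    std::sort(us.begin(), us.end());
    s.median_us = us[us.size() / 2];
    s.max_us = us.back();
    return s;
}
```

If the maximum is orders of magnitude above the median, a handful of queries are pathological; if the median itself is high, the slowdown applies to every call.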
In some cases, dialing the ISPC compiler optimizations down to -O0 actually significantly improved performance, but not in all cases.
I've tried changing the ISPC --target but it didn't seem to make a significant difference. My last debugging iteration targeted avx only.
I'm trying to figure out if this is an ISPC problem, an Embree problem, or a _me_ problem. Any recommendations on next steps I should take? Or am I just expecting too much from limited hardware?