In my application I'm firing a small number (likely fewer than 10k) of coherent rays, with no secondary or incoherent rays, and I need to minimize compute time. I see in the docs that you recommend using ray streaming:
Best primary ray performance can be obtained by using the ray stream API and setting the intersect context flag to RTC_INTERSECT_CONTEXT_FLAG_COHERENT.
I see that the viewer_stream tutorial uses rtcIntersect1M, but is there any reason not to use rtcIntersectNM to fire streams of ray packets? Also, is there any overhead in using rtcIntersectNM with N==1 compared with rtcIntersect1M? If not, I can profile the difference with less code.
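For context, here is a rough sketch of the timing harness I have in mind for that comparison. `best_time_ms` is my own hypothetical helper, not part of Embree, and the Embree calls appear only in comments, written the way I understand the Embree 3 signatures; I have not verified them against a build, so please correct me if they are wrong.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical helper (not part of Embree). Runs `setup` (untimed) and then
// `fn` (timed) `repeats` times, and returns the fastest timed run in
// milliseconds. Taking the minimum reduces noise from the OS scheduler.
inline double best_time_ms(const std::function<void()>& setup,
                           const std::function<void()>& fn,
                           int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repeats; ++i) {
        setup();  // e.g. reset the rays, since an intersect call overwrites them
        const auto start = clock::now();
        fn();
        const std::chrono::duration<double, std::milli> elapsed =
            clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}

// Intended use (assumed Embree 3 signatures, untested; `scene`, `rays`, `M`
// and `reset_rays` are placeholders from my own code):
//
//   RTCIntersectContext ctx;
//   rtcInitIntersectContext(&ctx);
//   ctx.flags = RTC_INTERSECT_CONTEXT_FLAG_COHERENT;
//
//   double t1M = best_time_ms(reset_rays, [&] {
//       rtcIntersect1M(scene, &ctx, rays, M, sizeof(RTCRayHit));
//   }, 100);
//
//   double tNM = best_time_ms(reset_rays, [&] {
//       rtcIntersectNM(scene, &ctx, (RTCRayHitN*)rays, 1, M, sizeof(RTCRayHit));
//   }, 100);
```

The second call assumes that a packet of size 1 has the same memory layout as a single RTCRayHit, which is part of what I am asking about.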