I am casting a large number of rays from a compact ray-casting source into a static geometry scene. This is not for rendering. The geometry may have a very large number of triangles.

Each ray can 'reflect' a fixed number of times (unless, of course, it misses the geometry entirely). The reflection direction of each ray might need to be adjusted away from the exact mirror direction based on some surface characteristics.
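For illustration, here is a minimal C++ sketch of such a reflection step. The `Vec3` type, the `reflectAdjusted` helper, and the `roughness` blend toward the normal are assumptions made up for this example. They are not part of Embree and only stand in for whatever the real surface model does. The normal is assumed to be unit length, so a normal taken from a hit record may need normalizing first.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3-vector type (hypothetical, for this sketch only).
struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 normalize(Vec3 v) { return (1.0f / std::sqrt(dot(v, v))) * v; }

// Exact mirror reflection of incoming direction d about unit normal n:
// r = d - 2 (d . n) n
inline Vec3 reflect(Vec3 d, Vec3 n) { return d - (2.0f * dot(d, n)) * n; }

// Hypothetical adjustment: blend the mirror direction toward the surface
// normal by `roughness` in [0, 1]. roughness = 0 gives the exact mirror
// direction, roughness = 1 gives the normal itself.
inline Vec3 reflectAdjusted(Vec3 d, Vec3 n, float roughness) {
    // Flip the normal so that it faces against the incoming ray.
    Vec3 nf = (dot(d, n) > 0.0f) ? (-1.0f * n) : n;
    Vec3 r = reflect(d, nf);
    return normalize((1.0f - roughness) * r + roughness * nf);
}
```

With `roughness` set to 0 this reduces to the plain mirror direction, so the adjustment can be switched off without changing the calling code.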

I collect information about each ray "segment" for later processing, such as the total distance the ray traveled and how much it was attenuated.
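As a rough sketch of the kind of per-segment record this implies, the C++ below stores one entry per segment and derives the totals afterwards. The names `Segment`, `RayPath`, `surfaceGain`, and the exponential distance falloff with coefficient `alpha` are all assumptions chosen for the example. The actual attenuation model may be quite different.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One entry per ray segment (hypothetical layout).
struct Segment {
    float length;       // distance travelled along this segment
    float surfaceGain;  // assumed multiplicative factor at the hit ending it (1 = no loss)
};

// All segments of one ray, in the order they were traced.
struct RayPath {
    std::vector<Segment> segments;

    // Total distance travelled over all segments.
    float totalDistance() const {
        float sum = 0.0f;
        for (const Segment& s : segments) sum += s.length;
        return sum;
    }

    // Remaining fraction of the initial strength, assuming an
    // exponential falloff exp(-alpha * length) along each segment
    // combined with the per-hit surface factor.
    float totalGain(float alpha) const {
        float gain = 1.0f;
        for (const Segment& s : segments)
            gain *= std::exp(-alpha * s.length) * s.surfaceGain;
        return gain;
    }
};
```

Keeping the raw segments rather than only running totals leaves the later processing free to apply a different attenuation model without retracing.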

At the moment I have a fairly naive algorithm with an outer loop over Nrefl (the number of reflections) and an inner loop over the N rays. In the inner loop I call rtcIntersect to intersect each ray with the scene. At an intersection, I change the direction of the ray (the reflection off the surface).
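The loop structure described above can be sketched as follows. To keep the example compilable without Embree, the `rtcIntersect` call is replaced by a hypothetical stand-in, `intersectScene`, that intersects a toy scene of two infinite mirrors at y = 0 and y = 1. The `Ray`, `Hit`, and `traceAll` names are likewise assumptions for this sketch, and ray directions are assumed to be unit length.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray {
    Vec3 org{0.0f, 0.0f, 0.0f};
    Vec3 dir{0.0f, 0.0f, 1.0f};  // assumed unit length
    bool alive = true;           // false once the ray has missed the scene
    float traveled = 0.0f;       // accumulated path length
    int bounces = 0;             // number of reflections so far
};

struct Hit { bool found; float t; Vec3 normal; };

// Stand-in for the real intersection query (rtcIntersect in the post).
// Toy scene: two infinite mirrors at y = 0 and y = 1.
inline Hit intersectScene(const Ray& r) {
    const float eps = 1e-4f;  // skip self-intersection at the origin
    if (r.dir.y < 0.0f) {
        float t = (0.0f - r.org.y) / r.dir.y;
        if (t > eps) return {true, t, {0.0f, 1.0f, 0.0f}};
    } else if (r.dir.y > 0.0f) {
        float t = (1.0f - r.org.y) / r.dir.y;
        if (t > eps) return {true, t, {0.0f, -1.0f, 0.0f}};
    }
    return {false, 0.0f, {0.0f, 0.0f, 0.0f}};
}

// Outer loop over reflections, inner loop over rays.
inline void traceAll(std::vector<Ray>& rays, int nRefl) {
    for (int b = 0; b < nRefl; ++b) {
        for (Ray& r : rays) {
            if (!r.alive) continue;          // ray already left the scene
            Hit h = intersectScene(r);
            if (!h.found) { r.alive = false; continue; }
            r.org = r.org + h.t * r.dir;     // move to the hit point
            r.traveled += h.t;               // valid because dir is unit length
            // Exact mirror reflection: d - 2 (d . n) n
            r.dir = r.dir + (-2.0f * dot(r.dir, h.normal)) * h.normal;
            ++r.bounces;
        }
    }
}
```

Within one pass of the outer loop the rays do not depend on each other, so the inner loop is where a parallel-for or a packet-style intersection call would presumably slot in.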

Before using Embree, I used my own custom code with a kd-tree and OpenMP to keep a multicore machine busy. So I am looking for suggestions on how to optimize my use of Embree. For example, should I cast packets of rays?