Hi, I have spotted a bug in ray traversal (not sure if it has already been reported/fixed).
The Ray constructor precomputes the reciprocal ray direction using the approximate SSE _mm_rcp_ps, with precision improved through one Newton-Raphson iteration. The problem appears when one of the ray direction components is zero. This would normally give rcp = inf, which would be handled correctly during ray traversal. But the Newton-Raphson refinement turns that inf into a NaN (the refinement step multiplies 0 by inf), and this causes the tree traversal to visit every single node of the tree, with a very strong performance degradation when those rare cases happen (in my test case a 70 ms render turned into a 2000 ms one).
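To make the inf-to-NaN step concrete, here is a minimal scalar sketch of the arithmetic. It is not the library's code: the function names are made up for illustration, and the approximate reciprocal is modelled by an exact IEEE 754 division, which like _mm_rcp_ps yields inf for a zero input. It assumes the common refinement form r' = r * (2 - x * r) and a build without fast-math options.

```cpp
#include <cmath>

// Hypothetical stand-in for the approximate reciprocal. Exact IEEE 754
// division is used here so that x == 0 yields +inf, as described above.
inline float approx_rcp(float x) { return 1.0f / x; }

// One Newton-Raphson refinement step for r ~ 1/x: r' = r * (2 - x * r).
// For x == 0 and r == inf, x * r is 0 * inf = NaN, so the result is NaN.
inline float refine_rcp(float x, float r) { return r * (2.0f - x * r); }

// Reciprocal followed by one refinement step (the problematic path).
inline float rcp_refined(float x) { return refine_rcp(x, approx_rcp(x)); }

// Full-precision reciprocal: a zero input stays at +inf.
inline float rcp_exact(float x) { return 1.0f / x; }
```

Calling rcp_refined(0.0f) gives a NaN, while rcp_exact(0.0f) gives inf. For nonzero inputs the two agree up to rounding.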
A full-precision _mm_div_ps(_mm_set1_ps(one), m128) instead fixes the problem, with apparently negligible performance drawbacks.
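As a rough illustration of why an infinite reciprocal is harmless in a slab-style box test, here is a scalar, single-axis sketch. The Interval type and both helper functions are hypothetical and are not taken from the library. The real traversal works on SSE registers, where min/max treat NaN differently, so this only shows the inf case.

```cpp
#include <algorithm>
#include <limits>

// Hypothetical helper type: parametric entry/exit distances along one axis.
struct Interval { float t_near, t_far; };

// Intersect a ray with the slab [lo, hi] on one axis, given the ray origin
// coordinate and the precomputed reciprocal direction on that axis.
inline Interval slab(float org, float rdir, float lo, float hi) {
    float t0 = (lo - org) * rdir;
    float t1 = (hi - org) * rdir;
    return { std::min(t0, t1), std::max(t0, t1) };
}

// The slab is hit if its interval overlaps the ray's valid range.
inline bool overlaps(Interval a, float t_min, float t_max) {
    return std::max(a.t_near, t_min) <= std::min(a.t_far, t_max);
}
```

With rdir = inf, an origin strictly inside the slab gives the interval [-inf, +inf], which never rejects the box, and an origin strictly outside gives an empty interval, which always rejects it. Both are the right answer for a ray parallel to the slab. An origin exactly on a slab plane still produces 0 * inf and is not covered by this sketch.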