Hi migdalskiy,
I'm fuzzy on the details, but suffice it to say, the MOPP is more complicated than a kd-tree.
By default, MOPPs don't store the shape information at the leaves, just shape keys into the original shape collection. This means that you can include convex shapes (e.g. in an hkpExtendedMeshShape or hkpListShape) in addition to triangle meshes.
For lots of static bodies, I'd recommend that you put them into a single container (hkpExtendedMeshShape or hkpListShape as above) and wrap a MOPP around them. Basically, the queries that you mentioned will all have to spend time in either the broadphase (if they're all separate bodies) or in the midphase (if they're all wrapped in a MOPP), and MOPP queries are faster than broadphase ones. For tens or hundreds of objects, it might not matter much, but for thousands of objects, raycasts will get quite expensive, as will adding and removing bodies from the world. But if they're in a MOPP, that only counts as 1 body as far as the broadphase is concerned.
For serialization, it's not necessary to also serialize the shape collection that the MOPP was built around, BUT you need to make sure that you reproduce the original shape collection after you load the MOPP. Essentially, you can think of a MOPP as something that takes an AABB and gives back shape keys for child shapes within that AABB. So if the shape keys at runtime are different from when the MOPP was built (say, if you add the hkpExtendedMeshShape subparts in a different order), you'll get crashes or weird collisions.
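To make the shape key point concrete, here's a rough sketch in plain C++. It is NOT Havok code: Aabb, ChildShape, ShapeKey, KeyIndex and resolve are all names I made up for illustration, the "index" is a linear scan rather than a real MOPP, and I'm assuming a key is simply the child's position in the collection. It only shows the idea that the index hands back keys and the keys are resolved against whatever collection exists at runtime.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-ins, NOT Havok types: every name here is hypothetical.
struct Aabb {
    float min[3];
    float max[3];
};

struct ChildShape {
    std::string name;  // e.g. "crate" or "barrel"
    Aabb bounds;
};

// Assumption for this sketch: a key is the child's index in the collection.
using ShapeKey = std::uint32_t;

// True if the two boxes overlap on all three axes.
inline bool overlaps(const Aabb& a, const Aabb& b) {
    for (int i = 0; i < 3; ++i) {
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
    }
    return true;
}

// The index stores only (bounds, key) pairs, never the shapes themselves.
struct KeyIndex {
    struct Entry {
        Aabb bounds;
        ShapeKey key;
    };
    std::vector<Entry> entries;

    // Record each child's bounds together with its position in the collection.
    static KeyIndex build(const std::vector<ChildShape>& collection) {
        KeyIndex index;
        for (std::size_t i = 0; i < collection.size(); ++i) {
            index.entries.push_back({collection[i].bounds, static_cast<ShapeKey>(i)});
        }
        return index;
    }

    // Linear scan for brevity; a real MOPP is far more elaborate than this.
    std::vector<ShapeKey> query(const Aabb& region) const {
        std::vector<ShapeKey> hits;
        for (const Entry& e : entries) {
            if (overlaps(e.bounds, region)) hits.push_back(e.key);
        }
        return hits;
    }
};

// Resolve a key against whatever collection exists at runtime.
inline const ChildShape& resolve(const std::vector<ChildShape>& collection, ShapeKey key) {
    return collection.at(key);  // throws std::out_of_range if the key no longer fits
}
```

If you build the index over {crate, barrel} and query a box around the barrel, you get back key 1. Resolve that key against a collection rebuilt as {barrel, crate} and it silently returns the crate, which is the toy version of the weird collisions. Resolve it against a collection with fewer children and the lookup fails outright, which is the toy version of the crash.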
Any cost from the indirection in the shape hierarchy would be negligible next to the algorithmic cost from having lots of bodies in the broadphase (as described above).
Hope that helps clear things up a bit.
-Chris