We have an application whose domain decomposition ends with essentially two sequences of actions:
One set of tasks (subset a) calls
    call mpi_win_lock(some_rank_from_subset_b)
    call mpi_get(some_rank_from_subset_b)
    call mpi_win_unlock(some_rank_from_subset_b)
The others (subset b) wait in the MPI_Barrier at the end of the domain decomposition. This performs well (the domain decomposition completes within seconds) with MVAPICH on our new Intel Xeon machine and on another machine with IBM BlueGene/Q hardware.
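To make the pattern concrete, here is a minimal sketch of it, not our actual code. The even/odd split into subsets a and b, the array names `exposed` and `fetched`, the length `n`, and the choice of a shared lock are all assumptions made for illustration only.

```fortran
program lock_get_sketch
  use mpi
  implicit none
  integer, parameter :: n = 4            ! hypothetical payload length
  integer :: ierr, rank, nprocs, win, target_rank, intsize
  integer(kind=MPI_ADDRESS_KIND) :: winsize, disp
  integer :: exposed(n), fetched(n)      ! hypothetical buffers

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Every rank exposes a small array through the window.
  exposed = rank
  fetched = -1
  call MPI_Type_size(MPI_INTEGER, intsize, ierr)
  winsize = int(n, MPI_ADDRESS_KIND) * intsize
  call MPI_Win_create(exposed, winsize, intsize, MPI_INFO_NULL, &
                      MPI_COMM_WORLD, win, ierr)

  ! Assumed split: even ranks play subset a, odd ranks play subset b.
  if (mod(rank, 2) == 0 .and. rank + 1 < nprocs) then
    target_rank = rank + 1
    disp = 0
    call MPI_Win_lock(MPI_LOCK_SHARED, target_rank, 0, win, ierr)
    call MPI_Get(fetched, n, MPI_INTEGER, target_rank, disp, &
                 n, MPI_INTEGER, win, ierr)
    call MPI_Win_unlock(target_rank, win, ierr)
    ! fetched may only be read after MPI_Win_unlock has returned.
  end if

  ! Subset b reaches this barrier immediately and makes no RMA calls;
  ! subset a arrives once its lock/get/unlock epoch has completed.
  call MPI_Barrier(MPI_COMM_WORLD, ierr)

  call MPI_Win_free(win, ierr)
  call MPI_Finalize(ierr)
end program lock_get_sketch
```

Built with the MPI Fortran compiler wrapper (often `mpif90`) and run on at least two ranks, each even rank fetches the array of its odd neighbour while that neighbour sits in the barrier.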