I am on a quad-core Core i7 machine with an ASUS P9X79 WS motherboard and a Xeon Phi 3120A card installed.
The operating system is RHEL 6.4, with MPSS 3.1 for the Phi and parallel_studio_2013 SP1 installed.
For reference, the Phi card has 57 cores and a peak of about 1003 GFlops in double precision.
I am seeing some performance issues that I don't understand.
When I time MKL's parallel DGEMM on the Phi card, it reaches 300 GFlops, which is about 30% of peak.
Note that I am doing native execution.
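
For reference, here is a minimal sketch of the arithmetic behind these figures. It is not my benchmark code. The 1.1 GHz clock and the 16 double-precision flops per cycle per core (one 512-bit fused multiply-add) are assumptions taken from the card's published specifications, not values stated above, and the function names are hypothetical. The DGEMM flop count uses the usual 2*m*n*k convention.

```python
def dgemm_gflops(m, n, k, seconds):
    """Achieved GFlops for an (m x k) times (k x n) DGEMM.

    Uses the conventional count of 2*m*n*k floating-point operations.
    """
    return 2.0 * m * n * k / seconds / 1e9


def peak_dp_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical double-precision peak in GFlops."""
    return cores * clock_ghz * flops_per_cycle


# 57 cores is from the post; 1.1 GHz and 16 flops/cycle are assumptions.
peak = peak_dp_gflops(cores=57, clock_ghz=1.1, flops_per_cycle=16)

# 300 GFlops is the measured DGEMM rate quoted in the post.
measured = 300.0
efficiency = measured / peak

print(f"peak: {peak:.1f} GFlops, efficiency: {efficiency:.1%}")
```

With a measured run, `dgemm_gflops(m, n, k, seconds)` would replace the hard-coded `measured` value, where `seconds` is the wall-clock time of a single DGEMM call.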