I would like to clarify the basic cache parameters given in the "Intel Xeon Phi Coprocessor System Software Developers Guide".
In Table 2.4 "Cache Hierarchy", the duty cycle of the L2 cache is listed as "1 per clock", but the body text says "The L1 cache can be accessed each clock, whereas the L2 can only be accessed every other clock". That may mean the L2 duty cycle is actually 2 clocks.
"Intel Xeon Phi Core Micro-architecture" also has discrepancies with the table. It says "The data cache allows simultaneous read and write allowing cache line replacement to happen in a single cycle", which may mean the L1 has both a read port and a write port. It also says "The cache is divided into two logical banks", which may mean the number of banks is two. Finally, it says "L2 cache can deliver 64 bytes of read data to corresponding cores every two cycles and 64 bytes of write data every cycle", which may mean the L2 duty cycle is two clocks for reads.
The documents I am referring to are the "Intel Xeon Phi Coprocessor System Software Developers Guide" (328207-002EN, June 2013) and "Intel Xeon Phi Core Micro-architecture" (copyright Apress, 2012).
I am tuning my code for L2 cache bandwidth, so the exact peak figures matter a great deal to me.
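To make concrete why the two readings matter, here is a rough sketch of the per-core peak L2 read bandwidth under each one. It assumes 64 bytes per read transfer (the figure quoted from the Apress text) and uses a placeholder core clock of 1.0 GHz, which is not taken from either document. The function names are my own and purely illustrative.

```python
LINE_BYTES = 64  # bytes per L2 read transfer, as quoted from the Apress text


def peak_read_bytes_per_cycle(duty_cycle_clocks, line_bytes=LINE_BYTES):
    """Peak per-core L2 read bandwidth in bytes per clock.

    duty_cycle_clocks is the number of clocks between accesses:
    1 for the Table 2.4 reading, 2 for the body-text reading.
    """
    if duty_cycle_clocks <= 0:
        raise ValueError("duty cycle must be a positive number of clocks")
    return line_bytes / duty_cycle_clocks


def peak_read_gb_per_s(duty_cycle_clocks, core_ghz):
    """Bytes per clock times clocks per nanosecond gives GB/s (10^9 bytes/s)."""
    return peak_read_bytes_per_cycle(duty_cycle_clocks) * core_ghz


if __name__ == "__main__":
    ASSUMED_CORE_GHZ = 1.0  # placeholder only; substitute the real core clock

    readings = (
        ("Table 2.4 (1 per clock)", 1),
        ("body text (every other clock)", 2),
    )
    for label, duty in readings:
        per_cycle = peak_read_bytes_per_cycle(duty)
        gb_s = peak_read_gb_per_s(duty, ASSUMED_CORE_GHZ)
        print(f"{label}: {per_cycle:.0f} B/clock, {gb_s:.1f} GB/s per core")
```

The two readings differ by a factor of two in peak read bandwidth, which is why I would like to know which value to tune against.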