We have been developing HD video capture PCIe (Gen2 x8) cards, which are installed in HPC servers with Intel's dual-Xeon NUMA architecture. With the SandyBridge-v1/IvyBridge-v2 architecture everything worked fine. Now, with the new Haswell-v3 servers, we have the following problem:
The video streams (PCIe slot -> RootComplex) start stuttering every few seconds or minutes. When this happens, all Tx posted data (PD) credits have been consumed. We already observed this situation (all PD credits consumed) with the IvyBridge architecture; however, there the system recovered quickly and the temporary bandwidth drop was easily absorbed by the FIFOs in the Tx signal path (no visual degradation in the video streams). This is not the case with the Haswell architecture: sometimes the PD credits are returned quite slowly, even at times when no new Tx packets are being issued. Typically in this case we observe PD credits being freed up in small steps only: 0, 4, 8, 12, … It then takes tens of microseconds until the system has recovered. When everything is working as expected, the PD credits are freed up in much larger chunks. The described behavior is noticeable even at low Tx bandwidths (>= 2.2 GBit/s).
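To put these numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It relies on the PCIe flow-control definition that one posted data credit covers 4 DW (16 bytes) of payload. The function names and the 50 µs stall duration are only illustrative assumptions (we only know the stalls last "tens of microseconds"), and protocol overhead is ignored.

```python
# One PCIe posted-data (PD) flow-control credit covers 4 DW = 16 bytes of payload.
BYTES_PER_PD_CREDIT = 16


def credits_for_payload(payload_bytes: int) -> int:
    """PD credits consumed by one posted write with the given payload size."""
    return -(-payload_bytes // BYTES_PER_PD_CREDIT)  # ceiling division


def credit_rate(bandwidth_bit_per_s: float) -> float:
    """PD credits per second that must be returned to sustain a payload bandwidth."""
    return bandwidth_bit_per_s / 8 / BYTES_PER_PD_CREDIT


def fifo_bytes_for_stall(bandwidth_bit_per_s: float, stall_us: float) -> float:
    """Payload bytes that pile up in the Tx FIFO while no credits are available."""
    bytes_per_us = bandwidth_bit_per_s / 8 / 1e6
    return bytes_per_us * stall_us


if __name__ == "__main__":
    bw = 2.2e9  # lowest Tx bandwidth at which we see the problem, in bit/s

    # The observed "0, 4, 8, 12" pattern means 4 credits per update.
    print("bytes per 4-credit step:", 4 * BYTES_PER_PD_CREDIT)
    print("credits per second needed:", credit_rate(bw))
    # Hypothetical 50 us stall, picked only as an example of "tens of microseconds".
    print("FIFO bytes for a 50 us stall:", fifo_bytes_for_stall(bw, 50))
```

So each 4-credit step corresponds to 64 bytes of payload. That equals the cache line size of these CPUs, but whether this is related to the slow credit return is pure speculation on our side.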
We stripped our software down to a minimum to ensure that the captured data is not processed at all, just transferred to memory via DMA. We double-checked the driver software and also ran tests with different memory allocation methods and DMA transfer setups.
We are using Linux and did the tests with kernel 3.7 (OpenSuse 12.1) and 3.10 (CentOS 7.1). We also tried servers from ASUS and Supermicro.
None of these test scenarios helped us get rid of the problem or gave us a hint as to what is going on.
Does anyone have an idea what could cause such problems?
Is there a difference between IvyBridge-v2 and Haswell-v3 regarding PCIe credit handling (buffering, flow control)?
Are there tools from Intel that could help us find out what is going on?
Thanks and kind regards