I am working on a PCIe video capture device for Linux platforms (Ubuntu and Yocto). Latency is critical in our application. Using the kernel DMA APIs, we managed "zero-copy" transfers of large shared image buffers (OpenGL textures) directly between our PCIe device and the Intel GPU.
In 32-bit environments with less than 4 GB of RAM this works even without an IOMMU. However, because our PCIe device can only address 32 bits, we have to use the IOMMU on 64-bit platforms with more than 4 GB of RAM. The generic software IOMMU implementation failed because its mapped space is very limited. The Intel IOMMU on my Skylake system works very well.
However, my question is: what are the limits and capabilities (size of the mapping tables, maximum contiguously mapped space, maximum number of entries, etc.) of the Intel IOMMU on various platforms? I searched extensively and the only thing I found was the Intel VT-d (Virtualization Technology for Directed I/O) specification. It gives no concrete numbers for these limits, only the locations of the corresponding hardware registers. Are these limits processor-, platform-, or architecture-dependent? Is there any documentation on the limits and/or capabilities?
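To make the register part concrete, here is a minimal sketch of how a few of these limits could be read on a running Linux system. It assumes the kernel exposes the raw VT-d Capability Register under /sys/class/iommu/dmar*/intel-iommu/cap, and that the field positions (ND in bits 2:0, SAGAW in bits 12:8, MGAW in bits 21:16, second-level large page support in bits 37:34) match the spec revision your hardware implements. The function names and dictionary keys are my own and purely illustrative.

```python
import glob
import os


def decode_cap(cap: int) -> dict:
    """Decode a few fields of the VT-d Capability Register (CAP_REG).

    Bit positions are taken from the VT-d specification; treat them as an
    assumption and check them against the revision for your platform.
    """
    nd = cap & 0x7                    # ND: number of domains, bits 2:0
    sagaw = (cap >> 8) & 0x1F         # SAGAW: supported address widths, bits 12:8
    mgaw = ((cap >> 16) & 0x3F) + 1   # MGAW: stored as (width - 1), bits 21:16
    sllps = (cap >> 34) & 0xF         # large page support, bits 37:34
    return {
        "domains": 1 << (4 + 2 * nd),
        "max_guest_addr_bits": mgaw,
        "iova_space_bytes": 1 << mgaw,
        # SAGAW bit 1 = 39-bit (3-level), bit 2 = 48-bit (4-level),
        # bit 3 = 57-bit (5-level) page tables.
        "agaw_bits": [w for bit, w in ((1, 39), (2, 48), (3, 57))
                      if sagaw & (1 << bit)],
        "superpage_2m": bool(sllps & 0x1),
        "superpage_1g": bool(sllps & 0x2),
    }


def read_caps(pattern: str = "/sys/class/iommu/dmar*/intel-iommu/cap") -> dict:
    """Return {unit name: decoded fields} for every Intel IOMMU unit found."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        unit = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as f:
                result[unit] = decode_cap(int(f.read().strip(), 16))
        except (OSError, ValueError):
            continue  # unreadable or unexpected content: skip this unit
    return result


if __name__ == "__main__":
    for unit, fields in read_caps().items():
        print(unit, fields)
```

Running this on each target board would at least show whether the values differ per platform. Note that the IOVA space reported here is the IOMMU's limit, not the device's: a device with a 32-bit DMA mask can still only be handed addresses below 4 GB.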