For applications such as high frequency trading (HFT), search engines and telecommunications, it is essential that latency can be minimized. My previous article Optimizing Computer Applications for Latency, looked at the architecture choices that support a low latency application. This article builds on that to show how latency can be measured and tuned within the application software.
Machine learning applications are very compute intensive by their nature. That is why optimization for performance is quite important for them. One of the most popular libraries, Tensorflow*, already has an embedded timeline feature that helps understand which parts of the computational graph are causing bottlenecks but it lacks some advanced features like an architectural analysis.