One of my performance focus areas for this year is vectorization. I am excited to start creating more content and spreading the message about this technology, as it has been a little bit underappreciated in the past. So to kick things off, I am going to launch a blog series and a 1-hour overview webinar.
First, information about the webinar.
As part of my focus on software performance, I also support and consult on implementing scalable parallelism in applications. There are many reasons to implement parallelism as well as many methods for doing it - but this blog is not about either of those things. This blog is about the performance advantages of one particular way of implementing parallelism - and, luckily, that way is supported by several models available.
Prepare applications for optimization on the Intel® Itanium® processor family. The first issue in getting high performance code on Itanium-based systems is to get the code ported or written to run correctly in the 64-bit environment. It is not uncommon for code that functions correctly in a 32-bit environment to have latent bugs that will be exposed when the code is moved to a 64-bit environment.
Measure the time a program and its functions take to execute as part of the diagnosis phase of performance optimization. Such measurements are extremely valuable as a simple means to become familiar with how an application behaves during execution.
Use either the Linux time command or the clock function in the C library, and profile the application during compilation. The time command is used as follows:
It gives the following information: