Should I use VML to compute the sigmoid function of a vector?
If yes, is there a better way than doing 4 loops:
As you pointed out, eliminating the division by computing the inverse of 1+e^-t is one possible optimization, and I suppose the compiler can do that for you.
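As a side note on avoiding the division: the sigmoid can also be written with a hyperbolic tangent, which leaves a single transcendental call plus a scale and a shift. This is a standard identity, not something taken from the MKL manual, and whether it is actually faster or accurate enough for your data is something you would have to measure.

```latex
\sigma(t) = \frac{1}{1 + e^{-t}}, \qquad
\tanh\!\left(\frac{t}{2}\right) = \frac{1 - e^{-t}}{1 + e^{-t}}
\;\Longrightarrow\;
1 + \tanh\!\left(\frac{t}{2}\right) = \frac{2}{1 + e^{-t}}
\;\Longrightarrow\;
\sigma(t) = \frac{1}{2}\left(1 + \tanh\!\left(\frac{t}{2}\right)\right)
```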
The choice of approach for implementing the vector sigmoid function depends on several considerations, including vector size, accuracy, and performance. The Intel MKL Manual recommends using VML if the vector is larger than 40 elements. If the vectors in your application are shorter than 40 elements, the Intel(R) C/C++ compiler might be the better choice.

Vector math functions support three levels of accuracy: high accuracy (HA), low accuracy (LA), and enhanced performance (EP). See the "Data types, Accuracy Modes, and Performance tips" section in the "Vector Math Functions" chapter of the Intel MKL Manual for details. Start with the EP version of the VML functions, which is the fastest. If the accuracy of the EP functions is not sufficient, try the LA or HA versions. The data available at http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/vml/vmldata.htm gives an idea of the accuracy vs. performance trade-off for the Vector Math functions.

If your vector is large, you would get an additional performance benefit from the threading supported by VML (see, for example, the performance graph for the exponent function at http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/vml/functions/exp.html).
Ultimately, it makes sense to experiment with two versions of the sigmoid function, one VML based and one compiler based, and to choose the one that meets your requirements. Combining the math functions of Intel(R) MKL and the Intel compiler to build the vector sigmoid function may be another option.