By Taylor Kidd, Intel Corporation
This article is essentially a collection of blogs I wrote on the same subject. The only difference is the degree of formality.
Intel MKL 11.3 has introduced Intel TBB support.
In interpreted languages, it just takes longer to get stuff done. I earlier gave the example where the Python source code a = b + c results in a BINARY_ADD bytecode that takes 78 machine instructions to do the add, whereas it is a single native ADD instruction when run in a compiled language like C or C++. How can we speed this up? Or as the performance expert would say, how do I decrease...
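The bytecode behind a = b + c can be inspected with the standard library's dis module. The following is a minimal sketch, not a measurement of the machine-instruction count quoted above. The function name add is hypothetical and exists only for illustration. It also assumes CPython: older releases (up to 3.10) emit BINARY_ADD, while newer ones emit the more general BINARY_OP, so the sketch accepts either name.

```python
import dis


def add(b, c):
    # Hypothetical helper: wraps the statement from the text so that
    # dis can disassemble it.
    a = b + c
    return a


# Collect the names of the bytecode instructions CPython generated.
opnames = [ins.opname for ins in dis.get_instructions(add)]

# The addition shows up as BINARY_ADD on older CPython releases and as
# BINARY_OP on newer ones (assumption: running on CPython).
add_ops = [name for name in opnames if name in ("BINARY_ADD", "BINARY_OP")]

print(opnames)
print(add_ops)
```

Each bytecode listed here is dispatched by the interpreter loop, which is where the extra machine instructions come from; a compiled language skips that dispatch entirely.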