English | 中文 | Русский | Français
2,856 Posts served
8,606 Conversations started
My last post ended with me having used Debian's apt package manager to install the libtbb2 and tbb-examples packages on my Debian 4.0 (Etch) machine. When I attempted to make the examples, error messages were displayed, including missing *.h include file messages.
Had it not been so late at night, I would have continued my investigation, since I was pretty certain the problems could easily be resolved. This indeed was the case, as Alexey Kukanov pointed out in the comment he posted yesterday.
Installing the libtbb-dev package
As Alexey pointed out, to get the header/include files, I need to install the libtbb-dev package. As you can see, the libtbb-dev file list includes all the *.h files.
So, I installed the libtbb-dev package using
apt-get install libtbb-dev
then returned to the instructions listed in the README.Debian file that comes with the tbb-examples package, changing into the tbb-examples/examples directory and entering:
make
This time the examples were built and executed.
TBB benchmarks on an Intel Pentium Hyper-threaded system
My HP nx9600 laptop (my Debian Etch system) is not a multicore machine. It's got an Intel Pentium 4 processor with Hyper-Threading. Conversations with people on the #tbb IRC channel made me curious as to what type of performance I'd see when running multithreaded applications on a processor with hyperthreading.
Sure enough, the performance when TBB examples were run using two threads was often (but not always) better than the performance when the examples were run using a single thread. For example, the Convex Hull TBB example problem, which I extensively investigated in November 2007, returned these timings for 1 - 8 threads (unbuffered push_back version):
| Threads | Initialization Time (sec) |
Calculation Time (sec) |
| 1 | 0.84 | 3.02 |
| 2 | 0.62 | 2.41 |
| 3 | 0.58 | 2.60 |
| 4 | 0.60 | 2.45 |
| 5 | 0.55 | 2.48 |
| 6 | 0.62 | 2.59 |
| 7 | 0.57 | 2.45 |
| 8 | 0.56 | 2.44 |
Similarly, the tacheon examples showed improved performance with TBB applied:
| Program | CPU Time (sec) |
| tacheon.serial | 20.52 |
| tacheon.tbb1d | 14.74 |
| tacheon.tbb | 15.56 |
However, in some other cases the performance for more than one thread was significantly slower. For example, here are the results for 1 - 4 threads for the fibonacci example:
./fibonacci 500 Fibonacci numbers example. Generating 500 numbers.. Serial loop - in 0.340486 msec Serial matrix - in 4.977169 msec Serial vector - in 12.881134 msec Serial queue - in 61.615816 msec Threads number is 1 Shared serial (mutex) - in 12.008520 msec Shared serial (spin_mutex) - in 5.413665 msec Shared serial (queuing_mutex) - in 11.099723 msec Shared serial (Conc.HashTable) - in 218.821127 msec Parallel while+for/queue - in 81.017869 msec Parallel pipe/queue - in 122.565921 msec Parallel reduce - in 26.709551 msec Parallel scan - in 10.159765 msec Parallel tasks - in 27.199792 msec Threads number is 2 Shared serial (mutex) - in 42.876632 msec Shared serial (spin_mutex) - in 16.839141 msec Shared serial (queuing_mutex) - in 20.759846 msec Shared serial (Conc.HashTable) - in 256.510596 msec Parallel while+for/queue - in 78.621992 msec Parallel pipe/queue - in 139.240823 msec Parallel reduce - in 23.885403 msec Parallel scan - in 16.329680 msec Parallel tasks - in 25.740075 msec Threads number is 3 Shared serial (mutex) - in 50.188160 msec Shared serial (spin_mutex) - in 14.955134 msec Shared serial (queuing_mutex) - in 21.668117 msec Shared serial (Conc.HashTable) - in 229.680181 msec Parallel while+for/queue - in 686.667691 msec Parallel pipe/queue - in 129.353032 msec Parallel reduce - in 24.940150 msec Parallel scan - in 18.163549 msec Parallel tasks - in 26.354788 msec Threads number is 4 Shared serial (mutex) - in 44.824017 msec Shared serial (spin_mutex) - in 14.007360 msec Shared serial (queuing_mutex) - in 22.169669 msec Shared serial (Conc.HashTable) - in 273.406177 msec Parallel while+for/queue - in 78.259395 msec Parallel pipe/queue - in 134.116594 msec Parallel reduce - in 25.003553 msec Parallel scan - in 18.081328 msec Parallel tasks - in 26.773415 msec Fibonacci number #500 modulo 2^64 is 2171430676560690477
Coming up: TBB packages on Ubuntu
Now that TBB packages are available for Debian unstable (sid), the packages should become available in Ubuntu before long. People on the #tbb IRC channel jumped on this right away. Ubuntu Hardy Heron is currently scheduled for release in April 2008. However, the Debian import freeze for Ubuntu Hardy Heron occurred on December 13, 2007. Undeterred, our TBB friend Sadiq Jaffer submitted a tbb debian sync request to the Ubuntu team, requesting inclusion of the Debian TBB packages in Ubuntu Hardy Heron. So, we'll see what happens. Thanks, Sadiq!
More credit where it is due: Athena Capital Research
Before I close this series on TBB and Debian, I'd like to recognize the support that has been provided to the Debian TBB team by Athena Capital Research. Athena is funding, among other things, Roberto Sanchez's work on packaging TBB for Debian. A quick perusal of the Athena site shows us why Athena would be interested in a project like Threading Building Blocks: they focus on quantitative investing, "specializing in fully automated trading strategies".
This is an area that has been a core interest of mine in the past. In the 1980s and 1990s I spent thousands of hours inventing mathematical models for investing (see my final report on these efforts, if you're interested). My models were based on a quite simple input data set: end-of-day securities prices or index values.
The advent of cheap processing power means much more complex models can be applied in near real time to identify temporary anomalies in market pricing. Exploiting these anomalies can be one of the safer ways to make short-term profits in the market. But, to do so in a fast-changing market, you need a lot of processing power, and very fast computation (faster than your competitors, who will also be crunching the same input data using their own algorithms). The company whose algorithms are multithreaded will have a better chance to spot temporary pricing anomalies sooner, enabling them to execute the necessary trades before a competitor has a chance to react.
It's great to see companies like Athena Capital Research and Intel contributing to the growth of open source projects like Threading Building Blocks, Eclipse, and the Scythe Statistical Library.
Kevin Farnham, O'Reilly Media, TBB Open Source Community, Freenode IRC #tbb, TBB Mailing Lists
| January 11, 2008 12:19 PM PST
Kevin Farnham | Alexey: thanks for letting us know about this. |
| January 14, 2008 9:25 AM PST
Igor Borski |
Hi Kevin, hope you've posted a bug on tbb-examples so it should depend on libtbb-dev (or at least reccommend it) -- Regards, Igor |
| January 20, 2008 1:22 PM PST
Kevin Farnham | Igor: this change has been made by the Debian TBB packaging team -- the next tbb-examples package release will recommend the libtbb-dev package to users. |
| January 24, 2008 2:08 AM PST
Mabeth Borres |
Guys, Don't use parallel_for (etc) for simple loops. If you really want to see some speedups, put in some complexity in the loop. Fibonacci and integer summations are simple loop implementations. Add some math calls into your summation loops (cos(), sin(), etc..). Then you will find that there is a speedup (serial vs. parallel). Also initialise more than one thread (optimum at 8 threads). |

Alexey Kukanov (Intel)
14,341
Status Points:
14,341