TBB Multithreading of the par2cmdline Utility

I've taken a look at the source code for Vincent Tan's "par2cmdline 0.4 with Intel Threading Building Blocks 2.0" project. This project was the grand prize winner in the Coding with TBB Contest, which ran from late July through August. Contest entrants had to show how they had applied Threading Building Blocks to multithread an application and improve its performance on multicore computers.

I wrote briefly about the contest winners in my "Threading Building Blocks Contest Winners" post last week. In upcoming posts, starting with this one, I plan to investigate each winning entry and discuss how its developers multithreaded their applications with TBB.

Nicely documented code!


The first thing that stands out as you look at Vincent's source code is the care he took to document the changes he made to multithread the code. Vincent started with version 0.4 of par2cmdline, the command-line version of Parchive, the Parity Archive Tool. Parchive's original purpose was:

to apply the data-recovery capability concepts of RAID-like systems to the posting and recovery of multi-part archives on Usenet


The objective of current development is to take the recovery process beyond the file-level barrier, which will allow "for more effective protection with less recovery data."

Working with the command line version of Parchive, Vincent applied TBB to produce a multithreaded version. The TBB-threaded source code is available on Vincent's par2cmdline 0.4 project page.

When you unpack the source package, open the file README_FIRST.txt for a discussion of the project. What makes Vincent's project a fine tutorial on multithreading an existing application with Threading Building Blocks is that he isolated all of his changes in #if blocks keyed on a preprocessor symbol named WANT_CONCURRENT. As you scroll through the source code, you can readily identify Vincent's new TBB-threaded code, and immediately below it, in the #else sections, you see the original code that the new code replaced. In some cases there is no #else clause, because the only change was the addition of new TBB-related code.
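To make the pattern concrete, here is a minimal sketch of what such a block can look like. It is hypothetical and not taken from par2cmdline: the functions process_block and process_all and the ProcessBlocks body class are invented for illustration, and I'm assuming the symbol is tested with #ifdef (Vincent's exact form may differ). The TBB branch uses tbb::parallel_for over a tbb::blocked_range and is written against current TBB headers; TBB 2.0-era code also had to construct a tbb::task_scheduler_init object before calling parallel_for.

```cpp
// Hypothetical illustration of the WANT_CONCURRENT pattern; not par2cmdline code.
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef WANT_CONCURRENT
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#endif

typedef std::vector<std::vector<int> > BlockList;

// Hypothetical per-block work: double every element of one block.
static void process_block(std::vector<int>& block) {
  for (std::size_t i = 0; i < block.size(); ++i) block[i] *= 2;
}

#ifdef WANT_CONCURRENT
// New code: TBB body object that handles one sub-range of block indices.
class ProcessBlocks {
  BlockList* blocks_;
 public:
  explicit ProcessBlocks(BlockList* blocks) : blocks_(blocks) {}
  void operator()(const tbb::blocked_range<std::size_t>& r) const {
    for (std::size_t b = r.begin(); b != r.end(); ++b) process_block((*blocks_)[b]);
  }
};
#endif

void process_all(BlockList& blocks) {
#ifdef WANT_CONCURRENT
  // New code: the blocks are independent, so TBB may spread them across threads.
  tbb::parallel_for(tbb::blocked_range<std::size_t>(0, blocks.size()),
                    ProcessBlocks(&blocks));
#else
  // Original code: process the blocks one after another.
  for (std::size_t b = 0; b < blocks.size(); ++b) process_block(blocks[b]);
#endif
}
```

Built without WANT_CONCURRENT, only the original serial loop is compiled, so no TBB headers or libraries are needed; defining the symbol (for example with -DWANT_CONCURRENT) and linking against TBB switches to the parallel loop. Either way, every element of every block ends up doubled, and the original code stays visible right next to its replacement.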

Modifications to thread par2cmdline using TBB


A grep of the source code for WANT_CONCURRENT shows that the following par2cmdline source files were modified:

    • commandline.cpp - 4 edits, about 20 new lines

    • commandline.h - 2 edits, 2 new lines

    • par2cmdline.cpp - 2 edits, 3 new lines (2 informational)

    • par2cmdline.h - 2 edits, about 30 new lines

    • par2creator.cpp - 7 edits, about 130 new lines

    • par2creator.h - 3 edits, 8 new lines

    • par2repairer.cpp - 25 edits, about 220 new lines

    • par2repairer.h - 6 edits, about 85 new lines



From this we see that about 500 new lines of code were added to par2cmdline to multithread it using TBB. The "new" lines include sections where old code was replaced (conveniently kept in the #else sections by Vincent, so we can easily see what was replaced). In other words, Vincent's TBB-threaded application contains about 500 TBB-related lines of code.

The par2cmdline software has about 14000 lines of source in total, in 24 *.cpp files and 25 *.h files. So Vincent was able to thread the application with TBB by modifying 8 of its 49 source files (about 1/6), adding code equal to about 3.5% of the lines in the original single-threaded application.
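The per-file counts listed above are approximate ("about"), but as a quick check they add up to the totals quoted here:

```latex
\begin{aligned}
\text{new lines} &\approx 20 + 2 + 3 + 30 + 130 + 8 + 220 + 85 = 498 \approx 500\\
\text{share of source lines} &\approx \frac{498}{14000} \approx 3.56\%\\
\text{share of source files} &= \frac{8}{24 + 25} = \frac{8}{49} \approx 0.16 \approx \frac{1}{6}
\end{aligned}
```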

These statistics will of course vary with the application. I'm just presenting them to get the data out there for developers who may be considering multithreading an application using Threading Building Blocks.

Up next...


I'm going to take some time now to study the actual code changes and additions Vincent made in the par2cmdline source. In my next post, I'll talk about some of what he did to complete the TBB threading of the application.

Kevin Farnham
O'Reilly Media
TBB Open Source Community
