Porting TBB onto Alpha Platform

Posted by chenxuhao

This manual is a guide to porting Intel Threading Building Blocks (TBB) [1] to the Alpha ISA [2].

1. Background

1.1. TBB

Intel Threading Building Blocks (TBB) [1] is a C++ multithreading runtime library developed by Intel.

1.2. Alpha ISA

Alpha [2] is a 64-bit RISC instruction set architecture introduced by Digital Equipment Corporation (DEC).

1.3. gem5 simulator

gem5 [3, 4, 5] is an execution-driven full-system architecture simulator. It supports the Alpha, x86, ARM, MIPS, and SPARC ISAs; of these, the Alpha version is the most stable.

2. Porting TBB onto Alpha

Key issue: if you try to compile TBB for Alpha directly, the build fails with errors about atomic primitives that are not implemented for this architecture, such as __TBB_CompareAndSwap4() and __TBB_CompareAndSwap8(). One possible approach is to implement these primitives for the Alpha architecture yourself, but that is costly. Fortunately, we can use the built-in Generic GCC* Atomics Support instead. For this, the gcc compiler for Alpha must be at least v4.3.6, and the minimum TBB version is TBB 3.0 Update 7. In practice, you had better use TBB 4.0 Update 3 or later; I use tbb40_20120201oss. To enable the Generic GCC* Atomics Support, add the -DTBB_USE_GCC_BUILTINS option when compiling TBB.
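To make the role of these primitives concrete, here is a minimal sketch of 4-byte and 8-byte compare-and-swap written on top of GCC's __sync_val_compare_and_swap builtin. The names cas4, cas8, and atomic_increment are hypothetical and are not TBB's actual definitions; the sketch only illustrates the semantics that a generic, builtin-based implementation has to provide.

```cpp
#include <stdint.h>
#include <cassert>

// Hypothetical helpers, NOT TBB's real definitions. Each one atomically
// stores `value` into *ptr if *ptr equals `comparand`, and returns the
// value that *ptr held before the operation.
static inline int32_t cas4(volatile int32_t* ptr, int32_t value, int32_t comparand) {
    return __sync_val_compare_and_swap(ptr, comparand, value);
}

static inline int64_t cas8(volatile int64_t* ptr, int64_t value, int64_t comparand) {
    return __sync_val_compare_and_swap(ptr, comparand, value);
}

// Lock-free increment built from the 8-byte CAS: retry until no other
// thread modified the counter between the read and the swap.
// Returns the new value of the counter.
static inline int64_t atomic_increment(volatile int64_t* counter) {
    assert(counter != 0);
    int64_t old_value;
    do {
        old_value = *counter;
    } while (cas8(counter, old_value + 1, old_value) != old_value);
    return old_value + 1;
}
```

Because the compiler expands the builtin into the target's own atomic instruction sequence, no hand-written Alpha assembly is needed for this kind of primitive.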

2.1. Obtaining a Cross-compiler

My host OS is Ubuntu 10.04.

Download crosstool-NG from http://ymorin.is-a-geek.org/projects/crosstool and install it:

./configure --prefix=/location/to/install/crosstool-NG

make

make install

Then generate the Alpha cross-compiler using crosstool-NG:

/location/to/install/crosstool-NG/bin# ./ct-ng menuconfig

Configure the options as follows:

Target options
    Target Architecture - alpha
    Variant - ev67

Operating System
    Target OS - linux

Binary utilities
    binutils version - 2.19.1

C compiler
    gcc version - 4.3.6
    Additional supported languages - C++

C-library
    C library - glibc
    glibc version - 2.9
    Threading implementation to use - nptl

Keep the other options at their defaults. After configuration, save and exit.

Next, build the cross-compiler:

/location/to/install/crosstool-NG/bin# ./ct-ng build

Note that if your gcc version is 4.4.3, errors may occur during the build. These are caused by gcc 4.4.3 itself; simply switch to another version of gcc.

The build takes about one hour. By default, the generated cross-compiler is placed in the x-tools directory under your home directory; on my system, it is in /root/x-tools.

So far, we have obtained an appropriate cross-compiler.

2.2. Cross-compile the TBB Library

Next, let's cross-compile the TBB library.

First, unpack the TBB source to TBB_HOME=/home/tbb40_20120201oss.

To specify the location of the cross-compiler, modify line 44 of linux.gcc.inc in TBB_HOME/build:

#CPLUS = g++
CPLUS = /root/x-tools/alphaev67-unknown-linux-gnu/bin/alphaev67-unknown-linux-gnu-g++

#CONLY = gcc
CONLY = /root/x-tools/alphaev67-unknown-linux-gnu/bin/alphaev67-unknown-linux-gnu-gcc

AR = /root/x-tools/alphaev67-unknown-linux-gnu/bin/alphaev67-unknown-linux-gnu-ar

RANLIB = /root/x-tools/alphaev67-unknown-linux-gnu/bin/alphaev67-unknown-linux-gnu-ranlib

Since we will run the binary in a simulator, it is better to build TBB as a statically linked library. Modify line 118 of Makefile.tbb in TBB_HOME/build so that the original link command is commented out and replaced by the archive commands:

$(TBB.DLL): $(TBB.OBJ) $(TBB.RES) tbbvars.sh $(TBB_NO_VERSION.DLL)
#	$(LIB_LINK_CMD) $(LIB_OUTPUT_KEY)$(TBB.DLL) $(TBB.OBJ) $(TBB.RES) $(LIB_LINK_LIBS) $(LIB_LINK_FLAGS)
	$(AR) rcs libtbb.a $(TBB.OBJ)
	$(RANLIB) libtbb.a

Compile the TBB library in TBB_HOME:

# make arch=alpha64 compiler=gcc CXXFLAGS="-DTBB_USE_GCC_BUILTINS" runtime=cc4.3.6_libc2.9

Note that arch must be alpha64, not alpha, or errors will occur.

Test whether the compilation was successful:

# make arch=alpha64 compiler=gcc CXXFLAGS="-DTBB_USE_GCC_BUILTINS" runtime=cc4.3.6_libc2.9 test

If execution errors occur at this point, that is expected: the generated binaries target the Alpha ISA and cannot execute on an x86 platform.

2.3. Cross-compile a TBB Application

Let's write a test application, Matrix Multiplication (matrixMul_tbb.cpp), and compile it:

# /root/x-tools/alphaev67-unknown-linux-gnu/bin/alphaev67-unknown-linux-gnu-g++ \
    -o mm_tbb matrixMul_tbb.cpp -static -static-libgcc \
    -I/home/tbb40_20120201oss/include/ \
    -L/home/tbb40_20120201oss/build/linux_alpha64_gcc_cc4.3.6_libc2.9_release/ \
    -ltbb -ldl -lrt -lpthread

This generates an Alpha binary, mm_tbb. Running it on the host produces an error:

# ./mm_tbb

bash: ./mm_tbb: cannot execute binary file

Of course, the Alpha binary cannot execute on an x86 platform. No matter; we will run it on an Alpha platform next.

2.4. Run the Binary in the Simulator

Download the gem5 simulator and install it in /GEM5_HOME. Copy the Alpha binary mm_tbb to the disk image linux-parsec.img. See [6] for details.

Start the simulator in /GEM5_HOME:

# ./build/ALPHA_FS/gem5.opt configs/example/fs.py -n 2

Open another terminal and watch the simulator's boot process:

/GEM5_HOME/util/term# ./m5term 3456

After the system has booted, let us run the application:

...

mounting filesystems...

loading script...

Script from M5 readfile is empty, starting bash shell...

# source tbbvars.sh

# ls

benchmarks  lib          mm_tbb   splash      usr
bin         libtbb.so    mnt      sys         var
dev         libtbb.so.2  modules  tbb
etc         linuxrc      parsec   tbbvars.sh
hello       lost+found   proc     test
iscsi       sbin         tmp

# ./mm_tbb

Using Matrix Sizes: A(100 x 100), B(100 x 100), C(100 x 100)

start timing

TBB matrixMul, Throughput = 0.0758 GFlop/s, Time = 0.02638 s, NumOps = 2000000

Test passed!
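The figures in this output are consistent with one another if the program counts one multiply and one add per inner-loop step. The source of matrixMul_tbb.cpp is not shown here, so that counting rule is an assumption, and the helper names matmul_num_ops and gflops below are hypothetical. Under that assumption, the reported numbers can be reproduced as follows:

```cpp
#include <cassert>

// Assumed operation count for multiplying two n x n matrices:
// n * n output elements, each needing n multiplies and n adds.
inline double matmul_num_ops(int n) {
    return 2.0 * n * n * n;
}

// Throughput in GFlop/s for `num_ops` operations finished in `seconds`.
inline double gflops(double num_ops, double seconds) {
    assert(seconds > 0.0);
    return num_ops / seconds / 1.0e9;
}
```

For n = 100 this gives 2000000 operations, and dividing by the reported 0.02638 s yields roughly 0.0758 GFlop/s, matching the line printed by mm_tbb.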

So far, we have successfully cross-compiled the TBB version of Matrix Multiplication and run it on the Alpha ISA.

Acknowledgement

Much gratitude goes to Raf Schietekat, Vladimir Polin (Intel), Sergey Kostrov, Anton Potapov (Intel), Alexey Kukanov (Intel), and Anton Malakhov (Intel) for their valuable advice.

References

[1] Intel Threading Building Blocks, http://threadingbuildingblocks.org/

[2] Alpha Architecture Handbook, http://www.compaq.com/cpq-alphaserver/technology/literature/alphaahb.pdf

[3] The gem5 Simulator, SIGARCH Computer Architecture News, CAN11

[4] Binkert, N. L., Dreslinski, R. G., Hsu, L. R., Lim, K. T., Saidi, A. G., and Reinhardt, S. K. The M5 Simulator: Modeling Networked Systems. IEEE Micro 26, 4 (Jul/Aug 2006), 52-60.

[5] Martin, M. M. K., Sorin, D. J., Beckmann, B. M., Marty, M. R., Xu, M., Alameldeen, A. R., Moore, K. E., Hill, M. D., and Wood, D. A. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Comput. Archit. News 33, 4 (2005), 92-99.

[6] Mark Gebhart et al., Running PARSEC 2.1 on M5, The University of Texas at Austin, Technical Report TR-09-32
