I implemented a program (for benchmarking purposes) using ArBB. The
program multiplies a sparse square matrix by a dense vector.
The matrix is stored in the standard three-array, zero-based CRS format, so I defined the following structure:
// Value: non-zero values (size NZnum)
// Col: column indices of the non-zero values (size NZnum)
// RowIndex: offsets of the row starts (size N + 1, where N is the number of rows in the matrix)
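The function that computes the product:
void ArBBMultiplicate(crsMatrixArBB A, dense x, dense &b)
{
    // Gather the element of x that matches each non-zero's column, then scale it.
    dense x_arbb = gather(x, A.Col);
    x_arbb = x_arbb * A.Value;
    // Split the products into one segment per row and sum each segment.
    nested row_blocks = reshape_nested_offsets(x_arbb, A.RowIndex);
    b = add_reduce(row_blocks);
}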
call(ArBBMultiplicate)(A_arbb, x_arbb, b_arbb);
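For comparison, here is a minimal plain C++ sketch of the same three steps (gather, element-wise multiply, per-row sum). The struct name CrsMatrix, the function name Multiply, and the element types double and int are assumptions made for illustration; this is not our actual serial implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical plain C++ counterpart of crsMatrixArBB.
// The element types (double, int) are assumptions.
struct CrsMatrix {
    std::vector<double> Value;    // non-zero values
    std::vector<int>    Col;      // column index of each non-zero
    std::vector<int>    RowIndex; // row start offsets, N + 1 entries
};

std::vector<double> Multiply(const CrsMatrix& A, const std::vector<double>& x)
{
    if (A.RowIndex.empty()) {
        return {};
    }

    // Steps 1 and 2: gather x by column index and multiply by the values.
    // This temporary has one entry per non-zero, like x_arbb in the ArBB code.
    const std::size_t nz = A.Value.size();
    std::vector<double> products(nz);
    for (std::size_t k = 0; k < nz; ++k) {
        products[k] = x[A.Col[k]] * A.Value[k];
    }

    // Step 3: sum the products of each row, using RowIndex as segment bounds.
    const std::size_t n = A.RowIndex.size() - 1;
    std::vector<double> b(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (int k = A.RowIndex[i]; k < A.RowIndex[i + 1]; ++k) {
            b[i] += products[k];
        }
    }
    return b;
}
```

Note that this formulation, like the ArBB one, materializes a temporary with as many elements as the matrix has non-zeros. A typical serial loop would accumulate each row directly and avoid that temporary.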
I built the Intel64 version of the source code with the Intel C++ Compiler.
The program works correctly as long as the matrix size N <= 60000
and the number of nonzero elements in every row NZnum <= 500,
approximately, but I want to use N = 200000 and NZnum = 5000.
Our serial C/C++ implementation and the parallel OpenMP, TBB, and Cilk+ versions work with these parameters without any problems. The ArBB implementation, however, fails during execution with an out-of-memory message:
"A memory allocation attempt was unsuccessful: OUT_OF_MEM: ArBB Heap out of usage [ArBB AMM Runtime Error]
The vector memory exhausted: Failed to alloc global data block!"
Is there any way to make the ArBB implementation work with these parameters?
- CPU: 2 x Intel Xeon E5520 (2.27 GHz, 4 cores per processor)
- RAM: 16 GB
- OS: Microsoft Windows 7
- Development Environment: Microsoft Visual Studio 2008
- Compiler: Intel C++ Composer XE 2011
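
A rough sketch of the data sizes involved, to put the target parameters next to the 16 GB of RAM. It assumes 8-byte values and 4-byte indices (the element types are an assumption, as are the names Footprint and EstimateFootprint), and it takes NZnum as the number of non-zeros per row, so the matrix has N * NZnum non-zeros in total. It counts only the three CRS arrays plus the gathered temporary (x_arbb), not any additional copies the ArBB runtime may make.

```cpp
#include <cstdint>

// Hypothetical helper: byte counts for the CRS arrays and the gathered temporary.
struct Footprint {
    std::uint64_t values;   // Value array
    std::uint64_t columns;  // Col array
    std::uint64_t rowIndex; // RowIndex array
    std::uint64_t gathered; // temporary with one entry per non-zero (x_arbb)

    std::uint64_t matrix() const { return values + columns + rowIndex; }
    std::uint64_t total() const { return matrix() + gathered; }
};

// valueBytes and indexBytes are assumed element sizes (8 and 4 by default).
Footprint EstimateFootprint(std::uint64_t n, std::uint64_t nzPerRow,
                            std::uint64_t valueBytes = 8,
                            std::uint64_t indexBytes = 4)
{
    const std::uint64_t nz = n * nzPerRow; // total number of non-zeros
    Footprint f;
    f.values   = nz * valueBytes;
    f.columns  = nz * indexBytes;
    f.rowIndex = (n + 1) * indexBytes;
    f.gathered = nz * valueBytes;
    return f;
}
```

Under these assumptions, EstimateFootprint(60000, 500) gives about 0.6 GB in total, while EstimateFootprint(200000, 5000) gives about 12 GB for the matrix alone and about 20 GB once the gathered temporary is included, which is more than the 16 GB installed. This might be part of why the ArBB version fails where the serial version still fits, but it is only an estimate.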