Hello,
Recently, I started learning TBB. I tried to implement an RBF calculation using the following program. Unfortunately, when running on 4 cores, its performance is similar to or worse than that of the serial program. I would be grateful for a hint on how to improve the program.
Thanks,
Stan
#include <iostream>
#include "parallel_1.h"
#include <tbb/tick_count.h>
#include <tbb/parallel_reduce.h>
#include <tbb/task_group.h>
#include <mkl_vml.h>
#include <mkl_blas.h>
