I am building the attached program in OLCF's RHEA supercomputer (https://www.olcf.ornl.gov/computing-resources/rhea/) with Intel compiler icc (ICC) 14.0.4 20140805. Armadillo has a naive implementation of cscmm. I run the program with MKL_NUM_THREADS=1 in rhea to multiply the sparse matrix of size 83328x124992 with a dense matrix of size 124992x50. The following is the output.
./a.out 83328 124992 50 0.00001
The output of the test code