Vectorization messages: fno-builtin

Hi All.

Using ICC v11.0, when I compile with -vec-report3 and -O3, once with "-fno-builtin" and once without it, I see big differences in the vectorization messages generated for a multi-file C++ package, as follows:

-----with option "-fno-builtin" ----------
main.cc(169): (col. 27) remark: loop was not vectorized: vectorization possible but seems inefficient.
main.cc(554): (col. 1) remark: PARTIAL LOOP WAS VECTORIZED.
main.cc(554): (col. 1) remark: PARTIAL LOOP WAS VECTORIZED.
main.cc(560): (col. 1) remark: loop was not vectorized: not inner loop.
main.cc(562): (col. 9) remark: loop was not vectorized: subscript too complex.
main.cc(566): (col. 1) remark: PARTIAL LOOP WAS VECTORIZED.
main.cc(566): (col. 1) remark: loop was not vectorized: vectorization possible but seems inefficient.
main.cc(566): (col. 1) remark: PARTIAL LOOP WAS VECTORIZED.
main.cc(583): (col. 1) remark: loop was not vectorized: not inner loop.
main.cc(584): (col. 5) remark: LOOP WAS VECTORIZED.
main.cc(589): (col. 1) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
main.cc(600): (col. 1) remark: loop was not vectorized: vectorization possible but seems inefficient.
main.cc(604): (col. 1) remark: LOOP WAS VECTORIZED.
main.cc(645): (col. 1) remark: loop was not vectorized: unsupported loop structure.
main.cc(1606): (col. 13) remark: loop was not vectorized: existence of vector dependence.
main.cc(1637): (col. 21) remark: LOOP WAS VECTORIZED.
main.cc(1686): (col. 25) remark: LOOP WAS VECTORIZED.
main.cc(3603): (col. 22) remark: LOOP WAS VECTORIZED.
main.cc(3603): (col. 22) remark: loop skipped: multiversioned.
main.cc(3709): (col. 22) remark: LOOP WAS VECTORIZED.
main.cc(3709): (col. 22) remark: loop skipped: multiversioned.
main.cc(3901): (col. 9) remark: loop was not vectorized: not inner loop.
main.cc(3902): (col. 13) remark: loop was not vectorized: low trip count.
main.cc(3909): (col. 9) remark: loop was not vectorized: unsupported loop structure.
main.cc(3930): (col. 9) remark: loop was not vectorized: unsupported loop structure.
main.cc(3968): (col. 9) remark: loop was not vectorized: unsupported loop structure.
main.cc(3988): (col. 9) remark: loop was not vectorized: not inner loop.
main.cc(3989): (col. 13) remark: loop was not vectorized: low trip count.
main.cc(969): (col. 9) remark: loop was not vectorized: existence of vector dependence.
main.cc(988): (col. 13) remark: loop was not vectorized: existence of vector dependence.
main.cc(1031): (col. 9) remark: loop was not vectorized: not inner loop.
main.cc(1044): (col. 13) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
main.cc(3209): (col. 20) remark: LOOP WAS VECTORIZED.
main.cc(3209): (col. 20) remark: loop skipped: multiversioned.
main.cc(3226): (col. 11) remark: loop was not vectorized: unsupported loop structure.
main.cc(3104): (col. 21) remark: LOOP WAS VECTORIZED.
main.cc(3104): (col. 21) remark: loop skipped: multiversioned.
main.cc(3121): (col. 12) remark: loop was not vectorized: unsupported loop structure.
main.cc(2899): (col. 7) remark: loop was not vectorized: existence of vector dependence.
main.cc(2902): (col. 10) remark: vector dependence: assumed ANTI dependence between trnStep0 line 2902 and trnStep0 line 2900.
main.cc(2900): (col. 10) remark: vector dependence: assumed FLOW dependence between trnStep0 line 2900 and trnStep0 line 2902.
main.cc(2900): (col. 10) remark: vector dependence: assumed FLOW dependence between trnStep0 line 2900 and trnStep0 line 2902.
main.cc(2902): (col. 10) remark: vector dependence: assumed ANTI dependence between trnStep0 line 2902 and trnStep0 line 2900.
main.cc(2900): (col. 10) remark: vector dependence: assumed OUTPUT dependence between trnStep0 line 2900 and lb_rho_ptr line 2902.
main.cc(2902): (col. 10) remark: vector dependence: assumed OUTPUT dependence between lb_rho_ptr line 2902 and trnStep0 line 2900.
main.cc(2907): (col. 7) remark: loop was not vectorized: vectorization possible but seems inefficient.
main.cc(2914): (col. 7) remark: loop was not vectorized: unsupported loop structure.
main.cc(2975): (col. 22) remark: LOOP WAS VECTORIZED.
main.cc(2975): (col. 22) remark: loop skipped: multiversioned.
main.cc(2990): (col. 13) remark: loop was not vectorized: unsupported loop structure.
main.cc(2786): (col. 13) remark: LOOP WAS VECTORIZED.
main.cc(2609): (col. 21) remark: LOOP WAS VECTORIZED.
main.cc(2644): (col. 17) remark: loop was not vectorized: unsupported loop structure.
main.cc(2663): (col. 17) remark: loop was not vectorized: existence of vector dependence.
main.cc(2664): (col. 21) remark: vector dependence: assumed ANTI dependence between (unknown) line 2664 and (unknown) line 2664.
main.cc(2664): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 2664 and (unknown) line 2664.
main.cc(2664): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 2664 and (unknown) line 2664.
main.cc(2664): (col. 21) remark: vector dependence: assumed ANTI dependence between (unknown) line 2664 and (unknown) line 2664.
main.cc(2667): (col. 17) remark: LOOP WAS VECTORIZED.
main.cc(904): (col. 9) remark: loop was not vectorized: not inner loop.
main.cc(905): (col. 30) remark: loop was not vectorized: unsupported loop structure.
main.cc(911): (col. 13) remark: loop was not vectorized: existence of vector dependence.
main.cc(1878): (col. 9) remark: loop was not vectorized: existence of vector dependence.
main.cc(1885): (col. 13) remark: loop was not vectorized: existence of vector dependence.
main.cc(1905): (col. 9) remark: loop was not vectorized: existence of vector dependence.
main.cc(1805): (col. 9) remark: loop was not vectorized: existence of vector dependence.
main.cc(1761): (col. 9) remark: loop was not vectorized: existence of vector dependence.
main.cc(1571): (col. 13) remark: loop was not vectorized: unsupported loop structure.
main.cc(1575): (col. 21) remark: loop was not vectorized: modifying order of operation not allowed under given switches.
main.cc(1235): (col. 9) remark: loop was not vectorized: not inner loop.
main.cc(1236): (col. 13) remark: LOOP WAS VECTORIZED.
main.cc(1241): (col. 9) remark: LOOP WAS VECTORIZED.
main.cc(1245): (col. 9) remark: LOOP WAS VECTORIZED.
main.cc(1256): (col. 9) remark: loop was not vectorized: not inner loop.
main.cc(1258): (col. 17) remark: loop was not vectorized: subscript too complex.
main.cc(1262): (col. 9) remark: PARTIAL LOOP WAS VECTORIZED.
main.cc(1262): (col. 9) remark: loop was not vectorized: vectorization possible but seems inefficient.
main.cc(1262): (col. 9) remark: PARTIAL LOOP WAS VECTORIZED.
main.cc(1303): (col. 29) remark: loop was not vectorized: statement cannot be vectorized.
main.cc(1325): (col. 9) remark: loop was not vectorized: unsupported loop structure.
main.cc(1344): (col. 21) remark: loop was not vectorized: unsupported loop structure.
main.cc(752): (col. 1) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
main.cc(3920): (col. 9) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
main.cc(996): (col. 9) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
main.cc(1315): (col. 9) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
--

In total, there are 84 lines of vectorization messages with "-fno-builtin" for the above.

But without "-fno-builtin", there are 203 lines of vectorization messages, as below:

--
main.cc(169): (col. 27) remark: loop was not vectorized: vectorization possible but seems inefficient.
main.cc(554): (col. 1) remark: LOOP WAS VECTORIZED.
remark: loop was not vectorized: operation cannot be vectorized.
main.cc(570): (col. 1) remark: PARTIAL LOOP WAS VECTORIZED.
main.cc(570): (col. 1) remark: loop was not vectorized: vectorization possible but seems inefficient.
main.cc(570): (col. 1) remark: PARTIAL LOOP WAS VECTORIZED.
main.cc(587): (col. 1) remark: loop was not vectorized: not inner loop.
remark: loop was not vectorized: operation cannot be vectorized.
main.cc(594): (col. 1) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
main.cc(605): (col. 1) remark: loop was not vectorized: vectorization possible but seems inefficient.
remark: loop was not vectorized: operation cannot be vectorized.
main.cc(653): (col. 1) remark: loop was not vectorized: unsupported loop structure.
main.cc(1660): (col. 37) remark: loop was not vectorized: statement cannot be vectorized.
main.cc(1691): (col. 21) remark: LOOP WAS VECTORIZED.
main.cc(1741): (col. 25) remark: LOOP WAS VECTORIZED.
main.cc(3664): (col. 22) remark: LOOP WAS VECTORIZED.
main.cc(3770): (col. 22) remark: LOOP WAS VECTORIZED.
main.cc(3994): (col. 9) remark: loop was not vectorized: not inner loop.
main.cc(3995): (col. 13) remark: loop was not vectorized: low trip count.
main.cc(4003): (col. 9) remark: loop was not vectorized: unsupported loop structure.
main.cc(4026): (col. 9) remark: loop was not vectorized: unsupported loop structure.
main.cc(4065): (col. 9) remark: loop was not vectorized: unsupported loop structure.
main.cc(4086): (col. 9) remark: loop was not vectorized: not inner loop.
main.cc(4088): (col. 13) remark: loop was not vectorized: low trip count.
main.cc(979): (col. 9) remark: loop was not vectorized: existence of vector dependence.
main.cc(980): (col. 13) remark: vector dependence: assumed ANTI dependence between (unknown) line 980 and (unknown) line 980.
main.cc(980): (col. 13) remark: vector dependence: assumed FLOW dependence between (unknown) line 980 and (unknown) line 980.
main.cc(980): (col. 13) remark: vector dependence: assumed FLOW dependence between (unknown) line 980 and (unknown) line 980.
main.cc(980): (col. 13) remark: vector dependence: assumed ANTI dependence between (unknown) line 980 and (unknown) line 980.
main.cc(980): (col. 13) remark: vector dependence: assumed FLOW dependence between (unknown) line 980 and ligand_atom_type_ptrs line 980.
main.cc(980): (col. 13) remark: vector dependence: assumed ANTI dependence between ligand_atom_type_ptrs line 980 and (unknown) line 980.
main.cc(999): (col. 13) remark: loop was not vectorized: existence of vector dependence.
main.cc(1000): (col. 24) remark: vector dependence: assumed ANTI dependence between logFile line 1000 and (unknown) line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed FLOW dependence between (unknown) line 1000 and logFile line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed FLOW dependence between (unknown) line 1000 and logFile line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed ANTI dependence between logFile line 1000 and (unknown) line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed ANTI dependence between logFile line 1000 and (unknown) line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed FLOW dependence between (unknown) line 1000 and logFile line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed FLOW dependence between (unknown) line 1000 and logFile line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed ANTI dependence between logFile line 1000 and (unknown) line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed FLOW dependence between (unknown) line 1000 and (unknown) line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed ANTI dependence between (unknown) line 1000 and (unknown) line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed ANTI dependence between (unknown) line 1000 and (unknown) line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed FLOW dependence between (unknown) line 1000 and (unknown) line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed FLOW dependence between (unknown) line 1000 and (unknown) line 1000.
main.cc(1000): (col. 24) remark: vector dependence: assumed ANTI dependence between (unknown) line 1000 and (unknown) line 1000.
main.cc(1044): (col. 9) remark: loop was not vectorized: not inner loop.
main.cc(1058): (col. 13) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
main.cc(3269): (col. 20) remark: LOOP WAS VECTORIZED.
main.cc(3287): (col. 11) remark: loop was not vectorized: unsupported loop structure.
main.cc(3163): (col. 21) remark: LOOP WAS VECTORIZED.
main.cc(3181): (col. 12) remark: loop was not vectorized: unsupported loop structure.
main.cc(2948): (col. 7) remark: loop was not vectorized: low trip count.
main.cc(2959): (col. 7) remark: loop was not vectorized: vectorization possible but seems inefficient.
main.cc(2971): (col. 7) remark: loop was not vectorized: vectorization possible but seems inefficient.
main.cc(3034): (col. 22) remark: LOOP WAS VECTORIZED.
main.cc(3050): (col. 13) remark: loop was not vectorized: unsupported loop structure.
main.cc(2847): (col. 13) remark: LOOP WAS VECTORIZED.
main.cc(2666): (col. 21) remark: LOOP WAS VECTORIZED.
main.cc(2702): (col. 17) remark: loop was not vectorized: unsupported loop structure.
main.cc(2722): (col. 17) remark: loop was not vectorized: existence of vector dependence.
main.cc(2723): (col. 21) remark: vector dependence: assumed ANTI dependence between (unknown) line 2723 and (unknown) line 2723.
main.cc(2723): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 2723 and (unknown) line 2723.
main.cc(2723): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 2723 and (unknown) line 2723.
main.cc(2723): (col. 21) remark: vector dependence: assumed ANTI dependence between (unknown) line 2723 and (unknown) line 2723.
main.cc(2727): (col. 17) remark: LOOP WAS VECTORIZED.
main.cc(913): (col. 9) remark: loop was not vectorized: not inner loop.
main.cc(914): (col. 30) remark: loop was not vectorized: unsupported loop structure.
main.cc(920): (col. 13) remark: loop was not vectorized: existence of vector dependence.
main.cc(923): (col. 45) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 923 and (unknown) line 924.
main.cc(924): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 924 and (unknown) line 923.
main.cc(923): (col. 45) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 923 and (unknown) line 925.
main.cc(925): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 925 and (unknown) line 923.
main.cc(923): (col. 45) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 923 and (unknown) line 929.
main.cc(929): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 929 and (unknown) line 923.
main.cc(923): (col. 45) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 923 and (unknown) line 929.
main.cc(929): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 929 and (unknown) line 923.
main.cc(923): (col. 45) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 923 and (unknown) line 930.
main.cc(930): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 930 and (unknown) line 923.
main.cc(923): (col. 45) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 923 and (unknown) line 934.
main.cc(934): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 934 and (unknown) line 923.
main.cc(923): (col. 45) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 923 and (unknown) line 934.
main.cc(934): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 934 and (unknown) line 923.
main.cc(923): (col. 45) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 923 and (unknown) line 935.
main.cc(935): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 935 and (unknown) line 923.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between param line 921 and time_seed line 923.
main.cc(923): (col. 45) remark: vector dependence: assumed OUTPUT dependence between time_seed line 923 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between param line 921 and seed line 923.
main.cc(923): (col. 21) remark: vector dependence: assumed OUTPUT dependence between seed line 923 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and seed line 924.
main.cc(924): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 924 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and seed line 924.
main.cc(924): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 924 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and logFile line 925.
main.cc(925): (col. 24) remark: vector dependence: assumed ANTI dependence between logFile line 925 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between param line 921 and (unknown) line 925.
main.cc(925): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 925 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and seed line 925.
main.cc(925): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 925 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between param line 921 and (unknown) line 926.
main.cc(926): (col. 28) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 926 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between param line 921 and seed line 928.
main.cc(928): (col. 21) remark: vector dependence: assumed OUTPUT dependence between seed line 928 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and seed line 929.
main.cc(929): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 929 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and seed line 929.
main.cc(929): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 929 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and logFile line 930.
main.cc(930): (col. 24) remark: vector dependence: assumed ANTI dependence between logFile line 930 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between param line 921 and (unknown) line 930.
main.cc(930): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 930 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and seed line 930.
main.cc(930): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 930 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between param line 921 and seed line 933.
main.cc(933): (col. 21) remark: vector dependence: assumed OUTPUT dependence between seed line 933 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and seed line 934.
main.cc(934): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 934 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and seed line 934.
main.cc(934): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 934 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and logFile line 935.
main.cc(935): (col. 24) remark: vector dependence: assumed ANTI dependence between logFile line 935 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between param line 921 and (unknown) line 935.
main.cc(935): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 935 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between param line 921 and seed line 935.
main.cc(935): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 935 and param line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 921 and time_seed line 923.
main.cc(923): (col. 45) remark: vector dependence: assumed OUTPUT dependence between time_seed line 923 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 921 and seed line 923.
main.cc(923): (col. 21) remark: vector dependence: assumed OUTPUT dependence between seed line 923 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and seed line 924.
main.cc(924): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 924 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and seed line 924.
main.cc(924): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 924 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and logFile line 925.
main.cc(925): (col. 24) remark: vector dependence: assumed ANTI dependence between logFile line 925 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 921 and (unknown) line 925.
main.cc(925): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 925 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and seed line 925.
main.cc(925): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 925 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 921 and param line 926.
main.cc(926): (col. 28) remark: vector dependence: assumed OUTPUT dependence between param line 926 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 921 and (unknown) line 926.
main.cc(926): (col. 28) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 926 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 921 and seed line 928.
main.cc(928): (col. 21) remark: vector dependence: assumed OUTPUT dependence between seed line 928 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and seed line 929.
main.cc(929): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 929 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and seed line 929.
main.cc(929): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 929 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and logFile line 930.
main.cc(930): (col. 24) remark: vector dependence: assumed ANTI dependence between logFile line 930 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 921 and (unknown) line 930.
main.cc(930): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 930 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and seed line 930.
main.cc(930): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 930 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 921 and param line 933.
main.cc(933): (col. 31) remark: vector dependence: assumed OUTPUT dependence between param line 933 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 921 and seed line 933.
main.cc(933): (col. 21) remark: vector dependence: assumed OUTPUT dependence between seed line 933 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and seed line 934.
main.cc(934): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 934 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and seed line 934.
main.cc(934): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 934 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and logFile line 935.
main.cc(935): (col. 24) remark: vector dependence: assumed ANTI dependence between logFile line 935 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 921 and (unknown) line 935.
main.cc(935): (col. 21) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 935 and (unknown) line 921.
main.cc(921): (col. 21) remark: vector dependence: assumed FLOW dependence between (unknown) line 921 and seed line 935.
main.cc(935): (col. 21) remark: vector dependence: assumed ANTI dependence between seed line 935 and (unknown) line 921.
main.cc(1933): (col. 33) remark: loop was not vectorized: statement cannot be vectorized.
main.cc(1940): (col. 13) remark: loop was not vectorized: existence of vector dependence.
main.cc(1961): (col. 9) remark: loop was not vectorized: existence of vector dependence.
main.cc(1966): (col. 31) remark: vector dependence: assumed ANTI dependence between (unknown) line 1966 and (unknown) line 1963.
main.cc(1963): (col. 17) remark: vector dependence: assumed FLOW dependence between (unknown) line 1963 and (unknown) line 1966.
main.cc(1963): (col. 17) remark: vector dependence: assumed ANTI dependence between (unknown) line 1963 and (unknown) line 1963.
main.cc(1963): (col. 17) remark: vector dependence: assumed FLOW dependence between (unknown) line 1963 and (unknown) line 1963.
main.cc(1963): (col. 17) remark: vector dependence: assumed FLOW dependence between (unknown) line 1963 and (unknown) line 1963.
main.cc(1963): (col. 17) remark: vector dependence: assumed ANTI dependence between (unknown) line 1963 and (unknown) line 1963.
main.cc(1963): (col. 17) remark: vector dependence: assumed FLOW dependence between (unknown) line 1963 and (unknown) line 1966.
main.cc(1966): (col. 31) remark: vector dependence: assumed ANTI dependence between (unknown) line 1966 and (unknown) line 1963.
main.cc(1963): (col. 17) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 1963 and (unknown) line 1966.
main.cc(1966): (col. 31) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 1966 and (unknown) line 1963.
main.cc(1963): (col. 17) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 1963 and (unknown) line 1966.
main.cc(1966): (col. 13) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 1966 and (unknown) line 1963.
main.cc(1961): (col. 9) remark: LOOP WAS VECTORIZED.
main.cc(1860): (col. 33) remark: loop was not vectorized: statement cannot be vectorized.
main.cc(1817): (col. 33) remark: loop was not vectorized: statement cannot be vectorized.
main.cc(1625): (col. 13) remark: loop was not vectorized: not inner loop.
main.cc(1628): (col. 17) remark: loop was not vectorized: low trip count.
main.cc(1263): (col. 9) remark: loop was not vectorized: not inner loop.
remark: loop was not vectorized: operation cannot be vectorized.
remark: loop was not vectorized: operation cannot be vectorized.
main.cc(1277): (col. 9) remark: LOOP WAS VECTORIZED.
remark: loop was not vectorized: operation cannot be vectorized.
main.cc(1294): (col. 9) remark: PARTIAL LOOP WAS VECTORIZED.
main.cc(1294): (col. 9) remark: loop was not vectorized: vectorization possible but seems inefficient.
main.cc(1294): (col. 9) remark: PARTIAL LOOP WAS VECTORIZED.
main.cc(1335): (col. 9) remark: LOOP WAS VECTORIZED.
main.cc(1360): (col. 9) remark: loop was not vectorized: unsupported loop structure.
main.cc(1380): (col. 21) remark: loop was not vectorized: unsupported loop structure.
main.cc(761): (col. 1) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
main.cc(4015): (col. 9) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
main.cc(1008): (col. 9) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
main.cc(1349): (col. 9) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate.
--

The -fno-builtin option tells the compiler to disable inline expansion of intrinsic functions, i.e. not to use the compiler intrinsics that are normally defined along with the instructions for a particular architecture. It also means that built-in functions which do not begin with the __builtin_ prefix are not recognized.

Query: Which vectorization report should be used as the basis for effective vectorization? (Usually the first one above, but why are the other vectorization messages hidden or not reported in that case? Is this an improved feature of the ICC v11.0 approach?)

~BR

I got some info from the compiler engineer.
In the case of library calls, when "-fno-builtin" is used, some properties of the library call (such as having no side effects) are lost. This affects the compiler's decisions on vectorization, which is why you see a different report.
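To make this concrete, here is a minimal sketch of the kind of loop where it could matter. The function name `sqrt_all` and the file name `kernel.cc` are hypothetical and not taken from main.cc; the flags in the comments are the ones already discussed in this thread. With built-ins enabled, the compiler can treat `std::sqrt` as a known math function; with "-fno-builtin" it may have to treat it as an ordinary external call. The exact remarks depend on the compiler version, so compare the two reports yourself rather than relying on this sketch.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical kernel for comparing vec-reports; not part of the original package.
// Compile twice and compare the remarks reported for the loop below:
//   icc -O3 -vec-report3 -c kernel.cc
//   icc -O3 -vec-report3 -fno-builtin -c kernel.cc
void sqrt_all(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // The only call in the loop body is a math library function.
        out[i] = std::sqrt(in[i]);
    }
}
```

The result of the function is the same either way; only the compiler's freedom to transform the loop may differ.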

Quoting - Jennifer Jiang (Intel)

I got some info from the compiler engineer.
In the case of library calls, when "-fno-builtin" is used, some properties of the library call (such as having no side effects) are lost. This affects the compiler's decisions on vectorization, which is why you see a different report.

Hi Jiang.

Thanks, but I would appreciate at least some concrete reasons why this behaviour happens, so that when generating a vec-report one can decide which built-in APIs should be included and which should not.

If a built-in API is important for getting an effective vec-report, then I would certainly prefer adding it; otherwise not.

I am still looking for a solution.

~BR

Quoting - Jennifer Jiang (Intel)

I got some info from the compiler engineer.
In the case of library calls, when "-fno-builtin" is used, some properties of the library call (such as having no side effects) are lost. This affects the compiler's decisions on vectorization, which is why you see a different report.

Hello ISN,

In the ISN forum we all learn about and explore Intel products, but to understand more and gain better insight it would be really appreciated if someone from Intel could take the initiative and answer queries like this one in a more thorough way.

It has been almost 2-3 weeks now, and no effective answer has been provided for this posting.

If this is the scenario, non-Intel people may lose interest in asking or answering questions here.

~BR

Sorry for the delayed response.

After checking the output messages in the original posting, 21 loops are vectorized without -fno-builtin and 18 with -fno-builtin. So compiling without "-fno-builtin" is definitely better.

I'm also asking for more information. Two possible side effects are that the functions may use SSE registers, or may update global variables that are used in the loop.

Jennifer

Here is a more generic answer:

The side effects mean memory accesses that could cause loop-carried dependencies, which prevent the loop from being vectorized, because the compiler does not know what the function call would do.
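To make the "side effects" point concrete, here is a minimal C++ sketch. It is not taken from the original test case: g_scale, scale_and_bump, with_side_effect and without_side_effect are hypothetical names invented for illustration. The first loop calls a function that updates a global on every call, so each iteration depends on the previous one. The second loop is a pure per-element computation with no such dependence.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical global that a callee may modify (an assumption for this sketch).
static double g_scale = 2.0;

// Hypothetical helper with a side effect: it changes g_scale on every call,
// so the result of call i+1 depends on call i having run first.
double scale_and_bump(double x) {
    double r = x * g_scale;
    g_scale += 1.0;
    return r;
}

// Loop with a loop-carried dependence through g_scale. If the compiler cannot
// see into the callee, it has to assume something like this may be happening.
std::vector<double> with_side_effect(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = scale_and_bump(in[i]);
    return out;
}

// Same shape of loop, but each iteration is independent of the others,
// which is the kind of loop a vectorizer can handle.
std::vector<double> without_side_effect(const std::vector<double>& in, double scale) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] * scale;
    return out;
}
```

With built-ins enabled, the compiler knows that a recognized library function behaves like the second case. With "-fno-builtin" it presumably has to treat the call like the first case, which would explain the different reports.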

I don't think the "PARTIAL LOOP WAS VECTORIZED" message lets us conclude that a section of code has been effectively vectorized. Rather, one should consider tuning the code further until the message becomes "LOOP WAS VECTORIZED", which indicates effective vectorization, provided the limit does not become a bottleneck.

With "-fno-builtin", the number of "LOOP WAS VECTORIZED" messages is 15, but without "-fno-builtin" it is 14. So if we count only the "LOOP WAS VECTORIZED" messages, the code is better vectorized with the "-fno-builtin" option than without it.

I don't know whether one can conclude effective vectorization from this comparison unless one understands the behaviour of "-fno-builtin", or which specific built-in attributes are not being used when generating the vec-report messages.

Can we dig further into it to reach a final conclusion?

~BR

As quoted by you: "The side effects mean memory accesses that could cause loop-carried dependencies, which prevent the loop from being vectorized, because the compiler does not know what the function call would do."

Do you think it is a bug or a compiler limitation that "the compiler does not know what the function call would do"?

~BR

This is a completely standard feature of parallel programming and vectorization (since at least 30 years ago). If a function can be in-lined, the compiler has the opportunity to analyze its suitability for vectorization. If a programmer-defined function is suitable for vectorization, often the best course is to push the inner loop (or parallel construct, since I was scolded about this yesterday) inside that function.
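A rough sketch of what "pushing the inner loop inside the function" can look like, assuming a simple a*x + y kernel. The function names (axpy_one, axpy_call_per_element, axpy_array) are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-element kernel. If it lived in another translation unit and
// could not be inlined, a loop calling it would contain an opaque function call.
double axpy_one(double a, double x, double y) { return a * x + y; }

// Caller-side loop: it can only be vectorized if axpy_one is inlined here.
void axpy_call_per_element(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] = axpy_one(a, x[i], y[i]);
}

// Loop pushed inside the function: the whole loop body is visible to the
// compiler wherever this function is compiled, with no call inside the loop.
void axpy_array(double a, const double* x, double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both versions compute the same result. The second simply gives the vectorizer a loop with no function call in it, regardless of inlining decisions at the call site.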

Quoting - srimks
Do you think it is a bug or a compiler limitation that "the compiler does not know what the function call would do"?

No it's not a bug.
Our compiler knows the intrinsic functions in our own libraries, but not those in other libraries.

Jennifer

Jennifer.

I repeat one of my messages above, as it seems you missed interpreting it.

With "-fno-builtin", the number of "LOOP WAS VECTORIZED" messages is 15, but without "-fno-builtin" it is 14. So if we count only the "LOOP WAS VECTORIZED" messages, the code is better vectorized with the "-fno-builtin" option than without it.

I don't know whether one can conclude effective vectorization from this comparison unless one understands the behaviour of "-fno-builtin", or which specific built-in attributes are not being used when generating the vec-report messages.

Can we dig further into it to reach a final conclusion?

~BR

As explained in discussion 62977 at http://software.intel.com/en-us/forums/showthread.php?t=62977, a loop that contains a function call cannot be vectorized unless the function call can be inlined. We also mentioned that the -fno-builtin option tells the compiler to disable lookup and inline expansion of intrinsic functions. When you compile your code without the "-fno-builtin" option (the default), the compiler is able to replace certain parts of the original code with calls to intrinsic functions when profitable (e.g. replacing the memory operation in the test case in discussion 62977 with a call to _intel_fast_memset for better performance). If profitable, the compiler may also inline the call. If a loop has calls to functions in the Intel SVML library, such as sin, cos, etc., the compiler will inline those function calls, which enables the vectorizer to vectorize those loops.

Because of the above, most code **but not all** could potentially benefit when compiled without -fno-builtin. For example, the loop in the test case in discussion 62977 *would not* vectorize when compiled with the default options (i.e. without -fno-builtin), because the compiler would replace the memory operation in the loop with a call to _intel_fast_memset, which it believes is a better optimization than vectorizing the loop. The loop will vectorize if compiled with -fno-builtin, because the compiler will not convert the loop body to a memset call, but according to the compiler's heuristics the performance will not be as good as with the memset call. Similarly, the test case in this thread produces more vectorized loops (15 loops vs 14) when compiled with the -fno-builtin option, but the actual performance might not be as good as without -fno-builtin. The best way to determine which option works best is to measure the actual application performance.
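As an illustration of the kind of loop being described, here is a small sketch. It is not the test case from discussion 62977, and the function names zero_with_loop and zero_with_memset are hypothetical. The first function is an explicit zero-fill loop that a compiler with built-ins enabled may recognize and turn into a memset-style library call. The second writes the memset call by hand.

```cpp
#include <cstddef>
#include <cstring>

// Explicit zero-fill loop. With built-ins enabled, a compiler may replace the
// whole loop with a memset-style call instead of vectorizing it. With
// -fno-builtin the loop stays a loop and becomes a vectorization candidate.
void zero_with_loop(float* a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        a[i] = 0.0f;
}

// The same effect written as an explicit library call. This assumes IEEE-754
// floats, where an all-zero bit pattern represents 0.0f.
void zero_with_memset(float* a, std::size_t n) {
    std::memset(a, 0, n * sizeof(float));
}
```

Either way the array ends up zeroed. Only the vec-report differs, which is why counting "LOOP WAS VECTORIZED" lines alone can be misleading.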

In summary, use the "-fno-builtin" option when you do not want any of the compiler intrinsics to be generated, such as when you can't or don't want to link to an Intel library. The -fno-builtin option could produce different vectorization and optimization results for different types of code. Some code may run faster and some may run slower with or without the option. The best way to judge the performance effects is to measure the actual runtime performance rather than to interpret the optimization reports.

Does that answer your question?

--mark

I think you have partially answered it; still, how to choose between "_intel_fast_memset()" calls and vectorization remains undecided.

~BR

With the default compilation options (i.e. not using -fno-builtin), the compiler does **both** vectorization and uses the built-ins anywhere in the code (not just in the loops) that could benefit from such optimizations. If the compiler heuristics indicate that using a built-in would be more profitable than vectorizing a loop, it will use a built-in. When you disable the use of built-ins (i.e. compile with the -fno-builtin option), parts of your application (not just the loops) that could potentially benefit from the built-ins won't take advantage of such optimizations. In this case you may get more loops vectorized (because a loop body won't be replaced with calls to built-ins, which would make the loop unvectorizable), but you will lose optimizations (those from the use of built-ins) in other parts of the application, so the overall performance might be worse.

To choose the best optimizations for your application, start with the default options and see what kind of actual performance you get. Then change the compilation options (e.g. add -fno-builtin) and see how that affects the performance.
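One possible way to do that measurement is a small timing helper like the sketch below. The names sum_of_squares and best_seconds are hypothetical. The kernel is only a stand-in for whatever hot loop you care about, and best-of-N wall-clock timing is just one reasonable choice, not a prescribed method.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in kernel; replace it with the code you want to measure.
double sum_of_squares(const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i] * v[i];
    return s;
}

// Calls f() 'repeats' times and returns the shortest wall-clock time in
// seconds. Returns -1.0 if repeats is not positive.
template <class F>
double best_seconds(F&& f, int repeats) {
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        auto t1 = std::chrono::steady_clock::now();
        double s = std::chrono::duration<double>(t1 - t0).count();
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}
```

Build the same source twice, once with the default options and once with -fno-builtin added, and compare the reported times. When calling the kernel inside the timed lambda, store its result in a volatile variable so the optimizer cannot discard the work.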

--mark

Mark,

I think what you are saying is fine for a small application, but for a large package with multiple C/C++ files it would be difficult to justify vectorization decisions with and without -fno-builtin. Still, your answer gives some reasons and explains the usability of compiling with and without built-ins. Thanks.

Could you share the list of built-in APIs for ICC-v11.0 and their descriptions, if you know of any ICC documents or articles that have them?

Is there any way to call individual built-ins for a specific purpose within a file, rather than simply having all of them enabled by default as the compiler currently does? (I mean that calling an individual built-in within the code would be the user's decision rather than the compiler's.)

~BR

I don't think anyone but you is harping on this question of whether comparing -fno-builtin with default is worth the effort. You don't seem to be interested in discussing which situations might benefit from avoiding the default, so I don't see your point. Clearly, people at Intel have demonstrated the benefit of the built-ins on some standard benchmarks.
With normal options, you can ensure the use of the Intel optimized memcpy(), memset() or memmove() by writing those functions into the source code. Specifying one of those is recommended over the corresponding C++ STL. You could also use the names of the Intel optimized built-ins directly, if you want your code to break when using a non-Intel compiler. You might name the specific function from the Intel C support library when making a direct call from Fortran, when you want your code to work only in the Intel tools environment.
You could also use the current glibc versions of those functions, if you want optimization for both Intel and AMD. In principle, standard use of #undef and the like would enable you to link the one of your choice.
One could speculate on whether the question of __intel_fast_memcpy() would ever have arisen, if the glibc version hadn't been so poor for so long. Perhaps it's fortunate that such speculation hasn't absorbed attention here.
In 11.x compilers, there is a separate option to engage or disengage automatic calls to svml, if that is your interest.
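For what it is worth, "writing those functions into the source code" can be as simple as the sketch below. The wrapper name copy_block is hypothetical. It calls the standard memcpy() and leaves it to the compiler and library to decide which optimized implementation is actually used.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical wrapper: copies n elements with an explicit memcpy() call
// instead of a hand-written loop. The static_assert guards against types
// for which a raw byte copy would not be valid.
template <class T>
void copy_block(T* dst, const T* src, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable type");
    std::memcpy(dst, src, n * sizeof(T));
}
```

The source and destination must not overlap. Use memmove() instead if they might.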
