Using reducers

I may not have found the right documentation, but page 13 of the CilkPlus specification invites me to write code like the following:

template<class T>
class summation : public cilk::monoid_base<T> {
public:
    typedef T value_type;
    inline void reduce (T * left, T * right) const {*left += *right;}
    inline void identity (T * loc) const {*loc = 0;}
};

I can find nothing stating that I should include a header, and the compiler objects:

waldo.cpp(2): error: name followed by "::" must be a class or namespace name
  class summation : public cilk::monoid_base<T> {
                           ^

waldo.cpp(2): error: not a class or struct name
  class summation : public cilk::monoid_base<T> {
                           ^

waldo.cpp(2): error: class or struct definition is missing
  class summation : public cilk::monoid_base<T> {
                                              ^

waldo.cpp(7): warning #12: parsing restarts here after previous syntax error
  };
   ^

compilation aborted for waldo.cpp (code 2)

This happens for both 13.1.2.183 and the beta.

cilk::monoid_base is defined in reducer.h

There is an introduction to reducers and their usage in the Intel C++ Compiler documentation which is available at http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-win/hh_goto.htm#GUID-0F63EF23-250C-4093-AB10-822DD1423405.htm

The Cilk reducer header files include doxygen annotations, which make finding things in them much easier. See the ReadMe.html in the include/cilk directory for instructions on how to build the documentation.

The reducer library is deliberately written as a set of examples.  You're probably looking for an opadd reducer.  See include/cilk/reducer_opadd.h.

    - Barry
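For readers following along, here is a minimal sketch of what the monoid in the question boils down to: an identity value plus an associative reduce operation. It is written in standard C++ with a hypothetical stand-in, so it deliberately does not derive from cilk::monoid_base and builds without the Cilk Plus headers. With a Cilk Plus toolchain, the class in the question would additionally need the reducer header named above (presumably `#include <cilk/reducer.h>`, going by the include/cilk directory). The `merge_views` helper is an illustrative name, not part of any library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a reducer monoid, mirroring the shape of the
// class in the question. It does NOT derive from cilk::monoid_base.
template <class T>
struct summation {
    typedef T value_type;
    void identity(T* loc) const { *loc = T(); }                // neutral element
    void reduce(T* left, T* right) const { *left += *right; }  // fold right into left
};

// Fold a sequence of partial results ("views") into one value, left to
// right. This only models the merging step; it is not the Cilk runtime.
template <class Monoid>
typename Monoid::value_type
merge_views(const Monoid& m, std::vector<typename Monoid::value_type> views) {
    typename Monoid::value_type total;
    m.identity(&total);
    for (std::size_t i = 0; i < views.size(); ++i)
        m.reduce(&total, &views[i]);
    return total;
}
```

Because reduce is associative, the partial results can be grouped in any way as long as their left-to-right order is kept, which is the property a reducer relies on.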

Thanks very much - that helps a lot, but I am very puzzled.  I am trying to write code that assumes only standard C++ and the CilkPlus specification, as is my normal practice - however, the specification describes reducers but not the need for header files (unlike the C++ standard).  Is this an omission in the specification?

An opadd reducer won't cut the mustard, unfortunately - all of the defined reducers are single operand, and what I need is a dot product, which has two operands.  Writing my own reducer shouldn't be a problem, though I am surprised that something as important as the arithmetic dot product isn't provided in the defined list.

Thanks for the comment, Nick. You are right that the specification should make it clear that certain headers are required. We'll try to fix that in the upcoming revision. We'll also consider including a dot-product reducer in the Cilk Plus library. If you want to write one and contribute it to the library, see the instructions for contributing code on the cilkplus.org web site.

Dot products are written using sum reducers. For many of us, this is the most common use for a sum reducer; I wouldn't call it "writing my own."

If using C++, inner_product() seems sometimes to optimize better, for the cases where it is applicable, but sum reducers (which might be wrapped in a macro or inline function) do seem more readable.
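To make the comparison concrete, here is a small serial sketch in standard C++ of the two forms being discussed: a dot product written as a plain sum loop (the accumulation a sum reducer would carry) and the same computation via std::inner_product. The function names are illustrative, and both assume the two vectors have equal length.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Dot product as a plain sum loop: one product is added to the running
// sum per iteration. Assumes a.size() == b.size().
double dot_loop(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += a[i] * b[i];
    return sum;
}

// The same computation via the standard library algorithm.
double dot_inner_product(const std::vector<double>& a, const std::vector<double>& b) {
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```

Both versions add the products in the same left-to-right order, so they return the same value for the same inputs.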

I need to investigate what the built-in ones will really do, both in terms of the way they are used and the code they generate.  They may be flexible enough to handle pairs of array sections efficiently, but that's not how I read the specification; I may have misunderstood, of course.  I agree that standard C++ inner_product() isn't what is needed for clarity.  Nor is accumulate(); also, while they may optimise better for serial CPUs, their specification isn't exactly parallel-friendly.

There are also good numerical reasons to implement them slightly differently.

To optimize a sum reduction, options such as icl /fp:fast or gcc -ffast-math (specifically -fassociative-math, optionally with -protect-parens) must be in effect to permit batching of sums. Those are inherent in the Cilk(tm) Plus reducers for icc. Too many departures from the language standards are lumped together in these options. I don't know of any visible control over how many batched sums are used; the usual argument is that the only option anyone wants is to disable batching and perform the sums in literal sequential order. icc typically uses twice as many batched sums as gcc. I haven't succeeded in installing the gcc development Cilk Plus branch, so I haven't tested that.
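As a hand-written model of why batching needs permission from the compiler options, the sketch below sums the same data twice: once in literal sequential order, and once with two independent partial sums that are combined at the end. The two-way split over even and odd indices is an assumption chosen for illustration; it is not a description of what icc or gcc actually generate.

```cpp
#include <cstddef>
#include <vector>

// Sum in literal sequential order.
double sum_sequential(const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];
    return s;
}

// Sum with two independent partial sums (even and odd indices), combined
// at the end. This reassociates the additions, which is what batching does.
double sum_two_batches(const std::vector<double>& v) {
    double s0 = 0.0, s1 = 0.0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < v.size())
        s0 += v[i];  // leftover element when the length is odd
    return s0 + s1;
}
```

With IEEE double arithmetic and no fast-math options, the input {1e17, 1.0, -1e17, 1.0} gives 1.0 from the sequential version and 2.0 from the batched one, because 1e17 + 1.0 rounds back to 1e17. That difference is why a compiler may not batch sums unless reassociation has been allowed.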

The recently added icc #pragma unaligned might be expected to eliminate the influence of peeling for alignment on batching of sums. I don't think that option is available for Cilk(tm) Plus reducers.

It has been a struggle for compiler developers to gain full optimization of inner_product() and accumulate(), given that the intermediate translation doesn't yield explicitly countable loops (where the number of operands is trivially calculable prior to loop entry), but they have done well enough that it is no longer a practical concern.

There's also the admitted issue of how to handle both simd and threaded parallelism in a reducer.  Cilk(tm) Plus reducers currently deal only with simd, but the definition seems intended not to be so restricted.  

OpenMP 4.0 reducers allow specification of threaded or simd or both modes of reduction.  Intel C/C++ doesn't implement OpenMP min/max reducers, so currently it seems better to rely on std::max or ::min (or Cilk(tm) reducer) where appropriate.  OpenMP 4.0 already covers user defined reducers (originally proposed for 3.1), but those seem unlikely to appear on Intel platforms in the next year.  I heard that indexed max/min reducers may be proposed for OpenMP 4.1.
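For comparison, here is a sketch of a dot product using the OpenMP 4.0 reduction clause. The combined `parallel for simd` construct requests both threaded and simd reduction; `#pragma omp simd reduction(+:sum)` on its own would request simd only. The function name is illustrative, the vectors are assumed to be of equal length, and whether a given compiler version accepts the combined construct is something to check against its documentation.

```cpp
#include <cstddef>
#include <vector>

// Dot product with an OpenMP 4.0 reduction. If OpenMP is not enabled at
// compile time, the pragma is ignored and the loop simply runs serially.
// Assumes a.size() == b.size().
double dot_omp(const std::vector<double>& a, const std::vector<double>& b) {
    const double* pa = a.data();
    const double* pb = b.data();
    const long n = static_cast<long>(a.size());
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += pa[i] * pb[i];
    return sum;
}
```

The source loop is the same in every case; only the pragma says which kinds of parallelism may be used for the reduction, and the order of the additions is then left to the implementation.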

Yes.  The former is where Fortran beats C++ into a cocked hat - the C++ library is obdurately serial, which is precisely what is not wanted.  Unfortunately, some people are trying to make it more so :-(   But I am currently mainly concerned with the usability, and hence the RAS of programs.

I do not regard it as reasonable to bind the actual implementation into the code, because such architectural details are much more short-lived than programs, and are not the sort of thing that the average programmer can or should understand.  I am hoping to find a usage that (in principle) would hide such details from the programmer, and allow reasonable optimisation on a range of machines.  Again, this is where Fortran scores well, but CilkPlus looks like the best hope for C++ at present.
