_Cilk_for syntax questions

Hi folks.

I am working with Balaji on the GCC implementation of Cilk Plus, and I have a few syntax questions.  I am looking at version 1.1 of the language extension specification.  By the way, if this is not the right forum for questions about the standard or its syntax, please let me know where I should direct them.

For _Cilk_for (and similarly for the for loop associated with a #pragma simd), I see:

1. "No storage class may be specified for the variable within the initialization clause."  I assume static, extern, and globals are not allowed, but are "auto" variables allowed?  The spec is not clear, since auto is also a storage class.

2. The spec does not mention casts in the conditional expression of a _Cilk_for.  Can I assume they are disallowed?

    _Cilk_for (c=0; (int)c < 1234; ++c) // Can we assume this cast is disallowed?

3. The grammar expression "shift-expression" is used but never formally defined.  Is this defined in another document?

4. In the description for #pragma simd, the grammar expression "conditional-expression" is similarly not defined.

5. The spec says "the total number of iterations (loop count) can be determined before beginning the loop execution" and later "the increment and limit expressions may be evaluated fewer times than in the serialization".  Does this mean I can safely hoist any increment or limit expressions before the loop?  That is, rewrite this:

_Cilk_for(i=0; i < foo(); i += bar()) 

into:

tmp1 = foo();
tmp2 = bar();
_Cilk_for(i=0; i < tmp1; i+= tmp2)

Could someone please clarify all this?

Thanks.
Aldy

Arch D. Robison (Intel)

Questions like yours are good for improving clarity of the specification, so feel free to post them here.  

  1. The intent is to allow only the default storage class (auto) for local variables, though I think that explicitly specifying the "auto" storage class should be allowed.  The current icc implementation seems to allow "_Cilk_for( auto int i=1; i<10; ++i ) {}" for C++ and "_Cilk_for( auto i=1; i<10; ++i ) {}" for C99.
  2. The control variable must appear without a cast in the loop test expression.  Even no-op casts are disallowed.  For example, icc rejects "_Cilk_for( int i=1; (int)i<10; ++i ) {}".
  3. Yes, shift-expression is defined in the C++ and C standards.  For example, see section 5.8 of the working draft http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3376.pdf .
  4. Section 5.16 of the working draft for C++ defines conditional-expression.  Be aware that the C standard defines it slightly differently.  See Section 6.5.15 of the C11 final draft http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf .  
  5. You can hoist foo() and bar() as long as you use a zero-trip guard for bar().  The reason is that if foo() returns 0, the serialization evaluates bar() zero times, and bar() could legitimately have a hazard in the zero-trip case (such as dividing by zero).  Here's the example with the guard:

    tmp1 = foo();
    if(0<tmp1) {
        tmp2 = bar();
        _Cilk_for(i=0; i < tmp1; i+= tmp2) ...
    }
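
For illustration, here is a minimal serial sketch of the guarded hoist in plain C, with an ordinary for loop standing in for _Cilk_for so that it compiles without Cilk Plus support.  The bodies of foo() and bar(), the globals limit_value, stride_value and bar_calls, and the helper run_hoisted_with_guard() are all hypothetical and exist only to show that bar() is never evaluated when the loop would run zero times.

```c
#include <assert.h>

/* Hypothetical stand-ins for the thread's foo() and bar(). */
static int limit_value = 0;   /* what foo() returns */
static int stride_value = 1;  /* what bar() returns */
static int bar_calls = 0;     /* how many times bar() was evaluated */

static int foo(void) { return limit_value; }
static int bar(void) { ++bar_calls; return stride_value; }

/* Serial stand-in for the hoisted loop; returns the number of iterations. */
static int run_hoisted_with_guard(void)
{
    int iterations = 0;
    int tmp1 = foo();              /* limit: evaluated exactly once */
    if (0 < tmp1) {                /* zero-trip guard */
        int tmp2 = bar();          /* stride: evaluated only if the loop runs */
        assert(tmp2 > 0);          /* this sketch handles a positive stride only */
        for (int i = 0; i < tmp1; i += tmp2)   /* would be _Cilk_for */
            ++iterations;
    }
    return iterations;
}
```

With limit_value set to 0 the function returns 0 and leaves bar_calls at 0, which matches the serialization; without the guard, bar() would be called once even though the loop body never runs.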

Ah, this is much clearer.  By the way, could the specification be updated to reflect the above (particularly the "auto" specifier, as well as the cast issue)?

Another thing: in the loop condition, is an implicit "!=" allowed?

    #pragma simd
    for (int i=100; i; --i)
        a[i] = b[i];

A compiler may fold the above loop condition into "i != 0".  Is this allowed?  I assume so, but the spec doesn't explicitly specify.

Thanks again.

Pablo Halpern (Intel)

Thanks for the questions, Aldy.  I have made a note to clarify them in the next revision of the Cilk Plus Language Spec.  Hopefully, we'll be able to put out a new version in a reasonable time.  (How's that for being noncommittal?)

STL style loop control

for(x; x != end; ++x)

has been problematic for optimization.  It's very helpful when the number of iterations can be calculated in a straightforward way before launching the for loop.  Maybe Cilk(tm) Plus has an opportunity to rule out the weird possibilities, short of a linked list or such.

One of the weird possibilities is the insistence (for 32-bit mode) that the loop is defined for the case x > end, even when x is a signed int type, for which wraparound isn't fully defined.  As that case must lead to a fault in 64-bit mode, it is excluded from consideration.  Anyway, it makes less sense to allow that version in cases where x < end will do and makes the loop clearly countable.

Besides, if you allow that syntax, it's helpful to have warning diagnostics in case what looks like a typo is a typo.

Hi TimP.

Yes, I understand the problem with != and ==, but the reason I ask is because the 1.1 spec specifically allows for "!=":

"The operator denoted OP shall be one of !=, <=, <, >=, or >".

I have verified and icc does NOT allow "!=" in the condition.  I take it the "!=" in the spec is a typo?

Pablo Halpern (Intel)

Quote:

Aldy Hernandez wrote:

Hi TimP.

Yes, I understand the problem with != and ==, but the reason I ask is because the 1.1 spec specifically allows for "!=":

"The operator denoted OP shall be one of !=, <=, <, >=, or >".

I have verified and icc does NOT allow "!=" in the condition.  I take it the "!=" in the spec is a typo?

I'm not sure how you verified this, since icc does allow !=.  It is not a typo.  This form is extremely important for STL code, which almost always uses !=. The specification explicitly disallows wrap-around (it is undefined behavior), so the loop is countable.  There is a table in the specification that maps the increment expression to the loop-count formula.

-Pablo

This is how I verified.  Am I doing something wrong?

(For that matter, even "==" seems to be disallowed).

Aldy

houston:~$ cat a.cc
int NNN;
int *a, *b;

void foo()
{
  int i;
#pragma simd
  for (i = 999; i != NNN; i--)
    {
      a[i] = b[i];
    }
}
houston:~$ icc -c a.cc
a.cc(8): error: the for statement following an SIMD for pragma must have a conditional of the form <index> {<,<=,>=,>} <expr>
for (i = 999; i != NNN; i--)
^

compilation aborted for a.cc (code 2)
houston:~$ icc --version
icc (ICC) 13.1.1 20130313
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.

Pablo Halpern (Intel)

My apologies, Aldy.  I thought you were talking about cilk_for (as per the title of this thread) instead of #pragma simd.

In this case, the spec is correct but the implementation is wrong. There is an open bug report on this problem.

gcc should follow the spec, rather than the icc 13.1 implementation. (I.e., do as I say, not as I do. :-) ) The table for cilk_for gives the correct computation for the loop count for both cilk_for and #pragma simd.

Pablo, I was actually talking about both cilk_for and #pragma simd, since the spec says that for #pragma simd, "The loop's control clause and body are subject to the same restrictions as in a _Cilk_for loop".  I take this to mean that the syntactic restrictions for _Cilk_for and #pragma simd are the same.

But point taken, "!=" is allowed.

Thanks.

Hmm, one more question.

If != is allowed, the loop count is not easily calculated if the stride is not a constant (well, at least not without branches):

// incr = -1
for (i=N; i != limit; i += incr)
  [body]

Is it intended that the compiler calculate the sign of INCR beforehand to calculate the loop count?  That is, generate something akin to:

if (incr > 0) count =  (limit - N) / incr;
else count = (N - limit) / -incr;
for (i=N; count > 0; --count) 
  [body] 

Sorry for the barrage of questions.  I just want to make sure I'm not misunderstanding things.
Aldy

Arch D. Robison (Intel)

Yes, the compiler must calculate the sign of INCR before computing the trip count.  The table for "Loop count" in the spec has the if-then-else.   
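
As a rough sketch of that if-then-else (written for this thread, not quoted from the spec's table), the trip count for a "!=" loop with a run-time stride could be computed as below.  The names trip_count_ne and trip_count_serial are hypothetical, and the sketch assumes the stride is nonzero and that the control variable lands exactly on the limit without wrap-around, as the spec requires.

```c
#include <assert.h>

/* Trip count for: for (i = first; i != limit; i += incr).
   Assumes incr != 0 and that i reaches limit exactly, without wrap-around. */
static long trip_count_ne(long first, long limit, long incr)
{
    assert(incr != 0);
    if (incr > 0)
        return (limit - first) / incr;    /* counting up */
    else
        return (first - limit) / -incr;   /* counting down */
}

/* Reference: obtain the same count by running the serial loop. */
static long trip_count_serial(long first, long limit, long incr)
{
    long n = 0;
    for (long i = first; i != limit; i += incr)
        ++n;
    return n;
}
```

For the earlier example with i = 999 counting down to a limit of 0 by steps of -1, both functions give 999.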

Arch D. Robison (Intel)

By the way, C11 and C++11 define division of signed numbers in a way that makes the branch unnecessary.  That is the identity (a-b)/c==(b-a)/-c holds.  This identity holds for most modern hardware too (certainly IA-32 and Intel64).  gcc's internals might have a flag somewhere that tells you if you can rely on that identity (the property might be called something like "division rounds towards zero".)
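
To make the identity concrete, here is a small sketch comparing the branching and branch-free forms of the count.  The function names are hypothetical, and the sketch assumes a compiler following C99 or later (where signed division truncates toward zero), a nonzero stride, and operands small enough that neither the subtraction nor the negation overflows.

```c
#include <assert.h>

/* Count with an explicit test on the sign of the stride. */
static int count_branched(int first, int limit, int incr)
{
    assert(incr != 0);
    return (incr > 0) ? (limit - first) / incr
                      : (first - limit) / -incr;
}

/* Branch-free count: relies on (a-b)/c == (b-a)/-c, which holds when
   signed division truncates toward zero. */
static int count_branchless(int first, int limit, int incr)
{
    assert(incr != 0);
    return (limit - first) / incr;
}
```

For a countdown from 10 to 0 by -2, both forms yield 5 iterations.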
