More Examples

More Examples

Ritratto di lazydodo

Since the Example thread in the previous two stages always was a hit lets have one for this one.

Since i'm not entirely sure about my implementation i'll post a few easy to verify ones and we'll work our way from there to the more interesting stuff, as always huge disclamer : 'I validated these using the same code that solved the intel examples correctly, i cannot however rule out any mistakes, if you are getting different results its probably me screwing up not you'

The format is the same as the intel examples with one extra column

[Source] [ByteAdd] [DwordAdd] [SolveCycles]

8AF59446C631DDCF10B5493FCBA37056 E76117A796CA78991315228184795EA8 39F890F96E633900FDCCB8F8DCF9D096 7196
255827EB68F924D5734A6FB8A5357EA6 0CD68CE5D0E3445B97009E188FACC76A 0005E57CB24F5304B9371AC60ED533DA 21593
80BE9ECCEF2D2E1654B5F11D25F77F06 5033D9D66621C0FFE36D1FCFBACB9C2B A41B72AC586B0CE69AB35EF9EBB51876 7179
1DC95205DF9604403434AA2E4DAE8191 1102BDA8D2181BF23C3849F48CB0831C DD2CAEF30CA06E346FE4DF4AE0130B4F 5039
EE430DEE716BDB4120542945B3E2B021 FFC26872229809122B4FDA94CEBAB0AE 915FA2CE123D51A723696017558562BF 11419

36 post / 0 new
Ultimo contenuto
Per informazioni complete sulle ottimizzazioni del compilatore, consultare l'Avviso sull'ottimizzazione
Ritratto di duncanhopkins

Hi,
Thanks for these.
I would confirm your result but as yet I have not been able to get the Intel 4774 example to work. :(

Any chance you could post the first 100 cycles or so of these. I would be eternally greatful.

Thanks.

Ritratto di VoVanx86
Quoting duncanhopkins Hi,
Thanks for these.
I would confirm your result but as yet I have not been able to get the Intel 4774 example to work. :(

Any chance you could post the first 100 cycles or so of these. I would be eternally greatful.

Thanks.

Hi.
In the Intel 4774 example cycles starts from 0 ( not from 1 )
Is this a mistake?
My examples:
start iteration: 1

fab688c62e82bb30ee9e81f8168c602a 38040301052B0163A103400502060501 05ED2F440000B17B0000000100000036
result: 10 000 000 000

b960952f17e313f7641ea809a3b9de7b 38040301052B0163A103400502060501 05ED2F440000B17B0000000100000036
result: 1 000 000 000

start iteration: 0

c1f1870b77abb0f35a07800d1b0dd407 38040301052B0163A103400502060501 05ED2F440000B17B0000000100000036
result: 10 000 000 000

1c934d6484c6f344e43ca5bc739309b4 38040301052B0163A103400502060501 05ED2F440000B17B0000000100000036
result: 1 000 000 000

Ritratto di Rama Kishan Malladi (Intel)

Hi,
Can you clarify why you indicate that the cycles start from 0 and not 1? The cycles start at 1 and not 0.

Also, I don't understand your example. Can you please help me understand?

Thanks
-Rama

Ritratto di VoVanx86
Quoting Rama Kishan Malladi (Intel) Hi,
Can you clarify why you indicate that the cycles start from 0 and not 1? The cycles start at 1 and not 0.

Also, I don't understand your example. Can you please help me understand?

Thanks
-Rama

Hi,

In 4774 exaple (if you take 4774 cycles) first action is DWORD adding.

Cycle 0:
DWORD ADD
Cycles 1-36:
BYTE ADD
....
4774 cycles

If i start from 1 cycle i can't reach 0 in 4774 cycles in this example.

Example code:
int iteration=0;
while(1)
{
if(++iteration%37) //may be "if(iteration++%37)" in your code ???
{
BYTE ADDING;
}
else
{
DWORD ADDING;
}

}
}

Ritratto di duncanhopkins

If I change my loop code from:

36 * Add BYTE
1 * Add DWORD

to

1 * Add DWORD
36 * Add BYTE

I manage to terminate on cycle 4774.

I think the confusions is that the examples/walkthough start by doing BYTE adds and then DWORD adds but part of the description says:

"If Cycles modulo 37 is zero then use DWORDs instead of BYTEs", which means that you have to start on cycle 0 to get a DWORD add first.

Ritratto di VoVanx86

Where is truth?
Is example with 4774 cycles are right or wrong?

Ritratto di duncanhopkins

Both I think!!

My code is a loop that does a DWORD and 36 BYTE adds. So no modulo.
If I start with cycles = 1 then I get 4774 for my ternimation count. But if your using modulo to work out if you need todo DWORD or BYTE then it looks like you need to start at 0 and adjust you printed result by 1.

Ritratto di VoVanx86

But,
Rama Kishan Malladi (Intel) wrote:

Can you clarify why you indicate that the cycles start from 0 and not 1? The cycles start at 1 and not 0.

My code is a loop that does a 36 BYTE and 1 DWORD adds. and this is right code (how i can see in Problem Description)

And i can't use Demo Scenario and Test Input with my programm...
So, i wrote both examples (with 1 DWORD -> 36 BYTE scenario and 36 BYTE -> 1 DWORD)
P.S. I think we need new examples from Judges.

Ritratto di Rama Kishan Malladi (Intel)

Hi,
I will verify and confirm the correctness of the demo scenario.

Thanks
-Rama

Ritratto di duncanhopkins

"Since the Example thread in the previous two stages always was a hit lets have one for this one.

Since i'm not entirely sure about my implementation i'll post a few easy to verify ones and we'll work our way from there"

Using the DWORD BYTE BYTE add I get the same results as you have posted.

Ritratto di lazydodo

The way i interperted it was

Cycles start at 0

cycles % 37 = 0 so DWORD ADD

add 1 to cycles (cause we did an add)

Check result

cycles % 37 != 0 so BYTE ADD

add 1 to cycles (cause we did an add)

Check result

etc etc etc

Ritratto di lazydodo
Quoting duncanhopkins "Since the Example thread in the previous two stages always was a hit lets have one for this one.

Since i'm not entirely sure about my implementation i'll post a few easy to verify ones and we'll work our way from there"

Using the DWORD BYTE BYTE add I get the same results as you have posted.

Lets wait for the jury to give the final verdict on the order and i'll get some more examples up with higher cycle counts as results.

Also when you feed your code random values for start,add bytes and add dwords, do you notice an distinct upper bound on this problem?

Ritratto di VoVanx86
Quoting lazydodo

Lets wait for the jury to give the final verdict on the order and i'll get some more examples up with higher cycle counts as results.

Also when you feed your code random values for start,add bytes and add dwords, do you notice an distinct upper bound on this problem?

With random values: "upper bound" = 37*2^32.

At 158913789952cycle i got initial values.

Ritratto di lazydodo

The problem statement says 'or' you think we could get away with

int main(int argc, char **argv)
{
printf("158913789952\n");
return 0;
}

and claim its because of the out of order execution of the threads? :P

Ritratto di VoVanx86
Quoting lazydodo The problem statement says 'or' you think we could get away with

int main(int argc, char **argv)
{
printf("158913789952\n");
return 0;
}

and claim its because of the out of order execution of the threads? :P

If you can't get initial values or zero with number of cycles < 37*2^32, then at 158913789952 cycle you get initial values (anyway)

37*2^32 = upper bound of cycles

Ritratto di dotcsw
Quoting VoVanx86
I think we need new examples from Judges.

At this point I'd rather have an update to the Problem Description and Walkthrough. I've already written my program to match the results under Demo Scenario and Test Input because it seemed likely that "correct output" would be determined by comparison to the standard reference implementation. My experience was essentially the same a Duncan's: implemented per walkthrough, failed demo scenario, figured out why, and fixed it.

This topic has become less about examples and more about an error in the problem statement. How did you generate your examples, lazydodo? Did you start at all zeros and run your program in reverse? Once I figure out how to create good examples I will contribute some.

- Rick

Ritratto di lazydodo

I randomized addBytes and addDwords and cycles, initalized Start with 0x00's then ran the program in reverse till cycles hit 0.

Ritratto di dotcsw

As promised, here are some bigger examples following lazydodo's format of [Source] [ByteAdd] [DwordAdd] [SolveCycles in decimal]:

365607bc731db54c82afa48a829ce0b4 7f8b22d034a08b997d84e9f12c799ebe 55f825ec17b9471020afad2615eaa31c 431787735
34c1f6c2136fac5ada956b90ee253820 4d23828a159066b4771a7d5723c4c8c9 4654f2135282743d1ce5b57503876323 1483695460
4851cba6debcd7a06543fd16172b5978 1bbffa833f0e0f733ca510f5044ef24a 39499f0b48e0f3ec3cca214b7a1e0474 2626626062
9673f20645881f0bb588fb1181bb9f77 6b53ca5b7896676914e035e73fa73ce1 3d69394871bef1270be1427d62319b0b 4076988879
d6b610c012e049601632f70094c145c0 1bbffa833f0e0f733ca510f5044ef24a 39499f0b48e0f3ec3cca214b7a1e0474 15332557248

- Rick

Ritratto di Oswald Van Ginkel

I agree that there is a problem with the example (especially the 4774 which requires a zero cycle).

Quoting:

"If for example we take the lowest byte then the program flow should
start with 0x70 and add 0x01 every cycle up to 37 cycles. On the 37th
cycle the buffer is converted to a collection of DWORDs and 0x00000036
is added instead of the per byte addition. If Cycles modulo 37 is zero
then use DWORDs instead of BYTEs."

My deductions:

Since at zero cycles nothing should in fact have been done, the solution to all problems is in fact 0 cycles. No additions occurred yet and thus the starting buffer is equal to itself!

exlusively or

You should in fact start with cycle 1. (we count in natural numbers and set boundaries with whole numbers)

Therefore modifying your program to start on a non-existing cycle 0 just to get to that faulty 0%37=0, which Intel must have included in their loop in order for their faulty examples to rise into existance would clash with the problem desciption and I quote:

"Write code that receives an input in the size of 16 BYTEs and runs
until all bytes are zero or repeat the original input bytes. Every
cycle of the program a value is added to each BYTE separately. Every 37th cycle, the buffer is converted to four DWORDs and a value is added to each DWORD separately. For 38th
cycle, the DWORD is converted back to 16 BYTEs and the addition process
repeats (another 37 cycles each). Since addition of fixed size values
is cyclic itself, after enough iterations through this process, the
value of the array will either repeat the initial values or each
becomes zero. The goal of your code is to compute the number of cycles
required to reach all zero values simultaneously or detect the
repetition of the original input values."

Thus no zeroth cycle would be included...

Intel must revisit their implementation - as is being done, thanks.

_________________________________________________________________________________________

Also the words:

"every cycle up to 37 cycles"

is misleading, since "up to" almost seems to include the 37th cycle as a byte addition (cf. WordWeb dictionary - which lists "equal to" as a synonym of "up to") and then only this operation is excluded by:

"If Cycles modulo 37 is zero
then use DWORDs instead of BYTEs."

I'd prefer "every cycle up to and including the 36th cycle."

_________________________________________________________________________________________

That concludes this. If you think the above is just too much then I condole you, since I always thought that programmers where into language.

Regards and friendly greetings too all.

Ritratto di Oswald Van Ginkel

By the way...

The moment I submitted this post and a few before I encountered a php var_dump of an array which could disclose information which is possibly not intended for me to see.

If your web development/maintenance team could be informed of this, I think that they would attach some value.

Ritratto di Rama Kishan Malladi (Intel)

Hi,
Thanks for the detailed feedback. First, I have informed the web team that there is a var_dump before the post gets published on the forum. I have seen it myself and we will get it fixed by the web development team as feasible.

-Rama

Ritratto di Rama Kishan Malladi (Intel)

Hi,
I was able to verify the demo example and reproduce the inputs after 4774 cycles under the following conditions:

1. The starting cycle is 0.
2. The DWORD addition is done at cycle 0, followed by BYTE additions for the following 36 cycles. Then on 37th cycle do another DWORD addition and so on...
3. You have to adjust the output cycles printed, offset it by +1 as the cycle start at 0.

Sorry for some confusion we created and misprint in the problem statement. The walkthrough would change according to the above description. Please find the updated walkthrough below:

Input Source is: Byte A: 0x40, Byte B: 0x10
Input BYTE adder is: BYTE A: 0x01, BYTE B: 0x20
Input WORD adder is: 0x1133
Starting Point: A = 0x40, B = 0x10
Before Cycle 0: WORD BA = 0x1040
Cycle 0: BA + 0x1133 = 0x2173
Before Cycle 1: BYTES A = 0x73 and B = 0x21
Cycle 1: A = A + 0x01, B = B + 0x20 Completion: A = 0x74, B = 0x41
Cycle 2: A = A + 0x01, B = B + 0x20 Completion: A = 0x75, B = 0x61
Before Cycle 3 : As WORD the combined value of BA is 0x6175
Cycle 3: BA = BA + 0x1133 Completion: BA = 0x72A8
Before Cycle 4 : As BYTES A = 0xA8 and B = 0x72
Cycle 4: A = A + 0x01, B = B + 0x20 Completion: A = 0xA9, B = 0x92
...

Thanks
-Rama

Ritratto di duncanhopkins

"
As promised, here are some bigger examples following lazydodo's format
of [Source] [ByteAdd] [DwordAdd] [SolveCycles in decimal]:
"

I am also getting these results.

Ritratto di dotcsw
Quoting Rama Kishan Malladi (Intel) 1. The starting cycle is 0.
2. The DWORD addition is done at cycle 0, followed by BYTE additions for the following 36 cycles. Then on 37th cycle do another DWORD addition and so on...

Excellent. I believe this was the best way to resolve the discrepency (although the last sentence should read "cycle 37" or "the 38th cycle", see below.)

3. You have to adjust the output cycles printed, offset it by +1 as the cycle start at 0.

No correction is necessary here. There are two concepts: the cycle index and the number of cycles. We are to report the number of cycles, not the index of the last cycle. It doesn't matter if the index of the first cycle is 0, 1, or -382, it still counts as a cycle. All you've done is to change the indeces from being 1-based to 0-based. We're C programmers (for the most part). We can handle it. :-)

Quoting Oswald Van Ginkel Since at zero cycles nothing should in fact have been done, the solution to all problems is in fact 0 cycles. No additions occurred yet and thus the starting buffer is equal to itself!

This fallacy arises from confusing zero cycles with cycle number 0. Each cycle begins with an addition and ends with a comparison. That includes cycle number 0 - the first cycle. The minimum number of cycles is 1.

- Rick

Ritratto di hitesh.sharma

Hereby I also confirm that for both demo scenario & long real scenario the results appear, as expected, only if we start with cycle 0.

Logic
cycle is zero
if cycle mod 37 is zero
add DWORD and cycle++
else
add Byte adder and cycle++

Output = cycle + 1(as it start from zero)

Ritratto di Oswald Van Ginkel

I get these times for the above (dotcsw's examples) - increasing complexity on 1.8Ghz Core 2 Duo, single process thread

17.687 seconds
30.984 seconds
48.593 seconds
180.703 seconds

Ritratto di lazydodo

You still have some room left to optimize, the last one is doable sub 5 seconds (multi-threaded, on the 32 core windows mtl box).

Ritratto di dotcsw

I think you two are saying the same thing. 180 seconds / 64 threads < 5 seconds. Get your program multi-threaded, Oswald, and you should have a strong entry.

Has anyone come up with a way to create tests that repeat their initial values (in fewer than 37*2^32 cycles)? We could use some non-trivial examples like that.

- Rick

Ritratto di krivyakin
Quoting dotcsw I think you two are saying the same thing. 180 seconds / 64 threads < 5 seconds. Get your program multi-threaded, Oswald, and you should have a strong entry.

Has anyone come up with a way to create tests that repeat their initial values (in fewer than 37*2^32 cycles)? We could use some non-trivial examples like that.

- Rick

Unlikely you will can get linear scalability on this task...

In this test repeating initial values is on 256th step:

00000000000000000000000000000000 11111111111111111111111111111111 11111111111111111111111111111111

Ritratto di duncanhopkins
Quoting krivyakin Unlikely you will can get linear scalability on this task...

Do you want to explain that one?
I must admit at first sight of this problem my thoughts were 'so how am I going to thread that one'

Ritratto di lazydodo

05138273891739871938712983792873 00000000000000000000000000000000 00000002000000300000000005010000 79456894940

Have to admit not entirely sure about this one, should repeat source at 79456894940 cycles, can someone validate?

Ritratto di VoVanx86
Quoting lazydodo 05138273891739871938712983792873 00000000000000000000000000000000 00000002000000300000000005010000 79456894940

Have to admit not entirely sure about this one, should repeat source at 79456894940 cycles, can someone validate?

Confirmed

Ritratto di duncanhopkins
Quoting VoVanx86
Quoting lazydodo 05138273891739871938712983792873 00000000000000000000000000000000 00000002000000300000000005010000 79456894940

Have to admit not entirely sure about this one, should repeat source at 79456894940 cycles, can someone validate?

Confirmed

Also confirmed, after a bug fix.

Ritratto di duncanhopkins
Quoting duncanhopkins
Quoting krivyakin Unlikely you will can get linear scalability on this task...

Do you want to explain that one?

I just test my code on the MTL and I am getting linear scaling of 95% (gradient of speed up per core).

That is until I hit the hyper threads then I get a lot less.

Or have I miss understood your comment?

Ritratto di dotcsw
Quoting lazydodo should repeat source at 79456894940 cycles

That's a neat test because it repeats at approximately the midpoint of the search space: 37*231. I had rigged up another test that repeats at exactly 37*225. Like yours, the input exhibits a contrived pattern:

e5fea336e5fea336e5fea336e5fea336 39499f0b39499f0b39499f0b39499f0b 1bbffa801bbffa801bbffa801bbffa80 1241513984

Thanks for your contribution,

- Rick

Accedere per lasciare un commento.