Recent posts
https://software.intel.com/en-us/recent/510238
Phase 2 of the Threading Challenge 2011 - Cancelled
https://software.intel.com/en-us/forums/p1-a3-running-numbers/topic/281495
<p>We know that many of you have been waiting for details of Phase 2 of the Threading Challenge 2011 to be announced. With our apologies, we must announce that Phase 2 of the Threading Challenge 2011 has been cancelled because contest resources have been reassigned to new software development projects. As a result, there are not enough contest team members to manage and judge Phase 2 of the competition in a timely manner. We are sorry to disappoint those who were looking forward to the next phase of the competition, but we encourage you to look for future competition news in 2012.</p>
<p>Thank you for all your interest and participation in the Threading Challenge 2011.</p>
Wed, 19 Oct 11 09:16:32 -0700, Jeff Kataoka (Intel), 281495
Congrats
https://software.intel.com/en-us/forums/p1-a2-consecutive-primes/topic/282482
<p>Hi all,</p>
<p>I would like to congratulate all participants, but especially Rick for winning the second problem and for achieving fantastic performance. For my part, I am happy to have obtained third place. My trial division algorithm is nothing special, and even though some improvements could double its speed (a rough estimate), it would not be good enough to beat Rick's solution. Vovanx86 also did a fantastic job, and there were also some very fast solutions that did not get a better result because of correctness problems. </p>
<p>Once again, congratulations to everybody. Good job :o)</p>
<p>-Miguel</p>
Fri, 05 Aug 11 01:15:07 -0700, jmfernandez, 282482
The suspense is killing me!
https://software.intel.com/en-us/forums/p1-a2-consecutive-primes/topic/282648
<p>I keep checking in here to see if there's any announcement on judging Consecutive Primes. It took six weeks to finish judging the first round (Maze of Life). It has now been seven weeks since this problem closed and no word yet on when the winners will be announced. I sure hope they can score the third round (Running Numbers) before the Grand Prize becomes moot.</p>
<p>One challenge for the judges in scoring Consecutive Primes is going to be deciding where to draw the line on pre-computed primes and powers. Rama <a href="http://software.intel.com/en-us/forums/showpost.php?p=151356">suggested a limit of 20</a> but the contestants had widely differing interpretations on this point. This should be interesting!</p>
<p>- Rick</p>
Mon, 25 Jul 11 15:29:10 -0700, dotcsw, 282648
Post-mortem
https://software.intel.com/en-us/forums/p1-a2-consecutive-primes/topic/283077
<p>Problem 2 was a fun one (the one I liked best, to be honest). These are the optimizations I used:</p>
<p>1) Prime generation</p>
<p>Generating primes is fun and can be done quickly, but there's one thing even better: not having to calculate them! However, a table of all primes < 2^32 turned out to be 800+ megs, too massive; the load time would have eaten me alive. Plus, a binary search to find the proper start and end indexes wouldn't have been too snappy either.</p>
<p>But what if there's a way to cut that massive table down in size and pretty much give us a direct index to the numbers we want? Well, all primes > 5 can be expressed as 30k±1, 30k±7, 30k±11 or 30k±13. That is 8 candidates per block of 30 numbers, which is exactly 8 bits! It cuts the table down to a nicer 136 megs and gives you instant primes!</p>
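<p>A minimal sketch of such a mod-30 bit table, for illustration only. The names <code>build_wheel_table</code> and <code>is_prime</code> are hypothetical, and the simple sieve used to fill the table is an assumption; the original entry shipped a precomputed table instead.</p>

```python
import math

# Residues modulo 30 that a prime > 5 can have: 30k±1, 30k±7, 30k±11, 30k±13.
RESIDUES = (1, 7, 11, 13, 17, 19, 23, 29)
BIT_OF = {r: i for i, r in enumerate(RESIDUES)}


def build_wheel_table(limit):
    """Return a bytearray in which byte k covers the numbers 30k .. 30k+29."""
    # Plain sieve of Eratosthenes (an assumption; any prime source would do).
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    # Pack the primes > 5 into one bit per admissible residue.
    table = bytearray(limit // 30 + 1)
    for n in range(7, limit + 1):
        if sieve[n]:
            table[n // 30] |= 1 << BIT_OF[n % 30]
    return table


def is_prime(table, n):
    """Direct-index primality lookup; n must lie inside the table's range."""
    if n in (2, 3, 5):
        return True
    bit = BIT_OF.get(n % 30)
    if bit is None or n < 7:
        return False
    return bool(table[n // 30] >> bit & 1)
```

<p>For the full range, one byte per 30 numbers gives 2^32 / 30 bytes, or roughly 136 MiB, which matches the size quoted above.</p>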
<p>2) Determining if it's a power or not</p>
<p>2a) Do we even need to bother checking a number? </p>
<p>Numbers that are a power of something have a funny property in binary: if you look at the lowest nibble, it will *NEVER* be 2, 6, 10, 12 or 14. That quickly gets rid of about 30% of the numbers with a single 'and' instruction and a few compares.</p>
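<p>The filter can be sketched as follows. The function name <code>may_be_perfect_power</code> is hypothetical; a <code>True</code> result only means the number survived the filter and still needs a real check.</p>

```python
# Low nibbles that no perfect power x**k (k >= 2) can have.
REJECT_NIBBLES = frozenset({2, 6, 10, 12, 14})


def may_be_perfect_power(n):
    """Cheap pre-filter: False means n is certainly not a perfect power."""
    return (n & 0xF) not in REJECT_NIBBLES


# Fraction of all numbers rejected by the filter: 5 of 16 nibble values.
REJECTED_FRACTION = len(REJECT_NIBBLES) / 16
```

<p>Five of the sixteen nibble values are rejected, so the filter discards 31.25% of uniformly distributed inputs.</p>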
<p>2b) The numbers that are left after that:</p>
<p>The sum of all primes in play is 425649736193687430, whose square root is 652418375.119591, meaning the highest number we would ever see as a base is 652418375. So I figured I'd loop through all numbers [2..652418375] ^ [2..100], see if the result is below 425649736193687431, and store the base+power+value (only the lowest power+base for numbers that are multiple powers) in a lookup table (I bet by now you are all going 'damn, this guy *really* likes his lookup tables'). </p>
<p>Well, it turns out that table gets *BIG* really quickly, but it is quite manageable for powers > 2. </p>
<p>But what about the powers of 2 (the squares), then? </p>
<p>Well, squares have the interesting property that the lower nibble is always 0, 1, 4 or 9, so if it is one of those, run a good old sqrt and see if it's a square or not (I tried a lookup table here too, but sqrt turned out to be faster).</p>
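<p>A small sketch of that square test. It uses Python's exact integer square root, <code>math.isqrt</code>, rather than a floating-point sqrt, which is a substitution made here to avoid rounding trouble near 4 * 10^17; the name <code>is_square</code> is hypothetical.</p>

```python
import math

# Low nibbles a perfect square can have.
SQUARE_NIBBLES = frozenset({0, 1, 4, 9})


def is_square(n):
    """Nibble pre-check first, then an exact integer square-root test."""
    if (n & 0xF) not in SQUARE_NIBBLES:
        return False
    root = math.isqrt(n)
    return root * root == n
```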
<p>Further improvements for powers > 2</p>
<p>Initially I had them all in a big sorted list that I did a binary search on. That worked well but, due to the size of the table, was not the best performer. So to speed it up I turned it into a really basic hash table using bits 21-43 of the number as the hash, which gave me fewer than 10 numbers in most buckets, and that is stupidly fast to search through.</p>
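<p>One way such a bucketed table could look, as a rough sketch. The helper names are hypothetical, a Python dict stands in for the fixed-size bucket array, and keeping the entry with the lowest exponent for values that are multiple powers is an assumption about what the post means.</p>

```python
HASH_SHIFT = 21
HASH_MASK = (1 << 23) - 1  # bits 21..43 inclusive are 23 bits


def bucket_of(n):
    """Hash a number by extracting bits 21-43."""
    return (n >> HASH_SHIFT) & HASH_MASK


def build_power_table(limit):
    """Bucket every x**k <= limit with k >= 3 as (value, base, exponent)."""
    buckets = {}
    seen = set()
    exponent = 3
    while 2 ** exponent <= limit:
        base = 2
        while (value := base ** exponent) <= limit:
            if value not in seen:  # keep the lowest exponent only (assumption)
                seen.add(value)
                entry = (value, base, exponent)
                buckets.setdefault(bucket_of(value), []).append(entry)
            base += 1
        exponent += 1
    return buckets


def lookup_power(buckets, n):
    """Return (base, exponent) if n is a stored power, else None."""
    for value, base, exponent in buckets.get(bucket_of(n), ()):
        if value == n:
            return base, exponent
    return None
```

<p>With a small limit, everything below 2^21 lands in bucket 0; the hash only spreads values well when they span the full range described in the post.</p>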
<p>3) Threading</p>
<p>Just a parallel loop through the primes adding them up. Not much to it really; this was by far the easiest of the 3 problems to thread.</p>
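<p>The partitioning behind such a parallel sum can be sketched like this. It is only an illustration of splitting the work into chunks and combining partial sums: the original entry used OpenMP/TBB in native code, the name <code>parallel_sum</code> is hypothetical, and CPython threads will not give a real speed-up for this kind of CPU-bound work.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, workers=4):
    """Sum a list by giving each worker one contiguous chunk."""
    chunk = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i : i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker produces a partial sum; the partials are then combined.
        return sum(pool.map(sum, chunks))
```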
<p>Most of my time on this problem was spent trying to figure out why the 40-core Windows MTL box *REFUSED* to use all of the cores with either OpenMP or TBB: you always ended up on a random processor group (either cores 0-9 or 10-39) but never on all of them. It turns out that in the Intel v11 compiler that was on the box, OpenMP was not aware of processor groups (new in Win7/2008R2), and TBB (which was aware) had a subtle bug in the code that assigned threads to cores. I found the bug, made a quick workaround (details are somewhere in a thread in the TBB forum) and figured my solution would definitely have an edge over other ones that would end up not using all cores... and then Intel moved us all off the box because it 'had issues' (I said it before, I'll say it again: BOO!) </p>
<p>In the end I ran out of time and the code ended up being a bit (and by a bit, I mean A LOT) messy but functional.</p>
<p>Warning: due to the *MASSIVE* lookup tables, the code is a whopping 112 megs compressed.</p>
Tue, 28 Jun 11 19:08:47 -0700, lazydodo, 283077
Post-mortem
https://software.intel.com/en-us/forums/p1-a1-maze-of-life/topic/283078
<p>Since the post-mortem thread in A3 was kinda interesting, let's have a go at the other two problems as well.</p>
<p>This problem was an interesting one for me. I figured it's a threading contest, I bet we get to use threads! However, the best solution seemed to be single-threaded. (Weird!)</p>
<p>The best optimization I could come up with was a lookup table: for every 5x5 block on the field you can determine the next state of the inner 9 cells just by looking in a big-ass table, turning this from a test-heavy problem into a simple lookup problem. And by running a single 5x5 block it was easy to determine the possible spots to move to in the next cycle. One last optimization was to compare the current cell's location with the target cell and first try the directions that would get us closer to the target cell (or at least move in the right direction). </p>
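<p>A rough sketch of the 5x5-to-3x3 lookup idea. It assumes the standard Conway rules (birth on 3 neighbours, survival on 2 or 3), which may differ from the contest's exact Maze of Life rules, and it fills the table lazily through a cache instead of precomputing all 2^25 entries. The names <code>pack</code> and <code>inner_next</code> are hypothetical.</p>

```python
from functools import lru_cache


def pack(rows):
    """Pack equal-length rows of 0/1 cells into an integer, row-major."""
    width = len(rows[0])
    return sum(
        cell << (r * width + c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
    )


@lru_cache(maxsize=None)
def inner_next(block):
    """Next state of the inner 3x3 cells of a 25-bit 5x5 block, as 9 bits."""
    result = 0
    for r in range(1, 4):
        for c in range(1, 4):
            alive = block >> (r * 5 + c) & 1
            neighbours = sum(
                block >> ((r + dr) * 5 + (c + dc)) & 1
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # Conway rules (assumed): born on 3, survives on 2 or 3.
            if neighbours == 3 or (alive and neighbours == 2):
                result |= 1 << ((r - 1) * 3 + (c - 1))
    return result
```

<p>For example, a horizontal blinker in the middle row of a 5x5 block maps to a vertical blinker in the inner 3x3 result. Repeated blocks are answered from the cache, which is the lookup behaviour described in the post.</p>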
<p>I kinda misread or misunderstood the problem description and figured there would be more points for performance than for the 'shortest' route (boy, was that a mistake), so I opted to have a go at being the fastest. Looking back, yeah, big mistake, but hey, on timing points alone I still got 12 points out of it.</p>
<p>(Warning: due to the lookup table, the download is about 16 megs.)</p>
Tue, 28 Jun 11 18:11:52 -0700, lazydodo, 283078
It's been fun
https://software.intel.com/en-us/forums/p1-a3-running-numbers/topic/283090
<p>The past nine weeks have been fun, educational, and a lot of hard work! I'm glad I got to know some of you here on the contest forums.</p>
<p>I signed up for the Apprentice Level to learn more about multi-threaded programming. Well... mission accomplished. This third problem also taught me the finer points of SSE, which was something I'd been meaning to learn.</p>
<p>So enjoy the summer and we'll meet again for Phase 2 after the Developer's Conference. If I'm fortunate enough to win the grand prize this year or any year, my pledge is to move up to the Master Level. Anyone who wins the Apprentice Level is a threading master by definition. I'm <a href="http://software.intel.com/en-us/articles/intel-threading-challenge-2010-winners/">looking at you</a>, duncanhopkins. Give the rest of us a chance! ;-)</p>
<p>Now I need to go lie down,</p>
<p>- Rick LaMont</p>
Mon, 27 Jun 11 20:08:14 -0700, dotcsw, 283090
__m128i type
https://software.intel.com/en-us/forums/p1-a3-running-numbers/topic/283129
<p>I am having an issue with compilation on the MTL. I am using MSVC++ at home and I have a Linux MTL account, so I am implementing my solutions in a portable way. For this problem I think that using SIMD instructions is the right choice, so I am using the __m128i type. </p>
<p>Well, in the Windows-related headers the type is defined as a union with fields like "__int8 m128i_i8[16]" and so on. But I am getting a compilation error when compiling on the MTL machine. The problem seems to be a different union definition, but I can find a compatible union definition in the MTL headers, so I don't understand why I am getting that error. I have solved this issue by using my own independent type, but I would like to know why I am getting that error. Maybe the compiler is using another definition from a different header file? Is anyone else having the same problem?</p>
Sat, 25 Jun 11 06:18:41 -0700, jmfernandez, 283129
Get Bonus Points for Your Threading Challenge entry scores by participating in the Forums
https://software.intel.com/en-us/forums/p1-a3-running-numbers/topic/283170
<p>Just a reminder: You can get bonus points added to your problem entry score by participating in each problem's forum. Earn 5 points for each forum post, up to a maximum of 25 points per problem. The forum points can make a difference in your final entry score. So, take advantage of this for this last problem.</p>
Thu, 23 Jun 11 11:50:55 -0700, Jeff Kataoka (Intel), 283170
P1:A1 Maze of Life, Apprentice Problem - Judging and Scoring Criteria & Methods for Selecting Our Winners
https://software.intel.com/en-us/forums/p1-a1-maze-of-life/topic/283209
<p><strong>Threading Challenge 2011 - Maze of Life, Apprentice Problem 1: Judging & Scoring Criteria and Methods</strong>
</p>
<p>On June 21, 2011, we announced the winners for Apprentice Problem 1, Maze of Life. Our group of judges used the following judging and scoring criteria and methods to select the winners. In addition, you will find a link to the testing results and the scores further below.</p>
<p>Apprentice Level Problem Set 1 (P1A1) Maze of Life <strong>Key Scoring Principles</strong>
<p>The basic scoring principles used for judging the contest entries are described on the <a href="http://software.intel.com/en-us/articles/intel-threading-challenge-2011-official-rules/">official rules</a> page. Here is a short summary: each contest entry was scored according to the following criteria: 1) up to 100 points for solution performance (speed); 2) a maximum of 25 bonus points for a contestant's activity in the forum, calculated as 5 bonus points for each valid forum post/reply.</p>
<p>There may be multiple solutions to a given puzzle. A bonus of 5 points per puzzle was awarded to those entries that found the shortest path. Sometimes the bonus was awarded to multiple entries that output paths of the same length, even if those entries discovered different solutions. Please find the detailed problem set description on the <a href="http://software.intel.com/en-us/contests/threading-challenge-students-2011/codecontest.php">Apprentice Level</a> contest page.</p>
<p><strong>Input Data Sets Used for Performance Scoring</strong>
<p>Ten different input data sets were used to compute the execution score for this problem. The simplest one is the 7x7 Maze of Life grid given as an example in the problem set description. The hardest is a large 300x300 grid. The full archive of input data sets can be <a href="http://software.intel.com/file/m/37224">downloaded here</a>. </p>
<p><strong>Points in Performance Scoring</strong>
<p>Each input data set was judged individually based on a ranking scheme. All data sets carried equal weight in performance scoring. The overall performance score was calculated as the sum of the individual performance and bonus points for all ten input data sets.</p>
<p>We allowed a maximum of 120 seconds (2 minutes) of execution for each input set; runs that took longer than 120 seconds or had runtime errors during execution were awarded zero performance points. Some entries that could not be built on the MTL got zero points as well.</p>
<p>Successful contest entries that found a smart cell path in less than 2 minutes were ranked based on their execution time and got performance points according to a reciprocal rank scale. For example, the fastest solution[s] for a data set got 5 points, the next solution got 2.5 points, then 1.67 points, and so on.</p>
<p>On top of that, all the solutions that found a shortest path for a particular input data set got 5 bonus points. Thus a successful solution could get a maximum of 10 points per input data set if it found the shortest path and demonstrated the best performance.</p>
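<p>The per-data-set scoring described above can be sketched as follows. The function names are hypothetical, and giving tied times the same rank is an assumption suggested by the wording "fastest solution[s]"; the judges' actual tie handling is not stated.</p>

```python
def performance_points(times, limit=120.0, top=5.0):
    """Map entry -> performance points for one input data set.

    `times` maps entry name -> seconds, or None for a failed run.
    """
    valid = sorted({t for t in times.values() if t is not None and t <= limit})
    rank_of = {t: i + 1 for i, t in enumerate(valid)}  # ties share a rank
    return {
        entry: (top / rank_of[t] if t in rank_of else 0.0)
        for entry, t in times.items()
    }


def data_set_score(points, found_shortest, bonus=5.0):
    """Performance points plus the shortest-path bonus for one data set."""
    return points + (bonus if found_shortest else 0.0)
```

<p>With times of 1.0 s, 2.0 s and 3.5 s, the three entries get 5, 2.5 and about 1.67 points; a run over 120 s or a failed run gets 0. The fastest entry that also finds a shortest path reaches the 10-point maximum.</p>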
<p><strong>Execution Results and Point Spread</strong>
<p>We received 28 contest entries in the Apprentice Level Maze of Life problem set.</p>
<p>Six entries successfully solved all 10 grids. One entry solved 9 out of 10. Two entries solved 8 grids, and four solved 7. Unfortunately, nine entries were incomplete and therefore unable to solve any test data sets. </p>
<p>All the timings and performance points are available in the final Maze of Life scoring table below.</p>
<p><strong>Bonus Points for shortest paths found</strong>
<p>As for the results in terms of shortest paths, they vary. Only one participant provided a solution that always found a shortest path on the grid. Interestingly enough, none of the entries that solved all 10 grids were able to find more than 2 shortest paths. Therefore, they did not get significant bonus points. All the bonus points are available in the final Maze of Life scoring table below.</p>
<p><strong>Forum Activity and Bonus Points</strong>
<p>Additional bonus points were given for contestants' forum posts made before the problem entries were closed. Five points per post (maximum 25 points possible) were awarded.</p>
<p><strong>Entry points and penalties.</strong>
<p>Each contest entry got 100 entry points. A penalty of 50 points was deducted if the entry was not able to solve the simple grid given as an example in the problem set description.</p>
<p><strong>Winners</strong>
<p>The problem winners based on highest point total are:</p>
<p>1) VoVanx86</p>
<p>2) krivyakin</p>
<p>3) jmfernandez </p>
<p>These three contestants provided the solutions which resolved the maximum number of our test grids. They also had the fastest overall code execution and a fair amount of bonus points for the shortest paths.</p>
<p><a href="http://software.intel.com/file/m/37185">Access Judging Score Card for Maze of Life, Apprentice Problem 1</a></p>
Tue, 21 Jun 11 16:40:08 -0700, Jeff Kataoka (Intel), 283209
Has anyone tried 574395734 cycles?
https://software.intel.com/en-us/forums/p1-a3-running-numbers/topic/283369
<p>I was frustrated that my code could not get the 4774 cycles test case to work until I found out on this forum that adding cycle 0 makes it work. I agree with the rest of you that if cycle 0 is required then the walkthrough is very misleading.</p>
<p>But I'm not comfortable with only one test case; I want to verify the other test case as well. Has anyone tried the 574395734 cycles test case? I want to confirm that "cycle 0" is necessary before I proceed.</p>
Sun, 12 Jun 11 13:05:48 -0700, kayson, 283369