Starting late? How to get started ... quickly

Introduction

If there still people who, just like us, weren't abble to start before, or simply want to quickly modify their current solution this article might interest you. Basically, this article will explain you how to start coding and testing as soon and quickly as possible, giving you a few hint and some suggestions.

So first, if you haven't read the subject and the rules, go take a look to these two links:

Goals and rules

The Main page:

/fr-fr/articles/Acceler8EN It briefly describes what should be the inputs and outputs of your program as well as how the intel team will grade your work.

The rules:

/fr-fr/articles/Acceler8RulesFR This page gives more detail about the grading procedure and the general rules you have to comply with in order to take part in this contest.

Additional information

However, the technical aspect of the contest is only vaguely tackled in these pages and some crucial aspects have only been discussed on the forums so here is a sum-up of things you should care about when implementing your solution:

About the first argument: number of thread

As it has been discussed over several topics on the forums, the first argument of your program is the maximum number of worker threads that you are allowed to launch. You can choose to launch fewer threads but never more. A worker thread is a thread that is greatly involved in running the algorithm, so you can have an administration thread (generally the thread that is used to start your program) that will not be considered as a worker thread. The CPU load of this admin thread has to be very low in comparison with those of the worker threads. Generally during the execution of the algorithm, this admin thread will be waiting for the worker threads to finish their respective tasks with some sort of synchronization mechanism (semaphores, signals, messages, conditions...) (source. Paul Guermonprez, http://software.intel.com/fr-fr/forums/showpost.php?p=166269)

About multipe solutions

This has been discussed an excessive amount of time on the forums. It may happen that for a given input array, there exist more than one solution (ie. multiple subarrays with the maximum sum). In the case of a multiple solutions problem, if your program only outputs a single solution among all the possible solutions, your program will be considered valid. However, more points will be given to programs that output every solution. So if you can return all the posible solutions instead of only one without causing huge extra amount of time, go for it! (source. Paul Guermonprez, http://software.intel.com/fr-fr/forums/showpost.php?p=166269)

Parallelization

First step: writing a sequential algorithm

Once you have acquainted yourself with all these rules, start coding as soon as possible! The first thing you will want to do is finding an efficient sequential algorithm. Fortunately, this problem is a classic and many people have already worked on it. You may find a lot of algorithm that solve this problem for 1-dimensional arrays but it can easily be extended to handle 2-dimensional arrays. The most famous solution to this problem is the Kadane's 2D algorithm O(N^3). This algorithm is quite effiecient and quick to implement. There are actually many post on the forum about this algorithm, but since you don't have much time, here is a basic version of the algorithm (it can be optimized): http://alexeigor.wikidot.com/kadane

Second step: parallelizing

Multi-core machines

Once you have a working sequential algorithm, you will want to parallelize it. But just before that you should ensure that you have access to a multi-core machine so that you can test your program and measure the effectiveness of the parallelization solution you are implementing. You normally have been granted access to the MTL machine (40 cores) but if you have access to a dual or quad core machine at home, you should consider using it for your tests instead because the MTL is crowded and the timing results might end up less accurate. So my advice here is, use the MTL only once in a while to test whether your program is scalable, but during the rest of the time you should stick to your own machine.

Test your solution on the MTL

If you want to test your work on the intel machine, you should upload your sources to your MTL account using SCP (Windows users can use WinSCP, OS X users can use Fetch, Linux user can use the nautilus-connect-server command that is shipped with major distributions). Then you can test your program using SSH (Windows users can use Putty, the rest of the world can open a terminal and use the ssh command). However the MTL machine is not directly connected to the internet, so before you can access the MTL you have to connect to the private network the MTL belongs to. If you are a Windows user, you can follow the steps indicated in the mail from intel that contains your login information. If you are a Linux user, you may want to follow this excellent article: http://software.intel.com/fr-fr/articles/utiliser-mtl-manycore-testing-lab-linux/

Quick tips

Since you are certainly running late, you might want to consider some quick and easy way to parallelize your solution: using OpenMP or TBB. Here is a topic that will certainly find interesting: http://software.intel.com/fr-fr/forums/showthread.php?t=100517&o=a&s=lr

If you think you still have some time, you might want to consider other solutions using pthreads and low level synchronization mechanisms such as semaphores and mutexes.

Before submiting your makefile, be sure to try to compile your program with one of the many compilation optimization available: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html (only for programs written in C & C++)

Good luck!

Para obter informações mais completas sobre otimizações do compilador, consulte nosso aviso de otimização.