smith (2/8/2008 2:16:59 PM) wrote:
how does par_for know how many cores you have available if other processes are running?
A thread pool is created to run programs that use the parallelism features. The number of threads in the pool is determined at runtime based on the number of logical cores available on the system. Tasks are then scheduled to those threads. The other processes running on the system are not directly taken into account.
I am providing a more accurate and complete answer to this question below.
Xinmin Tian (Intel)
For __par, the execution model is not based on tasks. It is simply an OpenMP fork-join model with static-even loop partitioning based on the number of cores in the system.
Also, we have two modes for the case where multiple processes are running; you can choose one by setting the library environment variable KMP_LIBRARY=[turnaround | throughput].
In a dedicated (batch or single-user) parallel environment where all processors are exclusively allocated to the program for its entire run, it is most important to effectively utilize all of the processors all of the time. The turnaround mode is designed to keep all of the processors involved in the parallel computation active in order to minimize the execution time of a single job. In this mode, the worker threads actively wait for more parallel work, without yielding to other threads. Note: please avoid over-allocating system resources in this mode.
In a multi-user environment where the load on the parallel machine is not constant or where the job stream is not predictable, it may be better to design and tune for throughput. This minimizes the total time to run multiple jobs simultaneously. In this mode, the worker threads will yield to other threads while waiting for more parallel work.
The throughput mode is designed to make the program aware of its environment (that is, the system load) and to adjust its resource usage to produce efficient execution in a dynamic environment. Throughput mode is the default.
It's great to see all these releases of work in progress, but it would
be good to get some idea of the underlying aim of the technologies. So
for example in this case, the code fragments presented are all things
that can already be done relatively easily with OpenMP. Can you explain
what you are aiming for with this compiler beyond what OpenMP can
provide already? At present I don't see any benefit to using these
language extensions over just using OpenMP.
The parallel programming extensions are intended for quickly getting a program parallelized without learning a great deal about APIs. A few keywords and the program is parallelized. If the constructs are not powerful enough in terms of data control then there may be a need to look into other more comprehensive parallel programming methodologies, such as OpenMP.
As you have observed, everything you can do with these extensions can be equally well done with OpenMP. However, these features can be described in just a few pages while OpenMP requires serious study. Obviously, with simplicity you give up many of the features and flexibility of OpenMP.
Thanks for the response. I'm not convinced though - I don't think OpenMP really "requires serious study", at least to get to a level to do simple parallelism. In my experience the really tough part is getting developers to think in a parallel way. Once they are there, the incremental effort of getting them to understand an interface like OpenMP is pretty small. Most people I have shown OpenMP to just "get it" right away once they understand the principles of parallelism, particularly since most of what they want to do can be accomplished with a simple #pragma omp parallel for.
In addition, the fact that these pragmas can easily be disabled to revert code to serial form, and that they work across many compilers is very appealing compared with the approach adopted here. So basically I'm not convinced there is a demand for a simpler threading model than OpenMP. Rather there is likely more demand for providing easier access to functionality beyond what OpenMP can do. Just IMHO of course. :-)
Thanks for the comments.
Since we are examining ideas for future language extensions, we'd like to know if your opinion changes at all if these features were an integral part of the language and all compilers supported them.
Secondly, can you elaborate on "functionality beyond what OpenMP can do"? Are there things you want to do that cannot be done with OpenMP?
I don't think it makes a great deal of difference if these features are language extensions - OpenMP is already fine for this kind of functionality. Just to make the point another way, much of the functionality provided here can be provided in a standard OpenMP-compliant compiler with simple #define statements:
#define ___pragma(x) _Pragma(#x)
#define __critical ___pragma(omp critical)
#define __par ___pragma(omp parallel for)
Some would require the Intel taskq extension for now, but since task queues are part of the OpenMP 3.0 spec, that too should be available to all compilers at some point:
#define __parallel ___pragma(omp parallel) ___pragma(intel omp taskq)
#define __spawn ___pragma(intel omp task)
When I mentioned functionality "beyond OpenMP" I did not have anything specific in mind. It was more a reinforcement of the statement that simply duplicating existing OpenMP functionality did not seem very worthwhile.
If I had to give examples of beyond-OpenMP functionality, I might pick some of the functionality provided by TBB, for example. TBB is great, but there is a significant step up in complexity when going from OpenMP to TBB, and that might be a domain in which someone can come up with some easy-to-use functionality that is not available in OpenMP. But even then, those requests might better be directed to the OpenMP standards committee instead.
Note - I'm not opposed to threading language extensions in principle; C++0x sounds like it may have some useful extensions, for example.