/Qparallel for a light user of parallelism.

Dear Forum,

I am just a hobbyist programmer. My platform is an HP xw8400 workstation with one Xeon E5335 2.00GHz quad-core processor. I use the Intel C++ Compiler Pro, version 11.0.066, through the VS2008 IDE. I develop programs for native 64-bit execution, hoping that is better for handling big data volumes fast. I use a quite basic level of C++: mostly FOR loops, arrays and sorting.

Unfortunately or not, my favorite self-developed application (a sort of mathematical analysis tool) has lately gotten too far into heavy processing. One single analysis takes close to a week of nonstop processing to complete.

My feeling is that performance can be improved. I read Intel's paper of January 14, 2010, about Automatic Parallelization, and followed the instructions therein. Now the work-intensive part of my program contains no WHILE loops, no BREAKs and no GOTOs. True, some function calls remain, to access the qsort() function of C. I wonder if that is too many: there are three sorts in the innermost loop. I have specified the /Qparallel option.

Parallelization and vectorization go virtually smoothly, as the compiler always reports success - several pages of success messages. Still, the execution time is just a little (max. 5%) less than that of the .exe produced by the plain MS compiler. As a relatively long-time user of the Intel C++ compilers, I am quite disappointed, because in the earlier days (for my earlier programs…) I used to get a 50% time reduction. Or even more.

Besides slowness, peak CPU usage doesn't exceed 25%, and even that is concentrated mostly in one core of the quad-core Xeon. (One core loaded nearly to 100%, and the remaining three just watching the first, almost unloaded.) It doesn't look very much like a workload distributed evenly…

I would greatly appreciate some advice from a colleague with deeper knowledge in this area. I am enclosing the tremendous command-line records, which the IDE produced as a result of my experimentation/guesswork on the Property Pages.

Thanks!

C/C++, Command line:

/c /O3 /Og /Ob2 /Oi /Ot /GT /Qip /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /MD /GS /arch:SSE3 /fp:fast /Yu"StdAfx.h" /Fp"x64\Release/F-PRINT_mp_AutoTest_5.pch" /Fo"x64\Release/" /W3 /nologo /Zi /QaxSSE3 /QxSSE3 /Qparallel

Linker, Command Line:

kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"C:\Cpp_OwnCode\Megoldasok\F_Print\F-PRINT_mp_AutoTest_5\x64\Release/F-PRINT_mp_AutoTest_5.exe" /INCREMENTAL:NO /nologo /MANIFEST /MANIFESTFILE:"x64\Release\F-PRINT_mp_AutoTest_5.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /TLBID:1 /DEBUG /PDB:"C:\Cpp_OwnCode\Megoldasok\F_Print\F-PRINT_mp_AutoTest_5\x64\Release\F-PRINT_mp_AutoTest_5.pdb" /SUBSYSTEM:CONSOLE /STACK:100000000,100000000 /OPT:REF /OPT:ICF /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:\Cpp_OwnCode\Megoldasok\F_Print\F-PRINT_mp_AutoTest_5\x64\Release\F-PRINT_mp_AutoTest_5.lib" /MACHINE:X64


If you are spending most of the time in qsort(), you will need a parallel version of qsort(), or to run simultaneous qsorts in parallel tasks. /Qparallel won't be able to accomplish this.
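
If the three sorts in the innermost loop are independent of one another, one way to run them simultaneously is OpenMP sections. A minimal sketch, compiled with /Qopenmp (the arrays, sizes, and comparator are invented stand-ins, not taken from the actual program):

#include <stdlib.h>

/* Hypothetical comparator for plain ints. */
static int cmp_int (const void* a, const void* b)
{
    const int l = *(const int*)a, r = *(const int*)b;
    return (l < r) ? -1 : (l > r);
}

void sort_three (int* a, size_t na, int* b, size_t nb, int* c, size_t nc)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        qsort(a, na, sizeof(int), cmp_int);

        #pragma omp section
        qsort(b, nb, sizeof(int), cmp_int);

        #pragma omp section
        qsort(c, nc, sizeof(int), cmp_int);
    }
}

Each section is handed to a different thread, so the three qsort() calls proceed at once; the implicit join at the closing brace waits for all three to finish.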

Hi

Unfortunately, I doubt that the auto-parallelizing compiler alone can solve your runtime problem. For any chance of cutting the time in half, you will have to write new code. Several parallel threads really pay off when you can cut out unnecessary work through shared conditions (one thread finds the result and informs the others to stop).

If that is impossible or too difficult, then with run times as long as you have shown, using MPI to merge several computers is the better, inexpensive way to reduce the long waits.

Do not think that others have a secret miracle that you lack.

You can find MPI or MPICH2 on the web; the functionality is very easy to use.

(One single analysis takes close to a week of nonstop processing to complete.)

If it were me, my personal technical choice would be MPI, with no hesitation...
Best regards

You will have to look at your code for opportunities for parallelization. These opportunities need not be loops.

I suggest you experiment with OpenMP. Without a complete description of your program it is difficult to offer good advice. OpenMP has parallel sections: should your one large code section have opportunities for different pieces of the code to run concurrently, then parallel sections might provide better performance. Parallel sections are relatively easy to incorporate into a program (but this will require some understanding on your part). With a bit of work you might be able to reduce a run time of one week down to two days. That is certainly worth the investment in time (say, while you are waiting a week for results).

Jim

www.quickthreadprogramming.com

Thanks a lot for the responses! I understand the common message: it is not possible to stick to my comfortable, lightweight status if I want to get past this problem. (Very bad news: I have to learn proper parallelization.) My secret hope was that some glaring error would be pointed out in the command lines' content, and my code could be made to start rocketing at once.

Actually, I am following Jim's advice already: the program has been running in the background all the time since I started asking for help. Also, in theory, I am quite ready to change the code to improve efficiency. There are two difficulties. One is the central model, whose logic I have to stick to in order to ensure the results reflect some real behavior of the system being modeled. The other is the simplicity of the elements I have been using by intent. Basically I have FOR loops nested five deep: not so complex, just big. That is why I liked the idea of /Qparallel at first.

Knowing my options, I think I will certainly look deeper into the science of OpenMP. (My present processing speed will provide enough time for that…) Just two more questions. First, Tim's mentioning of a parallel version of qsort(): any idea how/where to get one? Second is about the uneven distribution of the workload among the cores that I reported last time. Is it my fault (e.g. I never bothered specifying any thread counts), or is there possibly some other reason?

Thanks a lot, again!

Laszlo

My knowledge of parallel sort doesn't go beyond what you should have read already on Wikipedia. It must be a big academic topic, and there ought to be some information on the companion forum sections, e.g. TBB.
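
For what it's worth, TBB does ship a parallel sort. A minimal sketch, assuming TBB is installed and linked (the function and data here are invented for illustration):

#include <cstddef>
#include "tbb/parallel_sort.h"
#include "tbb/task_scheduler_init.h"

void sort_keys (unsigned long long* keys, std::size_t n)
{
    tbb::task_scheduler_init init;        // start the TBB worker threads
    tbb::parallel_sort(keys, keys + n);   // ascending sort, spread across the cores
}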

Hi

I see from your answer that you have not understood what I wrote. If you want to do as the professionals do:

1] Observe how software such as Matlab (or others of the same type) works: you will discover that a database engine is called underneath, as a backend. You do not have to rewrite what already exists and is probably far better than what you have. You can make the sort asynchronous through calls to the database backend, at the top level. (Even used as-is, that is already difficult enough.)

The time-to-result you quote would probably not be acceptable in professional programming. Ask the people programming meteorology or other similarly hard task sectors whether it is judicious to use OpenMP or TBB directly instead of the existing functions of a backend database engine; they would probably burst out laughing. Also, if the task is very big, it is imperative to mount clusters with MPI, to bring in help on the hardware side.

About experience... only those who have come out of the water can justly tell you whether it is cold...

Refer to the documentation of PostgreSQL, Oracle or DB2 to learn the appropriate functionality that already exists to help with your task. Operating the sort asynchronously through the backend database API is difficult with OpenMP or TBB; the engine already has very powerful internal threading, but with prudence and attentive control you can perfectly well make it work. The PostgreSQL backend (libpq) is among the most powerful. The alignment of the structure fields of a record, at the bottom level, is very important for controlling memory size, and for any threading you add, if you must drive your program through large volumes of database fields.

This new answer is only to help you, since you wrote that you are not a professional programmer.

Best regards.

>>True, some function calls remain, to access the qsort() function of C. I wonder if that is too many: there are three sorts in the innermost loop. I have specified the /Qparallel option.

You stated the runtime is very long.

Is the sort time significant as compared to other execution time?

If sort time is significant:
Is the result of the sort required for the next calculation?
Or is it for purposes of organizing results data for reports?

If the sort is for reports:
export the unsorted data from the loop to memory buffers if possible,
or write the unsorted data to a file.
Then have a separate thread or separate application sort the results data concurrently with the remainder of your calculations, along the lines of the sketch below.
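
A sketch of that deferred sorting using OpenMP 3.0 tasks, which the 11.0 compiler supports. Every name here (Result, compute_block(), compare_results(), the block counts) is a hypothetical stand-in; only the shape of the overlap matters:

#include <stdlib.h>

typedef struct { int key, value; } Result;                 /* hypothetical record      */
extern Result* compute_block (int b);                      /* hypothetical producer    */
extern int compare_results (const void*, const void*);     /* hypothetical comparator  */
enum { N_BLOCKS = 100, BLOCK_SIZE = 1000000 };

void produce_and_sort (void)
{
    #pragma omp parallel
    #pragma omp single
    {
        for (int b = 0; b < N_BLOCKS; ++b) {
            Result* r = compute_block(b);         /* produce one unsorted block */
            #pragma omp task firstprivate(r)      /* another core sorts it...   */
            qsort(r, BLOCK_SIZE, sizeof(Result), compare_results);
        }                                         /* ...while this loop goes on */
    }   /* implicit barrier: every sort task has finished here */
}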

Maybe a good first step (baby step) for OpenMP is:

Compile with OpenMP enabled, but with no OpenMP statements. Once that is working (the libraries link into the application without errors), then...

insert:

double T1 = omp_get_wtime();
...
double T2 = omp_get_wtime();
...
etc...

at major way points in your big routine,
and around the calls to qsort().

At the bottom of the code section insert

double E1 = T2 - T1;
double E2 = T3 - T2;
...
Then print out, or write out the results.

If the loop repeats at a high frequency, then in global scope:

double E1 = 0.0;
double E2 = 0.0;
...

And change the prior code that calculated elapsed time into E1, E2, ...
to accumulate elapsed time into E1, E2, ...

E1 = E1 + T2 - T1;
E2 = E2 + T3 - T2;
...

Then at the end of the program, or at some time interval, display the elapsed times, then zero them out.

By placing the time collection statements appropriately, you can determine to some extent where parallelization may occur.
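
Putting those fragments together, a minimal self-contained sketch; do_phase1() and do_phase2() are placeholders for whatever lies between the real way points:

#include <stdio.h>
#include <omp.h>    /* compile with /Qopenmp (or /Qopenmp-stubs) so omp_get_wtime() links */

extern void do_phase1 (void);   /* hypothetical way point 1 */
extern void do_phase2 (void);   /* hypothetical way point 2, e.g. the qsort() calls */

/* Accumulated elapsed time per phase, in seconds, at global scope. */
static double E1 = 0.0, E2 = 0.0;

void big_routine_once (void)
{
    double T1 = omp_get_wtime();
    do_phase1();
    double T2 = omp_get_wtime();
    do_phase2();
    double T3 = omp_get_wtime();

    E1 += T2 - T1;    /* accumulate instead of overwriting */
    E2 += T3 - T2;
}

void report_times (void)
{
    printf("phase1: %.3f s  phase2: %.3f s\n", E1, E2);
    E1 = E2 = 0.0;    /* zero out for the next interval */
}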

When parallelizing the code, you do not want multiple sections writing to the same variables.

Jim

www.quickthreadprogramming.com

Firstly: while you have turned "/Qparallel" on, it will have no effect, since you have not enabled OpenMP (under Code Generation, I think).

I would encourage you to post (at least) some of your code. That much idle time on a long-running algorithm tends to indicate either a heavy IO burden or that the CPU core is spending most of its time waiting for memory. There are usually people here happy to look at someone else's code with an eye to optimizing it :)

- Oliver

Oliver 'kfs1' Smith, Lead Server Programmer, Cornered Rat Software / Battleground Europe

Oliver,

The idle time, in this case, indicates only 1 of 4 cores is working.
The program should be examined for opportunities for parallelization.
If Laszlo is not comfortable doing this alone, I suggest he entice a university student with a case of beer (to be dispensed after the programming assistance).

Once the initial issues have been addressed, Laszlo will become familiar with what is required and then could take over the task of further refinements.

Jim Dempsey

www.quickthreadprogramming.com

The OpenMP option is not a prerequisite to enabling /Qparallel. If both are set, OpenMP pragmas take precedence over /Qparallel; both use the same OpenMP library, so they can be made to work together.

Are you sure, Tim? In the past I've found that /Qparallel did nothing until I also enabled OpenMP (otherwise it produced parallel-ready serial code). But that was a few compiler versions ago and it may no longer be valid (as you state).

Oliver 'kfs1' Smith, Lead Server Programmer, Cornered Rat Software / Battleground Europe

/Qparallel and /Qopenmp may be interchangeable in the link step; you would need one, but not both, if either were in use in the compile.

(This is Oliver: For some reason, it is making me edit your post instead of letting me reply to it!)

Dear Laszlo,

I am sorry for the figurative expression I used, which could be interpreted in more than one sense. I hold all programmers who work at the low level in C/C++ (your case) in high regard. I used that image only because here in my country (and probably in several others), when we want to build a project done very properly, with a large part in the C language, the quality-control engineers or the manager in charge always burst out laughing and answer that you are reinventing the wheel, too expensive for a specialized one-off task. That has absolutely no relation to your level of ability; probably the inverse.

I am sorry if you understood a sense other than what I sincerely think. (May God forgive me!)

About OpenMP (more exactly, low-level threading, the only kind I allow myself to use): since you have a database engine available, I hope it can help you, and also erase my unintentional fault. Sometimes I use a trigger with a dummy insert as a condition, to create a common reference point for all the threads.

Best regards.

Best Reply

Hi Laszlo,

Those are indeed excellent loops for parallelization, plenty of workload there :)

One possible approach might be as follows.

What I've done is switch from using doubles to using a 64-bit unsigned long long. This saves you from losing precision and removes the need for int-to-double conversions, replacing them with efficient bit-shifts.

static const unsigned int ARRAY_SIZE = 1000000;

static int compare_u64 (const void* lhs, const void* rhs)
{
   const unsigned long long l = *(const unsigned long long*)lhs;
   const unsigned long long r = *(const unsigned long long*)rhs;
   return (l < r) ? -1 : (l > r) ? 1 : 0;
}

void KRU_qsort_1m (int Sorr_Bemenoe[ARRAY_SIZE][2])
{
   unsigned long long SorrG_A[ARRAY_SIZE];

   // Pack: the key ([1]) into the high 32 bits, the value ([0]) into the low 32 bits.
   #pragma omp parallel for shared(SorrG_A, Sorr_Bemenoe)
   for (unsigned int f=0; f<ARRAY_SIZE; ++f) {
        SorrG_A[f] = ((unsigned long long)(unsigned int)Sorr_Bemenoe[f][1] << 32)
                   | (unsigned int)Sorr_Bemenoe[f][0];
   }

   qsort(SorrG_A, ARRAY_SIZE, sizeof(unsigned long long), compare_u64);

   // Unpack the sorted 64-bit values back into the two-int array.
   #pragma omp parallel for shared(SorrG_A, Sorr_Bemenoe)
   for (unsigned int f=0; f<ARRAY_SIZE; ++f) {
        Sorr_Bemenoe[f][1] = (int)(SorrG_A[f] >> 32);
        Sorr_Bemenoe[f][0] = (int)(SorrG_A[f]);
   }
}

I might be wrong about the need for the "shared" on the pragma omp.

I removed the for loop that initialized the SorrG_A to zero, since you are going to be overwriting every value anyway.

Now - I'm assuming that you were converting to doubles to get a large enough value to store the two numbers as a reverse-ordered key? That's what I have achieved here.

Also note that I have provided the compiler with a safety hint for optimization, by specifying that "f" is in fact unsigned, so the compiler does not have to worry about negative values :)

Additionally, I wrote "++f", which takes getting used to in terms of readability, but it seems to make the Intel Compiler happier.

However -- this code is still likely to be a problem. The first performance hit comes from

 unsigned long long SorrG_A[ARRAY_SIZE];

which allocates space for that array on the stack every time you enter the function, and that takes time.

In theory, you should be able to do an in-place qsort on the array you are passing in.

Here is a complete example.

1- I created a "struct" called Bemenoe which contains the two ints you are using. It is internally exactly like an array of two ints but it is easier to work with. There is no overhead to using the struct. (I'm guessing that "Sorr" means array?)

2- I changed the compare function - it is probably not the most efficient way to do the compare, but it will be more efficient than doing the sort out-of-place.

3- I do the sort in-place - that is, I don't need to create an extra work space for the sort, which saves a lot of CPU time.

I test compiled this under Linux with:

icpc -O3 -Wall -openmp -o qsorttest qsorttest.cpp

qsorttest.cpp:

#include <stdio.h>
#include <stdlib.h>

static const unsigned int ARRAY_SIZE = 8 ;

struct Bemenoe
{
    int value ;
    int key ;
} ;

static int compare(const void* const lhs_ptr, const void* const rhs_ptr)
{
    const Bemenoe* const lhs = (Bemenoe*) lhs_ptr ;
    const Bemenoe* const rhs = (Bemenoe*) rhs_ptr ;

    // Left key is lower
    if ( lhs->key < rhs->key )
        return -1 ;

    // keys are equal
    if ( lhs->key == rhs->key )
    {
        // but the left value is lower
        if ( lhs->value < rhs->value )
            return -1 ;
        // values are also equal
        if ( lhs->value == rhs->value )
            return 0 ;
    }

    // right side is lower.
    return 1 ;
}

int main(int argc, char* argv[])
{
    Bemenoe Sorr_Bemenoe[ARRAY_SIZE] = { { 1, 3 }, { 2, 3 }, { 2, 2 }, { 1, 5 }, { 8, 1 }, { 4, 3 }, { 7, 7 } , { 1, 1 } } ;

    qsort(Sorr_Bemenoe, ARRAY_SIZE, sizeof(Bemenoe), compare) ;

    for ( unsigned int i = 0 ; i < ARRAY_SIZE ; ++i )
    {
        printf("%u: %d, %d\n", i, Sorr_Bemenoe[i].value, Sorr_Bemenoe[i].key) ;
    }

    return 0 ;
}
Oliver 'kfs1' Smith, Lead Server Programmer, Cornered Rat Software / Battleground Europe

The output that I get is:

osmith@ubuntu:~$ ./qsorttest
0: 1, 1
1: 8, 1
2: 2, 2
3: 1, 3
4: 2, 3
5: 4, 3
6: 1, 5
7: 7, 7

Which is sorted by the second (key) value before the first (value) value.

Oliver 'kfs1' Smith, Lead Server Programmer, Cornered Rat Software / Battleground Europe

Another way to write the compare function, which is more efficient, is as follows (one caveat: the subtraction trick can overflow if your keys or values can differ by more than INT_MAX):

static int compare(const void* const lhs_ptr, const void* const rhs_ptr)
{
    const Bemenoe* const lhs = (Bemenoe*) lhs_ptr ;
    const Bemenoe* const rhs = (Bemenoe*) rhs_ptr ;

    // Diff will be < 0 if the left-hand key is smaller,
    // 0 if they are the same, and > 0 if the right hand key is smaller.
    const int diff = lhs->key - rhs->key ;

    // If the keys are equal, then the values decide which entry
    // is the smaller entry.
    if ( diff != 0 )
        return diff ;

    return lhs->value - rhs->value ;
}
Oliver 'kfs1' Smith, Lead Server Programmer, Cornered Rat Software / Battleground Europe

Hello, Oliver!

(I see you have found my last post so very valuable that you decided to keep it just to yourself. I fully understand…)

I really thank you for the trouble you took to correct my code! Actually, I will need some time now to study your solution. Your level of C is a little higher than mine. I will be OK, anyway.

Laszlo

P.S. Sorr stands for sorrend, which means order/rank in Hungarian.

Hehe - I'm really not sure why it insists on editing that reply of yours for me -- it's a shame because I'm sure some of the other readers might have better solutions :/

Please don't be afraid to ask any questions, no matter how "trivial" you might think them :)

The "const" identifiers: for this piece of code they are probably not necessary; they are a "hint" to the compiler (and to yourself!).

A const on the left means "contents do not change" while a const on the right means "pointer does not change".

char somechars[64] ;
char* p = somechars ;  // p points to somechars.
const char* constp = somechars ;
char* const pconst = somechars ;
const char* const constpconst = somechars ;

// This is allowed
*p = 'a' ;  // Set the first char of somechars to the letter 'a'.
p++ ;       // p now points to the second char of somechars.

*constp = 'a' ; // Not allowed - cannot change the contents constp points to.
constp++ ;      // Allowed - constp the pointer is changeable.
// (and now constp points to the second char of somechars)

*pconst = 'a' ;  // Allowed - we have only restricted the pointer pconst, not its contents.
pconst++ ;       // Not allowed - we've said that pconst the pointer does not change.

*constpconst = 'a' ; // Not allowed - the contents are const.
constpconst++ ;      // Also not allowed - the pointer is const as well.

Where "const" comes in useful is high-end optimization: it helps the compiler make certain more aggressive optimizations (although I suspect in some cases it can work that out for itself).

It is also helpful as a self-guide if you have to track down a value being corrupted or modified. Const tells you "not this one!" :)

The one that puzzles some people is

const unsigned int arraysize = 64 ;

I do a lot of cross-platform/cross-compiler work, and this makes some compilers very happy. It tells them they can treat the value like a "named literal": you aren't going to be changing it. Fewer and fewer compilers require this markup these days, but I find it a good habit to exercise, because when I come back to a function it provides me with one additional piece of information when I start reading.

"static const unsigned int ARRAY_SIZE = 100000"

The const says: I won't be changing this. The static says "only visible in the current compile module". That tells the compiler not to create storage space for the variable, and to use the actual value directly. However, it gives you a name for that particular value, so that you can write clearer code, and if you decide to change the size of the arrays, you won't accidentally change some other use of the same numeric value.

"static" in front of a function means "only used inside this file". Again, it helps the compiler with some optimizations: it means that you won't be calling the function from another file that the compiler does not know about.

You may already know some of these - but I wanted to make sure there are no hidden meanings you were unfamiliar with :)

Oliver 'kfs1' Smith, Lead Server Programmer, Cornered Rat Software / Battleground Europe

Hi, Oliver!

The last long run of my program has just finished. The good news is that the results show no error in the semantics, i.e. the model did what was expected of it. At the same time (as the results show) the algorithm itself requires slight adjustment. That is quite normal; it just means that, if nothing else happens, I will have to wait another week to see new results.

I am quite definite that the next launch will go along with my first parallel constructs in the code; I take that as simply a must. Mainly thanks to your hand-made examples, I can now even use the HELP on parallelism more sensibly than before. (Actually, that's what I am busy with now: reading HELP.)

In short: thanks a lot for everything!

Laszlo

Please don't hesitate to ask more questions or post more code snippets if you'd like more help. Either here, or you can email me at osmith@playnet.com.

- Oliver

Oliver 'kfs1' Smith, Lead Server Programmer, Cornered Rat Software / Battleground Europe

(I don't want to make this a never-ending story, but I have to show you the following.)

In the HELP of my 11.0.066 Intel Compiler, I have just found this text about Using OpenMP:

Assume that you compile the sample above (named parallel.cpp), using the commands similar to the following, where ... /c (Windows) instructs the compiler to compile the code without generating an executable:

icl /Qopenmp /c parallel.cpp

This compiler option happens to be the first one in my command line. (See the post above.) Originally, it was produced through the Property Pages / C/C++ / Preprocessor / Keep Comments Yes/No field during my trial-and-error experimentation.

Can this explain the lack of parallelism in the former executions of my program? Or am I totally confused…

As I think we've attempted to explain, /Qopenmp parallelizes by implementing the #pragma omp directives which you must insert in your program, in accordance with the OpenMP standard; openmp.org has tutorials and references. /Qparallel has the compiler looking for ways to achieve a similar effect; it is influenced by other pragmas, such as #pragma loop count. Both require recompilation from source code. If both switches are set, /Qparallel will attempt operation only outside of OpenMP regions.
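A small illustration of the two routes (the loop is invented, and the exact spelling of the loop-count hint should be checked against the 11.0 documentation; I recall it as loop_count):

void scale (const double* a, const double* b, double* out, int n)
{
    /* Auto-parallelization (/Qparallel): the compiler decides; a trip-count
       hint helps it judge whether parallelizing is profitable. */
    #pragma loop_count (1000000)
    for (int i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}

void scale_omp (const double* a, const double* b, double* out, int n)
{
    /* Explicit OpenMP (/Qopenmp): you decide, via the pragma you wrote. */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}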
As both use the OpenMP run-time library, /Qparallel and /Qopenmp are equivalent for link step.
The wording in the VS property page for setting /Qopenmp may be confusing.

"/c" is being used because you have multiple source files. Each source file is compiled into a half-way stage, an "object" file (.obj). When all of the sources have been built successfully, the compiler then "links" these together to create the executable.

As Tim says, Configuration Properties > C/C++ > Language > OpenMP Support (/Qopenmp) only enables support for OpenMP if you manually add OpenMP commands to your code. What you want is Configuration Properties > C/C++ > Optimization > Parallelization (/Qparallel). This will attempt to use parallelization automatically.

You probably also want the following options:

Optimization:
Optimization: Full Optimization (/Ox)
Inline Function Expansion: Any Suitable (/Ob2)
Enable Intrinsic Functions: Yes (/Oi)
Global Optimizations: Yes (/Og)
Interprocedural Optimizations: Multi-file (/Qipo)
Optimize for Windows Application: Yes (/GA)

Code Generation:
Runtime Library: Multi-threaded (/MT)
Enable Enhanced Instruction Set: SSE2 (if all your machines have it) (/arch:SSE2)
Add Processor-Optimized Code Path: None (we will set these manually on the Command Line)

Command Line:
Advanced Options: /QaxSSE3,SSSE3,SSE4.1,SSE4.2 /Qopt-subscript-in-range

Oliver 'kfs1' Smith, Lead Server Programmer, Cornered Rat Software / Battleground Europe

Thanks, again, Oliver!
I am in the OpenMP tutorials up to my ears. Also, the weekend is coming (I am in the office now), when I can start my new testing session. I will try a lot of new things, primarily your last command-line options.
Laszlo

I don't know if this will be any help (it wades in a little quickly), but I made an OpenMP tutorial based on my personal experience wading through all the instructions etc.: http://wiretap.wwiionline.com/programming/openmpintro.zip

I decided not to publish this elsewhere, primarily because it requires both Visual Studio and the Intel Compiler, since VS doesn't yet support OpenMP 3.0 (although they told me it would in the 2010 release).

- Oliver

Oliver 'kfs1' Smith, Lead Server Programmer, Cornered Rat Software / Battleground Europe

One other thing about the qsort.

qsort is a 'generic' sorting algorithm: it cannot assume anything about your data. Now if your data is indeed more or less 'random', then qsort will probably do the best possible job (although apparently not parallelized).

Often, however, other things are possible.

If you know, for example, that there is already some ordering in the data, other sorting algorithms might be useful.

Also, if you add one new value to an array and then call qsort, that performs much worse than directly inserting the value at its proper location in the array (see the sketch below). Or use a hash tree: with a hash tree, inserting data takes a bit more time, but it will automatically be ordered.
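
For illustration, a sketch of such a direct insertion into an already-sorted array: a binary search finds the slot and a single memmove shifts the tail, which is far cheaper than re-running qsort over the whole array.

#include <string.h>

/* Insert 'key' into the ascending-sorted 'arr' of 'n' elements
   (the array must have capacity for at least n+1 elements). */
void sorted_insert (int* arr, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {                       /* binary search for the slot */
        size_t mid = lo + (hi - lo) / 2;
        if (arr[mid] < key) lo = mid + 1;
        else                hi = mid;
    }
    memmove(&arr[lo + 1], &arr[lo], (n - lo) * sizeof arr[0]);
    arr[lo] = key;                          /* place the new value */
}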

Hello, Oliver!

Unfortunately, I have some bad news about my compiler. I specified all the options you posted here last time. It refused to accept just one of them, namely /Qipo (Interprocedural Optimizations: Multi-file). Here are the error messages I received:

1>(0): internal error: backend signals

1>xilink: error #10014: problem during multi-file optimization compilation (code 4)

1>xilink: error #10014: problem during multi-file optimization compilation (code 4)

Finally, I had to specify Single-file in order to get the code compiled.

There was the same long list of reports about successful parallelization and vectorization from the compilation, but practically no change in the behavior of the resulting EXE. Still only one of the cores carries the full load, and the number of threads is 5, no more.

I wonder if you have any idea what the matter is…

Thanks anyway!

Laszlo

I was getting the backend signals when ICC didn't like my header files. We had #pragma once in many of our headers, and we couldn't use /Qipo until we changed them to "guard ifdefs":

#ifndef THIS_HEADER_FILE_NAME_H
#define THIS_HEADER_FILE_NAME_H 1
...
#endif

If you would like me to look at the source code for you, I'd be happy to help you try and optimize it and explain any programming tips... You can send me an email at oliver@kfs.org

- Oliver

Oliver 'kfs1' Smith, Lead Server Programmer, Cornered Rat Software / Battleground Europe

Good morning! I am following your new line of investigation (header files and directives) at full speed. It seems promising.

Thanks again!

Laszlo

(I am sending an e-mail too.)

Quoting kfsone
I was getting the backend signals when ICC didn't like my header files. We had #pragma once in many of our headers, and we couldn't use /Qipo until we changed them to "guard ifdefs":

#ifndef THIS_HEADER_FILE_NAME_H
#define THIS_HEADER_FILE_NAME_H 1
...
#endif

If you would like me to look at the source code for you, I'd be happy to help you try and optimize it and explain any programming tips... You can send me an email at oliver@kfs.org

- Oliver

Hey Oliver, any chance you could provide a reproducer for that "#pragma once" problem you're describing? I realize it may be a bit tricky, but if you could provide a tar file of a set of headers and source files, I'd love to look into the problem.

Thanks!
Dale

It's a variant of this issue: http://software.intel.com/en-us/forums/showthread.php?t=72709

The primary cause of which appeared to be #pragma once in the primary pre-compiled header. I took a brute-force approach to fixing it, something like:

#!/bin/perl

our $files = `grep -l '^#pragma once' */*.h` ;
if ( !$files ) {
  print("No '#pragma once's remaining.\n") ;
} else {
  foreach $file ( split(' ', $files) ) {
    my $guard = $file ;
    # Trim a leading "./" directory prefix.
    $guard =~ s/^\.\/// ;
    # Insert underscores at camel-case points/path separators.
    $guard =~ s/([a-z0-9\/.])([A-Z])/$1_$2/g ;
    # Map remaining path separators and dots to underscores,
    # since they cannot appear in a macro name.
    $guard =~ s/[\/.]/_/g ;

    # Make the guard upper-case.
    $guard = uc($guard) ;

    open(IN, "<$file") || die "Unable to read '$file'" ;
    open(OUT, ">${file}.new") || die "Unable to write '$file'" ;

    # Start the new version with the guard.
    print OUT "#ifndef ${guard}\n#define ${guard} 1\n\n" ;

    # Copy everything but "#pragma once".
    while ( <IN> ) {
      print OUT $_ unless (/^#pragma\s+once/) ;
    }

    # Terminate the guard.
    print OUT "\n#endif // (${guard})\n" ;

    close(IN) ;
    close(OUT) ;
    # Back up the original file.
    rename($file, "${file}~") ;
    # Replace the original with the new file.
    rename("${file}.new", $file) ;
  }
}
Oliver 'kfs1' Smith, Lead Server Programmer, Cornered Rat Software / Battleground Europe
