First timings.

My first timings are available. My test is to replicate the attached input 500,000 times for a 765MB file (30.5 million statements). Time on an i7-2600 is about 8 seconds on first run, 4 seconds on second run (after input was cached by OS). Disclaimer: i7-2600 is a very fast CPU.

Attachment: misc.txt (1.55 KB)

John,

Can you attach your output file of two copies of your little program?
Include your console text too if you please.

Good job by the way. Have you tried this out on the Many Cores Testing Lab?
And are you on W7 or Linux?

I just unpacked and assembled a Core i7 2600K today. I have to pick up some things before I power it up. I have a KVM switch that has VGA and two PS2 plugs for keyboard and mouse, but the motherboard only has USB for keyboard and mouse, so I will have to pick up two adapters. I also forgot to order a DVD drive, so I cannot install my O/S. I will get them tomorrow. I do not look forward to installation and migration.

Jim Dempsey

www.quickthreadprogramming.com

John,

I thought of something last night that may be different in the official test interpreter code that will be run through your program. Omitting attention to this might "blind-side" your interpreter program. Let's see if I can explain this in simple terms, and hopefully you will understand.

Your test input source repeats the following 500,000 times:

{ (brace here for illustrative purposes)
var x = ...
var y = ...
...
free(y);
free(x);
} (brace here for illustrative purposes)
{ (brace here for illustrative purposes)
var x = ...
var y = ...
...
free(y);
free(x);
} (brace here for illustrative purposes)
...

IOW the above populates and completely voids your symbol table structure. If your code is identifying the voids and using that to slice up the work per thread (makes sense) then consider what will happen if the test program has:

var totalRuns = 0;
{ (brace here for illustrative purposes)
var x = ...
var y = ...
...
free(y);
free(x);
} (brace here for illustrative purposes)
totalRuns = totalRuns + 1;
{ (brace here for illustrative purposes)
var x = ...
var y = ...
...
free(y);
free(x);
} (brace here for illustrative purposes)
totalRuns = totalRuns + 1;
...
output(totalRuns);

What would happen then in your program (as the voiding of your symbol table only happens once at the end)?
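To make the concern concrete, here is a minimal Python sketch of the "split where the symbol table is empty" idea. It is not John's implementation: the function name `split_points` and the regex-based statement matching are assumptions for illustration, and it only recognises the `var NAME = ...` and `free(NAME);` statement shapes quoted in this thread.

```python
import re

def split_points(statements):
    """Return the indices after which no variable is live (hypothetical helper)."""
    live = set()
    points = []
    for i, stmt in enumerate(statements):
        stmt = stmt.strip()
        declared = re.match(r"var\s+(\w+)\s*=", stmt)
        freed = re.match(r"free\(\s*(\w+)\s*\)", stmt)
        if declared:
            live.add(declared.group(1))
        elif freed:
            live.discard(freed.group(1))
        if not live:
            # Symbol table is empty here: a safe place to cut the work.
            points.append(i)
    return points

block = ["var x = 1;", "var y = x + 1;", "free(y);", "free(x);"]

# Case 1: isolated blocks, the table empties after every block.
isolated = block * 3
print(split_points(isolated))      # [3, 7, 11]

# Case 2: a global declared up front keeps the table non-empty throughout.
with_global = ["var totalRuns = 0;"]
for _ in range(3):
    with_global += block + ["totalRuns = totalRuns + 1;"]
with_global.append("output(totalRuns);")
print(split_points(with_global))   # []
```

In the second case the scan finds no cut points at all, so a scheme that relies only on an empty symbol table would fall back to running the whole input sequentially.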

Jim Dempsey

www.quickthreadprogramming.com

@Jim here is the two-copies output from my program. I haven't tried on MTL yet. In the last program MTL was very disappointing because it was all I/O bound. In this one, maybe not so much. The i7-2600s are super-fast and run cool, I think you will like it.

Attachment: misc_double.txt (912 bytes)

@Jim, at this point I'm not trying to do anything clever to locate and exploit isolated statement blocks of the kind that you describe, for exactly the reasons that you state. Analysing the global dependencies is itself a hard problem and not amenable to parallelism. However, I think it may be possible to look for islands of statements that are indeed independent and parallelize those.

On a related topic, try this input:

var y = 1;
var x = y + 1;
free(y);
var y = x + 1;
free(x);
... above four lines repeated 9999 more times ...
output(y);

My run:

$ ../x64/Release/expr chain.txt f:/temp/foo
time = 6.76 msec

jlilley@soad /cygdrive/g/john/expr/samples
$ cat f:/temp/foo
20001.0000000000

FYI this is Windows 7 x64.
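For anyone who wants to reproduce the chain input, the following Python sketch generates it and checks the expected result with a deliberately tiny sequential evaluator. The names `make_chain` and `run` are made up for this example, the evaluator handles only the statement shapes shown above (it is not the contest interpreter), and the 10-decimal output format is inferred from the console output quoted in the post.

```python
def make_chain(repeats):
    """Build the chain input: one seed line, then the four-line step `repeats` times."""
    lines = ["var y = 1;"]
    for _ in range(repeats):
        lines += ["var x = y + 1;", "free(y);", "var y = x + 1;", "free(x);"]
    lines.append("output(y);")
    return lines

def run(lines):
    """Evaluate only `var a = b`, `var a = b + c`, `free(a)` and `output(a)`."""
    env = {}
    out = []
    for line in lines:
        stmt = line.rstrip(";")
        if stmt.startswith("var "):
            name, expr = stmt[4:].split(" = ")
            left, _, right = expr.partition(" + ")
            # Operands are either a live variable or a numeric literal.
            value = env[left] if left in env else float(left)
            if right:
                value += env[right] if right in env else float(right)
            env[name] = value
        elif stmt.startswith("free("):
            del env[stmt[5:-1]]
        elif stmt.startswith("output("):
            out.append("%.10f" % env[stmt[7:-1]])
    return out

chain = make_chain(10000)   # the listed step plus 9999 repeats
print(len(chain))           # 40002
print(run(chain)[0])        # 20001.0000000000
```

Each step adds 2 to `y`, and every statement depends on the one before it, which is why this input leaves no independent islands to parallelize.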

I do not know what they have for disk setup on the MTL system. I know this makes a big difference.

On my Q6600 (4 core, no HT) my QuickThread version of the TBB parallel_pipeline "uppercase words" demo program ran at (from recall) ~300MB/s (input + output), whereas on a Dell R710 with 5-disk RAID10 the same program ran at ~1,350MB/s. I would think Intel would have a similar setup on the MTL system since you (they) wouldn't want I/O to quash parallelization efforts.

This is off topic from the thread...

I got my i7-2600K up and running. I have a question; perhaps you can help me out.

I had 3 hard disks on my old Q6600 system. Windows XP Pro x64 was installed first with 1 HD installed (NTFS Primary Partition); the second HD was added as NTFS (Extended Partition). I then added a 3rd HD and installed/partitioned this with Ubuntu (this was done prior to installing Ubuntu on my 2x Opteron 270 system). The GRUB boot loader got installed on the first HD (with Windows as a boot option at the bottom of the pick list). This system ran solid for years.

The i7-2600K has a 60GB HD. I installed Windows 7 Pro x64 on that. Then I thought of adding one of the HDs from my other system. I took disk 3, thinking the GRUB loader (on disk 0) would still load and give me a menu as before (except that the Ubuntu disk would now be missing and that entry would not boot). I tested this before I did anything with the removed HD. This did not work out as planned. I reinserted the missing disk into my Q6600 and, crap, it wouldn't boot: GRUB hung.

I have files on those two NTFS HDs... I added the former first HD from the Q6600 to the i7-2600K and, to my relief, it mounted as D: and all my files were there. Then I thought I would add the second HD from the Q6600 to the i7-2600K, but this disk would not mount.

DISKPART shows this as a Recovery partition (the label is read OK) but also as Hidden. I am not experienced at using DISKPART. I tried clearing the volume hidden attribute - this did not work. I would like to "condition" the drive such that it mounts as NTFS. The data was there a few hours ago. Any hints would be appreciated.

BTW this HD has my Archives, which I have on many CDs and DVDs. I'd rather recover the partition. I think this is an issue with "New and improved" features of Windows 7.

Jim Dempsey

www.quickthreadprogramming.com

Jim, we've had a lot of trouble getting GRUB and Win7/Server2008 to multi-boot properly, and I'm afraid that I'm no expert on the matter.

john

After some scalability improvements, I've run this test on the Windows 40-core MTL (the one with "issues", so they say...) and I get about 26 seconds single-core and around 2 seconds on 20 cores... it seems to get worse after that. Times bounce all over the place on that system, not sure why. The changes I made for scalability increased the single-core times, but they made a big difference on the MTL.

Meanwhile, try the attached "chain_large.txt". It forces sequential evaluation. The correct answer is

4800001.0000000000

although I get the following for some reason

4800001.0000000008

My time on i7-2600 is 494 ms.
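As a rough reading of the MTL timings quoted above (about 26 seconds on one core, around 2 seconds on 20 cores), the usual speedup and parallel-efficiency arithmetic can be sketched as follows. The helper names are made up for this example, and because both timings are approximate and reported as noisy, the results are only ballpark figures.

```python
def speedup(t_serial, t_parallel):
    """Speedup = single-core time divided by multi-core time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, cores):
    """Parallel efficiency = speedup divided by the number of cores used."""
    return speedup(t_serial, t_parallel) / cores

# Approximate figures from the post: ~26 s on 1 core, ~2 s on 20 cores.
s = speedup(26.0, 2.0)
e = efficiency(26.0, 2.0, 20)
print(f"speedup ~ {s:.0f}x, efficiency ~ {e:.0%}")   # speedup ~ 13x, efficiency ~ 65%
```

So on these numbers each of the 20 cores is doing roughly two thirds of the useful work it would do under perfect scaling.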

Attachment: chain_large.zip (352.1 KB)
