Intel® Manycore Testing Lab (Archived)

Performance conundrum - more data and summary

Let me give more data, organize it, and provide more detail to support my conjecture about "Intel's problem".
Note: I report the time ratio of two algorithms on different platforms and with varying numbers of threads. The two algorithms are (parallel) SixSort and (parallel) Quicksort (my own version of Quicksort, because what I found out there is not industrial strength). No, I do not do the Mickey Mouse thing of sorting integers. Instead, I sort objects; i.e.

Performance conundrum

SixSort is a faster and higher quality in-place sorter than
Quicksort. A complexity analysis suggests that the SixSort/Quicksort
time use ratio is 0.58 on large arrays.

A test on a 16M array yields the ratio 0.59 (against a best in class
Quicksort). On an AMD box.
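
A time ratio like the one quoted above can be measured with a small harness along these lines. This is only a sketch, not the poster's benchmark: `time_ratio`, its `repeats` parameter, and the `(key, payload)` records are hypothetical names chosen for illustration, and Python's built-in sort stands in for both algorithms because no SixSort implementation is given here.

```python
import random
import time


def time_ratio(sort_a, sort_b, data, repeats=3):
    """Return best-of-N time of sort_a divided by best-of-N time of sort_b."""

    def best_time(sort_fn):
        best = float("inf")
        for _ in range(repeats):
            work = list(data)  # fresh copy so every run sorts the same input
            start = time.perf_counter()
            sort_fn(work)
            best = min(best, time.perf_counter() - start)
        return best

    return best_time(sort_a) / best_time(sort_b)


if __name__ == "__main__":
    # Hypothetical test data: records (tuples) rather than bare integers.
    records = [(random.random(), i) for i in range(200_000)]
    # Stand-ins only: both callables use the built-in sort.
    ratio = time_ratio(lambda a: a.sort(), lambda a: sorted(a), records)
    print(f"time ratio: {ratio:.2f}")
```

Taking the best of several runs on identical copies of the input reduces noise from other activity on the machine, which matters when comparing ratios across different boxes.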

SixSort can be parallelized easily, like Quicksort. Parallel SixSort
against Quicksort with two threads yields an excellent time ratio of
0.33. On an AMD box.

Tests on multiple Intel boxes give disappointing results. Here is what
was observed on MTL:

MPI and SSE and searches

Does the MTL allow MPI yet? I would like to try out my SSE-based code and compare it to my OpenMP results, especially with the new 40-core upgrade. I have relatively more control over the core allocation with MPI than with OpenMP and would find it easier to implement the SSE bits.

Is there a way to search only the MTL forum threads with the Intel forum search? When I use the search, all forums are shown in a list, but this list does not include the MTL. When I run a search, results from all forums are returned.

Am I looking at this correctly?

Java Version?

I'm trying to do some work with Java and the version keeps coming up as 1.4.2. Here is the output:

java version "1.4.2"
gij (GNU libgcj) version 4.1.2 20080704 (Red Hat 4.1.2-46)

Is there any way the latest version of Java can be loaded? It's my understanding that Java 7 is geared to handle multicore development, and I would like to look into that.

any process limits: time? disk space? memory?

Hello all,

I can launch my program in batch using qsub, and a small instance terminates fine and gives good results. When I scale up my problem so that it takes longer than about 5 minutes, it seems to end at about that time. I hope there is no time limit, right?

Another minor problem also occurred once, when I got

/var/spool/PBS/mom_priv/jobs/12681.acaad01.SC: line 5: 31151 Bus error /home/sels/projects/KUL/RhinoCeros/retime/Debug/retime 0
[sels@acano01 Debug]$

while writing a few files of between 10 and 100 MBytes. Would that be a problem?

Batch nodes: 32 cpus 5x faster than 33 cpus

This is the way I typically submit batch jobs:

qsub -l select=1:ncpus=40 rl-myjob

Ever since the Memorial Day weekend maintenance, jobs submitted this way have been running about 5 times slower than they do on the login node. I traced the problem to ncpus values greater than 32. For example, on a small test that uses 64 threads and normally runs in under 30 seconds:

qsub -l select=1:ncpus=32 rl-myjob
# Finishes in about 24 seconds

qsub -l select=1:ncpus=33 rl-myjob
# Takes over 120 seconds
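
To narrow down where the slowdown begins, the same job could be submitted over a range of ncpus values. The following is a minimal sketch that only builds the command lines, reusing the `-l select=1:ncpus=N` form shown above; the helper name `qsub_sweep` and the range 30 to 35 are assumptions for illustration, and nothing is actually submitted.

```python
def qsub_sweep(job_script, ncpus_values):
    """Build one qsub argument list per requested ncpus value."""
    return [
        ["qsub", "-l", f"select=1:ncpus={n}", job_script]
        for n in ncpus_values
    ]


if __name__ == "__main__":
    # Print the commands for a sweep around the 32/33 boundary.
    for argv in qsub_sweep("rl-myjob", range(30, 36)):
        print(" ".join(argv))
```

Each argument list could be passed to `subprocess.run` on the login node; comparing the resulting run times would show whether the jump happens exactly between 32 and 33.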

HOWTO: Access local net while on Intel VPN


FYI, I have VNC and XMing working 100%. With the posts on the forum it's dead easy to set up. XMing is slow over long distances (I am in South Africa, and even with 4 Mbps ADSL the latency makes X almost unbearable).

VNC is slightly faster because it only sends screen updates, and has the added advantage of persistent sessions - if the nodes aren't bounced while you are offline, your desktop will still be as you left it when you disconnected VNC. For the "purists", XMing does full XServer rendering on your remote PC, which is quite nice, but has no session persistence.
