Test data


Just wanted to share....

Attachment: setsin.txt (214.76 KB)

How did you generate these sets?

Thank you. These data are really useful.

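In the office tower there are offices, and in the offices there are programmers. Programmers write programs, then trade programs for drinking money. Sober, they sit online; drunk, they sleep offline. Drunk and sober, day after day; online and offline, year after year.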

Quoting academicrobot
If anyone wants to cross-check partitions for this, mine are attached. The dimensions of these match the dimensions from the source file (cols 2 and 3 from Minh-Nhut Hong's link).

I have the same output. Cheers!

I've only done a manual comparison, though. You seem to mix up x and y when you output dimensions.

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net

I've not checked here, but I did find it strange that the examples given in the problem spec list solutions with height then width as the dimensions. Kind of counter-intuitive to me. But not if you call it rows and columns.

How long did it take to solve? I get a result in a few seconds, but it seems it can be solved much faster.

I am also getting same answers, thanks for posting this! FYI my time is 142ms on i7-2600.

Quoting john_e_lilley
I grabbed the o19 and o20 samples from same site. Samples and my results in zip file. I am running in about 1200ms for o19 and 3800ms for o20 on i7-2600.

I see similar times, though slightly slower (1.5s, 4s) on a Core 2 Quad Q8200...


Quoting john_e_lilley
Large-rectangle problem and my solution; can anyone check it?

I have the same result.

Quoting john_e_lilley
I grabbed the o19 and o20 samples from same site. Samples and my results in zip file. I am running in about 1200ms for o19 and 3800ms for o20 on i7-2600.

On 2xXeon E5620 (2.4GHz, 8 HT cores total)

o19 - 71 ms (1 thread), 14 ms (16 threads)

o20 - 233 ms (1 thread), 30 ms (16 threads)


Dmitriy! I figure you can always be counted on to turn in some crazy times :-) Have you run it on the MTL yet? I find that their system is not especially fast and scales erratically, but maybe that's just my code.

I can hardly believe those crazy times either, but that's what I see on my screen :)
I haven't run it on MTL yet. Yeah, scaling is especially important this year; 40 HT cores can make up to a 50x difference.


I've found that the solver part is generally so fast that the I/O part bottlenecks. Did you have to implement any specialized scanning/formatting code? I found that e.g. cout << integer was way too slow.
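For readers wondering what "specialized formatting code" might look like, here is a minimal sketch of hand-rolled integer formatting that avoids `cout << integer`. Nobody in this thread posted their formatter, so the helper names `append_uint` and `format_row` are hypothetical, and the sketch assumes unsigned 32-bit values separated by spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append the decimal digits of `value` to `out`
// without going through the iostream formatting machinery.
inline void append_uint(std::string& out, std::uint32_t value) {
    char digits[10];  // a 32-bit value has at most 10 decimal digits
    std::size_t n = 0;
    do {
        digits[n++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    assert(n <= sizeof digits);
    // Digits were produced least significant first, so emit them in reverse.
    while (n > 0) out.push_back(digits[--n]);
}

// Hypothetical helper: format one row of integers, separated by single
// spaces and terminated by a newline.
inline std::string format_row(const std::uint32_t* values, std::size_t count) {
    std::string out;
    out.reserve(count * 11);  // up to 10 digits plus one separator per value
    for (std::size_t i = 0; i < count; ++i) {
        if (i != 0) out.push_back(' ');
        append_uint(out, values[i]);
    }
    out.push_back('\n');
    return out;
}
```

Each thread can format its own rows into a private buffer this way and hand the finished buffer to whatever output path is used.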

Quoting Dmitriy Vyukov

Quoting john_e_lilley
I grabbed the o19 and o20 samples from same site. Samples and my results in zip file. I am running in about 1200ms for o19 and 3800ms for o20 on i7-2600.

On 2xXeon E5620 (2.4GHz, 8 HT cores total)

o19 - 71 ms (1 thread), 14 ms (16 threads)

o20 - 233 ms (1 thread), 30 ms (16 threads)

Is read/write time included in these times? I think these times are fantastic.

On 2xXeon E5620 (2.4GHz, 8 HT cores total)

o19 - 71 ms (1 thread), 14 ms (16 threads)

o20 - 233 ms (1 thread), 30 ms (16 threads)

Great times! I hope that these times are measured without I/O operations. My results are:

o19 - Read time: 280 ms , Elab. Time 105ms , Write time: 1.2 sec (1 thread)

o20 - Read time: 412 ms , Elab. Time 387ms , Write time: 1.8 sec (1 thread)

Core 2 6600 (2.4 GHz)
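A read / elaboration / write breakdown like the one above can be collected with a small per-phase timer. This is only a sketch using `std::chrono::steady_clock`; the names `time_ms` and `report_phases` are hypothetical, and the three callables stand in for whatever the real solver does in each phase.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical helper: run `phase` once and return its wall-clock
// duration in milliseconds.
template <typename F>
double time_ms(F&& phase) {
    const auto start = std::chrono::steady_clock::now();
    phase();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: time the three phases separately, print them,
// and return the total in milliseconds.
template <typename R, typename S, typename W>
double report_phases(R&& read_input, S&& solve, W&& write_output) {
    const double read_ms = time_ms(read_input);
    const double solve_ms = time_ms(solve);
    const double write_ms = time_ms(write_output);
    const double total = read_ms + solve_ms + write_ms;
    assert(total >= 0.0);  // steady_clock never goes backwards
    std::printf("read: %.3f ms, solve: %.3f ms, write: %.3f ms\n",
                read_ms, solve_ms, write_ms);
    return total;
}
```

Reporting the phases separately makes it clearer whether a posted time includes I/O or only the solver.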

I'm getting down to 120ms on o20 case with I/O included, so I don't think Dmitriy's times are totally unrealistic, just annoyingly good ;-)

Quoting john_e_lilley
I've found that the solver part is generally so fast that the I/O part bottlenecks. Did you have to implement any specialized scanning/formatting code? I found that e.g. cout << integer was way too slow.

Yup. And both are fully parallel.


Yes, it's a C program of the form:

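#include <stdio.h>
#include <time.h>

int main(int argc, char** argv) {
  struct timespec       tstart;
  struct timespec       tend;
  unsigned long long    elapsed_ms;

  /* Start of the measured region: everything, including I/O. */
  clock_gettime(CLOCK_MONOTONIC, &tstart);

  ...

  clock_gettime(CLOCK_MONOTONIC, &tend);
  /* Convert both timestamps to milliseconds and take the difference. */
  elapsed_ms = (tend.tv_sec * 1000ull + tend.tv_nsec / 1000000ull)
      - (tstart.tv_sec * 1000ull + tstart.tv_nsec / 1000000ull);
  printf("exec time: %u ms\n", (unsigned)elapsed_ms);

  return 0;
}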


Quoting john_e_lilley
I'm getting down to 120ms on o20 case with I/O included, so I don't think Dmitriy's times are totally unrealistic, just annoyingly good ;-)

There is still some time to improve your solution. I don't think I will improve mine any more ;)


On Core 2 Duo 2.09GHz

o19 150 ms (1 thread), 90ms (2 threads)

o20 350 ms (1 thread), 220ms (2 threads)

写字楼里写字间,写字间里程序员 程序人员写程序,又拿程序换酒钱 酒醒只在网上坐,酒醉还来网下眠 酒醉酒醒日复日,网上网下年复年

Can I buy a vowel :-)

Dmitriy, are you running Linux or Windows? I've found that basically I can do everything very quickly, up to writing the output, and Windows is really reluctant to create a file and parallel write into multiple parts at the same time. Easy to do on Linux, since you just seek past EOF.

john

I am using Linux. On Windows I would try MMIO or writing to explicitly specified positions in a big file.
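As an illustration of "writing to explicitly specified positions", here is a minimal POSIX sketch built on `pwrite`. It is an assumption about how such a scheme could look, not anyone's actual contest code; `chunk_offsets` and `write_at` are hypothetical names. Each worker thread would call `write_at` with its own chunk and offset, and since `pwrite` does not move a shared file position, the calls can happen in any order.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <unistd.h>
#include <vector>

// Hypothetical helper: prefix sums over the chunk sizes give every chunk
// its absolute byte position in the output file.
inline std::vector<off_t> chunk_offsets(const std::vector<std::string>& chunks) {
    std::vector<off_t> offsets(chunks.size());
    off_t total = 0;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        offsets[i] = total;
        total += static_cast<off_t>(chunks[i].size());
    }
    return offsets;
}

// Hypothetical helper: write `data` at `offset`, retrying on short writes.
// Writing past the current end of file is allowed; the file simply grows.
inline bool write_at(int fd, const std::string& data, off_t offset) {
    assert(fd >= 0);
    const char* p = data.data();
    std::size_t left = data.size();
    while (left > 0) {
        const ssize_t n = pwrite(fd, p, left, offset);
        if (n <= 0) return false;
        p += n;
        left -= static_cast<std::size_t>(n);
        offset += n;
    }
    return true;
}
```

The offsets have to be known before any thread writes, which means the formatted size of every chunk must be computed first.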


MMIO is the best I/O performance I can find under Windows. Works fantastic on input, but has issues on output. I'm not sure what it is; it's highly variable. Windows, like Linux, will let you create a file and set its size w/o writing all of the bytes, and then map it in, but I think something gets choked up in the OS trying to map all of the new file pages in, or maybe it's not lazy enough on file close. I can deliver good times, not as good as yours, but scaling falls down.
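For reference, the "set the file size, then map it in" approach looks roughly like this on the POSIX side. This is a sketch under the assumption of a Linux-like system with `ftruncate` and `mmap`; the Windows equivalent uses different APIs and is not shown. The function name `write_via_mmap` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: create `path` with exactly data.size() bytes and
// fill it through a shared memory mapping.
inline bool write_via_mmap(const char* path, const std::string& data) {
    assert(!data.empty());  // mmap rejects zero-length mappings
    const int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    // Set the final size up front; no bytes are written at this point.
    if (ftruncate(fd, static_cast<off_t>(data.size())) != 0) {
        close(fd);
        return false;
    }
    void* map = mmap(nullptr, data.size(), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return false;
    }
    // In a parallel solver, each thread would copy into its own disjoint
    // slice of the mapping instead of this single memcpy.
    std::memcpy(map, data.data(), data.size());
    const bool unmapped = munmap(map, data.size()) == 0;
    return close(fd) == 0 && unmapped;
}
```

Every page of the new file is touched for the first time during the copy, which is where the page-mapping cost discussed in this thread would show up.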

Quoting john_e_lilley
MMIO is the best I/O performance I can find under Windows. Works fantastic on input, but has issues on output. I'm not sure what it is; it's highly variable. Windows, like Linux, will let you create a file and set its size w/o writing all of the bytes, and then map it in, but I think something gets choked up in the OS trying to map all of the new file pages in, or maybe it's not lazy enough on file close. I can deliver good times, not as good as yours, but scaling falls down.

What do you mean when you say that MMIO performance on Windows is highly variable? Does the same input instance take different amounts of time to write to disk?

By highly variable I mean two things. First, that per-instance times are variable. Second, that multi-core scaling is erratic.

A set of n! 1s is pretty much the worst case I can think of. 10! produces 270 results, giving a file size of 1.85 GB from an input file of 7 MB. My serial code solves this in 17 sec on MTL. I'm sure there are significantly better times out there... :(

#include <cstdlib>
#include <fstream>
#include <iostream>

using namespace std;

// Writes one set of i! ones (terminated by a 0) for every i in [min, max].
int main(int argc, char** argv) {
	int min = 1;
	int max = 11;
	char* fileName = NULL;
	switch(argc) {
	case 2:
		fileName = argv[1];
		break;
	case 3:
		min = max = atoi(argv[1]);
		fileName = argv[2];
		break;
	case 4:
		min = atoi(argv[1]);
		max = atoi(argv[2]);
		fileName = argv[3];
		break;
	default:
		cerr << "SYNTAX: " << argv[0] <<" [number | min max] filename" << endl;
		return EXIT_FAILURE;
	}
	
	ofstream fout(fileName);
	// fact is an int, so max must stay <= 12 (13! overflows a 32-bit int).
	int fact = 1;
	for(int i = 1; i <= max; i++) {
		fact *= i;
		if(i >= min) {
			for(int j = 0; j < fact; j++) {
				fout << 1;
				// 20 values per line
				if((j+1) % 20)
					fout << " ";
				else
					fout << "\n";
			}
			// a 0 terminates the set
			fout << 0 << "\n";
		}
	}
	
	fout.close();
}

Does anyone have any times for just the I/O for the o20 case, serial or parallel? Since it seems I/O is pretty critical, I started there. And I'm coding in Java, which may bring its own overheads, so I wanted to see what kind of performance I can get. I'm getting 10 ms read and 20 ms write on my 2-year-old machine (4-core, SATA disk, running Vista). Is that in the ballpark of what others are getting for I/O alone?

I'm getting pretty good times on a Linux workstation with an i7-920 (similar in speed to your E5620). Significantly better than on Windows with the i7-2600. However, the MTL servers are so much slower than my workstations on both Windows and Linux that it is kind of baffling, except that the problem is I/O bound and I don't really have any idea about the performance profile of their SAN. I haven't tried the batch submission on Linux, but I doubt it will matter.

Quoting mdma
Does anyone have any times for just the I/O for the o20 case, serial or parallel? Since it seems I/O is pretty critical, I started there. And I'm coding in Java, which may bring its own overheads, so I wanted to see what kind of performance I can get. I'm getting 10 ms read and 20 ms write on my 2-year-old machine (4-core, SATA disk, running Vista). Is that in the ballpark of what others are getting for I/O alone?

Since I'm doing MMIO it's kind of hard to separate, but I'm finding that scanning, solving and formatting times are roughly equal, perhaps 15 ms each on four cores.
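To make the "scanning" phase concrete: with the input memory-mapped, scanning can be a plain loop over the bytes rather than `cin >> integer`. The sketch below is an assumption about what such a scanner might look like, not the code used by anyone here; `scan_uints` is a hypothetical name, and it assumes well-formed input of non-negative integers that fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: parse whitespace-separated non-negative decimal
// integers from the byte range [begin, end). No sign or overflow handling.
inline std::vector<std::uint32_t> scan_uints(const char* begin, const char* end) {
    assert(begin <= end);
    std::vector<std::uint32_t> out;
    const char* p = begin;
    while (p != end) {
        // Skip separators (anything that is not a decimal digit).
        while (p != end && (*p < '0' || *p > '9')) ++p;
        if (p == end) break;
        // Accumulate one run of digits into a value.
        std::uint32_t value = 0;
        while (p != end && *p >= '0' && *p <= '9') {
            value = value * 10 + static_cast<std::uint32_t>(*p - '0');
            ++p;
        }
        out.push_back(value);
    }
    return out;
}
```

To scan in parallel, the byte range can be split at separator boundaries so that each thread runs this loop on its own slice.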

Quoting mdma
Does anyone have any times for just the I/O for the o20 case, serial or parallel? Since it seems I/O is pretty critical, I started there. And I'm coding in Java, which may bring its own overheads, so I wanted to see what kind of performance I can get. I'm getting 10 ms read and 20 ms write on my 2-year-old machine (4-core, SATA disk, running Vista). Is that in the ballpark of what others are getting for I/O alone?

I've written my solution in C++ running on Windows XP and I get similar times: 12 ms to read the input file and 18 ms to write the solution.

Quoting john_e_lilley
MMIO is the best I/O performance I can find under Windows. Works fantastic on input, but has issues on output. I'm not sure what it is; it's highly variable. Windows, like Linux, will let you create a file and set its size w/o writing all of the bytes, and then map it in, but I think something gets choked up in the OS trying to map all of the new file pages in, or maybe it's not lazy enough on file close. I can deliver good times, not as good as yours, but scaling falls down.

I get the best time with 12 threads on MTL. Linux has the same problems; I believe the problem is the paging subsystem and/or the pool of cleared pages. The OS needs to map in tens of thousands of clear pages in basically tens of milliseconds.

It's a pity that the problem requires such a large amount of output...


Quoting nickbes

Quoting mdma
Does anyone have any times for just the I/O for the o20 case, serial or parallel? Since it seems I/O is pretty critical, I started there. And I'm coding in Java, which may bring its own overheads, so I wanted to see what kind of performance I can get. I'm getting 10 ms read and 20 ms write on my 2-year-old machine (4-core, SATA disk, running Vista). Is that in the ballpark of what others are getting for I/O alone?

I've written my solution in C++ running on Windows XP and I get similar times: 12 ms to read the input file and 18 ms to write the solution.

On 2xXeon E5620 (2 processors, 8 cores, 16 HT threads, 2.40GHz) the best numbers I can get are:

1 thread:

input + processing: 158ms

formatting + output: 72ms

16 threads:

input + processing: 20ms (8x speedup)

formatting + output: 8ms (9x speedup)


Quoting Dmitriy Vyukov

I get the best time with 12 threads on MTL. Linux has the same problems; I believe the problem is the paging subsystem and/or the pool of cleared pages. The OS needs to map in tens of thousands of clear pages in basically tens of milliseconds.

It's a pity that the problem requires such a large amount of output...

Yes, the output I/O is such a huge factor. FWIW I've tried MMIO with each thread copying to its own section, serial file I/O, and parallel file I/O (each thread opens separate file handle). Seems that the MMIO and serial I/O are about the same, but the parallel (many handles) I/O is the worst, on both Windows and Linux.

That makes sense. Intuitively, I'd expect parallel I/O to be the worst due to HDD performance for large files, with writes blocking while they wait for head seeks between different areas. But for the o20 case, the output is just 15 MB... I would have thought this would just write directly to cache. I guess it depends on whether the HDD drivers are configured for write-through or not.

Quoting Dmitriy Vyukov

Quoting nickbes

Quoting mdma
Does anyone have any times for just the I/O for the o20 case, serial or parallel? Since it seems I/O is pretty critical, I started there. And I'm coding in Java, which may bring its own overheads, so I wanted to see what kind of performance I can get. I'm getting 10 ms read and 20 ms write on my 2-year-old machine (4-core, SATA disk, running Vista). Is that in the ballpark of what others are getting for I/O alone?

I've written my solution in C++ running on Windows XP and I get similar times: 12 ms to read the input file and 18 ms to write the solution.

On 2xXeon E5620 (2 processors, 8 cores, 16 HT threads, 2.40GHz) the best numbers I can get are:

1 thread:

input + processing: 158ms

formatting + output: 72ms

16 threads:

input + processing: 20ms (8x speedup)

formatting + output: 8ms (9x speedup)

Hi Dmitriy, did you test your solution on MTL? Did you see the same behaviour? My experience is that the execution times obtained on MTL, using the Windows configuration, are not very fast compared to those obtained on my machine (an old Core 2), especially those obtained using only one thread. Might it be a mistake in my compiler settings? Has anyone else seen the same behaviour?

I've also found MTL to be considerably slower than my local test bed. For this problem, I think it is the I/O system.

o20
994.542 ms (1 thread)
563.053 ms (4 threads)
Q9000 @ 2.00 GHz

Thanks for sharing the test cases.
