C++

Bandwidth and Cache Affinity

For a sufficiently simple function Foo, the examples might not show good speedup when written as parallel loops. The cause could be insufficient system bandwidth between the processors and memory. In that case, you may have to rethink your algorithm to take better advantage of cache. Restructuring to better utilize the cache usually benefits the parallel program as well as the serial program.

concurrent_hash_map

A concurrent_hash_map<Key, T, HashCompare > is a hash table that permits concurrent accesses. The table is a map from a key to a type T. The traits type HashCompare defines how to hash a key and how to compare two keys.

Timing

When measuring the performance of parallel programs, it is usually wall clock time, not CPU time, that matters. The reason is that better parallelization typically increases aggregate CPU time by employing more CPUs. The goal of parallelizing a program is usually to make it run faster in real time.

Suscribirse a C++