| Last Modified On : | June 15, 2009 9:01 AM PDT |
Rate |
|
Dynamic Data Allocation Policy for Static Usage to optimize the Performance of Multi-Core Systems
Research Area: Performance Analysis of Multicore Systems
Authors: Mr. Sridutt, 6th Semester, Dept of Computer Science & Engineering & Mr. Shrisha , 6th Semester, Dept of Electronics & Communication.
Faculty mentor: Prof. Santosh .L. Deshpande, Dept. of Computer Science & Engineering.
Name of the Institution: S.D.M College of Engineering & Technology, Dharwad, Karnataka, India
Abstract: Speed of processors today has reached an inflection point amidst the advancements in the technologies of Multi-core Processors. The short fall is because of 3 major setbacks being latencies and pitfalls in the Bus Control Algorithms, Cache management Algorithms and the more cumbersome of all and also the most important the Replication Policies/Strategies.
A Replication Policy is strongly influenced by the number of Read (RO) hits and the Read-Write (RW) hits encountered by cache memories (L1 & L2) which service various parallel threads of a Multi-core Processor. In this paper the use of dynamic data replications for static usage is proposed.
Our model tries to decrease this latency by addressing the major setbacks in Replication Policies/Strategies. The Replication policy presented takes into consideration 3 major parameters: Percentage of L1 & L2 cache shares, replacement preferences among the caches and finally the optimal solution for the shared variable(RW) allocations.
The current cache replacement algorithms that are followed are like the LRU, MRU etc with an overall time complexity of more than n and so is the case with bus control algorithms. And that is why a policy to decide the cache management is used. A policy does not have a mathematical proof like the algorithms but decides based on the data and statistics available to the dispatcher (controller) and thus the worst case complexity is always linear and equal to n that leads to a considerable improvement in efficiency.
Background: The theoretical peak performance of a system (the product of floating-point operations per clock period, the clock rate, and the number of CPUs) provides only a rough indication as to what performance can be obtained on real-world scientific application programs.
To better exploit the reconfigurable hardware devices that are coming to market, a number of technologies are developed to handle billions of transistors available in these new chips. A key idea in these technologies is decoupling the communication from computation. This decoupling allows the IP cores (the computation part) and the inter-connect (the communication part) to be design separately.
Problem with shared memory concept is the communication, Communicating data needs to do a lot of computation before it pays off
•l Infiniband needs ≈10ms to set up communication.
•l 2GHz CPU needs 0.5ns to perform one floating point operation (Flop).
•l 20,000 floating point operations per communication setup. [Ref: 3]
The cash replacement problems generally encountered is hit time which can be decreased by using small and simple caches, way predication and trace caches. Apart from this there is the problem of miss penalty resolved using parallelism.
Compilation system that generates code for processor architectures supporting both explicit and implicit parallel threads do exist. Such architectures are small extensions of recently proposed speculative processors. They can extract parallelism speculatively from a sequential instruction stream (implicit threading) and they can execute explicit parallel code sections as a multiprocessor (explicit threading). Though there exists a problem with these that is implicit conversion of parallel code into serial. A simple example for such an instance is declaring a variable within a loop which causes multiples copies of it to be generated presuming it to be serial code though the loop itself is parallel.
int i,k;
for(i=0;i<10;i++)
{
int j=10; /*Multiple copies of j*/
k=i*j;
......
......
}
A study on exploration and comparison of the gross completion times for three
different schemes to analyze a batch of bitmapped image files showed an increase of 13.825 in efficiency on using batch processing. [Ref: 4]
Problem Statement: Dynamic Data Allocation Policy for Static Usage to optimize the Performance of Multi-Core Systems.[Performance Analysis of Multicore Systems]
Methodology: The use of policy and context based switching to improve the performance for the bus arbitration and cache management. This mainly focuses on following:
Example: The Interleaver Controller can ascertain the data type and can choose horizontal interleaving in case of array operations and vertical interleaving in case of other data structures.
Example : Consider this symbol table provided by the modified Compiler which gives the number of read hits or writes
|
Variable name |
Reads/writes |
Read hits |
Write hits |
|
X Y Z . . . . |
RRWRW RWWRW RRWRR ................... ................... ................... ................... |
3 2 4 . . . . |
2 3 1 . . . . |
SYMBOL TABLE
From the above statistical data the Dispatcher would send variable X to L2 cache since the read hits are more, whereas the variable Y will be sent to L1 cache in view of more write hits.
This would be a virtual method for scheduling multiprocessors and would help us in gaining the instruction level parallelism by deciding the location based on the number of reads and writes.
The simulation class diagram:
|
Data struct data { int x; bool flag; int process no; int read_hits; int write_hits; }s_data;
Basic function send2cache(int x);
Interleaver controller ptr_control( ); type_ascert( );
Background process buffer_merge( ); block_call( );
allocation_control ( s_data variable ); |
|
Data block int x;
Function block save(int x); send2dispatch(int x);
|
|
Read block int a; int b; int c; int d;
write block int x;
function block
send2dispatch(int x); save(int x);
|
data
block
L1 Cache
|
Data block int x;
Function block save(int x); send2dispatch(int x);
|
L1 Cache L2 Cache
Dispatcher
Function Descriptions:
Cache Functions:
Dispatcher Functions:
Basic function
Interleaver Controller functions
Background Processes
Statistical Controller Function
Key Results: The Possible summary of the 5 strategies showing impact on the cache performance
|
Technique |
Hit time |
bandwidth |
Miss penalty |
|
Way prediction caches |
+ |
+ |
NA |
|
Pipelined cache access |
- |
+ |
+ |
|
Early restart |
+ |
+ |
+ |
|
Merging write buffer |
+ |
+ |
+ |
|
Statistical method |
+ |
NA |
NA |
+ indicates that the technique improves the factor
- indicates that the technique hurts the factor
Discussion: Though it is seen that the hit rate factor is hurt in the pipelined access it helps a lot in improving the overall efficiency by decreasing the miss penalty and the other strategies do compensate the same sufficiently.
Scope for future work (if any): Designing of a Compiler capable of returning the number of read and write hits for each variable in the code.
Conclusion: The presented work can achieve considerable gain multiple times thus improving the overall performance by a huge factor. The use of interleaving policy than algorithm by using early knowledge through statistics helps to achieve instructional level parallelism with a worst complexity of n and the best being 1, in contrast to complexities which are greater than n in various algorithms presently being used.
References:
•1. Computer Architecture, A Quantitative Approach, Fourth Edition, Morgan Kaufmann Publishers, John .L. Hennessy & David A. Patterson, Pages: 293 - 301.
•2. Parallel Computers, Architecture and Programming, Prentice-Hall India 2000, V. Rajaraman & C. Siva Ram Murthy.
•3. Concepts of Parallel Computing :: Clustering and Shared Memory, Dr. Alf Wachsmann, Stanford Linear Accelerator Center (SLAC).
•4. A Comparison of Job Completion Times Using Three Different Batch Processing Implementations,Lee R. Clendenning
•5. Advanced Computer Architecture, Tata McGraw Hill Publications New Delhi 2005, Kai Hwang.
Acknowledgments: The authors thank the Principal & the college for their continuous support & help.

sridutt
| ||
shrisha.udupi
| ||
sldeshpande
|