Last Modified On: June 30, 2008 2:58 PM PDT
by Juan A Rodriguez, Intel Corporation, and
Simonijt Dutta, Intel Corporation
Since most day-to-day operations are moving online (core banking, reservations, shopping), software performance has become vital to their success. All too often a web site takes a long time to load, and the frustrated visitor moves to a competing site. For a business this can be fatal, because it loses customers. Web sites often slow down or even go down when traffic increases. Performance and stress testing of your application can help avoid such downtime. There are tools that tell you that your application performs badly, but not necessarily why. Knowing what to look for, and what is good and what is bad, will put your application in better shape later.
With the Microsoft® .NET Framework, developers can build complete, robust business solutions quickly by using its rich and easy-to-use features. But this also gives architects and developers more opportunity to design and build poor, non-scalable solutions, because architecting and designing these solutions is not straightforward. This paper covers the core performance-related issues that one should be aware of in .NET. It also covers some common mistakes to avoid and gives many tips for writing high-performance .NET code.
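This paper will discuss the .NET Framework and the CLR execution model, threading, garbage collection (GC), general coding tips (including ASP.NET and ADO.NET), and performance tools.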
The .NET Framework provides a run-time environment called the CLR, which manages the execution of code and provides services that make the development process easier. The CLR provides features such as automatic memory management (GC), exception handling, security, type safety, JIT (the just-in-time compiler that converts MSIL to native code), and more. The CLR is implemented as a DLL called “mscorwks.dll”. On top of the CLR sit the Base Class Libraries (BCL), which provide functionality such as strings, file I/O, networking, collection classes, data access (ADO.NET), and XML processing. On top of the BCL there are presentation layers (Web Forms and Windows Forms), which provide UI functionality. Last, one finds the languages that Microsoft® provides for .NET. Currently more than 15 different languages target the .NET Framework.
CLR Execution Model:
Each language has a compiler that converts the source code to MSIL (Microsoft® Intermediate Language). Each of these compilers has multiple built-in optimizations that produce efficient IL code. The CLR then takes over and has the JIT compiler convert this IL code into native code that can be executed. The JIT compiler also has many built-in optimizations that produce efficient native code for better performance. If the code is unmanaged, then we bypass most of this and can run the unmanaged program directly. Note that .NET also provides a feature called “unsafe”, which lets us use pointers to access arrays and so on for better performance.
Threading support in .NET is implemented in the System.Threading namespace. It provides the classes and functions needed to write multithreaded code, such as creating and destroying threads and synchronization primitives for atomic access. This namespace also provides a class called “ThreadPool” that lets us use a pool of system-provided threads.
The ThreadPool handles thread creation and cleanup. It recycles threads to minimize the overhead of creating and cleaning them up. The ThreadPool is also aware of other running threads, such as GC threads, so it can adjust its thread creation logic. The number of threads used is critical to performance, yet a developer may not know what that number should be; the ThreadPool has built-in heuristics that let it adjust the number of threads. It is recommended to use the ThreadPool when you are thinking about threading your application. ASP.NET already uses the ThreadPool for processing web requests.
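As a rough illustration of the point above, the following sketch queues work items on the thread pool instead of creating threads by hand. The class name ThreadPoolSketch and the SumOfSquares helper are made up for this example; only ThreadPool.QueueUserWorkItem, Interlocked, and ManualResetEvent come from the framework.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSketch
{
    // Queues 'count' work items on the CLR thread pool and returns the sum
    // of the squares 1*1 + 2*2 + ... once all of them have finished.
    public static int SumOfSquares(int count)
    {
        if (count <= 0) return 0;

        int sum = 0;
        int pending = count;
        ManualResetEvent done = new ManualResetEvent(false);

        for (int i = 1; i <= count; i++)
        {
            // No thread is created here; the pool picks a thread to run the item.
            ThreadPool.QueueUserWorkItem(delegate(object state)
            {
                int n = (int)state;
                Interlocked.Add(ref sum, n * n);

                // The last work item to finish wakes up the waiting caller.
                if (Interlocked.Decrement(ref pending) == 0) done.Set();
            }, i);
        }

        done.WaitOne();
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(10));
    }
}
```

The caller never chooses a thread count; that decision is left to the pool's own heuristics.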
As mentioned earlier, the ThreadPool automatically decides how many threads are needed for optimal performance. For ASP.NET (web) applications, you can tune this through the machine.config file to reduce contention. Tune using this method when the following conditions are true (2)
<system.web>
  <!-- minFreeThreads: requests are queued if the total number of available threads falls below this number -->
  <!-- minLocalRequestFreeThreads: requests from the local host are queued if the total number of available threads falls below this number -->
  <httpRuntime minFreeThreads="32" minLocalRequestFreeThreads="32" />
  <!-- maxWorkerThreads: maximum number of worker threads in the thread pool (per CPU) -->
  <!-- maxIoThreads: maximum number of I/O threads in the thread pool (per CPU) -->
  <!-- minWorkerThreads: minimum number of worker threads available at any time (for the entire system) -->
  <processModel enable="true" maxWorkerThreads="12" maxIoThreads="12" minWorkerThreads="40" />
</system.web>
Note: These values are not recommended values but just used for illustration purposes.
So, how does the formula work?
Number of worker threads = (maxWorkerThreads * number of CPUs (cores) in the system) - minFreeThreads
With the values above on a 4-core machine, 12 * 4 - 32 = 16, so the total number of concurrent requests you can process is 16. But how do you know that this actually worked? Look at the “Pipeline Instance Count” performance counter; it should be equal to 16. Only one worker thread can run in a pipeline instance, so you should see a value of 16.
You have to be very careful when doing this, as performance may degrade if you use arbitrary values.
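The arithmetic can be written down as a small helper, sketched below. The names RequestCapacity and ConcurrentRequests are hypothetical; the method only evaluates the formula from the text and does not read machine.config.

```csharp
using System;

public static class RequestCapacity
{
    // Applies the formula from the text:
    // concurrent requests = maxWorkerThreads * CPU count - minFreeThreads.
    public static int ConcurrentRequests(int maxWorkerThreads, int cpuCount, int minFreeThreads)
    {
        return maxWorkerThreads * cpuCount - minFreeThreads;
    }

    public static void Main()
    {
        // Illustrative values from the configuration above on a 4-core machine.
        Console.WriteLine(ConcurrentRequests(12, 4, 32)); // prints 16

        // The same settings evaluated for the machine this code runs on.
        Console.WriteLine(ConcurrentRequests(12, Environment.ProcessorCount, 32));
    }
}
```

On a machine with fewer cores the same settings can yield zero or a negative number, which is one reason arbitrary values are risky.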
.NET threading APIs and the ThreadPool make a developer’s life easier, but there are still many threading-related issues that can hurt performance and scalability.
void foo()
{
    int a, b;
    // …. some code
    // Start of atomic region: the following code has to be executed atomically
    {
    }
    // End of atomic region
    // …. some other code
}
//WRONG: Increased atomic region. The lock is held longer, which hurts performance
void foo()
{
    int a, b;
    Object obj = new Object(); // for synchronization; in real code it must be shared by all threads
    lock (obj) { // or Monitor.Enter(obj)
        // …. some code
        // Start of atomic region: the following code has to be executed atomically
        {
        }
        // End of atomic region
        // …. some other code
    } // or Monitor.Exit(obj)
}
//WRONG: Entire function is synchronized. Bad idea.
using System.Runtime.CompilerServices;
[MethodImpl(MethodImplOptions.Synchronized)]
void foo()
{
    int a, b;
    // …. some code
    // Start of atomic region: the following code has to be executed atomically
    {
    }
    // End of atomic region
    // …. some other code
}
//Correct: Synchronizing just the block that needs atomic execution
void foo()
{
    int a, b;
    Object obj = new Object(); // for synchronization; in real code it must be shared by all threads
    // …. some code
    lock (obj) {
        // Start of atomic region: the following code has to be executed atomically
        {
        }
        // End of atomic region
    } // end of lock
    // …. some other code
}
Use proper synchronization primitives: The .NET Framework provides multiple synchronization primitives. They range from primitives with few features (very fast) to primitives with many features (very slow). It is important to choose the right one to get optimal performance. Synchronization primitives can be defined as:
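As one illustration of this trade-off, the sketch below increments a shared counter in two ways: with Interlocked.Increment, a lightweight primitive that does exactly one thing, and with a lock block, which is more general but heavier. The class name CounterSketch and its members are invented for this example, and the choice of these two primitives is an assumption, not the paper's own list.

```csharp
using System;
using System.Threading;

public static class CounterSketch
{
    private static int interlockedCount;
    private static int lockedCount;
    private static readonly object sync = new object();

    public static int InterlockedCount { get { return interlockedCount; } }
    public static int LockedCount { get { return lockedCount; } }

    // Starts 'threads' threads; each one bumps both counters 'perThread' times.
    public static void Run(int threads, int perThread)
    {
        interlockedCount = 0;
        lockedCount = 0;

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(delegate()
            {
                for (int i = 0; i < perThread; i++)
                {
                    // Lightweight: a single atomic increment, no lock object.
                    Interlocked.Increment(ref interlockedCount);

                    // General purpose: a Monitor-based lock around the update.
                    lock (sync) { lockedCount++; }
                }
            });
            workers[t].Start();
        }

        foreach (Thread worker in workers) worker.Join();
    }

    public static void Main()
    {
        Run(4, 100000);
        Console.WriteLine("{0} {1}", InterlockedCount, LockedCount);
    }
}
```

Both counters end up correct; the difference lies in how much machinery each update pays for, which is why a simple counter rarely needs a full lock.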
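//Wrong: locking on "this"
lock (this) {
    // do something
}
//Correct: locking on a private object
public class foo {
    Object sync_obj = new Object();
    void bar() { // hypothetical method, added so the lock sits inside a method body
        lock (sync_obj) {
            // do something
        }
    }
}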
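//Wrong: locking on the type object
lock (typeof(foo))
{
    // do something
}
//Correct: locking on a private static object
public class foo {
    private static Object sync_obj = new Object();
    static void bar() { // hypothetical method, added so the lock sits inside a method body
        lock (sync_obj) {
            // do something
        }
    }
}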
//Thread1
lock (obj_A) {
    lock (obj_B) {
        // do something
    }
}
//Thread2: takes the same two locks in the opposite order
lock (obj_B) {
    lock (obj_A) {
        // do something
    }
}
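The two threads above take obj_A and obj_B in opposite orders, so each can end up holding one lock while waiting for the other. A common way around this is to make every thread acquire the locks in the same order. The sketch below assumes a hypothetical Account class with a unique Id and always locks the account with the smaller Id first; none of these names come from the paper.

```csharp
using System;
using System.Threading;

public class Account
{
    public readonly int Id;          // assumed unique per account
    public int Balance;
    public readonly object Sync = new object();

    public Account(int id, int balance) { Id = id; Balance = balance; }
}

public static class LockOrderingSketch
{
    // Always locks the account with the smaller Id first, so two opposite
    // transfers cannot each hold one lock while waiting for the other.
    public static void Transfer(Account from, Account to, int amount)
    {
        Account first = from.Id < to.Id ? from : to;
        Account second = from.Id < to.Id ? to : from;

        lock (first.Sync)
        {
            lock (second.Sync)
            {
                from.Balance -= amount;
                to.Balance += amount;
            }
        }
    }

    // Runs transfers in both directions at once and returns the combined balance.
    public static int RunOppositeTransfers(int iterations)
    {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);

        Thread t1 = new Thread(delegate() { for (int i = 0; i < iterations; i++) Transfer(a, b, 1); });
        Thread t2 = new Thread(delegate() { for (int i = 0; i < iterations; i++) Transfer(b, a, 1); });

        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();

        return a.Balance + b.Balance;
    }

    public static void Main()
    {
        Console.WriteLine(RunOppositeTransfers(100000));
    }
}
```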
using System.Collections;
ArrayList myAr = new ArrayList();
ArrayList mySyncAr = ArrayList.Synchronized(myAr); // thread-safe wrapper; use mySyncAr from all threads
Automatic memory management, also known as garbage collection (GC), is one of the most important features provided by the .NET Framework. GC manages the allocation and reclaiming of memory in your application. Whenever you call “new” to create an object, GC allocates memory from the managed heap as long as space is available. Once it runs out of space, it triggers a collection and reclaims memory so that it can start allocating again. We will go into some detail about GC algorithms, how they work, the different GC flavors, and how you can write GC-friendly code.
The .NET GC is a generational, mark-and-compact collector with three generations (Gen0, Gen1, and Gen2). It assumes that most of the objects you create die young, so it can collect only a part of the managed heap, which is much faster than collecting the entire managed heap. GC first marks the objects reachable from the roots (to find out which are alive) and then compacts the heap, moving all live objects to a part of the heap that forms the older generation(s). Allocations always happen in the Gen0 heap. The initial Gen0 size is some fraction of the last-level cache; the idea is to have Gen0 fit in the cache to avoid cache misses.
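To see generations in action, the sketch below asks the runtime which generation an object lives in before and after a collection. GenerationSketch and Observe are names made up for this example, and the collection is forced with GC.Collect purely for demonstration. The exact numbers can vary with the runtime and GC settings, so no output is promised here.

```csharp
using System;

public static class GenerationSketch
{
    // Returns the generation of a fresh object before and after one collection.
    public static int[] Observe()
    {
        object survivor = new object();
        int before = GC.GetGeneration(survivor);

        GC.Collect(); // forced only to make the promotion observable

        int after = GC.GetGeneration(survivor);
        GC.KeepAlive(survivor); // keeps the object reachable across the collection
        return new int[] { before, after };
    }

    public static void Main()
    {
        int[] generations = Observe();
        Console.WriteLine("before: Gen{0}, after: Gen{1}, max: Gen{2}",
            generations[0], generations[1], GC.MaxGeneration);
    }
}
```

An object that survives a collection is typically reported in an older generation afterwards, which is the compaction into older generations described above.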
.NET GC Flavors:
Note: Selecting the appropriate GC flavor is essential for optimal performance of your application.
Workstation (WKS) GC: WKS GC has two variants. Concurrent GC is on by default and can be turned off. With concurrent GC on, pause times are shorter, which improves UI responsiveness; GC stops the application threads only for a short duration and only when absolutely necessary. If you have a throughput-oriented application (a non-UI console application, for example), then turning off concurrent GC might get you better performance. To do so, add the following to your application configuration file (for example, foo.exe.config) [2]
<configuration>
<runtime>
<gcConcurrent enabled="false"/>
</runtime>
</configuration>
WKS GC has one heap and one GC thread per process. WKS GC is the default for any non-ASP.NET application, even on multiprocessor systems. ASP.NET automatically chooses SVR GC if you are on a multiprocessor system.
Server (SVR) GC: As the name suggests, SVR GC is optimized for server-based applications (better scalability). It has one GC heap per processor and one GC thread per GC heap. For example, on a 4-processor system you will have 4 heaps and 4 GC threads, one operating on each heap. A process can create objects in multiple heaps (to load-balance allocations across heaps). As mentioned above, SVR GC is not the default. To enable Server GC, add the following to the application configuration file [2]
<configuration>
<runtime>
<gcServer enabled="true"/>
</runtime>
</configuration>
<configuration>
<runtime>
<gcServer enabled="false"/>
<gcConcurrent enabled="false"/>
</runtime>
</configuration>
Note: When you ask for Server GC on a uniprocessor (UP) machine, you get WKS GC with concurrent GC off. The CLR assumes that, since you asked for SVR GC, you are more interested in throughput than in UI responsiveness, so it automatically turns off concurrent GC.
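One way to confirm which flavor a process actually received is to ask the runtime, as in the sketch below. GCSettings.IsServerGC and GCSettings.LatencyMode are framework properties in System.Runtime; the GcFlavorSketch class and its Describe method are invented for this example.

```csharp
using System;
using System.Runtime;

public static class GcFlavorSketch
{
    // Reports the GC flavor and latency mode the current process is running with.
    public static string Describe()
    {
        string flavor = GCSettings.IsServerGC ? "Server (SVR) GC" : "Workstation (WKS) GC";
        return flavor + ", latency mode: " + GCSettings.LatencyMode;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Running this after changing the configuration file is a quick check that the setting took effect, for instance on a uniprocessor machine where a Server GC request falls back to WKS GC.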
using System;
class Program
{
    static void Main(String[] args) {
        // Collect up to generation 2, letting the GC decide whether now is a good time
        GC.Collect(2, GCCollectionMode.Optimized);
    }
}
We have covered threading and GC. We now cover general VM and code generation tips, along with basic ASP.NET and ADO.NET tips for writing better code.
int i = 123;
object o = i; // implicit boxing (IL box instruction)
int j = (int)o; // explicit unboxing (IL unbox instruction)
Whenever we box, a new object is created on the managed heap and the value is copied into it. If we do this frequently, we create a lot of objects (which affects GC) and also execute extra code for boxing and unboxing.
Foo myFoo = new Foo();
myArrayList.Add(myFoo);
Foo sameFoo = (Foo) myArrayList[i]; // run-time type check (IL castclass instruction)
The non-generic collection classes take a plain “object” as a parameter, so a type cast is required when retrieving objects of your type from them. This requires an expensive run-time type check that looks at the method table of the object. If your type is derived from another type, the check may have to walk up the inheritance hierarchy, which is expensive as well. You can avoid this by using generics (similar to C++ templates), as shown below; no run-time type check is needed because the type is known at compile time.
List<Foo> myList = new List<Foo>();
Foo myfoo = myList[i]; // no run-time type check required
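The boxing and casting costs described above can be put side by side in one small sketch. CollectionSketch and its two methods are hypothetical names; both methods compute the same sum, but only the ArrayList version boxes every int on the way in and unboxes it on the way out.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CollectionSketch
{
    // Non-generic collection: elements are stored as object.
    public static int SumWithArrayList(int count)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < count; i++) list.Add(i);   // each int is boxed

        int sum = 0;
        foreach (object o in list) sum += (int)o;      // each element is unboxed
        return sum;
    }

    // Generic collection: elements are stored as int.
    public static int SumWithGenericList(int count)
    {
        List<int> list = new List<int>();
        for (int i = 0; i < count; i++) list.Add(i);   // no boxing

        int sum = 0;
        foreach (int value in list) sum += value;      // no cast, no unboxing
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithArrayList(100));
        Console.WriteLine(SumWithGenericList(100));
    }
}
```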
//Wrong: using an exception to handle an expected condition
int foo (int parameter)
{
    int ret = 0;
    val = …. ;
    try
    {
        ret = val / parameter;
    }
    catch (DivideByZeroException) { return ERROR_VAL; }
    return ret;
}
//Correct: checking for the condition up front
int foo (int parameter)
{
    if (parameter == 0) return ERROR_VAL; else {… ;}
}
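A runnable version of the same comparison might look like the sketch below. DivideSketch, DivideChecked, DivideWithCatch, and the ErrorValue constant (standing in for ERROR_VAL, whose real value the paper does not give) are all assumptions made for illustration.

```csharp
using System;

public static class DivideSketch
{
    // Hypothetical stand-in for ERROR_VAL in the snippet above.
    public const int ErrorValue = -1;

    // Checks the divisor up front, so the expected zero case never throws.
    public static int DivideChecked(int value, int divisor)
    {
        if (divisor == 0) return ErrorValue;
        return value / divisor;
    }

    // Produces the same result, but pays for a throw and a catch on every zero divisor.
    public static int DivideWithCatch(int value, int divisor)
    {
        try
        {
            return value / divisor;
        }
        catch (DivideByZeroException)
        {
            return ErrorValue;
        }
    }

    public static void Main()
    {
        Console.WriteLine(DivideChecked(10, 2));
        Console.WriteLine(DivideChecked(10, 0));
        Console.WriteLine(DivideWithCatch(10, 0));
    }
}
```

Both methods return the same values; the checked version simply avoids the cost of exception handling on a condition that is cheap to test.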
Ngen: Ngen.exe (shipped with the CLR) invokes the JIT compiler on MSIL to create native code and stores it on disk. Once the native image is created, the runtime uses it automatically each time it runs the assembly. Using a native image eliminates on-the-fly JIT compilation at run time, which reduces application startup time.
Ngen.exe can help improve application performance by:
Interop: When you build applications in managed code, it is sometimes necessary to call unmanaged libraries, such as a COM component. In some cases you may also want to use unmanaged code for performance reasons (for example, to call highly optimized third-party libraries). The CLR provides several ways to do this.
Improving Interop performance: (1)
Improving ASP.NET Performance:
Improving ADO.NET performance:
So far we have seen tips, tricks, and best known methods (BKMs) for writing high-performance .NET code. What follows is a brief list of performance tools that are available for tuning .NET code. This paper will not describe them in detail.
Perfmon: A system-level tool. It exposes several CLR and ASP.NET counters and should be the first tool used to analyze any .NET application. I will go into detail on the available counters, along with some tips, in later posts.
Intel® VTune™ Analyzer: A profiling tool from Intel that supports .NET, including ASP.NET applications.
CLR Profiler: A tool from Microsoft used to profile the memory allocation of your application. It is free and can be downloaded from MSDN.
SOS: A managed debugging extension from Microsoft. It is free and ships as SOS.dll with the CLR. It exposes many CLR internal data structures, such as GC, exceptions, objects, and locks. It can be used to identify functional bugs (such as OutOfMemoryException) as well as performance-related bugs (such as locking issues).
VSTS Profiler: A built-in profiler in Microsoft® Visual Studio Team System 2008. It can sample an application and identify hotspots, hot call chains, and so on.
VSTS: The Microsoft® Visual Studio Team System (for testers) has a built-in ability to do performance load testing of n-tier web-based applications. It is very simple to use, includes a recording facility for URLs, and can look at the perfmon counters of all the machines from a client system.
In this new Internet era, application performance is essential to being successful and staying ahead of the competition. Including performance engineering throughout the software development life cycle (SDLC) is essential to meeting or exceeding performance goals. Performance engineering should be proactive and not reactive (for example, waiting until a customer complains of a problem). This paper outlines information, tips, and BKMs for improving performance and for spotting potential issues in threading and other areas if you are developing your application with the Microsoft® .NET Framework SDK.
1) Improving .NET Application Performance and Scalability
2) http://blogs.msdn.com/maoni
April 16, 2009 8:51 AM PDT
Milind Hanchinmani (Intel)
Yes. Since % Time in GC is very low, you do not need to worry about the rest. The GC activity is very low. Look somewhere else.
