Overhead compensation

What is overhead compensation?

Overhead compensation is a key benefit of NProfiler's modern profiling engine. It dramatically improves the accuracy and reliability of the displayed profiling results.

It removes the distortions caused by the profiler itself (the so-called profiler overhead) from the performance data, so you can see how your code would have performed without the profiler. If a profiler doesn't carefully remove its overhead, the performance data will often be completely misleading, and you will end up trying to optimize the wrong parts of your code.

Most .NET profilers ignore these distortions because estimating and removing them is quite complex. For example, some profilers we tested told us that a method takes 70% of the time, while in reality, it was closer to 20%. Unfortunately, it is not immediately obvious if the displayed execution times are incorrect. You will simply trust the profiler and waste time trying to optimize the wrong methods.

Luckily, with NProfiler you are on the safe side. It has excellent overhead compensation and can estimate and subtract even massively distorting profiler overhead to provide highly realistic performance data.

Try it yourself!

Let's have a look at the following C# code sample. It generates a list of 10,000,000 random numbers and sorts it. You can use it to test different profilers for correctness.

var random = new Random(0);
var list = new List<int>();

for (int i = 0; i < 10_000_000; i++)
{
	list.Add(random.Next());
}

list.Sort();

If we run the code without(!) a profiler and use the Stopwatch class to get real-world execution times, we learn that the list creation takes about 20% of the time, while the sort call takes 80% of the time.
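
The Stopwatch measurement mentioned above can be set up like the sketch below. The exact times and percentages depend on your machine, runtime version, and build configuration, so treat the 20%/80% split as approximate:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // Measures list creation and sorting separately and returns the
    // elapsed milliseconds for each phase.
    public static (long createMs, long sortMs) Measure()
    {
        var random = new Random(0);
        var list = new List<int>();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10_000_000; i++)
        {
            list.Add(random.Next());
        }
        sw.Stop();
        long createMs = sw.ElapsedMilliseconds;

        sw.Restart();
        list.Sort();
        sw.Stop();
        long sortMs = sw.ElapsedMilliseconds;

        return (createMs, sortMs);
    }

    static void Main()
    {
        var (createMs, sortMs) = Measure();
        double total = createMs + sortMs;
        Console.WriteLine($"Create: {createMs} ms ({100 * createMs / total:F0}%)");
        Console.WriteLine($"Sort:   {sortMs} ms ({100 * sortMs / total:F0}%)");
    }
}
```

Remember to run this in a Release build without a debugger attached; otherwise the numbers will not reflect real-world performance.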

When we profile the code with the tracing/instrumentation modes of various commercial and free profilers, we often get completely incorrect times. Even expensive profilers want to make us believe that the list creation takes most of the time (in reality 20%) while sorting takes almost no time (in reality 80%).

The size of the error depends on the profiling settings. If you choose a lighter/less detailed profiling mode, such as sampling, less overhead is added and the measurement errors will be smaller. If you choose a more detailed profiling mode that uses instrumentation, the errors will be severe for most profilers.

We encourage you to download trial versions of different profilers and test them with our code sample. For your daily work, we recommend that you stick to sampling if you want to use a profiler other than NProfiler.

Why do profilers show incorrect execution times?

Code can be profiled in two different ways, either by sampling or by tracing/instrumentation. NProfiler supports both.

Sampling

Sampling means that the profiler stops all threads several hundred times per second and takes snapshots of the call stacks. Based on these samples, the profiler then estimates the time spent on different methods and lines of code.
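
The estimation step can be illustrated with hypothetical numbers. A sampling profiler only counts how often a method appears in its periodic call-stack snapshots and converts those counts into estimated times:

```csharp
using System;

class SamplingEstimate
{
    static void Main()
    {
        // All numbers below are hypothetical, for illustration only.
        int samplesPerSecond = 1000;  // assumed snapshot frequency
        int totalSamples     = 5000;  // a 5-second profiling run
        int samplesInSort    = 4000;  // stacks that contained List<int>.Sort

        // Estimated share of total time and estimated absolute time:
        double share = (double)samplesInSort / totalSamples;        // 0.8
        double estimatedSeconds = (double)samplesInSort / samplesPerSecond; // 4.0

        Console.WriteLine($"Sort: ~{share:P0} of total time, ~{estimatedSeconds} s");
    }
}
```

Because these are statistical estimates, methods that run very briefly or very rarely may be missed entirely between snapshots.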

Sampling is very fast and doesn't introduce much profiler overhead. Therefore, overhead compensation is less critical for sampling profilers, and we won't discuss it here.

The downside of sampling is that it's less detailed. For example, you cannot collect method and line-level hit counts or performance data for JIT compilation, module loading, and so on. Because of these limitations, profilers also support tracing/instrumentation, which is more detailed.

Tracing/instrumentation

Tracing/instrumenting profilers inject instructions at entry and exit points of methods and between lines of code. The injected code performs time measurements and does bookkeeping. It allows the profiler to collect "real" times instead of statistical estimates, as well as hit counts and event-based performance data for JIT compilation, module loading, etc.
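
Conceptually, an instrumented method looks roughly like the sketch below. The ProfilerRuntime class and its MethodEnter/MethodExit calls are hypothetical stand-ins; real profilers inject IL at JIT time rather than editing C# source:

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the profiler's injected bookkeeping code.
static class ProfilerRuntime
{
    [ThreadStatic] private static long _enterTimestamp;

    public static int HitCount;     // method-level hit count
    public static long TotalTicks;  // accumulated "real" time in ticks

    public static void MethodEnter()
    {
        HitCount++;                                  // bookkeeping
        _enterTimestamp = Stopwatch.GetTimestamp();  // time measurement
    }

    public static void MethodExit()
    {
        TotalTicks += Stopwatch.GetTimestamp() - _enterTimestamp;
    }
}

class Example
{
    // What a method conceptually looks like after instrumentation:
    public static int Add(int a, int b)
    {
        ProfilerRuntime.MethodEnter();  // injected at the entry point
        int result = a + b;             // the original method body
        ProfilerRuntime.MethodExit();   // injected at the exit point
        return result;
    }
}
```

A real profiler would also wrap the body in try/finally so that exits via exceptions are recorded, and would try to subtract the cost of the injected calls themselves from the results. That subtraction is exactly the overhead compensation this article describes.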

The drawback of instrumentation is that the injected code slows down the profiled application and introduces a lot of profiler overhead. Therefore, overhead compensation is crucial for tracing/instrumenting profilers.

Let's assume that an instrumenting profiler has to inject ten lines of additional code for every profiled line of code to collect line-level data. As a result, the profiled application runs ten times slower, and if the profiler does not subtract the overhead from the results, the measured times are inflated roughly tenfold (an error of about 900%).

It's even worse than that. To improve performance, profilers often instrument only the developer's code and leave the .NET framework methods untouched, depending on the chosen settings. The profiled application would become crawlingly slow if all .NET framework methods were instrumented, and line-level data for framework methods is rarely needed anyway.

Unfortunately, this "optimization" gives us unevenly distributed profiler overhead. Due to instrumentation, our custom code runs ten times slower, while non-instrumented .NET framework methods (like the sort call in the example above) run at full speed. The displayed performance data will reflect these massive distortions if a profiler does not compensate for them - and most profilers don't. That's why .NET profilers show incorrect data.
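
With hypothetical numbers matching the 20%/80% split of the code sample above, the distortion is easy to reproduce:

```csharp
using System;

class OverheadDistortion
{
    static void Main()
    {
        // Hypothetical real-world times for the sample above:
        double createReal = 2.0; // seconds (20% of the total)
        double sortReal   = 8.0; // seconds (80% of the total)

        // Only the custom code (list creation) is instrumented and runs
        // ten times slower; the framework sort runs at full speed.
        double createMeasured = createReal * 10; // 20 s
        double sortMeasured   = sortReal;        //  8 s

        double total = createMeasured + sortMeasured; // 28 s
        Console.WriteLine($"Create appears to take {100 * createMeasured / total:F0}%"); // 71%
        Console.WriteLine($"Sort appears to take {100 * sortMeasured / total:F0}%");     // 29%
    }
}
```

Without compensation, the 20%/80% reality is reported as roughly 70%/30%, which matches the incorrect results we observed with other profilers.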

Fortunately, NProfiler is quite good at estimating and subtracting the execution time of the injected code. On average, its performance data should be much more accurate than that of other .NET profilers.

That said, in some cases, NProfiler's overhead compensation can still get it wrong. Because of the superscalar architecture of modern CPUs, compensating for profiler overhead with perfect accuracy is extremely difficult. There may be cases in which NProfiler subtracts too little or too much overhead. Please let us know if you encounter such a case so we can investigate it.

As a best practice, we recommend profiling only the namespaces you are interested in. This prevents massive slowdowns of the profiled application and reduces the amount of profiler overhead, making overhead compensation more robust. You can do this by setting "Profiled Methods" to "Custom" when configuring the profiling session.

Limitations

There is one case in which it is impossible to subtract the profiler overhead: synchronization.

NProfiler allows you to switch between CPU time and blocking time. CPU time is the amount of time your code is running on the CPU. Blocking time is the opposite. It's the amount of time your code is NOT running on the CPU because it's sleeping or has to wait for another thread (synchronization) or I/O.
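
The distinction can be illustrated with a small sketch. Thread.Sleep produces blocking time (the thread is off the CPU), while the busy loop produces CPU time:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class CpuVsBlocking
{
    // Returns the elapsed wall-clock milliseconds.
    public static long Run()
    {
        var wall = Stopwatch.StartNew();

        // Blocking time: the thread is off the CPU while it sleeps.
        Thread.Sleep(200);

        // CPU time: the thread is busy computing on the CPU.
        long sum = 0;
        for (int i = 0; i < 50_000_000; i++) sum += i;

        wall.Stop();

        // Wall-clock time = CPU time + blocking time. A "blocking time"
        // view attributes the Sleep call; a "CPU time" view attributes
        // the loop.
        Console.WriteLine($"Wall time: {wall.ElapsedMilliseconds} ms (sum = {sum})");
        return wall.ElapsedMilliseconds;
    }

    static void Main() => Run();
}
```

Waiting on a lock, an event, or an I/O operation counts as blocking time in the same way as the Sleep call here.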

If thread A has to wait for thread B, the blocking time of thread A will include the profiler overhead of thread B, because it is impossible to track those inter-thread dependencies. NProfiler correctly subtracts thread B's profiler overhead from B's own performance data but cannot remove it from thread A's blocking time. Therefore, if you are mainly interested in blocking time, we recommend using sampling instead of tracing.