Simple, efficient code profiling technique for unmanaged C++ (part 1 of 2)

Hello everyone, I am an SDE at IP-Tech working on C/C++ development projects. It has been a hot summer for me, moving between different professional experiences, before finally finding some peace at IP-Tech, where I even have the time to blog about technical programming issues.

For my first post, I considered several topics, but I will start with the one I am currently involved with at IP-Tech: code profiling.

We are currently implementing a document management system as part of our nearshore projects with Business Document. Apart from the ongoing multimodal, collaborative UI front ends (web interfaces, desktop ones, etc.) for interactive document production (editing, describing, creating, and tracking), there is a plan to enable automated, high-volume batch document creation via a silent (non-UI), time-critical process.
In such a case, from a developer's point of view, code profiling is of great importance. Profiling tools and techniques also occupy a large area of the software industry's knowledge base, especially where time performance is a big deal. There is a wide range of possibilities a developer can start with, and may limit himself to, including manual instrumentation tricks, IDEs' built-in profilers, and standalone profilers like Intel's VTune.

However, what prompts me to write about this topic is the dissatisfaction I have felt each time I used a new profiler, profiling class, or IDE add-in.
This article is, consequently, part of an effort to provide a simple yet effective approach to the problem of code profiling, something I will try to concretize by developing a tool targeting C/C++ users.

Let us first introduce the concept of code profiling, along with some of the key terms related to it. The objective of code profiling is to gain knowledge about a program's execution behavior: how long execution lasts in a certain region R, how many calls there are to a certain function F, and what paths or regions were covered by the test run.
Code profiling therefore comprises the techniques for measuring those values and efficiently reporting them to a user interface where the developer can analyze and interpret the data. Many techniques are available, but all of them use one of the following two methods:

  • Instrumentation: measurement code is added to the target application's code, either manually or automatically via run-time code injection (invisible to the developer).
  • Sampling: the application is run under conditions that enable a separate process to pause its execution at regular intervals in order to collect key data values.

Of course, the first time I bothered with code profiling was the first time I hit a serious application performance issue and was compelled to solve it. Needing to understand my complex program's behavior, I turned to learning and using some of the many available automatic tools. The similarity between that first case and the one I am about to face at IP-Tech is among the reasons behind this article. That first case was a hand-coded PGN parser for a chess game that ran very well for a single game: once you load a game item, parsing takes about 3 or 4 seconds, during which it continuously reports the parsed moves to the GUI, where they are rendered progressively, in a way that makes you forget about the duration.

Psychology worked well! I was satisfied with that solution until the day I decided to reuse the same code to enable a chess position search feature over large databases: parsing is unavoidable if I want to convert the human-readable PGN notation into a powerful data structure on which operations can easily be applied. A full test run was out of the question, of course: an average 8,000-game database would consume 8 hours.

The same goes for batch document production. Or, since you may not share my particular worries, let's say you have a PCI-based missile launcher, a DERM system for interactive target selection, and a PCI bus driver that receives the positions and the launch signal at the application level, then forwards them to the launcher. Everything is fine as long as there is one missile to launch per user command: upon a user click, you start translating positions into driver-level signals and, while every button on the UI is disabled, you rotate the globe the way Google Earth does, distracting your user while the launcher positions itself. You can even fool the user with a progress bar or a flashing text before the missile is actually fired. This works fine until the day you are asked to embed your system in a military ship, where the radar's flow of target information replaces the simple user click event, and the simultaneous targeting of multiple aircraft replaces the mere expectation of firing a single long-range missile.

Code profiling matters because, in such cases, we are interested in saving the mere milliseconds that would otherwise add up to intolerable minutes. We first want to isolate the I/O-bound operations and seek optimization for the CPU-bound (optimizable) regions of code (read more on the topic in this paper). Then we might go back and rethink the overall model with a parallelization scheme, to put the hardware to work and to find the distribution of I/O-bound and CPU-bound operation calls that decouples the effect of the "non-optimizable" parts from the final batch system's result.

Hence, code profiling can tell you how much optimization is possible at best, and, most importantly, it can detect the bottleneck regions along with the key information needed to track down the real problem. Code profiling is also expected to let you measure the difference brought by local optimizations as well as by global (model-level) ones.
I said that I have been dissatisfied with every profiler and profiling technique, for the following reasons:

  • Sampling and automatic instrumentation methods and tools: I always have to profile the entire application, even when I only want to track the execution of a particular excerpt, which is most often my case. The profiling results are too verbose to extract the relevant values from: there are grids of metrics for every function, and even data about thread activity and memory usage, but never a simple way to restrict the rendered data to the target region(s).
  • Manual instrumentation: timing a certain area of code is the least one can expect, but it is not enough. I want to add the instrumentation code easily and get rid of it just as easily. I want more than the simple duration of a portion's execution: I want to track areas with specific markers that then translate into time series, and I want to mark sub-regions and track their cumulative contributions to the total duration of the main areas. I would also like to easily visualize the results as graphs: line graphs for the time series and, if possible, histograms for the sub-region contributions. Finally, and most importantly, I want to record the profiling results so they can be compared with post-optimization results.

End of part 1 of 2.
Topics to be discussed in part 2:

  • Detailed description of the proposed solution;
  • C++ Implementation of the described profiler.

Updated 12/22/2008 : part 2 is now available.