This is a common question, and one which is profoundly misunderstood. The most important answer is: it doesn't matter. What matters is how big your data is. Thinking about "how big" the code is almost always leads to a lot of effort that produces no payback whatsoever. Some of this is covered in my essay, Optimization: Your Worst Enemy, where I point out that "traditional" approaches to "optimizing" a program are completely without merit in most cases, and the most important thing you can do is figure out where the real costs are, and optimize those.
The most common error I see is a misinterpretation of the data returned by the Task Manager or Process Viewer. The question is usually expressed as "I created this (small, empty, tiny) application and ran it and it takes (500KB, 1.7MB, 3.1MB...), how can I make this smaller?" The answer is, don't bother. Forget it. You are wasting your time.
The problem is that traditional programming style has taught programmers that "smaller is better" and trivial programming systems have presented programmers with a single number that is a "measure of goodness" where the smaller the number, the better the program. This applies only to applications running in trivial contexts. For example, if you have an 8K EPROM to hold your code on an embedded system, you know you have 8,192 bytes of code space, and you may have to play all sorts of sneaky tricks to fit your code into that size. If you have a 128K EPROM, most of those issues disappear, and you can concentrate on solving the real problem. In all cases, our goal in building systems is to reduce the need to concentrate on useless problems so we can focus on the problems we are trying to solve.
Virtual memory is one such methodology that allows us to ignore a lot of issues. Let's ignore data space for a moment, because that actually has genuine problems we need to concentrate on, and just consider code space.
What you can see, at best, using the process viewer or task manager, is the size of the address space in use. But what difference does this make? If you have a job to do, it takes a certain amount of code to do it. The "optimizations" you might make to reduce code size have virtually no impact on the actual resulting code size. So stop worrying about it. If you really care about the size of a module, you can use the depends program (in the Visual Studio\Common\Tools directory of the VC++ distribution), and it will tell you the "size" of the module, which is still pretty uninformative. Besides, there isn't anything you can really do about it unless you have some massive static arrays (and if you do, you probably need them anyway, but that's part of the data problem, which is real).
But what are those large code sizes you see in the task manager? What's this 3.7MB MFC image? Isn't this because the tools are sloppy, Microsoft loves code bloat, and the library is overblown?
The code is there to do a job. Consider the following program:
    int main()
    {
        while(1) ;   /* infinite loop, so the process stays visible */
        return 0;
    }
If you compile this and run it (note that it contains an infinite loop so you can see it running), it takes 548K. How could a program this small take 548K?
The answer is that you are confusing address space with program size. In order to run, this program requires that both the NTDLL.DLL and the KERNEL32.DLL modules be mapped into the address space. What you're really seeing is the complete address space footprint of these modules. But these modules are mapped into every application. You might not touch more than a few pages of these libraries. So do they add to the size of your program? Not in the slightest. They're already loaded, they're mapped everywhere, and they are probably used so often they never get paged out. So right away we see that there is essentially no meaning to the "program size" reported by the task manager or process viewer tools.
Now consider a full-blown MFC application, such as one that I wrote recently. It uses the following DLLs:
|Common controls (spin control)|
|Common dialogs (file open/save)|
|Network support (probably required by MFC42.DLL)|
|Audio compression manager|
|Audio compression manager|
|WebCheck support (probably required by MFC42.DLL)|
|Network and browser support|
|Registry and NLS support|
|Network services support|
|RPC support (called by ADVAPI32.DLL functions)|
(the actual program executable appears here in the list)
|Menus, windows, etc.|
|Multimedia library (MIDI)|
Now I don't actually use a lot of these features; I get the full multimedia library even though I'm only doing MIDI and WAV output, and I probably get other DLLs mapped in because I'm using the DLL version of the MFC library, MFC42.DLL, which loads DLLs that load other DLLs. But if I need the code to do the job, I have to have it somewhere, and it might as well be a library that somebody else maintains. And if I'm not using the code, it really doesn't cost me anything to have it mapped into my address space. (Well, it costs me address space slots, that is, pages, and I do have a limit of 524,288 of those with 4K pages in the 2GB user address space. So the fact that a trivial program consumes 128 of these or so, about 0.024% of them, isn't going to matter a lot. The above program consumed 3.7MB of address space, slightly over 900 pages, or about 0.17% of my address space. So why should I care?)
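The arithmetic behind those percentages can be sketched in a few lines of C. This is only an illustration: the 4K page size and the 2GB user address space are assumptions matching the figures above, and the names `pages_for` and `percent_of_user_space` are made up for this example.

```c
#include <assert.h>

/* Assumption: 4 KB pages, as on 32-bit x86 Windows. */
#define PAGE_SIZE_BYTES 4096ULL
/* Assumption: 2 GB of user address space, the figure used in the text. */
#define USER_SPACE_BYTES (2ULL * 1024 * 1024 * 1024)

/* Number of pages needed to cover `bytes`, rounding up to a whole page. */
unsigned long long pages_for(unsigned long long bytes)
{
    return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Share of the user address space, in percent, occupied by `pages` pages. */
double percent_of_user_space(unsigned long long pages)
{
    unsigned long long total = USER_SPACE_BYTES / PAGE_SIZE_BYTES;
    assert(total > 0);
    return 100.0 * (double)pages / (double)total;
}
```

With these assumptions, `pages_for(548ULL * 1024)` gives 137 pages for the trivial program, and `percent_of_user_space(128)` comes out at roughly 0.024%.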
A long time ago, back in the late 1960s, the notion of the working set evolved as a way of characterizing the behavior of a virtual memory program. The working set represents the set of pages recently accessed. So it doesn't matter how large your code size or data size are, if the working set is small. Since most of the pages, such as those in the above DLL list, are largely untouched, they are not part of the working set, and their presence in the address space is essentially irrelevant.
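To make the idea concrete, here is a minimal sketch that measures a working set over a recorded trace of page numbers. It is a toy model, not how any operating system implements it: the trace, the window length, and the function name `working_set_size` are all assumptions made for illustration, and "recently" is taken to mean "within the last `window` references".

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct pages among the last `window` references of a
   page-reference trace, ending at index `t` (inclusive).
   The quadratic scan is fine for a short illustrative trace. */
size_t working_set_size(const unsigned *trace, size_t t, size_t window)
{
    size_t start = (t + 1 > window) ? t + 1 - window : 0;
    size_t count = 0;
    for (size_t i = start; i <= t; i++) {
        int seen = 0;
        /* Has this page already appeared earlier in the window? */
        for (size_t j = start; j < i; j++) {
            if (trace[j] == trace[i]) { seen = 1; break; }
        }
        if (!seen)
            count++;
    }
    return count;
}
```

A program that maps 900 pages but keeps bouncing between two of them has a working set of two pages in this model, which is the point: the mapped size says nothing about the pages actually in play.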
The working set model has nice implications for scheduling. When a process is scheduled, or, in the case of Win32, when a thread is scheduled within a process, the scheduler makes sure that all of the pages of the current working set are preloaded before actually making the thread feasible. This minimizes the number of page faults that are actually taken, and improves overall system throughput.
When you have a very large executable, it is possible that function A calling function B calling function C calling...etc. will generate a large working set of code. You can get certain performance improvements by causing closely-associated routines to be linked so they fall within a small set of pages. The Microsoft linker provides no such capability. When I first did this, we were working on a timeshared machine which had a whopping 750K of RAM (well, magnetic core in those days). That is, a machine shared among 60 people with the computational horsepower and physical memory of an 8088 PC. In those days, we actually saw performance improvements from what we called "adjacency linking". It takes a lot of effort, and in modern systems it just doesn't seem to matter very much. The last time I worried about this was in 1969 or 1970.
Where it matters
Where address space matters is in the size of your data. And not just the data that is in use; the total size of your heap is what matters. Where the code size has virtually no impact on your overall performance, your data space can kill your performance and/or your reliability. When you want to expend effort understanding your program size, understand your data structures. As the data structures increase, and the heap fragments, or you leak memory, you end up with an ever-growing memory footprint. This starts to be serious, because this can, without much effort on your part, exceed the size of your code working set by two orders of magnitude.
Paging performance will kill you when the algorithm otherwise looks sane. Consider an application that is doing a convolution algorithm on a large bitmap. If it is set up to process the data along one axis, it will access all the pages successively, meaning that you will maximize the number of page faults. If, however, you arrange the data so that you can process all of one page of data before moving to the next page, you will minimize page faults and get at least an order of magnitude improvement, if not two orders of magnitude. If you can further take advantage of the L1/L2 caching, you might get another order of magnitude improvement.
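The effect of traversal order can be sketched as follows. This is a simplified stand-in, not a convolution: it just sums an 8-bit bitmap stored row by row, once in address order and once down the columns, and a third helper counts how often consecutive accesses land on a different page. The function names, the one-byte-per-pixel layout, and the assumption that the buffer starts on a page boundary are all mine. A page switch is not the same thing as a page fault, but when the bitmap is larger than physical memory the two move together.

```c
#include <assert.h>
#include <stddef.h>

/* Walks memory in address order: one page is finished before the next. */
unsigned long sum_row_order(const unsigned char *pixels,
                            size_t width, size_t height)
{
    unsigned long sum = 0;
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            sum += pixels[y * width + x];
    return sum;
}

/* Walks down columns: successive accesses are `width` bytes apart, so
   with wide rows nearly every access touches a different page. */
unsigned long sum_column_order(const unsigned char *pixels,
                               size_t width, size_t height)
{
    unsigned long sum = 0;
    for (size_t x = 0; x < width; x++)
        for (size_t y = 0; y < height; y++)
            sum += pixels[y * width + x];
    return sum;
}

/* Counts how many times two consecutive accesses fall on different
   pages, for either traversal order. Assumes one byte per pixel and a
   buffer that begins on a page boundary. */
size_t count_page_switches(size_t width, size_t height,
                           size_t page_size, int column_order)
{
    size_t switches = 0, prev_page = 0, n = 0;
    size_t outer = column_order ? width : height;
    size_t inner = column_order ? height : width;
    assert(page_size > 0);
    for (size_t a = 0; a < outer; a++) {
        for (size_t b = 0; b < inner; b++) {
            size_t x = column_order ? a : b;
            size_t y = column_order ? b : a;
            size_t page = (y * width + x) / page_size;
            if (n++ > 0 && page != prev_page)
                switches++;
            prev_page = page;
        }
    }
    return switches;
}
```

Both sums produce the same answer; only the order of memory accesses differs. For a bitmap 4096 bytes wide and 4 rows high with 4K pages, the row-order walk changes page 3 times, while the column-order walk changes page on every single access.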
Heap fragmentation is the major cause of a large data working set. It is caused by doing allocations and releases in a pattern that leaves holes in the heap that are no longer usable. This is why MFC always allocates CString values in multiples of 256 bytes; this minimizes the sort of fragmentation that precise allocation causes, making it possible to reuse previous string values left free on the heap. But if you're not careful, you can still fragment the heap badly. When you are programming in C, you should do mallocs only of certain quantum sizes, for example, 256 bytes, and/or use user-defined heaps (HeapCreate and support functions). It is fairly easy to do this in C++ if you are defining your own classes, but much harder to do if you are using MFC.
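One way to do quantum-sized allocation in C is a thin wrapper around malloc, sketched below. The names `ALLOC_QUANTUM`, `round_up_to_quantum`, and `quantum_malloc` are hypothetical, and 256 is simply the example figure used above. Whether freed blocks actually get reused depends on the allocator underneath; the wrapper only guarantees that every request is one of a small number of sizes, so a freed block is an exact fit for many later requests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical quantum; 256 bytes is the example size from the text. */
#define ALLOC_QUANTUM 256u

/* Round a request up to the next multiple of ALLOC_QUANTUM.
   A zero-byte request is treated as one quantum. */
size_t round_up_to_quantum(size_t n)
{
    if (n == 0)
        n = 1;
    /* Guard against overflow in the rounding arithmetic. */
    assert(n <= (size_t)-1 - (ALLOC_QUANTUM - 1));
    return (n + ALLOC_QUANTUM - 1) / ALLOC_QUANTUM * ALLOC_QUANTUM;
}

/* Allocate at least `n` bytes, always in whole quanta.
   Returns NULL on failure, like malloc; release with free(). */
void *quantum_malloc(size_t n)
{
    return malloc(round_up_to_quantum(n));
}
```

A request for 300 bytes becomes a 512-byte block, as does a request for 400 or 500, so any of those blocks can satisfy any of the others once it has been freed. The cost is some wasted space inside each block, which is usually a good trade against holes nobody can use.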
At some point, I'm going to do something about a tool for studying heap fragmentation. MFC provides some nice tools for this. But for now, all I can suggest is exercising due diligence in handling allocation. Use a tool like BoundsChecker for Windows (from NuMega) to verify you are not leaking memory and other resources.
The views expressed in these essays are those of the author, and in no way represent, nor are they endorsed by, Microsoft.