Memory Management in Totality
Filed Under: Weekly Tuesday Dose of goodness
Dear all,
This week, I’m going to talk about Memory Management as broadly as possible. As we know, memory management has always been a hot topic, especially in the C and C++ community.
Smart pointers have, to a certain extent, brought C++ up to the same level as Java and C# in terms of memory management. The key thing is that a developer doesn’t have to care much about when the memory is released. It’s done automatically.
But even so, memory management is not just about managing memory that would otherwise leak.
So what is it all about? Read on…
Introduction
First of all, this is not a repeat of earlier articles on legal leaks, design-based leaks, and so on. These are known problems, and they are only part of memory management’s share of problems.
So what’s the management all about?
It’s about controlling the memory in all aspects.
- How it’s being used
- How much it’s being used
- Governance over life of objects
- Object design
- Flow design
Remember, we have at least 4 types of memory -
1) Stack
2) Heap
3) Code
4) Private Heap (Static)
All these types of memory must be managed properly as well. As you can see, the previous articles dealt with the larger portion of the scope, which is the Heap-related memory issues and that’s certainly not the whole picture.
Importance of Memory usage
This paragraph applies only to certain languages or semantics of the language.
It’s important to know how a piece of memory will be used in terms of instantiation. When you instantiate an object, where should it be stored?
Basically we have 3 choices here - Stack, Heap or Private Heap. The Private Heap (static storage) is used, for example, to count how many times a function has been called, which is useful for manual runtime coverage counting.
What’s the importance?
1) Stack is limited, and it grows every time a function is called: a new frame holding the arguments, locals and return address is pushed so that the caller’s state can be restored afterwards.
2) Heap is big, but can suffer from memory leaks or fragmentation if not well handled.
3) Private Heap has a size that is fixed when the program is built, so it shouldn’t be a big issue. But it might cause confusion when it’s used in classes, since the notion of static differs from language to language. For example, in C++, a static data member exists exactly once, so it is shared across all instances of the same class.
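To make the three choices concrete, here is a minimal sketch, assuming a C++14 compiler. The names Widget, timesCalled and storageDemo are made up for illustration. It shows a function-local static used as a call counter, and a static class member shared by a stack object and a heap object.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class: 'live' is a static member, so there is one copy
// shared by every instance instead of one per object.
struct Widget {
    static int live;   // static storage (the "private heap" of this post)
    int id;            // per-instance data
    Widget() : id(++live) {}
};
int Widget::live = 0;

// Function-local static: initialised once, keeps its value between calls.
// This is the "count how many times a function was called" use case.
int timesCalled() {
    static int calls = 0;
    return ++calls;
}

// Creates one object on the stack and one on the heap, then reports how
// many Widgets have been constructed so far via the shared counter.
int storageDemo() {
    Widget onStack;                             // stack: released when the function returns
    auto onHeap = std::make_unique<Widget>();   // heap: released by unique_ptr
    assert(onStack.id != onHeap->id);           // each object got its own id
    return Widget::live;                        // the counter is shared by both
}
```

Calling storageDemo() twice returns 2 and then 4, because the counter belongs to the class and not to any single object.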
Importance of Memory usage frequency
Here, we’re dealing with 2 major metrics.
1) Size of the object/class
2) Number of such objects that will exist in memory
By saying this, I’m inadvertently making a statement that memory usage of all applications must be finite.
If there’s a claim that says that you can create as many objects as the memory permits, that’s probably a polite way of telling you the same thing I’ve mentioned above.
Next, we must decide how many instances of this object will exist in memory. If this number is fixed based on design, then the choice of data structure will be obvious - arrays or vectors.
If not, then the designers, technical and business alike, need to tie down a range. As I might have mentioned in my previous posts, for example, in a game, the number of explosions should only be around 200-250 at any one time. This is the simplest example of a range.
If this frequency is unrestricted, then either the designers are INSANE or short-sighted, or they’re extremely confident that the object, no matter how many instances there’ll be, will be of no consequence to the memory usage.
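As a rough sketch of what tying down a range can look like in code, here is a fixed-capacity pool. FixedPool and Explosion are hypothetical names, and the linear scan for a free slot is only one simple choice. The point is that the upper bound is part of the type, so the memory cost is known up front.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical game object; 'active' marks whether the slot is in use.
struct Explosion {
    float x = 0, y = 0;
    bool active = false;
};

// A pool whose maximum population is fixed at compile time.
template <typename T, std::size_t Capacity>
class FixedPool {
public:
    // Returns a free slot, or nullptr once the agreed limit is reached.
    T* acquire() {
        for (T& slot : slots_) {
            if (!slot.active) {
                slot = T{};          // reset to a clean state
                slot.active = true;
                ++inUse_;
                return &slot;
            }
        }
        return nullptr;
    }

    // Marks a slot as free again; the memory itself is never released.
    void release(T* slot) {
        if (slot && slot->active) {
            assert(inUse_ > 0);
            slot->active = false;
            --inUse_;
        }
    }

    std::size_t inUse() const { return inUse_; }
    static constexpr std::size_t capacity() { return Capacity; }

private:
    std::array<T, Capacity> slots_{};
    std::size_t inUse_ = 0;
};
```

A FixedPool<Explosion, 250> costs 250 * sizeof(Explosion) bytes no matter what happens at runtime, and the caller has to decide what to do when acquire() returns nullptr.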
Importance of Object Governance
In terms of design, an object has a finite life span, and it is never longer than the application’s own life span. The closest thing to an exception is an object inside the private heap (static): it typically lives for the whole run, being constructed before main() starts or on first use, and destroyed only when the program exits.
A few parameters go into object governance. Namely:
1) Deterministic order of construction
2) Deterministic order of destruction
3) Non-Deterministic order of construction
4) Non-Deterministic order of destruction
5) Strict object life span
At least 2 things can lead to non-deterministic construction/destruction:
1) Threads
2) Static objects
A designer must know that certain things must be avoided altogether for a greater good.
1) Avoid thread dependencies. Use semaphores and other thread mechanisms only when absolutely necessary.
2) Prohibit static dependencies. The order in which static objects in different source files are constructed (and therefore destroyed) isn’t something you control, so static dependencies often lead to unwarranted crashes, and to OS-level memory leaks if you have any OS-level objects left dangling.
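If a static dependency really can’t be designed away, one common C++ idiom (not something this post prescribes) is to hide the dependency behind a function-local static, so that it is constructed the first time somebody asks for it. A minimal sketch, with Log, appLog and Service as made-up names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical logger that other static objects depend on.
struct Log {
    std::vector<std::string> lines;
    void write(const std::string& s) { lines.push_back(s); }
};

// Function-local static: constructed on the first call, so every caller
// is guaranteed to receive a fully constructed object.
Log& appLog() {
    static Log instance;
    return instance;
}

// A static object whose constructor needs the logger. If Log were a plain
// global in another source file, it might not be constructed yet here.
struct Service {
    Service() { appLog().write("Service constructed"); }
};
Service g_service;

// Helper for inspecting the result.
std::size_t loggedLines() { return appLog().lines.size(); }
```

This only tames construction order. Anything that touches appLog() from a destructor at program exit still needs a careful look.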
The 5th point, strict object life span, is something that ONLY C/C++ has the right language facilities to handle properly. (Correct me if I’m wrong).
Yes, we have VMs to handle Java and .NET objects. But no, we cannot control when the objects will be terminated. Like it or not, even setting a reference to NULL doesn’t guarantee immediate termination. The same applies to people in C++ using reference-counted pointers, with the exception of COM pointers.
If you think that simply setting a Java reference, .NET reference or C++ smart pointer to NULL will make the object just vanish… that’s really naive.
Yes, your own test scenarios may prove me wrong - that’s only because you’re certain of the number of reference holders of your memory! In a real application, this number is usually unknown due to the sheer number of paths through a complex application. Yes, it’s still deterministic, but governing all those holders is usually expensive.
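Here is a small sketch of that effect using std::shared_ptr as the reference-counted pointer. Tracked and the two function names are made up for illustration. Resetting one holder does nothing to the object as long as another holder exists.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked()  { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// One of two holders lets go: the object survives.
int aliveAfterOneReset() {
    auto first  = std::make_shared<Tracked>();
    auto second = first;              // a second reference holder
    first.reset();                    // the "set it to NULL" step
    assert(second.use_count() == 1);  // one holder left
    return Tracked::alive;            // still 1
}

// Only when the last holder lets go does the destructor run.
int aliveAfterAllReset() {
    auto first  = std::make_shared<Tracked>();
    auto second = first;
    first.reset();
    second.reset();                   // last holder gone: object destroyed here
    return Tracked::alive;            // now 0
}
```

In a test you know there are exactly two holders. In a real application the second holder may be buried in a container three modules away.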
Even if methodologies such as the observer or visitor pattern are used, they don’t come free. Keeping a huge list of .NET references/Java references/C++ smart pointers that use the same object is extremely expensive, and potentially a performance bottleneck, especially in real-time applications.
Obviously, you’ll need to iterate over (observer) or visit (visitor) all the objects involved to tell them to go and kill themselves. This iteration PER object instance is extremely expensive and not elegant at all.
Back to how C/C++ can resolve this problem. Basically, plain C/C++ pointers suffer from a lack of intelligence. That is, if the object is deleted explicitly through one pointer, fulfilling the rules of object governance, every other holder of that address is left with an invalid pointer, known as a dangling pointer.
This is very dangerous since there’s no way to check whether such a pointer is still valid: the memory address is as plain as an innocent postal code.
I’ll give everyone a conclusion on this matter by the end of this post.
Object and Flow design
Object design has always been a pain in the ass for good designers. This is because, no matter how good a technical person is, there’s always a limit to how far he/she can foresee.
Thus, an object design changes from time to time and is dictated constantly by business requirement changes. Even so, the design must justify the object’s size. I won’t elaborate on this as it can be discussed in a separate article.
Flow design is also important, because it determines how memory and objects are passed from method to method. Hence the terms pass-by-value, pass-by-object, pass-by-reference and pass-by-pointer.
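Actually, in C/C++, the notions of:
1) pass-by-value
2) pass-by-reference
3) pass-by-pointer
DO NOT describe three different mechanisms, because a reference and a pointer are passed the same way. We should redefine these notions a little, into just two: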
1) Pass-by-Contents
2) Pass-by-Reference/Pointer
That should be it. When we pass by contents, the argument is copied as a whole object into a temporary instance of the object type stored on the stack. When we pass by reference/pointer, only an address is copied (thus 4 bytes on IA-32, versus the full object size). With the pointer variant, members are then accessed through the -> operator.
Let’s just do some sanity checks.
1) What’s the size of a pointer in 32-bits? 4 bytes
2) What’s the size of an unsigned long integer in 32-bits? 4 bytes
3) What’s the size of a double in 32-bits? 8 bytes
As you can see, passing by reference and passing by pointer cost the same. Passing by value can be a little expensive if there are too many native types in the parameter list.
Yes, sizeof(ABC&) yields the same size as sizeof(ABC). But be assured, when you pass by reference, you pass in an alias, which is like a controlled pointer that you can’t delete directly (yeah, directly eh…).
It is, however, important not to pass in an entire object. Doing so will fill the stack with copies of the same object across nested calls, even though they’ll be eliminated eventually.
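The difference is easy to observe. Below is a minimal sketch in which ABC is a made-up 256-byte type with a copy counter, and the three function names are purely illustrative. Only the pass-by-contents call triggers a copy. The other two hand over an address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical "large" object that counts how often it gets copied.
struct ABC {
    static int copies;
    char payload[256];
    ABC() : payload{} {}
    ABC(const ABC& other) {
        std::memcpy(payload, other.payload, sizeof payload);
        ++copies;
    }
};
int ABC::copies = 0;

// sizeof applied to a reference type reports the size of the referred-to
// type, which is why sizeof(ABC&) tells you nothing about the call cost.
static_assert(sizeof(ABC&) == sizeof(ABC), "sizeof sees through references");

// Pass-by-contents: the whole object is copied into the callee's frame.
std::size_t byContents(ABC obj) { return sizeof(obj); }

// Pass-by-reference: an alias; no copy constructor call.
std::size_t byReference(const ABC& obj) { return sizeof(obj); }

// Pass-by-pointer: only the address is copied.
std::size_t byPointer(const ABC* obj) { return sizeof(obj); }
```

byPointer() reports the size of the pointer itself (4 bytes on IA-32, typically 8 on a 64-bit build), while the other two report sizeof(ABC).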
A large stack footprint has some performance impact. But more importantly, such a design can easily lead to stack overflow since the stack is very limited. (Just write a recursive function with a static counter, break on the crash and see how high the counter got. On IA-32, add roughly 8 bytes per call for the saved frame pointer and the return address.)
Large stack footprints can be created in 3 ways:
1) Recursive functions/methods
2) Passing LARGE objects by value through multiple nested calls
3) Having large local arrays or a large number of temporary variables in a method that sits inside multiple nested calls
Conclusion
Let’s just head straight to the conclusion for object governance.
Basically, I cannot share how it’s being done because it’s done by one of our products in Strides Interactive LLP. Known as cSmartPtr<T>, it can do far more than a simple reference counter.
Basically it has the following functions that others do not have (or to the best of my knowledge, do not have):
1) Ability to control the life of the object, allowing you to delete it using its internal terminate() function. The rest of the containers will return NULL when checked upon. No observers, no visitors, no iterations!
2) Ability to replace the container’s pointer with another, safely returning the old pointer stored in a separate container.
3) Ability to prevent pointer corruption (i.e., 1 pointer passed to 2 different container families) - Optional
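To give a feel for the first point only, here is one way such behaviour could be approximated with the standard library. This is NOT how cSmartPtr<T> works (that isn’t public). KillableHandle and Probe are made-up names, and the sketch assumes C++14 and single-threaded use. All copies of a handle share one slot, so emptying the slot is visible to every holder without any iteration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical handle: every copy points at the same shared slot, and the
// slot owns the object. This is an illustration, not the product's design.
template <typename T>
class KillableHandle {
public:
    explicit KillableHandle(std::unique_ptr<T> obj)
        : slot_(std::make_shared<std::unique_ptr<T>>(std::move(obj))) {}

    // Returns the object, or nullptr if any handle has terminated it.
    T* get() const { return slot_->get(); }

    // Destroys the object right now; all copies see nullptr afterwards.
    void terminate() { slot_->reset(); }

private:
    std::shared_ptr<std::unique_ptr<T>> slot_;  // one slot per handle family
};

// Hypothetical type that counts how many instances are alive.
struct Probe {
    static int alive;
    Probe()  { ++alive; }
    ~Probe() { --alive; }
};
int Probe::alive = 0;

// Two handles, one terminate(): the second handle notices immediately.
bool terminateIsVisibleEverywhere() {
    KillableHandle<Probe> a(std::make_unique<Probe>());
    KillableHandle<Probe> b = a;        // second holder of the same object
    assert(a.get() == b.get());
    a.terminate();                      // strict end of life, no iteration
    return b.get() == nullptr && Probe::alive == 0;
}
```

The price is an extra indirection on every access, and every caller must check get() for nullptr before using the object.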
If you need a solution to have total governance over your object life span, please email me at strides@stridesdev.org for an RFQ.
P.S. We also offer consultation services for your application on memory management.
That’s a hell of a lot of post for this week. Enjoy your week ahead!
Signing off,
Jeremy
- Admin
- 20 Apr 2010 10:07 AM
- Comments (3)
April 27th, 2010 at 9:40 pm
To find and localize memory leaks in C++ apps I advise to use Deleaker ( http://deleaker.com/ )
April 28th, 2010 at 9:53 am
Hi ATS,
Does Deleaker detect and provide a path trace to leakages in COM objects as well? Also, does it raise any alarms should a certain memory block or class type continue to grow?
Regards,
Jeremy
June 22nd, 2010 at 11:09 am
[...] anyway? First of all, we need to understand that there’re 4 types of memory in C++. See this the memory management in totality article for more [...]