Over the last year we have developed a product called JuggerNET [1] that wraps Java types in C# and makes them accessible to all .NET languages via P/Invoke and JNI. The most logical design follows the well-established design pattern for managed wrappers around unmanaged types [2], in our case Java instances. In this design, the lifecycle of an underlying Java instance is tightly coupled to the lifecycle of the wrapping .NET instance, which means that .NET instances have to clean up after themselves during garbage collection.
Our product had been working very well for several months and had passed our tests as well as some stringent and exhaustive customer tests, but then one day, a customer reported a crash during a long-running performance test. The crash was not just reproducible, it was exactly reproducible, i.e. the application would always crash at the same point. We immediately suspected a slow resource leak related to faulty cleanup and started investigating the problem with this theory in mind.
Two weeks later we were no closer to solving the problem. All our tests ran clean, i.e. none of the tools we used for analyzing memory usage, .NET resource usage, etc. showed any resource accumulation over time. What was worse, after making some initial changes in the way we used P/Invoke, we could no longer reproduce the problem, while our customer’s application kept crashing reliably.
We were just about ready to throw in the towel (i.e. offer some serious, free on-site consulting), when I asked the customer: “By the way, what kind of hardware are you running on?” The answer was: “A dual processor Pentium box.” That’s when the penny dropped and I realized that we were not facing a resource leak at all but rather a threading issue. We quickly verified this by running the crashing application with its processor affinity restricted to a single CPU and, as expected, it worked fine. As the customer’s application was effectively single-threaded, the only other suspect was the garbage collector. At that point, our working theory became that we had a race condition between the garbage collector and our wrapper types.
Well, it turned out to be our bug, but it was also a very educational experience about garbage collection in .NET land.
If you are like most of us, you understand garbage collection at the conceptual level without ever worrying about gory details such as stop-and-copy vs. mark-and-sweep vs. reference counting, synchronous vs. asynchronous collection, and many others. You simply want to write your code and let this black box mechanism take care of cleanup without hurting application performance too much and without eventually exhausting memory.
What do most of us know about garbage collectors?
That’s usually enough knowledge to allow you to get by, but in the context of our product, we had to dig a little deeper from the very beginning because we needed object finalization for cleaning up the wrapped Java instance. Let’s take a look at the basic design of a wrapper type in C#:
public class MyManagedType
{
    // a reference to an unmanaged object
    private JNIHandle inst;
    public MyManagedType( JNIHandle inst )
    {
        this.inst = inst;
    }
    // the object finalizer
    ~MyManagedType()
    {
        cleanup( inst );
    }
}
Please note that the JNIHandle could also have been a pointer to an unmanaged C++ object, a native file handle, or any kind of generic resource descriptor that refers to something outside of the CLR.
In C#, the destructor is synonymous with the object finalizer. Some time after an object has become unreachable, the garbage collector arranges for its finalizer to run (on a dedicated finalizer thread) before the object’s memory is reclaimed. The downside of this simple design is that you don’t control when exactly finalization occurs. If .NET were to let a lot of objects accumulate before invoking the garbage collector, the JVM might run out of memory (or, in a different use case, you might run out of file handles) before the .NET garbage collector feels the need to run.
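One way to see that a C# destructor really is the finalizer is to ask reflection which class declares the Finalize method. This is a small illustrative sketch; the type names WrapperWithDestructor, PlainClass and FinalizerInspector are hypothetical and not part of JuggerNET.

```csharp
using System;
using System.Reflection;

// Hypothetical wrapper that declares a C# destructor.
public class WrapperWithDestructor
{
    ~WrapperWithDestructor()
    {
        // cleanup of the unmanaged resource would go here
    }
}

// Hypothetical class without a destructor, for comparison.
public class PlainClass { }

public static class FinalizerInspector
{
    // Returns the type that declares the Finalize method the runtime will use.
    // Object.Finalize is protected, so NonPublic finds it on base classes too.
    public static Type FinalizeDeclaredBy(Type t)
    {
        MethodInfo m = t.GetMethod(
            "Finalize", BindingFlags.Instance | BindingFlags.NonPublic);
        return m.DeclaringType;
    }
}
```

For WrapperWithDestructor the method is declared by the class itself, because the compiler turns the destructor into a protected override of Object.Finalize(); for PlainClass the lookup falls through to System.Object. Note that this says nothing about when the finalizer runs, which is the real problem described above.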
In order to address the requirement of timely release of resources, .NET also has a core interface called System.IDisposable. A type that implements this interface exposes a public Dispose() method, and C# gives such types special treatment in the using clause, which we will get to in a moment. The developer can of course call Dispose() manually to release resources. Let’s take a look at the new implementation of MyManagedType (just focusing on the changes):
public class MyManagedType : IDisposable
{
    ...
    // call this method to release resources
    public void Dispose()
    {
        cleanup( inst );
    }
}
Being able to manually dispose of resources is nice, but it would be much nicer to have some automatic cleanup mechanism, like the automatic destructor invocation in C++ when an instance goes out of scope. C# offers something similar: the using clause. With the above declaration, we can now write the following piece of code:
using( MyManagedType t = new MyManagedType( inst ) )
{
    // use t to your heart's content
    ...
}
What’s happening here? The compiler effectively issues you a guarantee that the Dispose() method will be called automatically at the end of the block (I will qualify this statement later).
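To make that guarantee concrete, here is a rough hand expansion of a using block, following the try/finally shape the C# specification describes for a reference-type resource. TracingResource and UsingDemo are hypothetical names used only for this sketch; the resource simply records what happens to it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable type that logs its lifecycle.
public sealed class TracingResource : IDisposable
{
    private readonly List<string> log;
    public TracingResource(List<string> log)
    {
        this.log = log;
        log.Add("acquired");
    }
    public void Dispose()
    {
        log.Add("disposed");
    }
}

public static class UsingDemo
{
    // Roughly what the compiler generates for
    //   using (TracingResource r = new TracingResource(log)) { body(r); }
    public static void ExpandedUsing(List<string> log, Action<TracingResource> body)
    {
        TracingResource r = new TracingResource(log);
        try
        {
            body(r);
        }
        finally
        {
            // Runs on normal exit and when body throws.
            if (r != null) ((IDisposable)r).Dispose();
        }
    }
}
```

Even when the body throws, "disposed" is the last log entry, because Dispose() sits in the finally block. The guarantee is about when Dispose() is called.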
One of the questions that often comes up in connection with the IDisposable interface is: “Do I still need a finalizer if I’m using IDisposable?” The answer is yes. Remember that Dispose() does not get called by the garbage collector; you can call it explicitly or you can use the using clause to have it invoked automatically by the compiler, but it does not get called automatically when an object is freed from memory. So how do we combine finalization with explicit cleanup semantics?
public class MyManagedType : IDisposable
{
    private JNIHandle inst;
    public MyManagedType( JNIHandle inst )
    {
        this.inst = inst;
    }
    ~MyManagedType()
    {
        Dispose();
    }
    public void Dispose()
    {
        GC.SuppressFinalize( this );
        cleanup( inst );
    }
}
The first change is pretty obvious: you implement the finalizer in terms of the Dispose() method. The second change is less obvious: you implement the Dispose() method with an additional call to the .NET framework method GC.SuppressFinalize(). This method makes sure that the finalizer will not be called in addition to the Dispose() method. This has two benefits: it prevents double cleanup and it improves performance by unburdening the garbage collector if you called Dispose() manually. So far, we’re following the traditional guidelines for designing a class with finalizers.
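The following self-contained sketch checks the effect of GC.SuppressFinalize(). FakeNative stands in for the real cleanup() call and just counts invocations; it and the other names are hypothetical. The wrapper is created and disposed in a separate, non-inlined method so that it is unreachable when the collection is forced.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical stand-in for the unmanaged side: counts cleanup calls.
public static class FakeNative
{
    private static int cleanups;
    public static int Cleanups => Volatile.Read(ref cleanups);
    public static void Cleanup(int handle) => Interlocked.Increment(ref cleanups);
}

public class Wrapper : IDisposable
{
    private readonly int inst;
    public Wrapper(int inst) { this.inst = inst; }

    // Finalizer implemented in terms of Dispose(), as in the article.
    ~Wrapper() { Dispose(); }

    public void Dispose()
    {
        // Tell the runtime not to run the finalizer for this object.
        GC.SuppressFinalize(this);
        FakeNative.Cleanup(inst);
    }
}

public static class SuppressDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void CreateAndDispose()
    {
        var w = new Wrapper(42);
        w.Dispose();
    }

    // Cleanup count after an explicit Dispose() plus a full GC and finalizer pass.
    public static int Run()
    {
        CreateAndDispose();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return FakeNative.Cleanups;
    }
}
```

SuppressDemo.Run() returns 1: the explicit Dispose() cleaned up once and the suppressed finalizer never adds a second cleanup. Note that this version would still clean up twice if Dispose() itself were called twice.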
In our experience, you also need to make the Dispose() method thread safe, because the finalizer runs on a different thread from the code that calls Dispose() explicitly. This can be achieved, for example, by adding synchronization logic via an atomic exchange operation:
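public void Dispose()
{
    // thread-safe way of clearing the unmanaged instance:
    // atomically swap in 0 so that only one caller
    // (explicit Dispose() or the finalizer) sees the live handle
    int temp = System.Threading.Interlocked.Exchange( ref inst, 0 );
    if( temp != 0 )
    {
        GC.SuppressFinalize( this );
        cleanup( temp );
    }
}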
We assume here that the JNIHandle type is really an int. Now you have a pretty solid implementation of a managed wrapper around an unmanaged resource. Or at least so we thought.
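To exercise the thread-safety argument, the sketch below lets several threads call Dispose() on the same wrapper at once, roughly what happens when application code and the finalizer race. SafeWrapper, its static cleanup counter, and RaceDemo are hypothetical; the handle is an int, following the assumption above, with 0 meaning "already released".

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class SafeWrapper : IDisposable
{
    // Assumption from the article: the JNI handle is an int; 0 means released.
    private int inst;
    // Hypothetical counter standing in for the real cleanup() call.
    private static int cleanups;
    public static int Cleanups => Volatile.Read(ref cleanups);

    public SafeWrapper(int inst) { this.inst = inst; }

    ~SafeWrapper() { Dispose(); }

    public void Dispose()
    {
        // Atomically take ownership of the handle; only one caller gets non-zero.
        int temp = Interlocked.Exchange(ref inst, 0);
        if (temp != 0)
        {
            GC.SuppressFinalize(this);
            Cleanup(temp);
        }
    }

    private static void Cleanup(int handle) => Interlocked.Increment(ref cleanups);
}

public static class RaceDemo
{
    // Releases all threads at once so their Dispose() calls overlap.
    public static int Run(int threads)
    {
        var w = new SafeWrapper(7);
        using (var start = new ManualResetEventSlim(false))
        {
            var tasks = new Task[threads];
            for (int i = 0; i < threads; i++)
            {
                tasks[i] = Task.Run(() => { start.Wait(); w.Dispose(); });
            }
            start.Set();
            Task.WaitAll(tasks);
        }
        return SafeWrapper.Cleanups;
    }
}
```

RaceDemo.Run(8) returns 1 however the threads interleave, because Interlocked.Exchange() hands the non-zero handle to exactly one caller.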
Even with all these patterns and safeguards in place, we were still experiencing the threading problem described above. What could possibly be the reason?
I promised you that we would look at the using clause in some more detail, so let’s come back to it now. We were employing the using clause liberally in our framework code to make sure that we were not hanging on to Java resources longer than necessary. The typical usage would be inside a generated block of code, following this pattern:
public void MyMethod( MyManagedType arg1 )
{
    using( MyDisposableHelper h = new MyDisposableHelper() )
    {
        PinvokeHelper.call( h.Add( arg1 ) );
        // h.Dispose() is called for us automatically here
    }
}
We expected this code to be more or less equivalent to the following snippet:
public void MyMethod( MyManagedType arg1 )
{
    MyDisposableHelper h = new MyDisposableHelper();
    try
    {
        PinvokeHelper.call( h.Add( arg1 ) );
    }
    catch( Exception )
    {
        throw; // rethrow, preserving the original stack trace
    }
    finally
    {
        h.Dispose();
    }
}
Coming from a C++ background, we had interpreted the using clause to work like a C++ destructor invocation when a variable goes out of scope (you just can’t escape your own history). Having put all the safeguards in place that make sure that finalization and disposal don’t get in each other’s way, we should have known better. What’s really going on is a little harder to illustrate because the garbage collector (and with it, another thread) is involved, but here’s the picture:
public void MyMethod( MyManagedType arg1 )
{
    MyDisposableHelper h = new MyDisposableHelper();
    JNIHandle i = h.Add( arg1 );
    // done with 'h' now, therefore eligible for finalization;
    // on the finalizer thread the following is executed at an
    // unpredictable point in time (pseudocode: C# does not
    // let you invoke a finalizer directly)
    h.~MyDisposableHelper();
    // 'i' is now potentially invalid because of the
    // finalization that might have occurred on the finalizer thread
    PinvokeHelper.call( i );
    // doesn't really do anything anymore because
    // the finalizer has already run
    h.Dispose();
}
Our mistake was to interpret the using clause not just as a way to have Dispose() called automatically but also as a way to prevent the “used” object from being garbage collected. Just because we think that the used object is valid until the end of the block does not mean that the garbage collector agrees with us!
The key to making the using clause work reliably together with our wrapper types is a hint for the garbage collector:
public void MyMethod( MyManagedType arg1 )
{
    using( MyDisposableHelper h = new MyDisposableHelper() )
    {
        PinvokeHelper.call( h.Add( arg1 ) );
        // tell the garbage collector that 'h' must stay
        // reachable until this point, i.e. until the
        // P/Invoke call above has returned
        GC.KeepAlive( h );
    }
}
The GC.KeepAlive() method references the object passed to it, which makes that object ineligible for garbage collection (and therefore for finalization) from the start of the current method up to the point at which GC.KeepAlive() is called.
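The sketch below shows the guarantee GC.KeepAlive() provides. HandleOwner plays the role of a wrapper whose finalizer invalidates its handle, and UseHandle() stands in for PinvokeHelper.call(); both are hypothetical. UseHandle() forces a full collection and finalizer pass while the raw handle is in use, then reports whether the owner had already been finalized.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical wrapper whose finalizer "releases" its handle by setting a flag.
public class HandleOwner
{
    public static volatile bool Released;
    public readonly int Handle;
    public HandleOwner(int handle) { Handle = handle; }
    ~HandleOwner() { Released = true; }
}

public static class KeepAliveDemo
{
    // Stand-in for PinvokeHelper.call(): the handle would be passed to native
    // code here. Forces a GC plus finalizer pass, then reports the flag.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static bool UseHandle(int handle)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return HandleOwner.Released;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool RunWithKeepAlive()
    {
        HandleOwner.Released = false;
        var owner = new HandleOwner(42);
        bool releasedDuringCall = UseHandle(owner.Handle);
        // owner stays reachable up to this call, so it cannot be
        // finalized while UseHandle() is running.
        GC.KeepAlive(owner);
        return releasedDuringCall;
    }
}
```

With GC.KeepAlive(owner) in place, RunWithKeepAlive() returns false. If you remove that line, an optimized build may let the JIT treat owner as dead once owner.Handle has been read, so the method can return true; Debug builds typically extend local lifetimes to the end of the method, which can mask this kind of bug during development.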
After weeks of debugging, this was all that was necessary to fix our problem. And what did we learn from this exercise?
I hope that this article will help you design better wrapper classes and will save you the debugging nightmare that we went through.