Saturday, August 02, 2008

A deep dive into COM, ApartmentStates and VS 2005 unit testing

I’ve had some adventures in what is clearly “Legacy” code recently, and had the pleasure of learning all about STA vs MTA apartment states, and what this means for unit testing in VS 2005. Let me warn you that this is an obscenely technical post. Mainly, I’m posting it because I’m certain to forget the details next week. If you want something non-technical, go ogle my children.

Summary: Do not use MTA, there are good reasons STA is the default.

Update 2008-08-19: OK, if you are in a hurry, do use MTA. Consider creating two test projects, one for STA and another for MTA. Getting the tests to work in STA is "proper", but not easy.

Scenario: An eager TDD developer wants to create unit tests for a .NET application that uses a COM component, for example a Microsoft Office application. The woes you experience are described in threads such as this one (MSDN forums), which come with misleading answers.

Side note: A test project that references COM objects and MS Office applications is not a unit test project. Unit tests cover functionality with no or few external dependencies. You are writing very valuable functional or integration tests in the style of unit tests. You still need true unit tests.

What is happening?

Usually, a COM component such as an MS Office application is heavy and slow to create, so you want to share one instance rather than create it for each test. Reread the documentation on apartment states carefully. Note that the default, “single threaded apartment”, does not allow you to share a COM object directly across threads. The VS 2005 test runner executes assembly initialization, test initialization, and the tests themselves on different threads. Ergo, you can't create a COM object in test initialization and use it (directly) in your tests if you run with Single Threaded Apartments.

Also note that STA is the default. Many COM components will not work with Multi Threaded Apartments, for example the WebBrowser control in .NET 2.0. Even the trivial .NET file open dialog requires STA.

More background: about COM runtime callable wrappers.

One answer to this is to execute your tests with Multi-Threaded Apartments. Having tried this, I strongly discourage it. A few hours of test support wrappers can get around the STA problems. MTA problems:

1) A lot of the COM components require STA threading. You’ll get errors such as “ActiveX control '8856f961-340a-11d0-a96b-00c04fd705a2' cannot be instantiated because the current thread is not in a single-threaded apartment.”
2) If the actual application will run in STA, a test using MTA is an invalid test. Assuming this is an automated integration test (as it interacts with heavy COM dependencies, it is not a unit test), you want it to exercise the code in a realistic manner. A different threading model will lead to false failures and successes.

My solution:

To start with, build this out with Test Driven Development. What do you need to do? You need a test initialization method that creates some COM objects, you need to access those objects during tests, and you need to clean them up afterwards. So start with the following:
  1. test class that will demonstrate the ability to "share" a scenario using COM objects.
  2. a test initializer
  3. two test methods that access the objects. Interestingly, the second test failed until I got it right. Order didn't matter.
  4. a test cleanup method

Problem: I don’t want to create a heavy Office application for each test.

Solution: While you can't share COM RCWs across threads, most heavy COM objects such as MS Office applications let you attach to an already running instance via Marshal.GetActiveObject(). I create an Office Application wrapper and a builder responsible for its creation. The tests call something like Builder.GetApp(). This gets a new COM RCW that points at either an existing MS Office application process or a newly launched one.
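Here is a minimal sketch of what such a builder might look like. The class name OfficeAppBuilder, the GetApp signature, and the "created" out-parameter are hypothetical names for illustration; only Marshal.GetActiveObject, Type.GetTypeFromProgID, and Activator.CreateInstance are real .NET Framework calls. It is late-bound, so it assumes nothing about which Office interop assembly you reference.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical builder: attach to a running instance if there is one,
// otherwise launch a new one.
public static class OfficeAppBuilder
{
    public static object GetApp(string progId, out bool created)
    {
        try
        {
            // Asks COM for the instance registered in the Running Object
            // Table under this ProgID. Throws COMException if none is found.
            created = false;
            return Marshal.GetActiveObject(progId);
        }
        catch (COMException)
        {
            // Nothing running under that ProgID: start a new process.
            Type type = Type.GetTypeFromProgID(progId, true);
            created = true;
            return Activator.CreateInstance(type);
        }
    }
}
```

A test would call something like OfficeAppBuilder.GetApp("Excel.Application", out created) on its own thread, so the reference it gets back belongs to that thread. The "created" flag matters later, when deciding how to clean up.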

Also note that even in Single Threaded Apartment state, you can keep these wrappers and COM RCWs in a thread-static variable, because each thread then only ever sees the instance it obtained itself.
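A small sketch of the thread-static idea, assuming the application is already running. The class name PerThreadApp and its Get method are hypothetical; [ThreadStatic] and Marshal.GetActiveObject are the real pieces.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PerThreadApp
{
    // [ThreadStatic] gives every thread its own copy of this field, so a
    // reference stored here is only seen by the thread that obtained it.
    [ThreadStatic]
    private static object app;

    public static object Get(string progId)
    {
        if (app == null)
        {
            // Throws COMException if no instance is running.
            app = Marshal.GetActiveObject(progId);
        }
        return app;
    }
}
```

Each thread pays for one GetActiveObject call and then reuses its own reference, which avoids handing an RCW from one thread to another.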

Problem 2: What about all those other smaller COM objects? I want to set up a scenario during test initialization with lots of cells or ranges, then use those little COM objects in my tests.

Solution: During test initialization, store a way to find the COM object (the cell location, a task unique ID, or something like that) in a static variable, rather than the actual COM RCW object. Then write a utility that provides a quick way to get the COM object for the test from that stored identifier. For example, Wrapper.GetCellForActiveBook(identifier).
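As a rough sketch of that utility, assuming Excel, the Excel primary interop assembly, and a cell address such as "B2" as the identifier. The method body and the CellAddress field are my guess at one way to do it, not the only one.

```csharp
using System;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

public static class Wrapper
{
    // Set during test initialization: a plain string, not an RCW,
    // so it is safe to read from any thread.
    public static string CellAddress;

    public static Excel.Range GetCellForActiveBook(string address)
    {
        // Obtain the application on the calling (test) thread.
        Excel.Application app =
            (Excel.Application)Marshal.GetActiveObject("Excel.Application");

        // Assumes the active sheet is a worksheet (not a chart sheet).
        Excel.Worksheet sheet = (Excel.Worksheet)app.ActiveSheet;

        // Resolve the stored address to a fresh Range on this thread.
        return sheet.get_Range(address, Type.Missing);
    }
}
```

Test initialization would set Wrapper.CellAddress = "B2", and each test would call Wrapper.GetCellForActiveBook(Wrapper.CellAddress) to get its own Range.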

Problem 3: But GetActiveObject doesn't always work ...
Right, you can't use the GetActiveObject trick for all types of objects. I'm still working on this. For example, I haven't yet been able to create "document" level objects (MSProject.Project, Excel.Workbook) in test initialization and then grab that specific instance in the tests (on a different thread). Possible solutions:

  1. For some reason, the "active document" property (ActiveWorkbook, ActiveProject) gives me a document that I can create in initialization and use through the tests. This limits the test suite to a single document, and there is a risk that another document becomes active, breaking the test. I'm uncomfortable with this ... but using it successfully.
  2. Create a new instance per test. Use helper methods to repeat scenario initialization. While a little slow, I prefer this approach, as it keeps the tests clean.
  3. Create a document in initialization and save it to disk. Pull it from disk for each test - this creates the COM RCW on the test thread. I haven't pursued this approach.


Finally, be careful to close those COM objects! It's often harder than you think.

If you are playing with COM from .NET, read and understand Marshal.ReleaseComObject and the very handy Marshal.FinalReleaseComObject. In the solution I describe, you will have to carefully manage MS Office application instances or your test suite will leave dozens (hundreds?) of open MS Office applications.

ReleaseComObject is very necessary after working with something like WindowsInstaller.Installer.OpenDatabase. The author of these automation interfaces neglected to implement the close method for database objects, while the C API requires that you close the database. Consequently, ReleaseComObject is the only way to avoid leaking resources and leaving a lock on the MSI file you read.

Other details - In a case where I am either creating a new Office app or stealing an existing process with Marshal.GetActiveObject, it's important for the wrapper to know whether it created the COM object (you need to close the application process that you created) or stole it (you should only release the COM object).

In summary, I hope I can't remember the details next week.

Anyway, there is a lot of complexity in closing COM objects. For this reason, I package all heavy COM objects in wrappers. A Builder pattern handles creation, and the IDisposable pattern handles cleanup. Easy. Once done, I can forget about this whole sorry technical deep dive by next week, and think about more important things.
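A minimal sketch of such a wrapper, under the assumption that the application exposes a Quit method that can be called with no arguments through late binding (Excel does; check your own application). The class name OfficeAppWrapper and its members are hypothetical.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical wrapper: remembers whether it launched the application
// and cleans up accordingly in Dispose.
public sealed class OfficeAppWrapper : IDisposable
{
    private object app;
    private readonly bool created;

    public OfficeAppWrapper(object app, bool created)
    {
        this.app = app;
        this.created = created;
    }

    public object App
    {
        get { return app; }
    }

    public void Dispose()
    {
        if (app == null) return; // already disposed

        try
        {
            if (created)
            {
                // We launched this process, so we must shut it down.
                app.GetType().InvokeMember(
                    "Quit", BindingFlags.InvokeMethod, null, app, null);
            }
        }
        finally
        {
            // Created or borrowed, always drop our reference.
            Marshal.ReleaseComObject(app);
            app = null;
        }
    }
}
```

Wrapping each test's use in a using block (using (OfficeAppWrapper w = new OfficeAppWrapper(app, created)) { ... }) keeps the cleanup on the same thread that obtained the object.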

3 comments:

Unknown said...

I actually read that whole post... Now I remember why I'm so glad I haven't had to deal with COM in the past 6 months at work.

Although I did run into the whole STA, MTA thing while writing unit tests with WatiN. I ended up running WatiN on a separate STA thread.

Depechie said...

Hey Cliff,

Could you show us an example of an Office builder wrapper that will shut down the active Office COM RCW?

Because I've currently implemented one myself, a singleton class that holds a Word application. But in the dispose when I try to use the .Quit() I'm still getting the "COM object that has been separated from its underlying RCW cannot be used" exception

Unknown said...

@Depechie: I'm not sorry that I have not worked in COM since posting this blog. I had some trouble with instancing as well - but it would be different with Word. You should not use Singletons - a number of actions can disassociate your instance from the Word instance it is controlling.
I used an MS Project Application Wrapper, a Project Wrapper, and a factory class that would either call Marshal.GetActiveObject to get a running instance or new MsProject.Application to create a new instance. This was necessary to correctly manage complexity and avoid the problems you are having.