Monday, January 28, 2008

Finally Understanding COM After Changing a Light Bulb

Recently I discovered that my passenger low-beam headlight burnt out. More accurately, the nice police officer who pulled me over and only gave me a verbal warning let me know it had burnt out. With this new knowledge, I went to the local auto parts store to find a new one.

I uninstalled the broken one and brought it in with me. On the bottom of it, there was a "9006" identifier. The lady at the store helped me find another "9006" low beam headlight and it took about a minute to install.

It sort of reminds me of how relatively simple COM is at its core.

For the past decade, the sum of all of my knowledge about COM was approximately:

  • It stood for Component Object Model
  • It looked painful: avoid it like you'd avoid eating at a restaurant publicly condemned by the Board of Health.
  • Somehow it involved interfaces.
  • You had to manage your own memory.
  • Windows uses it everywhere... somehow.
  • It was replaced by .NET, so forget about it.
  • I could use COM classes (like the one for IE) in VB and .NET, but I had no idea how it worked under the covers.
  • Nothing on earth was worse than "DCOM," or so I'd been told.

You probably knew more than that, but that's about what I would have squeaked out if push came to shove. I never really had to understand COM, so I didn't bother.

Recently, I had to break down and actually learn enough about it to get some things done. I learned that it's really not that bad. Please consider this the absolute minimum that a Windows programmer should know about COM. I wish I had known it sooner; it would have made things faster to learn.

The first thing to learn is that there are three major figures in COM:







Client

A client is somebody that wants to get a job done. Maybe he's trying to get a light bulb for his car. Or perhaps it's someone who's trying to play some music. Another possibility is someone who wants to display a web page or talk to Outlook. It doesn't really matter; it's someone that wants to do something.








Server

A server is something that wants to do some tasks for you. He's the guy that the client is pushing around. He only does a few tasks that are on a menu that the client knows about.





(Bee) "Hive" Keeper (a.k.a. "Registry")

The bee "hive" is buzzing with activity. Imagine a bee keeper that stores all of his notes on the individual "cells" of a honey comb. It's sort of like a big telephone directory, but one that was designed by someone who hates bees.

Now that we know the key players, let's get back to our light bulb. Imagine that all you had in the world were the three major players above. What would need to have happened?

Well, the server in this case would be a light bulb. Wanting to copy Apple's naming convention, let's call him iLight, or more boringly: ILight. Now ILight had to advertise himself to the world. The menu of things that the client can request him to do include "Turn On" and "Turn Off" and maybe "CalculateWattage." He has a boring part number just like my real light did (e.g. 9006). Let's say that he walked over to the hive and wrote down in the directory that "ILight is part #9006" and also wrote down exactly where in the store to find him. The only real requirement is that this must happen before any clients can use him.

The client is just like me and knows he needs something that can "Turn On," "Turn Off," and "CalculateWattage." He looks in his car manual and finds out he needs an ILight. He goes to Hive and finds that "ILight" is part number "9006." Next, he goes to the "cell" where information on part "9006" is stored and finds exactly where to find it. He picks up the ILight and lives happily ever after.

See? Not too hard right? At a high level, it's pretty simple. For fun, let's dive deeper into the reality that is COM.



"ILight" would more than likely have started his life in Visual Studio as an ATL (Active Template Library) project. ATL is just a simple way of dealing with the gooey parts of COM that don't really matter that much. ATL projects are written in C++, so they're sure to bring fond memories of your college days back.

In the project, we'd create a new class of type "ATL Simple Object" (see, it's simple :))



And give it a name of "Light" and have the wizard automatically fill in the details:



It doesn't matter if we get scared after clicking next, we just need to hit finish and trust the defaults (remember, this is the absolute minimum you should know):



Next, go to the Class View window on the right-hand side of your screen, right-click on the "ILight" interface, and add a few methods:



The first method being "TurnOn" (and similarly turn-off) that has no arguments:



To make it interesting, let's say there was one more method called "CalculateWattage" that takes two parameters (volts and amps) and returns the wattage:



Now, we'd go to Light.cpp and fill in the definitions of these functions. In this example, we'll just show a message box for the turn on/turn off commands. Note that for the "CalculateWattage" function, we return the result via a pointer parameter. This is important since the return value for COM methods is almost always the status of whether the call succeeded or not. Successful responses always start with "S_" and errors always start with "E_". The COM-ese for this is the "HRESULT" that you can think of as "here's the result of the function call."



If you actually spent your time in C++ as a client, you'd have to make calls like this:



You can't simply check a value to be equal to something since any result that starts with "E_" is a failure (popular ones include E_FAIL and E_NOINTERFACE). The "FAILED" macro just checks to see if the most significant bit is set.
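
As a rough, portable sketch of that calling pattern (not the code from the Light project): the `sketch` namespace, `Failed`/`Succeeded`, and the free function `CalculateWattage` are stand-ins I made up so this compiles without the Windows headers. On Windows you would use the real `HRESULT`, `FAILED`, and `SUCCEEDED` from the SDK instead; the numeric values below match the SDK's.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-ins for the Windows definitions (normally from <winerror.h>).
namespace sketch {

using HRESULT = std::int32_t;

constexpr HRESULT S_OK      = 0;
constexpr HRESULT E_POINTER = static_cast<HRESULT>(0x80004003u);
constexpr HRESULT E_FAIL    = static_cast<HRESULT>(0x80004005u);

// Same idea as the FAILED/SUCCEEDED macros: the most significant bit
// (i.e. a negative value) marks a failure.
constexpr bool Failed(HRESULT hr)    { return hr < 0; }
constexpr bool Succeeded(HRESULT hr) { return hr >= 0; }

// Hypothetical stand-in for ILight::CalculateWattage: the status is the
// return value, and the actual answer comes back through the pointer.
inline HRESULT CalculateWattage(double volts, double amps, double* watts) {
    if (watts == nullptr) {
        return E_POINTER;   // caller gave us nowhere to put the answer
    }
    *watts = volts * amps;
    return S_OK;
}

}  // namespace sketch
```

A caller would declare a `double watts`, pass `&watts`, and only trust the value if `Failed(hr)` is false.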

Now, when going to build the project on Vista, a curious error is reported:



What happened? Well, answering that will take us back to the hive.

When we created the ILight interface and its concrete implementation, Visual Studio automatically created a "part number" for us that is a really huge number called a Globally Unique Identifier (GUID). The number is so big that it makes us yearn for the days of simple part numbers like 9006. In order for clients to be able to find our component, they need to look up our part number or our name in a special area in the registry/hive. Specifically, the servers need to put their details under the hive key (HKEY) that is the root of all classes/servers. This is conveniently called "HKEY_CLASSES_ROOT":



But we don't want just anyone being able to write there, right? If any ol' server could just write into that area, it could replace a good implementation of ILight with one that, say, recorded all your keystrokes and sent them to some COM-enlightened hacker several timezones away. This is just one of the reasons why setup programs require a User Account Control (UAC) elevation before they will start. Microsoft is guessing that setup will probably want to register some COM server or write to a directory that the standard user doesn't have permission to write to (e.g. "C:\Program Files"), and gets it out of the way early.

When you go to build the project, Visual Studio will create a helper batch file for you in your project's debug directory and run it. If you use a tool like Process Monitor, you can see all of this happening:



If you're really quick, you'll be able to copy the batch file before Visual Studio deletes it after it gets an error. Here's what the batch file looks like:



All that the batch file does is call out to "regsvr32," which clearly stands for "register 32-bit COM server." I say that a little tongue-in-cheek because I had seen the "regsvr32" name hundreds of times before and never really understood what a server meant. "What's a server?!" However, in hindsight, it's quite simple. It's just a COM class that can do work.

When regsvr32 goes to write into the HKEY_CLASSES_ROOT area (abbreviated HKCR), it gets denied:



Note that the text after the "VCReportError" label in the batch file is exactly what was reported to us in Visual Studio's error window.

What are we to do now? Well, the simple answer is to run Visual Studio again, but this time with administrator privileges and try it again:



This time, DevEnv.exe has administrator credentials and this causes the batch file to run under administrator credentials which, you guessed it, causes regsvr32.exe to run with administrator credentials and therefore allows us to write into HKEY_CLASSES_ROOT.

Now, our component is registered. To learn exactly what this means, we need to use another tool called "OLE-COM Object Viewer" (oleview.exe) that's part of the Windows SDK. OleView simply lets you see all of the components/servers that are registered on your machine. It's like browsing an auto parts store. If we search for our "Light" component, we'll see a screen like this:



Which is full of all sorts of great information about our Light pulled from many different areas of the hive. For example, it says that our light class implements the ILight interface that has the interface identifier of "BBABC3ED-E2B6-4023-AE58-1B04E80E0DAE" and the concrete class has a part number/class identifier of "98C0E3FF-264C-4919-8DE6-F4D87B83D779." Furthermore, the location in the "store" is on the file system at "C:\Users\Jeff.Moser\Documents\Visual Studio 2005\Projects\LightBulb\LightBulb\Debug\LightBulb.dll". The class has a version specific name of "Acme.Light.1", and it also goes by the version non-specific name of "Acme.Light".

Impressive! It's a little more complicated than the auto parts store, but not that much. The hive is buzzing with information about how to find our component and what it provides.
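
To make the lookup concrete, here is a toy model of the hive in C++. It is only a sketch: `ToyHive`, `Register`, `FindServer`, and `MakeSampleHive` are names I invented, and two in-memory maps stand in for the real registry keys (ProgID to CLSID, then CLSID to the server's path). The sample values are the ones OleView showed above.

```cpp
#include <cassert>
#include <map>
#include <string>

// A toy in-memory "hive": two lookup tables standing in for the registry.
struct ToyHive {
    std::map<std::string, std::string> progIdToClsid;  // like HKCR\<ProgID>\CLSID
    std::map<std::string, std::string> clsidToServer;  // like HKCR\CLSID\{...}\InprocServer32

    // What the server does at registration time.
    void Register(const std::string& progId, const std::string& clsid,
                  const std::string& serverPath) {
        progIdToClsid[progId] = clsid;
        clsidToServer[clsid] = serverPath;
    }

    // What happens on behalf of the client: name -> part number -> location.
    bool FindServer(const std::string& progId, std::string& serverPath) const {
        auto clsid = progIdToClsid.find(progId);
        if (clsid == progIdToClsid.end()) return false;
        auto server = clsidToServer.find(clsid->second);
        if (server == clsidToServer.end()) return false;
        serverPath = server->second;
        return true;
    }
};

// Registers the Light component under both of its names.
inline ToyHive MakeSampleHive() {
    const std::string clsid = "98C0E3FF-264C-4919-8DE6-F4D87B83D779";
    const std::string path =
        "C:\\Users\\Jeff.Moser\\Documents\\Visual Studio 2005\\Projects"
        "\\LightBulb\\LightBulb\\Debug\\LightBulb.dll";
    ToyHive hive;
    hive.Register("Acme.Light.1", clsid, path);  // version-specific name
    hive.Register("Acme.Light", clsid, path);    // version-independent name
    return hive;
}
```

Looking up "Acme.Light" or "Acme.Light.1" lands on the same DLL, while a name nobody registered simply isn't found.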

Now, let's jump over to client land. The client can be anything that supports COM, for example VBA in Excel. We can add a reference to our component by clicking Tools > References:



and then write some VBA code to use it:



Note that we can write the exact same code in C# in a similar way:


Neat huh? Note that we didn't have to check for HRESULTs like we would have done in C++. The reason is that under the covers, the VBA runtime (and C# .NET Runtime Callable Wrapper and CLR) do that for you. If the function returns an error, an exception is generated so that you can't ignore it.

Another nice feature is that we can take advantage of the fact that the hive can retrieve information about our class just by its string name (ProgID). The following code has the same result for the CalculateWattage call:



We can do the same thing in C#, but the syntax will be a little messier until C# 4.0's support for dynamic lookup is available:




It's the exact same idea, but we get less language support.

So that is what COM is all about. Well, that's what I would say is an absolute minimum to get by without assuming things are just magic. I left out a few details that aren't 100% necessary to know:


  • No sane person would write a COM implementation without using a framework like ATL to handle all the "goo."
  • A middleman like COM (or CORBA for that matter) must exist because in the wild, you get inconsistencies that prevent you from using raw binaries directly in your code. For example, different C++ compilers mangle function names differently. If you try to allow binary interoperability, you'll inevitably recreate something like COM.
  • COM actually creates a factory class from your DLL that, in turn, creates your class.
  • You don't have to implement your class as a DLL (in process). You can have it be an EXE (out of process) that is running. In this case, the EXE runs and registers itself with the OS and says that it's running and ready for work. If it isn't already running, the OS will start it.
  • All classes must implement the IUnknown interface. This interface lets you "cast" your pointer to another interface (via the QueryInterface method) and it also keeps track of memory.
  • The pointer that you get back from COM is actually an entry into your class's v-table for the requested interface. This is what ultimately drives the requirement that interfaces must never ever change once published in the wild. Note that all "casts" of a pointer must go through COM. This is because it's important that you do essentially static_cast instead of a reinterpret_cast. The latter would give you weird results if you tried it on a pointer that didn't have the method you wanted.
  • Since an interface cannot change once it is published, and because a v-table is used, the common pattern is to use an increasing number after the base name for each successive revision and have the new interface inherit from the old one. For example, IWebBrowser2 inherits from IWebBrowser and adds methods to the end of it.
  • COM doesn't have a garbage collector like .NET and Java do. For this reason, you have to explicitly keep track of how many references to each COM object are outstanding. When the count drops to 0, the object can be removed from memory. IUnknown handles this. ATL gives you a good implementation of this automatically.
  • By default, COM uses a message queue for coordination. This introduces several different "apartment threading models." These strictly exist to make sure you're careful with multithreading and access to shared state. You'll eventually need to know more about these. A thread joins an apartment when it calls CoInitialize (short for "COM Initialize") or its more flexible sibling, CoInitializeEx.
  • When a .NET class calls a COM class, a Runtime Callable Wrapper (RCW) is created to handle all the IUnknown goo.
  • When a COM client needs to call .NET code, an aptly named "COM Callable Wrapper" (CCW) is created and used. The neat trick is that this is essentially a reference to the .NET core runtime with your assembly passed in as an argument. This is how the .NET side of the house is bootstrapped.
  • IDispatch is an interface that essentially lets you call functions by name and gives you a poor-man's version of .NET's reflection. This is the magic that allows scripting languages to work so well. Instead of binding to a specific function at compile time and getting compiler support, you can do runtime lookups. It essentially allows for the differences between my first and second example. Note that my Light class implemented both ILight and IDispatch. Again, ATL handled all the magic here.
  • As you dig deeper into COM, you'll note that the Interface Description Language (IDL) plays an important role in making sure all languages of COM understand the details about your component properly. Basically, languages agree on how they'll handle things defined in IDL. Once again, ATL hides most of this.
  • Note that to use a COM class, we had to make an entry into the registry. This entry requires elevated permissions like I mentioned earlier. This is a bit of an overkill if your application is the only one that uses it. I think that Microsoft realized this as they were making the big UAC push in the development of Vista. This led to an update in XP SP1 called "Registration-Free COM" which allows you to create a file that is named the same as your .dll/.exe except that it ends in the ".manifest" extension. It's an XML file that has the same type of information that the hive/registry has. However, it doesn't require you to have the elevation to get an entry into HKEY_CLASSES_ROOT. It's useful for using in low permission environments.
  • There are a lot of good articles on COM, they're just hard to find. I've tagged a few that were helpful to me on del.icio.us. Please let me know if you think I missed a good one and I'll check it out.
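
The IUnknown bullets above (casting via QueryInterface, and reference counting instead of garbage collection) can be sketched in portable C++. This is a simplification under loud assumptions: `IUnknownLike`, `ILightLike`, and `liveObjects` are invented names, interfaces are identified by plain strings instead of GUIDs, and `QueryInterface` returns a bool rather than an HRESULT. Real code would derive from the SDK's `IUnknown` and let ATL do this work.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for IUnknown (the real one uses IIDs and HRESULTs).
struct IUnknownLike {
    virtual bool QueryInterface(const std::string& iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~IUnknownLike() = default;  // clients destroy objects only via Release()
};

struct ILightLike : IUnknownLike {
    virtual double CalculateWattage(double volts, double amps) = 0;
};

class Light final : public ILightLike {
public:
    static int liveObjects;  // demo-only counter of Light objects in memory

    Light() { ++liveObjects; }  // the creator holds the first reference

    // The COM-style "cast": hand back a pointer only for interfaces we support.
    bool QueryInterface(const std::string& iid, void** out) override {
        if (out == nullptr) return false;
        if (iid == "IUnknown") {
            *out = static_cast<IUnknownLike*>(this);
        } else if (iid == "ILight") {
            *out = static_cast<ILightLike*>(this);
        } else {
            *out = nullptr;
            return false;  // the real API would return E_NOINTERFACE
        }
        AddRef();  // every pointer handed out counts as a reference
        return true;
    }

    unsigned long AddRef() override { return ++refs_; }

    unsigned long Release() override {
        const unsigned long remaining = --refs_;
        if (remaining == 0) delete this;  // last reference gone: free the object
        return remaining;
    }

    double CalculateWattage(double volts, double amps) override {
        return volts * amps;
    }

private:
    ~Light() override { --liveObjects; }
    unsigned long refs_ = 1;
};

int Light::liveObjects = 0;
```

Creating a `Light` with `new`, asking it for "ILight", and then calling `Release()` once per pointer you hold brings the count back to zero, at which point the object deletes itself.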

COM is a necessary layer due to the binary incompatibility problems. Since we now have .NET, when you look back at COM, it's sort of like your grandparents telling you how they washed clothes without electricity. It was a bit more involved and tedious, but it got the job done. COM started in the early 90's, well before Java and the CLR with its unified type system.
Since many of the core classes/servers of Windows and Microsoft Office use COM, it will be around for a long, long, long time.

It's good that I finally understand it enough to make use of it effectively!

How did you learn about COM?

UPDATE: I wrote about "Using Obscure Windows COM APIs in .NET."



Saturday, January 19, 2008

Constants on the left are better, but this is often trumped by a preference for English word order.

Typically we all write comparison statements like this:

But, the following is just as valid:

The bottom style is a bit safer in languages like C because if you forget to put a double equals sign

the compiler will assign 5 to the "currentValue" variable and the result will be the value of the assignment, which is 5. Anything that isn't zero is "truthy" and will cause the "if" branch to be taken. If you didn't intend for this and you're lucky enough to have compiler warnings turned all the way up, you'll get a helpful message like "warning C4706: assignment within conditional expression."

Note that the bottom expression can never have this problem. The number 5 is constant and can not be assigned to, so you get a compile time error:

Therefore, getting in the habit of putting constants on the left-hand side prevents the unintended-assignment class of error.
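
Here is a minimal C++ sketch of the comparison forms discussed above. Only `currentValue` and the constant 5 come from the text; the function names are hypothetical. The mistyped constant-left form appears only in a comment because it refuses to compile, which is the whole point.

```cpp
#include <cassert>

// Conventional order: variable on the left.
inline bool IsFiveConventional(int currentValue) {
    return currentValue == 5;
}

// Constant on the left: exactly the same comparison.
inline bool IsFiveConstantLeft(int currentValue) {
    return 5 == currentValue;
}

// What the single-equals typo really does: it overwrites the variable, and
// the value of the whole expression is the assigned value (5, which is
// nonzero and therefore "truthy" inside an if).
inline int AccidentalAssignment(int& currentValue) {
    return (currentValue = 5);
}

// The same typo with the constant on the left is rejected by the compiler:
//     if (5 = currentValue) { ... }   // error: cannot assign to a literal
```

Calling `AccidentalAssignment` on a variable holding 0 returns 5 and leaves the variable at 5, so an `if` written that way would always take the branch.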

Much less well known is that this type of thinking is also helpful when an "=" sign isn't present at all. Consider this function:


If you're a developer, you'll likely run into a lot of code just like this. Maybe the code you see won't check for bad input like it should, so you'll occasionally get a "NullReferenceException" which makes life no fun.

An astute observer would realize the code could be written:

where the check for null is eliminated altogether and the static version of Equals is called. Since Equals does a null check internally, it'd be superfluous to do it twice.

This is usually where most people stop. Note that we could take advantage of constants on the left and do this:



the result is that you save around 10 characters and still never throw a "NullReferenceException." This takes advantage of the fact that string literals are simply string objects themselves.

Again, putting the constant on the left eliminated the possibility of forgetting about nulls. But let's be honest, I don't do this in production code and you probably don't either.

Why not?

Well, it just feels wrong. Go ahead and look at any textbook you had in college or even the latest programming books and look at their code samples. While some of the more pragmatic ones for embedded systems might recommend putting integer constants on the left-hand side, you'll almost never see the string example. I've only seen it in one book myself.

But again, why?

I think the reason comes down to the fact that English sentences are almost always subject-verb-object where you say the subject noun before the object noun. When we write code, we probably unconsciously think something along the lines of "if this (subject) thingy (is equal to) this (object) thingy then do such and such." Just as saying "The program is what I wrote" is far and away weirder than saying "I wrote the program," putting constants on the left feels weird and I don't do it because I want my code to be as easy to read by others as possible.

Having subject-verb-object sentences is not the only way to express yourself. In the "Koine" Greek language that was spoken by most of the known world 2000 years ago, the subject and object could come in any order. The order you chose just let the reader know what you wanted to emphasize. The verb and nouns have endings on them (inflections/case) to let you know what each word means. Likewise, in Japanese you almost always do subject-object-verb. Word order is just one feature of a language, not some universal standard.

It makes me wonder how things like the Sapir-Whorf hypothesis, which states that the language you think in can affect how you understand things and what type of thoughts you can have, might apply to the types of problems programmers face. However, I mean it in the opposite sense of the types of things your programming language lets you do. It's more of "what are the bad things that can be caused by how you do think."

In the end, I love the simplicity and terseness of putting constants on the left. However, since English is the native language of almost everyone who looks at my code, and the overwhelming precedent of style has been to put constants on the right, I have no real choice but to put them on the right.

But I can't help but wonder what would have happened had the Fortran or C specs been conceived in Greek or Japanese.

Saturday, January 12, 2008

Borrowing Ideas From 3 Interesting *Internal* Classes in the .NET 3.5 Framework To Help Control "Goo" In Your Code

As programmers, we often have to concern ourselves with scaffolding "goo" in our code as in the start of the System.Collections.Queue.CopyTo method:

public virtual void CopyTo(Array array, int index)
{
    if (array == null)
    {
        throw new ArgumentNullException("array");
    }
    if (array.Rank != 1)
    {
        throw new ArgumentException(Environment.GetResourceString("Arg_RankMultiDimNotSupported"));
    }
    if (index < 0)
    {
        throw new ArgumentOutOfRangeException("index", Environment.GetResourceString("ArgumentOutOfRange_Index"));
    }
    if ((array.Length - index) < this._size)
    {
        throw new ArgumentException(Environment.GetResourceString("Argument_InvalidOffLen"));
    }
    // ...


All of the checking is considered a best practice in our field. We are taught to never trust input and to be paranoid about what gets into our functions.

But it took 16 lines!

I'm always on the lookout for how to improve my code writing in gooey areas like this. Recently I was doing some reflectoring into the .NET 3.5 framework and noticed two internal helper classes that the smart folks on the LINQ team created to help get their job done:

Exhibit A: System.Linq.Strings


Our first stop takes us to generating error messages to give to the user. We're told that it's a best practice to always use resources for strings that are visible to a user and this helper class makes it easy to do that so that we can write statements like

throw new ArgumentException(Strings.ArgumentNotIEnumerableGeneric(p0));

Doesn't that line just feel better than the CopyTo way of writing it?

throw new ArgumentException(Environment.GetResourceString("ArgumentNotIEnumerableGeneric"));

Furthermore, it helps to insulate you from the ramifications of renaming a resource. It also lets you use IntelliSense while writing code and lets you easily use refactoring tools if you want to change the name later.

Exhibit B: System.Linq.Error


The team went one step further and created another helper class that builds exceptions from the Strings class, as in:

internal static Exception ArgumentNotIEnumerableGeneric(object p0)
{
   return new ArgumentException(Strings.ArgumentNotIEnumerableGeneric(p0));
}

Once you have this class, you can now have lines like this one from the System.Linq.Queryable.AsQueryable method:

if (type == null)
{
     throw Error.ArgumentNotIEnumerableGeneric("source");
}

To me, that looks much more readable/declarative/maintainable/fluent than how Queue.CopyTo does the same type of thing.

That is as far as the LINQ team took it, which brings us to the final internal class of note.

Exhibit C: Microsoft.Contracts.Contract

The Contract class is put to interesting use in some 3.5 classes, as in a constructor of (the unfortunately also internal class) System.Numerics.BigInteger:

Contract.Requires(_data != null);
Contract.Requires((_sign >= -1) && (_sign <= 1));
Contract.Requires((_sign != 0)  (GetLength
 
This is a bit more declarative than the Linq way, but also achieves roughly the same goal since Requires has this implementation:

public static void Requires(bool b)
{
    if (!b)
    {
        throw new PreconditionException();
    }
}


It seems that if we combined the best of both of the ideas from the two teams, we'd get something like this:


internal static class Guard
{
    internal static void ArgumentNotNull(object value, string paramName)
    {
        if (value == null)
        {
            throw Error.ArgumentNull(paramName);
        }
    }
}

This allows for usage like Guard.ArgumentNotNull(data, "data"); which throws the correct exception. Now, anywhere in your code where you want to require that an argument isn't null, you simply call this method. As an added benefit, by right-clicking on the "ArgumentNotNull" method in Visual Studio, you can find all references to see where you had to do that check in your code. Better still, you fall into the Pit of Success of doing the right thing by doing less work!

You could use something like the Enterprise Library's Validation Application Block to achieve similar results, but the "Guard" approach seems very simple and especially useful when you can't use other libraries for one reason or another.
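
The Guard idea isn't tied to C#. As a rough illustration, here is how it might look in C++. Everything in this sketch is my own invention rather than anything from the framework: the `Guard` namespace, `ArgumentInRange`, the choice of standard exception types, and the `SumFirst` example function.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace Guard {

// Throws std::invalid_argument naming the offending parameter when value is null.
inline void ArgumentNotNull(const void* value, const std::string& paramName) {
    if (value == nullptr) {
        throw std::invalid_argument("Argument must not be null: " + paramName);
    }
}

// Throws std::out_of_range when value falls outside [min, max].
inline void ArgumentInRange(long value, long min, long max,
                            const std::string& paramName) {
    if (value < min || value > max) {
        throw std::out_of_range("Argument out of range: " + paramName);
    }
}

}  // namespace Guard

// Hypothetical function showing the checks collapsed to one line each.
inline int SumFirst(const int* data, int count) {
    Guard::ArgumentNotNull(data, "data");
    Guard::ArgumentInRange(count, 0, 1000, "count");

    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}
```

The precondition "goo" shrinks to two declarative lines at the top of the function, and searching for `Guard::` finds every place a check is made.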

It's clear that Microsoft.Contracts.Contract was inspired by the Spec# project as seen in the attributes and method bodies of some methods like "Invariant":

[Pure, Conditional("USE_SPECSHARP_ASSEMBLY_REWRITER")]
public static void Invariant(bool b)
{
    string text1 = "This method will be modified to the following after rewriting:" + "if (!b) throw new InvariantException();";
}


I believe that C# 4.0 (or possibly 5.0) will bring Spec#'s ideas to the masses to allow for things like:

class ArrayList
{
    void Insert(int index, object value)
        requires 0 <= index && index <= Count otherwise ArgumentOutOfRangeException;
        requires !IsReadOnly && !IsFixedSize otherwise NotSupportedException;
        ensures Count == old(Count) + 1;
        ensures value == this[index];
        ensures Forall{int i in 0 : index ; old(this[i]) == this[i]};
        ensures Forall{int i in index : old(Count); old(this[i]) == this[i + 1]};
    {
        ...
    }
}

All of which will surely make code more reliable through static analysis if used correctly.

At the very least, it's good to see the gems tucked away in the internal classes of the framework to see how things like Strings, Error, and Contract/Guard can make even your C# 2.0 code look better.

What do you think? What types of helper classes do you use to make your code have less "goo?"

UPDATE: Thanks to "Tweezz" in the comments for pointing out the general idea for Guard goes back to Eiffel's trademarked "Design by Contract" philosophy.

UPDATE 2: The internal class System.Data.ExceptionBuilder does the same thing as System.Linq.Error, but doesn't punt to a Strings equivalent class.

UPDATE 3: Andrew Matthews has some interesting applications of Design by Contract using C# 3.0. Check it out here.


P.S. Thanks to my coworker, Dan Rigsby, for introducing me to the "Guard" class idea!

Wednesday, January 9, 2008

Tip: Don't Forget About Wingdings When Making Icons

At some time in your career, you'll probably need to create an icon for your application, a button, or perhaps a menu item. One place that is often overlooked is all of the *Dings fonts such as Wingdings, Wingdings 2, Wingdings 3, and Webdings. Consider them to be a lot of vector graphics that you don't have to pay extra for. Any graphics program that supports text (like my favorite, paint.net) will already support them.


While they might not stand on their own to fit your particular need, they're really good for composition with other primitive images.