Dumping Exports From a .NET Assembly

Recently I needed to compare two versions of the same source-unavailable .NET Framework library. The goal was to patch over some minor API changes that were introduced between the two versions. In the C and C++ world this is a straightforward task: just diff the header files. But more modern languages have eschewed separate interface definitions in favor of rolling interface and implementation into one.

My first thought was to use JetBrains dotPeek, a CLR decompiler. Its output was not quite what I wanted. At this stage I was only interested in the API surface, not in any potential differences in behavior, and for that purpose a full decompilation proved quite noisy.

What I wanted was something closer to Swift's module interface generation, which you can play around with using SourceKitten.

The syntax of these generated module interfaces, while superficially resembling the language, is not actually valid Swift. However, they do provide exactly the level of detail a human needs to grok the API. I wondered if something similar existed in the .NET world.

I had a bit of a search but came up short. The overwhelming suggestion on Stack Overflow was to use C#'s extensive reflection capabilities to explore the assembly yourself. So I figured, how hard could it be?

Loading the assembly

Loading an external assembly is as easy as you would hope it to be.

using System.Reflection;
var assembly = Assembly.LoadFrom("ExampleAssembly.dll");

A particularly nice aspect is that we can load assemblies that target a different version of .NET from the one we're running. In my case, this means I can use modern C# language features (like top-level statements) even though I am analyzing a library that targets .NET Framework.
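One caveat when loading across runtimes: if some of the assembly's dependencies cannot be resolved, Assembly.GetTypes() can throw ReflectionTypeLoadException. The following is a minimal sketch of a defensive wrapper, assuming it is acceptable to skip the types that fail to load. The helper name GetLoadableTypes is hypothetical and not part of the original script.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper: returns the types that loaded successfully,
// even if some types in the assembly could not be resolved.
Type[] GetLoadableTypes(Assembly assembly)
{
    try
    {
        return assembly.GetTypes();
    }
    catch (ReflectionTypeLoadException e)
    {
        // e.Types holds null for every type that failed to load;
        // OfType<Type>() drops those entries.
        return e.Types.OfType<Type>().ToArray();
    }
}

// Usage: inspect the assembly that contains System.Linq.Enumerable,
// so the example runs without an external DLL.
var loadable = GetLoadableTypes(typeof(Enumerable).Assembly);
Console.WriteLine($"{loadable.Length} types loaded");
```

With a real target, the argument would be the result of Assembly.LoadFrom as shown earlier.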

Retrieving types

In order to produce good diff output, we need to ensure that we emit types in a consistent order. Using LINQ we will first sort by namespace, then by type name alphabetically. As we are only interested in the types we can actually use, we will also filter out non-public types. We will do the same for nested types; don't worry, we'll get to them later.

var types = assembly.GetTypes()
    .Where((type) => type.IsPublic && !type.IsNested)
    .OrderBy((type) => type.Namespace)
    .ThenBy((type) => type.Name);
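Since the goal is diffable output, it may be worth making the ordering independent of the machine's culture settings. OrderBy on strings uses the default comparer, which is culture-sensitive. Passing StringComparer.Ordinal is one way to pin the order down. A small sketch on plain strings (the sample names are made up):

```csharp
using System;
using System.Linq;

// Made-up names standing in for type.Name values.
var names = new[] { "b", "A", "a", "B" };

// Ordinal comparison sorts by UTF-16 code unit, so uppercase ASCII
// letters come before lowercase ones regardless of the current culture.
var ordered = names.OrderBy(n => n, StringComparer.Ordinal).ToArray();

Console.WriteLine(string.Join(", ", ordered)); // A, B, a, b
```

In the query above this would mean passing StringComparer.Ordinal as the second argument to both OrderBy and ThenBy.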

We also need a way to indent our output. Luckily, Microsoft provides a class built for just this purpose: IndentedTextWriter, which lives in the System.CodeDom.Compiler namespace.

using System.CodeDom.Compiler;
var writer = new IndentedTextWriter(Console.Out, "\t");

With that, we can put together our main loop:

var currentNamespace = "";

foreach (var type in types)
{
    if (currentNamespace != type.Namespace)
    {
        // Close current namespace block
        if (currentNamespace != "")
        {
            writer.Indent--;
            writer.WriteLine("}");
            writer.WriteLine();
        }

        // Declare new namespace
        writer.WriteLine($"namespace {type.Namespace}");
        writer.WriteLine("{");
        writer.Indent++;

        currentNamespace = type.Namespace;
    }
    else
    {
        writer.WriteLine();
    }

    WriteType(writer, type);
}

// Close last namespace
writer.Indent--;
writer.WriteLine("}");
writer.WriteLine();

So far so good, right?

Members

This is where it gets a bit interesting. It turns out there is no method on MemberInfo to convert it into a string that resembles a declaration. A library that can emit method signatures does exist, but I could not find a library that would handle fields and properties as well. As such, I decided to roll my own.

We will group each kind of member separately and sort each group alphabetically, with static and constant members appearing first.

var fields = type.GetFields(bindingFlags)
    .OrderByDescending((field) => field.IsStatic)
    .ThenByDescending((field) => field.IsLiteral)
    .ThenBy((field) => field.Name);

var ctors = type.GetConstructors(bindingFlags);

var events = type.GetEvents(bindingFlags)
    .OrderByDescending((e) => e.GetAddMethod()?.IsStatic ?? false)
    .ThenBy((e) => e.Name);

var props = type.GetProperties(bindingFlags)
    .OrderBy((prop) => prop.Name);

var methods = type.GetMethods(bindingFlags)
    .OrderByDescending((method) => method.IsStatic)
    .ThenBy((method) => method.Name);
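The snippets here do not show how bindingFlags is defined. One plausible value, offered as an assumption rather than the script's actual definition, is public instance and static members declared on the type itself. That choice would be consistent with the sample output further down, which does not list inherited members such as ToString.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Assumed definition: public members, both instance and static,
// declared directly on the type (inherited members are excluded).
var bindingFlags = BindingFlags.Public
    | BindingFlags.Instance
    | BindingFlags.Static
    | BindingFlags.DeclaredOnly;

// Try the flags out on a BCL type and list the distinct method names.
var methodNames = typeof(Queue<>).GetMethods(bindingFlags)
    .Select(m => m.Name)
    .Distinct()
    .OrderBy(n => n, StringComparer.Ordinal)
    .ToArray();

Console.WriteLine(string.Join(", ", methodNames));
```

Dropping BindingFlags.DeclaredOnly would also pull in members inherited from base types, which tends to add noise to a diff.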

Generics

In most cases we can use the Type.FullName property as is to print the appropriate type. However, if the type is generic, its name is mangled with a backtick followed by the number of type parameters (e.g. IList<T> becomes System.Collections.Generic.IList`1).

void WriteTypeName(IndentedTextWriter writer, Type type, bool fullName = true)
{
    // FullName can be null (e.g. for generic type parameters), so fall back to Name
    var name = fullName ? (type.FullName ?? type.Name) : type.Name;

    if (type.IsGenericType)
    {
        // Unmangle name
        var length = name.LastIndexOf('`');
        if (length > 0)
        {
            name = name.Substring(0, length);
        }

        writer.Write(name);
        WriteGenericTypeArguments(writer, type.GetGenericArguments());
    }
    else
    {
        writer.Write(name);
    }
}

WriteGenericTypeArguments appends the generic's type arguments in angle brackets, calling WriteTypeName recursively to handle nested generics.

void WriteGenericTypeArguments(IndentedTextWriter writer, Type[] types)
{
    writer.Write("<");

    for (var i = 0; i < types.Length; i++)
    {
        if (i > 0)
        {
            writer.Write(", ");
        }

        WriteTypeName(writer, types[i]);
    }

    writer.Write(">");
}
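To see why both the fallback to type.Name and the backtick trimming are needed, it helps to look at what reflection reports for a few generic types. The sketch below uses only BCL types. Note that FullName is null for a constructed type that still contains unresolved type parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Generic type definition: the name carries a backtick and the arity.
var definition = typeof(IList<>);
Console.WriteLine(definition.FullName); // System.Collections.Generic.IList`1

// Closed constructed type: FullName also embeds the assembly-qualified
// type arguments in square brackets after the arity.
var closed = typeof(IList<int>);
Console.WriteLine(closed.FullName);

// Open constructed type (IEnumerable<T> as implemented by Queue<T>):
// FullName is null, so only Name is available.
var open = typeof(Queue<>).GetInterfaces()
    .First(i => i.IsGenericType && i.Name == "IEnumerable`1");
Console.WriteLine(open.FullName ?? "(null)"); // (null)
Console.WriteLine(open.Name);                 // IEnumerable`1
```

Closed generics whose arguments are themselves generic put additional backticks inside the bracketed argument list, so a LastIndexOf-based trim can land in the wrong place for them. This may be one of the cases the script does not handle.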

Properties

Under the hood the C# compiler translates the getters and setters of properties into actual methods. The resulting methods are prefixed with get_ and set_ respectively; these are known as special names. We have to hide these methods as we will print the properties separately.

if (method.IsSpecialName &&
    (method.Name.StartsWith("get_") || method.Name.StartsWith("set_")))
{
    continue;
}

We then display whether the getter and/or setter is defined when printing the property itself.

foreach (var accessor in prop.GetAccessors())
{
    if (accessor.Name.StartsWith("get_"))
    {
        writer.Write(" get;");
    }
    else if (accessor.Name.StartsWith("set_"))
    {
        writer.Write(" set;");
    }
}
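Property accessors are not the only special-name methods. Event accessors (add_ and remove_) and operator overloads (op_) are flagged the same way, so a filter that only checks get_ and set_ will still list those as ordinary methods. The sketch below prints the prefixes that show up on two BCL types. The binding flags and the helper name SpecialPrefixes are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Assumed flags: public instance and static members declared on the type.
var flags = BindingFlags.Public | BindingFlags.Instance
    | BindingFlags.Static | BindingFlags.DeclaredOnly;

// Hypothetical helper: collects the distinct prefixes (text up to and
// including the first '_') of the special-name methods on a type.
string[] SpecialPrefixes(Type type) => type.GetMethods(flags)
    .Where(m => m.IsSpecialName && m.Name.Contains('_'))
    .Select(m => m.Name.Substring(0, m.Name.IndexOf('_') + 1))
    .Distinct()
    .OrderBy(p => p, StringComparer.Ordinal)
    .ToArray();

// AppDomain declares public events; decimal declares operators.
var appDomainPrefixes = SpecialPrefixes(typeof(AppDomain));
var decimalPrefixes = SpecialPrefixes(typeof(decimal));

Console.WriteLine(string.Join(", ", appDomainPrefixes));
Console.WriteLine(string.Join(", ", decimalPrefixes));
```

Whether to hide add_ and remove_ alongside get_ and set_ depends on whether events are printed separately, as properties are here.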

Enumerations

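We have to iterate over enum members differently from other type members. The code below relies on Enum.GetNames and Enum.GetValues returning their entries in the same order, and converts each value to the enum's underlying integer type before printing it.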

void WriteEnumMembers(IndentedTextWriter writer, Type type)
{
    var names = Enum.GetNames(type);
    var values = Enum.GetValues(type);

    for (var i = 0; i < names.Length; i++)
    {
        var name = names[i];
        var value = values.GetValue(i)!;
        object v = Convert.ChangeType(value, Type.GetTypeCode(type));

        writer.WriteLine($"{name} = {v};");
    }
}
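The conversion step works because, for an enum type, Type.GetTypeCode reports the type code of the underlying integer type. A small sketch using the BCL enum DayOfWeek, chosen here only so the example runs without any extra declarations:

```csharp
using System;

// DayOfWeek is a BCL enum whose underlying type is int.
var enumType = typeof(DayOfWeek);

// For an enum type, Type.GetTypeCode reports the underlying type's code.
var code = Type.GetTypeCode(enumType);
Console.WriteLine(code); // Int32

// Converting the boxed enum value yields the boxed underlying integer.
var raw = Convert.ChangeType(DayOfWeek.Monday, code);
Console.WriteLine($"{nameof(DayOfWeek.Monday)} = {raw};"); // Monday = 1;
```

Without the conversion, interpolating the enum value directly would print its name rather than its number.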

Nested types

Nested types are handled by calling WriteType recursively.

foreach (var nestedType in type.GetNestedTypes().OrderBy((t) => t.Name))
{
    WriteType(writer, nestedType);
}

Result

Here's the output for System.Collections.Generic.Queue<T>:

namespace System.Collections.Generic
{
    class Queue<T> : IEnumerable<T>, System.Collections.IEnumerable, System.Collections.ICollection, IReadOnlyCollection<T>
    {
        Queue<T>();
        Queue<T>(System.Int32 capacity);
        Queue<T>(IEnumerable<T> collection);
        System.Int32 Count { get; }
        System.Void Clear();
        System.Boolean Contains(T item);
        System.Void CopyTo(T[] array, System.Int32 arrayIndex);
        T Dequeue();
        System.Void Enqueue(T item);
        System.Int32 EnsureCapacity(System.Int32 capacity);
        Enumerator<T> GetEnumerator();
        T Peek();
        T[] ToArray();
        System.Void TrimExcess();
        System.Boolean TryDequeue(T& result);
        System.Boolean TryPeek(T& result);
        sealed struct Enumerator<T> : System.ValueType, IEnumerator<T>, System.IDisposable, System.Collections.IEnumerator
        {
            T Current { get; }
            virtual System.Void Dispose();
            virtual System.Boolean MoveNext();
        }
    }
}

My script ended up at fewer than 300 lines. It does not handle every case correctly (operator overloads, for one). Nevertheless, the output reasonably resembles C# and proved good enough for my purposes.