How Tabindex Works

Despite my first foray into professional web development being over a decade ago, I cannot say I have ever had to explicitly set tabindex. This became a problem while developing my auto-tab custom element. I naively assumed there would be a method I could use to determine the next focusable element. While there has been at least one discussion about adding such functionality to the spec, nothing has materialized to date. This meant that I would have to roll my own method and, as part of that, get up to speed with the behavior of tabindex. So here's a (likely incomplete) rundown on how tabindex works:

Explicit Is Prioritized

Consider the following example:

<button id="first" type="button">
no tabindex
</button>
<button id="second" type="button" tabindex="2">
tabindex 2
</button>
<button id="third" type="button" tabindex="3">
tabindex 3
</button>
<button id="fourth" type="button">
no tabindex
</button>

You might assume that the first button element would be implicitly assigned a tabindex value of 1, and as such would be the first element focused after pressing Tab.

But this is not the case. After pressing Tab, the first element to receive focus will be #second, followed by #third, then #first, and finally #fourth. If even a single element in a focus chain has the tabindex attribute defined, it is prioritized over all other elements. Elements without tabindex are deferred to the end of the chain.

<button id="first" type="button">
no tabindex
</button>
<button id="second" type="button" tabindex="0">
tabindex 0
</button>
<button id="fourth" type="button">
no tabindex
</button>

But this is not true if the tabindex is zero. In the example above, focus still moves in document order: #first, then #second, and finally #fourth.

A tabindex of zero is generally used to introduce an otherwise unfocusable element into the focus chain without interfering with the order declared in the document.
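For example, an element that is not natively focusable, such as a div, can be made tabbable this way without disturbing the rest of the order:

<div tabindex="0">
  focusable via Tab, in document order
</div>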

Less Than Zero

A negative tabindex also denotes a special case.

Elements with a negative tabindex are removed from the focus chain. However, they can still be focused programmatically using HTMLElement's focus method.
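A minimal sketch of this, using a made-up id for illustration:

<button id="skipped" type="button" tabindex="-1">
  tabindex -1
</button>

<script>
  // Never reachable with the Tab key, but focus can still be moved here from script
  document.getElementById("skipped").focus();
</script>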

Radio Buttons

Without tabindex, focus order groups radio input elements by their name attribute (per form), with the arrow keys used to select values within the group.

This makes things interesting once we start assigning explicit tabindex values.

<fieldset>
<legend>First</legend>
<label>
<input name="first" value="a" type="radio">
no tabindex
</label>
<label>
<input name="first" value="b" type="radio">
no tabindex
</label>
</fieldset>
<fieldset>
<legend>Second</legend>
<label>
<input name="second" value="a" type="radio" tabindex="1">
tabindex 1
</label>
<label>
<input name="second" value="b" type="radio" tabindex="3" checked>
tabindex 3
</label>
</fieldset>
<fieldset>
<legend>Third</legend>
<label>
<input name="third" value="a" type="radio">
no tabindex
</label>
<label>
<input name="third" value="b" type="radio" tabindex="2">
tabindex 2
</label>
</fieldset>

Which gives us the following result:

Notice that the element with a tabindex of 1 is skipped, as the pair of input[name="second"] elements is treated as having a tabindex of 3, taken from the checked input. Note that this changes to 1 if value a of second becomes the checked one.

Determining Focus Order Programmatically

Putting together what we have learned, we can now attempt to implement a method to determine the next tabbable element, borrowing Alice Boxhall's approach of using a TreeWalker:

private _walk(): void {
  // Collect descendants that can take part in the tab order (tabIndex >= 0)
  const treeWalker = document.createTreeWalker(
    this,
    NodeFilter.SHOW_ELEMENT,
    {
      acceptNode: (node) =>
        (node as HTMLElement).tabIndex >= 0
          ? NodeFilter.FILTER_ACCEPT
          : NodeFilter.FILTER_SKIP,
    },
  );

  this._elements = [];
  for (
    let node = treeWalker.nextNode();
    node !== null;
    node = treeWalker.nextNode()
  ) {
    this._elements.push(node as HTMLElement);
  }

  // Sort in ascending order, moving all 0s to the end
  this._elements.sort((a, b) => {
    const aTabIndex = a.tabIndex;
    const bTabIndex = b.tabIndex;

    if (aTabIndex === 0 && bTabIndex === 0) {
      return 0;
    }

    if (aTabIndex === 0) {
      return 1;
    }

    if (bTabIndex === 0) {
      return -1;
    }

    return aTabIndex - bTabIndex;
  });
}

But this does not account for the unique behavior of radio buttons as we discussed before. I chose to handle this as part of a different routine:

private _update(currentElement: HTMLElement): void {
...
// Special case for radio buttons
if (nextElement.tagName === "INPUT") {
const nextInputElement = nextElement as HTMLInputElement;
const type = nextInputElement.type;
const name = nextInputElement.name;

if (type === "radio" && name != null) {
const parentElement =
nextInputElement.closest("form") ?? document.body;
const selectedInputElement = parentElement.querySelector(
`input[type=radio][name="${name}"]:checked`,
);

if (selectedInputElement != null) {
this._update(nextInputElement);
return;
}
}
}

nextElement.focus();
...
}

Dumping Exports From .NET Assembly

Recently I needed to compare two versions of the same source-unavailable .NET Framework library. The goal was to patch over some minor API changes that were introduced between the two versions. In the C and C++ world this is a straightforward task: just diff the header files. But more modern languages have eschewed separate interface definitions in favor of rolling interface and implementation into one.

My first thought was to use JetBrains dotPeek, a CLR decompiler, but its output was not quite what I wanted. At this stage I was not concerned with any potential differences in behavior, and with that in mind a full decompilation proved quite noisy.

What I wanted was something closer to Swift's module interface generation, which you can play around with using SourceKitten.

The syntax of these generated module interfaces, while superficially resembling the language, is not actually valid Swift. However, they do provide exactly the level of detail a human needs to grok the API. I wondered if something similar existed in the .NET world.

I had a bit of a search but came up short. The overwhelming suggestion on StackOverflow was to use C#'s extensive reflection capabilities to explore your desired assembly. So I figured, how hard could it be?

Loading the assembly

Loading an external assembly is as easy as you would hope it to be.

using System.Reflection;
var assembly = Assembly.LoadFrom("ExampleAssembly.dll");

Particularly nice is the fact that we can load assemblies that target a different version of .NET from the one we're running. In my case, this means I can use modern C# language features (like top-level statements) even though I am analyzing a library that targets .NET Framework.

Retrieving types

In order to produce good diff output, we need to ensure that we emit types in a consistent order. Using LINQ we will first sort by namespace, then by type name alphabetically. As we are only interested in the types we can actually use, we will also filter out non-public types. We will do the same for nested types; don't worry, we'll get to them later.

var types = assembly.GetTypes()
.Where((type) => type.IsPublic && !type.IsNested)
.OrderBy((type) => type.Namespace)
.ThenBy((type) => type.Name);

We also need a way to indent our output. Luckily Microsoft provides us with a class built for just this purpose: IndentedTextWriter, found in the System.CodeDom.Compiler namespace.

var writer = new IndentedTextWriter(Console.Out, "\t");

With that, we can put together our main loop:

var currentNamespace = "";

foreach (var type in types)
{
if (currentNamespace != type.Namespace)
{
// Close current namespace block
if (currentNamespace != "")
{
writer.Indent--;
writer.WriteLine("}");
writer.WriteLine();
}

// Declare new namespace
writer.WriteLine($"namespace {type.Namespace}");
writer.WriteLine("{");
writer.Indent++;

currentNamespace = type.Namespace;
}
else
{
writer.WriteLine();
}

WriteType(writer, type);
}

// Close last namespace
writer.Indent--;
writer.WriteLine("}");
writer.WriteLine();

So far so good, right?

Members

This is where it gets a bit interesting. It turns out there is no method on MemberInfo to convert it into a string that resembles a declaration. A library that can emit method signatures does exist, but I could not find a library that would handle fields and properties as well. As such, I decided to roll my own.

We will group each kind of member separately and sort each group alphabetically, with static and constant members appearing first.

var fields = type.GetFields(bindingFlags)
.OrderByDescending((field) => field.IsStatic)
.ThenByDescending((field) => field.IsLiteral)
.ThenBy((field) => field.Name);

var ctors = type.GetConstructors(bindingFlags);

var events = type.GetEvents(bindingFlags)
.OrderByDescending((e) => e.GetAddMethod()?.IsStatic ?? false)
.ThenBy((e) => e.Name);

var props = type.GetProperties(bindingFlags)
.OrderBy((prop) => prop.Name);

var methods = type.GetMethods(bindingFlags)
.OrderByDescending((method) => method.IsStatic)
.ThenBy((method) => method.Name);

Generics

In most cases we can use the Type.FullName property as-is to print the appropriate type. However, if the type is generic, its name will be somewhat mangled (e.g. IList<T> becomes System.Collections.Generic.IList`1).

void WriteTypeName(IndentedTextWriter writer, Type type, bool fullName = true)
{
var name = fullName ? type.FullName ?? type.Name : type.Name;

if (type.IsGenericType)
{
// Unmangle name
var length = name.LastIndexOf('`');
if (length > 0)
{
name = name.Substring(0, length);
}

writer.Write(name);
WriteGenericTypeArguments(writer, type.GetGenericArguments());
}
else
{
writer.Write(name);
}
}

WriteGenericTypeArguments appends the generic's type arguments in angled brackets, calling WriteTypeName recursively to handle nested generics.

void WriteGenericTypeArguments(IndentedTextWriter writer, Type[] types)
{
writer.Write("<");

for (var i = 0; i < types.Length; i++)
{
if (i > 0)
{
writer.Write(", ");
}

WriteTypeName(writer, types[i]);
}

writer.Write(">");
}

Properties

Under the hood, the C# compiler translates the getters and setters of properties into actual methods. The resulting methods are prefixed with get_ and set_ respectively; these are known as special names. We have to hide these methods, as we will print the properties separately.

if (method.IsSpecialName &&
(method.Name.StartsWith("get_") || method.Name.StartsWith("set_")))
{
continue;
}

We then display whether the getter and/or setter is defined when printing the property itself.

foreach (var accessor in prop.GetAccessors())
{
if (accessor.Name.StartsWith("get_"))
{
writer.Write(" get;");
}
else if (accessor.Name.StartsWith("set_"))
{
writer.Write(" set;");
}
}

Enumerations

Enum members have to be iterated over differently from other type members.

void WriteEnumMembers(IndentedTextWriter writer, Type type)
{
var names = Enum.GetNames(type);
var values = Enum.GetValues(type);

for (var i = 0; i < names.Length; i++)
{
var name = names[i];
var value = values.GetValue(i)!;
object v = Convert.ChangeType(value, Type.GetTypeCode(type));

writer.WriteLine($"{name} = {v};");
}
}

Nested types

Nested types are handled by calling WriteType recursively.

foreach (var nestedType in type.GetNestedTypes().OrderBy((t) => t.Name))
{
WriteType(writer, nestedType);
}

Result

Here's the output for System.Collections.Generic.Queue<T>:

namespace System.Collections.Generic
{
class Queue<T> : IEnumerable<T>, System.Collections.IEnumerable, System.Collections.ICollection, IReadOnlyCollection<T>
{
Queue<T>();
Queue<T>(System.Int32 capacity);
Queue<T>(IEnumerable<T> collection);
System.Int32 Count { get; }
System.Void Clear();
System.Boolean Contains(T item);
System.Void CopyTo(T[] array, System.Int32 arrayIndex);
T Dequeue();
System.Void Enqueue(T item);
System.Int32 EnsureCapacity(System.Int32 capacity);
Enumerator<T> GetEnumerator();
T Peek();
T[] ToArray();
System.Void TrimExcess();
System.Boolean TryDequeue(T& result);
System.Boolean TryPeek(T& result);
sealed struct Enumerator<T> : System.ValueType, IEnumerator<T>, System.IDisposable, System.Collections.IEnumerator
{
T Current { get; }
virtual System.Void Dispose();
virtual System.Boolean MoveNext();
}
}
}

My script ended up at less than 300 lines. It does not cover all cases correctly, operator overloading being one of them. Nevertheless, the output reasonably resembles C# and proved to be good enough for my purposes.

Based Pointers

One of the more curious C/C++ extensions I have come across is from Microsoft in the form of __based pointers. Unlike a traditional pointer, a based (on) pointer represents a memory address relative to another.

void *base;
void __based(base) *pointer;

If the above example were to be compiled targeting x86, pointer would store a 32-bit integer representing an offset from base. How can we make this more portable?

#include <cstdint>

template<typename T, void *&B>
class based_pointer
{
    intptr_t offset;

public:
    based_pointer(T *ptr)
    {
        // Store only the distance between the pointer and the base
        this->offset = (intptr_t)ptr - (intptr_t)B;
    }

    operator T* ()
    {
        // Reconstruct an absolute pointer from the current value of the base
        return (T*)((intptr_t)B + offset);
    }
};
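As a rough usage sketch (the base variable and array names here are made up for illustration), the stored offset follows whatever the base currently points at:

#include <cassert>

void *base; // namespace-scope base so it can bind to the template parameter

int main()
{
    int original[4] = {10, 20, 30, 40};
    base = original;

    based_pointer<int, base> p(&original[2]); // stores the offset of element 2 from base
    assert(static_cast<int *>(p) == &original[2]);

    // Re-pointing the base re-targets every based pointer that uses it
    int copy[4] = {10, 20, 30, 40};
    base = copy;
    assert(static_cast<int *>(p) == &copy[2]);

    return 0;
}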

Swapping Bytes, Fast

Recently I was working on a C++ project that needed to be able to read both little and big endian data from disk, with endianness sometimes even changing partway through a given file. The goal was to define two functions, creatively named big and little, that could be used to correctly interpret fields of any size—practically speaking, this means 16, 32 and 64-bit values. For example:

uint32_t a = 0xf0000000;
uint32_t b = 0x000000f0;
assert(little(a) == big(b));

In this case, if targeting a little endian machine, big would ideally compile down to a bswap instruction, with little being a no-op.

This is what I ended up with:

#include <cstddef> // size_t
#include <cstdint> // uint16_t, uint32_t, uint64_t
#include <cstring> // memcpy

template <typename T, size_t N>
struct bswap_functor {
inline T operator()(T value) {
static_assert(sizeof(T) == 0);
return value;
}
};

template <typename T>
struct bswap_functor<T, 1> {
inline T operator()(T value) {
return value;
}
};

template <typename T>
struct bswap_functor<T, 2> {
inline T operator()(T value) {
uint16_t swapped;
memcpy(&swapped, &value, 2);
swapped = (swapped << 8) |
(swapped >> 8);
memcpy(&value, &swapped, 2);
return value;
}
};

template <typename T>
struct bswap_functor<T, 4> {
inline T operator()(T value) {
uint32_t swapped;
memcpy(&swapped, &value, 4);
swapped = (swapped << 24) |
((swapped << 8) & 0x00ff0000) |
((swapped >> 8) & 0x0000ff00) |
(swapped >> 24);
memcpy(&value, &swapped, 4);
return value;
}
};

template <typename T>
struct bswap_functor<T, 8> {
inline T operator()(T value) {
uint64_t swapped;
memcpy(&swapped, &value, 8);
swapped = (swapped << 56) |
((swapped << 40) & 0x00ff000000000000) |
((swapped << 24) & 0x0000ff0000000000) |
((swapped << 8) & 0x000000ff00000000) |
((swapped >> 8) & 0x00000000ff000000) |
((swapped >> 24) & 0x0000000000ff0000) |
((swapped >> 40) & 0x000000000000ff00) |
(swapped >> 56);
memcpy(&value, &swapped, 8);
return value;
}
};

template <typename T>
inline T bswap(T value) {
return bswap_functor<T, sizeof(T)>()(value);
}

Is this overkill? Almost certainly, especially considering that something similar could be implemented with a few overloaded functions. But, because it is specialized by size, we can swap types like double without requiring any additional code. This is achieved through the use of a functor: put simply, an object that can be called like a function. We're using one because we need to be able to declare a partial specialization, which is not allowed for functions in C++. The functor is ultimately wrapped in a traditional function to avoid the doubled-up bswap_functor()(value) syntax.

Each specialization begins by copying the provided value to an unsigned integer of appropriate width. This is to ensure that the bitwise operators behave as expected (i.e. that bit shifts are logical shifts). Then, the bytes are swapped. This step could be replaced with an appropriate intrinsic (either compiler-specific or those defined in x86intrin.h), the network byte order functions, or even inline assembly; I'll leave this as an exercise for the reader. It is actually something worth exploring: while GCC, Clang and ICC all optimise the above code down to a bswap and a mov with -O2 (or even just a movbe with -march=haswell), MSVC fails to optimise the 64-bit case, even with /Ox.
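As a rough sketch of what that might look like, the body of the 8-byte specialization could defer to __builtin_bswap64 on GCC and Clang, and to MSVC's _byteswap_uint64; treat this as illustrative rather than the version used above:

template <typename T>
struct bswap_functor<T, 8> {
    inline T operator()(T value) {
        uint64_t swapped;
        memcpy(&swapped, &value, 8);
#if defined(_MSC_VER)
        // MSVC intrinsic, declared in <stdlib.h>
        swapped = _byteswap_uint64(swapped);
#else
        // Built into GCC, Clang and ICC
        swapped = __builtin_bswap64(swapped);
#endif
        memcpy(&value, &swapped, 8);
        return value;
    }
};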

Finally, the swapped bytes are copied back to the typed value and returned. The reason we're using memcpy and not reinterpret_cast is that the latter would likely violate strict aliasing and result in undefined behaviour, although most compilers will fare okay.

The base template currently results in a compile-time error when instantiated, as sizeof will always return a non-zero value. It could instead be extended to truly support arbitrary sizes, but types that require more than 8 bytes are somewhat uncommon. In a pinch, supporting 16-byte types like __int128 could be achieved by leveraging the 8-byte specialization:

template <typename T>
struct bswap_functor<T, 16> {
    inline T operator()(T value) {
        char *bytes = reinterpret_cast<char *>(&value);
        uint64_t low;
        uint64_t high;

        memcpy(&low, &bytes[0], 8);
        memcpy(&high, &bytes[8], 8);

        // Swap the bytes within each half, then swap the halves themselves
        low = bswap_functor<uint64_t, 8>()(low);
        high = bswap_functor<uint64_t, 8>()(high);

        memcpy(&bytes[0], &high, 8);
        memcpy(&bytes[8], &low, 8);

        return value;
    }
};

The downside of this approach is that you need to determine the endianness of the target platform at compile time. In C99 this can be achieved via type punning:

inline bool is_big_endian() {
union {
uint16_t hu;
uint8_t c[2];
} u;
u.hu = 0xff00;
return u.c[0];
}

Each of the big four compilers will optimise out this function when targeting little endian platforms. Even MSVC, which does not fully support C99, is happy with this. Strictly speaking, though, this comes under undefined behaviour in C++. Thankfully C++20 has added std::endian (in the <bit> header), which can be used to check the native byte order. Hence our desired functions can be defined like so:

template <typename T>
inline T big(T value) {
return std::endian::native != std::endian::big ? bswap(value) : value;
}

template <typename T>
inline T little(T value) {
return std::endian::native != std::endian::little ? bswap(value) : value;
}