A compiler, asked to produce an object file from three instructions of assembly, will produce a good deal more besides. It will emit debug sections describing the provenance of the program. It will deposit a .note.gnu.property record attesting to the object’s compatibility with Intel’s control-flow enforcement. It will insert, around any function it is given, a stack-guard reference against the possibility that some absent buffer may one day overrun itself. And should the source indulge in a __builtin_unreachable, it will plant a trap call into the undefined-behaviour sanitiser runtime on the cheerful assumption that such a runtime is available to be linked against. A linker, if it intends to consume this output with any equanimity, must reach a settled opinion about each of these items. My linker, I discovered, had reached no such opinion. It refused them all, in series, and declined to say which of them was the cause.

This is the story of finding out what, exactly, it was refusing.

A Linker, Conceived in Private

The project — I called it zelf, which is short for Zig and ELF, and which I discovered rather late is also a Dutch word meaning self — was intended to link one executable, by the most direct means I could contrive. The target was Linux x86-64. The input was ELF64 relocatable objects. The output was a single statically linked binary with no dynamic loader, no shared libraries, and no concessions to modernity beyond what the kernel itself demanded. The point of the exercise was not to compete with anyone. It was to discover, by refusing every convenience, what steps lay between a pair of .o files and the brief, useful thing the kernel eventually runs.

The resulting system occupies four modules of about a thousand lines each. A reader parses ELF64 object files into their constituent sections, symbols, and relocation entries. A relocations module implements the small handful of x86-64 fixups that a static link actually needs — R_X86_64_64, R_X86_64_PC32, R_X86_64_PLT32, and R_X86_64_GOTPCREL, the last of which, in the absence of a real global offset table, reduces to the same arithmetic as PC32 and costs the linker nothing to implement. A writer computes segment layout, emits a program header and a single R+X PT_LOAD, and writes the bytes. A CLI ties them together in the obvious order.
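
For the curious, the arithmetic behind three of those fixups can be sketched in a few lines of C. This is an illustration and not zelf's own code (which is Zig): the helper name apply_reloc is hypothetical, the host is assumed to be little-endian, and GOTPCREL is left out. The formulas and type numbers are the ones the x86-64 psABI assigns, with S the symbol's final address, A the addend, and P the address of the bytes being patched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Relocation type numbers as assigned by the x86-64 psABI. */
enum {
    R_X86_64_64    = 1,  /* S + A, 64-bit absolute                  */
    R_X86_64_PC32  = 2,  /* S + A - P, 32-bit PC-relative           */
    R_X86_64_PLT32 = 4   /* L + A - P; with no PLT, L is simply S   */
};

/* Hypothetical helper: patch one fixup into `buf`.
 *   S = final address of the referenced symbol
 *   A = addend from the relocation entry
 *   P = final address of the bytes being patched
 * Returns 0 on success, -1 on an unsupported type or 32-bit overflow. */
static int apply_reloc(uint8_t *buf, uint32_t type,
                       uint64_t S, int64_t A, uint64_t P)
{
    assert(buf != NULL);
    switch (type) {
    case R_X86_64_64: {
        uint64_t v = S + (uint64_t)A;
        memcpy(buf, &v, sizeof v);      /* assumes a little-endian host */
        return 0;
    }
    case R_X86_64_PC32:
    case R_X86_64_PLT32: {
        int64_t v = (int64_t)(S + (uint64_t)A - P);
        if (v < INT32_MIN || v > INT32_MAX)
            return -1;                  /* displacement does not fit */
        int32_t v32 = (int32_t)v;
        memcpy(buf, &v32, sizeof v32);
        return 0;
    }
    default:
        return -1;                      /* not handled in this sketch */
    }
}
```

A call instruction is the usual customer: P is the address of its four-byte displacement field, and the compiler supplies an addend of -4 because the processor measures the displacement from the end of that field.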

Every module owned its tests, eighty-six in all, and for an agreeable stretch the feedback loop was the pleasant one of writing a test, watching it fail for the reason I expected, and then watching it pass. The culminating exercise was a small fixture: an object file I had hand-assembled, byte by byte, to mimic what I believed gcc -c would produce. The linker accepted it and emitted a valid executable, which ran in the VM I keep for these purposes and exited with status 0. I told myself, in the tone of a man who has built a thing he does not yet understand, that I had built a linker.

The Compiler’s Letters of Introduction

The temptation, once the tests are green, is to feed the thing something one did not write oneself. The smallest real input I could think of was a C file whose _start issued the Linux exit syscall through a sliver of inline assembly.

c
void _exit(long code) {
    asm volatile ("syscall" : : "a"(60), "D"(code));
    __builtin_unreachable();
}
void _start(void) { _exit(0); }

This is as plain as a Linux program gets. No globals, no library calls, no data section, not even a return from _start. I compiled it with the cross-compiler Zig so usefully bundles —

plain
zig cc -target x86_64-linux-gnu -c -fno-PIC -ffreestanding -nostdlib hello.c -o hello.o

— fed the resulting object to my linker, and was told, curtly, that there was an UnresolvedSymbol.

No name. Just the name of the error.

This struck me, even in the moment, as a defect of diagnostic manners rather than of logic. A diagnostic that says a symbol is missing but declines to name it has the character of a servant announcing a visitor without asking who the visitor is. I reached for nm:

plain
U __ubsan_handle_builtin_unreachable

The clang shipped inside Zig, it turns out, is configured by default to emit a trap-call into the undefined-behaviour sanitiser runtime whenever one writes __builtin_unreachable(). The program I had taken to be the smallest possible Linux program was, in quiet fact, a program that demanded a sanitiser library I had no intention of supplying. The compiler had posted a letter of introduction in my program’s name, and the recipient was a library I did not possess.

I deleted the __builtin_unreachable, a hint I had written only because I had been trained, long since, to mark non-return to compilers that cared. zig cc dutifully produced an object without a UBSan reference. My linker, inspecting it, announced that it had failed to resolve __stack_chk_fail.

The stack protector is a feature I admire in production code and had, by some oversight of principle, given no thought to at all while writing a toy program. Clang was inserting a guard around my _start function and a reference to __stack_chk_fail against the possibility that a buffer overrun might trip it. There are no buffers in _start. No overrun is possible. The compiler is not in a position to know this, and does its duty.

-fno-stack-protector, I added.

A Diagnostic Lacking Its Subject

The next refusal came differently. My linker did not now complain of an unresolved symbol. It complained, unhelpfully, of a BadTargetSection.

I set about reasoning at the compiler in flags. I added -fno-asynchronous-unwind-tables and -fno-unwind-tables, on the thesis that clang was emitting .eh_frame sections with relocations my linker was not prepared to handle. I added -fcf-protection=none, because clang on recent toolchains decorates its objects with .note.gnu.property metadata attesting to Intel CET compatibility. I added -fno-sanitize=all, out of a developing conviction that whatever sanitiser had been asleep was about to wake up. I added -fno-builtin, in the fear that clang might decide my two-instruction _start was a better candidate for an intrinsic than I had dared to anticipate.

None of this helped. The error remained BadTargetSection, unadorned and uncurious about its own subject.

It is a property of small linkers, I was beginning to realise, that they can be persuaded to produce executables by the grace of their authors, but that they meet real-world input with a faint air of surprise. Every flag I added was, in effect, an apology to the linker for the compiler’s competence.

I gave up on C. If clang could not be coaxed into emitting only the sections my linker recognised, I would bypass the compiler altogether and hand it pure assembly, which would surely yield an object file consisting of nothing but instructions and bookkeeping.

asm
.global _start
.text
_start:
    movq $60, %rax
    xorq %rdi, %rdi
    syscall

plain
zig cc -target x86_64-linux-gnu -c -nostdlib hello.s -o hello.o

A twelve-byte function (seven bytes for the movq, three for the xorq, two for the syscall), emitted from six lines of assembly, containing no C runtime whatever. My linker, examining it, announced BadTargetSection once again and declined to produce an output file.

I was, I will admit, briefly uncharitable toward Zig.

readelf, Being Candid

The first and most instructive act of any debugging session that has stopped responding to guesses is to look at the actual evidence. I installed the GNU binutils, pointed readelf -S at the insulting little object file, and was handed a list of thirteen sections: the obligatory null entry at index 0, and these twelve:

plain
[ 1] .text              PROGBITS  AX
[ 2] .debug_info        PROGBITS
[ 3] .rela.debug_info   RELA
[ 4] .debug_abbrev      PROGBITS
[ 5] .debug_aranges     PROGBITS
[ 6] .rela.debug_aranges RELA
[ 7] .debug_line        PROGBITS
[ 8] .rela.debug_line   RELA
[ 9] .note.GNU-stack    PROGBITS
[10] .symtab            SYMTAB
[11] .shstrtab          STRTAB
[12] .strtab            STRTAB

For a three-instruction function, zig cc had produced four DWARF debug sections, three accompanying relocation sections, a note section, and the usual metadata. Only .text carried the flag A, meaning allocatable, meaning destined to live somewhere in the final executable’s virtual address space. Everything else was information the compiler felt obliged to emit and which a linker was expected to do something sensible about.

The revelation was in the relocations. The object contained eight fixups in total, and every one of them targeted a .debug_* section, not .text. My linker had been traversing the entire relocation set, looking up the target section for each, and finding — quite correctly — that sections like .debug_info were present in the object but had no vaddr, no bytes in the layout, no place to land. It had interpreted this as a malformed input and refused to proceed.

A real linker does not behave this way. A real linker, when it decides to strip debug information from a static output, does not merely drop the debug sections — it also drops the relocations that refer to them. The relocations, orphaned of any destination, have no reason to exist once their targets have been discarded. To carry them forward and then object that they have no home is to misunderstand what stripping means.
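
The test for an orphaned relocation is short enough to sketch. What follows is an illustration in C rather than zelf's own Zig: the struct is a cut-down stand-in for Elf64_Shdr, and count_orphan_relocs is a hypothetical name. The constants and the role of sh_info are taken from the ELF specification: a SHT_RELA section records in sh_info the index of the section its entries patch, and each Elf64_Rela entry is 24 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHT_RELA  4u     /* relocation entries with explicit addends   */
#define SHF_ALLOC 0x2u   /* section occupies memory in the final image */

/* Only the Elf64_Shdr fields this sketch needs (a simplification). */
struct shdr {
    uint32_t sh_type;
    uint64_t sh_flags;
    uint64_t sh_size;    /* bytes of content; 24 per Elf64_Rela entry    */
    uint32_t sh_info;    /* for SHT_RELA: index of the section patched   */
};

/* Count relocation entries whose destination section is not allocatable,
 * i.e. the ones a linker that drops debug info can discard with it. */
static size_t count_orphan_relocs(const struct shdr *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t orphans = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].sh_type != SHT_RELA)
            continue;
        uint32_t target = s[i].sh_info;
        if (target >= n)
            continue;    /* malformed index; deliberately not counted here */
        if ((s[target].sh_flags & SHF_ALLOC) == 0)
            orphans += (size_t)(s[i].sh_size / 24);
    }
    return orphans;
}
```

Run over a table like the one readelf printed, a function of this shape would report every relocation in the object as an orphan, since no .rela.text section appears in it at all.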

A Patch That Mostly Declines

The fix, once the shape of the problem is clear, is embarrassingly small. A linker must distinguish three states for every section index in an input object:

  1. The section is allocatable and has been laid out in the output. Its relocations are applied in place.
  2. The section does not exist, or is a bookkeeping slot that no real content occupies. A relocation against it is genuinely malformed, and an error is appropriate.
  3. The section exists, but is non-allocatable — debug information, a comment, a GNU note — and has been intentionally dropped from the output. Its relocations should be silently skipped, without ceremony.

The existing code collapsed (2) and (3) into a single error condition. I added an ignored field to the per-section dispatch table. The CLI now marks every non-allocatable section as ignored, and the relocation batch driver skips over orphan fixups rather than erroring on them. The change was perhaps a dozen lines, most of them comments.

zig
// Section exists in the input but was intentionally dropped from the
// output layout because it isn't allocatable (.debug_*, .comment,
// .note.*, etc.). Mark the slot as ignored so any .rela.* relocations
// targeting it are silently skipped rather than erroring out. Real
// linkers behave the same way when stripping debug info.
if ((sec.sh_flags & reader.SHF_ALLOC) == 0) {
    targets[li] = .{ .ignored = true };
}
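
The Zig fragment shows only the marking. How the three states might drive the relocation loop can be sketched in C, with the caveat that every name here (target_state, classify, run_batch, reloc_stats) is invented for the illustration and none of it is lifted from zelf. Only the SHF_ALLOC value comes from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHF_ALLOC 0x2u

/* The three states from the numbered list, one per input section index. */
enum target_state {
    TARGET_MISSING = 0,  /* no such section: a relocation here is an error */
    TARGET_PLACED,       /* allocatable and laid out: apply the fixup      */
    TARGET_IGNORED       /* present but dropped: skip the fixup            */
};

/* Hypothetical classifier; `exists` is 0 for null or out-of-range slots. */
static enum target_state classify(int exists, uint64_t sh_flags)
{
    if (!exists)
        return TARGET_MISSING;
    return (sh_flags & SHF_ALLOC) ? TARGET_PLACED : TARGET_IGNORED;
}

struct reloc_stats { size_t applied, skipped; };

/* Walk the target section index of each relocation. Returns 0 on success,
 * or -1 at the first relocation whose target is genuinely missing. */
static int run_batch(const enum target_state *targets, size_t ntargets,
                     const uint32_t *reloc_target, size_t nrelocs,
                     struct reloc_stats *out)
{
    assert(out != NULL);
    out->applied = out->skipped = 0;
    for (size_t i = 0; i < nrelocs; i++) {
        uint32_t t = reloc_target[i];
        if (t >= ntargets || targets[t] == TARGET_MISSING)
            return -1;               /* state 2: malformed input          */
        if (targets[t] == TARGET_IGNORED) {
            out->skipped++;          /* state 3: orphan, dropped quietly  */
            continue;
        }
        out->applied++;              /* state 1: the patch would go here  */
    }
    return 0;
}
```

Keeping the skipped count costs nothing and gives the linker something honest to print at the end of a run.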

The next run accepted the unmodified zig cc output, processed eleven symbols and eight relocations, all of which targeted debug sections and were quietly ignored, and produced a valid x86-64 ELF executable that file reported as statically linked. The eighty-six existing tests continued to pass. An eighty-seventh, asserting that relocations against an ignored section are skipped without emission, joined them.

I also took the small liberty of threading the unresolved symbol’s name through its error, so that the next operator who met the first class of failure would at least be told the missing symbol was called __stack_chk_fail rather than merely that some symbol somewhere was missing. The original diagnostic had saved bytes at the cost of operator hours. The new one is about forty characters longer and is, in practice, the entire debugging session.
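
The plumbing involved is modest. A sketch in C, assuming a message layout of my own invention (zelf's actual wording is not reproduced here, and format_unresolved is a hypothetical name), might run as follows:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical formatter: writes a diagnostic naming the missing symbol
 * and the object that referenced it. Returns the length snprintf reports,
 * or -1 if the message did not fit in `buf`. */
static int format_unresolved(char *buf, size_t cap,
                             const char *symbol, const char *object)
{
    assert(buf != NULL && symbol != NULL && object != NULL);
    int n = snprintf(buf, cap, "UnresolvedSymbol: '%s' referenced by %s",
                     symbol, object);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The same treatment applies to BadTargetSection: the section's name is already in the string table, and the only cost is carrying it as far as the error.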

Afterwards

Three observations remain with me, not all flattering to myself.

The first concerns the unfortunate pedagogy of demo fixtures. Every object I had constructed by hand produced exactly the sections I believed a linker ought to consume, because I was the person deciding what sections to emit. The compiler, given the same source program, produces a superset: everything it is in the habit of producing, regardless of whether the author has asked for it. A linker that works on its author’s fixtures and fails on real compiler output is, in the end, a demonstration of what a linker is not quite yet.

The second is that the work of a static linker is, by line count and by temperament, more than half refusal. The mental image I had brought to the project was of a machine that combined object files — that picked up bytes, placed them, patched them, and emitted them. What a linker actually does, in the case of debug sections and notes and comment strings and compiler-provenance records and half a dozen kinds of metadata one rarely thinks about, is decide not to do anything. It refuses to lay them out. It refuses to honour the relocations written against them. It lets them drop quietly out of the story. The allocation chapter is the tutorial one in any book on linkers; the refusal chapter is what separates a linker that works from a linker that only works on its own input.

The third, and the one I should like to press hardest, concerns diagnostics. BadTargetSection, without the name of the offending section, was not a diagnostic but a riddle. The operator, in this case myself, was obliged to install a foreign toolchain, read a thirteen-entry section table, cross-reference it with a handful of relocation entries, and derive by elimination the fact that sections like .debug_info were the subject of the complaint. Adding the section name to the error would have turned ten minutes of spelunking into a single line of output. Diagnostic plumbing, as a discipline, is almost always undersold. A tool with worse diagnostics is not a smaller or a simpler tool. It is a tool whose operator has been sent to do part of the tool’s work.

zelf is not a replacement for anything, and was not written in the hope of becoming one. It is, perhaps, an instrument for understanding — and the chief thing it has taught me is that a great deal of what the compiler emits is, from the linker’s point of view, something to be politely set aside.